HYXT Blog

We produce valuable software for K12.
clock September 28, 2009 16:13 by author Sky Jia (贾超)

Just a quick reminder for myself

It is ok if something is complex so long as it is not complicated.

complex: composed of many interconnected parts; compound; composite

complicated: difficult to analyze or understand

Many problems require complexity to solve. Calculating the discounted value for a 30-year financial instrument using a predicted rate model with a monthly granularity requires a lot of work. You have to generate the rate model, calculate the cash flows from the instrument, then apply the discount to the flows. If this seems simple to you, it's because you understand the reasons behind each one of these steps.
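As a rough illustration of those three steps, here is a small Python sketch. The linear rate model, the level monthly payment, and every function name are assumptions made up for the example; a real pricing model would be far more involved.

```python
def monthly_rate_model(months, start_annual=0.05, drift_annual=0.0001):
    """Step 1: a toy predicted rate model, one annual rate per month."""
    return [start_annual + drift_annual * m for m in range(months)]


def cash_flows(months, payment):
    """Step 2: the instrument's cash flows (a level monthly payment here)."""
    return [payment] * months


def discounted_value(flows, annual_rates):
    """Step 3: discount each flow by the monthly rates compounded up to it."""
    value = 0.0
    discount = 1.0
    for flow, annual in zip(flows, annual_rates):
        discount /= 1.0 + annual / 12.0   # one more month of discounting
        value += flow * discount
    return value


months = 30 * 12
rates = monthly_rate_model(months)
value = discounted_value(cash_flows(months, 1000.0), rates)
print(round(value, 2))
```

Many parts, each with an obvious reason to exist: complex, but not complicated.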

I don't think any problem requires 'complicatedness' in order to be solved. This is like the recurring geek joke, "Well, we could call the RateManager and then send the results in a JSON document in an email to the InstumentClass that then faxes the ... and finally a suite of monkeys types the result on your screen". Does your problem need that kind of solution?

Anyways, nothing new, just a note to me.


clock September 16, 2009 23:58 by author Sky Jia (贾超)

clock April 14, 2009 15:48 by author Sky Jia (贾超)

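By Shane Hastie, translated by Jin Ming (金明), posted on April 12, 2009 at 7:32 PM

In these economically turbulent times, more and more organizations are embracing agile development as a survival strategy. This has drawn many researchers to study the attitudes and characteristics that teams within an organization need in order to succeed. Business agility (the ability to "sense change and respond to it efficiently") is very important, but how is it achieved?

Picking just three themes out of a mass of material, we can see the importance of values, motivation, and extreme interviewing (which helps select the right people).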

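Values and ethics

Michele Sliger (co-author of The Software Project Manager's Bridge to Agility) argues that agile is about the ethics of doing business, and that organizations succeed by attending to the following eight ethical principles:

  1. Commitment: do only the things that relate to delivering business value
  2. Focus: do only the things that deliver business value
  3. Openness: show the true state of the project honestly
  4. Communication: talk with everyone, answer questions promptly, and help team members coordinate their work
  5. Simplicity: have clear goals, deliver the most value at the least cost, and deliver value as early as possible
  6. Feedback: use stakeholder feedback to keep the team focused on the value being delivered
  7. Courage: be willing to make decisions, and to say no when value delivery would cross the bottom line
  8. Respect: respect everyone, including stakeholders outside the immediate team; understand the owners of the product we are building and care about their needs

(Attentive readers will notice that these are the values of Extreme Programming, and that they fit well with the values and principles of the Agile Manifesto.)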

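Motivation for the restaurateur

A newsletter from Enthiosys titled "The Chef and the Agile Restaurateur" compares agile development with running a restaurant and goes on to discuss why business agility is necessary. The article draws a number of useful parallels from which agile teams can benefit greatly:

  • Profit is created only when customers buy and use our solution, not when we release it. A chef with a gorgeous menu is not yet a success; success comes only when people walk in and order.
  • Releasing is not the same as profit. An uncoordinated kitchen that sends dishes out in the wrong order only annoys customers; only serving accurately keeps paying diners happy and wins repeat business.
  • Coordinated releases earn profit faster. If everything in the kitchen flows smoothly, we can turn tables faster and make more money sooner.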

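Choosing the right people

How do you select people with the right characteristics? CIO magazine interviewed two pioneers in this area who use "extreme interviewing" to filter candidates and find those with an agile attitude. They focus on candidates' collaboration, creative exploration, attitude to learning, and team skills.

The interview process is demanding and may scare off some applicants, but it ensures that the people who join fit the team and have the appropriate skills, both of which are essential to the team's success. Individuals matter more than processes, and hiring the right people provides the best foundation for business success.

There is no magic formula that guarantees survival and success in these turbulent times. But more and more businesses recognize that agile attitudes and practices give them a framework on which to pin their hopes, and tools for responding quickly to changing market demands.

View the original English article: http://www.infoq.com/news/2009/03/Achieving-Agility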


clock April 14, 2009 13:54 by author Sky Jia (贾超)

According to the Cambridge Dictionary an apprentice is "someone who works for an expert to learn a particular skill or job". Merriam Webster says: "one who is learning by practical experience under skilled workers a trade, art, or calling". Uncle Bob Martin recently wrote about his experience with apprentices and what he considers key to progressing from apprentice to journeyman.

He describes two hypothetical apprentices. Sam is a developer who has apprenticed with the same master and has had the same year fifteen years in a row. Jasmine has changed jobs (and therefore masters) a number of times, growing her skills along the way. The following diagram illustrates the difference in their progress.

Bob's point is that Sam, who has never changed masters, will always be a student and his growth is limited, whereas Jasmine, whose path has been varied, really is a journeyman: travelling from master to master and learning new things from each. Eventually Jasmine herself can become a master.

One commenter, JMiller, suggests that at a large enough company, one the size of Google or Microsoft, for example, you don't need to leave your employer to change masters.

Corey Haines points out that while there are companies large enough to support journeyman tours inside the company, none that he knows of encourages it.

From her experience at Tektronix, Rebecca Wirfs-Brock remarks: “To me, moving around in the same company is roughly equivalent to changing employers, especially if the company is big enough…and I did several job shifts in my 13 years at Tektronix.”

Corey Haines is starting to have some ideas about how one transitions from apprentice to journeyman:

During the apprentice phase, a person is busy learning. They are practicing specific techniques, rigorously applying rules and procedures. Over time, having been influenced by many mentors, an apprentice starts to develop their own toolbox, the set of practices that they systematically apply. These practices form a basis for further development, a core that an apprentice can build upon.

Paul reports that companies in the UK use a similar approach to hiring and training mechanical apprentices. After 6-12 months the apprenticeship is complete and people often move on to somewhere else in the industry. Even though the company may not retain that person, it benefits from having a larger pool of well-trained people to hire from in the future.


clock March 19, 2009 09:16 by author Sky Jia (贾超)

Posted by James Leigh on Mar 15, 2009 12:30 PM

The ACID properties are one of the cornerstones of database theory. ACID defines four properties that must be present if a database is considered reliable: Atomicity, Consistency, Isolation, and Durability. While all four properties are important, isolation in particular is interpreted with the most flexibility. Most databases provide a number of isolation levels to choose from, and many libraries today add additional layers which create even more fine-grained degrees of isolation. The main reason for this wide range of isolation levels is that relaxing isolation can often result in scalability and performance increases of several orders of magnitude.

Serializable consistency is one of the oldest and highest isolation levels that is generally available, and many choose it due to the simple programming model it provides - only one transaction can execute at a time against a given resource, and many potential sources of problems are removed. However, most applications (particularly web applications) cannot assume this very high level of isolation because it is impractical from the end user perspective - any application with a non-trivial number of users would quickly experience delays of several minutes accessing shared resources, which would rapidly reduce the number of users of that application back to a trivial number. Weak and eventual consistency are common in large distributed data sources such as the Web, and several very large and successful web-based applications (e.g. eBay and Amazon) have shown that optimistic weak consistency is much more scalable than traditional pessimistic mechanisms. This article takes a look at eight different isolation levels that you can use to potentially gain more performance and scalability in your applications by learning to relax data consistency constraints.

The main goal of concurrency control is to ensure transactions are isolated and do not interfere with one another. Higher degrees of isolation are achieved at the expense of potential performance gains. Concurrency control is implemented by a pessimistic or optimistic mechanism. Most relational databases, which are write-optimised, use a pessimistic mechanism. Pessimistic mechanisms use locks and may block operations or use some form of conflict detection. Pessimistic blocking is done when a table, page, or row has been modified, preventing other transactions from accessing potentially modified resources. However, optimistic mechanisms do not use any locks and rely solely on conflict detection to maintain transaction isolation. Conflict detection, as used by optimistic mechanisms, permits all read operations and verifies consistency at the end of the transaction. If a conflict is detected then the transaction is rolled back or repeated. Most web servers are read-optimised and thus use an optimistic mechanism. By permitting all read operations, optimistic mechanisms can achieve a higher read and write throughput while still preserving data consistency when resources are not continually changing.

The isolation levels listed below are here to help Web developers better understand the constraints placed on their programming models, and to engage system architects and developers in discussions to choose the most efficient isolation levels while maintaining necessary data consistency. They are listed from the least isolated (Read Uncommitted) to the most isolated (Serializability).

1 Read Uncommitted

Read uncommitted isolation level requires little isolation between transactions. Every read operation may see pending write operations from any transaction (dirty reads). However, committed write operations must have a serial order to prevent dirty writes. A pessimistic mechanism will block conflicting write operations until others are committed or rolled back. An optimistic mechanism will not lock and will allow everything to go through. If a connection is rolled back, all other connections that made subsequent modifications to the same data will also be rolled back. Shared caches are permitted in this level without validation. This isolation level is best used when transactions are not needed (such as a read-only dataset) or are modified with exclusive access to the database.

Example: An archive database that is only updated while offline, or an audit/logging table that is not used within a transaction

2 Read Committed

Read committed may read any committed state of the system and may be cached without validation (mixed states) as long as changes in the current connection are reflected in the results. Pessimistic mechanisms implement this as a Monotonic View. Optimistic transactions store all changes in isolation, making them available only to themselves until committed. Read committed is implemented with an overly optimistic mechanism that delays writing all changes until the transaction is committed. This form of optimistic isolation permits complicated write operations without blocking read operations and has no validation scheme. Shared caches are permitted only for committed states. This isolation level is best used when older values are permitted in results and transactions are only used for write operations.

Example: An online forum, when the absolute latest postings may not necessarily be shown and posts don't conflict with each other

3 Monotonic View

Monotonic view is an extension to read committed in which a transaction observes a monotonically increasing state of the database as it executes. In this level, a pessimistic transaction may be blocked during read operations if there is an outstanding write transaction. Optimistic transactions behave like read committed, keeping their changes in isolation, but validate their cache to ensure it is still valid. Periodically synchronized database clones are permitted in this level. This isolation level is best used when transactions are not needed or transactions only contain write operations.

Example: A user preferences table that is modified by only one person

4 Snapshot Reads

Snapshot Reads extends monotonic view and guarantees that query results reflect a consistent snapshot of the database. A pessimistic mechanism will block other write operations from affecting the results while they are being read. An optimistic mechanism will allow other write operations and inform the reading transaction if any of the results have changed and may roll it back. To implement an optimistic mechanism, a validation must be performed at the end of the read operation to detect if any concurrent write operations modified the result, and if so the read may be repeated or rolled back. This validation may simply check if write operations occurred in the same table, or it might check the query results for the changes. This optimistic isolation level can detect conflicts easily and favours write operations, while permitting concurrent read operations. This level permits periodically synchronized database clones so long as they provide snapshot reads. It is best used when write operations are rare or not expected to conflict with concurrent read operations and when query results need to be consistent.
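The simplest form of that validation (checking whether any write occurred in the same table during the read) can be sketched in Python as follows. The `Table` class, its `write_count` counter, and the `on_row` hook are hypothetical stand-ins invented for this illustration, not part of any database API.

```python
class Table:
    """In-memory stand-in for a database table (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.write_count = 0           # bumped on every committed write

    def write(self, key, value):
        self.rows[key] = value
        self.write_count += 1


def snapshot_read(table, keys, on_row=None, max_attempts=3):
    """Read several rows; repeat the whole read if a write landed in the middle."""
    for _ in range(max_attempts):
        seen = table.write_count
        result = {}
        for key in keys:
            result[key] = table.rows[key]
            if on_row:                 # hook used below to simulate a concurrent writer
                on_row(key)
        if table.write_count == seen:  # validation at the end of the read
            return result
    raise RuntimeError("read kept conflicting with concurrent writes")


rates = Table({"USD": 1.0, "EUR": 0.9})
fired = []


def concurrent_writer(key):
    """Simulates another transaction committing right after USD is read."""
    if key == "USD" and not fired:
        fired.append(True)
        rates.write("USD", 2.0)
        rates.write("EUR", 1.8)


result = snapshot_read(rates, ["USD", "EUR"], on_row=concurrent_writer)
print(result)
```

Without the final check, the first pass would have returned the old USD value next to the new EUR value, which is exactly the inconsistent read this level rules out.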

Example: A currency conversion or lookup table that is queried more often than it is modified and where only the newest values are kept.

5 Cursor Stability

Cursor Stability isolation extends read committed and is the default isolation level of many relational databases. In this isolation level, a pessimistic transaction must indicate which records it will modify when reading them, if done in a separate statement. This is often done using 'FOR UPDATE' keywords appended to the end of a 'SELECT' query. In this case, other conflicting read or write pessimistic transactions will be blocked until the transaction is finished. An optimistic transaction tracks the version number of all modified records/entities to be verified when committed. This is the most popular optimistic isolation level and is provided by all major object-relational mapping libraries. In the Java Persistence API, this level can closely be achieved using FLUSH_ON_COMMIT (although queries may not reflect local changes), and if a conflict is detected an OptimisticLockException is thrown. This isolation can also be used with the HTTP headers If-Match or If-Unmodified-Since that compare a previous resource's version or time-stamp before updating. This level of isolation is best used for entities that are modified based on external information (not read from the database) and changes must not overwrite each other.
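The optimistic variant, where the version number of each modified record is checked before the change is accepted, might look roughly like the Python sketch below. The `Directory` class and `StaleVersionError` are invented for the example; ORM libraries and HTTP servers implement the same idea with their own APIs.

```python
class StaleVersionError(Exception):
    """Raised when the record changed after the caller read it."""


class Directory:
    """Toy record store that keeps a version number per entry (hypothetical)."""

    def __init__(self):
        self._records = {}             # key -> (version, value)

    def read(self, key):
        return self._records.get(key, (0, None))

    def update(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleVersionError(key)
        self._records[key] = (current_version + 1, value)
        return current_version + 1


directory = Directory()
directory.update("alice", 0, "Room 101")

version_a, _ = directory.read("alice")             # editor A reads version 1
version_b, _ = directory.read("alice")             # editor B reads version 1
directory.update("alice", version_a, "Room 202")   # A commits first

try:
    directory.update("alice", version_b, "Room 303")
    outcome = "overwritten"
except StaleVersionError:
    outcome = "conflict detected"
print(outcome)
```

Only the modified record is checked here; records that were merely read are not tracked, which is why non-repeatable reads remain possible at this level.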

Example: A shared company directory or a wiki

6 Repeatable Read

Repeatable Read isolation extends cursor stability and guarantees that any data read within the transaction will not be modified or removed during the transaction. A pessimistic transaction will acquire read locks on all records and block other transactions from modifying them. An optimistic transaction will track all records or entities and verify they have not been modified when committed. This level of isolation is best used when entity states can affect other entities and transactions are made up of read and write operations.
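An optimistic transaction of this kind can be sketched in Python by remembering the version of every record it reads and re-checking all of them at commit. The `Store`, `Transaction`, and `ConflictError` names are hypothetical, and the sketch ignores concurrency between the check and the write, which a real implementation would have to make atomic.

```python
class ConflictError(Exception):
    """Raised at commit when a record read by the transaction has changed."""


class Store:
    def __init__(self, data):
        self.data = {k: (1, v) for k, v in data.items()}   # key -> (version, value)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}        # key -> version seen on first read
        self.writes = {}               # pending changes, private until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data[key]
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Verify that nothing this transaction read was modified meanwhile.
        for key, version in self.read_versions.items():
            if self.store.data[key][0] != version:
                raise ConflictError(key)
        for key, value in self.writes.items():
            old_version = self.store.data.get(key, (0, None))[0]
            self.store.data[key] = (old_version + 1, value)


store = Store({"unit_price": 5, "quantity": 2, "total": 10})

t1 = Transaction(store)
new_total = t1.read("unit_price") * t1.read("quantity")

t2 = Transaction(store)                # a second transaction changes the quantity
t2.write("quantity", 3)
t2.commit()

t1.write("total", new_total)
try:
    t1.commit()
    result = "committed"
except ConflictError as exc:
    result = "rolled back: " + str(exc)
print(result)
```

The total computed from the stale quantity never reaches the store, which matches the order-tracking example that follows.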

Example: An order-tracking database, where values are read from one entity and used to compute values for other entities.

7 Snapshot Isolation

Snapshot isolation extends snapshot reads and repeatable read and guarantees that all read operations made in a transaction will see a consistent snapshot of the database. Any read operation performed in a transaction will have the same result regardless of whether it was performed earlier or later in the transaction. This differs from repeatable read isolation because it prevents phantom reads (range query results changing). This level is supported by many relational databases in the form of multi-version concurrency control (which may be called SERIALIZABLE), which is pessimistically implemented using a combination of locks and conflict detection. In this level, transactions must be prepared to be rolled back due to conflicts from either a pessimistic mechanism or an optimistic mechanism. A pessimistic mechanism will try to reduce the chances of a conflict by locking resources, but must merge changes when transactions are committed. An optimistic mechanism may also use multi-version concurrency control, but would not block other transactions from engaging in potentially conflicting operations; instead, it would roll back transactions that were found to conflict. This level of isolation is best used for transactions that read and modify multiple records.
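The read side of multi-version concurrency control can be illustrated with a deliberately small Python sketch: every commit gets a timestamp, old versions are kept, and a transaction reads as of the moment it started. `VersionedStore` and its methods are hypothetical names, and the sketch leaves out deletes and write conflict detection entirely.

```python
class VersionedStore:
    """Keeps every committed version of every key (toy model, no deletes)."""

    def __init__(self):
        self.clock = 0
        self.history = {}              # key -> list of (commit_time, value), oldest first

    def commit(self, changes):
        self.clock += 1
        for key, value in changes.items():
            self.history.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key, time):
        """Newest value committed at or before the given time."""
        value = None
        for commit_time, candidate in self.history.get(key, []):
            if commit_time <= time:
                value = candidate
        return value

    def keys_as_of(self, time):
        """Range query: keys that already existed at the given time."""
        return sorted(k for k, versions in self.history.items()
                      if versions[0][0] <= time)


store = VersionedStore()
store.commit({"task1": "open"})
snapshot = store.clock                 # a transaction starts here

store.commit({"task1": "closed", "task2": "open"})   # concurrent commit

in_snapshot = (store.read_as_of("task1", snapshot), store.keys_as_of(snapshot))
latest = (store.read_as_of("task1", store.clock), store.keys_as_of(store.clock))
print(in_snapshot, latest)
```

The transaction holding `snapshot` keeps seeing one task in the open state however often it repeats the queries, so neither non-repeatable reads nor phantom rows appear.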

Example: A workflow system, with rules based on the state of the system.

8 Serializability

Serializability is an extension of snapshot isolation that specifies all transactions must occur as if they had executed serially, one after the other. A pessimistic mechanism acquires range locks for all evaluated queries, preventing write operations from affecting these results. An optimistic mechanism tracks all evaluated queries and either uses a backwards validation scheme or a forwards validation scheme at the end of the transaction to detect if any concurrent write operations affect concurrent read operations, and if so, rolls back all but one of the conflicting transactions. In this isolation level, the apparent state of the system by any committed transaction will not have changed. This level of isolation is used for transactions that require complete data consistency.

Example: An accounting system that performs range queries to compute new values.

Summary

Below is a summary of the isolation levels outlined in this article, to help you find the level that is most appropriate for your application.

Types of possible collisions between transactions in different isolation levels:

| Isolation level | Dirty Writes | Dirty Reads | Mixed states | Inconsistent reads | Overwrites | Non-repeatable | Phantom Reads | Inconsistency |
|---|---|---|---|---|---|---|---|---|
| Read Uncommitted | Not permitted | Permitted | Permitted | Permitted | Permitted | Permitted | Permitted | Permitted |
| Read Committed | Not permitted | Not permitted | Permitted | Permitted | Permitted | Permitted | Permitted | Permitted |
| Monotonic View | Not permitted | Not permitted | Not permitted | Permitted | Permitted | Permitted | Permitted | Permitted |
| Snapshot Reads | Not permitted | Not permitted | Not permitted | Not permitted | Permitted | Permitted | Permitted | Permitted |
| Cursor Stability | Not permitted | Not permitted | Permitted | Permitted | Not permitted | Permitted | Permitted | Permitted |
| Repeatable Reads | Not permitted | Not permitted | Permitted | Permitted | Not permitted | Not permitted | Permitted | Permitted |
| Snapshot Isolation | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted | Permitted |
| Serializability | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted | Not permitted |

Optimistic requirements for different isolation levels:

| Isolation level | Cache | Data Sync | Optimistic Conflict Scheme | Suggested Operations | Example |
|---|---|---|---|---|---|
| Read Uncommitted | Cache permitted | Sporadic | Detect dirty writes | No concurrent read and write | Archive |
| Read Committed | Cache permitted | Sporadic | No conflict detection | Monotonic read/write | Web Forum |
| Monotonic View | Must be validated | Periodic | No conflict detection | Combined reads | User preferences |
| Snapshot Reads | Must be validated | Periodic | Compare modifications to reads | Consistent reads | Lookup table |
| Cursor Stability | Cache permitted | Sporadic | Compare modified entity versions | CRUD services | Directory |
| Repeatable Reads | Cache permitted | Sporadic | Compare read entity versions | Read/write entities | Order tracking |
| Snapshot Isolation | Must be validated | Periodic | Compare read entity versions | Synchronized entities | Work-flow |
| Serializability | Must be validated | Full Sync | Compare queries with modifications | Complete data consistency | Accounting |

Data consistency is vital in database applications -- it allows developers to make sense of data within a concurrent environment. Although strong consistency levels such as serializability provide a simple programming model, they can cause excess overhead, blocked operations, or transaction rollbacks and may be unnecessary for many applications. Being aware of other, potentially more appropriate isolation levels can help ensure that developers and system architects understand the data consistency needs, while balancing performance tradeoffs.


clock March 12, 2009 18:21 by author Sky Jia (贾超)

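By Zhang Yuheng (章昱恒), posted on March 5, 2009 at 12:30 AM

Data migration is the process, in system software development, of moving data that has real business value between different storage media, storage formats, or computer systems, as dictated by functional requirements or the needs of system development.

Data migration is a routine part of system development. In enterprise applications, building a new system, upgrading from an old system to a new one, and ordinary maintenance all inevitably involve a great deal of migration work. In a business system built around data, migration is everywhere. For example, in a system designed around a data warehouse architecture, implementing the ETL (extract, transform, load) part is a form of data migration; in a distributed rollout of a large data system, data migration is the main part of the whole rollout. And in agile practice, evolutionary database development involves a large amount of data migration and synchronization work.

We often hear users say: "We don't care too much how good the application is, but the data must be accurate." Indeed, in industries that run on data, data quality is itself a heavily weighted part of software quality, and this is especially visible in telecommunications, finance, and control systems. Data migration is also an important part of agile development: it affects the data quality of every release, and data quality determines the effectiveness and reliability of the system, so doing data migration well cannot be neglected.

Data migration is often seen as simple work. To many people it is merely a matter of loading data into the appropriate tables with SQL statements. In practice, however, data migration touches many factors, such as user requirements, system functionality, and database modeling, and problems in any of them slow development or lower quality. Common problems include unclear business logic, dirty data, and the technical and management debt of legacy systems. How can these problems be avoided effectively and migration quality improved?

Using a CRM project that ThoughtWorks China carried out with a client as background, this article describes how to handle data migration with high quality in agile development, and so improve system quality at the data level.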

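Project background

System A (the old system) is the client's existing CRM (customer relationship management) system. It uses a B/S (browser/server) architecture with SQL Server 2005 as the back-end database. Its data model is highly normalized, with the aim of maximum flexibility. Business data is split up heavily and scattered across more than a hundred tables. There are no referential constraints within or between tables. A large amount of business logic is encapsulated in stored procedures for efficiency. The body of stored procedures is very large, with complex calls between procedures. The database contains some dirty data, possibly caused by long-term use, maintenance, or operator error, but nobody knows how much there is or exactly where it is. The application's user interface has poor usability and the system is slow; users often complain that it responds sluggishly or not at all. The database holds about 50 GB of business data.

The ThoughtWorks team is to deliver a new CRM system that replaces the main functions of the old one. The new system streamlines the old system's functions and incorporates the client's latest requirements. The design has changed substantially in order to improve usability. At the same time, to keep the service available to end users, the old and new systems must run side by side with their data synchronized; once all end users have moved to the new system, the old system will be shut down. Throughout this process the DBA team has to provide adequate safeguards for the data.

The release plan for the project is shown in the following diagram.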

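Approach to data migration development

1. The DBA needs to set goals and manage their own tasks

Although the team discusses and decides how to organize the requirement stories in every iteration, the DBA still needs a story wall of their own and needs to spend time organizing their own stories. In practice, data migration is only part of the DBA's work: the DBA also has to complete story development and data analysis, and sometimes provide data support to developers. Poorly managed tasks lead to conflicts in development, so managing tasks effectively is the first step toward doing data migration well.

A story wall is the best way to manage these tasks. Although it delivers business value to the client only indirectly, from the team's point of view every person or program that needs data is a user of the DBA. A story wall helps manage the data needs contained in each story and avoids conflicts between data migration tasks and other database development tasks, reducing duplicated work and rework. It is worth bringing this practice into database development.

The DBA should decide on data migration requirements from the standpoint of business value. During development, clients and developers often bring their own migration requests to the DBA, but these requests are usually neither global nor decisive, since each is raised for the needs of a single story. A DBA who carries them out blindly will get half the result for twice the effort. The DBA should take an active part in the IPM (iteration planning meeting, held at the start of each iteration, where the whole team discusses how many stories it plans to complete). Whether working directly with users or with the team, the DBA needs to understand the content of every story clearly. The DBA usually does not need to know a story's development details the way developers do, but through conversations with business analysts and developers the latent data requirements surface naturally. Once these data requirements have been reorganized and prioritized, the following questions are easy to answer: What task should be done next? What is its actual business value? Who needs it? When is it needed? Experience shows that spending more time communicating with the team or the client pays off many times over, and a DBA who understands the business data can guide developers better and reduce their misunderstandings about the data, which improves the efficiency of the whole team.

By understanding each story, we identified and defined the 7 data migration stories needed for the current release, confirmed that there was no duplication among them, and asked the project manager and the client to confirm the plan with us. With that, our goals were set.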

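2. Think through the implementation strategy

With all the data migration tasks under management, the next question is how to implement them. Past experience taught us that writing code straight away, without thinking carefully about the big picture and the details, leads to consequences that cannot be controlled. We should first understand the risks the process may carry, and then decide what strategy to adopt and whether tools can improve efficiency. The potential risks here mainly include:

2.1 Data quality

The old system's database model is highly normalized, and the tables depend heavily on one another. Once one table contains dirty data, how do we make sure queries return correct results?

2.2 Understanding of the existing system

The old system's application uses object-oriented design, and inheritance relationships are also stored across several tables. How do we distinguish these business objects and relationships correctly and make sure the migration does not create dirty data?

2.3 Business data mapping

There are considerable differences in business logic between the old and new systems. Can we map the business logic and data onto the new system? Are there conversions that cannot be done?

Until these questions were well understood we could not plan any further, and giving the client prompt feedback was the best way to resolve them. After further discussion we found the problems were far more complex than we had imagined: although the client knew the old system very well, even they could not give definite answers about some of the data. Given this, we drew up an initial strategy:

  1. Learn more about the old system and give prompt feedback. For questions that cannot be answered, consider whether other resources can help or whether data of no value can be ignored.
  2. Break every complex requirement down into multiple tasks as finely as possible. Small tasks help expose more problems.
  3. Work test-driven, and make sure a reliable testing mechanism is in place.
  4. Define an implementation framework and milestones.
  5. Do not be overly optimistic when estimating progress; leave ample room for unit testing in every phase.
  6. Adjust the content of each iteration; tasks with strong dependencies can be deferred to later iterations.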

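3. Carrying out the data migration

Data migration for the new system has two parts: one-time data migration and data synchronization.

One-time data migration

One-time data migration takes place only when a release goes live and is installed. The old and new systems are both offline, and part of the data is moved from the old system into the new one.

Data synchronization

Data synchronization takes place while the new system is running. The old and new systems are both in operation, and the two sides exchange data to keep their data consistent with each other.

Both are data migration, but each has its own characteristics, so their handling differs somewhat beyond what they share.

Characteristics

One-time data migration:

  1. Large data volume.
  2. Used rarely (once).
  3. Complex conversion logic, with a large amount of custom mapping data to convert.

Data synchronization:

  1. Small data volume.
  2. Used frequently (runs periodically, at intervals of minutes).
  3. Complex conversion logic, with a small amount of custom mapping data to convert.
  4. Needs transactions to keep the data consistent.

Practices shared by both

  1. Refine tasks.
  2. Test-driven development.
  3. Continuous integration.

Practices that differ

One-time data migration:

  1. When working test-driven, emphasize tests of data quality. Strengthen the test suite according to the test results from different environments.
  2. Tool choice: avoid third-party tools and use SQL scripts directly for better migration efficiency.
  3. Keep intermediate results.

Data synchronization:

  1. When working test-driven, emphasize tests of the logical mapping.
  2. Tool choice: third-party tools may be considered, to strengthen transaction control.
  3. Intermediate results need not be kept.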
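  1. Refine tasks

Following the development strategy set at the outset, when we met a complex migration requirement we first broke it down into several modules and then drew a diagram of the overall structure. Below is the module breakdown of one data migration script:

At first the migration logic for this part was so complex that the client had no confidence in its results. But once we had completed the diagram together, everyone agreed it was not as hard as it had seemed. In short, solving a complex problem all at once is hard, but solving one small part of it is easy.

  2. Test-driven development

Just as with application code, we not only wrote the migration scripts test-first but also borrowed some methods used in database design. In program design, when the code is well structured and the relationships between units (classes, methods) are clear, unit tests can be added directly. Now that our scripts had a good logical structure, it was easy to add a unit test for the result of each step. This acts like a safety net: when abnormal data appears, it is detected and dealt with immediately. Before actually writing a migration script, first decide what to test and prepare the test scripts.

The tests cover:

  • Data that should be produced as expected

Given the original test data supplied, this test checks whether the script's data conversion logic is correct. For example:

Test environment: the old system holds a customer named 'Jason' whose personId is 1000101.

Test purpose: after a customer's information has been migrated to the CUSTOMERS table of the new system, that customer's information should exist in the new system.

Test code to run on the new system: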

-- Expect the migrated customer 1000101 to be named 'Jason'.
DECLARE @personName NVARCHAR(250)

SELECT @personName = personName
FROM CUSTOMERS
WHERE personId = 1000101

-- Log an error when the name is wrong or the row is missing.
IF (@personName <> 'Jason') OR (@personName IS NULL)
BEGIN
    INSERT INTO LoadTestErrorLog (errorDescription)
    VALUES ('personName for personId 1000101 is not Jason')
END
GO

 

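The usual rule here is that one SQL snippet tests exactly one expected value. This reduces dependencies between pieces of test code and pinpoints bad data more precisely.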
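To make the "one check, one expectation" rule concrete outside SQL Server, here is a rough Python sketch that runs the same kind of check against an in-memory SQLite database. The `check_customer_name` helper, the second customer (1000102, expected to be 'Maria'), and the use of SQLite are assumptions for the illustration only; the project itself used T-SQL scripts.

```python
import sqlite3


def check_customer_name(conn, person_id, expected_name):
    """One check, one expectation: log an error if the name is wrong or missing."""
    row = conn.execute(
        "SELECT personName FROM CUSTOMERS WHERE personId = ?", (person_id,)
    ).fetchone()
    if row is None or row[0] != expected_name:
        conn.execute(
            "INSERT INTO LoadTestErrorLog (errorDescription) VALUES (?)",
            (f"personName for personId {person_id} is not {expected_name}",),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (personId INTEGER PRIMARY KEY, personName TEXT)")
conn.execute("CREATE TABLE LoadTestErrorLog (errorDescription TEXT)")
conn.execute("INSERT INTO CUSTOMERS VALUES (1000101, 'Jason')")
conn.execute("INSERT INTO CUSTOMERS VALUES (1000102, NULL)")

check_customer_name(conn, 1000101, "Jason")   # passes, logs nothing
check_customer_name(conn, 1000102, "Maria")   # fails, logs exactly one row

errors = [r[0] for r in conn.execute("SELECT errorDescription FROM LoadTestErrorLog")]
print(errors)
```

Because each call checks a single expected value, every row in the error log points at exactly one piece of bad data.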

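  • Abnormal data that should not be produced

Abnormal data is data produced during migration that does not make logical sense. In theory a migration should not produce abnormal data; in reality the results always contain data we do not want. Causes include anomalies in the data source, mistakes made during implementation, and bugs in the application. To keep such errors out of the final result, corresponding test scripts are indispensable, and they are also an effective way to stop a problem from spreading. This kind of test is often used to find problems that may arise in the production environment. The following example shows how to test for abnormal data:

Test environment: all or part of the production data

Test purpose: after customer information has been migrated to the CUSTOMERS table of the new system, the table should contain no records with an empty customer name; any such record is treated as a migration error.

Test code to run on the new system: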

-- Expect no migrated customer to have a NULL name.
DECLARE @isExistPersonNameWithNULL INTEGER

SELECT @isExistPersonNameWithNULL = COUNT(*)
FROM CUSTOMERS
WHERE personName IS NULL

IF (@isExistPersonNameWithNULL > 0)
BEGIN
    INSERT INTO LoadTestErrorLog (errorDescription)
    VALUES ('personName doesn''t contain legal information')
END
GO

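  • Whether the row counts of the tables match expectations

After data has been migrated to the new system, make sure the amount of migrated data matches the expected value. There are many ways to do this; a simple one is to compare the record counts before and after migration directly. For example:

Test environment: all or part of the production data.

Test purpose: after customer data has been migrated, make sure no customer data has been lost.

Test code to run on the new system: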

DECLARE @NumberofCustomerinOldDB INTEGER
DECLARE @NumberofCustomerinNewDB INTEGER

SELECT @NumberofCustomerinOldDB = COUNT(*)
FROM oldDB.dbo.persons -- the customer table defined in the old system
...
-- complex filtering logic omitted

-- Count every migrated customer (no filter on personName, or the totals cannot match).
SELECT @NumberofCustomerinNewDB = COUNT(*)
FROM newdb.dbo.CUSTOMERS -- the customer table defined in the new system

IF (@NumberofCustomerinOldDB <> @NumberofCustomerinNewDB)
BEGIN
    INSERT INTO LoadTestErrorLog (errorDescription)
    VALUES ('not all customers are migrated')
END
GO
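The same row-count comparison can be tried out quickly in Python against an in-memory SQLite database. Everything in this sketch (the `count_mismatch` helper, the table names `old_persons` and `new_customers`, and the `isCustomer` flag standing in for the omitted filter logic) is hypothetical.

```python
import sqlite3


def count_mismatch(conn, old_query, new_query):
    """Return None when both counts agree, else the (old, new) pair."""
    old_count = conn.execute(old_query).fetchone()[0]
    new_count = conn.execute(new_query).fetchone()[0]
    return None if old_count == new_count else (old_count, new_count)


OLD_QUERY = "SELECT COUNT(*) FROM old_persons WHERE isCustomer = 1"
NEW_QUERY = "SELECT COUNT(*) FROM new_customers"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_persons (personId INTEGER, personName TEXT, isCustomer INTEGER);
    CREATE TABLE new_customers (personId INTEGER, personName TEXT);
    INSERT INTO old_persons VALUES (1, 'Jason', 1);
    INSERT INTO old_persons VALUES (2, 'Li', 1);
    INSERT INTO old_persons VALUES (3, 'Test user', 0);
    INSERT INTO new_customers VALUES (1, 'Jason');
""")

before_fix = count_mismatch(conn, OLD_QUERY, NEW_QUERY)   # one customer is missing

conn.execute("INSERT INTO new_customers VALUES (2, 'Li')")
after_fix = count_mismatch(conn, OLD_QUERY, NEW_QUERY)    # counts now agree
print(before_fix, after_fix)
```

Equal counts do not prove the rows are the right ones, so a check like this complements the per-value tests rather than replacing them.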

 

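Finally, once the test SQL snippets had been assembled, we had a set of test scripts, which we automated with NAnt following the process below:

Implementation in NAnt:

<target name="-init" … />

This target initializes the test environment.

<target name="-parseDbScripts" … />

This target compiles and deploys the migration scripts.

<target name="-resetTestData" … />

This target resets the test data.

<target name="-executeMigrationScripts" … />

This target runs the migration scripts.

<target name="-testMigration" … />

This target runs the migration test scripts.

<target name="testDataMigration" depends="-init, -parseDbScripts, -resetTestData, -executeMigrationScripts, -testMigration" />

This target is the entry point called by continuous integration.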

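  3. Continuous integration

Continuous integration testing requires test sandboxes. A "sandbox" is a complete, functioning environment in which scripts can be compiled, tested, and run.

  • In the development sandbox we prepared a small amount of core data for testing the quality of the SQL scripts.
  • In the system-level integration test sandbox we also prepared a small database containing part of the core data, focused on testing the logical conversions of the migration process.
  • In the production-level test sandbox the database comes from backups of real data, so the data keeps changing, which makes it all the more necessary to run the test scripts continually to guard against dirty data and data loss. Because the production data volume is much larger, we can reduce the number of test runs to save development resources. Other test scripts, such as those that change the database structure, can be grouped with the data migration scripts and tested in a single pass.

    Likewise, we maintain these development and test sandboxes through automation.

    The tests run in the continuous integration environment; the figure below shows the test tasks in that environment.

  4. Tool choice

Data migration tools should be chosen for their ability to improve both working efficiency and the run-time efficiency of the migration. The most direct approach is usually to write SQL scripts, though other tools such as MS SSIS can also work well. We found, however, that bringing in too many third-party tools tends to bring trouble too: we have to spend time learning their particular usage, and sometimes the tools themselves have bugs that take more time to work around, which pulls away from the original development goal. An effective approach is therefore to do as much of the migration as possible with SQL scripts, which also gave us the best execution efficiency.

  5. Keep intermediate results for script debugging

Compared with programming languages, SQL is hard to debug. Even though some database products provide debugging tools, debugging result sets remains challenging. Especially when migrating from an old system to a new one, where the business logic changes greatly, clients often ask for evidence to settle their doubts about the migration results. Keeping the data from intermediate steps makes debugging easier and data traceable, and makes development more efficient. For example: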

SELECT
...
INTO debug_allpersonhistroy
FROM oldDB.dbo.personhistory -- business table defined in the old system
...
-- complex filtering logic omitted

SELECT column1...columnN
INTO debug_allpersonhistroy_aftermapping -- keep the data set from this step
FROM debug_allpersonhistroy INNER JOIN mappingtableBtwOldandNew
...
-- complex filtering logic omitted

SELECT
...
FROM newdb.dbo.contactHistory -- business table defined in the new system
...
-- complex filtering logic omitted
GO

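Typical problems

Data migration runs into different problems in different situations, and experience alone cannot solve them all. Use brainstorming to bring the whole team together to think through everything that could go wrong and avoid it. Sometimes the problems developers have run into help the DBA avoid detours. What brainstorming ultimately gives us is a valuable list of problems and things to watch for:

  1. Consistency checks

Consistency checks cover character encoding, language settings, environment parameters, and so on.

The character set is a frequent source of trouble during migration, and the result is garbled text. Different systems may not have been designed with the same character set or encoding. Relying on default settings during migration is not safe enough. An effective measure is to confirm that the environments of the systems are consistent right at the start of the project. Using a compatible Unicode encoding in the new system also solves these problems.

  2. Control the use of NULL

Because the old system uses few constraints, join queries return a lot of data that cannot be matched correctly. In SQL, when we tried a natural join we found that some data went missing, and an outer join introduces a new kind of dirty data: NULL. From a database design standpoint NULL carries no meaning, yet in practice many data models give NULL a meaning, or even several, so that different queries must treat it according to different business logic. The old system was full of this, which caused a good deal of trouble for the migration.

Solution: do not give NULL a logical meaning, and use outer joins as little as possible.

For example:

The old system defines the following parent-child table:

objectId  parentObjectId  objectType …
----------------------------------------
NULL      NULL            'root'
1         NULL            'contactManager'
2         1               'contact'
3         1               'contact'
4         NULL            'orgnisation1'

Clearly, the system wants to build the following object tree:

However, when the program tries to traverse all the objects it finds that NULL cannot take part in computations, because the result of any computation involving NULL is NULL. The program has to add extra code to handle the special cases.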
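The effect is easy to reproduce. The Python sketch below loads the same parent-child rows into an in-memory SQLite database, used here only as a convenient stand-in for the old SQL Server database, and shows that a join on NULL keys finds nothing while an explicit special case does. The table name `objects` and the queries are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (objectId INTEGER, parentObjectId INTEGER, objectType TEXT);
    INSERT INTO objects VALUES (NULL, NULL, 'root');
    INSERT INTO objects VALUES (1, NULL, 'contactManager');
    INSERT INTO objects VALUES (2, 1, 'contact');
    INSERT INTO objects VALUES (3, 1, 'contact');
    INSERT INTO objects VALUES (4, NULL, 'orgnisation1');
""")

# Comparing NULL with NULL yields NULL (None in Python), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Trying to attach top-level objects to the root through the NULL keys finds nothing.
children_of_root = conn.execute("""
    SELECT child.objectType
    FROM objects AS child
    JOIN objects AS parent ON child.parentObjectId = parent.objectId
    WHERE parent.objectType = 'root'
""").fetchall()

# The extra special-case code the traversal needs instead.
top_level = [row[0] for row in conn.execute("""
    SELECT objectType
    FROM objects
    WHERE parentObjectId IS NULL AND objectId IS NOT NULL
    ORDER BY objectId
""")]
print(null_equals_null, children_of_root, top_level)
```

A dedicated root row with a real key would let the ordinary join do the work, which is the point of not giving NULL a logical meaning.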

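  3. Reuse code and reduce dependencies

Migration scripts should follow the same rules as application code: high cohesion and low coupling. Code that can be reused should be encapsulated into units wherever possible; copying and pasting is not how migration scripts should be written.

Solution: use temporary stored procedures to reuse common code and simplify the calling interface.

  4. New problem, new test

When we hit a new problem we are usually busy fixing it and explaining it. But when that is done, it does not mean the problem is fully solved: it may well happen again, and it also shows that the current tests are insufficient.

Solution: when a new problem appears, pause the current work and immediately write a test for the case. Spending a little time on it keeps technical debt from piling up.

For example: in the new system's database, QA found a set of illogical data: a record's end time (EndTimestamp) was 8 hours earlier than its start time (startTimestamp). The expected result is that a record's end time must be later than its start time.

ID        startTimestamp       EndTimestamp         createDate …
-----------------------------------------------------------------------
11020011  2008-12-14 09:23:00  2008-12-14 01:23:00  2008-12-14 09:23:00

Clearly the program used the wrong time zone when inserting the data. Before the bug is fixed, add a database test right away to make sure it does not happen again.

The test code is as follows: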

-- Expect no record whose start time is later than its end time.
DECLARE @CNT INTEGER

SELECT @CNT = COUNT(*) FROM tableA WHERE startTimestamp > EndTimestamp

IF @CNT > 0
BEGIN
    INSERT INTO LoadTestErrorLog (errorDescription)
    VALUES ('EndTimestamp should be later than startTimestamp')
END
GO

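  5. The mindset that goal setters and developers should keep

Data migration looks simple but is challenging work, so we are often too optimistic about development speed. The risk is that we see only the processing logic and not the data quality, so that a migration script written blindly works in the test environment but fails in production.

Solution: however simple a migration seems, first talk through the business logic with the client or a business analyst and make sure the data quality is understood.

Conclusion

Data migration looks simple but holds great challenges. It involves specific technical problems, and it also requires the DBA to communicate well and understand the business logic in depth. Through the migration from the old system to the new one we gradually carried lean software design thinking down into the details, with good results. By the time the migration was finished we had written nearly 6,000 lines of migration scripts, the results had passed the client's sampling tests, and the whole system ran normally.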


clock March 10, 2009 18:11 by author Sky Jia (贾超)

-- List every identity column with its seed, increment, and current identity value.
SELECT
      TABLE_NAME,
      COLUMN_NAME,
      IDENT_SEED(TABLE_NAME) AS SEED,
      IDENT_INCR(TABLE_NAME) AS INCR,
      IDENT_CURRENT(TABLE_NAME) AS [CURRENT MAX]
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMNPROPERTY(OBJECT_ID(TABLE_NAME),COLUMN_NAME,'IsIdentity')=1


clock February 4, 2009 19:01 by author Sky Jia (贾超)

ViewState is a very misunderstood animal. I would like to help put an end to the madness by attempting to explain exactly how the ViewState mechanism works, from beginning to end, and from many different use cases, such as declared controls vs. dynamic controls.

There are a lot of great articles out there that try to dispel the myths about ViewState. You might say this is like beating a dead horse (where ViewState is the horse, and the internet is the assailant). But this horse isn't dead, let me tell you. No, he's very much alive and he's stampeding through your living room. We need to beat him down once again. Don't worry, no horses were harmed during the authoring of this article.

It's not that there's no good information out there about ViewState, it's just that all of them seem to be lacking something, and that is contributing to the community's overall confusion about ViewState. For example, one of the key features that is important to understand about ViewState is how it tracks dirtiness. Yet, here is a very good, in-depth article on ViewState that doesn't even mention it! Then there's this W3Schools article on ViewState that seems to indicate that posted form values are maintained via ViewState, but that's not true. (Don't believe me? Disable ViewState on that textbox in their example and run it again). And it's the #1 Google Search Result for "ASP.NET ViewState". Here is ASP.NET Documentation on MSDN that describes how Controls maintain state across postbacks. The documentation isn't wrong per se, but it makes a statement that isn't entirely correct:

"If a control uses ViewState for property data instead of a private field, that property automatically will be persisted across round trips to the client."

That seems to imply that anything you shove into the ViewState StateBag will be round-tripped in the client's browser. NOT TRUE! So it's really no wonder there is so much confusion on ViewState. There is nowhere I've found on the internet that has a 100% complete and accurate explanation of how it works! The best article I have ever found is this one by Scott Mitchell. That one should be required reading. However, it does not explain the relationship of controls and their child controls when it comes to initialization and ViewState Tracking, and it is this point alone that causes the bulk of the mishandlings of ViewState, at least in the experiences I've had.

So the point of this article will be to first give a complete understanding of how ViewState basically functions, from beginning to end, hopefully filling in the holes that many other articles have. After a complete explanation of the entire ViewState process, I will go into some examples of how developers typically misuse ViewState, usually without even realizing it, and how to fix it. I should also preface this with the fact that I wrote this article with ASP.NET 1.x in mind. However, there are very few differences in the ViewState mechanism in ASP.NET 2.0. For one, ControlState is a new type of ViewState in ASP.NET 2.0, but it is treated exactly like ViewState, so we can safely ignore it for the purposes of this article.

First let me explain why I think understanding ViewState to its core is so important:

    MISUNDERSTANDING OF VIEWSTATE WILL LEAD TO...
  1. Leaking sensitive data
  2. ViewState Attacks - aka the Jedi Mind Trick -- *waves hand* that plasma tv is for sale for $1.00
  3. Poor performance - even to the point of NO PERFORMANCE
  4. Poor scalability - how many users can you handle if each is posting 50k of data every request?
  5. Overall poor design
  6. Headache, nausea, dizziness, and irreversible frilling of the eyebrows.
If you develop an ASP.NET Application and you don't take ViewState seriously, this could happen to you:
[Image: ViewState Madness!!! Drop your Red Bull and surrender your CPU cycles. You will be frustrated. Performance is futile!]
[Image: The ViewState form field. ViewState will add your web app's distinctiveness to its own. Performance is futile.]
I could go on, but that is the gist of it. Now let's move on by starting back at the beginning:
    WHAT DOES VIEWSTATE DO?
    This is a list of ViewState's main jobs. Each of these jobs serves a very distinct purpose. Next we'll learn exactly how it fulfills those jobs.
  1. Stores values per control by key name, like a Hashtable
  2. Tracks changes to a ViewState value's initial state
  3. Serializes and Deserializes saved data into a hidden form field on the client
  4. Automatically restores ViewState data on postbacks
Even more important than understanding what it does, is understanding what it does NOT do:
    WHAT DOESN'T VIEWSTATE DO?
  1. Automatically retain state of class variables (private, protected, or public)
  2. Remember any state information across page loads (only across postbacks, unless you customize how the data is persisted)
  3. Remove the need to repopulate data on every request
  4. Populate values that are posted, such as by TextBox controls (although it does play an important role)
  5. Make you coffee
While ViewState does have one overall purpose in the ASP.NET Framework, its four main roles in the page lifecycle are quite distinct from each other. Logically, we can separate them and try to understand them individually. It is often the mishmash of information on ViewState that confuses people. Hopefully this breaks it down into more bite-size nuggets. Mmmm... ViewState Nuggets.

1. VIEWSTATE STORES VALUES
If you've ever used a hashtable, then you've got it. There's no rocket science here. ViewState has an indexer on it that accepts a string as the key and any object as the value. For example: 

ViewState["Key1"] = 123.45M; // store a decimal value
ViewState["Key2"] = "abc"; // store a string
ViewState["Key3"] = DateTime.Now; // store a DateTime

Actually, "ViewState" is just a name. ViewState is a protected property defined on the System.Web.UI.Control class, from which all server controls, user controls, and pages derive. The type of the property is System.Web.UI.StateBag. Strictly speaking, the StateBag class has nothing to do with ASP.NET. It happens to be defined in the System.Web assembly, but other than its dependency on the State Formatter, also defined in System.Web.UI, there's no reason why the StateBag class couldn't live alongside ArrayList in the System.Collections namespace.

In practice, server controls utilize ViewState as the backing store for most, if not all, of their properties. This is true of almost all of Microsoft's built-in controls (e.g., Label, TextBox, Button). This is important! You must understand this about the controls you are using. Read that sentence again. I mean it... here it is a 3rd time: SERVER CONTROLS UTILIZE VIEWSTATE AS THE BACKING STORE FOR MOST, IF NOT ALL THEIR PROPERTIES. Depending on your background, when you think of a traditional property, you might imagine something like this: 

public string Text {
    get { return _text; }
    set { _text = value; }
}

What is important to know here is that this is NOT what most properties on ASP.NET controls look like. Instead, they use the ViewState StateBag, not a private instance variable, as their backing store: 

public string Text {
    get { return (string)ViewState["Text"]; }
    set { ViewState["Text"] = value; }
}

And I can't stress it enough -- this is true of almost ALL PROPERTIES, even STYLES (actually, Styles do it by implementing IStateManager, but essentially they do it the same way). When writing your own controls it would usually be a good idea to follow this pattern, but thought should first be put into what should and shouldn't be allowed to be dynamically changed on postbacks. But I digress -- that's a different subject. It is also important to understand how DEFAULT VALUES are implemented using this technique. When you think of a property that has a default value, in the traditional sense, you might imagine something like the following: 

public class MyClass {
    private string _text = "Default Value!";
 
    public string Text {
        get { return _text; }
        set { _text = value; }
    }
}

The default value is the default because it is what is returned by the property if no one ever sets it. How can we accomplish this when ViewState is being used as the private backing? Like this: 

public string Text {
    get {
        return ViewState["Text"] == null ?
             "Default Value!" :
              (string)ViewState["Text"];
    }
    set { ViewState["Text"] = value; }
}

Like a hashtable, the StateBag will return null as the value behind a key if it simply doesn't contain an entry with that key. So if the value is null, it has not been set, so return the default value; otherwise return whatever the value is.

For you die-hards out there -- you may have detected a difference in these two implementations. In the case of ViewState backing, setting the property to NULL will result in resetting the property back to its default value. With a "regular" property, setting it to null means it will simply be null. Well, that is just one reason why ASP.NET always tends to use String.Empty ("") instead of null. It's also not very important to the built-in controls because basically all of their properties that can be null already are null by default. All I can say is keep this in mind if you write your own controls.

And finally, as a footnote really, while this property-backing usage of the ViewState StateBag is how the StateBag is typically used, it isn't limited to just that. As a control or page, you can access your own ViewState StateBag at any time for any reason, not just in a property. It is sometimes useful to do so in order to remember certain pieces of data across postbacks, but that too is another subject.
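The default-value pattern carries over to value-type properties, where the stored object also has to be unboxed. Here is a minimal sketch of that, not code from any real control: PagerControl and PageSize are hypothetical names, and the default of 10 is arbitrary.

```csharp
using System.Web.UI.WebControls;

// Hypothetical control, used only to illustrate the pattern.
public class PagerControl : WebControl {
    // ViewState-backed int property with an (arbitrary) default of 10.
    public int PageSize {
        get {
            // The StateBag returns null when the key was never set.
            object stored = ViewState["PageSize"];
            return stored == null ? 10 : (int)stored;
        }
        set { ViewState["PageSize"] = value; }
    }
}
```

Reading the entry into a local first means the StateBag is only indexed once per get.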

2. VIEWSTATE TRACKS CHANGES
Have you ever set a property on a control and then somehow felt... dirty? I sure have. In fact, after a twelve-hour day of setting properties in the office, I become so filthy my wife refuses to kiss me unless I'm holding flowers to mask the stench. I swear! Ok, so setting properties doesn't really make you dirty. But it does make the entry in the StateBag dirty! The StateBag isn't just a dumb collection of keys and values like a Hashtable (please don't tell Hashtable I said that, he's scary). In addition to storing values by key name, the StateBag has a TRACKING ability. Tracking is either on, or off. Tracking can be turned on by calling TrackViewState(), but once on, it cannot be turned off. When tracking is ON, and ONLY when tracking is ON, any changes to any of the StateBag's values will cause that item to be marked as "Dirty". StateBag even has a method you can use to detect if an item is dirty, aptly named IsItemDirty(string key). You can also manually set or clear an item's dirty flag by calling SetItemDirty(string key, bool dirty). To illustrate, let's assume we have a StateBag that is not currently tracking: 

stateBag.IsItemDirty("key"); // returns false
stateBag["key"] = "abc";
stateBag.IsItemDirty("key"); // still returns false
 
stateBag["key"] = "def";
stateBag.IsItemDirty("key"); // STILL returns false
 
// StateBag exposes TrackViewState() only through IStateManager, hence the cast
((IStateManager)stateBag).TrackViewState();
stateBag.IsItemDirty("key"); // yup still returns false
 
stateBag["key"] = "ghi";
stateBag.IsItemDirty("key"); // TRUE!
 
stateBag.SetItemDirty("key", false);
stateBag.IsItemDirty("key"); // FALSE!

Basically, tracking allows the StateBag to keep track of which of its values have been changed since TrackViewState() was called. Values that are assigned before tracking is enabled are not tracked (StateBag turns a blind eye). It is important to know that any assignment will mark the item as dirty -- even if the value given matches the value it already has!

stateBag["key"] = "abc";
stateBag.IsItemDirty("key"); // returns false
((IStateManager)stateBag).TrackViewState(); // only reachable via IStateManager
stateBag["key"] = "abc";
stateBag.IsItemDirty("key"); // returns true

ViewState could have been written to compare the new and old values before deciding if the item should be dirty. But recall that ViewState allows any object to be the value, so you aren't talking about a simple string comparison, and the object doesn't have to implement IComparable, so you're not talking about a simple CompareTo either. Also, because serialization and deserialization will be occurring, an instance you put into ViewState won't be the same instance any longer after a postback. That kind of comparison is not important for ViewState to do its job, so it doesn't. So that's tracking in a nutshell.

But you might wonder why StateBag would need this ability in the first place. Why on earth would anyone need to know only changes since TrackViewState() is called? Why wouldn't they just utilize the entire collection of items? This one point seems to be at the core of all the confusion on ViewState. I have interviewed many professionals, sometimes with years and years of ASP.NET experience logged in their resumes, who have failed miserably to prove to me that they understand this point. Actually, I have never interviewed a single candidate who has! First, to truly understand why Tracking is needed, you will need to understand a little bit about how ASP.NET sets up declarative controls. Declarative controls are controls that are defined in your ASPX or ASCX form. Here: 

<asp:Label id="lbl1" runat="server" Text="Hello World" />

I do declare that this label is declared on your form. The next thing we need to make sure you understand is ASP.NET's ability to wire up declared attributes to control properties. When ASP.NET parses the form and finds a tag with runat=server, it creates an instance of the specified control. The variable name it assigns the instance to is based on the ID you assigned it (by the way, many don't realize that you don't have to give a control an ID at all; ASP.NET will use an automatically generated ID. Not specifying an ID has advantages, but that is a different subject). But that's not all it does. The control's tag may contain a bunch of attributes on it. In our label example up above, we have a "Text" attribute, and its value is "Hello World". Using reflection, ASP.NET is able to detect whether the control has a property by that name, and if so, sets its value to the declared value. Obviously the attribute is declared as a string (hey, it's stored in a text file after all), so if the property it maps to isn't of type string, it must figure out how to convert the given string into the correct type before calling the property setter. How it does that, my friend, is also an entirely different topic (it involves TypeConverters and static Parse methods). Suffice it to say it figures it out, and calls the property setter with the converted value.
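In effect, the declared tag turns into ordinary property assignments. The sketch below imitates that by hand. It is not the code ASP.NET actually generates, and the Width="200px" attribute is an assumption added here only to show a string being converted to a non-string type through the property type's TypeConverter.

```csharp
using System;
using System.ComponentModel;
using System.Web.UI.WebControls;

public static class DeclarativeSetupSketch {
    // Builds a Label roughly the way this (partly hypothetical) markup would:
    // <asp:Label id="lbl1" runat="server" Text="Hello World" Width="200px" />
    public static Label BuildLabel() {
        Label lbl1 = new Label();
        lbl1.ID = "lbl1";

        // String property: the attribute text is assigned as-is.
        lbl1.Text = "Hello World";

        // Non-string property: convert the attribute text to a Unit first.
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(Unit));
        lbl1.Width = (Unit)converter.ConvertFromInvariantString("200px");

        return lbl1;
    }

    public static void Main() {
        Label label = BuildLabel();
        Console.WriteLine(label.Text);
        Console.WriteLine(label.Width.Type);
    }
}
```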

Recall that all-important statement from the first role of the StateBag. Here it is again: Server Controls utilize ViewState as the backing store for most, if not all their properties. That means when you declare an attribute on a server control, that value is usually ultimately stored as an entry in that control's ViewState StateBag. Now recall how tracking works. Remember that if the StateBag is "tracking", then setting a value to it will mark that item as dirty. If it isn't tracking, it won't be marked dirty. So the question is -- when ASP.NET calls the SET on the PROPERTY that corresponds to the ATTRIBUTE that is DECLARED on the control, is the StateBag TRACKING or isn't it? The answer is no it is not tracking, because tracking doesn't begin until someone calls TrackViewState() on the StateBag, and ASP.NET does that during the OnInit phase of the page/control lifecycle. This little trick ASP.NET uses to populate properties allows it to easily detect the difference between a declaratively set value and dynamically set value. If you don't yet realize why that is important, please keep reading.
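To see the difference between a declared value and a dynamically set one, you can probe a Label's dirty flag yourself. This is only a sketch: ProbeLabel is a hypothetical subclass that exposes two protected members, BeginTracking() stands in for the call ASP.NET makes on its own, and the check relies on Label keeping its Text under the "Text" key.

```csharp
using System;
using System.Web.UI.WebControls;

// Hypothetical subclass that opens up protected members for inspection.
public class ProbeLabel : Label {
    public bool IsTextDirty {
        get { return ViewState.IsItemDirty("Text"); }
    }

    // Stands in for the framework's own TrackViewState() call.
    public void BeginTracking() {
        TrackViewState();
    }
}

public static class TrackingDemo {
    public static void Main() {
        ProbeLabel label = new ProbeLabel();

        label.Text = "Hello World";            // like the declared attribute
        Console.WriteLine(label.IsTextDirty);  // False: not tracking yet

        label.BeginTracking();
        label.Text = "Set from code";          // a dynamic change
        Console.WriteLine(label.IsTextDirty);  // True
    }
}
```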

3. SERIALIZATION AND DESERIALIZATION 
Aside from how ASP.NET creates declarative controls, the first two capabilities of ViewState we've discussed so far have been strictly related to the StateBag class (how it's similar to a hashtable, and how it tracks dirty values). Here is where things get bigger. Now we will have to start talking about how ASP.NET uses the ViewState StateBag's features to make the (black) magic of ViewState happen.

If you've ever done a "View Source" on an ASP.NET page, you've no doubt encountered the serialization of ViewState. You probably already knew that ViewState is stored in a hidden form field aptly named __VIEWSTATE as a base64-encoded string, because when anyone explains how ViewState works, that's usually the first thing they mention.

A brief aside -- before we understand how ASP.NET comes up with this single encoded string, we must understand the hierarchy of controls on the page. Many developers with years of experience still don't realize that a page consists of a tree of controls, because all they work on are ASPX pages, and all they need to worry about are controls that are directly declared on those pages... but controls can contain child controls, which can contain their own child controls, etc. This forms a tree of controls, where the ASPX page itself is the root of that tree. The 2nd level is all the controls declared at the top level in the ASPX page (usually that consists of just 3 controls -- a literal control to represent the content before the form tag, an HtmlForm control to represent the form and all its child controls, and another literal control to represent all the content after the close form tag). On the 3rd level are all the controls contained within those controls (i.e., controls that are declared within the form tag), and so on and so forth. Each one of the controls in the tree has its very own ViewState -- its very own instance of a StateBag. There's a protected method defined on the System.Web.UI.Control class called SaveViewState. It returns type 'object'. The implementation for Control.SaveViewState is to simply pass the call along to the Control's StateBag (it too has a SaveViewState() method). By calling this method recursively on every control in the control tree, ASP.NET is able to build another tree that is structured not unlike the control tree itself, except instead of a tree of controls, it is a tree of data.

The data at this point is not yet converted into the string you see in the hidden form field; it's just an object tree of the data to be saved. Here is where it finally comes together... are you ready? When the StateBag is asked to save and return its state (StateBag.SaveViewState()), it only does so for the items contained within it that are marked as Dirty. That is why StateBag has the tracking feature. That is the only reason why it has it. And oh what a good reason it is -- StateBag could just process every single item stored within it, but why should data that has not been changed from its natural, declarative state be persisted? There's no reason for it to be -- it will be restored on the next request when ASP.NET reparses the page anyway (actually it only parses it once, building a compiled class that does the work from then on). Despite this smart optimization employed by ASP.NET, unnecessary data is still persisted into ViewState all the time due to misuse. I will get into examples that demonstrate these types of mistakes later on.
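You can watch the "only dirty items are saved" rule with a bare StateBag, no page required. This is a sketch that assumes a console project referencing System.Web. HasStateToSave is a hypothetical helper, and the exact shape of the saved object differs between framework versions, so it only checks whether anything came back at all.

```csharp
using System;
using System.Web.UI;

public static class SaveOnlyDirtyDemo {
    // Hypothetical helper: true when the bag has something to persist.
    // SaveViewState() is reached through the IStateManager interface.
    public static bool HasStateToSave(StateBag bag) {
        return ((IStateManager)bag).SaveViewState() != null;
    }

    public static void Main() {
        StateBag bag = new StateBag();

        bag["Text"] = "declared value";           // set before tracking
        ((IStateManager)bag).TrackViewState();
        Console.WriteLine(HasStateToSave(bag));   // False: nothing is dirty

        bag["Text"] = "changed in code";          // set while tracking
        Console.WriteLine(HasStateToSave(bag));   // True: "Text" is dirty
    }
}
```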

POP QUIZ
If you've read this far, congratulations, I am rewarding you with a pop quiz. Aren't I nice? Here it is: Let's say you have two nearly identical ASPX forms: Page1.aspx and Page2.aspx. Contained within each page is just a form tag and a label, like so: 

<form id="form1" runat="server">
    <asp:Label id="label1" runat="server" Text="" />
</form>

They are identical except for one minor difference. In Page1.aspx, we shall declare the label's text to be "abc": 

<asp:Label id="label1" runat="server" Text="abc" />

...And on Page2.aspx, we shall declare the label's text to be something much longer (the preamble to the Constitution of the United States of America): 

<asp:Label id="label1" runat="server" Text="We the people of the United States,
        in order to form a more perfect union, establish justice, insure
        domestic tranquility, provide for the common defense, promote the
        general welfare, and secure the blessings of liberty to ourselves and
        our posterity, do ordain and establish this Constitution for the United
        States of America." />


Imagine you browse to Page1.aspx; you will see "abc" and nothing more. Then you use your browser to view the HTML source of the page. You will see the infamous hidden __VIEWSTATE field with encoded data in it. Note the size of that string. Now you browse to Page2.aspx, and you see the preamble. You use your browser to view the HTML source once again, and you note the size of the encoded __VIEWSTATE field. The question is: Are the two sizes you noted the same, or are they different? Before we get to the answer, let's make it a little bit more involved. Let's say you also put a button next to the label (on each page): 

<asp:Button id="button1" runat="server" Text="Postback" />


There is no code in the click event handler for this button, so clicking on it doesn't do anything except make the page flicker. With this new button in place, you repeat the experiment, except this time when browsing to each page, you click the Postback button before looking at the HTML source. So the question is once again... Are the encoded ViewState values the same, or different? The correct answer to the first part of the question is THEY ARE THE SAME! They are the same because neither of the two ViewState strings contains any data related to the label at all. If you understand ViewState, this is obvious. The Text property of the label is set to the declared value before its ViewState is being tracked. That means if you were to check the dirty flag of the Text item in the StateBag, it would not be marked dirty. StateBag ignores items that aren't dirty when SaveViewState() is called, and it is the object it returns that is serialized into the hidden __VIEWSTATE field. Therefore, the Text property is not serialized. Since the Text is not serialized in either case, and the forms are identical in every other way, the sizes of the encoded ViewStates on each page must be the same. In fact, no matter how large or small a string you stuff into that Text attribute, the size will remain the same.

The correct answer to the second part is, again, THEY ARE THE SAME! In order for data to be serialized, it must be marked as dirty. In order to be marked as dirty, its value must be set after TrackViewState() is called. But even when we perform a postback, ASP.NET recreates and populates the server controls in the same way. The Text property is still set to its declared value just like it was in the first request. No other code is setting the Text property, so there's no way the StateBag item could become dirty, even on a postback. Therefore, the sizes of the encoded ViewStates on each page after a postback must be the same.

So now we understand how ASP.NET determines what data needs to be serialized. But we don't know how it is serialized. That topic is outside the scope of this article (are you missing an assembly reference?), so if you're really interested in how it works, read up on the LosFormatter for ASP.NET 1.x or the ObjectStateFormatter for ASP.NET 2.0.

Finally on this topic is DESERIALIZATION. Obviously all this fancy dirty tracking and serialization wouldn't do any good if we couldn't get the data back again. That too is outside the scope of this article, but suffice it to say the process is just the reverse. ASP.NET rebuilds the object tree it serialized by reading the posted __VIEWSTATE form value and deserializing it with the LosFormatter (v1.x) or the ObjectStateFormatter (v2.0).

4. AUTOMATICALLY RESTORES DATA
This is last on our list of ViewState features. It is tempting to tie this feature in with Deserialization above, but it is not really part of that process. ASP.NET deserializes the ViewState data, and THEN it repopulates the controls with that data. Many articles out there confuse these two processes.

Defined on System.Web.UI.Control (again, the class that every control, user control, and page derives from) is a LoadViewState() method which accepts an object as a parameter. This is the opposite of the SaveViewState() method we already discussed, which returns an object. Like SaveViewState(), LoadViewState() simply forwards the call on to its StateBag object, calling LoadViewState on it. The StateBag then simply repopulates its key/object collection with the data in the object. In case you are wondering, the object it is given is a System.Web.UI.Pair class, which is just a simple type with a First and Second field on it. The "First" field is an ArrayList of key names, and the "Second" field is an ArrayList of values. So StateBag just iterates over the lists, calling this.Add(key, value) for each item. The important thing to realize here is that the data it was given via LoadViewState() are only items that were marked dirty on the previous request. Prior to loading the ViewState items, the StateBag may already have values in it. Those values may be from declarative properties like we discussed, but they may also be values that were explicitly set by the developer prior to the LoadViewState call. If one of the items passed into LoadViewState already exists in the StateBag for some reason, it will be overwritten.

That right there is the magic of automatic state management. When the page first begins to load during a postback (even prior to initialization), all the properties are set to their declared natural defaults. Then OnInit occurs. During the OnInit phase, ASP.NET calls TrackViewState() on all the StateBags. Then LoadViewState() is called with the deserialized data that was dirty from the previous request. The StateBag calls Add(key, value) for each of those items. Since the StateBag is tracking at this point, the value is marked dirty, so that it may be persisted once again for the next postback. Brilliant! Whew. Now you are an expert on ViewState management.
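The whole round trip can be imitated with two StateBag instances, one per "request". This is a sketch under simplifying assumptions: it runs in a console project referencing System.Web, it hands the saved object straight to the second bag instead of serializing it into a hidden field, and FirstRequest and Postback are hypothetical helper names.

```csharp
using System;
using System.Web.UI;

public static class RoundTripDemo {
    // First request: returns what the bag would persist.
    public static object FirstRequest() {
        StateBag bag = new StateBag();
        bag["Text"] = "declared";                 // declarative value
        ((IStateManager)bag).TrackViewState();
        bag["Text"] = "changed in code";          // becomes dirty
        return ((IStateManager)bag).SaveViewState();
    }

    // Postback: rebuild the declared state, start tracking, then load.
    public static StateBag Postback(object savedState) {
        StateBag bag = new StateBag();
        bag["Text"] = "declared";
        ((IStateManager)bag).TrackViewState();
        ((IStateManager)bag).LoadViewState(savedState);
        return bag;
    }

    public static void Main() {
        StateBag restored = Postback(FirstRequest());
        Console.WriteLine(restored["Text"]);              // changed in code
        Console.WriteLine(restored.IsItemDirty("Text"));  // True
    }
}
```

Because the load happens while the bag is tracking, the restored item is dirty again and would be saved for the next postback too.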

IMPROPER USE OF VIEWSTATE
Now that we know exactly how ViewState works, we can finally begin to understand the problems that arise when it is used improperly. In this section I will describe cases that illustrate how a lot of ASP.NET developers misuse ViewState. But these aren't just obvious mistakes. Some of these will illustrate nuances about ViewState that will give you an even deeper understanding of how it all fits together. 
    CASES OF MISUSE
  1. Forcing a Default
  2. Persisting static data
  3. Persisting cheap data
  4. Initializing child controls programmatically
  5. Initializing dynamically created controls programmatically
1. Forcing a Default
This is one of the most common misuses, and it is also the easiest to fix. The fixed code is also usually more compact than the wrong code. Yes, doing things the right way can lead to less code. Imagine that. This usually occurs when a control developer wants a particular property to have a particular default value, and does not understand the dirty tracking mechanism, or doesn't care. For example, let's say the Text property of a control is supposed to be some value that comes from a Session variable. Developer Joe writes the following code: 

public class JoesControl : WebControl {
    public string Text {
        get { return this.ViewState["Text"] as string; }
        set { this.ViewState["Text"] = value; }
    }
 
    protected override void OnLoad(EventArgs e) {
        // IsPostBack and Session live on the Page, not on the control itself
        if(!this.Page.IsPostBack) {
            this.Text = this.Page.Session["SomeSessionKey"] as string;
        }
 
        base.OnLoad(e);
    }
}

This developer has committed a ViewState crime, call the ViewState police! There are two big problems with this approach. First of all, since Joe is developing a control, and he has taken the time to create a public Text property, it stands to reason that Joe may want developers that use his control to be able to set the Text property to something else. Jane is a page developer who is attempting to do just that, like so: 

<abc:JoesControl id="joe1" runat="server" Text="ViewState rocks!" />

Jane is going to have a bad day. No matter what Jane puts into that Text attribute, Joe's control will refuse to listen to her. Poor Jane. She's using this control just like you use every other ASP.NET control, but this one works differently. Joe's control is overwriting Jane's Text value! Worse than that, since Joe sets it during the OnLoad phase, it is marked dirty in ViewState. So to add insult to injury, Jane is now incurring an increase in her page's serialized ViewState size for doing nothing more than putting Joe's Control on her page. I guess Joe doesn't like Jane very much. Maybe Joe's just trying to get back at Jane for something. Well, since we all know which sex rules this world, we can assume Jane ends up getting Joe to fix his control. Much to Jane's delight, this is what Joe comes up with: 

public class JoesControl : WebControl {
    public string Text {
        get {
            return this.ViewState["Text"] == null ?
                this.Page.Session["SomeSessionKey"] as string :
                this.ViewState["Text"] as string;
        }
        set { this.ViewState["Text"] = value; }
    }
}

Look at how much less code we have here. Joe doesn't even have to override OnLoad. Because the StateBag returns null if the given key does not exist, Joe can detect whether his Text property has been set already by checking for null. If it is, he can safely return his would-be default value. If it's not null, he happily returns whatever value it is. Simple as can be. Now when Jane uses Joe's control, not only is her Text attribute honored, she no longer incurs a hit on her ViewState size, either. Better behavior. Better performance. Less code. Everyone wins!

2. Persisting static data
By Static, I mean data that never changes or is not expected to change during the lifetime of a page, or even during the user's session. Let's say Joe, our would-be shoddy ASP.NET developer, has been tasked with adding the current user's name to the top of a page in the company's eCommerce application. It's a nice way of telling the user, "hey, we know who you are!" It makes them feel special and that the site is working. Positive feedback. Let's say this eCommerce application has a business layer API that allows Joe to easily get the name of the currently authenticated user: CurrentUser.Name. Joe completes his task: 

(ShoppingCart.aspx)
<asp:Label id="lblUserName" runat="server" />


(ShoppingCart.aspx.cs)
protected override void OnLoad(EventArgs e) {
    this.lblUserName.Text = CurrentUser.Name;
    base.OnLoad(e);
}

Sure enough, the current user's name will show up. Piece of cake, Joe thinks. Of course we know Joe has committed another sin. The label control he is using is tracking its ViewState when it is assigned the current user's name. That means not only will the Label render the user name, but that user name will be encoded into the ViewState hidden form field. Why make ASP.NET go through all the work of serializing and deserializing the user name, when you are just going to reassign it after all? That's just rude! Even when confronted, Joe shrugs at the problem. It's only a few bytes! But it's a few bytes you can save so easily. There are two solutions to this problem. First... you could just disable ViewState on the label. 

<asp:Label id="lblUserName" runat="server" EnableViewState="false" />

Problem solved. But there's an even better solution. Label has to be one of the most overused controls there is, bested only by the Panel control. It comes from the mindset of Visual Basic programmers. To show text on a VB Form you needed a label. Labels are supposed to be the ASP.NET WebForm counterpart, so it's only natural to think you need a label to display a text value that isn't hardcoded in the webform. Fair enough, but that's not true. Labels render a <span> tag around their text content. You must ask yourself whether you really need this span tag at all. Unless you are applying a STYLE to this label, the answer is NO. This would suffice just fine: 

<%= CurrentUser.Name %>

Not only do you get to avoid having to declare a label on the form (which means less code, albeit designer-generated code), but you've followed the spirit of the code-behind model: separation of code from design! In fact, if Joe's company had a dedicated designer responsible for the look and feel of the eCommerce site, Joe could have simply passed on the task. "That's the designer's job", he could have balked, and rightfully so. There is another reason why you might THINK you need a Label, and that is when something in the code-behind may need to programmatically access or manipulate that label. Ok, fair enough. But you must still ask yourself whether you really need a SPAN tag surrounding the text. Introducing the most underused control in ASP.NET: The LITERAL! 

<asp:Literal id="litUserName" runat="server" EnableViewState="false"/>

No span tag here.
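For completeness, here is roughly what the code-behind for that Literal might look like. Treat it as a sketch: the CurrentUser class below is a stand-in for the business layer API mentioned earlier, the field declaration follows the ASP.NET 1.x code-behind style, and the HtmlEncode call is my own precaution, since a Literal writes its Text out without a wrapping tag and without encoding it.

```csharp
using System;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

// Stand-in for the business layer API; the name returned is made up.
public static class CurrentUser {
    public static string Name {
        get { return "Jane <Doe>"; }
    }
}

public class ShoppingCart : Page {
    // Matches <asp:Literal id="litUserName" runat="server" EnableViewState="false"/>
    protected Literal litUserName;

    protected override void OnLoad(EventArgs e) {
        // ViewState is disabled on the Literal, so nothing is persisted;
        // the text is simply reassigned on every request.
        litUserName.Text = HttpUtility.HtmlEncode(CurrentUser.Name);
        base.OnLoad(e);
    }
}
```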

3. Persisting cheap data
This one is a superset of #2. Static data is definitely cheap to get. But not all cheap data is static. Sometimes we have data that may change during the lifetime of an application, possibly from moment to moment, but that data is virtually free to retrieve. By free, I mean the performance cost of looking it up is insignificant. A common instance of this mistake is when populating a dropdown list of U.S. States. Unless you are writing a web application that you plan on warping back in time to December 7, 1787, the list of US States is not going to change any time soon. However, as a programmer who hates to type, you certainly wouldn't want to have to type these states by hand into your web form. And in the event a state does rebel (we can only dream... you know who you are), you wouldn't want to have to perform a code change to strike it from the list. Our proverbial programmer Joe decides he will populate his dropdown list from a USSTATES table in a database. The eCommerce site is already using a database, so it's trivial for him to add the table and query it. 

<asp:DropDownList id="lstStates" runat="server"
    DataTextField="StateName" DataValueField="StateCode" />

protected override void OnLoad(EventArgs e) {
    if(!this.IsPostBack) {
        this.lstStates.DataSource = QueryDatabase();
        this.lstStates.DataBind();
    }
    base.OnLoad(e);
}

As is the nature of databound controls in ASP.NET, the state dropdown will be using ViewState to remember its databound list of list items. At the time of this ranting, there are a total of 50 US States. Not only does the dropdown list contain a ListItem for each and every state, but each and every one of those states and their state codes are being serialized into the encoded ViewState. That's a lot of data to be stuffing down the pipe every time the page loads, especially over a dialup connection. I often wonder what it would be like if I explained to my grandmother that the reason why her internet is so slow is because her computer is telling the server what all the US States are. I don't think she'd understand. She'd probably just start explaining how when she was young, there were only 46 states. Too bad... those extra 4 states are really wearing down your bandwidth. Damn you latecomers! You know who you are!

Like the problem with static data, the general solution to this problem is to just disable ViewState on the control. Unfortunately, that is not always going to work. Whether it does depends on the nature of the control you are binding, and which of its features you depend on. In this example, if Joe simply added EnableViewState="false" to the dropdown and removed the if(!this.IsPostBack) condition, he would successfully remove the state data from ViewState, but he would immediately run into a troubling problem. The dropdown will no longer restore its selected item on postbacks.

WAIT!!! This is another source of confusion with ViewState. The reason the dropdown fails to remember its selected item on postbacks is NOT that you have disabled ViewState on it. Postback controls such as the dropdown list and textbox restore their posted state (the selected item of a dropdown is 'posted') even when ViewState is disabled, because the browser still posts the control's value with the form. The dropdown forgets its selected value because you are rebinding it in OnLoad, which is after the dropdown has already loaded its posted value. When you databind it again, the first thing it does is throw that value into the bit bucket (you know, digital trash). That means if a user selects California from the list and then clicks a submit button, the dropdown will stubbornly return to the default item (the first item if you don't specify otherwise). Thankfully, there is an easy solution: Move the DataBind into OnInit:

<asp:DropdownList id="lstStates" runat="server"
    DataTextField="StateName" DataValueField="StateCode" EnableViewState="false" />

protected override void OnInit(EventArgs args) {
    // Rebind on every request, before the dropdown loads its posted value.
    this.lstStates.DataSource = QueryDatabase();
    this.lstStates.DataBind();
    base.OnInit(args);
}

The short explanation for why this works: You are populating the dropdown list with items before it attempts to load its posted value. Now the dropdown will behave just like it did when Joe first designed it, only the rather large list of states will NOT be persisted into the ViewState hidden field! Brilliant! More importantly, this rule applies to any data that is cheap and easy to get to.

You might argue that making a database query on every request is MORE costly than persisting the data through ViewState. In this case I believe you'd be wrong. Modern database systems (say, SQL Server) have sophisticated caching mechanisms and are extremely efficient if configured correctly. The state list needs to be repopulated on every request no matter what you're doing. All you've done is change it from being pushed and pulled down a slow, unreliable 56kbps internet connection that may have to travel for thousands of miles, to being pulled over at worst a 10 megabit LAN connection a couple hundred feet between your internet server and database server. AND if you really wanted to improve things, you could cache the results of the database query in the application. You do the math!
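The application-level caching mentioned above could be sketched roughly as follows. This is only a minimal illustration, not part of Joe's original code: QueryCache<T> is a hypothetical helper name, and it assumes a runtime where Func<T> is available (.NET 3.5 or later).

```csharp
using System;

// Hypothetical helper (not an ASP.NET type): holds one cached query result
// per result type T in process memory, so the database is hit only on a miss.
public static class QueryCache<T> where T : class
{
    private static readonly object Sync = new object();
    private static T cached;

    // 'loader' stands in for a call such as Joe's QueryDatabase().
    // It runs only when nothing has been cached yet.
    public static T Get(Func<T> loader)
    {
        lock (Sync)
        {
            if (cached == null)
            {
                cached = loader();
            }
            return cached;
        }
    }

    // Clears the cached result, e.g. after the USSTATES table is edited.
    public static void Invalidate()
    {
        lock (Sync)
        {
            cached = null;
        }
    }
}
```

Assuming QueryDatabase() returns a DataTable, the binding line in OnInit would become something like this.lstStates.DataSource = QueryCache<DataTable>.Get(QueryDatabase); and the dropdown is still rebound on every request, just without a database round trip each time.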

4. Initializing child controls programmatically
Let's face it. You can't do everything declaratively. Sometimes you have to get logic involved. That's why we all have jobs, right? The trouble is ASP.NET does not provide an easy way to programmatically initialize properties of child controls correctly. You can override OnLoad and do it there -- but then you're persisting data that probably doesn't need to be persisted into ViewState. You can override OnInit and do it there instead, but that suffers from the same problem. Remember when we learned how ASP.NET calls TrackViewState() during the OnInit phase? It does this recursively on the entire control tree, but it does it from the BOTTOM of the tree UP! In other words, as a control or webform, the OnInit phase of your child controls occurs BEFORE your own. A control begins tracking ViewState changes in this phase, which means by the time your own OnInit phase begins, your child controls' ViewState is already tracking! Let's say Joe would like to display the current date and time in a label declared on the form.

<asp:Label id="lblDate" runat="server" />

protected override void OnInit(EventArgs args) {
    // Too late: lblDate has already finished its own Init and is tracking ViewState.
    this.lblDate.Text = DateTime.Now.ToString("MM/dd/yyyy HH:mm:ss");
    base.OnInit(args);
}

Even though Joe is setting the label text in the earliest event possible on his webform, it's already too late. The label is tracking ViewState changes, and the current date and time will inevitably be persisted into ViewState. This particular example could fall under the cheap data issue above. Joe could simply disable ViewState on the label to solve this problem. But here we are going to solve it a different way in order to illustrate an important concept. What would be nice is if Joe could declaratively set the label text to what he wants, something like: 

<asp:Label id="Label1" runat="server" Text="<%= DateTime.Now.ToString() %>" />

You may have intuitively attempted this before. But ASP.NET will slap you in the face for it. The "<%= %>" syntax cannot be used to assign values to properties of server-side controls. Joe could use the "<%# %>" syntax instead, but that isn't very different from the databinding method we've already covered (disabling ViewState and databinding it every request). The problem is we would like to be able to assign a value through code, but allow the control to continue to work in exactly the way it normally would. Perhaps some code is going to be manipulating this label, and we want any changes made to it to be persisted through ViewState like they normally would be. For example, maybe Joe wants to give the users a way to remove the date display from the form, replacing it with a blank date instead:

private void cmdRemoveDate_Click(object sender, EventArgs args) {
    this.lblDate.Text = "--/--/---- --:--:--";
}

If the user clicks this button, the current date and time will vanish. But if we solved our original ViewState problem by disabling ViewState on the label, the date and time will magically reappear again on the next postback that occurs, because the label's ViewState being disabled means it will not automatically be restored. That's not good. What on Earth is Joe supposed to do now?

What we really want is to declaratively set a value that is based on logic, not static. If it were declared, the label could continue to work like it normally does -- the initial state wouldn't be persisted since it is set before ViewState is tracking, and changes to it would be persisted in ViewState. Like I said... ASP.NET does not provide an easy way to accomplish this task.

For you ASP.NET 2.0 developers out there, you do have the $ sign syntax, which allows you to use expression builders to declare values that actually come from a dynamic source (i.e., resources, declared connection strings). There's no expression builder for "just run this code" so I don't think that helps you either (UPDATE: Unless you use my custom CodeExpressionBuilder!). Also for ASP.NET 2.0 developers, there's OnPreInit. That is actually a great place to initialize child control properties programmatically because it occurs before the child control's OnInit (and therefore before it is tracking ViewState) and after the controls are created. However, OnPreInit is not recursive like the other control phase methods are. That means it is only accessible on the PAGE itself. That doesn't help you whatsoever if you are developing a CONTROL. It's too bad OnPreInit isn't recursive just like OnInit, OnLoad, and OnPreRender are; I don't see a reason for the inconsistency.

The root of the problem is simply that we need to be able to assign the Text property of the label BEFORE it begins tracking its ViewState. We already know the page's OnInit event (the first event that occurs in the page) is already too late for that. So what if we could somehow hook into the Init event of the label? You can't add an event handler in code for that, because the soonest you can do it is in your OnInit, which is after the source event has already occurred. And you can't do it in the constructor for the page, because declared controls are not yet created at that point. There are two possibilities:

1. Declaratively hook into the Init event: 
<asp:Label id="Label2" runat="server" OnInit="lblDate_Init" />

This works because the OnInit attribute is processed before the label's own OnInit event occurs, giving us an opportunity to manipulate it before it begins tracking ViewState changes. Our event handler would set its text.

2. Create a custom control: 
public class DateTimeLabel : Label {
    public DateTimeLabel() {
        this.Text = DateTime.Now.ToString("MM/dd/yyyy HH:mm:ss");
    }
}

Then instead of a regular label on the form, a DateTimeLabel is used. Since the control is initializing its own state, it can do so before tracking begins. It does it during the constructor if possible, so that a declared value will be honored.
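For the first possibility (the declarative OnInit hook), the handler in the code-behind might look roughly like the sketch below. The class name JoesPage is a hypothetical stand-in for whatever page declares the label, and the date format is simply reused from Joe's earlier attempt.

```csharp
using System;
using System.Web.UI;
using System.Web.UI.WebControls;

// Hypothetical code-behind class for the page that declares the label.
public partial class JoesPage : Page
{
    // Wired up in markup with OnInit="lblDate_Init".
    // It runs during the label's own Init phase, before the label starts
    // tracking ViewState, so the assignment is not marked dirty.
    protected void lblDate_Init(object sender, EventArgs e)
    {
        Label label = (Label)sender;
        label.Text = DateTime.Now.ToString("MM/dd/yyyy HH:mm:ss");
    }
}
```

Because the handler works on the sender rather than on a named field, the same method could be shared by several labels that need the same initial value.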

5. Initializing dynamically created controls programmatically
This is the same problem as before, but since you are in more control of the situation, it is much easier to solve. Let's say Joe has written a custom control that at some point is dynamically creating a Label.

public class JoesCustomControl : Control {
    protected override void CreateChildControls() {
        Label l = new Label();
 
        this.Controls.Add(l);
        l.Text = "Joe's label!";
    }
}

Hmmm. When do dynamically created controls begin tracking ViewState? You can create and add dynamically created controls to your controls collection at almost any time during the page lifecycle, but ASP.NET uses the OnInit phase to start ViewState tracking. Won't our dynamic label miss out on that event? No. The trick is, Controls.Add() isn't just a simple collection add request. It does much more. As soon as a dynamic control is added to the control collection of a control that is rooted in the page (if you follow its parent controls eventually you get to the page), ASP.NET plays "catch up" with the event sequence in that control and any controls it contains. So let's say you add a control dynamically in the OnPreRender event (although there are plenty of reasons why you would not want to do that). At that point, your OnInit, LoadViewState, LoadPostData, and OnLoad events have transpired. The second the control enters your control collection, all of these events happen within the control.

That means, my friends, the dynamic control is tracking ViewState immediately after you add it. Besides your constructor, the earliest you can add dynamic controls is in OnInit, where child controls are already tracking ViewState. In Joe's control, he's adding them in the CreateChildControls() method, which ASP.NET calls whenever it needs to make sure child controls exist (when it is called can vary based on whether you are an INamingContainer, whether it is a postback, and whether anything else calls EnsureChildControls()). The latest this can happen is OnPreRender, but if it happens any time after or during OnInit, you will be dirtying ViewState again, Joe. The solution is simple but easy to miss:

public class JoesCustomControl : Control {
    protected override void CreateChildControls() {
        Label l = new Label();
        l.Text = "Joe's label!";
 
        this.Controls.Add(l);
    }
}

Subtle. Instead of initializing the label's text after adding it to the control collection, Joe initializes it before it is added. This ensures without a doubt that the Label is not tracking ViewState when it is initialized. Actually you can use this trick to do more than just initialize simple properties. You can databind controls even before they are part of the control tree. Remember our US State dropdown list example? If we can create that dropdown list dynamically, we can solve that problem without even disabling its ViewState: 

public class JoesCustomControl : Control {
    protected override void OnInit(EventArgs args) {
        // Bind while the dropdown is still outside the control tree (not tracking yet).
        DropDownList states = new DropDownList();
        states.DataSource = this.GetUSStatesFromDatabase();
        states.DataBind();
 
        this.Controls.Add(states);
        base.OnInit(args);
    }
}


It works amazingly well. The dropdown list will behave as if the states are simply built-in list items. They are not persisted in ViewState, yet ViewState is still enabled on the control, meaning you can still take advantage of its ViewState-dependent features like the OnSelectedIndexChanged event. You can even do this with DataGrids, although that depends on how you are using them (you will run into problems if you are using sorting, paging, or the SelectedIndex feature).
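Why the order of "set the property" and "add the control" matters can be illustrated with a toy model of the tracking rule. ToyStateBag below is a hypothetical, heavily simplified stand-in written for this illustration; it is not the real System.Web.UI.StateBag, and it assumes .NET 3.5 or later for HashSet<T>.

```csharp
using System.Collections.Generic;

// Toy model of a control's ViewState bag. Only the tracking rule is modeled:
// writes made before TrackViewState() are treated as initial state,
// writes made after it are marked dirty and would be serialized.
public class ToyStateBag
{
    private readonly Dictionary<string, object> values = new Dictionary<string, object>();
    private readonly HashSet<string> dirty = new HashSet<string>();

    public bool IsTracking { get; private set; }

    // Plays the role of TrackViewState(), which ASP.NET triggers during Init
    // or when a control joins a tree that has already been initialized.
    public void TrackViewState()
    {
        IsTracking = true;
    }

    public void Set(string key, object value)
    {
        values[key] = value;
        if (IsTracking)
        {
            dirty.Add(key);
        }
    }

    // Keys that would end up in the hidden ViewState field.
    public ICollection<string> KeysToSerialize()
    {
        return dirty;
    }
}
```

In this model, calling Set("Text", ...) and then TrackViewState() leaves nothing to serialize, which corresponds to Joe setting the label's text before Controls.Add(). Reversing the two calls marks "Text" as dirty, which corresponds to his first version.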

BE VIEWSTATE FRIENDLY
Now that you have a complete understanding of how ViewState does its magic, and how it interacts with the page lifecycle in ASP.NET, it should be easy to be ViewState Friendly! That is the key really... ViewState optimization is easy as pie when you understand what is going on, and it often results in even less code than the non-friendly version. Have any suggestions, comments, error reports? Please leave a comment or send me an email!


clock February 4, 2009 18:22 by author Sky Jia (贾超)

The software/enterprise architect job is an important one. The duties of an architect are numerous and require specific leadership, communication and technical skills to be fulfilled.

In a recent post, Gabriel Morgan wrote about the qualities of an enterprise software architect starting from Daniel Goleman's Emotional Intelligence (EI) abilities: Self-Awareness, Self-Management, Social Awareness and Relationship Management.

Self-Awareness

  • Emotional self-awareness
  • Accurate self-assessment

Self-Management

  • Self-control
  • Transparency
  • Adaptability
  • Achievement
  • Initiative
  • Optimism

Social Awareness

  • Empathy
  • Organizational awareness
  • Service

Relationship Management

  • Inspiration
  • Influence
  • Developing others
  • Change catalyst
  • Conflict management
  • Teamwork and collaboration

The Software Engineering Institute has collected a large number of opinions regarding the duties, the skills and the knowledge of a software architect as seen by various software engineers. A few of the opinions regarding an architect’s required skills are:

David Cornish (Technical Architect, JPMorgan, London, UK):

  • Strong communication with both technical and business teams
  • Strong design experience and technical knowledge
  • Analytical and 'joined-up' thinking
  • Conflict resolution

Theo Gantos (Consultant, TEKA, Flint, MI, USA):

A renaissance person. Consulting, diplomacy, organization, conceptualization, abstract thinking, logical reasoning, data modeling skills in several methodologies, ability to self-evaluate and adapt quickly, presentation and communication skills, programming expertise, writing skills, sales skills, charisma, finance and return on investment calculation skills, dealing with difficult and change-resistant people, sense of humor.

Venkatesh Krishnamurthy (Technical Architect, Valtech India, Bangalore, KA, India):

  • Creative
  • An Artist
  • Politician
  • Strong willed
  • Excellent communication skills
  • Excellent presentation skills
  • People person
  • Matured
  • Articulative
  • Courageous to make decisions and stand by it
  • Risk taker
  • Good observer
  • Negotiator

Victor Alejandro Baez Puente (Chief Technology Officer, Grupo Nacional Provincial, Mexico City, DF, Mexico):

  • Experience designing an enterprise application with financial auditing, contract management, enterprise workflow, business process integration, and perhaps asset management components
  • Experience with Service Oriented Architecture (SOA).
  • Experience as a chief architect on inception-to-delivery of J2EE projects.
  • Experience with deploying J2EE rich and/or web client applications in a high-availability, clustered environment
  • Expertise in the Unified Modeling Language (UML) for constructing, and documenting the artifacts of software systems
  • Exemplary general IT knowledge (applications development, testing, deployment, operations, documentation, standards, best practices, security, hardware, networking, OS, DBMS, middleware, etc.)
  • Expertise and experience in lightweight, rapid development, agile methodologies.
  • Experience in estimating and measuring project velocity
  • Experience with interaction with legacy systems and phased application integration
  • Exquisite attention to detail
  • Written, verbal, and diagrammatic communication skills

The examples are numerous. Some put an emphasis on leadership and communication skills while others take specific technical skills into account. What is your opinion on the skills required of a software/enterprise architect?


clock January 20, 2009 09:46 by author Sky Jia (贾超)

Traditionally, a software release is considered to be a handshake between engineering and business. Engineering passes the tested code on to business, which in turn promotes it to the market, thereby completing the cycle. With Agile, however, software releases can be split into two categories: internal and external. This creates a loose coupling between the two. Internal releases are made by engineering, and business has the option of using one of them as an external release.

In a recent article on the Cutter Consortium (download code RELEASEMYTH), Israel Gat of BMC makes an interesting argument for separating the “two” releases in the software world. According to him, the internal and external release should be viewed as two faces of the same coin:

A body of code that delivers certain features and functionalities is one thing. The use of this body of code by marketing and sales to accomplish business results is quite another. Not only do the two activities differ, but they do not necessarily need to be tied together through a 1-to-1 relationship.

He offered the metaphor of a water pool with two pipes, one for inlet and the other for outlet, comparing engineering to the inlet pipe and business to the outlet pipe.

Think of the in-pipe in this example as engineering and the out-pipe as the business. Engineering can post releases at its own pace. The business can selectively choose from the posted releases. In this paradigm, marketing is not obligated to promote a release upon its completion. Marketing might do so in three months; it might choose to promote the current release with another release due at a later time; it might choose to make a release available on a limited basis; or it might choose never to promote a release.

Israel mentioned that since engineering is now loosely coupled with business, they can move towards a fluid release concept in which the software becomes alive and continuous. Engineering can churn out internal releases at a pace suitable to them and business can make a decision on which release gets to the customer as an external release and when.

Commenting on the article, Ryan added that Israel’s team ran three internal releases for every external release. He suggested that the benefit is valuable feedback, which lets business market the external release better. According to Ryan:

It worked great! As a result, I coach most agile teams to start by making sure their "internal release" cadence is twice as fast as marketing, operations and the market are used to. In this way you get a release where you can gain feedback and steer the "external release" to market better.

According to Israel, with Agile, frequent and faster internal releases make the software more alive and fluid. This renders the traditional release process obsolete. The separation of releases helps both engineering and business to work according to their release patterns without disturbing the release frequency of each other.

