HYXT Blog

we produce valuable software for K12.
clock April 14, 2009 13:54 by author Sky Jia (贾超)

According to the Cambridge Dictionary an apprentice is "someone who works for an expert to learn a particular skill or job". Merriam Webster says: "one who is learning by practical experience under skilled workers a trade, art, or calling". Uncle Bob Martin recently wrote about his experience with apprentices and what he considers key to progressing from apprentice to journeyman.

He describes two hypothetical apprentices: Sam, a developer who has apprenticed with the same master and had the same year fifteen years in a row. The other, Jasmine has changed jobs (and therefore masters) a number of times - growing her skills along the way. The following diagram illustrates the difference in their progress.

Bob’s point is that Sam, who has never changed masters, will always be a student and his growth is limited. Whereas Jasmine, who’s path has been varied,  really is a journeyman – travelling from master to master learning new things from each. Eventually Jasmine herself can become a master.

One commenter JMiller suggests that with a large enough company that you don’t need to leave your employer to change masters companies the size of Google or Microsoft, etc.

Corey Haines points out that while there are companies that are large enough to support Journeyman tours inside the company, none that he knows encourage it.

From her experience at Tektronix, Rebecca Wirfs-Brock remarks: “To me, moving around in the same company is roughly equivalent to changing employers, especially if the company is big enough…and I did several job shifts in my 13 years at Tektronix.”

Corey Haines is starting to have some ideas about how one transitions from apprentice to journeyman:

During the apprentice phase, a person is busy learning. They are practicing specific techniques, rigorously applying rules and procedures. Over time, having been influenced by many mentors, an apprentice starts to develop their own toolbox, the set of practices that they systematically apply. These practices form a basis for further development, a core that an apprentice can build upon.

Paul reports that in the UK companies use a similar approach hiring and training mechanical apprentices. After 6-12 months the apprenticeship is complete and people often move on somewhere else in the industry. Even though the company may not retain that person they benefit as they have a larger pool of well trained people to hire from in the future.


clock March 25, 2009 17:27 by author ethan
Subscribing to Content with Web Slices New for Windows Internet Explorer 8

Today, many Web sites provide content updates through Really Simple Syndication (RSS) news feeds. This requires a Web site to duplicate some content as a special XML file, called a feed, that a news reader application can download and check for updates. In contrast, a Web Slice enables users to subscribe to content directly within a Web page; a separate feed file is not required. Users monitor content changes and view the updated portion of the Web page directly from the Favorites bar (the improved Links toolbar) of Internet Explorer.

This article is organized into the following sections:

Key Points

  • A Web Slice uses simple HTML markup to represent a clipping of a Web page. When a user subscribes to a Web Slice, a portion of the Web page is added to the Favorites bar.
  • The Web Slice or the user can specify how frequently Internet Explorer will check for updates from the Web server. The smallest update interval is 15 minutes.
  • To detect a Web Slice on a Web page, Internet Explorer scans for elements that are of class hslice, have an id property, and contain at least one child element with an entry-title class name.
  • All elements marked as entry-content are combined and shown in the Web Slice preview window when the Web Slice button is clicked. To access the full Web page, click the Open button Cc196992.webslice_open(en-us,VS.85).gif in the preview window.
  • The Web page's author can control how Web Slices are discovered and the order of their appearance on the Feed Discovery button. Cc196992.webslice_discovery_button(en-us,VS.85).gif

What is a Web Slice?

The Web Slice is based on the hAtom Microformat World Wide Web link, with a few additional properties. The Web Slice itself uses simple, semantic HTML markup to represent a portion of Web page that can be subscribed to. Annotations can be applied directly to content within the HTML page; additional files can be used for optimization, but are optional.

A Web Slice has four major features:

Discovery

Note  For Internet Explorer to detect a Web Slice, the Web page must be served from a Web server.

Typically, users discover a Web Slice by moving the mouse pointer over it; this is called in-document discovery.

Web Slice shortcut button screenshot.

Figure 1: A shortcut button appears when the mouse pointer is moved over a Web Slice.

Each Web Slice that Internet Explorer detects is added as an entry to the Feed Discovery button Cc196992.webslice_discovery_button(en-us,VS.85).gif, located on the Command bar. Although a Web page can include up to 100 Web Slices, only 20 entries can appear at one time on the Feed Discovery button. You can customize which Web Slice appears at the top of the list by Setting the Default Web Slice.

Feed Discovery button screenshot.

Figure 2: The Feed Discovery button also detects Web Slices.

In-document discovery only applies to Web Slices in the topmost document. Discovery and subscription of Web Slices is disabled for documents within a frameset; however, publishers may provide their own interface for subscribing. For more information, see Customizing the User Experience.

If a user subscribes to a Web Slice, Internet Explorer adds it to the Favorites bar.

Update

To be notified of updates, a user must first add a Web Slice to the Favorites bar. Upon subscription, the Feed Download Engine will periodically update the Web Slice using information from from the Web site. If Internet Explorer detects a change, it will notify the user by making the text of the Web Slice button bold.

By default, Internet Explorer uses the suggested update interval specified by the Web Slice publisher in the ttl property element, if such element exists; otherwise, updates follow the default feed interval, normally every 24 hours. Users can either adjust the update frequency inside the Web Slice Properties dialog box, or update the cached Web Slice at any time by clicking the Refresh button Cc196992.webslice_refresh(en-us,VS.85).gif in the preview window.

Preview

To view a Web Slice, click its button on the Favorites bar. The preview window, also called the flyout, displays the locally cached content of the Web Slice, or an Alternative Display Source, if one has been defined by the publisher.

Note  When testing Web Slices on an intranet, you might need to disable Compatibility View to ensure that the Web Slice is rendered as it will appear on the Web. To do this, open the Page menu, click Compatibility View Settings, and clear the Display Intranet Sites in Compatibility View option button.

Navigation

To quickly access the full Web page, click the Open button Cc196992.webslice_open(en-us,VS.85).gif in the preview window. By default, Internet Explorer returns to where a particular Web Slice was discovered; the publisher can modify this behavior through the bookmark property.

The remainder of this article provides more details about these four features.

Discovering Web Slices

To detect a Web Slice on a Web page, Internet Explorer scans for elements that have certain class names and attributes.

The following properties are required:

hslice property

The hslice element describes a section of a Web page to which the browser can subscribe. The hslice element contains all of the required and optional properties of a Web Slice.

<div class="hslice" id="score"> 

In contrast to traditional style classes, Web Slice class names do not require a CSS rule in the style sheet. Using a space-delimited list, you can combine Web Slice class names with those of other classes.

<div class="hslice forecast" id="weather"> 

id property

In a manner similar to how a named anchor is used, the id attribute is used to identify the Web Slice in a URL. The id attribute must be present on the same element that uses the hslice class.

The id must be unique: if two or more Web Slices on a Web page have the same id value, Internet Explorer will only recognize the first one. If the id value changes, Internet Explorer will no longer be able to identify the Web Slice.

entry-title property

At least one child of hslice must use the entry-title class name. The value of this property maps to the Web Slice button text and the advertised name on the Feed Discovery button. Cc196992.webslice_discovery_button(en-us,VS.85).gif

<div class="hslice" id="main">    <h2 class="entry-title">Seattle Weather</h2>    ...</div> 

It is possible to combine multiple text spans to create a dynamic title that is updated with the Web Slice. The entry-title class can be applied to more than one element; Internet Explorer will concatenate the values in the order in which they appear in HTML.

In the following example, Internet Explorer displays "Seattle Weather 62°" when the Web Slice is updated.

<div class="hslice" id="main">    <h2 class="entry-title">Seattle Weather</h2>    <p>It is currently <span class="entry-title">62&deg;</span>.</p></div>  

Web Slice titles are limited to 128 characters.

Controlling Updates and Notification

To check for updates, Internet Explorer compares the Web Slice against the cached version. The Windows RSS Platform updates feeds and Web Slices on a regular basis: by default, every 24 hours. However, if the Web Slice includes a ttl property, the Feed Download Engine will use the specified value to adjust its update schedule. The frequency can also be adjusted in the Web Slice Properties dialog box.

The following Web Slice elements are used to control the duration and frequency of update checks:

ttl property

The time to live (TTL) value is the number of minutes during which a Web Slice is considered to be up to date. Unless the user has changed the update interval, the Feed Download Engine will consider this value to be the suggested update frequency. For more information, see Understanding the Feed Download Engine and FSS_SUGGESTED.

<p>Updates occur every <span class="ttl">60</span> minutes.</p>

Alternatively, an abbr object with a title attribute can be used to define the TTL. Using this method, the inner text of the object can be applied towards creating a user-friendly representation of time, as shown in the following example:

<p>Updates occur every <abbr class="ttl" title="60">hour</abbr>.</p> 

The minimum interval used by Internet Explorer to check for updates is 15 minutes.

endtime property

Expiration time for a Web Slice. The date and time is specified in the title attribute, leaving the inner text of the element for a user-friendly representation of time, if needed.

<abbr class="endtime" title="2008-07-25T17:30:00-07:00">(friendly date format)</abbr> 

Time values can be expressed using the ISO 8601 international standard for Date and Time on the Internet: Timestamps (RFC3339) World Wide Web link. The date-and-time string follows the common format: YYYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm].

  • YYYY - 4-digit year
  • MM - month, 01 through 12
  • DD - day, 01 through 31
  • "T" - date and time separator
  • hh - hour, 00 through 23
  • mm - minute, 00 through 59
  • ss - second, 00 through 59 (optional)
  • One of the following:
    • "Z" - UTC/GMT ("zero")
    • ("+"|"-")hh:mm - hour and minute of time zone offset

Note that the time zone offset is the actual time difference from UTC and does not automatically account for daylight saving time. For example, the time zone offset for Seattle is "-08:00" in the winter (Pacific Standard Time) and "-07:00" in the summer (Pacific Daylight Time). For best results, convert local time values to UTC.

You can also use the RFC822 date format (with 4-digit years) to include information about daylight saving time. The following encodes the same time as the preceding ISO 8601 example.

<abbr class="endtime" title="25 Jul 2008 17:30:00 PST">(friendly date format)</abbr>  

Alternative Update Source

When a user subscribes to a basic Web Slice, Internet Explorer returns to the Web page to check for updates.

Basic Web Slice graphic.

Figure 3: Basic Web Slice handles both discovery and update.

Returning to the original Web page to update a single Web Slice might be inefficient, especially if the Web page is large. For better control of network use, a Web Slice can declare an alternative source for updates.

a rel=feedurl

Any Web Slice link that specifies a rel attribute of feedurl is treated as an alternative update source. When an alternative source is present, Internet Explorer will subscribe to the alternative source and no longer use the original Web page content to update the Web Slice.

Alternative update source graphic.

Figure 4: Specifying an alternative source for Web Slice updates.

In the figure above, the two Web pages represent the same Web Slice. The user discovers and subscribes to the Web Slice on the original Web page, Basic.html. However, the source of the Web Slice indicates that an alternative update source should be used, as shown in the following example:

<div class="hslice" id="auction">    <span class="entry-title">Auction Item</span>    <a rel="feedurl" href="update.html#auction-update" style="display:none;"></a></div> 

Although an empty link is not visible to users, it creates an extra tab stop and will be read by screen readers. To provide a good user experience, it is best to suppress the display of alternative links, so that users do not accidentally navigate to these Web pages.

Update.html is a simplified Web page that represents just the basic Web Slice. Internet Explorer uses this Web page to get updates and notify users.

<div class="hslice" id="auction-update">    <h2 class="entry-title">Auction Item</h2>    <p class="entry-content">Current bid is $66</p></div> 

The alternative update source might also point to a short RSS or an Atom feed that has only one item. Note that an optional expiration time can be specified using the mon namespace and mon:endtime element, as shown in the following example:

<?xml version="1.0"?><rss version="2.0"    xmlns:mon="http://www.microsoft.com/schemas/rss/monitoring/2007">    <channel>        <title>Auction Item</title>        <link>http://www.example.com</link>        <ttl>15</ttl>        <mon:endtime>2008-12-06T01:00:00Z</mon:endtime>        <item>            <title>Nintendo DS System = $66.00</title>            <link>http://www.example.com/auction.aspx#slice</link>            <description> ... </description>        </item>    </channel></rss> 

Previewing in the Favorites Bar

The following property is optional but highly recommended.

entry-content property

This property defines that part of a Web Slice which Internet Explorer brings to the Favorites bar. When the Web Slice button is clicked, the entire entry-content element is displayed in the Web Slice preview window. The default width and height of the preview window are determined by the size of the container designated as entry-content. Like the entry-title property, a Web Slice can include more than one entry-content, and all elements are combined to create a single preview.

For security reasons, the Web Slice preview does not allow script or Microsoft ActiveX controls; consequently, HTML forms and buttons are not supported. Instead, to perform related actions, you can provide links that include URL parameters. (Navigation occurs within the currently selected tab.) If additional functionality is required, use an Alternative Display Source.

Applying Styles

The following style sources are used to create the Web Slice preview:

  • The inline styles of the entry-content property and its children
  • Style rules defined in the head of the original Web page
  • Imported and linked style sheets

Styles from parent elements are not inherited by the Web Slice preview. In the following example, the Web Slice preview does not display blue text, because the style is applied to the hslice instead of the entry-content element.

<div class="hslice" id="main" style="color:blue">    <h2 class="entry-title">Seattle Weather</h2>    <p class="entry-content">It is raining.</p></div> 

To affect the HTML preview, apply the style to the entry-content element itself:

<div class="hslice" id="main">    <h2 class="entry-title">Seattle Weather</h2>    <p class="entry-content" style="color:blue">It is raining.</p></div> 

Internet Explorer wraps the entry-content in a body element when rendering the preview. This guarantees that style rules applied to the body element of a Web page are also applied to the body of the Web Slice preview. To apply a body style that is not inherited by the preview, use a class name or id to differentiate the body of the original Web page; then, write specific style rules for that element. In the following example, the green background applies to the original Web page but not to the preview.

<style type="text/css">    .MyBody { background-color: green; }</style><body class="MyBody">    <div class="hslice" id="main">        <h2 class="entry-title">Seattle Weather</h2>        <p class="entry-content">It is raining.</p>    </div><body> 

The preceding style guidance only applies to Web Slices extracted from the main Web page. An Alternative Display Source has full control over the style of the Web Slice preview.

Alternative Display Source

Clicking a Web Slice button on the Favorites bar will generate a preview of the cached value of the entry-content element. Because the Windows RSS Platform is used for storage, active content such as script or ActiveX controls is removed. To host interactive Web content in a Web Slice preview, an alternative display source must be used.

Alternative display source graphic.

Figure 5: Specifying an alternative display source.

a rel=entry-content

The alternative display is specified by changing the entry-content element to a link that has the following attributes:

  • href: Location of the alternative display Web page.
  • rel: The value must be entry-content.
<div class="hslice" id="auction">    <span class="entry-title">Auction Item</span>    <a rel="entry-content" href="display.html" style="display:none;"></a></div> 

When a user opens the Web Slice preview window, it will navigate to Display.html; the cached copy of entry-content is not used. The alternative display source preview should be small enough to fit into a 320-by-240 (WH) space, on a 96-dots per inch (dpi) screen. The preview window does not automatically resize to the dimensions of the Web Slice, if a different size is used.

The following are not supported in an Alternative Display Source preview:

  • Dialog boxes and popups — Script alerts, HTML popups, and dialog boxes such as Print or Add a Favorite are all blocked. No windows of any kind can be launched from the Web Slice flyout.
  • First-time installation of ActiveX controls — The preview window does not display an information bar (gold bar), so the publisher needs to ensure that any ActiveX objects are enabled for the domain beforehand; otherwise, a broken control icon (red 'X') is displayed. The same restriction applies to downloads or anything else that might require user interaction with the information bar.
    Note  Alternative text inside the object tag is displayed if the ActiveX control is not installed. Publishers can use this text to provide download instructrions, for example.
  • Calls to window.external — There is no valid external object associated with the preview window.

When using an alternative display source, all hyperlinks in a Web Slice preview will navigate within the preview window. For links to open in the current tab, specify that they be opened in a new window, as shown in the following example:

<a href="gotosite.html" target="_blank">view full site</a>

For the server, consider using the no-cache HTTP header response for the Web page that acts as the Alternative Display Source. This will ensure that users previewing a Web Slice will see the most recent content.

Using Both Alternative Update and Display Source

It is also possible to combine alternative update with an alternative display.

Alternative display and update graphic.

Figure 6: Using both alternative update and display source.

For this scenario to work, the code for Basic.html only needs to include the alternative update source, as shown in the following example:

<div class="hslice" id="auction">    <span class="entry-title">Auction Item - Alternative Update</span>    <a rel="feedurl" href="update.html#auction-update" style="display:none;"></a></div> 

Update.html will then provide information about the alternative display source:

<div class="hslice" id="auction-update">    <h2 class="entry-title">Auction Item - Alternative Display</h2>    <div class="entry-content">Current bid is $66</div>    <a rel="entry-content" href="display.html" style="display:none;"></a></div>  

Navigating to a Different Web Page

When you click the Open button Cc196992.webslice_open(en-us,VS.85).gif on the Web Slice flyout, Internet Explorer normally navigates to the Web page where the Web Slice was first found or, alternatively, to the the Alternative Update Source if one was specified. The publisher can also modify this behavior by specifying another location altogether, using the bookmark property.

bookmark property

The bookmark property causes the Open button to navigate to the specified location when clicked. The following example opens Location.html.

<div class="hslice" id="auction">    <h2 class="entry-title">Auction Item</h2>    <a rel="bookmark" href="Location.html" style="display:none;"></a></div> 

Customizing the User Experience

The following advanced techniques help you customize how users interact with the Web Slices on your Web site.

Using the Web Slice Icon

Web Slice Icon (64x64)

The Web Slice icon is intended to provide a common visual cue that a Web site includes Web Slice functionality, which makes content from the Web site available for subscription. It must be used in accordance with the Web Slice Icon Guidelines. Developers can download .png and .gif renderings of the Web Slice icon in various sizes.

Setting the Default Web Slice

The Feed Discovery button Cc196992.webslice_discovery_button(en-us,VS.85).gif lists each Web Slice in the order it was parsed from the document by Internet Explorer. A publisher can choose which Web Slice is presented first, thus promoting one particular Web Slice to represent the whole Web page. For example, a bookseller might mark up a list of top sellers, as well as book reviews on the same Web page. It might make sense for the bookseller to promote the top-seller Web Slice as the default, because that facilitates the financial goals of the Web site whereas book reviews merely increase readership.

To specify the default Web Slice, add a link element in the head of the document:

<link     rel="default-slice"    type="application/x-hatom"     href="#auction" /> 

The link element requires the following attributes:

  • rel: The value must be default-slice.
  • type: The value must be application/x-hatom.
  • href: The #id of the Web Slice within the current document.

If the default Web Slice is not found within the current document, the link is ignored.

Disabling In-Document Discovery

To disable in-document Web Slice discovery for all Web pages, in the Internet Options dialog box, click the Content tab, then click the Settings button for feeds, and then clear the Turn on in page Web Slice discovery check box. Web developer can disable in-document discovery for the current Web page by adding the following meta element to the head of the document (all values are case-insensitive):

<meta name="slice" scheme="IE" content="off"/> 

If in-document discovery is disabled, the Feed Discovery button Cc196992.webslice_discovery_button(en-us,VS.85).gif will still indicate that the Web page includes a Web Slice, but the Web page must provide its own UI for discovering and installing it. To prompt users for installation, call window.external.AddToFavoritesBar().

Creating Dynamic Web Slices

Although Web Slices work well with static content that can be downloaded to be viewed at a later time, creating a Web Slice for content that changes in response to form data, Asynchronous JavaScript and XML (AJAX), or user interaction is more difficult. For example, a Web Slice might offer movie listings for all theaters within a few miles of the user's location. Or, a Web Slice might offer a list of newly released books that match a user's search criteria. To update Web Slices like these, Internet Explorer would have to recreate the conditions that generated them.

If you dynamically create or update a Web Slice after the document has loaded, you must provide an Alternative Update Source and/or Alternative Display Source to handle the preview. Pass URL parameters to the external resource so that it can generate a response based on the user's input. Finally, after updating or inserting a Web Slice, call window.external.ContentDiscoveryReset() to refresh the list of Web Slices and feeds on the Feed Discovery button. Cc196992.webslice_discovery_button(en-us,VS.85).gif

Authenticating the User

If your Web site requires user authentication, consider providing a way for the user to save login credentials in a cookie that can be used to identify the user after the browser is closed. Although user name and password settings are available in the Web Slice Properties dialog box, they are only used for HTTPS+Basic and HTTP/HTTPS+Digest authentication. Cookie-based authentication is more broadly supported, and will work regardless of the security settings of the Web Slice.

Note  HTTP-based authentication is not supported for Web Slices that use an alternative update and/or display source. For these scenarios, use cookie-based authentication.

Because some users will choose not to persist their personal information, your Web Slice should explain how to log into your site if authentication was unsuccessful. If you use an Alternative Update Source, then return a single item feed that contains this information in the item's description.

Related Topics


clock March 23, 2009 23:29 by author ethan
Updated: 14 April 2006

SQL Server 2005 Reporting Services offers two deployment models:

  • A standard deployment consists of a single report server instance that uses a local or remote SQL Server Database Engine to host the report server database. You can use SQL Server 2000 or SQL Server 2005 to host the report server database.
  • A scale-out deployment consists of multiple report servers that share a single report server database. The database can be installed on a remote SQL Server instance or locally with one of the report servers. The SQL Server instance that hosts the report server database can be part of a failover cluster.
    Scale-out deployment is supported in the following editions: Enterprise Edition, Developer Edition, and Evaluation Edition.

To simplify the deployment process, you can use checklists that describe the sequence of tasks that must be performed in order to complete a standard deployment. For more information, see Server Deployment Checklist. To learn more about scale-out deployment, see Configuring a Report Server Scale-Out Deployment. For more information about SQL Server 2005 licensing, see the SQL Server 2005 Licensing section on the How to Buy page of the SQL Server Microsoft Web site.

The following diagram shows the standard deployment model where the report server database is located on a remote server. You can also install it locally so that all server components are on the same computer.

Report server standard deployment

Primary considerations in choosing where to host the report server database are:

  • Processing resources
  • Disk space availability

Both the report server and the Database Engine compete for processing resources such as CPU time, memory, and disk access. Some report server operations are resource intensive. For example, a report server attempts to use all available memory for report rendering operations. Running the report server on a separate computer can reduce competition for processing resources.

Report server requirements for disk space provide a second reason for using a remote SQL Server Database Engine to store report server data. Although the footprint of a report server database may be small initially, disk space requirements can grow significantly at run time depending on how you run reports and the number of users accessing the report server. For more information, see Report Server Database Requirements.

You can deploy Reporting Services in a scale-out deployment to create a highly available and scalable report server installation. Configuring a scale-out deployment is also useful if you want to improve the performance of scheduled operations and subscription delivery. A report server scale-out deployment consists of multiple report servers that share a single report server database. Each report server in the deployment is referred to as a node. Nodes participate in the scale-out if the report server is configured to use the same report server database as another report server.

Report server nodes can be load balanced to support high volume reporting. You can also create the report server database on a failover cluster to support high availability requirements.

Unsupported cluster configurations include deploying a complete report server installation (that is, a report server and its database) on each node of a multi-node cluster. Specifically, you cannot deploy Reporting Services on a two-node cluster that consists of an active node and a passive node that is used when the active node fails.

Deploying on a Network Load Balanced (NLB) Cluster

You can run report server nodes on an NLB cluster. You can use a software or hardware solution to implement the NLB cluster. To run the report servers on an NLB cluster, you must use software and tools that support that functionality. Reporting Services does not provide server cluster or virtual server management, nor does it provide a way for you to define a virtual server name that provides a single point of entry to all of the nodes in a report server scale-out.

NLB is only necessary if you want to improve report server performance for on-demand reporting and interactive reports (such as drillthrough and matrix reports). Scheduled reports and subscription processing are faster on a scale-out deployment, but do not necessarily require an NLB cluster to get faster performance.

Deploying on a SQL Server Failover Cluster

SQL Server 2005 provides failover clustering support so that you can use multiple disks for one or more SQL Server instances. Failover clustering is supported only for the report server database; you cannot run the Report Server Windows service or Web service as part of a failover cluster.

To host a report server database on a SQL Server failover cluster, the cluster must already be installed and configured. You can then select the failover cluster as the server name when you create the report server database.

Although the Report Server Windows service and Web service cannot participate in a failover cluster, you can install Reporting Services on a computer that has a SQL Server failover cluster installed. The report server runs independently of the failover cluster. If you install a report server on a computer that is part of a SQL Server failover instance, you are not required to use the failover cluster for the report server database; you can use a different SQL Server instance to host the database.

Scale-out Deployment Diagram

The following diagram shows multiple report servers and report server databases deployed in separate server clusters.

Scale-out deployment diagram

Release History

14 April 2006

New content
  • Deploying on a SQL Server Failover Cluster
  • Deploying on a Network Load Balanced (NLB) Cluster
Changed content:
  • Clarified how Reporting Services uses cluster technologies.

clock March 19, 2009 09:16 by author Sky Jia (贾超)

Posted by James Leigh on Mar 15, 2009 12:30 PM

The ACID properties are one of the cornerstones of database theory. ACID defines four properties that must be present if a database is considered reliable: Atomicity, Consistency, Isolation, and Durability. While all four properties are important, isolation in particular is interpreted with the most flexibility. Most databases provide a number of isolation levels to choose from, and many libraries today add additional layers which create even more fine-grained degrees of isolation. The main reason for this wide range of isolation levels is that relaxing isolation can often result in scalability and performance increases of several orders of magnitude.

Serializable consistency is one of the oldest and highest isolation levels that is generally available, and many choose it due to the simple programming model it provides - only one transaction can execute at a time against a given resource, and many potential sources of problems are removed. However, most applications (particularly web applications) cannot assume this very high level of isolation because it is impractical from the end user perspective - any application with a non-trivial number of users would quickly experience delays of several minutes accessing shared resources, which would rapidly reduce the number of users of that application back to a trivial number. Weak and eventual consistency are common in large distributed data sources such as the Web, and several very large and successful web-based applications (e.g. eBay and Amazon) have shown that optimistic weak consistency is much more scalable than traditional pessimistic mechanisms. This article takes a look at eight different isolation levels that you can use to potentially gain more performance and scalability in your applications by learning to relax data consistency constraints.

The main goal of concurrency control is to ensure transactions are isolated and do not interfere with one another. Higher degrees of isolation are achieved at the expense of potential performance gains. Concurrency control is implemented by a pessimistic or optimistic mechanism. Most relational databases, which are write-optimised, use a pessimistic mechanism. Pessimistic mechanisms use locks and may block operations or use some form of conflict detection. Pessimistic blocking is done when a table, page, or row has been modified, preventing other transactions from accessing potentially modified resources. However, optimistic mechanisms do not use any locks and rely solely on conflict detection to maintain transaction isolation. Conflict detection, as used by optimistic mechanisms, permits all read operations and verifies consistency at the end of the transaction. If a conflict is detected then the transaction is rolled back or repeated. Most web servers are read-optimised and thus use an optimistic mechanism. By permitting all read operations, optimistic mechanisms can achieve a higher read and write throughput while still preserving data consistency when resources are not continually changing.

The isolation levels listed below are here to help Web developers better understand the constraints placed on their programming models, and to engage system architects and developers in discussions to choose the most efficient isolation levels while maintaining necessary data consistency. They are listed from the least isolated (Read Uncommitted) to the most isolated (Serializability).

1 Read Uncommitted

Read uncommitted isolation level requires little isolation between transactions. Every read operation may see pending write operations from any transaction (dirty reads). However, committed write operations must have a serial order to prevent dirty writes. A pessimistic mechanism will block conflicting write operations until others are committed or rolled back. An optimistic mechanism will not lock and will allow everything to go through. If a connection is rolled back, all other connections that made subsequent modifications to the same data will also be rolled back. Shared caches are permitted in this level without validation. This isolation level is best used when transactions are not needed (such as a read-only dataset) or are modified with exclusive access to the database.

Example: An archive database that is only updated while offline, or an audit/logging table that is not used within a transaction

2 Read Committed

Read committed may read any committed state of the system and may be cached without validation (mixed states) as long as changes in the current connection are reflected in the results. Pessimistic mechanisms implement this as a Monotonic View. Optimistic transactions store all changes in isolation, making them only available to itself until committed. Read committed is implemented with an overly optimistic mechanism that delays writing all changes until the transaction is committed. This form of optimistic isolation permits complicated write operations without blocking read operations and has no validation schema. Shared caches are permitted only for committed states. This isolation level is best used when older values are permitted in results and transactions are only use for write operations.

Example: An online forum, when the absolute latest postings may not necessarily be shown and posts don't conflict with each other

3 Monotonic View

Monotonic view is an extension to read committed where transactions observes a monotonically increasing state of the database as it executes. In this level, a pessimistic transaction may be blocked during read operations if there is an outstanding write transaction. Optimistic transactions behave like read committed, keeping their changes in isolation, but validate their cache to ensure it is still valid. Periodically synchronized database clones are permitted in this level. This isolation level is best used when transactions are not needed or transactions only contain write operations.

Example: A user preference tables that are modified only by one person

4 Snapshot Reads

Snapshot Reads extends monotonic view and guarantees that query results reflect a consistent snapshot of the database. A pessimistic mechanism will block other write operations from affecting the results while they are being read. An optimistic mechanism will allow other write operations and inform the reading transaction if any of the results have changed and may roll it back. To implement an optimistic mechanism, a validation must be performed at the end of the read operation to detect if any concurrent write operations modified the result, and if so the result maybe repeated or rolled back. This validation may simply check if write operations occurred in the same table, or it might check the query results for the changes. This optimistic isolation level can detect conflicts easily and favours write operations, while permitting concurrent read operations. This level permits periodically synchronized database clones so long as they provide snapshot reads. It is best used when write operations are low or unexpected to conflict with concurrent read operations and when query results need to be consistent.

Example: A currency conversion or lookup table that is queried more often then it is modified and only the newest values are kept,

5 Cursor Stability

Cursor Stability isolation extends read committed and is the default isolation level of many relational databases. In this isolation level, a pessimistic transaction must indicate which records it will modify when reading them, if done in a separate statement. This is often done using 'FOR UPDATE' keywords appended to the end of a 'SELECT' query. In this case, other conflicting read or write pessimistic transactions will be blocked until the transaction is finished. An optimistic transaction tracks the version number of all modified records/entities to be verified when committed. This is the most popular optimistic isolation level and is provided by all major object-relational mapping libraries. In the Java Persistence API, this level can closely be achieved using FLUSH_ON_COMMIT (although queries may not reflect local changes), and if a conflict is detected an OptimisticLockException is thrown. This isolation can also be used with the HTTP headers If-Match or If-Unmodified-Since that compare a previous resource's version or time-stamp before updating. This level of isolation is best used for entities that are modified based on external information (not read from the database) and changes must not overwrite each other.

Example: A shared company directory or a wiki

6 Repeatable Read

Repeatable Read isolation extends cursor stability and guarantees that any data read within the transaction will not be modified or removed during the transaction. A pessimistic transaction will acquire read locks on all records and block other transactions from modifying them. An optimistic transaction will track all records or entities and verify they have not been modified when committed. This level of isolation is best used when entity states can affect other entities and transactions are made up of read and write operations.

Example: An order-tracking database, where values are read from one entity and used to compute values for other entities.

7 Snapshot Isolation

Snapshot isolation extends snapshot reads and repeatable read and guarantees that all read operations made in a transaction will see a consistent snapshot of the database. Any read operation performed in a transaction will have the same result regardless of whether it was performed earlier or later in the transaction. This differs from repeatable read isolation because it prevents phantom reads (range query results changing). This level is supported by many relational databases in the form of multi-version concurrency control (maybe called SERIALIZABLE), which is pessimistically implemented using a combination of locks and conflict detection. In this level, transactions must be prepared to be rolled back due to conflicts from either a pessimistic mechanism or an optimistic mechanism. A pessimistic mechanism will try to reduce the chances of a conflict by locking resources, but must merge changes when transactions are committed. An optimistic mechanism may also use a multi-version concurrency control, but would not block other transactions from engaging in potentially conflicting operations, instead it would roll back transactions that were found to conflict. This level of isolation is best used for transactions that read and modify multiple records.

Example: A workflow system, with rules based on the state of the system.

8 Serializability

Serializability is an extension of snapshot isolation that specifies all transactions must occur as if they had executed serially, one after the other. A pessimistic mechanism acquires range locks for all evaluated queries, preventing write operations from affecting these results. An optimistic mechanism tracks all evaluated queries and either uses a backwards validation scheme or a forwards validation scheme at the end of the transaction to detect if any concurrent write operations affect concurrent read operations, and if so, rolls back all but one of the conflicting transactions. In this isolation level, the apparent state of the system by any committed transaction will not have changed. This level of isolation is used for transactions that require complete data consistency.

Example: An accounting system that performs range queries to compute new values.

Summary

Below is a summary of the isolation levels outlined in this article, to help you find the level that is most appropriate for your application.

Types of possible collisions between transactions in different isolation levels:

  

Dirty Writes

Dirty Reads

Mixed states

Inconsistent reads

Overwrites

Non-repeatable

Phantom Reads

Inconsistency

Read Uncommitted

Not permitted

Permitted

Permitted

Permitted

Permitted

Permitted

Permitted

Permitted

Read Committed

Not permitted

Not permitted

Permitted

Permitted

Permitted

Permitted

Permitted

Permitted

Monotonic View

Not permitted

Not permitted

Not permitted

Permitted

Permitted

Permitted

Permitted

Permitted

Snapshot Reads

Not permitted

Not permitted

Not permitted

Not permitted

Permitted

Permitted

Permitted

Permitted

Cursor Stability

Not permitted

Not permitted

Permitted

Permitted

Not permitted

Permitted

Permitted

Permitted

Repeatable Reads

Not permitted

Not permitted

Permitted

Permitted

Not permitted

Not permitted

Permitted

Permitted

Snapshot Isolation

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Permitted

Serializability

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Not permitted

Optimistic requirements for different isolation levels:

  

Cache

Data Sync

Optimistic Conflict Scheme

Suggested Operations

Example

Read Uncommitted

Cache permitted

Sporadic

Detect dirty writes

No concurrent read and write

Archive

Read Committed

Cache permitted

Sporadic

No conflict detection

Monotonic read/write

Web Forum

Monotonic View

Must be validated

Periodic

No conflict detection

Combined reads

User preferences

Snapshot Reads

Must be validated

Periodic

Compare modifications to reads

Consistent reads

Lookup table

Cursor Stability

Cache permitted

Sporadic

Compare modified entity versions

CRUD services

Directory

Repeatable Reads

Cache permitted

Sporadic

Compare read entity versions

Read/write entities

Order tracking

Snapshot Isolation

Must be validated

Periodic

Compare read entity versions

Synchronized entities

Work-flow

Serializability

Must be validated

Full Sync

Compare queries with modifications

Complete data consistency

Accounting

Data consistency is vital in database applications -- it allows developers to make sense of data within a concurrent environment. Although strong consistency levels such as serializability provide a simple programming model, they can cause excess overhead, blocked operations, or transaction rollbacks and may be unnecessary for many applications. Being aware of other, potentially more appropriate isolation levels can help ensure that developers and system architects understand the data consistency needs, while balancing performance tradeoffs.


clock March 18, 2009 23:40 by author arthur

Report Definition Language XML Schema.xml (85.51 kb)


clock March 18, 2009 23:14 by author arthur

1. Report XML Diagram


2. Report Items XML diagram
 

3. Data Regions XML Diagram


4. Table XML Diagram


5. Matrix XML Diagram


6. Chart XML Diagram

clock March 12, 2009 18:21 by author Sky Jia (贾超)

作者 章昱恒 发布于 200935上午1230

数据迁移是指在系统软件开发中,将具有实际业务价值的数据,依据功能需求或系统开发的要求,在不同存储媒介、存储形式或计算机系统之间转移的过程。

数据迁移是系统开发经常涉及到的一项工作。在企业级应用系统中,新系统的开发,新旧系统的升级换代,以及正常的系统维护,不可避免地涉及到大量的迁移工作。而在一个以数据为核心的业务系统中,数据的迁移更是无处不在。比如:在以数据仓库为架构原型的系统设计中,ETL(抽取,转换,装载)部分的实现就是一种数据迁移;对大型数据系统的分布式实施,数据迁移就是整个实施过程的主要部分。而在敏捷实践中,渐进式的数据库开发,更是涉及到大量的数据迁移和同步工作。

我们时常会听到用户提出这样的要求"我们并不过于关心应用的好坏,但需务必保证数据准确"。的确,在以数据为运营基础的行业里,数据质量本身就是软件质量的权重部分,尤其在电信、金融和控制领域里,这一特征表现的格外明显。数据迁移也是敏捷开发中相当重要的环节,它影响着各个发布版本的数据质量,而数据质量又决定着系统的有效性和可靠性,因此高质量地完成数据迁移不容忽视。

数据迁移往往被视为一件很简单的工作。在很多人眼里,数据迁移仅仅是用sql语句向相应数据表装载数据的过程。但在实际操作中,数据迁移涉及到很多层面的因素,如用户需求,系统功能,数据库建模等,若出现问题,将导致开发进展缓慢或质量不高。常见问题有业务系统逻辑模糊、脏数据、遗留系统的技术债和管理债等。那么如何有效的避免这些问题,提高迁移质量呢?

本文将以ThoughtWorks中国公司与客户合作的CRM项目为背景,为读者介绍如何在敏捷开发中高质量地处理数据迁移工作,从而在数据层面提高系统质量。

开发背景

A系统(旧系统)是客户原有的一套CRM(客户关系管理)系统。系统采用B/S 架构,使用sql server 2005做为后台数据库。旧系统的数据建模设计采用了高度范式化的设计思路,其目的是极度追求灵活性。业务数据被大量拆分并散布存储在上百张数据表里。数据表内和表之间不存在参照约束。大量的业务逻辑采用存储过程封装以提高效率。存储过程体系相当庞大,且存在复杂的相互调用。数据库中存在一些脏数据,可能是长期的使用、维护或误操作导致,但没人知道它们有多少,具体存在哪里。应用界面可用性不理想而且系统效率较低,用户常抱怨系统反映迟缓或无反应。数据库存储的业务数据约50G左右。

ThoughtWorks 团队将为客户提供一套新的CRM系统用以替换旧系统主要功能。新系统精简整理旧系统功能,并整合了客户的最新需求。在设计上做了巨大变更,以改善界面可用性,同时为了保障终端用户对系统服务的需要,新旧系统要求能够同时运行并实现数据同步,当终端用户全部过度到新系统后,终止旧系统。在这个过程中,DBA 团队需给予足够的数据保障。

以下为项目版本的发布图。

数据迁移开发方法

1. DBA需要制定目标并且管理自己的任务

尽管在每个迭代中,团队都会讨论决定如何组织'需求故事'story),但是DBA仍然需要有自己的'故事墙'story wall),并且花时间组织自己的story。在实际开发中,数据迁移仅仅是DBA工作的一部分,DBA还要完成相应的story开发和数据分析,有时还要给开发人员提供数据支持。混乱的管理会带来开发上的冲突。因此,有效管理任务是做好数据迁移的首要环节。

故事墙是管理这些任务的最好方法。尽管这个故事墙对客户提供的商业价值是间接的,但从整个团队角度来看,任何需要数据的人或程序都是DBA的用户,故事墙有利于管理每个story包含的数据需求,避免数据迁移任务与其它数据库开发任务之间的冲突,从而减少重复性工作或修复性工作。DBA有必要将这种方法引入到数据库开发中。

DBA要从商业价值角度决策数据迁移的需求。系统开发中,客户和开发员常常会向DBA提出自己的数据迁移要求,但往往这些要求并不具有全局性和决定性,毕竟他们仅仅是针对一个story的需要而提出。如果DBA盲目执行,将起到事倍功半的效果。DBA应当积极参加IPM(迭代计划会议。它是在每个迭代开始时的会议,全体成员共同讨论story计划完成数量)。无论是直接与用户交互,还是参与团队合作,DBA有必要将每个story内容了解的清清楚楚。通常,DBA可以不必像开发人员一样去了解story的开发细节,但通过与业务分析师和开发员的沟通,潜在的数据需求自然浮出水面。针对这些数据需求,通过再次组织并加以优先级,我们很容易回答这些问题:接下来应该完成的任务是什么?它的实际商业价值是什么?谁将需要它?什么时候需要?实践证明,多花些时间和团队或客户沟通是事半功倍的好方法,而且DBA通过了解业务数据可以给开发员更好的指导,减少开发员对数据的误解,有利于提高整体团队的开发效率。

通过对每个story的了解,我们总结并制定了针对当前发布版本需要的7个数据迁移story,并且确认了它们的确不存在任务上的重复,也邀请项目经理和客户一起确认了这份计划。如此我们的目标已经制定。

2. 思考实施策略

我们已经管理好所有数据迁移的任务,接下来考虑如何实现。通过以往的经验,我们发现如果没有仔细思考全局和细节问题而直接编写代码,带来的后果是无法控制的。我们应该首先充分了解这个过程可能存在的风险,然后决定采用什么样的策略,是否可以借助工具提高效率。这里的潜在风险主要包括:

2.1 数据质量

旧系统的数据库建模是一个高度范式化的结构,每个表之间存在相当大的依赖关系。一旦一个表存在脏数据,我们如何保证得到正确的查询结果?

2.2 对原有系统的了解

旧系统的应用程序引入了面向对象的设计方法,并且继承关系数据也被存储在若干张数据表里,如何正确区分这些业务对象和关系,保证在迁移过程中不会制造脏数据?

2.3 业务数据映射

旧系统和新系统之间存在着相当大的业务逻辑差异,我们是否能够将业务逻辑、数据映射到新系统?是否存在不可实现的转换?

在未充分了解这些问题之前,我们无法进一步制定计划,即时给予客户反馈是解决这些问题的最好方法。经过进一步沟通后,我们发现问题的复杂程度远远超过想象,尽管客户对旧系统非常了解,但他们对于某些数据也不能给出明确答案。鉴于这些情况,我们制定了初步的解决策略:

  1. 更多的了解旧系统,即时给予反馈。对于那些无法找到答案的问题,考虑是否可以寻求其它资源或忽略没有价值的数据。
  2. 尽量细化分割每一个复杂需求,形成多个任务。小粒度任务能够帮助暴露更多问题。
  3. 采用测试驱动,确保一套可靠的测试机制。
  4. 制定实现框架和阶段性目标。
  5. 不要过于乐观的估计进展,每一阶段要留有充分的单元测试。
  6. 调整每个迭代的内容,对有较强依赖关系的任务可以放在今后的迭代周期里。

3. 实施数据迁移

新系统的数据迁移包含两个部分:一次性数据迁移和数据同步迁移

一次性数据迁移

一次性数据迁移指仅仅发生在某一个发布版本上线安装时,新旧系统同时处于脱机状态,一部分数据将从旧系统中转移到新系统的过程。

数据同步迁移

数据同步迁移过程发生在新系统运行时,新旧系统同时处于工作状态,双方通过交换数据保证彼此数据的一致性。

同为数据迁移,但因两类迁移各具特点,因此在共同的处理方式上也略有不同。

  

一次性数据迁移

数据同步迁移

特点

  1. 数据量大。
  2. 使用频率低(一次性使用)。
  3. 转换逻辑复杂,需大量定制映射转换数据。

  

  1. 数据量小
  2. 使用频率高(以分钟为单位,周期性运行)。
  3. 转换逻辑复杂,少量定制映射转换数据。
  4. 需要事务处理以保证数据一致性

共同处理方式

  1. 细化任务。
  2. 测试驱动。
  3. 持续集成

不同处理方式

  1. 在执行测试驱动中,应侧重数据质量的测试。应依据不同环境的测试结果,增强测试体系。
  2. 工具选择。避免使用第三方工具,直接使用sql脚本以提高迁移效率。
  3. 保留中间处理结果
  1. 在执行测试驱动中,应侧重逻辑映射方面的测试。
  2. 工具选择。可考虑使用第三方工具,增强事务控制。
  3. 可不保留中间结果
  1. 细化任务

依据最初制定的开发策略,当我们遇到复杂的迁移需求时,首先分解每个需求为若干个模块,然后画出整体结构图。以下是某一处数据迁移脚本的模块分割:

最初由于这个部分的迁移逻辑过于复杂,以至于客户对它的处理结果没有信心。但当共同完成这个图表后,大家一致认为它没有像想象中的困难。总而言之,立刻解决一个复杂的问题很困难,但解决其中一个小问题却很容易。

  1. 测试驱动

如同编写程序代码一样,我们不仅为实现数据迁移脚本使用了测试驱动,还引入了针对数据库设计的一些方法。在程序设计中,当代码本身结构良好,单元(类、方法)之间关系清晰,可以直接添加单元测试。现在,我们有了很好的脚本逻辑结构,可以很容易添加每一步结果的单元测试,这就如同形成了一道安全网,保证异常数据出现时,能够立即发现并加以处理。在实际编写迁移脚本之前,应首先明确测试内容,准备好测试脚本。

测试内容包括:

  • 应产生的符合期望的数据

基于给予的原始测试数据,这一测试过程测试脚本的数据转换逻辑是否正确。以下举例说明:

测试环境:旧系统中存在某个名为'Jason'的客户信息,他的personId 1000101

测试目的:当某一客户的信息迁移到新系统的CUSTOMERS表后,新系统应该存在该客户信息。

新系统上要运行的测试代码:

DECLARE @personName NVARCHAR(250),

  

SELECT

@personName = personName

FROM

CUSTOMERS

WHERE

personId = 1000101

 

IF (@personName <> 'Jason') or (@personName is NULL)

BEGIN

INSERT INTO LoadTestErrorLog (errorDescription)

VALUES ('personName for personId 1000101 is not Jason')

END

Go

 

这里常用的原则是:一段sql语句仅用来测试一处期望数据,这样可以减少代码之间的相互依赖性,更准确的定位错误数据。

  • 不应当产生的异常数据

异常数据指在迁移过程中出现的不符合逻辑的数据。理论上讲,迁移过程不应当出现异常数据,然而现实情况中,迁移结果总会出现我们不需要的数据。其原因包括数据源出现异常、实现过程中的误操作、系统应用的bug等。总而言之,为了保证这些错误不会出现在最终结果,相应的测试脚本必不可少,也是防止问题进一步扩大的有效举措。这一测试过程常被用来发现在生产环境中可能出现的问题。以下举例如何测试异常数据:

测试环境:全部或部分生产环境数据

测试目的:将某个客户的信息迁移到新系统的CUSTOMERS表后,数据表不应该具有顾客名字为空的记录,如果出现将视为迁移过程的错误。

新系统上要运行的测试代码:

DECLARE @isExistPersonNameWithNULL INTEGER

  

SELECT

@isExistPersonNameWithNULL = count(*)

FROM

CUSTOMERS

where personName is null

  

IF (@isExistPersonNameWithNULL> 0)

BEGIN

INSERT INTO LoadTestErrorLog (errorDescription)

VALUES ('personName doesn't contain legal information')

END

  

Go

  • 数据表的数据量是否符合期望

当数据被迁移至新系统后,应当确保迁移数据量符合应期望值。实现方法多种多样,较简单的方法是直接比较数据迁移前后的数据记录数是否在数值上相等。以下举例说明:

测试环境:全部或部分生产环境数据。

测试目的:客户数据被迁移后,应当确保客户数据没有丢失。

新系统上要运行的测试代码:

DECLARE @NumberofCustomerinOldDB INTEGER

DECLARE @NumberofCustomerinNewDB INTEGER

  

SELECT

@NumberofCustomerinOldDB = count(*)

FROM

oldDB.dbo.persons -- 这是在旧系统中定义的客户表

...

--省略复杂的过滤逻辑

SELECT

@NumberofCustomerinNewDB = count(*)

FROM

newdb.dbo.CUSTOMERS -- 这是在新系统中定义的客户表

where personName is null 

 

IF (@NumberofCustomerinOldDB<>@NumberofCustomerinNewDB )

BEGIN

INSERT INTO LoadTestErrorLog (errorDescription)

VALUES ('not all customers are migrated ')

END

Go

 

最终当把测试sql代码片段组装在一起后,我们获得了一批测试脚本,并按照以下流程,通过使用NANT工具实现自动化:

NANT中的实现方法:

<target name="-init " … />

该任务负责初始化测试环境

<target name="-parseDbScripts " … />

该任务负责编译并部署迁移脚本

<target name="-resetTestData " … />

该任务负责重置测试数据

<target name="-executeMigrationScripts " … />

该任务负责执行迁移脚本

<target name="-testMigration " … />

该任务负责执行迁移测试脚本

<target name="testDataMigration" depends="-init, -parseDbScripts, -resetTestData, executeMigrationScripts, -testMigration" />

该任务将成为持续集成调用的入口

  1. 持续集成

为完成持续集成测试,测试沙盒必不可少。"沙盒"是一个完整的功能环境,在这里脚本能够被编译,测试和运行。

  • 在开发沙盒中,我们准备了少量的核心数据,用以测试sql脚本的质量。
  • 在系统级集成测试沙盒中,我们还准备了一个小型数据库,这个数据库包含了一部分核心数据,着重测试数据迁移过程的逻辑转换。
  • 在生产环境级测试沙盒中,由于数据库来源于实际数据备份,因此数据处于不断变化状态,这就更需要不断运行测试脚本,避免脏数据和数据丢失。由于生产环境数 据量相对大了许多,我们可以适当减少测试次数以减少对开发资源的消耗。同时,其它测试脚本,如变更数据库结构的脚本,都可以和数据迁移脚本组织在一起,一 次性完成测试。

    同样,我们采用自动化机制维护这些开发测试沙盒。

    将测试置于持续集成环境中,下图是处于持续集成环境的测试任务。

  1. 工具选择

选择数据迁移工具应当以帮助提高工作效率和数据迁移运行效率为原则。通常最直接的方法是编写sql脚本,借助其它工具也能起到很好的效果,比如MS SSIS等。然而我们发现,过多的引入第三方工具往往带来的麻烦也多,例如,我们不得不花时间来学习这些工具的某些特殊用法,有时工具也会产生bug,以至于不得不再花时间解决这些bug,而这与最初的开发目标相背离。因此,有效的方法是尽量使用sql脚本执行所有的迁移工作,同时也得到了最佳的执行效率。

  1. 保留中间结果用于脚本调试

相比设计语言,Sql语句较难调试,即使有些数据库产品提供了调试工具,但是调试数据结果集仍然是项挑战性的工作。尤其在旧系统到新系统的迁移过程中,业务逻辑发生巨大变化,客户经常要求提供某些证据,来解释他们对数据迁移结果的怀疑。保留中间环节数据,不仅方便调试,也方便数据追溯,为开发带来更高效率。以下举例说明:

SELECT

...

into debug_allpersonhistroy

FROM

oldDB.dbo.personhistory -- 这是在旧系统中定义的业务存储表

...

--省略复杂的过滤逻辑

select column1...columnN

into debug_allpersonhistroy_aftermapping --保留这一步数据集合

from debug_allpersonhistroy inner join mappingtableBtwOldandNew

...

--省略复杂的过滤逻辑

SELECT

...

FROM

newdb.dbo.contactHistory -- 这是在新系统中定义的业务存储表

...

--省略复杂的过滤逻辑

Go

典型问题

数据迁移在不同的场景往往出现不同的问题,单凭经验也不能全部解决。运用头脑风暴,集中团队中所有力量思考所有可能出现的问题并加以避免。有时开发员遇到的问题也帮助DBA少走弯路。最终,头脑风暴能够提供我们的是一份有价值的列表,里面包含各种问题和注意事项:

  1. 一致性检查

一致性检查包括:字符编码检查、语言设置、环境参数设置等。

迁移过程常出问题的是字符集,它带来的问题是数据乱码。不同系统在最初设计时应用的字符集或编码格式未必相同。在迁移过程中,单凭缺省设置是不够不安全的。有效的办法是在项目伊始,即确认系统间环境一致性。在新系统中采用兼容性的unicode编码也能够解决这些问题。

  1. 控制NULL的使用

由于旧系统本身很少使用约束,以至于在表连接查询中出现大量无法得到正确匹配的数据。在 sql中,当我们试图使用自然连接,我们发现某些数据丢失了,如果使用外连接,这将会带来一种新的脏数据:NULL。从数据库设计角度,NULL不代表任何含义,而实际情况中,很多数据库建模往往给NULL赋予含义,甚至多种含义,以至于不同的查询需求要视不同的业务逻辑对待。在旧系统里,这种现象比比皆是,无疑给迁移带来了不少麻烦。

解决方法:不为NULL赋予逻辑上的定义。尽量少使用外连接运算。

例如:

旧系统定义如下父子结构表:

objectId, parentObjectId,objectType …

------------------------------------

Null Null 'root'

1 Null 'contactManager'

2 1 'contact'

3 1 'contact'

4 Null 'orgnisation1'

 

显然,系统希望构建如下对象树图:

然而,当程序试图遍历所有对象时发现:NULL无法参与计算。因为NULL与任何数据的计算结果都是NULL。程序必须增加额外代码来处理特殊情况。

  1. 代码复用,降低依赖性

迁移脚本应当遵循与编码同样的规则,高内聚,低耦合,能够被重复利用的代码需尽量被封装成单元,重复拷贝并不是迁移脚本应当采用的方法。

解决方法:使用临时存储过程实现某些公用代码的复用,简化调用接口。

  1. 新问题,新测试

当我们遇到新的问题时,常忙于解决问题,给出解释。然而当这一切完成后,并不意味着问题已经全部被解决。因为这些问题仍然可能再次发生,也说明目前测试不足。

解决方法:当新问题出现后,暂停当前的工作,立刻针对这种情况写出测试。为其花费些时间意味着不会让技术问题债台高垒。

例如:在新系统的数据库里,QA发现了一组不符合逻辑的数据:记录的结束时间(EndTimestamp)早于开始时间(startTimeStamp)8个小时。它的实际期望结果是:记录的结束时间必须晚于开始时间。

ID startTimestamp, EndTimestamp, createDate …

-----------------------------------------------------

11020011 2008-12-14 09:23:00 2008-12-14 01:23:00 2008-12-14 09:23:00

 

显然程序在插入数据时用错了时区。在bug被修复之前,立刻加入一个数据库测试以保障今后不会再次出现。

测试代码如下:

DECLARE @CNT INTEGER

  

select @CNT=COUNT(*) from tableA where startTimestamp> EndTimestamp

IF @CNT>0

BEGIN

INSERT INTO LoadTestErrorLog (errorDescription)

VALUES (' EndTimestamp should be late than startTimestamp ')

END

GO

  1. 目标制定者和开发者应该保留的心态

数据迁移是一件看似简单但具有挑战的工作。因此,我们常常过于乐观估计开发效率。然而这里的风险在于我们仅仅看到了处理逻辑,而没有看清楚数据质量,以至于盲目写出的迁移脚本可以在测试环境中工作,但无法在生产环境中运行。

解决方法:无论多么简单的数据迁移,应首先与客户或业务分析师沟通业务逻辑,确保对数据质量的了解。

结论

数据迁移是一项看似简单却蕴含巨大挑战的工作。它不仅包含了具体技术问题,而且要求DBA具有较好的沟通能力,深入的了解业务逻辑。通过旧系统到新系统的数据迁移工作,我们逐渐地将精益软件设计思想深入到细节,并且取得了很好的效果。当数据迁移完成后,我们完成了近6000行的迁移脚本,迁移结果通过了客户方的抽样测试,最终确保了整个系统的正常运行。


clock March 12, 2009 13:26 by author ethan

Team Foundation Server (commonly abbreviated TFS) is a Microsoft offering for source control, data collection, reporting, and project tracking, and is intended for collaborative software development projects. It is available either as stand-alone software, or as the server side back end platform for Visual Studio Team System (VSTS).

Contents

Architecture


Team Foundation Server 3-tier architecture

Team Foundation Server works in a three-tier architecture: the client tier, the application tier and the data tier. The client tier is used for creating and managing projects and accessing the items that are stored and managed for a project. TFS does not include any user interface for this tier, rather it exposes web services which client applications can use to integrate TFS functionality with themselves. These web services are used by applications like Visual Studio Team System to use TFS as data storage back end or dedicated TFS management applications like the included Team Foundation Client. The web services are in the application layer. The application layer also includes a web portal and a document repository facilitated by Windows SharePoint Services. The web portal, called the Team Project Portal, acts as the central point of communication for projects managed by TFS. The document repository is used for both project items and the revisions tracked, as well as for aggregated data and generated reports. The data layer, essentially a SQL Server 2005 Standard Edition installation, provides the persistent data storage services for the document repository. The data tier and application tier can exist on different physical or virtual servers as well, provided they are running Windows Server 2003 or better. The data tier is not exposed to the client tier, only the application tier is.

Most activity in Team Foundation Server revolves around a "work item". Work items are a single unit of work which needs to be completed. In many respects they are similar to a "bug" item in bug tracking systems such as Bugzilla, in that a work item has fields to define Area, Iteration, Assignee, Reported By, a history, file attachments, and any number of other attributes. Work items themselves can be of several different types, such as a Bug, a Task, a Quality of Service Assessment, a Scenario, and so forth. The framework chosen for any given project in a Team Foundation Server defines what types of work items are available and what attributes each type of work item contains. These items are internally stored in XML format, and their schema can be customized to add other attributes to different items, or create new items on a per-project basis. Each work item has associated control policies which control who is allowed to access and/or change the items. It also includes notification and logging capabilities to log all the creation, access or change events (controlled by policies) and optionally notify certain users when certain events occur.

Any given Team Foundation Server contains one or more Team Projects, which consists of Visual Studio solutions, configuration files for Team Build and Team Load Test Agents, and a single SharePoint repository containing the pertinent documents for the project. A team project contains the user defined work items, source branches, and reports that are to be managed by TFS. TFS provides capabilities for managing these projects. When creating a project, a software development framework must be chosen, and cannot be changed afterwards. TFS includes several templates for the most common ones, including agile and formal methodologies. Choosing the framework populates the project with predefined items such as project roles and permissions, as well as other documents like project roadmap, document templates, and report definitions. These items can be then linked to work items as well. The status of certain elements of the project can be set to automatically update as work items are updated. TFS can integrate with Microsoft Excel for the creation and tracking of project items. The status of the items can be created and edited in Excel and the resulting spreadsheet document can be submitted to TFS, which will import the data into its project management feature. It can also integrate with Microsoft Project as the project management front end. The project items can also be exported as Excel documents for further analysis of the data.

TFS does not natively include a UI for performing these tasks. The capabilities are exposed via web services, which are then used by client applications like Visual Studio Team System IDE. However, TFS does include a Team Foundation Client (TFC) application which can be used to perform these tasks outside of the VSTS IDE. TFC also operates by invoking the same web services. TFS exposes a client API that can be used by client applications to access the functionality; the API itself manages proxies to communicate with the web services as well as client side caching to reduce latency. The WSDL descriptions of the web services are also provided, in case an application wants to directly call the web services. Visual Studio Team System Web Access, available as an add-on, also addresses this.

Source control

Team Foundation Server provides a source control repository, called Team Foundation Version Control (TFVC). Unlike Microsoft's previous source control offering, Visual SourceSafe (VSS), which relied on a file-based storage mechanism, Team Foundation source control stores all code, as well as a record of all changes and current check-outs in a SQL Server database. It supports features such as multiple simultaneous check-outs, conflict resolution, shelving and unshelving (shelving is a way to save a set of pending changes without committing them to source control, while still making them available to other users), branching and merging, and the ability to set security levels on any level of a source tree, alongside the most visible features of document versioning, locking, rollback, and atomic commits. The source control mechanism integrates with Team System's work items as well; when a check-in (termed "changeset") occurs, a developer can choose to have his code associated with one or more specific work items, to indicate that the check-in works towards solving specific issues. TFS administrators can enforce check-in policies that require Code Analysis requirements to have passed, as well as to enforce the association of check-ins with work items, or update the state of associated work items (like flagging a bug as "fixed" when checking in code that has the bug fixed). Individual versions of files can be assigned labels, and all files with the same label forms a release group. Unlike VSS, TFS source control repository does not support linking to an item from multiple places in the source folder structure, nor does it allow an item to be "pinned" (allow different references to the same file from different directories to point to different versions in a way that cannot be further edited).

TFVC supports branching at entire source code level as well as individual files and directory levels as well, with each branch being maintained individually. Multiple branches can be merged together, with the built in conflict resolution algorithm merging the changes between two branches of the same file where it can automatically reconcile the differences or flagging them for manual inspection if it cannot. Merge can be performed at "changeset" level as well, instead of the branch level. A successful merge is automatically checked out in the source control repository.

TFVC is not limited to source code only, but using the Windows SharePoint Services infrastructure it is built on, it provides a version-controlled library for other documents in the project as well, including project plans, requirements and feature analysis documents among others. All documents in the source controlled repository can be linked with any work item, and access to them can be controlled by defining access policies.

Reporting

Reporting is another major component of Team Foundation Server. Using the combined data for work items, changesets, and information provided by Team Build and results from Test Agents, a variety of reports can be created. For example, the rate of code change over time, lists of bugs that don't have test cases, regressions on previously passing tests, and so on. The reports are built using SQL Server Reporting Services, and can be exported in several different formats, including Excel, XML, PDF, and TIFF. Reports can be accessed both through Visual Studio, as well as through the web portal.

TFS uses its logging framework for automated data collection as well. The logging infrastructure monitors and logs information regarding access and use of the work items and source code, which can then be used by the analysis services to find trends. TFS includes a warehouse adapter in the data tier, which caches data from the underlying normalized database in a form suitable for analytics - in fact tables and dimension tables. SQL Server Analysis Services are then used to analyze this data, and reports created. Reports can span multiple work items including bug trends, code churning, build trends amongst others. Other analysis applications can also use the data directly pulled off the web services.

Project portal

On a per-project basis, TFS also creates a SharePoint site for the project, which can be used to track the progress of the project as well as to explore the work items and source controlled documents in the project, which are presented via the document library. It can also be used to view the reports generated. As a communication medium, the users associated with each other can use it to communicate amongst each other. The comments can be linked to various items as well. For each project, depending on the project properties, TFS uses a predefined template that defines the appearance of the site. These templates can be customized by the TFS administrators.

Shared services

TFS provides a handful of services that can be used for integration with other applications like IDEs and Project Management Systems. The linking service allows loosely coupled relationships to be created between items, for example a bug item and the source code revision(s) it applies to. The security services allows creation of security groups from users, to which access rights are then assigned. The classification service allows definition of policies to automatically classify items based on a multitude of criteria and the eventing service allows any component to raise an event and a notification action assigned to the event. The notification can be either using feed syndication or e-mail, or invoking other web service.

Team build

Team Build is a build server included with Team Foundation Server that can be installed on almost any machine that can support Visual Studio. Machines configured with Team Build can be used by developers to do a complete build of the most recent versions of the software contained in source control. Records of every build, whether it succeeds or fails, are kept so that developers and build administrators can keep track of the progress of the project. If a build succeeds, it analyses what changes have been made to in source control since the last successful build, and updates any work items to indicate that progress has been made. For example, if a tester filed a bug work-item against build #15, and a developer checked in a change just prior to build #18 being created, then the bug work-item would be updated to state that the bug has been fixed. A tester can then confirm or deny that the bug has been resolved.

Currently there are two versions of TeamBuild, each version matched to a TFS installation version. It is also highly customizable.

TFSBuild.proj is the file which drives a TeamBuild. The Team Build Language is synonymous with the msbuild language.

References

·         Team Foundation Server: At Work

·         Visual Studio 2005 Team System: Enterprise-Class Source Control

·         Using Source Code Control in Team Foundation

·         Team Foundation Server Fundamentals: A Look at the Capabilities and Architecture

·         Visual Studio Team System 2008 Web Access

See also

External links


clock March 10, 2009 18:11 by author Sky Jia (贾超)

SELECT

      TABLE_NAME,

      COLUMN_NAME,

      IDENT_SEED(TABLE_NAME) AS SEED,

      IDENT_INCR(TABLE_NAME) AS INCR,

      IDENT_CURRENT(TABLE_NAME) AS [CURRENT MAX]

FROM INFORMATION_SCHEMA.columns

WHERE COLUMNPROPERTY(OBJECT_ID(TABLE_NAME),COLUMN_NAME,'IsIdentity')=1


clock February 27, 2009 17:54 by author arthur

Step 1: Backup the current database transaction log

BACKUP LOG [@DBName] TO DISK='[Target physical path]' WITH FORMAT, NO_TRUNCATE

Step 2: Restore the database using a database backup file. (The create date of the backup file must ahead of the date when you want to recover to)

RESTORE DATABASE [@DBName] FROM DISK='[Complete database backup file physical path]' WITH NORECOVERY

Step 3: Restore to the log point

RESTORE LOG [@DBName] FROM DISK='[Log backup file physical path]' WITH RECOVERY , STOPAT='02/01/2009 00:00'


Search

Calendar

<<  September 2010  >>
SuMoTuWeThFrSa
2930311234
567891011
12131415161718
19202122232425
262728293012
3456789

Categories

Tags