Friday, July 20, 2007

Updating Large Value Data Types in SQL Server 2005

SQL Server 2005 added some new large value data types that are extremely useful for storing large chunks of text.

Such is the case with any large forum web site, such as Lottery Post.  Each forum post — whether humongous or a couple of words — occupies just one row [record] in the database.  In fact, the entire content of a forum post is contained within just one cell [field].

The Lottery Post database was formerly stored using SQL Server 2000, before it was upgraded to SQL Server 2005 in November 2005.  At the time, the only way to store the forum posts was using a text data type, which was fairly inefficient.

A text data type stores all of its data — for the one column — in an external text file, and each row in the data table contains a pointer into the text file, to where the data for each row is stored.  Using that model, the performance of data updates can be handicapped by all the file I/O necessary to maintain the linkages and separate data files.

SQL Server 2005 introduced new data types for storing huge blocks of data: varchar(MAX), nvarchar(MAX), and varbinary(MAX).  Like the regular versions of these data types (using a number, 8,000 or less, to represent the maximum size instead of MAX), the new large value data types store the data in the data file itself, not an external file.  Thus, the new data type are far more efficient.

It turns out there are additional benefits to these new large value data types as well.  By accident, and because of my curiosity, I discovered that there have been modifications to the SQL UPDATE statement specifically for the new data types.

The new syntax allows the developer to modify a large value in the same way that the STUFF() function works.  That is, it can insert a character string directly into the current value, and optionally replace the number of characters you specify.  It's called a partial update.

This is a tremendous leap in efficiency, as the data in the cell does not need to be deleted and replaced, as would happen in a normal update.  At some point, every database operation boils down to file I/O, and this is about as minimal a file I/O operation as you can get for inserting data.

The main difference between the partial update and the STUFF() function is the the partial update cannot create

Partial update syntax (using common options)

UPDATE table_name
    SET column_name .WRITE ( expression , @Offset , @Length )
    [ FROM{ <table_source> } [ ,...n ] ]
    [ WHERE { <search_condition> ]
[ ; ]

  • column_name is the original value into which expression will be inserted.
  • expression is the value to insert.
  • @Offset is the starting point in column_name at which expression is written.
  • @Length is the number of characters deleted from column_name, starting from @Offset, before expression is inserted.

Notes

  • column_name cannot be NULL and cannot be qualified with a table name or table alias.
  • @Offset is a zero-based ordinal position, is bigint, and cannot be a negative number.
  • If @Offset is NULL, the update operation appends expression at the end of the existing column_name value and @Length is ignored.
  • @Offset cannot be greater than the length of column_name, or else an error is returned.
  • If @Offset plus @Length is longer than the value in column_name, column_name is truncated starting at @Offset.
  • @Length is bigint and cannot be a negative number.
  • If @Length is NULL, the update operation removes all data from @Offset to the end of the column_name value.
  • If expression is set to NULL, @Length is ignored, and the value in column_name is truncated at the specified @Offset.

Examples

For each example we will be updating a table called MyTable, which has a column named MyCol.  We will look at the effect of updating a simple character string.

Starting value of MyCol = 'Hello there.'

Example 1 — Inserting a value with no deletion.

UPDATE MyTable SET MyCol .WRITE(', my friend', 11, 0);

Result = 'Hello there, my friend.'

Example 2 — Truncate all existing data from the @Offset position. (@Length = NULL)

UPDATE MyTable SET MyCol .WRITE('Sam!', 6, NULL);

Result = 'Hello Sam!'

Example 3 — Append value to the end of the column. (@Offset = NULL)

UPDATE MyTable SET MyCol .WRITE(' How are you?', NULL, 0);

Result = 'Hello there. How are you?'

Example 4 — Remove characters from the end of the column. (expression = NULL)

UPDATE MyTable SET MyCol .WRITE( NULL, 5, 0);

Result = 'Hello'

Example 5 — Remove some characters and replace them with expression.

UPDATE MyTable SET MyCol .WRITE('Joe', 6, 5);

Result = 'Hello Joe.'

Final thoughts

In addition to the previous benefits mentioned, the .WRITE clause of the UPDATE statement also has logging efficiencies.  Partial updates to large value data types using the .WRITE clause are minimally logged.

The .WRITE clause cannot be used with the text data type.  In fact, if you are currently using a text data type in your database (or ntext or image) it would be a good idea to convert them to the new large value data types, because Microsoft has stated that they will stop supporting them in some future version of SQL Server.

Now that you have this new technique in your bag of tricks you can look in your SQL Server database for places to use it.  The performance gained should be worth your while.

0 Comments:

Post a Comment

<< Home