Friday, April 4, 2008

Smart String Concatenation

Whether on an individual computer or a server attached to the Internet somewhere, people are always looking for better performance from their software.

The first thought that comes to mind is often adding memory or a faster processor, or choosing a faster operating system or web browser.

But many times (perhaps most times) the real culprit for slow performance is the programmer who wrote the software.

As a programmer, my philosophy is that I personally take responsibility for the performance of my programs, rather than requiring faster hardware or a faster environment.  I have found that by forcing myself to think about the efficiency of every piece of code I write, the combined performance savings across an entire application are immense.

Think about it: if you save just a quarter of a second of time serving one page view, you have actually saved about 70 hours of processor time after a million page views.  And for a site like Lottery Post that gets many millions of page views a month... well, you can do the math.
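For anyone who does want to do the math, here is a quick sketch of the arithmetic in JavaScript.  The variable names are purely illustrative.

```javascript
// Back-of-the-envelope check of the savings quoted above.
var secondsSavedPerPage = 0.25;   // a quarter of a second per page view
var pageViews = 1000000;          // one million page views

// 3600 seconds in an hour
var hoursSaved = (secondsSavedPerPage * pageViews) / 3600;
```

Here hoursSaved works out to a little over 69, which is where the "about 70 hours" figure comes from.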

With that concept in mind, I'd like to offer some tips about string concatenation that will make your programs more efficient.

String concatenation (the process of combining strings together) is something that is still slow on many popular platforms.  For example, on all versions of Internet Explorer prior to IE8, combining strings is very inefficient.  Even in the .NET framework, there are some ways of programming that will result in poor performance.

String Concatenation in .NET

In the .NET framework, whether you're programming in C# or VB, the best way to combine strings is using the String.Concat() method.  However, just using String.Concat() is not enough, and here's where the extra efficiency comes into play:

Make sure that all the arguments passed to String.Concat() are strongly typed Strings.  Otherwise, the compiler will choose an overloaded version of String.Concat() that accepts Object arguments.  Any value type you pass (such as an Integer) will be boxed into an Object when passed, and the method then has to call ToString() on every argument before it can concatenate them.

When you test the program, it works just fine either way, but if you are using the Object argument version of the String.Concat() method, you are silently leaking performance.

For example, if you want to create a string like "There are 27 pages", the second version shown below is more efficient:

myString = String.Concat("There are ", intPages, " pages")

myString = String.Concat("There are ", intPages.ToString(), " pages")

(Note:  I did not use CStr(intPages), I used intPages.ToString().  CStr() is a VB conversion function that the compiler expands inline based on the argument's type, so it should not actually box an Integer, but I prefer calling the Integer's own ToString() method directly because it makes the conversion explicit.)

In the first example, all three arguments are treated as Object types, but in the second example, all three arguments are String types, so the compiler chooses the quicker all-String-arguments version of the String.Concat() method.

Why not use VB's string concatenation operator ("&")?  Because when you compile your application, the compiler breaks everything down into String.Concat() operations anyway, and you can do a much better job of that than the compiler could.  If you ever took a look at some of the code that gets generated for combining strings, you would be amazed at what is actually taking place under the covers.

And by all means, do not use String.Format() for your string concatenation, unless you are actually using the formatting capabilities of the method (i.e., transforming data into another format).  Even though it can produce slightly more readable code, String.Format() has lots of overhead that makes it a poor choice.

Programming efficiency

Sometimes if you think about the way the computer is executing the code, rather than how your mind is assembling the string, you can come up with some nice efficiencies.

For example, let's say you're creating the name of an image file based on the value of a couple of different variables.

With the variable names shown in <brackets>, the file name will be "icon_<isCircle>_<size>_.<isGif>", where <isCircle> is a boolean (True for "circle", False for "square"), <size> is any integer (such as 16, for a 16 x 16 image), and <isGif> is a boolean (True for "gif", False for "jpg").

Here is how someone would normally code this (using the tips above for String.Concat):

myImage = String.Concat("icon_", If(isCircle, "circle", "square"), "_", size.ToString(), "_.", If(isGif, "gif", "jpg"))

or, in C#, the same thing would be:

myImage = String.Concat("icon_", isCircle ? "circle" : "square", "_", size.ToString(), "_.", isGif ? "gif" : "jpg");

It will indeed work fine, and for top efficiency it uses the all-String version of the String.Concat() method, but why should you force the computer to combine 6 different strings together, when you can do the same thing by combining only 3 strings together?

The following is functionally equivalent to the first solution, but concatenates only half as many strings:

myImage = String.Concat(If(isCircle, "icon_circle_", "icon_square_"), size.ToString(), If(isGif, "_.gif", "_.jpg"))

or, in C#:

myImage = String.Concat(isCircle ? "icon_circle_" : "icon_square_", size.ToString(), isGif ? "_.gif" : "_.jpg");

If you were to go through your program code, how many times would you see something similar to the first solution, as opposed to the second?  Probably a lot.

JavaScript coding

The same exact approach applies to JavaScript — especially to JavaScript.

As I mentioned earlier, JavaScript is very slow performing string concatenation on IE web browsers, and IE makes up the majority of web browsers in use today.  That's a lot of potential for slow code.

Therefore, when writing JavaScript programs be careful anywhere you combine strings, especially when it's done inside a loop.
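As a rough illustration, here is how the image file name example from the .NET section might look in JavaScript.  The two function names are hypothetical; the parameters mirror the isCircle, size, and isGif variables used earlier.

```javascript
// Six pieces combined: every fixed fragment is a separate string.
function iconNameSixParts(isCircle, size, isGif) {
    return "icon_" + (isCircle ? "circle" : "square") + "_" + size + "_." + (isGif ? "gif" : "jpg");
}

// Three pieces combined: the fixed fragments are folded into the two choices.
function iconNameThreeParts(isCircle, size, isGif) {
    return (isCircle ? "icon_circle_" : "icon_square_") + size + (isGif ? "_.gif" : "_.jpg");
}
```

Both functions return the same file name for the same inputs, but the second one asks the browser to combine half as many strings.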

The reason behind the slow performance is that each time strings are concatenated, a new copy of the string is created in memory, which requires allocating new memory, copying the contents of the old strings, and then releasing the memory from the old strings.
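To get a feel for how quickly that copying adds up, here is a simplified model in JavaScript.  It assumes that every concatenation copies the entire string built so far, which is only an approximation of what any particular browser really does, and the function names are hypothetical.

```javascript
// Simplified model: each += copies the whole accumulated string into a new one.
function charsCopiedByRepeatedConcat(pieces) {
    var total = 0;
    var length = 0;
    for (var i = 0; i < pieces.length; i++) {
        length += pieces[i].length;  // size of the new string after this step
        total += length;             // the model copies all of it
    }
    return total;
}

// Combining everything in one pass copies each character only once.
function charsCopiedBySingleJoin(pieces) {
    var total = 0;
    for (var i = 0; i < pieces.length; i++) {
        total += pieces[i].length;
    }
    return total;
}
```

Under this model the cost of repeated concatenation grows roughly with the square of the number of pieces, while a single combined pass grows in step with the total length.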

When doing a lot of string concatenation in JavaScript, it is often much better to create an array of string values, and then use the array's join() method to combine them.  Each time a new element is added to an array, you're allocating memory for the new element, but you're not copying the old string value, and you're not releasing memory for the old string.
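One way to package that idea is a small helper that collects the pieces in an array and joins them once at the end.  This is only a sketch: the StringBuffer name and its append() and toString() methods are my own invention here, not part of JavaScript or any library.

```javascript
// Hypothetical helper: gathers string pieces and joins them in one step.
function StringBuffer() {
    this.parts = [];
}

// Store the piece (converted to a string) and return the buffer for chaining.
StringBuffer.prototype.append = function (value) {
    this.parts.push(String(value));
    return this;
};

// Combine all stored pieces with no separator.
StringBuffer.prototype.toString = function () {
    return this.parts.join("");
};

var buffer = new StringBuffer();
buffer.append("There are ").append(27).append(" pages");
var message = buffer.toString();
```

After this runs, message holds "There are 27 pages", and the only real string building happened in the single join() call.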

Here is an example comparing regular string concatenation with array-based concatenation.  The example is to create a string containing a comma-separated list of the numbers 1 through 100.  On IE, the second method is much faster than the first.

Method 1: regular string concatenation

var str = "";

for (var i=1; i<=100; i++) {
    // Add a comma after every number except the last, to match the join() result
    str += (i < 100) ? (i + ",") : i;
}

Method 2: combine array elements

var a = [];

for (var i=1; i<=100; i++) {
    a[i-1] = i;
}

var str = a.join(",");
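If you want to check the difference on a particular browser, a small timing harness along these lines can help.  The function names buildByConcat, buildByJoin, and timeIt are hypothetical, and the repetition count of 1000 is an arbitrary choice.

```javascript
// Method 1 wrapped in a function: regular string concatenation.
function buildByConcat(count) {
    var str = "";
    for (var i = 1; i <= count; i++) {
        str += (i < count) ? (i + ",") : i;
    }
    return str;
}

// Method 2 wrapped in a function: combine array elements.
function buildByJoin(count) {
    var a = [];
    for (var i = 1; i <= count; i++) {
        a[i - 1] = i;
    }
    return a.join(",");
}

// Run fn(count) the given number of times and return the elapsed milliseconds.
function timeIt(fn, count, repetitions) {
    var start = new Date().getTime();
    for (var r = 0; r < repetitions; r++) {
        fn(count);
    }
    return new Date().getTime() - start;
}

var concatMs = timeIt(buildByConcat, 100, 1000);
var joinMs = timeIt(buildByJoin, 100, 1000);
```

The results depend heavily on the browser and version, so it is worth running a comparison like this on the browsers you actually target rather than assuming one method always wins.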

Lots of other efficiencies

There are lots and lots of other things that can be done to increase performance of strings and string concatenation.  This only scratches the surface.

Hopefully this shows the kinds of things you can think about when you're looking to increase the efficiency and performance of your programs.

I wouldn't go crazy doing things that make your code impossible to read, but at the same time don't be afraid to do things that save only a tiny bit of time.  As you implement lots and lots of small tweaks, eventually they will combine into a much bigger overall savings.

1 Comments:

At 2:56 PM, JADELottery said...

interesting, i have to make a note of this.
