Strings are immutable
This is the third part of the series of articles dedicated to .NET strings. The first two parts can be found here:
- Value Type vs Reference Type
- String interning process
In the object-oriented programming world, an immutable object is an object that cannot be modified once it is created. This behaviour of strings is what makes the interning process possible. Because strings are immutable, a copy of the reference can be created instead of copying the entire object, so multiple variables can point to the same string literal. But immutability does not mean that the memory where the object data (the string literal) is stored is read-only. What it really means is that, behind the scenes, the .NET Framework makes sure that you cannot change the value of the string literal (or at least not when working with managed/safe code). Let's see what happens in the following code:
Line 1: String s1 = String.Intern("ABC");
Line 2: String s2 = String.Intern("ABC");
Line 3: s2 = s2.ToLower();
Line 1 makes sure the literal "ABC" is in the intern pool and assigns the returned reference to s1. Line 2 tries to add the literal "ABC" to the intern pool again but, in line with the .NET documentation, "ABC" is not added because it already exists. Instead, the existing reference is returned and assigned to s2. At this point both variables hold the same reference and therefore point to the same string literal. The interesting part comes in Line 3. Here the method ToLower() creates a new string literal and populates it with the value "abc". The reference to that new literal is then returned, so s2 now points to a new memory location.
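The three lines can be checked with a small console program. This is only a sketch: the class name ImmutabilityCheck and the helper RunLines are made up for illustration, and Object.ReferenceEquals is used to tell whether two variables refer to the same object.

```csharp
using System;

static class ImmutabilityCheck
{
    // Hypothetical helper: replays Lines 1-3 and reports the state of the references.
    public static (bool sharedBefore, bool sharedAfter, string s1, string s2) RunLines()
    {
        string s1 = String.Intern("ABC");                      // Line 1
        string s2 = String.Intern("ABC");                      // Line 2
        bool sharedBefore = Object.ReferenceEquals(s1, s2);    // both refer to one object

        s2 = s2.ToLower();                                     // Line 3
        bool sharedAfter = Object.ReferenceEquals(s1, s2);     // s2 refers to a new object

        return (sharedBefore, sharedAfter, s1, s2);
    }

    static void Main()
    {
        var result = RunLines();
        Console.WriteLine("Same object after Line 2: {0}", result.sharedBefore);
        Console.WriteLine("Same object after Line 3: {0}", result.sharedAfter);
        Console.WriteLine("s1 = {0}, s2 = {1}", result.s1, result.s2);
    }
}
```

The reference comparison matters here because the == operator on strings compares contents, so it could not distinguish one shared object from two equal ones.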
Note that the memory location which holds the literal "ABC" was by no means overwritten with the value "abc". We therefore end up with s1 still pointing to "ABC" and s2 now pointing to "abc".
This holds as long as we stay within managed/safe code. If we deal with unmanaged code, we need to be very careful when we operate on strings. As mentioned above, the memory location where the string literal is stored is not read-only, so it can be overwritten if we write code that does that. With unmanaged code this can be achieved. Let's see what happens in the example below:
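using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main(string[] args)
    {
        String s1 = String.Intern("String cannot be changed");
        String s2 = String.Intern("String cannot be changed");
        int bufferLength = s1.Length;
        // The unmanaged function receives s1 as its output buffer.
        GetUserName(s1, ref bufferLength);
        Console.WriteLine("The second string: {0}", s2);
    }

    // The output buffer is declared as a string on purpose, to show the effect.
    [DllImport("Advapi32", CharSet = CharSet.Unicode)]
    static extern bool GetUserName(
        [MarshalAs(UnmanagedType.LPWStr)] string userName, ref int bufferLength);
}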
When I ran the above code on my computer, the following message was displayed in the console (Marius is my NT username): "The second string: Marius cannot be changed".
So, we declare s1 and s2 and make sure that they point to the same literal by using the String.Intern(String s) method. Next, an unmanaged/unsafe piece of code is called: GetUserName from "Advapi32.dll" (see the MSDN description of the function). What happens during the call is the interesting part: the function is passed one of the declared strings and, since unmanaged code does not follow the managed rules regarding the immutability of strings, it writes its response directly to the memory location that s1 points to. But in the managed world s2 also points to the same memory location, and therefore the content of the string literal seen through s2 is changed as well.
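The same effect can be reproduced without a native call. The sketch below is not part of the original example: it assumes a runtime that provides System.Runtime.InteropServices.MemoryMarshal (for example .NET Core 2.1 or later), and the names InternMutationDemo and OverwriteInPlace are made up for illustration. It deliberately breaks the immutability contract and is meant only as a demonstration, never as a technique for real code.

```csharp
using System;
using System.Runtime.InteropServices;

static class InternMutationDemo
{
    // Hypothetical helper: copies `replacement` over the first characters of `target`, in place.
    // MemoryMarshal.AsMemory turns the read-only view of the string into a writable one.
    public static void OverwriteInPlace(string target, string replacement)
    {
        Span<char> writable = MemoryMarshal.AsMemory(target.AsMemory()).Span;
        int count = Math.Min(replacement.Length, writable.Length);
        replacement.AsSpan(0, count).CopyTo(writable);
    }

    static void Main()
    {
        // The strings are built at run time so that no compiled-in literal is modified.
        string s1 = String.Intern(new string(new[] { 'A', 'B', 'C' }));
        string s2 = String.Intern(new string(new[] { 'A', 'B', 'C' }));

        Console.WriteLine("Same object: {0}", Object.ReferenceEquals(s1, s2));

        OverwriteInPlace(s1, "xy");   // writes through s1 only

        // s2 was never touched by the code above, yet it shows the new content.
        Console.WriteLine("The second string: {0}", s2);
    }
}
```

As with GetUserName, the write goes through s1 only, and s2 changes because both variables refer to the same interned object.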