Pointers in C#

#1 rusoaica

Posted 14 October 2014 - 11:18 AM

A few decades ago, when C was the king of programming languages, pointers saved the day for any programmer with high expectations of their code. Today, in the world of managed code and modern languages such as C#, pointers still hold their ground, although they are rarely used and are usually associated with "advanced programming".

Why is this important to learn about?

The official recommendation is that one should never tinker with low-level programming concepts such as raw pointers without knowing well how they work. They are present, but not encouraged, and most of the time there are safer alternatives than dancing with the hardware directly. However, there are still situations where using pointers is the only solution, or where using them would drastically improve performance. Finally, you should learn about them just for the sake of knowledge, even if, hopefully, you will never make use of them.

Definitions of terms used.
Reference - A value that enables a program to indirectly access a particular datum, such as a variable or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference.
Address - Essentially, an address is the computer's equivalent of a real-world street address. Memory is numbered in increasing order, from some starting value to some ending value, and each address identifies a memory location in the hardware that can store a known number of bits.
Pointer - A variable that can be used to store the address of something.

Pointers and Unsafe Code in C#


Note: In order for the examples provided here to work, you will need to change an option inside Visual Studio:
  • Right click on the project name and click Properties.
  • Click Build in the options on the left.
  • Check the 'Allow unsafe code' option.


Back in the day, when computers had little memory, programmers mapped it by hand. With the advance of technology and the introduction of hardware memory mapping, this became increasingly complicated. Today, thanks to address translation hardware, the address at which some data is stored can change even without the data being physically moved. On modern computers, data can also be relocated by the operating system or by a garbage collector in order to organize memory more efficiently, so, while on older computers an address was a fixed location, that is no longer the case.
Pointers were the first approach to the concept of abstracting an address. In simple terms, a pointer is a variable that stores the address of something. This means that we can use a pointer to get the data stored at the address it points to. This process is called dereferencing. Pointers can also take part in arithmetic operations that change the address they point to. For instance, if you have two integers stored one after another in memory, you can apply arithmetic to the pointer pointing at the first integer so that it points to the second integer. This is the beauty and the power of pointers, but it is also their most dangerous trait. One mistake in calculating the address and the pointer will point to some memory location where it shouldn't, leading to abnormal program errors.
Somewhat similar to pointers, references are the ultimate address abstraction: a reference refers to an object or a collection of data. The main difference between a pointer and a reference is that, unlike pointers, references cannot be manipulated directly. What does this mean? It means that, while we can apply arithmetic to pointers, we cannot do the same with references. So, what CAN we do with a reference? We can dereference it and access the data that it refers to, or pass it to another method, which can then use it. A reference is just a wrapper for an address.
We all know what a reference is; we use references all the time. If we have a class that we want to use in multiple places, we don't copy its objects over and over again. The class is a blueprint from which we create instances, and a variable of that class type is not the object itself, but a reference to an object of that type:

MyObject MyReferenceObject = new MyObject();



In the above example, MyReferenceObject refers to a newly created instance of the class MyObject. Each use of new creates a separate instance, so modifying one instance does not affect any other. To have a second variable refer to the same instance, we assign the existing reference instead of creating a new object:

MyObject MySecondReference = MyReferenceObject;



In the above example, if we make changes through MySecondReference, those changes will also be visible through MyReferenceObject, because both variables refer to the same object. Conclusion: when the assignment produces a copy of the object, we have value semantics. If it doesn't, we have reference semantics.
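
To make the difference concrete, here is a minimal sketch that compares a class with a struct. PointClass, PointStruct and SemanticsDemo are hypothetical names invented for this illustration; they do not come from any library.

```csharp
using System;

// Hypothetical types, used only for illustration.
class PointClass { public int X; }
struct PointStruct { public int X; }

static class SemanticsDemo
{
    // Assigning a class variable copies the reference, not the object.
    public static int ThroughReference()
    {
        PointClass first = new PointClass { X = 1 };
        PointClass second = first;   // both names refer to one object
        second.X = 42;
        return first.X;              // 42: the change is visible through 'first'
    }

    // Assigning a struct variable copies the whole value.
    public static int ThroughCopy()
    {
        PointStruct first = new PointStruct { X = 1 };
        PointStruct second = first;  // independent copy
        second.X = 42;
        return first.X;              // 1: 'first' is untouched
    }

    static void Main()
    {
        Console.WriteLine(ThroughReference()); // prints 42
        Console.WriteLine(ThroughCopy());      // prints 1
    }
}
```

No unsafe code is needed here; this is the everyday behavior of classes and structs.
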
Because pointers belong to the kind of code that can be dangerous when used incorrectly, in addition to selecting the "Allow unsafe code" option described at the beginning of this tutorial, pointer code also has to be enclosed inside an unsafe{} block (or inside a type or member marked with the unsafe modifier), to make the intent even clearer. In addition, we cannot create a pointer to just anything, as one could in C++. A pointer can point to any of the basic value types (int, double, char, float, etc.), to a structure (which must NOT contain managed types) or to another pointer. This means that we cannot declare a pointer to a reference, an object, a delegate, etc.

The simplest way of declaring a pointer is:

type* variable;



just like in C++. In a declaration, the * symbol marks the variable as a pointer; in an expression, * is the dereferencing (indirection) operator. Its counterpart is the "address of" operator, &, which returns the address of a variable. In practice, we have:

unsafe
{
    int* MyPointer;          // a pointer to an int
    int MyInteger = 1234;    // a regular integer
    MyPointer = &MyInteger;  // store the address of MyInteger in MyPointer
}



What the code above does: we declared a pointer to int, MyPointer. After that, we declared a regular integer. In the last line, we stored the address of MyInteger in our pointer, MyPointer.
As a side note, pointers DO NOT inherit from object, so we can't use, for instance, the ToString() method to get the value of a pointer. When we want to read the value of a pointer (the address itself), we must first cast it to an integral type. A long is a safe choice, because in a 64-bit process an address does not fit in an int:

Console.WriteLine(((long)MyPointer).ToString());



If instead, you want to display the value to which the pointer points, you can use:

Console.WriteLine((*MyPointer).ToString());



The above line will display the value of MyInteger, because MyPointer points to MyInteger.
A void pointer cannot be dereferenced directly; it first has to be cast to a typed pointer. We can always use indirection after such a cast, but casting to the wrong type produces unexpected results. For instance:

void* MyPointer = &MyInteger;
Console.WriteLine((*(double*)MyPointer).ToString());



What we did was to store the address of a 32-bit integer in a void pointer, then cast it to a double pointer (double*) and use indirection to display the value it points to. You will get unexpected results when running this code, because the original int occupies 4 bytes and a double occupies 8. In other words, the extra 4 bytes come from a "neighbor" memory location which normally couldn't be accessed. This example shows what happens when we read unintended memory, but just imagine what would happen if we tried to write to those locations. Let's try:

int MyInteger2 = 0;
int MyInteger = 1234;
void* MyPointer = &MyInteger;
*(double*)MyPointer = 123456.789;
Console.WriteLine(MyInteger2.ToString());



After running this code snippet, you will most likely notice that MyInteger2 has a different value, although we never changed it in our code (the exact outcome depends on how the compiler and the JIT lay out the local variables). This happens because we assign an 8-byte value to MyInteger, which can hold only 4 bytes. The extra 4 bytes spill into the neighboring memory, which here happens to be MyInteger2. This example is harmless, but in other cases this could result in program crashes or even more dangerous behavior. This is one of the main reasons why such code must be put inside an unsafe{} block.
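
For contrast, here is a small sketch of a reinterpreting cast that stays inside the variable's own storage. It reads exactly sizeof(int) bytes through a byte pointer, so no neighboring memory is touched. SizeDemo and BytesOf are hypothetical names used only for this illustration.

```csharp
using System;

static class SizeDemo
{
    // Reads the individual bytes of an int through a byte pointer.
    // Exactly sizeof(int) bytes are read, so we never leave the
    // storage that belongs to 'value'.
    public static unsafe byte[] BytesOf(int value)
    {
        byte[] result = new byte[sizeof(int)];
        byte* p = (byte*)&value;
        for (int i = 0; i < sizeof(int); i++)
        {
            result[i] = p[i];
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(sizeof(int));     // prints 4
        Console.WriteLine(sizeof(double));  // prints 8
        // The byte order shown depends on the machine's endianness.
        Console.WriteLine(BitConverter.ToString(BytesOf(1234)));
    }
}
```

The rule of thumb this is meant to show: when casting between pointer types, never read or write more bytes than the original variable actually owns.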

Indirection can also be multiple. We can indirect twice, thrice, etc. Here is an example:

int** MyPointerPointer;
int* MyPointer;
int MyInteger = 1234;
MyPointer = &MyInteger;
MyPointerPointer = &MyPointer;
Console.WriteLine((**MyPointerPointer).ToString());



So, we have declared a pointer that points to another pointer, that points to an integer. That is why we have to use two "*", because we used two indirections.
As we previously said, we can't have pointers to just any type, or in some cases, not in the way we might expect. For instance, in cases that involve an array:

int[] MyArray = new int[10];
for (int i = 0; i < 10; i++)
{
    MyArray[i] = i;
}
int* MyPointerArray = &MyArray[0];   // does not compile
Console.WriteLine((*MyPointerArray).ToString());



You might expect that this would produce a pointer to the first element of the array. Well, you would be wrong: the compiler complains that you can only take the address of an unfixed expression inside a fixed statement initializer. You might say: "but, wait, isn't the first element of the array a simple integer? I thought we could apply pointers to simple integers!". That is true; nonetheless, the integer is kept inside an integer array, and the array is a managed object that the garbage collector can move at any given time. As we said, we CAN do it, but not the way one might think. The correct way would be:

fixed (int* MyPointerArray = &MyArray[0])
{
    Console.WriteLine((*MyPointerArray).ToString());
}



OR

fixed (int* MyPointerArray = MyArray)
{
    Console.WriteLine((*MyPointerArray).ToString());
}



Notice the usage of the reserved word "fixed". It pins the array in memory, so the garbage collector cannot move it while the block executes. Also notice that the pointer declared by the fixed statement is scoped to the fixed block, like any other variable declared in a block, and is only valid inside it.
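A simple method of accessing other members of the array is pointer arithmetic. Note the parentheses: the addition must happen before the dereference, otherwise we would read MyArray[0] and add 3 to its value:

fixed (int* MyPointerArray = MyArray)
{
    Console.WriteLine((*(MyPointerArray + 3)).ToString());
}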
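
As a slightly larger sketch of the same idea, the following walks a whole array by adding an offset to the fixed pointer. ArraySumDemo and Sum are hypothetical names made up for this example.

```csharp
using System;

static class ArraySumDemo
{
    // Adds up all elements by offsetting the fixed pointer.
    // The pointer itself is never modified, only p + i is computed.
    public static unsafe int Sum(int[] values)
    {
        int total = 0;
        fixed (int* p = values)
        {
            for (int i = 0; i < values.Length; i++)
            {
                total += *(p + i);   // same element as values[i]
            }
        }
        return total;
    }

    static void Main()
    {
        int[] MyArray = new int[10];
        for (int i = 0; i < 10; i++)
        {
            MyArray[i] = i;
        }
        Console.WriteLine(Sum(MyArray)); // prints 45
    }
}
```

The loop bound comes from values.Length; the pointer carries no length information, so nothing would stop us from reading past the end of the array.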



One restriction applies to pointers declared in a fixed statement: they are read-only and cannot be modified inside the fixed block. The workaround is to create a copy:

fixed (int* MyPointerArray = MyArray)
{
    int* TemporaryPointer = MyPointerArray;
    Console.WriteLine((*++TemporaryPointer).ToString());
}



Here we display the contents of MyArray[1]. Of course, we can try this with multidimensional arrays too:

int[,] MyArray = new int[10,10];

fixed (int* MyPointerArray = &MyArray[0,0])
{
    for (int i = 0; i < 100; i++)
    {
        *(MyPointerArray + i) = i;
    }
}



In this example, we are initializing a two dimensional integer array, by simply accessing it as a linear block of memory.
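
The linear view works because a multidimensional array stores its rows one after another. A minimal sketch of the index calculation, assuming a non-empty array (GridDemo and ReadLinear are hypothetical names):

```csharp
using System;

static class GridDemo
{
    // Reads grid[row, column] through a linear pointer offset.
    // Rows are stored one after another, so the element sits at
    // offset row * (number of columns) + column.
    // Assumes the array has at least one element.
    public static unsafe int ReadLinear(int[,] grid, int row, int column)
    {
        int columns = grid.GetLength(1);
        fixed (int* p = &grid[0, 0])
        {
            return *(p + row * columns + column);
        }
    }

    static void Main()
    {
        int[,] MyArray = new int[10, 10];
        for (int r = 0; r < 10; r++)
        {
            for (int c = 0; c < 10; c++)
            {
                MyArray[r, c] = r * 10 + c;
            }
        }
        Console.WriteLine(ReadLinear(MyArray, 2, 3)); // prints 23
    }
}
```
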
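In contrast with arrays, local struct variables don't need any special care when it comes to pointers, because they are value types allocated on the stack, where the garbage collector does not move them. (A struct stored as a field of a class lives on the heap together with its containing object and still has to be fixed.) As a result, we can simply use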

public struct MyStructType
{
    public int a;
    public int b;
};
MyStructType MyStruct = new MyStructType();
MyStructType* MyStructPointer = &MyStruct;



Accessing a struct field is pretty simple:

(*MyStructPointer).a = 1;



OR

MyStructPointer->b = 2;
Console.WriteLine(MyStructPointer->a.ToString());
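
A struct pointer can also be handed to a method, which then works on the caller's struct directly. The following is only a sketch; Pair, StructPointerDemo, Swap and SwappedCopy are hypothetical names invented for this example.

```csharp
using System;

// Hypothetical struct with no managed fields, so Pair* is allowed.
struct Pair
{
    public int a;
    public int b;
}

static class StructPointerDemo
{
    // Swaps the two fields in place through the pointer.
    static unsafe void Swap(Pair* pair)
    {
        int temp = pair->a;
        pair->a = pair->b;
        pair->b = temp;
    }

    // Copies the input into a local (stack) variable, swaps it
    // through a pointer and returns the result.
    public static unsafe Pair SwappedCopy(Pair input)
    {
        Pair local = input;
        Swap(&local);
        return local;
    }

    static void Main()
    {
        Pair result = SwappedCopy(new Pair { a = 1, b = 2 });
        Console.WriteLine(result.a); // prints 2
        Console.WriteLine(result.b); // prints 1
    }
}
```

The -> operator is just shorthand: pair->a means the same as (*pair).a.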



What about strings? Can we have pointers to them? After all, they are also managed objects. Same as for the arrays, we can fix the string and afterwards initialize a char pointer to the first character of the string:

string MyString = "Hello world";
fixed (char* MyStringPointer = MyString)
{
    Console.WriteLine((*(MyStringPointer + 3)).ToString());
}



This fixes the string, creates a pointer to its first char, and then uses pointer arithmetic to access the fourth char of the string (index 3), which is 'l'.
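
As a further sketch, we can walk the whole string through the char pointer, here counting how many times a character occurs. StringWalkDemo and Count are hypothetical names. The pointer is only read from; strings are immutable, so writing through it should be avoided.

```csharp
using System;

static class StringWalkDemo
{
    // Counts occurrences of 'wanted' by indexing the fixed char pointer.
    // The string is only read, never written.
    public static unsafe int Count(string text, char wanted)
    {
        int count = 0;
        fixed (char* start = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (start[i] == wanted)   // start[i] is the same as *(start + i)
                {
                    count++;
                }
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(Count("Hello world", 'l')); // prints 3
    }
}
```
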
In addition to pointing at existing value types, we can also use the stack to allocate our own blocks of primitive types. What we need is to allocate enough stack space to store n copies of the data type. For this, we have the expression stackalloc type[n], which does just that and returns a pointer to the start of the allocation. Two things are different this time: the stack is never touched by the garbage collector, so we don't need to fix the storage as we did for arrays; and we don't need to de-allocate the memory, because the stack is automatically cleaned up when the method that made the allocation returns. This is how we implement the above:

int* MyPointerArray = stackalloc int[100];
MyPointerArray[30] = 99;
Console.WriteLine(MyPointerArray[30].ToString());



What we did was to create a block of 100 integers, with a fixed size of 400 bytes; we stored the value 99 in the element at index 30 (the 31st one) using the pointer, and finally displayed the value of that element.
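
A slightly fuller sketch of the same technique, using the stack block as a scratch buffer inside a method. StackAllocDemo and SumOfSquares are hypothetical names; the method assumes a small, positive n, since the stack is a limited resource.

```csharp
using System;

static class StackAllocDemo
{
    // Fills a stack-allocated buffer with squares and adds them up.
    // Assumes n is small and positive; the buffer disappears
    // automatically when the method returns.
    public static unsafe int SumOfSquares(int n)
    {
        int* squares = stackalloc int[n];
        for (int i = 0; i < n; i++)
        {
            squares[i] = i * i;
        }

        int total = 0;
        for (int i = 0; i < n; i++)
        {
            total += squares[i];
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(5)); // prints 30 (0 + 1 + 4 + 9 + 16)
    }
}
```

Because the memory belongs to the method's stack frame, the pointer must never be returned or stored anywhere that outlives the call.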

In Conclusion

While pointers might feel handy and familiar to many C/C++ programmers who switch to C#, in reality, using pointers in C# is, most of the time, a bad idea. Almost every time, there are ways of doing things without them that are less complicated and less dangerous. Still, as I stated at the beginning of this tutorial, there are some specific situations where pointers are faster or are required, for example in some API calls made through P/Invoke, and they can be a quick fix when translating a program that relies on pointers from C++ to C#.
