Demystifying C#'s Parameter Modifiers and Value and Reference Types

A quick and rough simplification

When learning C# you will find out about the different ways that variables can be passed as arguments to function parameters, such as by value or by (some kind of) reference. These concepts are very confusing at the beginning for people trying to learn C#. In this post, I will try to simplify them and summarize the otherwise very lengthy official documentation.

First of all, you need to always remember this: in C# all arguments are passed by value by default! What that means is that values are copied by default, on both variable assignments and function calls.

So, when you are not using parameter modifiers (like ref), everything is passed by value! The difference in behavior comes from the kind of type the variable you pass has: a value type or a reference type. So it matters that you know what kinds of types you are working with (classes, structs, records, or simple types like int, double, etc.)!

Value objects are accessed directly, sort of like they exist right there in the variable (for a local variable that usually means on the current stack, though that is an implementation detail). Think of them like simple types: passing them by value, which is the default, will always copy the actual data they contain.

Reference objects on the other hand are accessed via a reference, so the variable that is assigned a reference object (or a reference value) contains only a reference to the real data. Accessing that data is done transparently so the programmer doesn't have to use a different syntax for this case.

Beware that different variables can be made to hold the same value by assigning a variable to another variable. For reference types this means two variables referring to the same data:

var obj1 = new MyReferenceType("1");
var obj2 = obj1;

Now obj2 contains a copy of what obj1 holds. Since obj1 holds a reference, it is the reference that gets copied and assigned to obj2, not the actual data that obj1 points to!

In other words, obj2 and obj1 each hold their own copy of the reference, and both copies point to the same object in memory.

Note that reassigning obj2 does not modify obj1, just the reference held inside that particular obj2 variable:

obj2 = new MyReferenceType("2");

obj1 still holds the original MyReferenceType("1") instance after the above operation.
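
A minimal runnable sketch of the two steps above. MyReferenceType is not defined in this post, so the class below, with its single Name property, is an assumed stand-in used only for illustration.

```csharp
using System;

// Hypothetical stand-in for MyReferenceType; the Name property is an assumption.
public class MyReferenceType
{
    public string Name { get; set; }
    public MyReferenceType(string name) { Name = name; }
}

public static class ReferenceCopyDemo
{
    // Returns obj1.Name as seen after a mutation through obj2,
    // and again after obj2 has been reassigned.
    public static (string afterMutation, string afterReassignment) Run()
    {
        var obj1 = new MyReferenceType("1");
        var obj2 = obj1;                  // copies the reference, not the object

        obj2.Name = "changed";            // mutates the one shared object
        string afterMutation = obj1.Name;

        obj2 = new MyReferenceType("2");  // only obj2 now refers to a new object
        obj2.Name = "still 2";            // no longer visible through obj1
        string afterReassignment = obj1.Name;

        return (afterMutation, afterReassignment);
    }

    public static void Main()
    {
        var (afterMutation, afterReassignment) = Run();
        Console.WriteLine(afterMutation);     // changed
        Console.WriteLine(afterReassignment); // changed
    }
}
```

The mutation through obj2 is visible through obj1 only while both variables hold the same reference; once obj2 is reassigned, the two go their separate ways.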

For value types the default behavior of passing by value is different in that the actual data gets copied.

var date1 = DateTime.Now;
var date2 = date1;

DateTime is a value type object (a struct) so date2 now contains a copy of the actual date1 object, not a copied reference that refers to the same object, like it happened with reference types above.

Now, if we reassign to date2, just like before, date1 remains untouched, because we're only altering what the variable holds, nothing else:

date2 = DateTime.Now;
// date1 remains untouched after this instruction.

But now, say that DateTime, which is a struct, did not have an immutable public interface, and calling AddDays did not return a new DateTime instance with a number of days added, but instead modified the state of the current instance that it's being called on:

date2.AddDays(1);

Now, date2 would be one day ahead of date1 and date1 would be untouched, because date2 was not assigned a reference to the same data that date1 points to, but an actual copy of the data that date1 contained.

Why is that? Because DateTime is a value type (a struct). If DateTime were a reference type, adding days to date2 would also have added days to date1, because both variables would have pointed to the same data in memory.
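
Since the real DateTime is immutable, the thought experiment can only be imitated. The sketch below uses two made-up types, MutableDateStruct and MutableDateClass (both are assumptions, not part of .NET), that differ only in being a struct or a class.

```csharp
using System;

// Hypothetical types: DateTime itself is immutable, so these only imitate
// the "what if AddDays mutated the instance" scenario.
public struct MutableDateStruct
{
    public int Day;
    public void AddDays(int days) { Day += days; }
}

public class MutableDateClass
{
    public int Day;
    public void AddDays(int days) { Day += days; }
}

public static class CopySemanticsDemo
{
    // Returns the Day of each original variable after its copy was mutated.
    public static (int structOriginal, int classOriginal) Run()
    {
        var s1 = new MutableDateStruct { Day = 10 };
        var s2 = s1;      // copies the data itself
        s2.AddDays(1);    // changes only s2's copy

        var c1 = new MutableDateClass { Day = 10 };
        var c2 = c1;      // copies the reference
        c2.AddDays(1);    // changes the object both variables refer to

        return (s1.Day, c1.Day);
    }

    public static void Main()
    {
        var (structOriginal, classOriginal) = Run();
        Console.WriteLine(structOriginal); // 10
        Console.WriteLine(classOriginal);  // 11
    }
}
```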

These deductions are simple to make, once you understand what types of objects you're working with and what the default behavior of C# of passing by value means.

Parameter modifiers

The official Microsoft docs say that in C# arguments can be passed to parameters either by value or by reference, by value being the default.

Passing by reference enables function members, methods, properties, indexers, operators, and constructors to change the value of the parameters and have that change persist in the calling environment.

Functional programming advocates might not agree with the above statement, because it enables functions to change variables that are defined outside of their scope, which makes code prone to runtime bugs that are hard to diagnose.

The scope of this article is not to advocate for or against the different programming styles, but just to explain these features of the C# language and to explain when or where they might be useful.

Pass by value

The default behavior of the language, when no parameter modifiers are used, is called pass by value, like I have mentioned before.

What this means is that whatever value the passed-in variable holds is copied into the function parameter, effectively creating a second, independent copy with the exact same value (at the start, not necessarily by the end of the function block). The two are in no other way related to each other: changing (ie. reassigning) one does not change the other.

This can sometimes be confusing for new C# programmers, because they usually think that only simple types like int and double (ie. the types inheriting from ValueType) are passed by value, since they are cheap to copy all the time.

In fact, reference types are also passed by value. This is what normally happens when you don't use parameter modifiers and pass a reference type object as an argument. The difference between value types and reference types is that with reference types it's the reference that gets copied from one variable to the other, not the object the variable refers to (as happens with value types). This is because there's no telling how large the object graph might be, and copying such objects might be very expensive (but if you're crazy enough, you can still implement it yourself).

So when the copy goes through, you will have two different variables holding the same reference value, but these references, while being equal at the start, are not the same, because the variables are not the same. The variables are still independent.

Passing by value copies values, but what the value actually is underneath doesn't matter very much. C# tries to abstract away memory pointers, which are more prevalent in programming languages like C and C++. This helps you always think of variables as their values, and not as either values or references (pointers).

C# does this by automatically dereferencing pointers when you use them, without requiring any extra syntax, like you have with C++ (ie. the * prefix, the -> arrow operator, etc.).

More precisely, when a reference type object gets passed in by value (ie. no parameter modifier is used), the outside variable's reference value gets copied to the function parameter. The outside variable and the function parameter are two distinct variables, but they now hold the same value: the same reference, to the same object in memory.

So if you have something like this:

public static void PassByValue(int i, MyObject o)
{
    // `i` contains an actual copy of the value passed in, because `int` is
    //     a value type and value types in C# are copied on assignment,
    //     so `i` and whatever was passed in for it are independent
    //     values.

    // `o` contains a copy of the reference passed in when this function
    //     is called, so doing anything with this reference, like calling
    //     methods on the underlying object, can modify `myObj1` below, which
    //     is outside of the function body.
}

MyObject myObj1 = new MyObject();

PassByValue(1, myObj1);

// `myObj1` might have been modified here, if the above function calls
// a mutating function on `myObj1` inside its body.

It might seem like calling the function PassByValue(1, myObj1) should clone myObj1 into the function body as the object o, but that's not what happens. myObj1 is not a value type but a reference type, so what actually gets copied is the reference itself, because that is the value stored in the variable.

So myObj1 and o are two distinct variables that point to the same object in memory. Nothing prevents you from changing the value of either variable so that they no longer point to the same object. Reassigning the parameter inside the function, however, does not reassign the variable outside the function's scope.

public static void PassByValue(int i, MyObject o)
{
    // we don't do anything to modify `o` up until here,
    // for example we don't call any members that modify
    // `o`'s state, because that would also modify the state
    // of the object passed as argument.

    // now this assignment does not affect the value of `myObj1`
    // below, it just makes `o` here point to a different object.
    o = new MyObject();

    // starting from here, we can now do anything with `o`,
    // without affecting `myObj1`!
}

MyObject myObj1 = new MyObject();

PassByValue(1, myObj1);

// myObj1 is still the same here like it was when it was created above.

To summarize, pass by value is the default behavior and it copies what the variable that's passed in holds:

  1. If it's a value type object, it clones the value of the argument so that the variable in the caller's scope and the variable in the function's scope point to two different objects (but with the same value, initially).
  2. If it's a reference type, it copies the reference only, so that the variable in the caller's scope (the one passed as argument) and the parameter in the function's scope both point to the same object. The references held by the two variables are equal, but the variables themselves are distinct, so reassigning one does not reassign the other.
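
Both points of the summary can be checked with a small runnable sketch. MyObject is not defined in this post, so the class below and its Count property are assumptions made for the demo, as are the helper names Mutate and Reassign.

```csharp
using System;

// Hypothetical MyObject; the Count property is an assumption for the demo.
public class MyObject
{
    public int Count { get; set; }
}

public static class PassByValueDemo
{
    public static void Mutate(int i, MyObject o)
    {
        i = 99;       // changes only the local copy of the int
        o.Count = 1;  // changes the object the caller also refers to
    }

    public static void Reassign(MyObject o)
    {
        o = new MyObject { Count = 42 };  // rebinds only the parameter
    }

    public static (int number, int countAfterMutate, int countAfterReassign) Run()
    {
        int number = 5;
        var myObj1 = new MyObject();

        Mutate(number, myObj1);
        int countAfterMutate = myObj1.Count;

        Reassign(myObj1);
        return (number, countAfterMutate, myObj1.Count);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // (5, 1, 1)
    }
}
```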

Pass by reference

Passing a variable by reference is what it sounds like: the function receives a reference to the variable itself, rather than a copy of its value.

As you might have deduced, no matter what semantics the passed in object (via its variable) has by default (value object type or reference type), C# will always take a reference to it and pass it as argument to the function.

To elaborate:

If the variable passed in is a value type, then it will no longer be passed by value and cloned before entering the body of the function, but will actually be passed by reference like any other reference type object. C# will take a reference to the value type object (to its variable) and make the variable inside the function's scope point to the same value in memory as the variable from the calling scope.

So this is almost the same behavior as the one described above for pass by value with reference types: anything you do with that reference will affect the outside variable too, but this time including reassignment!

Also, remember what = does: it (re)assigns a new value to whatever a variable stores.

  1. If the variable holds a value type, the value is replaced.
  2. If the variable holds a reference to a reference type object, the reference is replaced.
  3. If the variable holds a reference to another variable (due to the ref modifier), then thanks to C#'s automatic dereferencing it is the value of the variable being pointed to that gets replaced (😵‍💫 I know), while this variable continues to refer to the other variable and its new value.

So when two variables refer to the same value type storage (one being a ref alias of the other) and you assign through either of them, C# replaces the value held at the address they both denote. In a way, a ref to a value type behaves like a reference too; it just has different semantics (and different treatment at runtime).

If the variable passed in is a reference type, then a reference to that variable will be created (a reference to a reference type), so if you reassign to that reference for example, the outside variable will reflect this change.

Note that the above two cases boil down to the same behavior: a reference to the outside variable is created and we work with that reference in the function's scope. Altering it, alters the outside variable too. Calling methods on it that might modify it, also modifies the object pointed to by the outside variable.

To conclude, value types can almost be confused with the variables that hold them, while reference types sort of sit separately from the variables that hold them, and C# automatically dereferences them on access. When you use the ref modifier, a new reference to the argument is created: a variable pointing to another variable, sort of like becoming an alias of it.

Here is some test code to support the ideas above:

using System;

public class MyClass
{
    public int Y { get; set; } = 0;
}

public class Program
{
    public static int z = 0;

    public static void TestValueTypeByRef(ref int arg)
    {
        arg = 1;
        // calling a mutating method on arg here would likewise
        // affect the caller's variable
    }

    public static void TestRefByRef1(ref MyClass c)
    {
        c.Y = 1;
    }

    public static void TestRefByRef2(ref MyClass c)
    {
        c = new MyClass();
        c.Y = 2;
    }

    public static void Main()
    {
        Console.WriteLine($"Main z = {z}"); // Main z = 0

        TestValueTypeByRef(ref z);
        Console.WriteLine($"After TestValueTypeByRef(ref z): Main z = {z}"); // After TestValueTypeByRef(ref z): Main z = 1
        Console.WriteLine("");

        var mc = new MyClass();
        Console.WriteLine($"Main mc.Y = {mc.Y}"); // Main mc.Y = 0

        TestRefByRef1(ref mc);
        Console.WriteLine($"After TestRefByRef1(ref mc): Main mc.Y = {mc.Y}"); // After TestRefByRef1(ref mc): Main mc.Y = 1

        TestRefByRef2(ref mc);
        Console.WriteLine($"After TestRefByRef2(ref mc): Main mc.Y = {mc.Y}"); // After TestRefByRef2(ref mc): Main mc.Y = 2
    }
}
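
One more sketch in the same spirit: a generic Swap helper, a common illustration of why reassignment through ref matters. The name Swap and the SwapDemo class are my own choices, not something from the official docs.

```csharp
using System;

public static class SwapDemo
{
    // Hypothetical helper: both parameters alias the caller's variables,
    // so assigning to them reassigns the caller's variables.
    public static void Swap<T>(ref T a, ref T b)
    {
        T temp = a;
        a = b;
        b = temp;
    }

    public static void Main()
    {
        int x = 1, y = 2;
        Swap(ref x, ref y);            // works for value types
        Console.WriteLine($"{x} {y}"); // 2 1

        string s = "left", t = "right";
        Swap(ref s, ref t);            // and for reference types
        Console.WriteLine($"{s} {t}"); // right left
    }
}
```

Without ref, the same method body would only shuffle its local copies and the caller would see no change.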

I almost forgot: passing by ref requires that the variable be definitely assigned before the call (the compiler rejects an uninitialized variable). A reference type variable that holds null is fine, because null is a value too.

Passing by input reference

This is a special case of passing by reference (the in modifier) where you're not allowed to assign to the parameter inside the function, but you can still call methods on it. Essentially the variable is readonly inside the function body. For a reference type, the object it refers to is still modifiable through its members; for a struct, mutating methods run on a hidden defensive copy unless the struct or the member is readonly, so the caller's value stays unchanged.

This is good for functions that want to take a reference to an object from an outside scope and call methods on it, while guaranteeing that the external variable still points to the same object and was not swapped for another instance inside the function. (It might seem like a performance optimization, but that mostly applies to large structs, where it avoids a copy; for reference types and small value types it brings no gain.)

You also don't need to use the in keyword when calling the function like you do when using ref. In the calling scope, there's no need to be aware that the variable you're passing as an argument will be passed by reference, because it cannot be modified (reassigned) unexpectedly.

The variable must also be definitely assigned before the call, or the code won't compile. A reference type variable may still hold null, in which case using it inside the function throws a NullReferenceException.
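
A small sketch of the in modifier with a reference type. Counter and Increment are hypothetical names made up for this example.

```csharp
using System;

// Hypothetical type for the demo.
public class Counter
{
    public int Value { get; set; }
}

public static class InParameterDemo
{
    public static void Increment(in Counter c)
    {
        c.Value++;             // allowed: mutates the object through the reference
        // c = new Counter();  // compile error: an 'in' parameter cannot be assigned to
    }

    public static int Run()
    {
        var counter = new Counter();
        Increment(counter);     // 'in' is optional at the call site
        Increment(in counter);  // but it can be written explicitly
        return counter.Value;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 2
    }
}
```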

Passing by output reference

This is the last special case of passing by reference, which is exactly like ref but you don't need to initialize the variable before calling the function.

You do however need to assign a value to the out parameter before returning from the function.

You also need to use the out keyword when declaring and calling the function (you need to be aware of this in the calling scope because the variable is not readonly inside the function and could be reassigned).

Using out can make code more readable, because one function can hand several results back to the caller without packing them all into its return type. That leaves the return value free for something else, such as error reporting. Alternatively, you could return a tuple and use exceptions for error reporting.
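
Here is a minimal sketch of that pattern, in the style of int.TryParse. TryDivide is a hypothetical helper written for this post, not a framework method.

```csharp
using System;

public static class OutParameterDemo
{
    // Hypothetical helper: the bool return value reports success,
    // while the two results travel back through out parameters.
    public static bool TryDivide(int dividend, int divisor, out int quotient, out int remainder)
    {
        if (divisor == 0)
        {
            // out parameters must be assigned on every return path
            quotient = 0;
            remainder = 0;
            return false;
        }

        quotient = dividend / divisor;
        remainder = dividend % divisor;
        return true;
    }

    public static void Main()
    {
        // the out variables can be declared right at the call site
        if (TryDivide(17, 5, out int q, out int r))
        {
            Console.WriteLine($"{q} remainder {r}"); // 3 remainder 2
        }

        // discards (_) can be used for results you don't need
        Console.WriteLine(TryDivide(1, 0, out _, out _)); // False
    }
}
```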

Closing note

At this point, I mainly write these blog posts to help myself consolidate what I've learned, but I hope it can help others like myself as well. While the quality of my posts now is more of a rough draft, my focus is on learning by taking notes. I do try to keep these notes to some standard of quality though and I will try to increase the quality as time goes by.

And hey, if you have anything constructive to say or anything to ask, feel free to use the comments section below.
