Daniel Earwicker Software Architect
Currently at large

A new kind of managed lvalue pointer


ROSLYN 2014-04-27

It's already the evening and I haven't yet added anything to the C# compiler today, so here goes!

Properties have special support in C#, but they are not "first class". You can't get a reference to a property and pass it around as a value. Methods are much better served in this regard: delegates are a way to treat a method as a value. But they are just objects with an Invoke method.

So all we need is an interface with a Value property. Objects supporting that interface can represent a single property that can be passed around like any other value:

public interface IProperty<T>
{
    T Value { get; set; }
}

This is closely analogous to an old-fashioned pointer in C and C++, as I mused aloud all those years ago. Let's turn that whole idea into a strangely alluring language feature, which I'll call "property references", and then occasionally forget that terminology and call them pointers instead.

Firstly, syntax. We could use actual pointer syntax, but I already used some of that in yesterday's feature. Dagnabbit! Fortunately C++/CX has already paved the way: it has the concept of a reference to a fancy object that must be explicitly dereferenced. The syntax is like this:

MyClass ^r = %x;

(*r).Foo();

If this looks weird, try replacing ^ with * and % with &. It's then exactly like C/C++. ^ is a postfix modifier on a type declaration that means "Will store a pointer to one of those", and % is a unary prefix operator that means "Give me a pointer to whatever comes next". And for the sake of uniformity in common code, C++/CX always uses * to dereference.

Before getting into the nitty-gritty of changing the compiler, let's survey the glorious sunny uplands we wish to invade. We should be able to do this in C#:

var x = 5;
var px = %x; // Take a "pointer" to x

Console.WriteLine(*px); // Prints 5

*px = 6; // Can assign "through" a pointer
(*px)++; // Or increment

Console.WriteLine(x); // Prints 7 - we modified x via px

Unlike old-school pointers, we can - of course - quite safely return one of these things from a method, referring to a local variable:

static int^ GetCounter(int init)
{
    var counter = init;
    return %counter;
}

// elsewhere...
var c = GetCounter(100);
Console.WriteLine(*c); // Prints 100

(*c)++;
Console.WriteLine(*c); // Prints 101

The trick is that each bit of new syntax expands very simply into some standard C#. All the heavy lifting is done by the compiler's existing support for lambdas:

var x = 5;
System.IProperty<int> px = System.Property.Bind(v => x = v, () => x); // int *p = %x;
var y = px.Value; // var y = *x;

But of course, those helper types are not part of the standard System namespace. We need to add them:

namespace System
{
    public interface IProperty<T>
    {
        T Value { get; set; }
    }

    public static class Property
    {
        private class PtrImpl<T> : IProperty<T>
        {
            Action<T> _set;
            Func<T> _get;

            public PtrImpl(Action<T> set, Func<T> get)
            {
                _set = set;
                _get = get;
            }

            public T Value
            {
                get { return _get(); }
                set { _set(value); }
            }
        }

        public static IProperty<T> Bind<T>(Action<T> set, Func<T> get)
        {
            return new PtrImpl<T>(set, get);
        }
    }
}

These don't need to be in mscorlib.dll (some of the later C# features rely on types in System.Core.dll). So we can just create a new System.Extras.dll assembly and stick them in there.

So, one thing that makes this a whole 'nother level of crazy compared to my first two forays in to compiler features is that here we are adding new syntax. Fortunately Roslyn makes this quite easy. There's a file called Syntax.xml from which the project generates classes for all the nodes that can appear in a syntax tree. We can (as usual) follow the example of what we find in there.

All the unary prefix operators are in here, so they share a single class that can be distinguished by the Kind property:

<Node Name="PrefixUnaryExpressionSyntax" Base="ExpressionSyntax">
  <Kind Name="UnaryPlusExpression"/>
  <Kind Name="UnaryMinusExpression"/>
  <Kind Name="BitwiseNotExpression"/>
  <Kind Name="LogicalNotExpression"/>
  <Kind Name="PreIncrementExpression"/>
  <Kind Name="PreDecrementExpression"/>
  <Kind Name="AddressOfExpression"/>
  <Kind Name="PointerIndirectionExpression"/>
  <Kind Name="AwaitExpression"/>
  <Kind Name="PropertyReferenceExpression"/>    
  <Field Name="OperatorToken" Type="SyntaxToken">
    <Kind Name="PlusToken"/>
    <Kind Name="MinusToken"/>
    <Kind Name="TildeToken"/>
    <Kind Name="ExclamationToken"/>
    <Kind Name="PlusPlusToken"/>
    <Kind Name="MinusMinusToken"/>
    <Kind Name="AmpersandToken"/>
    <Kind Name="AsteriskToken"/>
    <Kind Name="AwaitKeyword"/>
    <Kind Name="PercentToken"/>
    <PropertyComment>
      <summary>SyntaxToken representing the kind of the operator of the prefix unary expression.</summary>
    </PropertyComment>
  </Field>
  <Field Name="Operand" Type="ExpressionSyntax">
    <PropertyComment>
      <summary>ExpressionSyntax representing the operand of the prefix unary expression.</summary>
    </PropertyComment>
  </Field>
  <TypeComment>
    <summary>Class which represents the syntax node for prefix unary expression.</summary>
  </TypeComment>
  <FactoryComment>
    <summary>Creates an PrefixUnaryExpressionSyntax node.</summary>
  </FactoryComment>
</Node>

I've added the PropertyReferenceExpression and the PercentToken. For the type modifier ^ I have to cook up a whole new node type:

<Node Name="PropertyReferenceTypeSyntax" Base="TypeSyntax">
  <Kind Name="PropertyReferenceType"/>
  <Field Name="ElementType" Type="TypeSyntax">
    <PropertyComment>
      <summary>TypeSyntax node that represents the element type of the property reference.</summary>
    </PropertyComment>
  </Field>
  <Field Name="CaretToken" Type="SyntaxToken">
    <Kind Name="CaretToken"/>
    <PropertyComment>
      <summary>SyntaxToken representing the caret symbol.</summary>
    </PropertyComment>
  </Field>
  <TypeComment>
    <summary>Class which represents the syntax node for property reference type.</summary>
  </TypeComment>
  <FactoryComment>
    <summary>Creates a PropertyReferenceTypeSyntax node.</summary>
  </FactoryComment>
</Node>

Now, if we try to build the compiler we'll get errors about missing names in the enum SyntaxKind, so we need to add them:

public enum SyntaxKind : ushort
{
    ...

    PropertyReferenceType,
    PropertyReferenceExpression
}

In SyntaxKindFacts.cs there's a workaday switch statement that we need to modify so it takes care of mapping % tokens to our new unary operator:

public static SyntaxKind GetPrefixUnaryExpression(SyntaxKind token)
{
    switch (token)
    {
        case SyntaxKind.PlusToken:
            return SyntaxKind.UnaryPlusExpression;
        case SyntaxKind.MinusToken:
            return SyntaxKind.UnaryMinusExpression;
        case SyntaxKind.TildeToken:
            return SyntaxKind.BitwiseNotExpression;
        case SyntaxKind.ExclamationToken:
            return SyntaxKind.LogicalNotExpression;
        case SyntaxKind.PlusPlusToken:
            return SyntaxKind.PreIncrementExpression;
        case SyntaxKind.MinusMinusToken:
            return SyntaxKind.PreDecrementExpression;
        case SyntaxKind.AmpersandToken:
            return SyntaxKind.AddressOfExpression;
        case SyntaxKind.AsteriskToken:
            return SyntaxKind.PointerIndirectionExpression;

        // The new part:
        case SyntaxKind.PercentToken:
            return SyntaxKind.PropertyReferenceExpression;

        default:
            return SyntaxKind.None;
    }
}

And there's a another that defines the precedence of operators, which is how the compiler decides what to do when you don't fully parenthesise your expressions. I figure that the new % operator should copy the existing &amp operator:

private static uint GetPrecedence(SyntaxKind op)
{
    switch (op)
    {
        ...

        case SyntaxKind.AddressOfExpression:
        case SyntaxKind.PropertyReferenceExpression: // the new part
            return 16;
        default:
            return 0;
    }
}

For the all-new ^ operator we have to throw some code in to deal with it. Like I said yesterday, Roslyn's structure seems surprisingly procedural. It's not using functional parser combinators or anything "cool" and/or "academic". It's just a bunch of methods that examine the current token, do switch statements, etc. On the plus side, it is very easy to learn how it works by stepping through it in the debugger. That's the saving grace of procedural code: ease of hacking.

I hooked into the same place that handles pointer syntax, as (again) its closely analogous.

private TypeSyntax ParsePointerTypeMods(TypeSyntax type)
{
    // Check for pointer types
    while (this.CurrentToken.Kind == SyntaxKind.AsteriskToken)
    {
        type = syntaxFactory.PointerType(type, this.EatToken());
    }

    // Check for property reference types (new)
    while (this.CurrentToken.Kind == SyntaxKind.CaretToken)
    {
        type = syntaxFactory.PropertyReferenceType(type, this.EatToken());
    }

    return type;
}

Note: we don't have to write that syntaxFactory.PropertyReferenceType method. It's one of the pieces that are auto-generated from what we added to Syntax.xml.

Now, we have the syntax. All we need now is to sort out the binding phase, which figures out whether the syntax actually makes sense (that when it refers to a variable called x, there actually is one called x, and every expression has a type, etc.)

And it is here that I am overcome with one of those attacks of laziness that are the hallmark of the truly great programmer, hem-hem. Faced with a pattern like this:

System.Property.Bind(v => o = v, () => o)

We don't want to have to write screeds of code that builds the BoundExpression that make up that pattern (believe me: I got about half-way through the first lambda before realising I would be retired before finishing the whole thing). In any case, the compiler can already do it - that's its job. Ideally we could use the existing parser to get a kind of "syntax template", in which we can replace certain identifiers with chunks of other syntax, and then ask the existing binder to bind it. Then we'd have to do almost no thinking at all! Bliss.

So for example:

private static readonly SyntaxTemplate _propertyReferenceTemplate
    = new SyntaxTemplate("System.Property.Bind(__v_pr__ => o = __v_pr__, () => o)");
                                        
private BoundExpression BindPropertyReferenceExpression(PrefixUnaryExpressionSyntax node, DiagnosticBag diagnostics)
{
    return RedirectDiagnostics(diagnostics, node, redirected =>
        BindExpression(_propertyReferenceTemplate.Replace("o", node.Operand).Syntax, redirected));
}

We'll come back to that RedirectDiagnostics part later. The key point is that I'm creating an instance of my new class SyntaxTemplate as a static, so it is reused for the lifetime of the compiler. It's immutable, hence thread-safe. Then every time I need to bind something like %foo, I can just replace the o in the template with foo (which is in node.Operand). Replace returns a new SyntaxTemplate rather than modifying the original (that's what immutability is all about).

Again, binding is a recursive, procedural system. There's a big switch statement that calls out to methods that bind various things, so we need to hook our new method BindPropertyReferenceExpression into that:

private BoundExpression BindExpressionInternal(ExpressionSyntax node, DiagnosticBag diagnostics, bool invoked, bool indexed)
{
    if (IsEarlyAttributeBinder && !EarlyWellKnownAttributeBinder.CanBeValidAttributeArgument(node, this))
    {
        return BadExpression(node, LookupResultKind.NotAValue);
    }

    Debug.Assert(node != null);
    switch (node.Kind)
    {
        ...

        // New part
        case SyntaxKind.PropertyReferenceExpression:
            return BindPropertyReferenceExpression((PrefixUnaryExpressionSyntax)node, diagnostics);

See how there's a switch statement on an enum, then a cast - all the kinds of thing that beginners are told not to do when they learn C#, because supposedly virtual method dispatch on a single object solves all problems. (Oh wait, no it doesn't.) But still, it wouldn't make sense to have a built-in type switch in languages like C#, Java or C++ (except apparently in one situation).

Anyway, BindExpression calls BindExpressionInternal, which calls our new BindPropertyReferenceExpression method, which expands our template and passes it to BindExpression... we're going in circles! But it's okay. The reason this doesn't asplode the stack is because our template doesn't include further references to %.

Now, about that RedirectDiagnostics wrinkle. The binding process has a DiagnosticBag object that gets passed around. Any errors found are thrown in the bag. Each error has a Location object, identifying the place in the user's source code where the error was spotted, so the locations were discovered at the parsing stage. The problem we have is that we parse our template code separately, so the locations bear no relation to the user's source code. This means that the IDE's text editor cannot put red squiggles in the right place.

To fix this, I literally fix the diagnostics:

private T RedirectDiagnostics<T>(DiagnosticBag diagnostics, 
             CSharpSyntaxNode nodeWithLocation, 
             Func<DiagnosticBag, T> generate)
{
    var captured = new DiagnosticBag();
    var result = generate(captured);

    foreach (var diag in captured.AsEnumerable().OfType<DiagnosticWithInfo>())
        diagnostics.Add(new CSDiagnostic(diag.Info, nodeWithLocation.Location));

    return result;
}

The generate function does the binding, but to a "fake" temporary DiagnosticBag, which we then copy into the real one but replacing all the Location objects with a single good location. This isn't ideal. Recall that some of the syntax tree was inserted from the user's source and so had perfectly good locations. I need to figure out a way of detecting whether a location is junk or not. But it sort of works.

So, we have binding for %. For * we have to enhance the existing code, branching based on whether operand is a pointer:

private static readonly SyntaxTemplate _pointerIndirectionTemplate = new SyntaxTemplate("p.Value");

// Based on ExpressionBinder::bindPtrIndirection.
private BoundExpression BindPointerIndirectionExpression(PrefixUnaryExpressionSyntax node, DiagnosticBag diagnostics)
{
    BoundExpression operand = BindValue(node.Operand, diagnostics, GetUnaryAssignmentKind(node.Kind));

    // Try using the template on anything that isn't a pointer
    if (!operand.Type.IsPointerType())
        return RedirectDiagnostics(diagnostics, node, redirected => 
            BindExpression(_pointerIndirectionTemplate.Replace("p", node.Operand).Syntax, redirected));
            
    TypeSymbol pointedAtType;
    bool hasErrors;
    BindPointerIndirectionExpressionInternal(node, operand, diagnostics, out pointedAtType, out hasErrors);

    return new BoundPointerIndirectionOperator(node, operand, pointedAtType ?? CreateErrorType(), hasErrors);
}

We can even use the template technique for type declarations - just replace T with whatever we get from the user:

private static readonly SyntaxTemplate _propertyReferenceTypeTemplate = new SyntaxTemplate("System.IProperty<T>");

internal Symbol BindNamespaceOrTypeOrAliasSymbol(ExpressionSyntax syntax, DiagnosticBag diagnostics, ConsList<Symbol> basesBeingResolved, bool suppressUseSiteDiagnostics)
{
    switch (syntax.Kind)
    {
        ...

        case SyntaxKind.PropertyReferenceType:
            {
                return RedirectDiagnostics(diagnostics, syntax, redirected => BindNamespaceOrTypeOrAliasSymbol(
                    _propertyReferenceTypeTemplate.Replace("T", ((PropertyReferenceTypeSyntax)syntax).ElementType).Syntax, 
                    redirected, basesBeingResolved, suppressUseSiteDiagnostics));
            }

That's every change needed to support the feature. If you want to play with it (and even add features of your own using the SyntaxTemplate class), I've updated my fork with all these changes. You will need to define the System.IProperty and System.Property types - it will work if you just paste the code.

json-mobx - Like React, but for Data (Part 2) 2017-02-15
Redux in Pieces 2017-01-28
Box 'em! - Property references for TypeScript 2017-01-11
TypeScript - What's up with this? 2017-01-01
MobX - Like React, but for Data 2016-12-28
Eventless - XAML Flavoured 2016-12-24
Immuto - Epilogue 2016-12-20
Immuto - Radical Unification 2016-09-22
Immuto - Working with React (An Example) 2016-09-16
TypeScript - What is a class? 2016-09-11
Immuto - Strongly Typed Redux Composition 2016-09-11
TypeScript and runtime typing - EPISODE II 2016-09-10
TypeScript and runtime typing 2016-09-04
What's good about Redux 2016-07-24
TypeScript multicast functions 2016-03-13
Introducing doop 2016-03-08
TypeScript is not really a superset of JavaScript and that is a Good Thing 2015-07-11
A new kind of managed lvalue pointer 2014-04-27
Using pointer syntax as a shorthand for IEnumerable 2014-04-26
Adding crazily powerful operator overloading to C# 6 2014-04-23
Introducing Carota 2013-11-04
Want to comment on anything? Create an issue!