Wednesday, February 27, 2008

Derived Properties in LINQ to SQL

When using a rich domain model, you often end up with properties that are derived from other persisted values. To take a trivial example:

public bool HasMiddleName
{
    get { return MiddleName != ""; }
}

where MiddleName is a persisted property (column in the database, mapped by LINQ, etc.).

This works fine in object land, but breaks down when you need to query the database based on this property. LINQ to SQL can't read into the definition of the property itself and figure out what to do. If you try to run something that looks like:

setOfObjects.Where(o => o.HasMiddleName)

you'll get an exception saying that LINQ to SQL doesn't know how to translate HasMiddleName.

Now you're left with a choice: either duplicate the implementation of your property in the query, or come up with something clever. Starting with the assumption that duplicating domain knowledge is bad, let's explore the clever path.

Our particular architecture builds up queries using IQueryable, to make sure that as much of our logic as possible is executed by the database, rather than in memory. IQueryable<T>.Where takes an Expression<Func<T, bool>>, basically an expression that represents a function taking your object and returning a boolean. In the normal case, you just pass a lambda to Where and the compiler builds the expression for you. In our case, we have to build the expression ourselves. Back in the domain object we do:
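To make the delegate-versus-expression distinction concrete, here is a minimal sketch. It makes a few assumptions that are not part of our real code: MyObject is reduced to a single MiddleName property, and an in-memory list wrapped with AsQueryable() stands in for the LINQ to SQL table so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Stand-in for the mapped domain class (assumption: only MiddleName matters here).
public class MyObject
{
    public string MiddleName { get; set; }
}

public static class ExpressionDemo
{
    public static void Main()
    {
        // Assigned to a delegate type, the lambda becomes compiled code
        // that a query provider cannot look inside.
        Func<MyObject, bool> asDelegate = o => o.MiddleName != "";

        // Assigned to Expression<...>, the same lambda becomes a data
        // structure (an expression tree) that a provider can inspect.
        Expression<Func<MyObject, bool>> asExpression = o => o.MiddleName != "";

        // The body of the tree is a "not equal" node.
        Console.WriteLine(asExpression.Body.NodeType); // NotEqual

        // In-memory stand-in for the table (assumption for this sketch).
        IQueryable<MyObject> setOfObjects = new List<MyObject>
        {
            new MyObject { MiddleName = "Ann" },
            new MyObject { MiddleName = "" }
        }.AsQueryable();

        // Binds to Queryable.Where: the provider receives the tree.
        Console.WriteLine(setOfObjects.Where(asExpression).Count()); // 1

        // Binds to Enumerable.Where: the filter runs in memory.
        Console.WriteLine(setOfObjects.Where(asDelegate).Count()); // 1
    }
}
```

Both calls return the same rows here, but only the first hands the provider something it could turn into SQL.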

public static readonly Expression<Func<MyObject,bool>>
    HasMiddleNameExpr = o => o.MiddleName != "";

Now we can write queries that look like:

var result = setOfObjects.Where(MyObject.HasMiddleNameExpr);

and things will run just fine. Great! Except we still have duplication: our object has a property with logic in it, and the definition of the expression with the logic in it. Let's remove that.

First, since Expressions can't be invoked directly, save a compiled version of the expression (i.e., a Func<MyObject,bool>).

public static readonly Func<MyObject,bool> HasMiddleNameFunc =
    HasMiddleNameExpr.Compile();

Now, define the property in terms of the Func.

public bool HasMiddleName
{
    get { return HasMiddleNameFunc(this); }
}

Now all the duplication is removed, but at the cost of quite a few hoops: instead of just a property we now have a property, a static function, and a static expression representing the function.
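Put together, the three pieces might look like the sketch below. As before, the class is trimmed to one property and the query runs against an in-memory list via AsQueryable(), both assumptions made so the example is self-contained. One detail worth calling out: static field initializers run in the order they appear in the class, so the Func has to be declared after the expression it compiles.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class MyObject
{
    public string MiddleName { get; set; }

    // The single definition of the rule, as an expression tree.
    public static readonly Expression<Func<MyObject, bool>> HasMiddleNameExpr =
        o => o.MiddleName != "";

    // Must come after HasMiddleNameExpr: static field initializers run in
    // textual order, so the expression would still be null if this came first.
    public static readonly Func<MyObject, bool> HasMiddleNameFunc =
        HasMiddleNameExpr.Compile();

    // The property delegates to the compiled expression.
    public bool HasMiddleName
    {
        get { return HasMiddleNameFunc(this); }
    }
}

public static class DerivedPropertyDemo
{
    public static void Main()
    {
        var people = new[]
        {
            new MyObject { MiddleName = "Ann" },
            new MyObject { MiddleName = "" }
        };

        // Object land: the property works as before.
        Console.WriteLine(people[0].HasMiddleName); // True
        Console.WriteLine(people[1].HasMiddleName); // False

        // Query land: the same rule, passed as an expression.
        // (In-memory stand-in for the LINQ to SQL table.)
        Console.WriteLine(
            people.AsQueryable().Where(MyObject.HasMiddleNameExpr).Count()); // 1
    }
}
```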

Unfortunately, it gets worse. The query we were writing depended on two derived properties, not just one: we wanted all objects where either property was true. Where takes a single expression, so we couldn't just pass one of ours; we had to build the or expression ourselves. It looks something like this in the query:

ParameterExpression p =
    Expression.Parameter(typeof(MyObject), "p");

var result = setOfObjects.Where(
    Expression.Lambda<Func<MyObject,bool>>(
        Expression.OrElse(
            Expression.Invoke(MyObject.HasMiddleNameExpr, p),
            Expression.Invoke(MyObject.HasLastNameExpr, p)),
        p));

What this does is build an expression that represents something like the following (the tree actually invokes the expressions rather than the compiled Funcs):

p => MyObject.HasMiddleNameFunc(p) ||
MyObject.HasLastNameFunc(p)

Combining two Expression<Func<T,bool>> predicates with a logical operator could probably be hidden behind an extension method to clean things up a bit.
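A rough sketch of what such an extension method might look like follows. The name Or and the PredicateExtensions class are hypothetical, HasLastNameExpr is assumed to be defined by analogy with HasMiddleNameExpr, and the query again runs against an in-memory list so it can be tried without a database. We did not ship this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateExtensions
{
    // Hypothetical helper: combines two predicates with || by invoking
    // both against one shared parameter, as in the hand-built query.
    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        ParameterExpression p = Expression.Parameter(typeof(T), "p");
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(
                Expression.Invoke(left, p),
                Expression.Invoke(right, p)),
            p);
    }
}

public class MyObject
{
    public string MiddleName { get; set; }
    public string LastName { get; set; }

    public static readonly Expression<Func<MyObject, bool>> HasMiddleNameExpr =
        o => o.MiddleName != "";

    // Assumed definition, mirroring HasMiddleNameExpr.
    public static readonly Expression<Func<MyObject, bool>> HasLastNameExpr =
        o => o.LastName != "";
}

public static class OrDemo
{
    public static void Main()
    {
        // In-memory stand-in for the table (assumption for this sketch).
        IQueryable<MyObject> setOfObjects = new List<MyObject>
        {
            new MyObject { MiddleName = "Ann", LastName = "" },
            new MyObject { MiddleName = "", LastName = "Lee" },
            new MyObject { MiddleName = "", LastName = "" }
        }.AsQueryable();

        var result = setOfObjects.Where(
            MyObject.HasMiddleNameExpr.Or(MyObject.HasLastNameExpr));

        Console.WriteLine(result.Count()); // 2
    }
}
```

The call site reads much better, but the helper itself is still expression-tree plumbing that someone has to own.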

Ultimately, though, we decided that there was no way a junior developer reading this code in the future would be able to understand it, much less maintain it. After all this exploration, we opted just to duplicate the domain logic in the query.