DevTrends

Stop using AutoMapper in your Data Access Code

A few months ago, Jimmy Bogard, author of the excellent AutoMapper wrote a great article about Autoprojecting LINQ queries. Now that Jimmy has done all the hard expression tree work, this article extends his example to include caching and simple flattening capabilities and goes on to show it in use in a simple EF 4.1 Code First application.

Background

So what is this article all about? It is all about writing less boilerplate code in your data access layer and also reducing the amount of data being transfered from your database. Let's take an example so we can get some context about what this article covers:

Imagine that you have a Student table in your database with a million rows and 100 columns storing every piece of information you could imagine about each student. If you want to find out about one student, Joe Smith, how many rows do you return? One I hope. Similarly, if you want to produce a report listing the name of every student enrolled in a particular course, how many columns do you return from the student table? All 100? Let's hope not. As developers, we should always be striving to make our appications as efficient as possible, particularly when it comes to external resources such as databases. If you only need firstname and lastname, then that is all you should return.

If you follow this principle then the vast majority of your code will never work with the whole entity, but instead use just a few fields. The problem is that these subset of fields will be different for each discrete piece of functionality in the application and you will have a separate class for each scenario. Populating these classes is rather tiresome. If you are doing it efficiently, the following style of code will be familiar:

var studentAddressDetails = from s in context.Students
                            select new StudentAddressDetails
                            {
                              FirstName = s.FirstName,
                              LastName = s.LastName,
                              Address1 = s.Address1,
                              Address2 = s.Address2,
                              Address3 = s.Address3,
                              City = s.City,
                              County = s.County,
                              PostCode = s.PostCode,
                            };

There is nothing wrong with the code above. It works and is performant. It is just rather tedious to write these kind of projections throughout your DAL. This article explains how to replace this code, with the code displayed below:

var studentAddressDetails = context.Students.Project().To<StudentAddressDetails>();

The above code results in exactly the same SQL statement being generated and the performance should be very similar in both cases, but I know which one I would rather write as a developer.

Why can't we just use AutoMapper?

Whilst I am a big fan of AutoMapper and use it in most projects I work on, especially for Domain to ViewModel mapping, when in comes to data access code, AutoMapper is not so useful. To put it simply, AutoMapper only works with in memory data, not the IQueryable interface which is more typically used in DAL scenarios. In the data access layer, whether we are using Entity Framework, LINQ to SQL or NHibernate, we often use the IQueryable interface, specifiying what we want to query before the OR/M engine translates this into SQL and returns our data. If you use AutoMapper in your DAL however, you are almost certainly returning more data than you need from the database, as the mapping will not occur until AFTER the original query has executed and the data has been returned.

If we take the example used in the previous section, using AutoMapper we would probably write something similar to this:

// probably set at app startup in reality
Mapper.CreateMap<Student, StudentAddressDetails>();

var details = Mapper.Map<IEnumerable<Student>, IEnumerable<StudentAddressDetails>>(context.Students).ToList();

Unfortunately, if you fire up Query Analyser and see what is being executed, you will find that every column is being returned from the database before AutoMapper gets to work and performs the mapping. So whilst being convenient, in the DAL, AutoMapper can be rather inefficient.

Haven't I seen a solution to this before?

Jimmy is well-aware of this fact and his blog post describes an alternative to AutoMapper for DAL scenarios. The solution is basically just an extension method for IQueryable that allows you to specify exactly what to return when the query is executed. Whilst this sounds very simple, in reality, the code necessary for this extension method is not straightforward, making extensive use of expression trees which whilst very powerful, are notoriously difficult to work with. I would strongly advise you to take a look at his post before continuing with this article.

Jimmy's example code works well but, as he mentioned in his article, the code is more a proof of concept than a finished product. There are a few limitations that make it unusable in production code as it stands. Fortunately, all the difficult parts have already been done and it is very simple to modify the code so that it can be used in a commercial application, without worrying about performance.

Extending Jimmy's example

We will focus on improving three distinct areas of the code. Firstly, as Jimmy mentioned in his blog, his code is not very performant, generating the expression tree every time the extension method is used. We will add some caching to fix this issue. Secondly, the code is not tolerent of any missing properties, or even read-only properties. We will modify the code to only try and map writable properties and should a mapping not be found, we will simply move on to the next property and not throw an exception. Finally, we will allow some simple flattening of data using a convention over configuration approach.

It's tempting to try and combine the Project and To classes into one but as Jimmy pointed out, doing so makes calling it slightly more verbose because you have to specify both source and destination generic types when you use the method. This is because when it comes to generic type inference, .NET can either infer all the generic parameters or none. The standard approach in this situation is to split the functionality over two classes, which is exactly what Jimmy has done.

So with a single class, you would call you extension method as follows:

context.Students.ProjectTo<Student, StudentAddressDetails>();

But, by adding a second class, we can simplify the call to:

context.Students.Project().To<StudentAddressDetails>();

Most people would agree that the second option is a slightly nicer syntax, hence the reason for separate Project and To classes.

Fixing the caching issue is very straightforward. Before building the expresion, we check a static ExpressionCache dictionary object using a key generated from the source and destination types. If present, we use the cached version. Otherwise, we generate the expression and store it in the cache for future use. You can see the full code in the download below.

public IQueryable<TDest> To<TDest>()
{
  var queryExpression = GetCachedExpression<TDest>() ?? BuildExpression<TDest>();

  return _source.Select(queryExpression);
}        

private static Expression<Func<TSource, TDest>> GetCachedExpression<TDest>()
{
  var key = GetCacheKey<TDest>();

  return ExpressionCache.ContainsKey(key) 
    ? ExpressionCache[key] as Expression<Func<TSource, TDest>> 
    : null;
}

To fix the fault tolerance and flattening issues, we need to split up the Expression.Lambda statement so we can check a couple of different options for source parameters and prevent failure even if no source can be found. Note that I have shortened variable names in the code below to improve display on the website.

private static Expression<Func<TSource, TDest>> BuildExpression<TDest>()
{
  var srcProps = typeof(TSource).GetProperties();
  var destProps = typeof(TDest).GetProperties().Where(dest => dest.CanWrite);
  var paramExpr = Expression.Parameter(typeof(TSource), "src");

  var bindings = destProps
            .Select(destProp => BuildBinding(paramExpr, destProp, srcProps))
            .Where(binding => binding != null);

  var expression = Expression.Lambda<Func<TSource, TDest>>(
            Expression.MemberInit(
              Expression.New(typeof(TDest)), bindings), paramExpr);

  var key = GetCacheKey<TDest>();

  ExpressionCache.Add(key, expression);

  return expression;
}        

private static MemberAssignment BuildBinding(Expression paramExpr, MemberInfo destProp, 
                        IEnumerable<PropertyInfo> srcProps)
{
  var srcProp = srcProps.FirstOrDefault(src => src.Name == destProp.Name);

  if (srcProp != null)
  {
    return Expression.Bind(destProp, Expression.Property(paramExpr, srcProp));
  }

  var propNames = SplitCamelCase(destProp.Name);

  if (propNames.Length == 2)
  {
    srcProp = srcProps.FirstOrDefault(src => src.Name == propNames[0]);

    if (srcProp != null)
    {
      var srcChildProps = srcProp.PropertyType.GetProperties();
      var srcChildProp = srcChildProps.FirstOrDefault(src => src.Name == propNames[1]);

      if (srcChildProp != null)
      {
        return Expression.Bind(destProp, 
              Expression.Property(
                Expression.Property(paramExpr, srcProp), srcChildProp));
      }
    }
  }

  return null;
}

For each writable destination property we call BuildBinding which initially just looks at the source type for a similarly named property. If we do not find such as property, then we assume that we are trying to do flattening, so try to split the property name by CamelCase. If this results in two words, we try to find a property with the same name as the first word and if present, try to find a property on the nested type with the same name as the second word. This is better visualised with an example.

Take a destination property named CoursesCount. First we look for a property on the source named CoursesCount. If not present, splitting using CamelCase results in two words: Courses and Count. As the split results in two words, we continue and try to find a property on the source named Courses. If present, we will then try and find a property on the Courses type called Count. If present, then this is our mapping. Otherwise, we return null and the CoursesCount property will not be populated. The code provided allows one level of flattening, but it is pretty straightforward to adapt this to your needs.

Example usage in a simple EF 4.1 application

Now that we have our extension method, let's see how it can be used. Our example uses the latest Entity Framework 4.1 Code First, which is a fantastic addition to EF4 that has been extremely popular with developers since its release a few months ago. It is beyond the scope of this article to explain how EF 4.1 Code First works, so if you are not familiar with it, I suggest reading some of the excellent article available on the web, such as those by Scott Guthrie or the EF Design Blog.

Our simple example uses the following database schema:

Example database schema

Figure 1: Example database schema

We have a student database where each student can take multiple courses, but only has a single teacher. Our code first entities are displayed below:

public class Teacher
{
  public int Id { get; set; }
  public string Name { get; set; }
  public ICollection<Student> Students { get; set; }
}

public class Course
{
  public Course()
  {
    Students = new List<Student>();
  }

  public int Id{ get; set; }
  public string Description { get; set; }
  public ICollection<Student> Students { get; set; }
}

public class Student
{
  public Student()
  {
    Courses = new List<Course>();
  }

  public int Id { get; set; }
  public string FirstName { get; set; }
  public string LastName { get; set; }
  public DateTime DateOfBirth { get; set; }
  public ICollection<Course> Courses { get; set; }
  public Teacher Tutor { get; set; }
}

public class StudentSummary
{
  public string FirstName { get; set; }
  public string LastName { get; set; }
  public string NotADatabaseColumn { get; set; }
  public string TutorName { get; set; }
  public int CoursesCount { get; set; }

  public string FullName
  {
    get { return string.Format("{0} {1}", FirstName, LastName); }
  }
}

public class AnotherStudentSummary
{
  public AnotherStudentSummary()
  {
    Courses = new List<Course>();
  }

  public string LastName { get; set; }
  public ICollection<Course> Courses { get; set; }
}

Teacher, Course and Student classes correspond to database tables of the same name. StudentSummary and AnotherStudentSummary are examples of non-database classes used as projection targets. You can imagine that in a real-life scenario, all the tables would have many more columns, so efficient querying would be much more important.

The standard approach for returning all students as StudentSummary entities, would be to do a manual projection:

var students = from s in context.Students
               select new StudentSummary
               {
                 FirstName = s.FirstName,
                 LastName = s.LastName,
                 TutorName = s.Tutor.Name,
                 CoursesCount = s.Courses.Count
               };

This query seen on its own doesn't look particularly cumbersome, but imagine the dozens or even hundreds of projections that you will require on a large project, typically mapping many more properties than shown in this examples and you will understand how useful this kind of extension method can be. The previous projection can be replaced with just one line:

var students = context.Students.Project().To<StudentSummary>(); 

If we look at the SQL generated from this query, we can see that we are returning the minimum necessary to hydrate the StudentSummary object. In fact, the SQL output is exactly the same in both cases:

SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[FirstName] AS [FirstName], 
    [Extent1].[LastName] AS [LastName], 
    [Extent2].[Name] AS [Name], 
    (SELECT 
      COUNT(1) AS [A1]
      FROM [dbo].[StudentCourses] AS [Extent3]
      WHERE [Extent1].[Id] = [Extent3].[StudentId]) AS [C1]
    FROM  [dbo].[Students] AS [Extent1]
    LEFT OUTER JOIN [dbo].[Teachers] AS [Extent2] ON [Extent1].[TeacherId] = [Extent2].[Id]

Taking each property of the StudentSummary object in turn, let's examine how they are being mapped. FirstName and LastName are just standard, same-name mappings from the database, NotADatabaseColumn is a read/write property that is not present in the database. There are many reasons why you may have properties that do not map to database columns and you do not want the projection failing in these cases. TutorName is an example of a flattening. The Student class has a Tutor property of type Teacher and the Teacher class has a Name property, so you can see from the SQL that we join onto the Teacher table and retrieve the Name property. CoursesCount is another flattening. In this case, Courses is an ICollection which of course has a Count property. Again, you can examine the SQL and see that this is being done in a very efficient manner. Finally, FullName is a read-only property, so is ignored by the extension method.

The second example which maps to the AnotherStudentSummary entity shows how simple collection mapping is also supported. Note that if you want to do anything more than map the whole collection without manipulation, then you will need to adapt the code yourself as this is not currently supported.

Code Download

You can download all the code from this article here. Alternatively, just get the QueryableExtensions class without the example code.

Conclusion

When hitting a database, as a developer, it is important to only return the data that you need and no more. When using modern ORM's, this is typically achieved by using projections when writing IQueryable queries. Typing this type of projection code can be tiresome and some people use AutoMapper instead, believing that it achieves the same thing. Unfortunately, AutoMapper knows nothing about IQueryable and only works with in-memory data, making it less than ideal in DAL scenarios. In order for AutoMapper to do its mapping, it needs to retreive all source data from the database, resulting in much more data being returned than is necessary, reducing performance and increasing database load and network traffic.

The solution mentioned in this article, which was originally conceived by Jummy Bogard, addresses these issues (boilerplate code vs unnecessarily large database queries), and allows you to automatically map types in your DAL in a simple, efficient manner. The code required to perform a mapping is measured in characters rather than lines and the performance is right up there with the long-winded, manual projection method. The approach is entirely convention-based with no mechanism to override the conventions, so it is certainly not going to be useful in all situations, but I have found it invaluable in the few weeks that I have been using it.

Useful or Interesting?

If you liked the article, we would really appreciate it if you would share it with your Twitter followers.

Share on Twitter

Comments

Comments are now closed for this article.