Stop using AutoMapper in your Data Access Code

A few months ago, Jimmy Bogard, author of the excellent AutoMapper wrote a great article about Autoprojecting LINQ queries. Now that Jimmy has done all the hard expression tree work, this article extends his example to include caching and simple flattening capabilities and goes on to show it in use in a simple EF 4.1 Code First application.

*** UPDATE: AutoMapper has now been updated to include Queryable Extensions. You can read this article to find out why you should not be using Mapper.Map in your data access code. Then head over to the docs for information on using ProjectTo<> ***

Background

So what is this article all about? It is all about writing less boilerplate code in your data access layer and also reducing the amount of data being transferred from your database. Let's take an example so we can get some context about what this article covers:

Imagine that you have a Student table in your database with a million rows and 100 columns storing every piece of information you could imagine about each student. If you want to find out about one student, Joe Smith, how many rows do you return? One I hope. Similarly, if you want to produce a report listing the name of every student enrolled in a particular course, how many columns do you return from the student table? All 100? Let's hope not. As developers, we should always be striving to make our applications as efficient as possible, particularly when it comes to external resources such as databases. If you only need firstname and lastname, then that is all you should return.

If you follow this principle then the vast majority of your code will never work with the whole entity, but instead use just a few fields. The problem is that these subset of fields will be different for each discrete piece of functionality in the application and you will have a separate class for each scenario. Populating these classes is rather tiresome. If you are doing it efficiently, the following style of code will be familiar:

var studentAddressDetails = from s in context.Students
                            select new StudentAddressDetails
                            {
                              FirstName = s.FirstName,
                              LastName = s.LastName,
                              Address1 = s.Address1,
                              Address2 = s.Address2,
                              Address3 = s.Address3,
                              City = s.City,
                              County = s.County,
                              PostCode = s.PostCode,
                            };

There is nothing wrong with the code above. It works and is performant. It is just rather tedious to write these kind of projections throughout your DAL. This article explains how to replace this code, with the code displayed below:

var studentAddressDetails = context.Students.Project().To<StudentAddressDetails>();

The above code results in exactly the same SQL statement being generated and the performance should be very similar in both cases, but I know which one I would rather write as a developer.

Why can't we just use AutoMapper?

Whilst I am a big fan of AutoMapper and use it in most projects I work on, especially for Domain to ViewModel mapping, when in comes to data access code, AutoMapper is not so useful. To put it simply, AutoMapper only works with in-memory data, not the IQueryable interface which is more typically used in DAL scenarios. In the data access layer, whether we are using Entity Framework, LINQ to SQL or NHibernate, we often use the IQueryable interface, specifiying what we want to query before the ORM engine translates this into SQL and returns our data. If you use AutoMapper in your DAL however, you are almost certainly returning more data than you need from the database, as the mapping will not occur until AFTER the original query has executed and the data has been returned.

If we take the example used in the previous section, using AutoMapper we would probably write something similar to this:

// probably set at app startup in reality
Mapper.CreateMap<Student, StudentAddressDetails>();

var details = Mapper.Map<IEnumerable<Student>, IEnumerable<StudentAddressDetails>>(context.Students).ToList();

Unfortunately, if you fire up Query Analyser and see what is being executed, you will find that every column is being returned from the database before AutoMapper gets to work and performs the mapping. So whilst being convenient, in the DAL, AutoMapper can be rather inefficient.

Haven't I seen a solution to this before?

Jimmy is well-aware of this fact and his blog post describes an alternative to AutoMapper for DAL scenarios. The solution is basically just an extension method for IQueryable that allows you to specify exactly what to return when the query is executed. Whilst this sounds very simple, in reality, the code necessary for this extension method is not straightforward, making extensive use of expression trees which whilst very powerful, are notoriously difficult to work with. I would strongly advise you to take a look at his post before continuing with this article.

Jimmy's example code works well but, as he mentioned in his article, the code is more a proof of concept than a finished product. There are a few limitations that make it unusable in production code as it stands. Fortunately, all the difficult parts have already been done and it is very simple to modify the code so that it can be used in a commercial application, without worrying about performance.

Extending Jimmy's example

We will focus on improving three distinct areas of the code. Firstly, as Jimmy mentioned in his blog, his code is not very performant, generating the expression tree every time the extension method is used. We will add some caching to fix this issue. Secondly, the code is not tolerant of any missing properties, or even read-only properties. We will modify the code to only try and map writable properties and should a mapping not be found, we will simply move on to the next property and not throw an exception. Finally, we will allow some simple flattening of data using a convention over configuration approach.

It's tempting to try and combine the Project and To classes into one but as Jimmy pointed out, doing so makes calling it slightly more verbose because you have to specify both source and destination generic types when you use the method. This is because when it comes to generic type inference, .NET can either infer all the generic parameters or none. The standard approach in this situation is to split the functionality over two classes, which is exactly what Jimmy has done.

So with a single class, you would call you extension method as follows:

context.Students.ProjectTo<Student, StudentAddressDetails>();
	

But, by adding a second class, we can simplify the call to:

context.Students.Project().To<StudentAddressDetails>();
	

Most people would agree that the second option is a slightly nicer syntax, hence the reason for separate Project and To classes.

Fixing the caching issue is very straightforward. Before building the expression, we check a static ExpressionCache dictionary object using a key generated from the source and destination types. If present, we use the cached version. Otherwise, we generate the expression and store it in the cache for future use. You can see the full code in the download below.

public IQueryable<TDest> To<TDest>()
{
  var queryExpression = GetCachedExpression<TDest>() ?? BuildExpression<TDest>();

  return _source.Select(queryExpression);
}        

private static Expression<Func<TSource, TDest>> GetCachedExpression<TDest>()
{
  var key = GetCacheKey<TDest>();

  return ExpressionCache.ContainsKey(key) 
    ? ExpressionCache[key] as Expression<Func<TSource, TDest>> 
    : null;
}

To fix the fault tolerance and flattening issues, we need to split up the Expression.Lambda statement so we can check a couple of different options for source parameters and prevent failure even if no source can be found. Note that I have shortened variable names in the code below to improve display on the website.

private static Expression<Func<TSource, TDest>> BuildExpression<TDest>()
{
  var srcProps = typeof(TSource).GetProperties();
  var destProps = typeof(TDest).GetProperties().Where(dest => dest.CanWrite);
  var paramExpr = Expression.Parameter(typeof(TSource), "src");
  
  var bindings = destProps
            .Select(destProp => BuildBinding(paramExpr, destProp, srcProps))
            .Where(binding => binding != null);

  var expression = Expression.Lambda<Func<TSource, TDest>>(
            Expression.MemberInit(
            Expression.New(typeof(TDest)), bindings), paramExpr);

  var key = GetCacheKey<TDest>();

  ExpressionCache.Add(key, expression);

  return expression;
}        

private static MemberAssignment BuildBinding(Expression paramExpr, MemberInfo destProp, 
  IEnumerable<PropertyInfo> srcProps)
{
  var srcProp = srcProps.FirstOrDefault(src => src.Name == destProp.Name);

  if (srcProp != null)
  {
    return Expression.Bind(destProp, Expression.Property(paramExpr, srcProp));
  }

  var propNames = SplitCamelCase(destProp.Name);

  if (propNames.Length == 2)
  {
    srcProp = srcProps.FirstOrDefault(src => src.Name == propNames[0]);

    if (srcProp != null)
    {
      var srcChildProps = srcProp.PropertyType.GetProperties();
      var srcChildProp = srcChildProps.FirstOrDefault(src => src.Name == propNames[1]);

      if (srcChildProp != null)
      {
        return Expression.Bind(destProp, 
              Expression.Property(
                Expression.Property(paramExpr, srcProp), srcChildProp));
      }
    }
  }

  return null;
}

For each writable destination property we call BuildBinding which initially just looks at the source type for a similarly named property. If we do not find such as property, then we assume that we are trying to do flattening, so try to split the property name by CamelCase. If this results in two words, we try to find a property with the same name as the first word and if present, try to find a property on the nested type with the same name as the second word. This is better visualised with an example.

Take a destination property named CoursesCount. First we look for a property on the source named CoursesCount. If not present, splitting using CamelCase results in two words: Courses and Count. As the split results in two words, we continue and try to find a property on the source named Courses. If present, we will then try and find a property on the Courses type called Count. If present, then this is our mapping. Otherwise, we return null and the CoursesCount property will not be populated. The code provided allows one level of flattening, but it is pretty straightforward to adapt this to your needs.

Example usage in a simple EF 4.1 application

Now that we have our extension method, let's see how it can be used. Our example uses the latest Entity Framework 4.1 Code First, which is a fantastic addition to EF4 that has been extremely popular with developers since its release a few months ago. It is beyond the scope of this article to explain how EF 4.1 Code First works, so if you are not familiar with it, I suggest reading some of the excellent article available on the web, such as those by Scott Guthrie or the EF Design Blog.

Our simple example uses the following database schema:

Example database schema

We have a student database where each student can take multiple courses, but only has a single teacher. Our code first entities are displayed below:

public class Teacher
{
  public int Id { get; set; }
  public string Name { get; set; }
  public ICollection<Student> Students { get; set; }
}

public class Course
{
  public Course()
  {
    Students = new List<Student>();
  }

  public int Id{ get; set; }
  public string Description { get; set; }
  public ICollection<Student> Students { get; set; }
}

public class Student
{
  public Student()
  {
    Courses = new List<Course>();
  }

  public int Id { get; set; }
  public string FirstName { get; set; }
  public string LastName { get; set; }
  public DateTime DateOfBirth { get; set; }
  public ICollection<Course> Courses { get; set; }
  public Teacher Tutor { get; set; }
}

public class StudentSummary
{
  public string FirstName { get; set; }
  public string LastName { get; set; }
  public string NotADatabaseColumn { get; set; }
  public string TutorName { get; set; }
  public int CoursesCount { get; set; }

  public string FullName
  {
    get { return string.Format("{0} {1}", FirstName, LastName); }
  }
}

public class AnotherStudentSummary
{
  public AnotherStudentSummary()
  {
    Courses = new List<Course>();
  }

  public string LastName { get; set; }
  public ICollection<Course> Courses { get; set; }
}

Teacher, Course and Student classes correspond to database tables of the same name. StudentSummary and AnotherStudentSummary are examples of non-database classes used as projection targets. You can imagine that in a real-life scenario, all the tables would have many more columns, so efficient querying would be much more important.

The standard approach for returning all students as StudentSummary entities, would be to do a manual projection:

var students = from s in context.Students
               select new StudentSummary
               {
                 FirstName = s.FirstName,
                 LastName = s.LastName,
                 TutorName = s.Tutor.Name,
                 CoursesCount = s.Courses.Count
               };

This query seen on its own doesn't look particularly cumbersome, but imagine the dozens or even hundreds of projections that you will require on a large project, typically mapping many more properties than shown in this examples and you will understand how useful this kind of extension method can be. The previous projection can be replaced with just one line:

var students = context.Students.Project().To<StudentSummary>();	

If we look at the SQL generated from this query, we can see that we are returning the minimum necessary to hydrate the StudentSummary object. In fact, the SQL output is exactly the same in both cases:

SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[FirstName] AS [FirstName], 
    [Extent1].[LastName] AS [LastName], 
    [Extent2].[Name] AS [Name], 
    (SELECT 
      COUNT(1) AS [A1]
      FROM [dbo].[StudentCourses] AS [Extent3]
      WHERE [Extent1].[Id] = [Extent3].[StudentId]) AS [C1]
    FROM  [dbo].[Students] AS [Extent1]
    LEFT OUTER JOIN [dbo].[Teachers] AS [Extent2] ON [Extent1].[TeacherId] = [Extent2].[Id]

Taking each property of the StudentSummary object in turn, let's examine how they are being mapped. FirstName and LastName are just standard, same-name mappings from the database, NotADatabaseColumn is a read/write property that is not present in the database. There are many reasons why you may have properties that do not map to database columns and you do not want the projection failing in these cases. TutorName is an example of a flattening. The Student class has a Tutor property of type Teacher and the Teacher class has a Name property, so you can see from the SQL that we join onto the Teacher table and retrieve the Name property. CoursesCount is another flattening. In this case, Courses is an ICollection which of course has a Count property. Again, you can examine the SQL and see that this is being done in a very efficient manner. Finally, FullName is a read-only property, so is ignored by the extension method.

The second example which maps to the AnotherStudentSummary entity shows how simple collection mapping is also supported. Note that if you want to do anything more than map the whole collection without manipulation, then you will need to adapt the code yourself as this is not currently supported.

Code Download

You can download all the code from this article here. Alternatively, just get the QueryableExtensions class without the example code.

Conclusion

When hitting a database, as a developer, it is important to only return the data that you need and no more. When using modern ORM's, this is typically achieved by using projections when writing IQueryable queries. Typing this type of projection code can be tiresome and some people use AutoMapper instead, believing that it achieves the same thing. Unfortunately, AutoMapper knows nothing about IQueryable and only works with in-memory data, making it less than ideal in DAL scenarios. In order for AutoMapper to do its mapping, it needs to retrieve all source data from the database, resulting in much more data being returned than is necessary, reducing performance and increasing database load and network traffic.

The solution mentioned in this article, which was originally conceived by Jimmy Bogard, addresses these issues (boilerplate code vs unnecessarily large database queries), and allows you to automatically map types in your DAL in a simple, efficient manner. The code required to perform a mapping is measured in characters rather than lines and the performance is right up there with the long-winded, manual projection method. The approach is entirely convention-based with no mechanism to override the conventions, so it is certainly not going to be useful in all situations, but I have found it invaluable in the few weeks that I have been using it.

Useful or Interesting?

If you liked the article, I would really appreciate it if you could share it with your Twitter followers.

Share on Twitter

Comments

Avatar for Paul Paul wrote on 06/06/2011

This was an awesome read and is going to save me a lot of time. One of the main reasons I don't like using DTO's was having to write these exceptionally long LINQ queries, almost took the joy out of LINQ.

Thanks again.

Avatar for Sangsu Sangsu wrote on 15/06/2011

This is good approach to make more copact Linq & IQueryable. I'll try your code.

Thanks good post.

Avatar for Daniel Liuzzi Daniel Liuzzi wrote on 20/06/2011

This is an amazing piece of work Paul. I have a question. Do you know if it's possible to re utilize all those cached expressions to do reverse mappings? In other words, using your code I now can auto map POCOs to ViewModels, but it would be great if I could map a ViewModel instance from a POST back to a POCO instance I can feed into EF.

Avatar for Daniel Liuzzi Daniel Liuzzi wrote on 20/06/2011

Regarding my previous question, I came up with the following workaround:

var studentEntity = new[] { studentModel }.AsQueryable().Project().To&lt;Student&gt;().Single();

This works, but it's such a blatant hack I'm almost embarrassed to post it :)

Avatar for Paul Hiles Paul Hiles wrote on 23/06/2011

Regarding cached expressions, we cannot reuse them for reverse mappings because the direction of the mapping results in a different expression tree, so unfortunately they must be cached separately.

As you have discovered, whilst you can use this code outside of an IQueryable scenario by casting, it is quite awkward, but as you say, it does work. I typically use the Project().To() code to map to domain entities and then use AutoMapper to map to and from ViewModels.

Avatar for leniency leniency wrote on 01/08/2011

I made some modifications to allow property names to have arbitrarily long CamelCasing.

https://gist.github.com/1119157

I added a function that does a recursive search for the first property matching the destination name. It has a limitation that it will match along a path with the shortest property name first, so if you have two possible paths to a destination name, it could give unexpected results.

Avatar for Tej Sarao Tej Sarao wrote on 09/09/2011

Thanks for the excellent piece of work.
I added a feature to define the mapping in a attribute with the property. The attribute looks like

public class MappingPathAttribute : Attribute
{
public string Path = string.Empty;
}


[MappingPath(Path="Tutor.Name")]
public string TutorName { get; set; }

I made a change to the BuildBinding function in the github version https://gist.github.com/1119157.

My BuildBinding function looks like

private static MemberAssignment BuildBinding(Expression parameterExpression, MemberInfo destinationProperty, IEnumerable<PropertyInfo> sourceProperties)
{
object obj = destinationProperty.GetCustomAttributes(true).FirstOrDefault(p => p.GetType().Name == "MappingPathAttribute");
string[] sections;
if (obj!=null)
{
string path = ((MappingPathAttribute)obj).Path;
sections = path.Split('.');
}
else
{
sections = SplitCamelCase(destinationProperty.Name);
}
return ResolveProperty(
parameterExpression,
destinationProperty,
sections[0],
1,
sourceProperties,
sections.ToArray());
}

Hope this helps.
Tej

Avatar for Ryan O'Neill Ryan O'Neill wrote on 17/09/2011

I'm impressed, this is going to save me acres of code. I think Entity Framework is excellent but it is missing stuff like this at version 4.2 (not that I blame them, it's going to get there).

Avatar for melnac melnac wrote on 22/09/2011

Very usuful code. I have only a problem when i try it with nullable property in the source object. Are there some workaround for this scenario ?

Thanks.

Avatar for NJ NJ wrote on 17/11/2011

I'm trying to use this in .net 3.5

On Line 53 I get following error:
Error 2 Argument '2': cannot convert from 'System.Collections.Generic.IEnumerable<System.Linq.Expressions.MemberAssignment>' to 'System.Collections.Generic.IEnumerable<System.Linq.Expressions.MemberBinding>' D:\xxx\Extensions\QueryableExtensions.cs 53 123 D:\xxx\

Avatar for Paul Hiles Paul Hiles wrote on 22/11/2011

@NJ - Yes the code relies on generic variance in .NET 4. To get it working for .NET 3.5, I think you can just add the following on to the end of the var bindings = ... statement:

.Cast<MemberBinding>();

Avatar for Dan Plaskon Dan Plaskon wrote on 02/12/2011

It should be noted that the Dictionary used for the cache is not threadsafe; so you could run into issues (as I did) using this from ASP.NET. This could probably be replaced with either a Hashtable.Synchronized instance, or a ConcurrentDictionary (.NET 4.0 only, I believe)

Avatar for jrummell jrummell wrote on 07/12/2011

This is great, thanks for sharing! Do you have any ideas for "configured projections"? That is, when the entity property names don't match the projected type's property names:

.ForMember(p => p.Property, options => options.MapFrom(e => e.OtherProperty))

Avatar for Andrei Ignat Andrei Ignat wrote on 12/01/2012

How could we bind sub-collections?

Avatar for KG KG wrote on 08/02/2012

I'm also interested in how you'd bind sub-collections

Avatar for MG MG wrote on 19/02/2012

This is great but as it seems only for simple bindings. So how could we bind sub-collections?

Avatar for Luis Fernando Luis Fernando wrote on 26/02/2012

Maybe you already know, but this functionality is already in the main repository of AutoMapper.

Take a look here: https://github.com/AutoMapper/AutoMapper/blob/master/src/AutoMapper/QueryableExtensions.cs

And here (on "CreateMapExpression" method on line 202) : https://github.com/AutoMapper/AutoMapper/blob/master/src/AutoMapper/MappingEngine.cs

Avatar for SeriousM SeriousM wrote on 15/03/2012

Hi, good article but please find another motivation to do stuff like that instead of:
"Imagine that you have a Student table in your database with a million rows and 100 columns storing every piece of information you could imagine about each student."
This sentence just tell me that you have no idea how to normalize your database.

Avatar for Magnus Gladh Magnus Gladh wrote on 21/03/2012

To SeriousM:

Normalization of the database is greate in some scenarios and not so greate in other scenarios.

I have an object that I want to store to my databas and if I normalizes it as I should (or as Entity Framework wants to) then I will get around 20 tables. That's not a problem if I wasn't added around 3 millions of records each month to the database. So If I don't want to kill my database and doing 20 Inner joins when I want to get that object. I need to denormalize my data and tables. Some data will simple be serialized down as a blob to a column and other tables will be merged togheter to decreese the numbers of tables and joins I need to do to get the full object back later on.

So normalization is good in some scenarios, but denormalized is better if you have performance issues...

CQRS actually have a nice touch of denormalized data where each read question is it's own table. Performance is great with this approach, but it has it's cost!!

Avatar for Heribert Heribert wrote on 19/04/2012

Absolutely great and simple to use!!!
Saved me enormous amount of work!

Thanks for sharing this.

Avatar for Martin Martin wrote on 19/04/2012

This is great, but I wonder: Is it possible to make this work with nested types too?

I have the database objects House, and Garage. House has a relation to Garage. A house has one garage, and for both I have DTO objects.

But running the projection then returns me:
System.NotSupportedException
The LINQ expression node type 'Invoke' is not supported in LINQ to Entities.

Avatar for Stuart Ferguson Stuart Ferguson wrote on 18/06/2012

Has anybody managed to make an update to allow usage with Nested Objects so that they may be projected to as well?

Avatar for John John wrote on 23/04/2013

I'm getting the following error: "The method 'Where' cannot follow the method 'Select' or is not supported.".

see question on Stackoverflow: http://stackoverflow.com/q/16174351/1133338

Avatar for Udik Udik wrote on 09/05/2013

Am I the only one who sees the creation of independent classes for each column combination as a huge code smell? If your Student record has 100 columns, there are 2^100 possible partial classes. Every time you change your db structure (for example modifiy a field type) or want to add a behaviour you'll have to change all the classes that include that field or that behaviour. Every time you want a new field in your web report, you'll need to modify the models of your app. I recognize straight away an application that will become a maintenance nightmare in a few years.

Avatar for Rafa Rafa wrote on 04/07/2013

Hi, THIS IS AN EXCELLENT ARTICLE! really out of box.. but I have a question with "CoursesCount is another flattening. In this case, Courses is an ICollection which of course has a Count property" still I cannot see Hhow or when you call or somethong the .Count() method .... =S

You said that as Course is an ICollection has a Count() Method ,that's ok but where or how is call .....

bottomline ..WE SHOULD NOT USE AutoMapper?

Thanks for the help!

Comments are now closed for this article.