I have been using Entity Framework for a few years. I have flip-flopped between calling out to repositories in my business logic and using lazy loading to retrieve data as I work my way through the code.
The problem with the first approach is that a single business object can end up relying on a lot of repositories (in the case of a repository per table). The resulting logic is heavily mixed with calls out to the data layer, which makes it harder to test (lots of mocks).
The problem with the second approach is that lazy loading sometimes leads to surprises. Novices call out to the database within loops without realizing it. Even using Include requires a lot of care. Too many Includes and the generated query can be very slow.
So I was getting a little frustrated. A few weeks ago, I discovered the newer DbEntityEntry<TEntity> classes in Entity Framework. This allowed me to load related entities explicitly and check if they were already loaded. This led to a hybrid approach where I pass a single root object and a "loader" class. Before accessing a navigation property, I would ask the loader to load it. If the object was already loaded, it would just do nothing. Since it didn't cost anything, I just called it every time I referenced a navigation property.
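A minimal sketch of that load-if-needed idea, assuming EF 4.1 or later (the DbContext API). The class name SimpleEntityLoader and its method names are made up for this illustration; this is not the code from my project, and it assumes the entity is already tracked by the context.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq.Expressions;

// Hypothetical helper for illustration only.
public class SimpleEntityLoader<TEntity> where TEntity : class
{
    private readonly DbContext context;

    public SimpleEntityLoader(DbContext context)
    {
        if (context == null) throw new ArgumentNullException("context");
        this.context = context;
    }

    // Loads a reference navigation property (e.g. user.Role)
    // unless EF already reports it as loaded.
    public void Load<TRelated>(
        TEntity entity,
        Expression<Func<TEntity, TRelated>> accessor)
        where TRelated : class
    {
        var reference = context.Entry(entity).Reference(accessor);
        if (!reference.IsLoaded)
        {
            reference.Load(); // hits the database only when not yet loaded
        }
    }

    // Same idea for a collection navigation property (e.g. user.Orders).
    public void LoadCollection<TRelated>(
        TEntity entity,
        Expression<Func<TEntity, ICollection<TRelated>>> accessor)
        where TRelated : class
    {
        var collection = context.Entry(entity).Collection(accessor);
        if (!collection.IsLoaded)
        {
            collection.Load();
        }
    }
}
```

Calling Load a second time for the same property skips the query because IsLoaded is already true, which is why it is cheap to call before every navigation property access.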
I have added this class to an existing open source project for anyone wanting to see it: https://github.com/jehugaleahsa/TestMvcApplication/blob/master/DataModeling/EntityLoader.cs.
A co-worker pointed out this class really only solved some problems. Just like lazy loading, it meant either using Includes or dealing with multiple database hits within a loop. Then the other day I had another idea: I could grab the related entities for a collection of entities. For instance, getting all of the roles for a list of users. This way, I can grab all of the related entities in a single database hit outside of a loop.
I ended up with this class: https://github.com/jehugaleahsa/TestMvcApplication/blob/master/DataModeling/EntityCollectionLoader.cs. This class dynamically builds a query based on the entities. Under the hood, it builds a join that filters the related entities by the main entities' primary keys.
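To make the single-hit idea concrete, here is a hand-written sketch for one specific relationship. The User and Role entities, the RoleId foreign key, and the RoleBatchLoader name are all assumptions for illustration. Unlike the linked class, which builds its query dynamically and filters by the main entities' primary keys, this version simply filters roles by the foreign keys found on the users. It assumes the users are already tracked by the context, so that EF's relationship fix-up can connect each user to its role once the roles arrive.

```csharp
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Hypothetical entities, for illustration only.
public class Role
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class User
{
    public int Id { get; set; }
    public int RoleId { get; set; }
    public virtual Role Role { get; set; }
}

public static class RoleBatchLoader
{
    // Collects the distinct foreign keys so each role is requested once.
    public static List<int> DistinctRoleIds(IEnumerable<User> users)
    {
        return users.Select(u => u.RoleId).Distinct().ToList();
    }

    // Fetches every needed role in one query instead of one query per user.
    public static void LoadRoles(DbContext context, IEnumerable<User> users)
    {
        List<int> roleIds = DistinctRoleIds(users);
        if (roleIds.Count == 0)
        {
            return; // nothing to load, so skip the database entirely
        }
        context.Set<Role>()
            .Where(r => roleIds.Contains(r.Id)) // becomes a SQL IN (...) filter
            .Load();                            // runs the query and tracks the results
    }
}
```

A caller would invoke LoadRoles once before the loop and then read user.Role inside it.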
Both loader classes provide methods that return an IQueryable<TRelated> to allow for further filtering, sorting, etc. The problem with these methods is that it is impossible to know whether the related entities have already been loaded, so they always result in a database hit. This is still useful for doing multiple Includes in a single query.
My business objects now just accept a loader factory in their constructors. The business object method getting called just takes the top-level object(s). So, I really only have a single dependency on my data layer. This dependency is hidden behind an interface.
I am wondering what the drawbacks are to this approach. It still mixes calls to the data layer throughout the business logic, except now I can navigate to the results from the top-level object's navigation properties. Plus, it is a single dependency that is easily mocked out (everything becomes a no-op). It can be used right alongside lazy loading, so it won't be a disaster if you forget to call Load first. At first I was concerned that I would be building queries in my business layer, but I realized I was just saying Load(u => u.Role) and then immediately accessing user.Role. The loader class and Entity Framework deal with how to populate the navigation property.
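Here is roughly what the single mockable dependency looks like from the business layer's side. This is a simplified sketch: ILoader, ILoaderFactory, the no-op classes, UserDescriber, and the User and Role types are hypothetical names chosen for the example, not the interfaces in the linked project.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entities, for illustration only.
public class Role
{
    public string Name { get; set; }
}

public class User
{
    public string Name { get; set; }
    public Role Role { get; set; }
}

// The only data-layer dependency the business object sees.
public interface ILoader<TEntity> where TEntity : class
{
    void Load<TRelated>(
        TEntity entity,
        Expression<Func<TEntity, TRelated>> accessor)
        where TRelated : class;
}

public interface ILoaderFactory
{
    ILoader<TEntity> GetLoader<TEntity>() where TEntity : class;
}

// Test double: every load is a no-op because the test builds the graph by hand.
public class NoOpLoader<TEntity> : ILoader<TEntity> where TEntity : class
{
    public void Load<TRelated>(
        TEntity entity,
        Expression<Func<TEntity, TRelated>> accessor)
        where TRelated : class
    {
    }
}

public class NoOpLoaderFactory : ILoaderFactory
{
    public ILoader<TEntity> GetLoader<TEntity>() where TEntity : class
    {
        return new NoOpLoader<TEntity>();
    }
}

// Business object: takes the factory in its constructor and the root entity per call.
public class UserDescriber
{
    private readonly ILoaderFactory loaderFactory;

    public UserDescriber(ILoaderFactory loaderFactory)
    {
        if (loaderFactory == null) throw new ArgumentNullException("loaderFactory");
        this.loaderFactory = loaderFactory;
    }

    public string Describe(User user)
    {
        // Ask the loader first, then navigate as usual.
        loaderFactory.GetLoader<User>().Load(user, u => u.Role);
        return user.Name + " (" + user.Role.Name + ")";
    }
}
```

In a unit test, the User is created with its Role already set and the no-op factory is passed in, so no mock setup is needed beyond that one object.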
Am I missing something? This seems like a really awesome compromise between pure lazy-loading and eager-loading.
Update
I've actually moved this code into its own GitHub repository and NuGet package: https://github.com/jehugaleahsa/EntityLoaders