2014-02-07

Optimizing Entity Framework Using View-Backed Entities

I was profiling a Web application built on Entity Framework 6 and MVC 5, using the excellent Glimpse. I found that a page with three lists of five entities each was causing over a hundred query executions, eventually loading a huge object graph with hundreds of entities. I could eliminate the round trips using Include(), but that still left me loading way too much data when all I needed was aggregate/summary data.

The problem was that the aggregates I needed were complex and involved calculated properties, some of which were based on aggregates of navigation collection properties: a parent had sums of its children's properties, which in turn had sums of their children's properties, and in some cases parents had properties that were calculated partly based on aggregates of children's properties. You can see how this quickly spun out of control.

My requirements were that the solution had to perform better, at returning the same data, while allowing me to use standard entity framework, code first, with migrations. My solution was to calculate this data on the server side, using entities backed by views that did the joining, grouping, and aggregation. I also found a neat trick for backward-compatible View releases:

IF NOT EXISTS (SELECT Table_Name FROM INFORMATION_SCHEMA.VIEWS WHERE Table_Name = 'MyView')
    EXEC sp_executesql N'create view [dbo].[MyView] as select test = 1'
GO
ALTER VIEW [dbo].[MyView] AS
SELECT ...

It's effectively upsert for views - it's safe to run whether or not the view already exists, doesn't ever drop the view if it does exist (leaving no period where a missing view might cause an error), and it doesn't require keeping separate create and alter scripts in sync when changes are made.

I then created the entities that would represent the views, using unit tests to ensure that the properties now calculated on the server matched expected values the same way that the original, app-calculated properties did. Creating entities backed by views is fairly straightforward; they behave just like tables, but obviously can't be modified - I made the property setters protected to enforce this at compile time. Because my View includes an entry for every "real" entity, any query against the entity type can be cast to the View-backed type and it will pull full statistics (there is no possibility of an entity existing in the base table but not in the view).

Next I had to create a one to one association between the now bare entity type and the view type holding the aggregate statistics. The only ID I had for the view was the ID of the raw entity it was connected to. This turned out to be easier said than done - entity framework expects that, in a one to one relationship, it will be managing the ID at one end of the relationship; in my case, the ID's at both ends were DB-generated, even though they were guaranteed to match (since the ID in the view was pulled directly from the ID in the entity table).

I ended up abandoning the one-to-one mapping idea after a couple days' struggle, instead opting to map the statistics objects as subclasses of the real types in a table per type structure. This wound up being relatively easy to accomplish - I added a table attribute to the sub type, giving the name of the view, and it was off to the races. I went through updating references to the statistics throughout LINQ queries, views, and unit tests. The unit and integration tests proved very helpful in validating the output of the views and offering confidence in the changes.

I then ran my benchmarks again and found that pages that had required over a hundred queries to generate now used only ten to twenty, and were rendering in half to a third the time - a one to two hundred percent improvement, using views designed purely to mimic the existing functionality - I hadn't even gone about optimizing them for performance yet!

After benchmarking, it looks even better (times are in milliseconds, min/avg/max):

EF + LINQEF + Views
3 lists of 5 entities (3 types)360/785/167560/105/675
2 lists of 6 entities (1 type)325/790/193590/140/740
1 entity's details + 1 list of 50 entities465/975/268590/140/650

These tests were conducted by running Apache JMeter on my own machine against the application running on Windows Azure, across a sampling of 500 requests per page per run. That's a phenomenal 450 to 650 percent improvement across the board on the most intensive pages in the application, and has them all responding to 100% of requests in under 1 second. The performance gap will only widen as data sets grow; using views will make the scaling much more linear.

I'm very pleased with the performance improvement I've gotten. Calculating fields on the app side works for prototyping, but it just can't meet the efficiency requirements of a production application. View-backed entities came to the rescue in a big way. Give it a try!

No comments: