-4

Goal: write code to express SQL queries in C# that get converted to SQL at the data layer level and will be compatible with any common data layer / ORM such as Dapper or Entity Framework (EF).

Edit:

And, ability to dynamically build queries at runtime

Justification: decouple the querying aspect of the data infrastructure away from the data layer itself. Query code would not need to be rewritten another data layer is swapped in.

Here is some example EF query code:

var query = context.Students
                   .where(s => s.StudentName == "Bill")
                   .FirstOrDefault<Student>();

Here is some example Dapper code:

var customer = connection.Query<Customer>("Select * FROM CUSTOMERS WHERE CustomerName = 'Mark'").ToList();

Here is some example RepoDb code:

var person = connection.Query<Person>(e => e.FirstName == "John" && e.LastName == "Doe").FirstOrDefault()

EF and RepoDb use expressions which are good, but they are not necessarily compatible with each other. The code might be similar, but it wouldn't be trivial to replace one with the other.

I have implemented something similar to this in the past, and it worked very well across three database platforms. There was no need for raw SQL:

var selectStatement = new SelectStatement
{
    TableName = "OrderLine",
    Joins =
    {
        new Join
        {
            JoinType =  JoinType.Inner,
            On = 
            {
                new ComparisonPredicate
                {
                    Colname = "OrderId",
                    RelationalOperator =  RelationalOperator.Equal,
                    Value = "Order.Id"
                }
            }
        }
    },
    Where =
    {
        new ComparisonPredicate 
        { 
            Colname = "Order.Id",
            RelationalOperator =  RelationalOperator.Equal,
            Value = new Guid("c16a0181-f984-48b2-949e-4ce96a323ea8")
        }
    }
};

var orderLines = dataLayerAbstraction.GetAsync<OrderLine>(selectStatement);

You might ask wouldn't IQueryable do the trick?

Perhaps...

IQueryable aims at achieving exactly what I have mentioned here. It is a strongly typed way of expressing a query in C#. However, my real question and point of discussion here is: is IQueryable expressive enough for 99% of SQL queries on any platform in an efficient way? Here is some context. Some explicitly recommend against abstracting queries in favor of writing a method for each possible query. Also, implementing IQueryable is very onerous and if you were to try it, you'd need to implement your own version for each data layer that doesn't already implement it.

I often see code that drops out to raw SQL because expressions are either not suitable to express the query or create bad SQL that is not efficient.

I can say from experience that a similar object model I've pointed out here allows fine-tuning of the query and is independent of the database platform. I never ran into queries that I couldn't express this way, and I was always able to optimize the query to get the best out of the database platforms.

Is there a library out there that takes this approach? Has anyone else achieved some form of decoupling of querying from the data layer?

Note: GraphQL is probably the most notable successful implementation of decoupling the querying of data from the physical data layer. For example GraphQL.EntityFramework maps GraphQL queries to Entity Framework. this approach in the Dapper extensions is also notable.

PS: I'd also be keen to hear if anyone has ever successfully implemented a library which parses SQL to an object model, and then re-renders the SQL for the specific platform.

  • 1
    *"implemented a library which parses SQL to an object model"* - the question is what purpose this is designed to serve? Mainstream OO languages lack the power in their type system to statically reason about an SQL query, and platforms don't just differ in syntax but also in actual functionality. And what does *"decoupling of querying from the data layer"* even mean? The relational model already logically decouples querying from the physical storage - the "data layer" is the query function, so what would it mean to decouple querying from the layer whose very purpose is querying? – Steve Jul 19 '20 at 01:38
  • Take a look at the three query examples above. They are all different. The question is: can there be a higher level of abstraction that would allow you to swap the data layer out without changing the query. The answer is yes. GraphQL proves this point for example. It is possible to query data with GraphQL and any number of data layers. I'm more or less canvassing different approaches to achieve similar results. – Christian Findlay Jul 19 '20 at 02:02
  • If I understand correctly, the "three queries" are all parts of ORM frameworks, and most ORMs allow you to swap the underlying brand of database technology. I'm not clear that there needs to be a "higher level of abstraction", because they are all basically just variations in syntax and ceremony, and your "SelectStatement" block just seems to be a more long-winded way of writing plain Ansi SQL. – Steve Jul 19 '20 at 03:03
  • The goal would be to write one query that could work with Dapper, EF, or some other data layer. For example, you could write the query with GraphQL syntax, and then the data layer (say EF) would convert that GraphQL to SQL. There's no trickery here. It's not hard to understand. GraphQL would be a higher layer of abstraction for example. – Christian Findlay Jul 19 '20 at 03:36
  • Not all database platforms implement plain ansi sql – Christian Findlay Jul 19 '20 at 04:16
  • The article you link to claims that IQueryable is difficult to unit test, but that simply isn't true. EF is already agnostic to any particular flavour of SQL and already has providers for about two dozen different database technologies, including some intended specifically for unit testing. Queries are expressed using LINQ syntax, then the provider converts the LINQ into something which works with the underlying DB, decoupling querying from the DB technology, allowing for easy unit testing of queries with EF's in-memory DB or sqlite, or even CSV files with the 'Effort' library. – Ben Cottrell Jul 19 '20 at 08:11
  • So you want to add an abstraction on top of EF/Dapper/RepoDb just to be able to switch between them easily? I wouldn't as it would probably end up too complicated with limited benefits, only working for the simplest of queries. How would it handle window functions, for example? – D. Jurcau Jul 19 '20 at 08:34
  • @BenCottrell I agree. The article dismisses the approach without a valid reason. However, implementing IQueryable for Dapper etc. would be very onerous. I have done some experimentation I can extract SQL from expressions using EF, but this isn't anywhere near the solution I was hoping for. – Christian Findlay Jul 19 '20 at 08:43
  • @D.Jurcau precisely. Yes. – Christian Findlay Jul 19 '20 at 08:43
  • 2
    @ChristianFindlay This rather sounds like a solution looking for a problem to me in that case. Why would you want to switch EF to another ORM if doing so didn't give you any different capabilities? EF already decouples querying from the data layer, allows easy unit testing and works against a dozen different database providers. Keep in mind the drawbacks of the new abstraction only being able to use a subset of EF's capabilities, so fewer opportunities to optimise a query, added maintenance cost of an extra layer of complexity, and seemingly a more verbose and less intuitive syntax too. – Ben Cottrell Jul 19 '20 at 23:09
  • @BenCottrell why would you spend your time arguing this point? I hear what you are saying, and it's not as though I haven't thought that through. But, I'm asking to hear from people who have explored this - not from people who think it's a bad idea and should not be entertained. – Christian Findlay Jul 20 '20 at 00:56
  • 1
    @ChristianFindlay I'm asking to try to clarify the problem you're trying to solve. As per the help centre for this site: https://softwareengineering.stackexchange.com/help *"You should only ask practical, answerable questions based on actual problems that you face"* -- I may be wrong, but it rather sounds to me as if what you're asking isn't based on an actual problem. Otherwise, if it is based on an actual problem, then it seems that EF already solves it? – Ben Cottrell Jul 20 '20 at 14:05
  • 2
    @ChristianFindlay, the problem is you have to be prepared to accept that people *have* explored this, and *as a result* they consider it "a bad idea that should not be entertained". As I'd already pointed out, normal OO languages do not have sufficiently powerful type systems to model an SQL query in terms of objects (your stated "goal"). The best can do (as you show) is model the structure of the SQL source code, which just increases the overall length of code necessary to express a query, and is (for that reason and others) a practically worthless activity. (1/2) – Steve Jul 20 '20 at 15:33
  • 2
    A number of us now, as well as addressing your question as it is, have asked further questions designed to identify whether we are misunderstanding something or looking at your question in the wrong way, and you mostly respond in a shirty tone. If you already know this is a brilliant and totally workable idea, then why are you asking for others' opinions? (2/2) – Steve Jul 20 '20 at 15:49
  • To sum up, I've had requirements where a) no single data layer exists on all platforms, and b) queries need to be extricated from (or, stored outside of) the compiled code. Yes, this increases the overall length of code, but this is not a worthless activity. I have done it and it works. GraphQL also exemplifies the concept well. The reason I am asking is to hear about the approach of others. I accept your opinion. Why must this be a debate? Why not just let others answer? – Christian Findlay Jul 20 '20 at 23:30
  • @Steve Martin Fowler doesn't seem to think that this is a "practically worthless activity" https://martinfowler.com/eaaCatalog/queryObject.html – Christian Findlay Jul 23 '20 at 01:39
  • 1
    @ChristianFindlay, Fowler doesn't exactly make a case for it either, and what he does say is wildly ambiguous. I'm not sure whether what he describes is covered by ordinary ORM packages, but you're aware of them and want an "abstraction" of them, yet when pressed for further explanation or clarification you've responded waspishly. This whole discussion has become Kafkaesque frankly - you asked me a couple of days ago to "just let others answer", but none have, and you've finally tagged me again. It remains unclear how to help you, or what an answer to your question should look like. – Steve Jul 23 '20 at 02:39
  • 1
    wonder if you are aware of [object-relational impedance mismatch](https://softwareengineering.stackexchange.com/a/120338/31260) – gnat Aug 01 '20 at 12:19
  • @gnat Wikipedia: "The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered" , "Objects (instances) reference one another and therefore form a graph in the mathematical sense (a network including loops and cycles). Relational schemas are, in contrast, tabular and based on relational algebra, which defines linked heterogeneous tuples". This is the exact technical difficulty that ORMs seek to overcome. I haven't answered OP yet, but I point you to LINQ Expression trees which do an excellent job of expressing queries as objects. – Christian Findlay Aug 01 '20 at 23:11

1 Answers1

1

Language Integrated Query (LINQ)

What is a query and what does it do?

A query is a set of instructions that describes what data to retrieve from a given data source (or sources) and what shape and organization the returned data should have. A query is distinct from the results that it produces.

LINQ is the answer to my question. LINQ allows developers to express queries as an abstract data model. It does

decouple the querying aspect of the data infrastructure away from the data layer itself

In the OP, I asked the question:

Is IQueryable expressive enough for 99% of SQL queries on any platform in an efficient way?

The answer is partially yes. Entity Framework is proof that it mostly works, and other ORMs implement functionality for rendering other types of LINQ expressions to SQL and other query types. There may be some cases where dropping out to SQL is necessary for some reason or another, but these are the exceptions and not the rule.

My Repo DB example above is not compatible with IQueryable as for as I can tell, but the Query method is compatible with the same LINQ expression signature as EF's Where method. The real point is that the LINQ expressions are used by both platforms.

Dynamically Building Queries at Runtime

LINQ supports this. The other goal that was not explicitly stated in the OP, although added it to the OP. It is often necessary to dynamically build query objects. This is no problem for LINQ:

How to use expression trees to build dynamic queries (C#)

Dynamic queries are useful when the specifics of a query are not known at compile time. For example, an application might provide a user interface that enables the end-user to specify one or more predicates to filter the data. In order to use LINQ for querying, this kind of application must use expression trees to create the LINQ query at runtime.

My code example above can easily be expressed as a LINQ expression and can also be easily built at runtime. The code sample is based on code from 2008 when LINQ was only new. It works. If you didn't want to use LINQ, you could take this approach, but LINQ basically makes this approach redundant.

Note on Dapper: Dapper doesn't directly implement IQueryable, but there is no reason why someone couldn't write a provider that renders LINQ queries to SQL that works for the Dapper micro ORM.