Can Fluent DSLs exist in big data environments?

Question

As I understand fluent domain-specific languages, they let me use method chaining to have a conversation with the code. For example, if the business requirement is to query the database to "Get all customers with InActive accounts", I could write:

Customers().WithInActiveAccount()

The Customers table has millions of rows. Pulling every customer into memory is inefficient when I only need a subset (and may not even be possible given memory constraints). I suspect ORMs solve this problem by treating code as data: they defer execution and build one complete query from the entire expression. So the final query might be:

SELECT * FROM Customers WHERE InActive = true
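To illustrate the "code as data" idea, here is a minimal sketch in Java (used as a close stand-in for the C#-style snippets above; `CustomerQuery`, `customers()`, and `toSql()` are hypothetical names, and the `Customers` table and `InActive` column are assumed). Each chained call only records a predicate and returns a new immutable query object. Nothing touches the database until the terminal `toSql()` renders the whole chain as one statement, so no rows are loaded into memory along the way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable query object. Each chained call records a predicate
// and returns a NEW query; no database work happens until toSql() is called.
final class CustomerQuery {
    private final List<String> predicates;

    private CustomerQuery(List<String> predicates) {
        this.predicates = predicates;
    }

    // Entry point, playing the role of Customers(); it does not load any rows.
    static CustomerQuery customers() {
        return new CustomerQuery(List.of());
    }

    // Domain vocabulary: translates to a predicate on the assumed InActive column.
    CustomerQuery withInactiveAccount() {
        return where("InActive = true");
    }

    // Copies the existing predicates and appends one more.
    private CustomerQuery where(String predicate) {
        List<String> next = new ArrayList<>(predicates);
        next.add(predicate);
        return new CustomerQuery(List.copyOf(next));
    }

    // Terminal step: the whole chain becomes a single SQL statement.
    String toSql() {
        String sql = "SELECT * FROM Customers";
        if (!predicates.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", predicates);
        }
        return sql;
    }
}

class LazyQueryDemo {
    public static void main(String[] args) {
        String sql = CustomerQuery.customers().withInactiveAccount().toSql();
        System.out.println(sql); // SELECT * FROM Customers WHERE InActive = true
    }
}
```

In a real data layer the string from `toSql()` would be handed to the database (for example through JDBC), so the filtering happens server-side. Returning a new object from each call, rather than mutating, lets a partially built query serve as the base for several different ones.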

In my experience, ORMs produce inefficient queries against highly normalized tables. Rolling yet another custom ORM to solve that feels like a death march waiting to happen, whereas stored procedures written by a database professional are going to be efficient.

In this simple case I can just change Customers to an object:

Customers.WithInactiveAccount()

What if I need to do something more complex?

Customers.WithInactiveAccount().BornAfter(new DateTime(1990, 10, 1))

How do I efficiently build up queries as the expressions grow more advanced and potentially draw in other entities? I'm sure every ORM's designers ask themselves this very early in development. Do I have to limit myself to "dumb queries" to maintain performance? Is this a technique that exists?
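One common way to handle the compound case is sketched below, again in Java with hypothetical names (`CustomerSpec`, `bornAfter`, `withOrderSince`) and an assumed schema (`Customers.BirthDate`, `Orders.CustomerId`, `Orders.OrderDate`). Each fluent call appends a SQL fragment plus the values for its `?` placeholders, so predicates combine with `AND` into a single statement. A filter on a related entity becomes an `EXISTS` subquery, which leaves the join to the database. It is written as a mutable builder for brevity.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder (mutable, for brevity): each call appends a SQL fragment
// and the values for its "?" placeholders, in the same order.
final class CustomerSpec {
    private final List<String> clauses = new ArrayList<>();
    private final List<Object> params = new ArrayList<>();

    static CustomerSpec customers() {
        return new CustomerSpec();
    }

    CustomerSpec withInactiveAccount() {
        clauses.add("c.InActive = ?");
        params.add(true);
        return this;
    }

    CustomerSpec bornAfter(LocalDate date) {
        clauses.add("c.BirthDate > ?");
        params.add(date);
        return this;
    }

    // A related entity is filtered with EXISTS, so the database performs the
    // correlation instead of the application loading orders into memory.
    CustomerSpec withOrderSince(LocalDate date) {
        clauses.add("EXISTS (SELECT 1 FROM Orders o"
                + " WHERE o.CustomerId = c.Id AND o.OrderDate >= ?)");
        params.add(date);
        return this;
    }

    // All predicates are ANDed into one statement.
    String sql() {
        String base = "SELECT c.* FROM Customers c";
        return clauses.isEmpty() ? base : base + " WHERE " + String.join(" AND ", clauses);
    }

    List<Object> parameters() {
        return List.copyOf(params);
    }
}

class CompoundQueryDemo {
    public static void main(String[] args) {
        CustomerSpec q = CustomerSpec.customers()
                .withInactiveAccount()
                .bornAfter(LocalDate.of(1990, 10, 1));
        System.out.println(q.sql());
        // SELECT c.* FROM Customers c WHERE c.InActive = ? AND c.BirthDate > ?
        System.out.println(q.parameters()); // [true, 1990-10-01]
    }
}
```

The SQL and parameter list would then be passed to a JDBC `PreparedStatement`, binding each value with `setObject`. Whether the resulting plan is efficient is still up to the database and its indexes; the DSL only controls which SQL is emitted, which is why being able to inspect that SQL directly matters.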

These are the kinds of questions I hear from developers who, like me, have experienced across-the-board performance problems with ORMs in the big data world.

So, when dealing with these kinds of normalized databases, is a fluent DSL a practical option? (I'm assuming a fluent DSL for database access requires an underlying ORM to function.)

It's unclear what you are asking. You write the `Customers()` method. If you don't want it to go to the database and load all rows, then, well, don't write it to do that! There's really not more to it than that. — Jörg W Mittag, Feb 11 '15 at 16:09
This is a contrived example. Yes, you are correct that in this simple example I can do just that. But the question of fluent DSLs for more complicated queries still stands. Watching ORM performance in real time on a real database, I see real performance problems from suboptimal queries. — P.Brian.Mackey, Feb 11 '15 at 16:16
I still don't see what this has to do with whether you are building your query using a fluent DSL or not. The performance depends on how you optimize the query, not whether you built it by chaining methods. — Jörg W Mittag, Feb 11 '15 at 16:37
@JörgWMittag - "I'm assuming a fluent DSL for DB access requires an underlying ORM to function." If an ORM can produce optimized queries in a normalized big data environment (as I explained, I have not seen one that does), then I can build fluent interfaces on top of that ORM. Or, if I can build a fluent DSL that does not rely on an ORM and still lets me construct complicated queries, then that's a DIFFERENT option. Anyone who has not worked in a normalized big data environment with DBAs will be unlikely to answer this question. — P.Brian.Mackey, Feb 11 '15 at 16:45
If your ORM is generating poorly performing queries because of the normalisation of your tables, have you considered linking it to a hand-constructed view rather than the raw tables? — Jules, Feb 11 '15 at 19:57
Would LINQ qualify as a 'fluent DSL' according to your definition? — Benjamin Hodgson, Feb 11 '15 at 21:47
@BenjaminHodgson - No. LINQ does not read like natural language, and it's geared towards abstraction. "Select, With": these are broad terms. A fluent DSL should read naturally, and its language is domain specific. `AllCustomers()` is more along the lines of fluent DSL language. Imagine a language created to minimize the communication gap between the business heads and the developers. @Jules, that is something we have not considered. — P.Brian.Mackey, Feb 11 '15 at 22:02

score 3 · Answer 1 · edited Feb 11 '15 at 21:29

First, let's clarify terms a little...

The term DSL is enormously broad: SQL, HTML, LOGO, and Mathematica are all DSLs. You are talking about referring to and querying your data model according to its actual structure, in a strongly typed manner.

Fluent means method chaining, so that your source reads more like English and less like a programming language, like so: Noun().Adjective().Verb().Adverb(). This is not the only, or even the best, way to form queries.

Big data usually refers to data that cannot be efficiently stored and queried using an RDBMS. This means big data and "normalized" are mostly mutually exclusive.

Now, regarding your question. I'm answering based on several years of experience using C#, F#, some C++ and Java, NHibernate, MS-SQL, PostgreSQL, some MongoDB, and some Hadoop, mostly on pretty big data sets.

"Fluent" is a bad idea. It's usually harder to write and tends to mislead the reader. It's also a lot less "discoverable": you need to learn an entire vocabulary to use and understand a given "fluent" API.
Using an ORM (NHibernate, Hibernate, Entity Framework) is better than manipulating data yourself. This is not always true, and you should always test, optimize, and understand what your ORM is doing and why. That involves a pretty significant learning curve: you need to understand your ORM, how to create a correct mapping, and how to control the way queries are generated. On the other hand, if you know what you are doing, about 98% of the time an ORM is the fastest way to create the best and most performant solution with the least effort. The other ~2% of the time you go to the DBA, write a stored procedure or some SQL, and use it from within the ORM.
You should have a proper data access layer (DAL) that handles data manipulation. Using an ORM doesn't remove the need to build a DAL.
Writing queries and manipulating data in your programming language, in a strongly typed way, is a great idea. It's fast, verified by the compiler, and very convenient. C# has a special feature called LINQ that enables querying various data sources, including C#'s collections, XML, RDBMSes, OData sources, many other structured and unstructured sources, real big data stores (MongoDB, Cassandra(?), Hadoop), and ORMs such as NHibernate and Entity Framework. Hibernate/NHibernate also have a "fluent" query language called Criteria, and a non-strongly typed (string-based) language called HQL. NHibernate's LINQ provider also has some limitations. Usually the strongly typed options are preferable, but it's still very important to understand them thoroughly.
You seem not to "believe" in ORMs. I think this comes from unfamiliarity and a lack of experience working with them. I assure you that all of the questions you are asking have been considered and addressed by some of the best developers in the industry.
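The DAL point above can be sketched in Java with hypothetical names throughout: `Customer`, `CustomerRepository`, the stored procedure `dbo.GetInactiveCustomersBornAfter`, and its result columns are all assumptions. The interface speaks the business vocabulary; one implementation delegates to a DBA-written stored procedure through standard JDBC, and an in-memory implementation stands in for unit tests. Callers never see which one they have, so a query the ORM handles badly could be swapped for hand-tuned SQL without touching business code. It assumes Java 17 or later (records, `Stream.toList()`).

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain record; field names are illustrative.
record Customer(long id, String name, boolean inactive, LocalDate birthDate) {}

// The DAL contract uses business vocabulary; callers never see SQL.
interface CustomerRepository {
    List<Customer> inactiveCustomersBornAfter(LocalDate date);
}

// Delegates to a DBA-written stored procedure (name and columns are assumed).
final class StoredProcCustomerRepository implements CustomerRepository {
    private final Connection connection;

    StoredProcCustomerRepository(Connection connection) {
        this.connection = connection;
    }

    @Override
    public List<Customer> inactiveCustomersBornAfter(LocalDate date) {
        try (CallableStatement cs =
                connection.prepareCall("{call dbo.GetInactiveCustomersBornAfter(?)}")) {
            cs.setObject(1, date);
            List<Customer> result = new ArrayList<>();
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    result.add(new Customer(
                            rs.getLong("Id"),
                            rs.getString("Name"),
                            rs.getBoolean("InActive"),
                            rs.getObject("BirthDate", LocalDate.class)));
                }
            }
            return result;
        } catch (SQLException e) {
            throw new IllegalStateException("customer query failed", e);
        }
    }
}

// In-memory implementation, useful for unit-testing business logic.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final List<Customer> rows;

    InMemoryCustomerRepository(List<Customer> rows) {
        this.rows = List.copyOf(rows);
    }

    @Override
    public List<Customer> inactiveCustomersBornAfter(LocalDate date) {
        return rows.stream()
                .filter(Customer::inactive)
                .filter(c -> c.birthDate().isAfter(date))
                .toList();
    }
}

class RepositoryDemo {
    public static void main(String[] args) {
        CustomerRepository repo = new InMemoryCustomerRepository(List.of(
                new Customer(1, "Ada", true, LocalDate.of(1992, 3, 4)),
                new Customer(2, "Bob", true, LocalDate.of(1985, 1, 1)),
                new Customer(3, "Cy", false, LocalDate.of(1995, 6, 7))));
        System.out.println(repo.inactiveCustomersBornAfter(LocalDate.of(1990, 10, 1)));
    }
}
```

The same interface could also be backed by an ORM for the common cases; the point is that the choice stays behind the DAL boundary.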

I believe I clarified in the question the specific type of DSL I'm looking for. The point of this DSL is not to generate excitement but to assist with business communication. At a large hospital system I worked on a database with many TB of data. The system is SQL-based (an RDBMS). However, the data was too broad and spread out to write performant queries against. In this scenario, RDBMS and big data are not mutually exclusive: you can have an RDBMS that is not queryable. I've worked with master MS SQL DBAs who had nothing but negative comments about ORMs and their performance penalties. — P.Brian.Mackey, Feb 11 '15 at 19:27
@P.Brian.Mackey What isn't true? Have you personally used an ORM extensively? Have you optimized applications that used an ORM? Of course, if a seasoned DBA writes the code to get something from a DB, he will do a better job than a **generic** ORM. That's not the point; it's a cost-benefit question. An ORM allows you to write more succinct, more business-oriented, and very performant code, faster. — AK_, Feb 11 '15 at 19:30
Performance is the point when dealing with this specific subset of big data, for the reasons you stated in this answer. — P.Brian.Mackey, Feb 11 '15 at 19:32
Look, what you're asking for is possible: a layer that allows writing fluent queries against a specific data source. But it's a pretty hard problem. You can look at how to implement a LINQ provider using expression trees, or at the NHibernate source to see how Criteria is implemented. The thing is, it's a pretty bad idea unless you are just going to wrap NHibernate or another decent ORM. You would be much better off building a decent DAL, maybe with a separate module to handle "business queries". — AK_, Feb 11 '15 at 19:50
Thanks. I appreciate your answer and your feedback. I will give it some more thought and come back later. I'm also researching another technique that may help answer this difficult question, CQRS: http://martinfowler.com/bliki/CQRS.html — P.Brian.Mackey, Feb 11 '15 at 20:11
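The expression-tree route mentioned in the comments can be illustrated with a toy Java analogue. This is not LINQ's `System.Linq.Expressions` API, just a sketch with hypothetical node types (`Column`, `Param`, `Eq`, `Gt`, `And`) and an assumed schema. The fluent layer builds a tree of nodes, and a separate translator walks that tree to emit SQL with `?` placeholders. Keeping the query as data means a provider could inspect or rewrite the tree (for example, to target a tuned view) before any SQL is produced. It assumes Java 17 or later for sealed interfaces and records.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Query as data: a small, closed set of node types.
sealed interface Expr permits Column, Param, Eq, Gt, And {}
record Column(String name) implements Expr {}
record Param(Object value) implements Expr {}
record Eq(Expr left, Expr right) implements Expr {}
record Gt(Expr left, Expr right) implements Expr {}
record And(Expr left, Expr right) implements Expr {}

// Walks the tree and emits SQL; parameter values are collected, never inlined.
final class SqlTranslator {
    private final List<Object> params = new ArrayList<>();

    String translate(Expr e) {
        if (e instanceof Column c) return c.name();
        if (e instanceof Param p) {
            params.add(p.value());
            return "?";
        }
        // Left operand is translated before right, so params match "?" order.
        if (e instanceof Eq eq) return translate(eq.left()) + " = " + translate(eq.right());
        if (e instanceof Gt gt) return translate(gt.left()) + " > " + translate(gt.right());
        if (e instanceof And and) return "(" + translate(and.left()) + " AND " + translate(and.right()) + ")";
        throw new IllegalArgumentException("unsupported node: " + e);
    }

    List<Object> params() {
        return List.copyOf(params);
    }
}

// Fluent front end: each call only grows the tree.
final class Customers {
    private final Expr filter; // null means "no filter yet"

    private Customers(Expr filter) {
        this.filter = filter;
    }

    static Customers all() {
        return new Customers(null);
    }

    Customers withInactiveAccount() {
        return where(new Eq(new Column("InActive"), new Param(true)));
    }

    Customers bornAfter(LocalDate date) {
        return where(new Gt(new Column("BirthDate"), new Param(date)));
    }

    private Customers where(Expr e) {
        return new Customers(filter == null ? e : new And(filter, e));
    }

    String toSql(SqlTranslator translator) {
        String base = "SELECT * FROM Customers";
        return filter == null ? base : base + " WHERE " + translator.translate(filter);
    }
}

class ExpressionTreeDemo {
    public static void main(String[] args) {
        SqlTranslator t = new SqlTranslator();
        String sql = Customers.all().withInactiveAccount()
                .bornAfter(LocalDate.of(1990, 10, 1)).toSql(t);
        System.out.println(sql); // SELECT * FROM Customers WHERE (InActive = ? AND BirthDate > ?)
        System.out.println(t.params()); // [true, 1990-10-01]
    }
}
```

Separating tree construction from translation is what makes this harder than it looks: every new node type needs translation rules, and producing SQL a DBA would consider efficient is where most of the real work lies.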
