99

Is it considered an anti pattern to hardcode SQL into an application like this:

public List<int> getPersonIDs()
{    
    List<int> listPersonIDs = new List<int>();
    using (SqlConnection connection = new SqlConnection(
        ConfigurationManager.ConnectionStrings["Connection"].ConnectionString))
    using (SqlCommand command = new SqlCommand())
    {
        command.CommandText = "select id from Person";
        command.Connection = connection;
        connection.Open();
        SqlDataReader datareader = command.ExecuteReader();
        while (datareader.Read())
        {
            listPersonIDs.Add(Convert.ToInt32(datareader["ID"]));
        }
    }
    return listPersonIDs;
}

I would normally have a repository layer etc, but I have excluded it in the code above for simplicity.

I recently had some feedback from a colleague who complained that SQL was written in the source code. I did not get chance to ask why and he is now away for two weeks (maybe more). I assume that he meant either:

  1. Use LINQ
    or
  2. Use stored procedures for the SQL

Am I correct? Is it considered an anti pattern to write SQL in the source code? We are a small team working on this project. The benefit of stored procedures I think is that SQL Developers can get involved with the development process (writing stored procedures etc).

Edit The following link talks about hard coded SQL statements: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/hard-coded-sql-statements. Is there any benefit of preparing an SQL statement?

w0051977
  • 7,031
  • 6
  • 59
  • 87
  • 34
    "Use Linq" and "Use stored procedures" are not reasons; they are just suggestions. Wait two weeks, and ask him for reasons. – Robert Harvey May 14 '17 at 15:04
  • @Rovert Harvey, thanks. I hope to see him in two weeks. However, it may be more like two months or a year. He is a contractor who comes in now and then. Would you describe the SQL as being "hardcoded into the application" without a repository layer? Perhaps that is it. – w0051977 May 14 '17 at 16:09
  • 49
    The Stack Exchange network uses a micro-ORM called Dapper. I think it's reasonable to say that the vast majority of Dapper code is "hardcoded SQL" (more or less). So if it's a bad practice, then it's a bad practice adopted by one of the most prominent web applications on the planet. – Robert Harvey May 14 '17 at 16:28
  • 18
    To answer your question about repositories, hard-coded SQL is still hard-coded SQL no matter where you put it. The difference is that the repository gives you a place to *encapsulate* the hard-coded SQL. It's a layer of abstraction that hides the details of the SQL from the rest of the program. – Robert Harvey May 14 '17 at 16:29
  • 4
    You are intertwining two programs in the same source. This usually mean that it is harder to understand and/or the IDE cannot help you. – Thorbjørn Ravn Andersen May 14 '17 at 16:55
  • 5
    Regarding your edit: "prepared" SQL statements are always a good thing to do. They sanitize input parameters and prevent SQL injection attacks. Contrast with concatenated sql strings, which are not safe because they are subject to SQL injection. – Robert Harvey May 14 '17 at 18:39
  • 27
    No, SQL within code is not an anti-pattern. But that's an awful lot of boiler plate code for a simple SQL query. – GrandmasterB May 14 '17 at 21:53
  • 47
    There's a difference between 'in the source code' and 'spread all over the source code' – tofro May 15 '17 at 07:18
  • 7
    Note that stored procedures result in your code running in two places instead of one. That is also more complex. – Thorbjørn Ravn Andersen May 15 '17 at 07:33
  • 6
    You have to talk to the database in some fashion. Having the SQL there puts you in control of what statements are being run, as opposed to an ORM that is generating them for you. – Jon Raynor May 15 '17 at 15:48
  • 6
    @RobertHarvey: StackOverflow does use Dapper (and so do I, it's great!), but Dapper does not require or encourage SQL to be mixed directly into C# source code. – Charles Burns May 15 '17 at 16:05
  • 2
    @CharlesBurns: That's news to me. All of the examples on the [Github Dapper home](https://github.com/StackExchange/Dapper) page are parameterized SQL, and that's how I've generally used Dapper. – Robert Harvey May 15 '17 at 16:09
  • 1
    @RobertHarvey: Those are just usage examples, not design recommendations. See my post on the thread. – Charles Burns May 15 '17 at 16:22
  • @GrandmasterB: ADO.NET code of the kind illustrated in the OP was always rather verbose. Few people write code like that anymore, unless it's just a simple utility. – Robert Harvey May 15 '17 at 16:22
  • 10
    I would note that *representing database operations as strings* leads to the belief that *techniques for manipulating strings are a good way to manipulate databases*. But this leads to a variety of ills, such as injection attacks. – Eric Lippert May 15 '17 at 16:54
  • How is using LINQ 2 SQL not embedding the inner-level details of queries into the source code, which is the crux (tight coupling) of the complaint about writing SQL in source code. I say this knowing how popular LINQ2SQL has become in certain quarters, and also knowing how slow it actually is in comparison to many alternatives. And I totally agree that embedding SQL in imperative source code is often, though not always, an anti-pattern. – Craig Tullis May 15 '17 at 17:11
  • 2
    SQL injection is not a problem if you use parameterized queries. – Craig Tullis May 15 '17 at 17:15
  • 2
    Embedding SQL isn't a problem per se. Using basic ADO.NET code in 2017 where it is most certainly unnecessary (developing for compact framework? No? Well then I'm out of ideas why you'd want to do it) on the other hand is pretty awful. There are just so many superior options available that are much less verbose and more powerful at the same time. – Voo May 15 '17 at 20:28
  • 3
    The real problem in this code is not the `select id from Person` part, but all the code around it. With proper technology choices, architecture, and API design, the whole method should get down to something like `public List getPersonIDs() { return appDb.find("select id from Person"); }`. – Rogério May 16 '17 at 03:37
  • listPersonIDs.Add(datareader.GetInt32(0)); is more efficient and wrap the reader in a using – paparazzo May 16 '17 at 15:48
  • 4
    @EricLippert The problem is that databases actually accept database operations as strings. You can put some abstraction over that (and in many cases, you should), but it has all the issues of an abstraction (it might not expose all the operations of the underlying database, it has performance overhead, etc.). – svick May 16 '17 at 16:46
  • In my humble opinion it has anything to do neither with patterns nor with anti-patterns. A software design pattern is something like strategy, or visitor, or decorator, you can read more about them in the [wikipedia article](https://en.wikipedia.org/wiki/Software_design_pattern) – Andrew Savinykh May 16 '17 at 19:58
  • @Craig Is LINQ-to-SQL still popular? In any event, LINQ is a generic querying syntax, unrelated to SQL; the same syntax can be used to query in-memory collections and XML. I could write `var qry = from p in db.Persons where p.LastName.StartsWith("a") select p;` whether `db.Persons` is a simple `List` or a more complex ORM object such as Entity Framework's `DbSet`. (I assume you aren't saying that the outer layers don't need to have knowledge of a `Person` entity.) – Zev Spitz May 17 '17 at 11:36
  • 1
    Entity framework sucks. Sprocs are good, but can be a pain. Just use dapper.net – TheCatWhisperer May 17 '17 at 17:52
  • Is it a pattern to call everything that's wrong an anti pattern? – Roberto May 18 '17 at 04:28
  • @ZevSpitz Sure, LINQ is a more or less generic query *syntax*, but SQL is a standards-based *language* with a long, stable history of well-defined behavior. This specific context is about LINQ-to-SQL, because the question is about SQL. LINQ is essentially a naming convention for extension methods (thank you NextStep and Apple!) which represent common filtering and selection behaviors. In that respect, LINQ is fine, but it still generates SQL to interact with databases, adding overhead and **still** has the issue of relatively tight coupling to the database structure. – Craig Tullis May 18 '17 at 05:40
  • @Craig _has the issue of relatively tight coupling in the database structure_ -- If you mean that the query (however it is written) must know about the domain entities, then I don't see how it is ever possible to get around that -- at every point there is knowledge of a `Person` type, with a `string LastName` and a `string FirstName` attributes. – Zev Spitz May 18 '17 at 08:55
  • @Craig _generates SQL to interact with databases_ -- Isn't that the point? The LINQ queries that I write don't have to generate SQL, or even a given variant of SQL; everything depends on the provider of the query source, and the provider could theoretically be defined at runtime. The same LINQ can be run against an in-memory collection, XML, or a web service, as long as the starting point of the query is `IQueryable`. – Zev Spitz May 18 '17 at 08:58
  • 3
    I'm a little more concerned that the `SqlDataReader` isn't wrapped in a `using` construct :) – Jesse C. Slicer May 18 '17 at 19:10
  • @ZevSpitz in order to interact with any relational database, the provider does *always* generate SQL. Pretending it’s irrelevant because it’s abstracted away doesn’t change that, and like any ORM-like SQL-generating mechanism, it will often not produce the most efficient SQL. – Craig Tullis Oct 01 '17 at 18:47
  • You (can) loosen the coupling between the application code *and the internal table structure of the database* by programming against clean functional interfaces. Store procedures are one approach to this (and one I like a lot). – Craig Tullis Oct 01 '17 at 18:50
  • Talking about c# and tsql: 1. Not antipattern but without SQLParameter can be an injection risk. 2. Can offer performance benefits over stored procedures, due to nature of precompilation caching. 3. Better to not have dba /admins writing query code 4. Better to have query in code than migrate a db! – Sentinel Apr 11 '18 at 06:06
  • Regardless of pattern or anti-pattern, for or against embedded SQL *depends*, as usual with anything IT. For example, if you're a corporate developer working on a WinForms desktop application, deployed to 100 PCs, do you really want embedded SQL? No, because if you have to fix a query, you have to redeploy software to 100 PCs. As Stored Procedures, the fix is real-time in one place. – HardCode Feb 27 '19 at 21:51
  • @RobertHarvey. While I wont dispute that StackOverflow have written a lot of SQL, and dumped it into their code base. I wouldnt suggest it for any other developer - almost ever. I’d bet that the SO codebase is a nightmare to work in for most developers and for the simple fact that its a tonne of code that I would argue doesnt belong in normal (logic) code, I would nearly always suggest using an ORM. Inline SQL is horrible to work with – Robert Perry Oct 07 '21 at 15:47
  • @RobertPerry: `Inline SQL is horrible to work with` Only if you don't have the proper tools. I use SSMS, test every query I write against a prototype database, and keep all of that code in a separate data layer. – Robert Harvey Oct 07 '21 at 16:05
  • @RobertHarvey: The whole thing is messy - if you want to make schema changes, you need to go look up potentially 100's even 1000's of different places for the column or table you have changed and change code. Its often difficult to visualise a query and really understand what it is you're doing. There's far too many negatives for the positives for me. I stick to my original statement - use SQL, if you absolutely, definitely, can't do it any other way. It shouldn't be the first/only thing you do. To answer the original point - I'm firmly in the "Yes. Its a horrible anti-pattern - refactor" crew – Robert Perry Oct 07 '21 at 16:13
  • Plus, its not 1990 any more. ORM's were made to solve this exact problem. And they do a really good job at it nowadays – Robert Perry Oct 07 '21 at 16:14

14 Answers14

117

You excluded the crucial part for simplicity. The repository is the abstraction layer for persistence. We separate out persistence into its own layer so that we can change the persistence technology more easily when we need to. Therefore, having SQL outside of the persistence layer completely foils the effort of having a separate persistence layer.

As a result: SQL is fine within the persistence layer that is specific to a SQL technology (e.g. SQL is fine in a SQLCustomerRepository but not in a MongoCustomerRepository). Outside of the persistence layer, SQL breaks your abstraction and thus is considered very bad practice (by me).

As for tools like LINQ or JPQL: Those can merely abstract the flavours of SQL out there. Having LINQ-Code or JPQL queries outside of an repository breaks the persistence abstraction just as much as raw SQL would.


Another huge advantage of a separate persistence layer is that it allows you to unittest your business logic code without having to set up a DB server.

You get low memory-profile, fast unit tests with reproducible results across all platforms your language supports.

In an MVC+Service architecture this is a simple task of mocking the repository instance, creating some mock-data in memory and define that the repository should return that mock data when a certain getter is called. You can then define test data per unittest and not worry about cleaning up the DB afterwards.

Testing writes to the DB is as simple: verify that the relevant update methods on the persistence layer have been called and assert that the entities were in the correct state when that happened.

marstato
  • 4,538
  • 2
  • 15
  • 30
  • 1
    Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/58802/discussion-on-answer-by-marstato-is-it-considered-an-anti-pattern-to-write-sql-i). – maple_shaft May 15 '17 at 17:36
  • 86
    The idea that "we can change the persistence technology" is unrealistic and unnecessary in the vast majority of real-world projects. A SQL/relational DB is a completely different beast than NoSQL DBs such as MongoDB. See [this detailed article](http://cryto.net/~joepie91/blog/2015/07/19/why-you-should-never-ever-ever-use-mongodb/) about it. At most, a project which started using NoSQL would eventually realize their mistake and then switch to an RDBMS. In my experience, allowing direct use an object-oriented Query Language (such as JPA-QL) from business code provides the best results. – Rogério May 16 '17 at 03:24
  • 6
    What if there is a very little chance the persistence layer is changed? What if there isn't a one to one mapping when changing the persistence layer? What is the advantage of the extra level of abstraction? – pllee May 16 '17 at 03:46
  • I updated my answer to include the benefits of a separate layer regarding automated testing. – marstato May 16 '17 at 06:12
  • 1
    @Rogério I haven't had to use Java in a relatively long time, but how do you cleanly mock your data access if it is so closely interleaved with your business code? – Voo May 16 '17 at 09:33
  • 23
    @Rogério Migrating from a NoSQL store to an RDBMS, or vice versa, is as you say probably not realistic (assuming the technology choice was appropriate to begin with). However, I have been involved on a number of real-world projects where we migrated from one RDBMS to another; in that scenario, an encapsulated persistence layer is definitely an advantage. – David May 16 '17 at 13:37
  • @Rogério You may be correct that _projects_ seldom change their underlying persistence, but _libraries_ need to be able to encompass a broad range of projects. If you wish to have any sort of modular development (where modules can exist in multiple projects), then you can't just assume the persistence layer will be the same across them all. – Jeff Lambert May 16 '17 at 16:34
  • Excellent answer.. one point I might add though to those belittling the "tech swap" argument. The abstraction of persistence layer and data access layer doesn't *just* allow for an easier swap of differing technology but also provides the framework for easier redesign/refactor of the *same* technology. –  May 16 '17 at 20:26
  • 6
    "The idea that "we can change the persistence technology" is unrealistic and unnecessary in the vast majority of real-world projects. " My company previously wrote code along that assumption. We had to refactor the entire codebase to support not one, but three entirely different storage/query systems and are considering a third and fourth. Worse, even when we had only one database, the lack of consolidation led to all kinds of code sins. – NPSF3000 May 17 '17 at 01:22
  • 1
    _LINQ can merely abstract the flavours of SQL_ -- LINQ is a general querying syntax, and is independent of SQL. Thus you could theoretically keep the same LINQ queries, and swap out your persistence layer, so long as the persistence layer exposed data stores as `IQueryable`; the provider should tailor the underlying API calls -- SQL / alternate query syntax / alternate API -- based on the structure of the query. – Zev Spitz May 17 '17 at 11:22
  • 3
    @Rogério You know the "vast majority of real-world projects"? In my company, most applications can work with one of many common DBMS, and this is very useful, as mostly our clients do have nonfunctional requirements regarding that. – BartoszKP May 17 '17 at 11:27
  • @ZevSpitz It aims to do that, and so does JPQL. But i think that compiling LINQ/JPQL to languages like Cypher is almost impossible because they arejust too close to SQL. Also, you cannot run those queries by themselves on a `Collection`; you always need a DB server for that and that makes Unit-Tests against code that uses LINQ/JPQL impossible. – marstato May 17 '17 at 11:29
  • 2
    @marstato _Also, you cannot run those queries by themseves on a `Collection`_ -- Why not? The same query -- `IQueryable qry = from p in db.Persons where p.StartsWith("a") select p; var persons = qry.ToList();` -- can be used whether `db` is an Entity Framework `DbContext` or a mocked object with in-memory data exposed as `IQueryable` properties -- e.g. `return new List().AsQueryable();` – Zev Spitz May 17 '17 at 11:49
  • @ZevSpitz didnt know that - im impressed! Thats impossible with JPQL however. And i still doubt that this query could easily be turned into an edficient cypher query – marstato May 17 '17 at 12:23
  • @marstato This works because the C# compiler converts the query into method calls -- `db.Persons.Where(x => x.LastName.StartsWith("a"))` -- and an `Expression>`, which is a representation of the code. The entire query -- its method calls, and the expressions -- are then parsed by the query's LINQ provider (there are different providers for in-memory objects, LINQ2SQL, XML, and Entity Framework built-in to .NET; but the providers are simply classes, and are referenced and used in the same way as other .NET types). In the case of the EF provider, the call to the `Where` ... – Zev Spitz May 17 '17 at 12:33
  • method, and the call to `StartsWith` with an argument `"a"`, called on the `LastName` property of the passed-in `Person` will result in an SQL query like the following: `SELECT * FROM Persons WHERE LastName LIKE 'a%'`. If the query uses the in-memory provider (LINQ-to-IEnumerable), the expression will be compiled to an actual delegate and called. Do you see any reason why a similar provider could not be constructed for Cypher? (I'm not familiar with Cypher.) – Zev Spitz May 17 '17 at 12:38
  • @marstato See [here](http://stackoverflow.com/documentation/linq/842/getting-started-with-linq/24398/linq-methods-and-ienumerablet-vs-iqueryablet). (Disclaimer: I am the author). – Zev Spitz May 17 '17 at 12:46
  • @ZevSpitz Cypher is a query language for graph databases. The queries are kind-of similar but work very differently under the hood. There is no such thing as a table join in cypher; instead, cypher has object traversal. Possibly LINQ and cypher fit together nicely, but im doubtful. – marstato May 17 '17 at 14:08
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/58900/discussion-between-marstato-and-zev-spitz). – marstato May 17 '17 at 14:25
  • @rogerio your comment assumes the driver to change the database persistence layer is to move to vastly different technologies. People expect to have to re ode when changing from rdms to nodal. But there are plenty of organizations that make you move your app DB from Oracle to MySQL then to msssql and back to Oracle. This is usually driven by licensing costs and change of command of whomever sets the organizations strategy for DB products. – Thomas Carlisle May 17 '17 at 19:16
  • The idea that abstracting away the persistence layer is always an advantage is not entirely uncontroversial. DHH (the creator of Rails) has spoken out against this pattern: http://david.heinemeierhansson.com/2014/test-induced-design-damage.html – James_pic May 18 '17 at 10:26
  • 1
    @Voo Simple, I *don't* mock data access. Instead, I write (nearly every day) integration tests which create the persistent data a test needs, with everything occuring in a single database transaction which is always rolled-back at the end of the test. Works very well (used this approach in literally hundreds of tests over many years). Years ago, I did try the mocking approach, but it didn't prove itself (in general) to be a good one. – Rogério May 18 '17 at 16:03
  • @David You are absolutely right. I did that too (from Oracle to Sybase, then back to Oracle, in a web app with about 400 tables persisted with Hibernate). – Rogério May 18 '17 at 16:05
  • @ThomasCarlisle Moving from Oracle to MySQL would be a simple matter of changing a small configuration file, in the case of a Java project which made proper use of JPA (the standard ORM API for Java, implemented by Hibernate, which supports all RDBMSs, and even some NoSQL engines like MongoDB). – Rogério May 18 '17 at 16:17
  • @James The whole "oh we can easily just use integration tests to cover the whole application" has some downsides which really only manifest in large applications. I've seen large applications where integration tests took *over a day* to complete. And then the fun part of tracking down what went wrong when 400 tests fail after a complex merge that introduced a problem in one core function. If your code base is <100k LOC that's not going to be a big problem and there are upsides (mostly time saved and the fact that your architecture can be less strictly designed), but it just doesn't scale well. – Voo May 18 '17 at 16:35
  • @Voo Yeah, there's no one-size-fits-all approach that works for all codebases, and at those kinds of scales you definitely need different abstractions. That said, my experience is that within the context of the "pyramid of testing" approach, many projects neglect the middle tier of the pyramid. There's often only unit tests and end-to-end integration, which makes anything that isn't a unit test very expensive. – James_pic May 19 '17 at 11:16
  • This answer does not address the question at all. – Sentinel Apr 11 '18 at 05:59
  • 1
    @Sentinel what would you have liked to see in this answer instead? Me doing a survey amongst thousands of developers asking for their opinion on "SQL in the source code" and presenting the results? This site discourages opinion-based questions and answers. So I provided some reasons why people may consider SQL in the source bad practice. – marstato Apr 11 '18 at 08:14
  • @Rogério: I agree that companies switching their database provider are exceedinly rare compared to how much we're suggested to account for it. _However_, unit testing is a valid reason to make sure the persistence layer is sufficiently abstracted, so that you can easily swap it out for in-memory tests. _"allowing direct use [of db queries] from business code provides the best results"_ Define "best results". I'm pretty sure you're thinking of making development easier by cutting that corner; but you are making your maintenance harder because of it. Code is maintained more than it is created. – Flater Dec 29 '20 at 00:51
  • @Flater Most automated (JUnit) tests I write let the DB access code (either JPQL or actual SQL) run on a remote DEV db. This is preferable to tests which fully mock such external dependencies, as they tend to be detached from actual business requirements, and end up being more "for show" than truly helping test/evolve the software. That said, I can and do mock DB access in some tests, when there is a good reason for it, and it's easy enough (I use my own Java mocking library for that). The "best results" come from the simplicity and directness of the approach. – Rogério Jan 12 '21 at 19:59
  • @Rogério: If connected to a DB, regardless if a separate test db instance or not, it's an integration test, not a unit test. Which is fine, integration tests definitely have a purpose and are strongly advised, but it's not a replacement for unit tests. If tomorrow your db provider is updated and there's a new behavior you're not aware off, a test failure is going to send you on a wild goose chase. But with additional unit tests, you're able to quickly confirm that the issue is not in your code, but rather in how the db provider behaves in the scheme of your application. – Flater Jan 13 '21 at 10:18
  • @Rogério: You can argue whether you prefer terseness and simplicity over rigid but stable testing, but that's just a milder flavor of the process of getting people to write tests when they don't already. The first stage is "testing is a waste of time we could've been developing new features instead". When you're past the first stage, the second stage tends to become "meh, integration tests are good enough coverage". Your argument is somewhere past stage 2, as you do mention using _some_ unit tests, but only selectively. I respectfully argue that this is still an unnecessary concession. – Flater Jan 13 '21 at 10:22
  • @Flater They are integration/acceptance tests, yes, which BTW are the only ones worth writing IMO. From my experience, people who insist on "unit tests" have little real-world experience and don't really care to have proper tests. See, for example, what David Farley (co-author of "Continuous Delivery" book) says on his YouTube channel. – Rogério Jan 14 '21 at 15:50
  • @Rogério: If we're doing opinions: people who think integration tests are sufficient fail to understand that integration test failures simply do not reveal _which_ of the integrated components failed, making them insufficient as the sole source of proper failure detection. A unit test reveals specifically which component failed, and what it's failing at. Especially in sufficiently complex systems where an integrated graph's failures are less than obvious to attribute to a given component, unit testing the components saves a significant amount of debugging and hunting for the source of the bug. – Flater Jan 14 '21 at 17:47
  • @Flater But is your opinion based on actual experience? I suspect not. In my experience, finding the source of an integration test failure hasn't been a problem, in the majority of cases. In fact, it's much more a function of the inherent complexity of the business scenario being tested. Martin Fowler [talks about this](https://martinfowler.com/articles/mocksArentStubs.html) in a classic article, saying that "classicists (ie, as opposed to mockists) don't express this as a source of problems." – Rogério Jan 27 '21 at 15:44
  • @Flater Also, consider [this YouTube video by Ian Cooper](https://www.youtube.com/watch?v=EZ05e7EMOLM), where he talks about the real-world disadvantages of mock-based unit tests, from actual experience. He even says that if such tests do get written, it's best to delete them right after. So, essentially my point is that real-world experience shows that tests that target isolated units are a bad idea. If you have any *concrete evidence* otherwise, I would love to see it. – Rogério Jan 27 '21 at 15:55
55

Most standard business applications today use different layers with different responsibilities. However, which layers you use for your application, and which layer has which responsibility is up to you and your team. Before you can make a decision about if it is right or wrong to place SQL directly in the function you have shown us you need to know

  • which layer in your application has which responsibility

  • from which layer the function above is from

There is no "one-size-fits-all" solution to this. In some applications the designers prefer to use an ORM framework and let the framework generate all SQL. In some applications the designers prefer to store such SQL exclusively in stored procedures. For some applications there is a hand-written persistence (or repository) layer where the SQL lives, and for other applications it is ok to define exceptions under certain circumstances from strictly placing SQL in that persistence layer.

So what you need to think about: which layers do you want or need in your particular application, and how do you want the responsibilities? You wrote "I would normally have a repository layer", but what are the exact responsibilities you want to have inside that layer, and what responsibilities do you want to put somewhere else? Answer that first, then you can answer your question by yourself.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • 1
    I have edited the question. Not sure if it sheds any light. – w0051977 May 14 '17 at 18:00
  • 18
    @w0051977: the link you posted does not tell us anything about **your** application, right? It seems you are looking for some simple, braindead rule where to place SQL or not which fits to every application. There is none. This is a design decision **you** have to make about **your** individual application (probably together with your team). – Doc Brown May 14 '17 at 18:37
  • 7
    +1. In particular I would note that there is no real difference between SQL in a method, and SQL in a stored procedure, in terms of how "hard coded" something is, or how reusable, maintainable, etc. Any abstraction you can do with a procedure can be done with a method. Of course procedures can be useful, but a rote rule "we can't have SQL in methods, so let's put it all in procedures" will probably just produce a lot of cruft with little benefit. –  May 15 '17 at 16:48
  • 1
    @user82096 I would say generally speaking putting db access code in SP's has been detrimental in nearly every project I have seen. (MS stack) Aside from actual performance degradations (rigid execution plan caches), maintenance of the system was made harder by having logic split around, and SPs maintained by a different set of people than the app logic developers. Especially on Azure where code changes between Deployment Slots is like seconds of work, but non code-first / db-first schema migrations between slots are a bit of an operational headache. – Sentinel Apr 11 '18 at 07:21
36

Marstato gives a good answer but I'd like to add some more commentary.

SQL in source is NOT an anti-pattern but it can cause problems. I can remember when you used to have to put SQL queries into the properties of components dropped onto every form. That made things really ugly really fast and you had to jump through hoops to locate queries. I became a strong advocate of centralising the Database access as much as possible within the limitations of the languages I was working with. Your colleague may be getting flashbacks to these dark days.

Now, some of the comments are talking about vendor lock-in like it's automatically a bad thing. It isn't. If I'm signing a six figure cheque each year to use Oracle, you can bet that I want any application accessing that database to be using the extra Oracle syntax appropriately but to its fullest. I will not be happy if my shiny database is crippled by coders writing vanilla ANSI SQL badly when there is an "Oracle way" of writing the SQL that doesn't cripple the database. Yes, changing databases will be harder but I have only seen it done at a big client site a couple of times in over 20 years and one of those cases was moving from DB2 -> Oracle because the mainframe that hosted DB2 was obsolete and getting decommissioned. Yes that is vendor lock-in but for corporate customers it's actually desirable to pay for an expensive capable RDBMS like Oracle or Microsoft SQL Server and then utilize it to the fullest extent. You have a support agreement as your comfort blanket. If I'm paying for a database with a rich stored procedure implementation, I want it used for those cases where it makes sense to.

This leads to the next point, if you are writing an application that accesses a SQL Database you have to learn SQL as well as the other language and by learn, I mean query optimisation as well; I will get angry with you if you write SQL generation code that flushes the SQL cache with a barrage of nearly identical queries when you could have used one cleverly parameterised query.

No excuses, no hiding behind a wall of Hibernate. ORMs badly used can really cripple the performance of applications. I remember seeing a question on Stack Overflow a few years ago along the lines of:

In Hibernate I'm iterating through 250,000 records checking the values of a couple of properties and updating objects that match certain conditions. It's running a bit slow, what can I do to speed it up?

How about "UPDATE table SET field1 = <value> where field2 is True and Field3 >100". Creating and disposing of 250,000 objects may well be your problem...

i.e. Ignore Hibernate when it's not appropriate to use it. Understand the database.

So, in summary, embedding SQL in code can be bad practice but there are much worse things you can end up doing trying to avoid embedding SQL.

Pang
  • 313
  • 4
  • 7
mcottle
  • 6,122
  • 2
  • 25
  • 27
  • 5
    I didn't state "vendor lock-in" is bad thing. I stated It comes with trade-offs. In today's market with db as xaaS such old contracts and licences for running selfhosted db are going to be more and more rare. It's quite easier today to change the db from vendor A to B compared how It was 10 years ago. If you read the comments, I'm un favour of taking advantages of any feature of the data storage. So It's easy to put some specific vendor details into something (app) meant to be vendor agnostic. We strive to follow SOLiD just to the end lock the code to a single data storage. Not a big deal – Laiv May 15 '17 at 06:26
  • 2
    This is a great answer. Often the right approach is the one which leads to the least bad situation if the worst possible code is written. The worst possible ORM code might be lots worse than the worst possible embedded SQL. – jwg May 15 '17 at 09:19
  • @Laiv good points PaaS is a bit of a game changer - I'll have to think more about the implications. – mcottle May 16 '17 at 00:23
  • 3
    @jwg I'd be inclined to say that above average ORM code is worse than below average embedded SQL :) – mcottle May 16 '17 at 00:27
  • @macottle Assuming that the below average embedded SQL was written by ppl above the average of the one who write a ORM. What I doubt It very much. Many devs out there are not capable to write left JOINs whitout checking out S.O first. – Laiv May 17 '17 at 03:48
  • @Laiv You're making a really big assumption that the ORM developer shackled by having to make a generic product will write better code than a developer with specific domain knowledge and access to the database designer. This comes down to ANSI SQL vs being able to use Oracle (etc.) specific syntax and optimisations. Most ORM are generic IME - if there was an Oracle-Hibernate and SQLServer-Hibernate which optimised for the RDBMS I'd have fewer objections. At the end of the day though, no competent dev writing a system that accesses a DB has any right NOT being a competent SQL developer – mcottle May 17 '17 at 04:04
  • Https://docs.jboss.org/hibernate/orm/3.5/api/org/hibernate/dialect/package-summary.html Not that generic I think. Plus you can extend these dialects and bring some of the above average on Sql scripting you are refering, however not many of these developers who write so eficients SQLs know about them. IMO the big assumption is to speak about averages. I don't want to start a discussion ORM yes/no. But **don't blame the tool, blame the developer** – Laiv May 17 '17 at 04:55
  • @Laiv interesting, but the Oracle dialect API includes a call to "getCreateTemporaryTableString". Temporary Tables are not the "Oracle Way" - my limited experience with Informix is that you used Temp tables because the syntax wasn't as rich as Oracles. So Oracle antipatterns are actually baked into the Hibernate API. The API itself looks to be just a way of generating the correct joining syntax and other low-grade optimisations. My belief that you'd need to create a DB specific Hibernate is unshaken :) – mcottle May 17 '17 at 05:12
  • Absolutely true. When using ORM your first modelate your domain and then the db tables accordingly to the domain model. Not viceversa. See now what I mean about not blaming the tool? :-) – Laiv May 17 '17 at 06:10
  • @Laiv So, if I have a new project that will create a few new tables and use data from a hundred existing tables containing between 10 and 1,200,000 records each. You're trying to tell me that in order to use Hibernate I would have to refactor the legacy database to fit the limitations of Hibernate? Also, whether or not you choose to use a temporary table is not a database design decision - it's an old (80s) Sybase / Informix solution to a particular problem that Oracle worked around differently but the designers of Hibernate seem to have baked into the API thus crippling Hibernate for Oracle – mcottle May 17 '17 at 06:18
  • No, I'm telling you several things: 1) Such project was not designed to work with ORM. Deal with that. 2) There's no easy migrations from one model to another, no matter you like or dislike ORMs. 3) Be sure that what you need is a ORM and be aware of the consequences. For instance a huge refactor. If you strive to just throw the Hibernate dependencies and start mapping classes then you are in the big average of developers with a golden hammer. I have worked with models similar to the one you expose with JPA. The main problem was that db and domain data model were not designed accordingly. – Laiv May 17 '17 at 06:32
  • @Laiv Ok, I understand, Hibernate is useless for Brownfield projects and useless with some of the target databases it "supports". I've just spent my career becoming expert in stuff it doesn't do which is why I see the cracks. Maybe Hibernate needs to be more explicit about this in their marketing. I'm seeing more and more young coders using Hibernate expertise as a replacement for SQL expertise - hence all they have is the hammer and everything has to become a nail. – mcottle May 17 '17 at 06:40
  • If you strive to design tables first and then your domain data model then you will face all sort of inconvenients while implementing the ORM. If you find it could not be otherwise, then evaluate the possibility of not implementing off-the-shelf tools rather build your one or enhance the one you find more convenient. Large companies build their own tools for this reason. – Laiv May 17 '17 at 06:41
14

Yes, hard coding SQL strings into application code generally is an anti-pattern.

Let's try to set aside the tolerance we have developed from years of seeing this in production code. Mixing completely different languages with different syntax in the same file is generally not a desirable development technique. This is different than template languages like Razor which are designed to give contextual meaning to multiple languages. As Sava B. mentions in a comment below, SQL in your C# or other application language (Python, C++, etc.) is a string like any other and is semantically meaningless. The same applies when mixing more than one language in most cases, though there are obviously situations where doing so is acceptable, such as inline assembly in C, small and comprehensible snippets of CSS in HTML (noting that CSS is designed to be mixed with HTML), and others.

Clean Code by Robert C. Martin, pg. 288 (Robert C. Martin on mixing languages, Clean Code, Chapter 17, "Code Smells and Heuristics", page 288)

For this response, I will focus SQL (as asked in the question). The following issues may occur when storing SQL as an à la carte set of disassociated strings:

  • Database logic is difficult to locate. What do you search for to find all your SQL statements? Strings with "SELECT", "UPDATE", "MERGE", etc.?
  • Refactoring uses of the same or similar SQL becomes difficult.
  • Adding support for other databases is difficult. How would one accomplish this? Add if..then statements for each database and store all queries as strings in the method?
  • Developers read a statement in another language and become distracted by the shift in focus from the method's purpose to the method's implementation details (how and from where data is retrieved).
  • While one-liners may not be too much of a problem, inline SQL strings start to fall apart as statements become more complex. What do you do with a 113 line statement? Put all 113 lines in your method?
  • How does the developer efficiently move queries back and forth between their SQL editor (SSMS, SQL Developer, etc.) and their source code? C#'s @ prefix makes this easier, but I have seen a lot of code that quotes each SQL line and escapes the newlines. "SELECT col1, col2...colN"\ "FROM painfulExample"\ "WHERE maintainability IS NULL"\ "AND modification.effort > @necessary"\
  • Indentation characters used to align the SQL with surrounding application code are transmitted over the network with each execution. This is probably insignificant for small scale applications, but it can add up as the software's usage grows.

Full ORMs (Object-Relational mappers like Entity Framework or Hibernate) can eliminate randomly peppered SQL in application code. My use of SQL and resource files is but an example. ORMs, helper classes, etc. can all help accomplish the goal of cleaner code.

As Kevin said in an earlier answer, SQL in code can be acceptable in small projects, but large projects start out as small projects, and the probability most teams will go back and do it right is often inversely proportional to the code size.

There are many simple ways to keep SQL in a project. One of the methods that I often use is to put each SQL statement into a Visual Studio resource file, usually named "sql". A text file, JSON document, or other data source may be reasonable depending on your tools. In some cases, a separate class dedicated to ensconcing SQL strings may be the best option, but could have some of the issues described above.

SQL Example: Which looks more elegant?:

using(DbConnection connection = Database.SystemConnection()) {
    var eyesoreSql = @"
    SELECT
        Viewable.ViewId,
        Viewable.HelpText,
        PageSize.Width,
        PageSize.Height,
        Layout.CSSClass,
        PaginationType.GroupingText
    FROM Viewable
    LEFT JOIN PageSize
        ON PageSize.Id = Viewable.PageSizeId
    LEFT JOIN Layout
        ON Layout.Id = Viewable.LayoutId
    LEFT JOIN Theme
        ON Theme.Id = Viewable.ThemeId
    LEFT JOIN PaginationType
        ON PaginationType.Id = Viewable.PaginationTypeId
    LEFT JOIN PaginationMenu
        ON PaginationMenu.Id = Viewable.PaginationMenuId
    WHERE Viewable.Id = @Id
    ";
    var results = connection.Query<int>(eyesoreSql, new { Id });
}

Becomes

using(DbConnection connection = Database.SystemConnection()) {
    var results = connection.Query<int>(sql.GetViewable, new { Id });
}

The SQL is always in an easy-to-locate file or grouped set of files, each with a descriptive name that describes what it does rather than how it does it, each with space for a comment that will not interrupt the flow of application code:

SQL in a resource

This simple method executes a solitary query. In my experience, the benefit scales as use of the "foreign language" grows more sophisticated. My use of a resource file is just an example. Different methods may be more appropriate depending on the language (SQL in this case) and platform.

This and other methods resolve the list above in the following manner:

  1. Database code is easy to locate because it is already centralized. In larger projects, group like-SQL into separate files, perhaps under a folder named SQL.
  2. Support for a second, third, etc. databases is easier. Make an interface (or other language abstraction) that returns each database's unique statements. The implementation for each database becomes little more than statements similar to: return SqlResource.DoTheThing; True, these implementations can skip the resource and contain the SQL in a string, but some (not all) problems above would still surface.
  3. Refactoring is simple -- just reuse the same resource. You can even use the same resource entry for different DBMS systems much of the time with a few format statements. I do this often.
  4. Use of the secondary language can use descriptive names e.g. sql.GetOrdersForAccount rather than more obtuse SELECT ... FROM ... WHERE...
  5. SQL statements are summoned with one line regardless of their size and complexity.
  6. SQL can be copied and pasted between database tools like SSMS and SQL Developer without modification or careful copying. No quotation marks. No trailing backslashes. In the case of the Visual Studio resource editor specifically, one click highlights the SQL statement. CTRL+C and then paste it into the SQL editor.

Creation of SQL in a resource is quick, so there is little impetus to mix resource usage with SQL-in-code.

Regardless of chosen method, I have found that mixing languages usually reduces code quality. I hope that some issues and solutions described here help developers eliminate this code smell when appropriate.

Charles Burns
  • 323
  • 1
  • 6
  • 13
    I think this focuses much too much on a bad criticism of having SQL in source code. Having two different languages in the same file is not necessarily terrible. It depends on a lot of things like how they are related and how they are each presented. – jwg May 15 '17 at 09:00
  • 9
    "mixing completely different languages with different syntax in the same file" This is a terrible argument. It would also apply to the Razor view engine, and HTML, CSS and Javascript. Love your graphic novels! – Ian Newson May 15 '17 at 12:54
  • 1
    @IanNewson: Razor is a templating language; a language designed to attempt a reasonably elegant mesh of languages, and one of the only templating languages to succeed at the task in my opinion. JavaScript and especially CSS and are expressly designed around HTML, and still their use inside the HTML rather than separate files is considered an anti-pattern in most cases, especially for nontrivial code. – Charles Burns May 15 '17 at 15:59
  • @jwg: I agree, the post focuses strongly on SQL (vs generically mixing languages). To be fair, the original question is asking specifically about mixing SQL, though many, but not all, of my points apply variously to other mixed languages. – Charles Burns May 15 '17 at 16:02
  • 4
    This just pushes the complexity somewhere else. Yes, your original code is cleaner, at the expense of adding an indirection that the developer now has to navigate to for maintenance purposes. The real reason you would do this is if each SQL statement was used in multiple places in the code. In that way, it's really no different from any other ordinary refactoring ("Extract method"). – Robert Harvey May 15 '17 at 16:25
  • 3
    To be clearer, this isn't an answer to the question "what do I do with my SQL strings," so much as it's an answer to the question "what do I do with any collection of sufficiently complex strings," a question which your answer eloquently addresses. – Robert Harvey May 15 '17 at 16:36
  • 1
    Finally somebody else who actually wants to use Resource files for SQL. – Bryan Boettcher May 15 '17 at 19:09
  • @RobertHarvey: I think you nailed it. I was a bit too general, among other things. – Charles Burns May 15 '17 at 19:14
  • 1
    @RobertHarvey: Moving the complexity somewhere else is to me the goal. Isn't that why we use function and files and libraries rather than creating one giant text file that represents the whole of program logic? I think it more important in this context than even those examples, because at least with functions and files, all are the same language. With different languages, I feel it is natural to abstract them from each-other and unnatural to concatenate them together. – Charles Burns May 15 '17 at 19:20
  • 1
    Sure. But I've seen this kind of abstraction taken too far, so I'm naturally skeptical. The last overengineered system I had to suffer through had a minimum of *six hops* (through three different C# solutions and a database) to get from one side to the other. The rewrite I performed made the system much flatter (i.e. less hierarchical, a lesson I learned from Google search), much easier to understand, and much faster to write services in (the time to market for each service was decreased by 80 percent). – Robert Harvey May 15 '17 at 19:24
  • This change of philosophy is mirrored in newer development techniques which favor microservices over large, monolithic architectures. Of course, none of this has much to do with sensibly moving all of your strings to a central location, unless that's merely one cog in a very large, impossible to understand wheel. – Robert Harvey May 15 '17 at 19:26
  • 2
    @RobertHarvey: I am sure our experiences differ, but in my own I have found the abstraction in question to be a win in most cases. I will surely refine my abstraction tradeoffs as I have done up and down for years, and may one day put SQL back in code. YMMV. :) I think microservices can be great, and they too generally separate your SQL details (if SQL is used at all) from core application code. I am less advocating "use my specific method:resources" and more advocating "Generally don't mix oil and water languages." – Charles Burns May 15 '17 at 19:29
  • 1
    Mixing html, CSS and JavaScript is not an anti pattern. Look at the source for this very page and you'll find quite a bit of JavaScript and some css. Razor is not a language, it's a view engine. That's also a terrible argument anyway, as you can easily counter it by declaring any language a templating language. I hereby c# a templating language! Therefore, use as much SQL in cs files as you like! To be clear, I don't think your sentiment is necessarily wrong but it's too general. The only correct answer for something like this is "it depends". – Ian Newson May 15 '17 at 20:09
  • 1
    Nearly all of your objections are answered simply by isolating the SQL in its own "layer" (a particular namespace containing classes or static methods that contain no more than a simple query execution). ORMs tend to make your concerns here *worse*, not better; a 113 line query in ORM form is a *nightmare*. (They don't "avoid the question" at all.) Your "resource file" answer doesn't really help, either; it just moves the problems around. Also, working with embedded resources in a C# project is more annoying than strings in code. – jpmc26 May 16 '17 at 02:15
  • 1
    @jpmc26: I think there is a misunderstanding. I do not recommend use of an ORM (except in applications that need to support many simple queries across more than 3 or 4 DBMSs). I use Dapper, where a 113 line query is just that -- 113 lines of SQL. The resource file was only one suggestion. As a suggestion, I think it does isolate SQL into its own layer. In fact, resource files compile to ordinary C# classes under the hood. Regarding resources being more annoying than strings in code, I respectfully disagree, though I can say the VS resource editor could use some improvement. – Charles Burns May 16 '17 at 02:31
  • "or whichever language we use to development maintainable, quality software" -- who would like to contribute a Java example, 1/ using hibernate, 2/ or a smaller example using a SQL 'mapping' class. See http://www.javaworld.com/article/2077706/core-java/named-parameters-for-preparedstatement.html – rleir May 16 '17 at 06:58
  • 1
    Css/Js/Hml and templating languages are not the same thing as SQL embedded in c#. SQL in C# is, as far as the compiler is concerned, a meaningless string literal, devoid of any semantics. That, and the fact that this makes the c# file large and difficult to read, is what makes it bad. – Sava B. May 16 '17 at 16:03
  • 1
    The first code is **much, much more readable**. The later is total pain to read, because you have to go hunting for the SQL to a completely separate system that is not even handled by your version control system, while in the first it is right there where you need it. You made the samples suggestive with the bad naming. Name the variable in the first `var getViewableQuery` and the stored procedure in the later `eyesoreSQL` and suddenly you have absolutely no argument why the later should be more readable. – Jan Hudec May 17 '17 at 10:36
  • @JanHudec Resources are handled by version control. All you have to do to see the query in the second snippet is click on it, and press f12. – Sava B. May 17 '17 at 14:33
  • @SavaB., 1) that's still a click and a keypress more than in the first case and 2) even then only true with a very specific tooling. In the first case, it is no clicks and no keypresses, in any and every environment. – Jan Hudec May 17 '17 at 17:42
  • @JanHudec 1) A keystroke is a lot of effort, but it's worth it for the sake of readability - it reduces the size of your c# code, removes the need to keep the string literals formatted, improves signal-to-noise ratio.. etc. You can argue that it's a matter of taste, but short, concise code without a bunch of huge string literals is usually more maintainable. 2) The resources as described above are a .NET thing. Your stack of choice probably has something similar. – Sava B. May 17 '17 at 20:56
  • @SavaB., no, keystroke is blocker of readability when you need to see the two things *at once*. The code is neither shorter, nor more concise, when you split it to a stored procedure and function wrapper (remember, comparing the same query with the same binding, written in different styles). As long as you need both parts to understand what it does, splitting it always hurts and never helps. So ultimately it is about where you put the layer division. And simpler applications often don't warrant an abstract API at the database level. – Jan Hudec May 17 '17 at 21:10
  • @JanHudec You don't always need to see the query and the DAL code at once. The concern of the DAL code is to call a query. It doesn't need to know what the query does, just what it takes and returns. The C# code becomes shorter, more concise, and less poluted. Obviously, one 10k-line file contains the same amount of code as 100 100-line ones, but the monolith file is almost always less maintainable. And sprocs are many time more maintainable that resource files - not only will .sql files likely have auto-formatting, syntax coloring, and autocomplete, their semantics will also be checked. – Sava B. May 17 '17 at 21:39
  • 1
    I think that stored procedures are a perfectly legitimate solution to the problem. In fact, the stored procedure name is the analog of the descriptive name in a resource file. The point is to avoid having a bunch of SQL strings powdered throughout the application code. I usually use resources because they require no DBA involvement and centralize versioning and maintenance. The best choice depends on one's company culture, core competencies, and project history. – Charles Burns May 17 '17 at 21:51
  • @SavaB., no, you don't *always* have to see the query and the DAL code at once, but you *sometimes do*. In the above example, the SQL makes it obvious what is queried from where and how, while “get viewable” does not say all that much. Of course, this means the SQL is embedded in the DAL—so the first example should be pretty much the *only* content of a function. If that is embedded in 200-line function, it is not readable, but the reason is not embedded SQL, the reason is excessively long function. – Jan Hudec May 17 '17 at 21:55
  • @CharlesBurns, stored procedures are a perfectly legitimate solution to a problem *if you have it*. Simple applications often *don't*, though. – Jan Hudec May 17 '17 at 21:57
  • 1
    @JanHudec You sometimes need to see parts of code that have separate concerns, but this doesn't mean that violating the separation of concerns principle is usually a good idea. – Sava B. May 17 '17 at 21:59
  • @SavaB., as an aside, I never said anything about resource files. Putting SQL in resource files has all the disadvantages of both approaches an advantages of neither. The SQL needs to be wrapped in reasonably sized, well named, functions. But these functions may be either stored procedures, or they may be functions in the DAL embedding that SQL. Both approaches are readable, and each has its advantages. But that means embedding SQL is not anti-pattern per se. – Jan Hudec May 17 '17 at 21:59
  • @SavaB., but the thing is that the DAL code is part of the *same* concern as the query, really. – Jan Hudec May 17 '17 at 22:00
  • @SavaB., … and sometimes even non-DAL. There are cases where you are processing the data on the database in a way that is impractical as a stored procedure and end up with algorithm that is part SQL and part something else. And there separating the two won't help anything, because it is one algorithm with one concern. – Jan Hudec May 17 '17 at 22:39
  • Wow... I don't really see how this is better than just using Linq to SQL/EF. – Andy Sep 19 '17 at 23:02
  • @Andy: EF and LINQ are fine for many applications. Both have performance and flexibility limitations so are not suited to all applications. – Charles Burns Sep 19 '17 at 23:05
  • Fine, but putting the SQL into a resx is bizarre. A repository with the SQL embedded directly there would be better. But honestly, some sort of ORM is likely fine for the vast majority of cases. – Andy Sep 19 '17 at 23:11
13

It depends. There are a number of different approaches that can work:

  1. Modularize the SQL, and isolate it into a separate set of classes, functions, or whatever unit of abstraction your paradigm uses, then call into it with application logic.
  2. Move all complex SQL into views, and then only do very simple SQL in the application logic, so that you don't need to modularize anything.
  3. Use an object-relational mapping library.
  4. YAGNI, just write SQL directly in the application logic.

As is often the case, if your project has already chosen one of these techniques, you should be consistent with the rest of the project.

(1) and (3) are both relatively good at maintaining independence between the application logic and the database, in the sense that the application will continue to compile and pass basic smoke tests if you replace the database with a different vendor. However, most vendors do not fully conform to the SQL standard, so replacing any vendor with any other vendor is likely to require extensive testing and bug hunting regardless of which technique you use. I'm skeptical that this is as big of a deal as people make it out to be. Changing databases is basically a last resort when you can't get the current database to meet your needs. If that happens, you probably chose the database poorly.

The choice between (1) and (3) is mostly a matter of how much you like ORMs. In my opinion they are overused. They are a poor representation of the relational data model, because rows do not have identity in the way that objects have identity. You are likely to encounter pain points around unique constraints, joins, and you may have difficulty expressing some more complicated queries depending on the power of the ORM. On the other hand, (1) will probably require significantly more code than an ORM would.

(2) is rarely seen, in my experience. The problem is that many shops prohibit SWEs from directly modifying the database schema (because "that's the DBA's job"). This is not necessarily a Bad Thing in and of itself; schema changes have a significant potential for breaking things and may need to be rolled out carefully. However, in order for (2) to work, SWEs should at least be able to introduce new views and modify the backing queries of existing views with minimal or no bureaucracy. If this is not the case at your place of employment, (2) probably won't work for you.

On the other hand, if you can get (2) to work, it's much better than most other solutions because it keeps relational logic in SQL instead of application code. Unlike general purpose programming languages, SQL is specifically designed for the relational data model, and is accordingly better at expressing complicated data queries and transformations. Views can also be ported along with the rest of your schema when changing databases, but they will make such moves more complicated.

For reads, stored procedures are basically just a crappier version of (2). I don't recommend them in that capacity, but you might still want them for writes, if your database doesn't support updatable views, or if you need to do something more complex than inserting or updating a single row at a time (e.g. transactions, read-then-write, etc.). You can couple your stored procedure to a view using a trigger (i.e. CREATE TRIGGER trigger_name INSTEAD OF INSERT ON view_name FOR EACH ROW EXECUTE PROCEDURE procedure_name;), but opinions vary considerably as to whether this is actually a Good Idea. Proponents will tell you that it keeps the SQL that your application executes as simple as possible. Detractors will tell you that this is an unacceptable level of "magic" and that you should just execute the procedure directly from your application. I'd say this is a better idea if your stored procedure looks or acts a lot like an INSERT, UPDATE, or DELETE, and a worse idea if it is doing something else. Ultimately you'll have to decide for yourself which style makes more sense.

(4) is the non-solution. It may be worth it for small projects, or large ones which only sporadically interact with the database. But for projects with a lot of SQL, it's not a good idea because you may have duplicates or variations of the same query scattered around your application haphazardly, which interferes with readability and refactoring.

Kevin
  • 2,668
  • 2
  • 15
  • 23
  • 2 as written essentially assumes a read only database. Stored procedures used to be not uncommon for queries that modify. – jpmc26 May 16 '17 at 00:32
  • @jpmc26: I cover writes in the parenthetical... did you have a specific recommendation for improving this answer? – Kevin May 16 '17 at 00:45
  • Hm. Apologies for commenting before completing the answer and then getting distracted. But updatable views make it rather difficult to track down where a table is being updated. Restricting updates to directly on the tables, regardless of mechanism, does have its advantages in terms of understanding the code logic. If you're constraining the logic to the DB, then, you can't enforce that without stored procedures. They also have the advantage of allowing multiple operations with a single call. Bottom line: I don't think you should be quite so hard on them. – jpmc26 May 16 '17 at 01:12
  • @jpmc26: That's fair. – Kevin May 16 '17 at 01:16
8

Is it considered an anti-pattern to write SQL in the source code?

Not necessarily. If you read all the comments here you will find valid arguments for hardcoding SQL statements in the source code.

The problem comes with where you place the statements. If you place the SQL statements all over the project, everywhere, you will be probably ignoring some of the SOLID principles we usually strive to follow.

assume that he meant; either:

1) Use LINQ

or

2) Use stored procedures for the SQL

We can't say what he meant. However, we can guess. For instance, the first that comes to my mind is vendor lock-in. Hardcoding SQL statements may lead you to couple tightly your application to the DB engine. For instance, using specific functions of the vendor that are not ANSI compliant.

This is not necessarily wrong nor bad. I'm just pointing to the fact.

Ignoring SOLID principles and vendor locks have possible adverse consequences you may be ignoring. That's why is usually good to sit with the team and expose your doubts.

The benefit of stored procedures I think is that SQL Developers can get involved with the development process (writing stored procedures etc)

I think It has nothing to do with the advantages of stored procedures. Moreover, if your colleague dislikes hardcoded SQL, It's likely moving the business to stored procedures will dislike him too.

Edit: The following link talks about hard-coded SQL statements: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/hard-coded-sql-statements. Is there any benefit of preparing an SQL statement?

Yes. The post enums the advantages of prepared statements. It's a sort of SQL templating. More secure than strings concatenation. But the post is not encouraging you to go this way nor confirming that you are right. It just explains how can we use hardcoded SQL in a safe and efficient way.

Summarising, try asking your colleague first. Send a mail, phone him, ... Whether he answers or not, sit with the team and expose your doubts. Find the solution that best suites your requirements. Don't make false assumptions based on what you read out there.

Laiv
  • 14,283
  • 1
  • 31
  • 69
  • 1
    Thanks. +1 for "Hardcoding SQL statements may lead you to coupling tightly your application to the db enginee.". I wander if he was referring to SQL Server rather than SQL i.e. using SQLConnection rather than dbConnection; SQLCommand rather than dbCommand and SQLDataReader rather than dbDataReader. – w0051977 May 14 '17 at 21:35
  • No. I was refering to the usage of expressions non-ANSI compliant. For instance: `select GETDATE(), ....` GETDATE() is SqlServer's date function. In other enginees, the function has different name, different expression, ... – Laiv May 15 '17 at 03:46
  • 15
    "Vendor lock-in"... How often do people/enterprises *actually* migrate their existing product's database to another RDBMS? Even in projects using exclusively ORM, I never ever saw that happen: usually the software is rewritten from scratch as well, because the needs have changed in between. Therefore, thinking about this seems pretty much like "premature optimization" to me. Yes, it's totally subjective. – Olivier Grégoire May 15 '17 at 16:01
  • 1
    I don't care how often happens. I do care It may happens. In greenfield projects, the cost of implementing a DAL abstracted from the db is almost 0. A possible migration is not. The arguments against ORM are quite weak. ORMs are not like 15 years ago when Hibernate was quite green. What I have seen is a lot of "by default" configurations and quite more bad data model designs. It's like many other tools, many people do the "getting started" tutorial and they don't care what happens under the hood. The tool is not the problema. The problem is at who doesn't know how and when to use it. – Laiv May 15 '17 at 17:54
  • On the other hand, vendor locked-in is not necessarily bad. If I were asked I would say *I don't like*. However, I'm already locked to Spring, Tomcat or JBoss or Websphere (still worse). Those I can avoid, I do. Those I can't I just live with It. It's matter of preferences. – Laiv May 15 '17 at 18:00
  • Finally, please read well -**Ignoring SOLID principles and vendor locks come with trade-offs you may be ignoring.**-. I do say trade-offs because that's something nobody can argue. And I agree with. – Laiv May 15 '17 at 18:01
  • [Not all of us strive to follow SOLID.](https://youtu.be/QM1iUe6IofM) Just FYI. ;) (I don't agree with *everything* in that video, but I find it a useful discussion of OO principles. Gets the blood pumping in your head, at the very least.) – jpmc26 May 16 '17 at 01:15
3

I think it is a bad practice, yes. Others have pointed out the advantages of keeping all your data access code in its own layer. You don't have to hunt around for it, it's easier to optimize and test... But even within that layer, you have a few choices: use an ORM, use sprocs, or embed SQL queries as strings. I would say SQL query strings are by far the worst option.

With an ORM, development becomes a lot easier, and less error-prone. With EF, you define your schema just by creating your model classes (you would have needed to create these classes anyway). Querying with LINQ is a breeze - you often get away with 2 lines of c# where you'd otherwise need to write and maintain a sproc. IMO, this has a huge advantage when it comes to productivity and maintainability - less code, less problems. But there is a performance overhead, even if you know what you're doing.

Sprocs (or functions) are the other option. Here, you write your SQL queries manually. But at least you get some guarantee that they are correct. If you're working in .NET, Visual Studio will even throw compiler errors if the SQL is invalid. This is great. If you change or remove some column, and some of your queries become invalid, at least you're likely to find out about it during compile time. It's also way easier to maintain sprocs in their own files - you'll probably get syntax highlighting, autocomplete..etc.

If your queries are stored as sprocs, you can also change them without re-compiling and re-deploying the application. If you notice that something in your SQL is broken, a DBA can just fix it without needing to have access to your app code. In general, if your queries are in string literals, you dbas can't do their job nearly as easily.

SQL queries as string literals will also make your data access c# code less readable.

Just as a rule of thumb, magic constants are bad. That includes string literals.

Sava B.
  • 195
  • 1
  • 5
  • Allowing DBAs to change your SQL without updating the application is effectively splitting your application into two separately developed, separately deployed entities. You now need to manage the interface contract between those, synchronize the release of interdependent changes, etc. This may be a good thing or a bad thing, but it's a lot more than "putting all your SQL in one place", it's a far-reaching architectural decision. – IMSoP Sep 20 '17 at 17:29
2

It is not an anti-pattern (this answer is dangerously close to opinion). But code formatting is important, and the SQL string should be formatted so it is clearly separate from the code which uses it. For example

    string query = 
    @"SELECT foo, bar
      FROM table
      WHERE id = @tn";
rleir
  • 129
  • 5
  • 7
    The most glaring problem with this is the issue of where that value 42 comes from. Are you concatenating that value into the string? If so, where did it come from? How has it been scrubbed to mitigate SQL injection attacks? Why doesn't this example show a parameterized query? – Craig Tullis May 15 '17 at 17:19
  • I just meant to show the formatting. Sorry, I forgot to parameterize it. Now fixed after referring to https://msdn.microsoft.com/library/bb738521(v=vs.100).aspx – rleir May 16 '17 at 06:49
  • +1 for importance of formatting, though I maintain that SQL strings are a code smell regardless of formatting. – Charles Burns May 17 '17 at 16:45
  • 1
    Do you insist on the use of an ORM? When an ORM is not in use, for a smaller project, I use a 'shim' class containing embedded SQL. – rleir May 21 '17 at 05:18
  • SQL strings are most definitely not a code smell. Without an ORM, as a manager and architect I would encourage them over SPs for both maintenance and sometimes performance reasons. – Sentinel Apr 12 '18 at 06:17
1

I can not tell you what you colleague meant. But I can answer this:

Is there any benefit of preparing an SQL statement?

YES. Using prepared statements with bound variables is the recommended defense against SQL injection, which remains the single biggest security risk for web applications for over a decade. This attack is so common it was featured in web comics nearly ten years ago, and it is high time effective defenses were deployed.

... and the most effective defense to bad concatenation of query strings is not representing queries as strings in the first place, but to use a type-safe query API to build queries. For instance, here is how I'd write your query in Java with QueryDSL:

List<UUID> ids = select(person.id).from(person).fetch();

As you can see, there is not a single string literal here, making SQL injection neigh impossible. Moreover, I had code completion when writing this, and my IDE can refactor it should I choose to ever rename the id column. Also, I can easily find all queries for the person table by asking my IDE where the person variable is accessed. Oh, and did you notice it's quite a bit shorter and easier on the eyes than your code?

I can only assume that something like this is available for C# as well. For instance, I hear great things about LINQ.

In summary, representing queries as SQL strings makes it hard for the IDE to meaningfully assist in writing and refactoring queries, defers detection of syntax and type errors from compile time to run time, and is a contributing cause for SQL injection vulnerabilities [1]. So yes, there are valid reasons why one might not want SQL strings in source code.

[1]: Yes, correct use of prepared statements also prevents SQL injection. But that another answer in this thread was vulnerable to an SQL injection until a commenter pointed this out does not inspire confidence in junior programmers correctly using them whenever necessary ...

meriton
  • 4,022
  • 17
  • 18
  • To be clear: Using prepared statements does not prevent SQL injection. Using prepared statements with bound variables does. It is possible to use prepared statements that are built from untrusted data. – Andy Lester May 18 '17 at 18:13
1

A lot of great answers here. Apart from what already has been said, I'd like to add one other thing.

I don't consider using inline SQL an anti-pattern. What I do however consider an anti-pattern is every programmer doing their own thing. You have to decide as a team on a common standard. If everyone is using linq, then you use linq, if everyone is using views and stored procedures then you do that.

Always keep an open mind though and don't fall for dogma's. If your shop is a "linq" shop you use that. Except for that one place where linq absolutely doesn't work. (But you shouldn't 99.9% of the time.)

So what I would do is research the code in the code base and check out how your colleagues work.

Pieter B
  • 12,867
  • 1
  • 40
  • 65
  • +1 Good point. Supportability is more important than most other considerations so don't be too clever :) That said, if you're an ORM shop don't forget that there are instances where ORM is not the answer and you do need to write direct SQL. Don't optimize prematurely but there will be times in any non-trivial application when you will have to go down this route. – mcottle Sep 20 '17 at 03:55
0

The code looks like from the database-interacting layer that is between the database and the rest of the code. If really so, this is not anti-pattern.

This method IS the part of the layer, even if OP thinks differently (plain database access, no mixing with anything else). It is maybe just badly placed inside the code. This method belongs to the separate class, "database interfacing layer". It would be anti-pattern to place it inside the ordinary class next to the methods that implement non-database activities.

The anti-pattern would be to call SQL from some other random place of code. You need to define a layer between the database and the rest of the code, but it is not that you must absolutely use some popular tool that could assist you there.

h22
  • 905
  • 1
  • 5
  • 15
0

In my opinion no isn't an anti pattern but you will find yourself dealing with string formatting spacing, especially for big queries that cannot fit in one line.

another pros is it get much harder when dealing with dynamic queries that should built according to certain conditions.

var query = "select * from accounts";

if(users.IsInRole('admin'))
{
   query += " join secret_balances";
}

I suggest using a sql query builder

amd
  • 181
  • 3
  • It may be you just using shorthand but it's a really bad idea to blindly use "SELECT *" because you will bring back fields you don't need (bandwidth!!) Also although I don't have a problem with the conditional Admin clause it leads down a very slippery slope to a conditional "WHERE" clause and that's a road that leads straight to performance hell. – mcottle Sep 18 '17 at 08:33
0

For many modern organizations with rapid deployment pipelines where code changes can be compiled, tested and pushed to production in minutes, it totally makes sense to embed SQL statements into the application code. This is not necessarily an anti-pattern depending on how you go about it.

A valid case for using stored procedures today is in organizations where there is a lot of process around production deployments that can involve weeks of QA cycles etc. In this scenario the DBA team can resolve performance issues that emerge only after the initial production deployment by optimizing the queries in the stored procedures.

Unless you are in this situation I recommend that you embed your SQL into your application code. Read other answers in this thread for advice on the best way to go about it.


Original text below for context

The main advantage of stored procedures is that you can optimize database performance without recompiling and redeploying your application.

In many organizations code can only be pushed to production after a QA cycle etc which can take days or weeks. If your system develops a database performance issue because of changing usage patterns or increases in table sizes etc then it can be quite difficult to track down the problem with hard coded queries, whereas the database will have tools/reports etc that will identify problem stored procedures. With stored procedures solutions can be tested/benchmarked in production and the issue can be fixed quickly without having to push a new version of the application software.

If this issue is not important in your system then it's a perfectly valid decision to hard code SQL statements into the application.

bikeman868
  • 879
  • 6
  • 9
  • Wrong........... – Sentinel Apr 11 '18 at 06:01
  • Do you think that your comment is helpful to anybody? – bikeman868 Apr 11 '18 at 06:03
  • This is just plain incorrect. Let's say we have a Web API2 project with EF as the data access and Azure Sql as the storage. No, let's get rid of EF and use handcrafted SQL.The db schema is stored in an SSDT SQL Server project.Db schema migration involves making sure the codebase is synched to the db schema and fully tested, before pushing out the change to a test db in Azure.A code change involves pushing out to an App Service deployment slot, which takes 2 seconds. Plus handcrafted SQL have performance plusses over stored procedures, generally speaking, despite the 'wisdom' to the contrary. – Sentinel Apr 11 '18 at 07:16
  • In your case deployment might take seconds, but this is not the case for many people. I did not say that stored procedures are better, just that they have some advantages under some circumstances. My final statement is that if these circumstances don’t apply then putting SQL in the application is very valid. How does this contradict what you are saying? – bikeman868 Apr 11 '18 at 07:23
  • Stored procedures are hand crafted SQL??? – bikeman868 Apr 12 '18 at 05:48
  • There is no way tinkering with production db SPs is any different than tinkering with production code. – Sentinel Apr 12 '18 at 06:07
  • That depends how you build, test and deploy your production code. This might be true for some people, but your experience of different organizations is obviously very limited. – bikeman868 Apr 12 '18 at 06:11
  • Nonsense. Your argument is that tinkering with SPs allows you to avoid a QA cycle. This kind of advice is bordering on criminal. – Sentinel Apr 12 '18 at 06:14
  • Did you actually read my answer? I said that the only advantage to stored procedures is that it allows you to change queries without redeploying your application. My advice was to embed SQL statements into the application code. – bikeman868 Apr 12 '18 at 15:27
  • Yeah. No it doesn't. – Sentinel Apr 12 '18 at 16:02
0

Compared to the use of an SQL View, yes it is an anti-pattern.

In the past, I have been against SQL anything. I have used ORMs and been an advocate for LINQ everywhere. I was wrong.

The case for SQL Views:

  • Some people half-correctly discuss the need for a separate persistence layer. But the RDBMS is already the persistence layer, and SQL is fairly standardised. (I have seen projects where there is a database [DML], ORM [POCOs], Repository Pattern, Inversion of Control [Interfaces], Data Service - so 6 layers of "persistence layer". Less is more).
  • When you define a VIEW in the RDBMS, it remains adjacent to the schema (and that version of the schema). This is a good thing.
  • Version control for Views is the same as Version control for tables. If your ORM doesn't support migration of Views, get a better one, or don't use ORM migrations, but instead maintain your own master-script (eg: if not exists - add column to table; replace view).
  • Just like any "reuse" scenario, another web server in the future can be written to leverage the same database and views. But also, another system on the same network may also integrate and leverage the same Views, and reporting engines can use the Views.
  • The Web Service SHOULD be there as an adaptor only, so that the client application in the browser can query the database. (In practice, people like to also implement extra login inline with such RPCs - but that's another story)
  • But what about mockability? The logic should be in the View - unit test that. If you want more integration testing, then by definition that includes the "database" too: have a testing environment complete with database.

With the above in mind, having SQL in code is worse:

  • With an ORM, you can specify SQL and provide a POCO to materials; that way you do get type-safety. However, it's better to have a VIEW and tell the ORM to rebuild the Pocos for you. For a View, the ORM can see the schema and derive the correct POCO.
  • If you want to slightly adjust the logic of one Web Service, if it's in code, you need to compile the code and upgrade. If it's monolithic, this is tedious and risky. If it's a microservice, this is still tedious and with dependencies still risky. However, if it's a View, you can easily deploy and be sure that it's only affecting just that View itself.
  • Literal SQL in code is fine for development by the way. But prototyping it in a View should still be encouraged because it's so much easier to "test" that functionality in a query tool - that's right, squarely in the persistence layer

SQL in source code isn't an anti-pattern; that's a symptom of the actual anti-patterns:

  • "Don't put it in the RDBMS" - this is just a distaste from younger coders that isn't rational. If you are a full-stack developer or backend-developer that includes the "database".
  • "Do it all in code" - this is probably a result of the success of git. When a full-stack coder is making this kind of a decision, a coder will put it in code.
Kind Contributor
  • 802
  • 4
  • 12