217

It's an idea I've heard repeated in a handful of places. Some more or less acknowledging that once trying to solve a problem purely in SQL exceeds a certain level of complexity you should indeed be handling it in code.

The logic behind the idea is that for the large majority of cases, the database engine will do a better job at finding the most efficient way of completing your task than you could in code. Especially when it comes to things like making the results conditional on operations performed on the data. Arguably with modern engines effectively JIT'ing + caching the compiled version of your query it'd make sense on the surface.

The question is whether or not leveraging your database engine in this way is inherently bad design practice (and why). The lines become blurred further when all the logic exists inside the database and you're just hitting it via an ORM.

PhonicUK
  • 1,047
  • 3
  • 11
  • 12
  • 62
    This is one of those sayings that has to be taken thoughtfully. It's whipped out whenever one finds another engineer doing 'select * from table' and then combing through the result set instead of using a where clause and specifying columns. But if you take it too far, you end up with a different mess. – Michael Kohne Oct 23 '12 at 17:08
  • 159
    Starting a phrase with "never" or "always" is *almost* always a recipe for a bad design. – vsz Oct 23 '12 at 18:49
  • Could you can expand the question to also ask people to weigh in on scalability especially distributing application across servers? – Kinjal Dixit Oct 23 '12 at 19:05
  • 36
    While its certainly *possible* to try to do too much in SQL, I can honestly say that in 30 years of development and consulting, I have never seen an actual serious case of it (a few minor ones). On the other hand, I have seen literally hundreds of serious cases of developers trying to do to much in "code" that they should have been doing in SQL. And I still do see them. Frequently ... – RBarryYoung Oct 23 '12 at 21:14
  • As always.. It depends... In my case I would start writing sql statements only for report/audit functionality and heavy load operations. In most cases ORMs do fine – CoffeeCode Oct 23 '12 at 22:15
  • Is there an option for declaring questions as trolling? SE is becoming littered with questions that given a little common sense and research never need to be asked, the only reason for asking such questions, that I can see, is that people want more rep. – MrEdmundo Oct 24 '12 at 07:31
  • 2
    @MrEdmundo Take it to meta. – ta.speot.is Oct 24 '12 at 10:39
  • 4
    This question is two in one - I think it should be split. 1) How much should be done in SQL? 2) How much should be done in the DBMS? Stored procedures fall in the middle. I've seen entire applications coded in stored procedures. – reinierpost Oct 24 '12 at 12:36
  • I had a similar query > http://programmers.stackexchange.com/questions/166539/is-having-functionality-in-db-a-road-block-to-scalability you may get additional knowledge to your question. – Estefany Velez Nov 05 '12 at 06:41
  • I work in a company that tried to do the first iteration of their premier application almost entirely in SQL. That is well (10+ years) in the past, but we still have to maintain it :( – Dominique McDonnell Jul 05 '13 at 07:16
  • As admin in the programming company i saw a practice to use stored procedures in Oracle very widely. They not only worked with data but also parsed input strings, XML, etc. With XML we got a bug in Oracle which caused a segfault. So I think **it is a bad idea to do things unrelated to the database in SQL**. – zb' Oct 24 '12 at 00:31
  • @DominicMcDonnell doesn't that mean the application has been very successful? It's not every day you hear about an in-house developed application that's still chugging along after a decade. – Yawar Amin May 03 '17 at 01:30
  • @YawarAmin, welcome to the financial software industry. Terrible applications survive long after they would have been eaten by the competition in the consumer world because the competing products "work differently" to the crap that they are used to – Dominique McDonnell May 08 '17 at 05:20

14 Answers14

330

In layman's words:

These are things that SQL is made to do and, believe it or not, I've seen done in code:

  • joins - codewise it'd require complex array manipulation
  • filtering data (where) - codewise it'd require heavy inserting and deleting of items in lists
  • selecting columns - codewise it'd require heavy list or array manipulation
  • aggregate functions - codewise it'd require arrays to hold values and complex switch cases
  • foreign key integrity - codewise it'd require queries prior to insert and assumes nobody will use the data outside app
  • primary key integrity - codewise it'd require queries prior to insert and assumes nobody will use the data outside app

Doing these things instead of relying in SQL or the RDBMS leads to writing tons of code with no added value, meaning more code to debug and maintain. And it dangerously assumes the database will only be accessed via the application.

Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
  • 2
    +100 - I could not say better. – Fabricio Araujo Oct 23 '12 at 16:46
  • 93
    +10000000000 for pointing out that it dangerously assumes everything will only happen through the application. – HLGEM Oct 23 '12 at 17:09
  • 2
    I am a new developer, but regarding "dangerously assumes the database will only be...", why is it dangerous? – jn1kk Oct 23 '12 at 20:23
  • 12
    @skynorth It leads to bad database design. Down the line you end up with a database which *can* only be accessed meaningfully by that application due to all the post-processing it does. – Sirex Oct 23 '12 at 20:42
  • And what about large number of optional filters? – redcalx Oct 23 '12 at 21:05
  • @locster benchmark it -- if the filters are presumably defined in code but executed in the SQL statements, they can be a huge time saver for network traffic. However they can also lead to security problems if you trust their output too heavily. – mikebabcock Oct 23 '12 at 21:12
  • 1
    I was once forced to use a in-house sort-of-ORM framework that didn't support joins. We had thousands of lines of Java to rejoin related records. – kevin cline Oct 23 '12 at 21:21
  • In my case I have lots of filters (which perform OK), but also a choice of sort order - this is currently done with a horrible CASE WHEN beneath an ORDER BY clause. From a performance point of view I feel this has crossed a line (in a big way) and would be best executed as dynamic SQL, however, that is not an option for other well know reasons. But in the .Net MS SQL Server world you now have LINQ which is essentially dynamic SQL generation behind some framework classes. Not sure if this is a good option but it's my current research angle. – redcalx Oct 23 '12 at 21:36
  • 22
    @skynorth If you rely on the code to make sure your keys maintain integrity, then you are removing a fundamental principle of RDBMS from the DB. That makes no sense, because then every application that accesses the DB will have to make sure to precisely replicate that functionality. Why not just let the DB handle that, since that's what it's designed for. The DB can prevent duplicate keys natively, for example. – Buttle Butkus Oct 23 '12 at 22:18
  • What about full outer joins in MySQL? What about repeating an aggregated value for each of the original columns instead of collapsing the unique values to one column each? What about sending a sequence of IDs the user chose and getting back a result set in the same order as that sequence? – f1r3br4nd Oct 24 '12 at 07:17
  • 11
    don't forget transactions! – Sklivvz Oct 24 '12 at 07:49
  • 25
    @skynorth: tl;dr: The rules that keep your data consistent should be implemented in the database. i.e. for 99% of applications ever written, the data (and therefore the database) lives _looooooooooong_ after your application is dead and gone. I've seen this many, _many_ times down the years (Hey, we need to deploy a version on Windows/iPhone/Android/whatever-the-new-thing-is, because {insert old platform here} is dying, we'll host or Oracle database _here_ and create a new UI _there_). There's no reason for this trend to stop today, or any time soon. – Binary Worrier Oct 24 '12 at 09:53
  • 5
    Not only will you be duplicating the functionality available in the SQL server, but you will likely be producing less-efficient code. Most SQL servers are very optimized, and writing similarly efficient code would probably take a long time (certainly a lot longer than leveraging the server). – weberc2 Oct 24 '12 at 12:16
  • 1
    I dare say that functional language with their map/filter/reduce handle a bunch of these function quite well. But join and primary key are still best done in SQL. And of course functional language work mostly in memory. – Laurent Bourgault-Roy Oct 24 '12 at 12:38
  • While I agree, these are best done in queries. You make some of these things sound very programmer labour intensive. Well that depends on the tools the programmer has, points 1 to 4 are trivial in LINQ for example. – weston Oct 24 '12 at 13:13
  • 1
    `JOINs` need to be done in the code if you have vertical partitioning. That's why so many NoSQL solutions don't have `JOINs`, they are not scalable. – vartec Oct 26 '12 at 11:36
  • 1
    @vartec You are right. But OP questions is about SQL and RDBMs. – Tulains Córdova Oct 26 '12 at 12:38
  • @user1598390: [vertical partitioning](http://en.wikipedia.org/wiki/Partition_(database)#Partitioning_methods) is something that you do with SQL and RDBMS. – vartec Oct 26 '12 at 12:39
  • 3
    My take is that get experienced and skilled SQL developers (not DBA) as part of the development team and then SQL becomes code as well with code review, coding standards etc. Then the question is whether code is deployed in the application or in database and that is a different issue already answered above. You wouldn't expect a Java developer to write idiomatic Ruby code so why do you also expect him to write good SQL. – softveda Nov 02 '12 at 09:47
  • I like that you left sorting out... sorting is for many common DB engines (MySQL) a real performance killer. We were advised by Oracle support to do sorting in code rather then using ORDER BY.. Now, I think this is MySQL fault, but this leads to an interesting point: you should consider both what SQL is made to do AND what your engine is really good at. – Lorenzo Dematté Jul 05 '13 at 07:28
  • 1
    @LorenzoDematté WOW. It seems MySQL is not a industry strength RDBMS. I've even heard that it doesn't enforce referential integrity by default. Try PostgreSQL or Oracle. You'll thank me later. – Tulains Córdova Jul 05 '13 at 07:45
  • That means the queries written using LINQ or entity framework is not a good option? – saurabh64 Feb 13 '18 at 09:11
124

I would rephrase that to "Never do in code what SQL Server can do for you well".

Things like string manipulation, regex work and such I would not do in SQL Server (barring SQL CLR).

The above tends to talk about things like - joins, set operations and queries. The intention behind it is to delegate much of the heavy lifting to SQL Server (at things it is good at) and reduce the amount of IO as much as possible (so let SQL do the joins and filter down with a WHERE clause, returning a much smaller data set than otherwise).

Oded
  • 53,326
  • 19
  • 166
  • 181
  • 28
    If everything SQL would do better than app code were put into the SQL layer, there's a lot of business logic which would end up in the database, for better or worse. I've seen this and yes, performance was stellar. But luckily the dev team all knew app development and SQL extremely well because the border between the two became very amorphous. I would not suggest this as a starting point but rather an ending point after the system becomes enormously popular and performance degrades over time. – Jimmy Hoffa Oct 23 '12 at 15:52
  • 5
    Horses for courses innit guv? – StuperUser Oct 23 '12 at 16:19
  • 2
    @JimmyHoffa: certainly. Dealing with resultsets more than the enough to be visualized certainly fit better on the SQL Engine rather than application code. It's an architectural choice decide what fits where - and sometimes is not easy to detect. – Fabricio Araujo Oct 23 '12 at 16:40
  • @JimmyHoffa - if your business logic is in the database, how do you keep it under version control, update it on deploy, etc? – Nathan Long Oct 23 '12 at 17:15
  • 29
    @NathanLong I don't know why so many people still think you can't keep your SQL in source control. At first we just had all our stored procedures/table scripts/etc necessary to create the database from scratch in source control, then later used visual studio database projects. It worked fine without the projects and better with them. SQL as with every other changeable thing necessary for creating your system should be under version control! Deployment can be done with redgate diff tools for most RDBMS if you keep your create scripts under version control, don't maintain diff scripts use tools – Jimmy Hoffa Oct 23 '12 at 17:28
  • 1
    @JimmyHoffa - I wasn't asserting that you couldn't; I just wanted to know how you have done it. :) I'm on a different tech stack than you, but I'm sure we could do this if need be. – Nathan Long Oct 23 '12 at 18:05
  • 3
    If your SQL has support for REGEX operations and string manipulation, doing them in SQL can be a good choice. – kevin cline Oct 23 '12 at 18:29
  • @kevincline - SQL Server can have access to these using [SQL CLR](http://en.wikipedia.org/wiki/SQL_CLR) - it is apparently quite fast. I believe Oracle has similar mechanisms to access Java code (but don't quote me on that). – Oded Oct 23 '12 at 18:50
  • @Oded Oracle has regex support built into PL/SQL – Conrad Frix Oct 23 '12 at 19:07
  • @ConradFrix - Fair enough. I wasn't limiting my comment to regex :) – Oded Oct 23 '12 at 19:08
  • The problem is "for you", not whether you put "well" in front of it or after. :) – Kaz Oct 23 '12 at 21:49
  • 3
    @NathanLong: think of it like this, a DB table is defined by a chunk of code written in a text file, the syntax is along the lines of "create table...". Now you can store that text file in whatever SCM you like, just like if you have DB table creation code in your favourite app language that calls whatever API is required, and you'd store that text file in your SCM. I think a the problem is that some people think DBs are somehow magic beasts, and they only know how to write VB code (or whatever) and so they only think in terms of the application language they know. – gbjbaanb Oct 26 '12 at 18:06
  • 1
    One of the good reasons of doing something in code (regex, string manipulation) is to NOT be bound to a single SQL server vendor... SQL CLR, PL/SQL are usually bad ideas. For a good design, I would stay always on the standards side. – Lorenzo Dematté Jul 05 '13 at 07:23
  • 2
    @LorenzoDematté - This applies to a very minority of products that need to work across database vendors. – Oded Jul 05 '13 at 07:27
  • 1
    @Oded ok, but the question is about design. I would say that depending too heavily on something (even more if it is out of your control) is not a good design. And I really wish that the "very minority of products" would be the majority.. I hate when I use a product or technology, and then I am bound *forever* to whichever storage tech they used to depend on.. – Lorenzo Dematté Jul 05 '13 at 07:31
  • 1
    @Oded vendor lock-in related to DB server burned me too many times in the past :) – Lorenzo Dematté Jul 05 '13 at 07:31
47

Never do in code what you can get the SQL server to do well for you (emphasis is mine)

The key to the answer is you need to look for SQL doing something well, as opposed to simply doing something, for you. SQL is an amazingly powerful language. Coupled with built-in functions, it can potentially do a lot of things. However, the fact that you can do something in SQL should not be an excuse for actually doing it in SQL.

My specific criteria to make a decision is to look at the amount of data that you get back and the number of round-trips: if you can cut the amount of data by shipping a task to the server, without increasing the number of round-trips, then the task belongs on the server; if the amount of data remains the same or increases without a simultaneous drop in the number of round-trips, the task belongs in your code.

Consider these examples:

  • You store a birth date, and you need to calculate the age for a group of users. You can have SQL server do the subtraction, or you can do it in your code. The number of round-trips stays the same, and the amount of data sent back to you goes up. Therefore, a code-based solution wins
  • You store a birth date, and you need to find users of ages between 20 and 30. You can load all users back on the client, do the subtraction to find the age, and then do the filtering, but shipping the logic to SQL Server would reduce the amount of data without requiring a additional round-trips; therefore, SQL-based solution wins.
Sergey Kalinichenko
  • 17,393
  • 4
  • 57
  • 73
  • 1
    When I worked somewhere the business logic became amorphous with the SQL, we had no trouble with multiple round trips; we just used multiple result sets in a single round trip, so that rule kinda breaks down there, though the spirit of the rule is pretty good in aiming for the golden mean – Jimmy Hoffa Oct 23 '12 at 16:18
  • 2
    +1 this is a fantastic answer because it gives concrete examples to support both directions. – Brandon Jun 28 '13 at 13:37
  • 1
    On your second example. what do you say, If the scenario is as below- The users and bday are caches and say record size is in range of 1000-2000. Isn't this more faster to do this in memory no db call required as the data is cached and so 'in between' sql operation is avoided. The processing will be iterating through a list of 1000 + users in memory and finding where the match occurs. Won't this be faster than the doing it in db – user4677228 Jun 30 '15 at 12:38
  • 1
    @user4677228 But try scaling up :-p. If your code has to scan all the data to calculate all ages and your desired result is just “how many users are at least 20 and younger than 30?”, the caches won’t help you at all. You’ll still end up streaming the whole table to your client, but the database server could do that all in *its* memory/caches and give you a fast answer regardless of whether the db client is connecting through local sockets or remotely over the network if you’re just willing to calculate the age in a `WHERE` clause. – binki Jul 27 '16 at 21:07
22

In short, it would be correct to say that: "Never perform database specific operations in your code base" as they are better addressed in your database.

Look at example of the set base operations. As you may know, RDBMS are build to handle a common data storage and manipulation operations.

In addition, the project choice of database plays important role. Having a RDBMS (MS SQL, Oracle, etc..) is different than NoSQL databases like RavenDB.

Yusubov
  • 21,328
  • 6
  • 45
  • 71
  • Never putting set operations in your code base would mean absolutely everything done in LINQ to collections (select, sum, where, single) should be done in SQL and *not* in your app, this would put a LOT of business logic into your database. – Jimmy Hoffa Oct 23 '12 at 16:10
  • 4
    The things that you describe, are not a client code. It is a Business layer, where you may have your own manipulation logic. However performing this logic on 1M+ records is going to hit you back. – Yusubov Oct 23 '12 at 16:19
  • @JimmyHoffa: That's not true, sometimes you generate transient information that need to be processed with the data you already have on app memory. Linq works wonders on that. – Fabricio Araujo Oct 24 '12 at 16:57
  • @FabricioAraujo I'm aware of why linq is great, but this answer states to *Never* do set based operations in app code, if you *Never* did set operations in app code you would never use linq because that's the entire purpose of linq. I'm making the point that *Never* doing set operations in app code is a bad rule to follow – Jimmy Hoffa Oct 24 '12 at 16:59
  • @JimmyHoffa: No, the rule says "never do in app what the RDBMS can do well for you" . And I'm talking about *transient* information - not information persisted on database. I worked on systems where, to fullfill business rules, I needed to do processing on code. I remember a Business rule that I had, after do heavy processing on DB, do additional processing on that data to generate an (very important) report. I which I could use linq on that (it was done on the now-defunct Delphi.Net). In other words, linq can be used even following that rule. – Fabricio Araujo Oct 24 '12 at 17:14
  • @FabricioAraujo data that you generate in an RDBMS by doing complexe aggregation with SQL but don't persist the resulting aggregates counts as "transient" just the same, transience has nothing to do with whether or not an RDBMS will do set-based operations better than app-code. – Jimmy Hoffa Oct 24 '12 at 17:21
  • @JimmyHoffa: trying again. There are cases where Db-Side processing is not enough and client-code have to do extra processing to generate the information. So, yes, **most** set-based operations go to db-side which are set-based data cruncing engines. But sometimes just it does not do the job and you have to finish it on code (because you need sparse arrays, special collections, etc). There very few "nevers" that stand the test of reality... – Fabricio Araujo Oct 24 '12 at 17:43
  • @FabricioAraujo Exactly why I was saying the "never" in this answer is not appropriate :) – Jimmy Hoffa Oct 24 '12 at 17:52
  • @JimmyHoffa: I personally translate *"never "* to *"are you sure that this is better than the result you'ld get from following ?"* – Fabricio Araujo Oct 24 '12 at 17:55
  • @FabricioAraujo, this is correct. I used word never in general meaning for this case. As it always happens, there are some exception scenarios when traditional approach may not be the best. – Yusubov Oct 24 '12 at 18:00
13

As a rule, your DB has more information to work with than your application, and can do common data operations more efficiently. Your database maintains indices, for example, while your application would have to index the search results on the fly. So all else being equal, your overall workload can be decreased by pushing the work to the database rather than the application.

But as your product scales, it typically becomes easier to scale your app than to scale your db. In large installations, is not uncommon to see application servers outnumber database servers by a factor of 10 to 1 or more. Adding more application servers is often a simple matter of cloning an existing server onto new hardware. Adding new database servers, on the other hand, is dramatically more difficult in most cases.

So at this point, the mantra becomes protect the database. It turns out that by caching the database results in memcached or by queueing updates in a application-side log, or by fetching the data once and calculating your statistics in your app, you can dramatically reduce your database workload, saving you from having to resort to an even more complicated and fragile DB cluster configuration.

tylerl
  • 4,850
  • 21
  • 32
  • 1
    Money can solve hardware scalability issues, whereas no amount of money can solve software complexity. – Tulains Córdova Oct 23 '12 at 18:20
  • 3
    @user1598390 Indeed: [Hardware is Cheap, Programmers are Expensive](http://www.codinghorror.com/blog/2008/12/hardware-is-cheap-programmers-are-expensive.html). Money **can** solve software complexity. Money spent on programmers. But note that we're not talking about clean code versus speghetti. We're talking about performing work on the app side versus the DB side. Software complexity is only marginally related, since both options can follow good design principles. A better question is: "*which design costs more?*". – tylerl Oct 23 '12 at 18:41
  • Once you have a code base that is humongous and full of fat, most of it doing non-business stuff, the only thing you can do is the mother of all re-engineerings, which costs more than hardware and involves too much uncertainty, besides you will always know where to find good hardware, but good programmers is a different story... meanwhile your competitors are using their time to improve, adapt to change and make clients happy. – Tulains Córdova Oct 23 '12 at 18:51
  • 1
    +1 for being the only person to mention scaling in your answer. – Matt Oct 23 '12 at 20:16
  • Hardware was cheap, not any longer - in the datacenter, electricity and hardware amounts to 88% of the running cost (cited by Microsoft) so spending more on programmers to write efficient code is very cost effective, and will be until we get unlimited and cheap fusion power. – gbjbaanb Nov 11 '12 at 14:48
12

I think it would be poor design to not use the database for the things it is meant for. I have never seen any database where the rules were enforced outside the database that had good data. And I have looked at hundreds of databases.

So things that must be done in a database:

  • Auditing (application only auditing will not track all changes to the database and thus is worthless).

  • Data ingerity constrainsts including default values, foreign key constraints and rules which must always be applied to all data. All data is not always changed or inserted through an application, there are one-time data fixes especially of large data sets that are no practical to do one record at a time (please update these 100,000 records that got mismarked as status 1 when they should be 2 due to an application code bug or please update all records from client A to client B because company B bought company A) and data imports and other applications which might touch the same database.

  • JOINS and where clause filtering (to reduce the number of records sent across the network)

Ben
  • 728
  • 2
  • 7
  • 17
HLGEM
  • 28,709
  • 4
  • 67
  • 116
5

"Premature optimization is the root of all evil (most of it, anyway) in computer programming" - Donald Knuth

The database is exactly that; the data layer of your application. Its job is to provide your application with the data asked for, and store the data given to it. Your application is the place to put code that actually works with the data; displaying it, validating it, etc.

While the sentiment in the title line is admirable, and accurate to a point (the nitty-gritty of filtering, projecting, grouping etc should in the overwhelming number of cases be left to the DB), a definition of "well" might be in order. The tasks that SQL Server can execute with a high level of performance are many, but the tasks that you can demonstrate that SQL Server does correctly in an isolated, repeatable manner are very few. SQL Management Studio is a great database IDE (especially given the other options I've worked with like TOAD), but it has its limitations, first among them being that pretty much anything you use it to do (or any procedural code you execute in the DB underneath) is by definition a "side effect" (altering state lying outside the domain of your process's memory space). In addition, procedural code within SQL Server is only just now, with the latest IDEs and tools, able to be measured the way managed code can using coverage metrics and path analysis (so you can demonstrate that this particular if statement is encountered by tests X, Y, and Z, and test X is designed to make the condition true and execute that half while Y and Z execute the "else". That, in turn, assumes you have a test that can set the database up with a particular starting state, execute the database procedural code through some action, and assert the expected results.

All of this is much more difficult and involved than the solution provided by most data access layers; assume the data layer (and, for that matter, the DAL) know how to do their job when given the correct input, and then test that your code provides correct input. By keeping procedural code like SPs and triggers out of the DB and instead doing those types of things in application code, said application code is much easier to exercise.

KeithS
  • 21,994
  • 6
  • 52
  • 79
  • Wait, wait, what? How did you get from correctness proofs to tests, which can prove that bugs exist but can never prove that code is correct? – Mason Wheeler Oct 23 '12 at 17:30
  • 2
    a stored procedure is not procedural code. A SP is a pre-computed SQL query stored and run inside the DB. It is not application code. – gbjbaanb Nov 11 '12 at 14:56
  • 1
    If the SP is limited to a SQL query, then you're right. If it's T-SQL or PL/SQL including conditional breaks, loops, cursors and/or other non-query logic, you're wrong. And a LOT of SPs, functions and triggers in DBs all over cyberspace have these extra elements. – KeithS Nov 12 '12 at 02:09
5

One of the things people don't seem to realize is that doing all of your processing on the SQL server is not necessarily good, regardless of the effects on code quality.

For instance, if you need to grab some data and then compute something from the data and then store that data in the database. There are two choices:

  • Grab the data into your application, compute within your application, and then send the data back to the database
  • Craft a stored procedure or similar to grab the data, compute across it, and then store it all from a single call to SQL server.

You may think that the second solution is always the fastest, but this is definitely not true. I'm ignoring even if SQL is a bad fit for the problem(ie regex and string manipulation). Let's pretend you have SQL CLR or something similar to have a powerful language in the database even. If it takes 1 second to make a round trip and get the data and 1 second to store it, and then 10 seconds to do the computation across it. You're doing it wrong if you're doing it all in the database.

Sure, you shave off 2 seconds. However, had you rather waste 100% of (at least) one CPU core on your database server for 10 seconds, or had you rather waste that time on your web server?

Web servers are easy to scale up, databases on the other hand are extremely expensive, especially SQL databases. Most of the time, web servers are "stateless" as well and can be added and removed at whim with no additional configuration to anything but the load balancer.

So, think not just about shaving 2 seconds off of an operation, but also think about scalability. Why waste an expensive resource like database server resources when you can use the much cheaper web server resources with a relatively small performance impact

Earlz
  • 22,658
  • 7
  • 46
  • 60
  • 1
    you're also forgetting network trips - you cannot scale horizontally by adding servers without some efficiency hit. So reducing the data load by adding a where clause is obvious - but the other sql operations will not necessarily reduce performance. Your point is correct in general though, but not to the point where you treat the DB as a dumb datastore. The most scalable app I've ever worked on used stored procedures for every data call (except 2 complex queries). A 3rd solution is the best - "stored proc to grab just the necessary data", no sure if you meant that as 'compute' or not. – gbjbaanb Nov 11 '12 at 14:54
4

I like to look at it as SQL should only deal with the data itself. The business rules that decide what the query may look like can happen in code. The regex or validation of the informaiton should be done in code. SQL should be left to just join your table, query your data, insert clean data, etc.

What gets passed into SQL should be clean data and SQL should not really need to know anything more than it needs to store it, update it, delete it or retrieve something. I have seen way too many developers want to throw their business logic and coding in SQL because they think of the data as their business. Decouple your logic from your data and you will find your code gets cleaner and easier to manage.

Just my $0.02 though.

Walter
  • 16,158
  • 8
  • 58
  • 95
  • Why would you run a regex or validation on data that's already in the database? Constraints should prevent bad data from ever getting there, and the use of regex probably means you need more useful columns.. – Brendan Long Oct 23 '12 at 23:13
  • I was not saying that I would use regex or validation on data that was coming from the database. I guess I should have clarified that was for data going to the database. My point there was that the data should be cleaned and validated before it gets to the DAL. – Stanley Glass Jr Oct 24 '12 at 12:24
3

Generally I agree that the code should control the business logic and the DB should be a logic free hash. But here are some counter points:

Primary, foreign key, and required (not null) constraints could be enforced by code. Constraints are business logic. Should they be left out of the database since they duplicate what code can do?

Do other parties outside of your control touch the database? If so having constraints enforced close to the data is nice. Access could be restricted to a web-service which implements logic, but this assumes you were there "first" and have the power to enforce the use of the service on the other parties.

Does your ORM perform a separate insert/update for each object? If yes, then you will have severe performance problems when batch processing large data sets. Set operations is the way to go. An ORM will have trouble accurately modeling all the possible joined sets which you could perform operations on.

Do you consider a "layer" to be a physical split by servers, or a logical split? Running logic on any server could theoretically still fall under it's logical layer. You might organize the split by compiling into different DLL's rather than splitting servers exclusively. This can dramatically increase response time (but sacrificing througput) while maintaining separation of concerns. A split DLL could later be moved to other servers without a new build to increase throughput (at the cost of response time).

mike30
  • 2,788
  • 2
  • 16
  • 19
  • why the downvote? – mike30 Oct 23 '12 at 16:54
  • 5
    I haven't downvoted, but any database sepcialist will tell you that considering the database a logic free hash is a very poor idea. It causes data integrity problems or performance problems or both. – HLGEM Oct 23 '12 at 17:12
  • 1
    @HLGEM. The answer describes reasons to keep logic *in* the database or sitting on the DB server. Still doesn't explain it. – mike30 Oct 23 '12 at 17:45
  • They may not have gotten to the counterpoints as I did which is why I didn't downvote. – HLGEM Oct 23 '12 at 17:50
3

The idiom is more to do with keeping the business rules, to do with the data, together with the relations (the data and structure and relationships.) It's not a one-stop-shop for every problem but it helps to avoid things like manually maintained record counters, manually maintained relationship integrity etc, if these things are available at the database level. So if someone else comes along and extends the programs or writes another program that interacts with the database, they won't have to figure out how to maintain database integrity from previous code. The case of a manually maintained record counter is particularly pertinent when someone else wants to author a new program to interact with the same database. Even if the newly created program has exactly the right code for the counter, the original program and the new one running at approximately the same time are likely to corrupt it. There's even code out there that retrieves records and checks conditions before writing a new or updated record (in code or as separate queries), when if possible this can often be achieved right in the insert or update statement. Data corruption can again result. The database engine guarantees atomicity; an update or insert query with conditions is guaranteed to affect only the records meeting the conditions and no external query can change the data half way through our update. There's many other circumstances where code is used when the database engine would better serve. It's all about data integrity and not about performance.

So it's actually a good design idiom or rule of thumb. No amount of performance is going to help in a system with corrupt data.

Chris
  • 31
  • 1
0

As mentioned before, the goal is to send to and receive as little as possible from the database because the round trips are very costly time-wise. Sending SQL statments over and over again is a waste of time especially in more complex queries.

Using stored procedures in the database allows developers to interact with the database like an API, without worrying about the complex schema on the back. It also reduce the data sent to the server since only the name and a few parameters are sent. In this scenario, most of the bussines logic can still be in the code but not in the form of SQL. The code would essentially prepare what is to be sent or requested from the database.

Walter
  • 16,158
  • 8
  • 58
  • 95
0

There are a few things to remember:

  • A relational database should ensure referential integrity through foreign keys
  • Scaling one database can be difficult and expensive. Scaling a web server is a lot easier simply by adding more web servers. Have fun trying to add more SQL server power.
  • With C# and LINQ, you can do your "joins" and whatnot through code so you kind of get the best of both worlds in many cases
Joe Phillips
  • 557
  • 2
  • 7
0

"Premature optimization is the root of all evil" - Donald Knuth

Use the tool most appropriate for the job. For data integrity, this is often the database. For advanced business rules, this is a rule-based system like JBoss Drools. For data visualisation, this would be a reporting framework. etc.

If you have any performance issues, you should then afterwards look whether any data can be cached, or whether an implementation in the database would be quicker. In general, the cost of buying extra servers or extra cloud power will be far lower than the added maintenance cost and the impact of extra bugs.

Jim G.
  • 8,006
  • 3
  • 35
  • 66
parasietje
  • 111
  • 2