45

In my company's codebase, we hardcode sql queries without using an ORM.

Here's an example of a query we would run:

UPDATE client SET status="active" WHERE client_id=123

Since the query is hardcoded and the parameters are passed in, the code would look like this (Python):

'UPDATE client SET status="{}" WHERE client_id={}'.format('active', 123)

The only place that the parameters are coming from is the code where they are set. There is no case where external users can pass in arguments.

The codebase is not large but is quite sprawling and interrelated, so sanitizing the SQL would require a legitimate reason for the time spent.

Therefore, is there a reason to worry about sanitizing the SQL?

  • 21
    Your code would immediately break the moment someone makes a status that uses the " character. – T. Sar Mar 07 '23 at 11:34
  • 35
    Programming involves wrangling complexity, more complexity than we can reliably track. Thus, we create abstractions and tools to hide and automate parts. We program defensively, not just to guard against unexpected inputs but to **guard against our own fallibility**. It is *possible* to use query interpolation correctly, but it is easy to accidentally mess up. Thus, we prefer fixed query strings or parametrized queries whenever possible. Similarly, it is *possible* to write secure code in C, but it's easy to accidentally make grave errors. Thus, we prefer memory-safe languages like Python. – amon Mar 07 '23 at 13:03
  • 61
    As a data scientist my threat model of SQL injection attacks consists entirely of my own stupidity and ignorance. That is not a trivial threat. – shadowtalker Mar 07 '23 at 15:24
  • 27
    Community, if someone asks a question about a bad idea which inspires good answers, the question deserves upvotes, not downvotes, regardless how bad the idea is. – Doc Brown Mar 07 '23 at 18:15
  • 1
    @T.Sar Not if such a string isn't passed as an argument to `format`. *This* code is not a problem, though there's no point in using `format` here in the first place (and you can and should still use a proper parameterized query instead of string formatting instead). – chepner Mar 07 '23 at 20:33
  • 13
    You don't need to sanitise input at all. You need to use parametrized queries. – OrangeDog Mar 07 '23 at 21:03
  • 7
    Your code looks like Python, where doing it correctly with parametrised queries is exactly the same amount of work: `execute('UPDATE client SET status=%s WHERE client_id=%d', ('active', 123))` – OrangeDog Mar 07 '23 at 21:05
  • 1
    "_is there a reason to worry about sanitizing the SQL?_" Maybe not, to the extent that we believe your assumptions, now and in future. But copy-n-paste coding _is_ a thing. When writing new code, you should definitely be using a parameterized query rather than string interpolation, as there's just no reason not to. And if there is a chance that a future hire will copy-n-paste _this_ query into to _that_ new function, well, you really want to give him some high quality code to paste, something that properly parameterizes the query. Teach by example. The benefits come back to you. – J_H Mar 07 '23 at 22:01
  • 4
    @DocBrown, I suspect that most of the downvotes is because for most of us, this is far from the first time we've seen a question from someone going through contortions to avoid doing the right thing (prepared statements). – Mark Mar 07 '23 at 23:26
  • 1
    @Mark: if someone finds an older duplicate question on this site, I will be happy to support closing it as a duplicate (currently, noone casted a dupe-close-vote). However, the dupe should have a top answer which is at least as good as the top answer here (if not, we could close the older one as a dupe of the current). In both cases, however, downvotes are IMHO not justified. – Doc Brown Mar 08 '23 at 06:25
  • Sanitizing (via some quote() function) is cheap. Really cheap. Note that some APIs/RDBMSes have stupid limitations when it comes to bind variables, so there can reasons not to use bind variables in very special cases. – Klaws Mar 08 '23 at 12:25
  • 1
    why does your codebase use `.format` instead of just including those values in the SQL string? – theonlygusti Mar 08 '23 at 23:56
  • My bet is that the '123' comes from somewhere (perhaps a URL ending in ?client=123, or a hidden form field), and you need to know that those could be altered or spoofed by an attacker doing an injection attack. Even if all values of `status` are hard-coded, the other input could be a risk. – workerjoe Mar 09 '23 at 14:01
  • "_There is no case where external users can pass in arguments._" Does this mean there are internal users which can pass in arguments? Or that the input could come from another system? Or that input could even be coming from a URL or form? Surely, there **must be** inputs coming from somewhere else, the system isn't just rehashing its own data forever, does it? – jcaron Mar 09 '23 at 15:58
  • Consider writing stored procedures in the database itself, and calling those. I always get jitters when client programs have the possibility of doing dangerous updates. – roblogic Mar 09 '23 at 17:52
  • 1
    BTW, the quotes in `status="active"` mean different things in different databases. PostgreSQL (and standard SQL) for example use double quotes for identifiers, not string literals. Using parameterized queries avoids this minor portability issue. – mu is too short Mar 10 '23 at 06:16

5 Answers5

59

If there is never any user input at all, or the program is only used internally, the importance of sanitizing should be reduced.

But there are still a few possible reasons to always do so

  1. If all queries are sanitized there is no risk for some developer to copy paste code to some place where it does matter.
  2. It is possible that future changes may take input from some user. If this is done it will be much more expensive to go through all the code base and fix all the queries. Or worse, this is forgotten about.
  3. Parametrized queries should only be marginally more complex to write, and may have other advantages with regards to performance and/or type safety. With some languages/libraries/frameworks it may be more complex to write a non-parametrized query, and intentionally so.
  4. Figuring out if a query may take user input or not can be costly. It is probably easier to just use the safe version everywhere.
  5. There is less chance of formatting issues, the format of many data types depend on the localization, and there is no guarantee that this will match the localization of the database.
  6. Parametrized queries might be easier to read and maintain, but this might depend on programming language etc. But it will almost certainly be easier to read by someone used to and expecting all queries to be parametrized.
JonasH
  • 3,426
  • 16
  • 13
  • 17
    5. Parametrized queries are more readable and maintainable. WHERE Foo = @Bar is much more meaningful than Foo = {0}. String format with dozen parameters is not fun. As a bonus, string interpolation combined with nameof() will keep query up to date during automatic refactor. 6. Parametrized queries are not bound to peculiarities of String.Format of language/framework/runtime. Not only query will work elsewhere, you won't have to deal with parameter order. 7. Parametrized query won't break when using float/date on system with different locale settings than database. – PTwr Mar 07 '23 at 21:24
  • 4
    @PTwr: 8. Parameterized queries' execution plans are better cached by databases. – Matthieu M. Mar 08 '23 at 08:42
  • 2
    @PTwr: Your 5 and 6 are possibly a little dubious: some database libraries (\*cough* Perl DBI \*cough\*) have parameterised syntax like `WHERE Foo = ?`; conversely some string format libraries (eg Python) allow `WHERE Foo = {foo}`. (Not that this should discourage anyone from using parameterised queries - the advantages still *massively* outweigh the disadvantages!) – psmears Mar 08 '23 at 10:27
  • 2
    @Matthieu M. Query plan caching depends on the RDBMS, and it might even have ill side effects. I gave a hunch that it was Oracle which will happily ignore the parameters (and the corresponding statistics) when it hits a parameterized query where the plan is already cached. A WHERE clause like "status = :status" might give a full table scan with :status = 'closed', while :status = 'open' uses an index (table has a few dozen rows with status 'open', billions with status 'closed'). Additional fun, like when a SORT BY is present, might use a suboptimal index. Using the same plan sucks. – Klaws Mar 08 '23 at 12:17
  • 1
    @Klaws: There are indeed corner cases, in general though it's beneficial. – Matthieu M. Mar 08 '23 at 12:32
  • @MatthieuM. until query cache breaks, then you have fun discovering how to invalidate it :D – PTwr Mar 08 '23 at 17:48
  • 1
    @Klaws Bind-Peeking is a double-edged sword. In essence it can be beneficial to use a mix of hard coded parameters (like open/closed state) and bind-variables (like a search-field) to get different query-plans for different groups of queries. – Falco Mar 10 '23 at 12:01
41

Parameterised queries should be your standard approach to all SQL. If you are trying to find reasons why you don't have to use them, then you are doing yourself a disservice.

In your example you have no parameters at all to pass, the very fact that you have written the sql as a formatted string rather than just putting the parameters in the string suggested that you do pass these in from somewhere and you should be using a parameterised query.

Hackers are sneaky, they don't need a text box to type 'bobby drop tables' in and have it passed right to the database, these tricks can be added to files, other databases, urls etc etc. No parameter is safe.

Ewan
  • 70,664
  • 5
  • 76
  • 161
35

By using parameterized queries your SQL server doesn't have to recalculate the query execution plan each time you use a query.

This can improve performance for queries ran often.

Pieter B
  • 12,867
  • 1
  • 40
  • 65
  • 6
    This is a benefit that people often overlook. I was about to comment on JonasH's answer about this, until I read your answer. Especially since parameterized queries are pretty easy to write. Most database libraries and ORMs support this intuitively. – Greg Burghardt Mar 07 '23 at 17:59
  • Skipping an ORM should only be done for prototypes or something that is actually temporary. The long-term consequence of not having an ORM is always a disaster. – Nelson Mar 08 '23 at 00:43
  • It's risky if the data is skewed. Check my comment in response to Matthieu M.'s comment. – Klaws Mar 08 '23 at 12:19
  • 8
    @Nelson It might be perspective but I have always felt that ORMs were a disaster. Using parameters is a no brainer but an ORM is more situational. – stoj Mar 08 '23 at 12:58
  • I think whether or not one should use an ORM is outside the scope of this answer. – Pieter B Mar 10 '23 at 07:55
10

I just want to point out that your example is actually extremely, extremely nasty, because somebody not familiar or not careful enough would glance at these parentheses and assume that you're using parametrized queries and that the input is already safe.

Your example:

'UPDATE client SET status="{}" WHERE client_id={}'.format('active', 123)

Somebody comes in 5 years later with a task to accept the status from an user input, sees something like this and just changes it to

'UPDATE client SET status="{}" WHERE client_id={}'.format(userInput['status'], 123)

and boom, you've got yourself a fully-fledged SQL injection.

If you're insistent of doing things your way, at least make it obvious that something weird is going on.

'UPDATE client SET status="' + 'active' + '" WHERE client_id=' + '123'

Or even better, just put the values directly so it's known that they're constant.

'UPDATE client SET status="active" WHERE client_id=123

Or, the best way, just use the damn parameters instead of wasting your time thinking about this stuff.

cursor.execute('UPDATE client SET status=? WHERE client_id=?', ('active', 123))

Humans fail all the time. Just try to make it a bit harder for them.

Daniel
  • 109
  • 2
  • 1
    It might be worth noting that *the correct approach is also shortest!* (it's not obvious at first sight because the wrong examples don't include the `cursor.execute` call, but it will be there in the end. – Jan Hudec Mar 09 '23 at 09:39
3

I'm going to contradict some of the other answers and say there's little reason to parametrize a query where all the values are hard-coded.

However, you generally shouldn't be hard-coding so much. In your query, 123 is a magic number, and these should generally be avoided. There should be a named variable (or constant, in languages that support them) that describes the meaning of the number.

And once you move that value into a variable, then you should use a query parameter to substitute it. While there may not be a security purpose for this, since the number isn't supplied by the user, it's still a good practice for the reasons given in the other answers.

client_foo = 123
cursor.execute('UPDATE client SET status="active" WHERE client_id=?', (client_foo,))

The meaning of active is self-evident, so hard-coding it is less of a problem. However, if the script has multiple queries that set different statuses, you may want to create a generic SQL query or procedure that can be used to set different statuses, and this would be best done with another parameter.

update_status = 'UPDATE client SET status=? WHERE client_id=?'
...
cursor.execute(update_status, ("active", client_foo))
cursor.execute(update_status, ("completed", client_bar))
Barmar
  • 324
  • 2
  • 5