11

From my experience, sql code changes almost always tend to be NOT incremental: someone creates a new stored procedure, or modifies an entire embedded sql query for optimization purposes, or creates a brand new table. When I receive one of these code review requests, I could not find any way except understanding the entire sql query. Often times these are really long nested queries, sometimes calling other procedures. Comprehending these changes and verifying them becomes a huge task. Then I am left with three options:

  1. approve without reviewing carefully;
  2. spend a significant amount of time to go line by line, understand the data model, run the query on some test system to see what it produces;
  3. ask the person to walk me through the changes.

I do not want to do 1 or 3. However, I also do not want to spend hours figuring out what the whole query is doing and whether the new version is equivalent but works faster.

Are there any technologies or tools or methodologies to ease this process? How do others perform such reviews without going through too much pain?

One fundamental difference from regular code changes seems like sql changes more frequently become (at least conceptually) entire rewrites which makes the problem more frequent to encounter - for me almost every sql review I do every single day.

Any recommendations are greatly appreciated.

Update:

You can reject a review or tell the author rewrite the change in more clear way, or add more comments or tests. However, what I am looking for is a way as a reviewer to actually take care of the review as it is. Most of the answers so far about improving the code change so that I can review more easily. That is obvious. On the other hand, the practical issue here is that I need to address the review without pushing it back to the author, without changing author behavior or restructure the organization.

I am also hoping for a sql specific recommendation, not some generic code review recommendation. I am hoping that may be there are tools that help the reviewer no matter how bad the code base is and how bad the code writing practices are.

Personally, I worked on many big and small companies as a developer close to 20 years. I maintain open source projects and I have a PhD in Computer Science. I would appreciate some help that says more than “change others and your world gets better” type of non pragmatic answer.

CEGRD
  • 235
  • 1
  • 7
  • My question is specific to sql. I do not think it is a duplicate of what you are referencing. – CEGRD Sep 12 '19 at 08:04
  • Also I am hoping for some sql specific technology that practically ease the process, not some generic coding review guidelines. – CEGRD Sep 12 '19 at 08:05
  • 1
    Pushing the code back to the author may feel like dodging your job as a reviewer, but it is actually the opposite. You are taking responsibility for keeping the software maintainable, so that there remains a fighting chance to make further changes within a reasonable timeframe. – Bart van Ingen Schenau Sep 12 '19 at 10:49
  • 5
    @CERGD, calm down a little. The problem here (as shown by comments on answers) is that you are proceeding on the assumption that an answer (more precisely, an answer within the constraints of your existing organisational procedures, and which meets the criteria of being an "SQL tool") already exists. Contributors here broadly feel that no such answer does exist, and that various emollients (like testing, documentation, liaising with the writer, or other changes in the relation between writer and reviewer) are being sharply rejected when they are actually the most sensible starting point. – Steve Sep 12 '19 at 10:57
  • The stackoverflow site is a question and answer site to help people each other. When I see a question that I do not have an answer for, I do not write an answer. I asked my question to the general community not the specifically to a couple of people who rushed to write an answer saying my question has no answer. Scientifically, saying something has no answer requires a scientific proof. People saying my question has no answer based on their experience has no value, other than polluting the site. – CEGRD Sep 12 '19 at 12:07
  • also, request for a tool recommendation is explicitly off topic. Answerers are being nice by glossing over that and answering the _methodology_ part. – Ewan Sep 12 '19 at 12:20
  • 2
    @CEGRD, it can be a valid (if not scientifically definitive) answer for an experienced person to say "there is no known answer within the strict terms you've set" (although nobody said it quite so baldly). Your own question has not been laid out scientifically with all underpinning assumptions stated - it is just a practical problem casually stated. If you understood the problem with any scientific rigour, you would probably either have an idea what a tool needs to do to solve it (and describe it to us), or you would (like us) think it unlikely that any tool can solve the problem. – Steve Sep 12 '19 at 12:47
  • 2
    please can we _not_ close this question? Its got a bit heated but the question is far more focused than the supposed dupe and someone might come along with a "oh redgate make a SQL comparison tool, its great" any second. – Ewan Sep 12 '19 at 13:46
  • I am using redgate already and it does not address my issues @Ewan – CEGRD Sep 12 '19 at 13:50
  • *sigh* that was an example of a hypothetical answer from a hypothetical person who knows of a tool that helps you and others that read this question in the future. – Ewan Sep 12 '19 at 13:52
  • @Ewan Sorry, I was just trying to clarify myself. Sql compare tools are great for incremental changes such as someone added a column, or removed an argument from a stored proc. It does not work very well for rewrites for performance optimizations, for example. – CEGRD Sep 12 '19 at 13:56
  • @CEGRD, the database itself already incorporates an engine designed (by world experts and scientists over decades) to optimise the performance of queries. If a query has to be totally rewritten to optimise performance, then you are often already into a realm where the equivalence of the old and new queries cannot be automatically proven - they may not be equivalent at all, except under broader constraints on the source data *that are not stated explicitly in the queries being compared*, or because someone has decided that a constraint that originally applied could be sacrificed. – Steve Sep 12 '19 at 16:12
  • What are your company's expectations of you as a code reviewer? Are you a part of the team writing the code, thus, you're expected to find logical errors, or are you a database engineer on a generic "database team" which has been tasked with performing code reviews? In the latter case, you may not be expected to understand the minutia of what the query is doing -- just upholding standards and making sure there are no obviously poor practices being implemented. – bvoyelr Sep 12 '19 at 16:21
  • My definition of a code review (and this is also how it was interpreted in places I worked) is that you are responsible for any issues in that change you approve. So any issues you miss is also your problem. @bvoyelr – CEGRD Sep 12 '19 at 16:23
  • Does your employer agree with this definition? It's not an answer to your question, but if all your employer wants you as a SQL code reviewer to do is ensure "SET TRANSACTION STATE" is present and there are no cross joins, it'll at least help you re-contextualize your role in the code review process. – bvoyelr Sep 12 '19 at 16:34
  • Typically peer code review is done to make sure other people see issues that you may miss. The issues may be bugs, style discrepancies, design flaws etc. May be other places take a subset of those but all the companies I worked on adopts the all in one definition. – CEGRD Sep 12 '19 at 17:05

4 Answers4

11

You may not like this, but:

This is not a problem to be solved easily by additional technologies or tools.

An SQL which contains "long nested queries, sometimes calling other procedures" and cannot be easily understood should at least have proper indentation and comments. Otherwise, it will become an unmaintainable mess. So if it is really that hard to understand the query line-by-line, reject the change and ask the author for making it more readable.

And asking authors to explain their code during a review is not the worst thing , quite the opposite. You could both together go through the code and add the missing comments together.

Of course, the technological solution to this would be to avoid writing SQL queries at all and switch to something like an ORM which generates most queries automatically. But unfortunately, this is not a realistic solution for many real-world systems, since it could mean to rewrite large parts of the system new from the ground (and often ORMs introduce their own class of new problems, sometimes they introduce more problems than they solve).

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • What you are recommending is academic or not pragmatic. In a perfect world we would not have this issue. But what I am describing is a real problem and you are saying simply reject/ignore the practical problem. – CEGRD Sep 12 '19 at 08:12
  • 3
    @CEGRD: quite the opposite. Adding comments to an otherwise unreadable piece of code is a very pragmatic solution. An academic approach would be to insist on breaking down the query into smaller parts, or by trying to make the SQL so "self-docmenting" that no comments are needed. Both of these approaches don't work well for SQL in reality. – Doc Brown Sep 12 '19 at 08:30
  • This essentially throws the review back to the author. You can reject a review or tell the author rewrite the change in more clear way, or add more comments or tests. However, what I am looking for is a way as a reviewer to actually take care of the review as it is. Most of the answers so far about improving the code change so that I can review more easily. That is obvious. On the other hand, the practical issue here is that I need to address the review without pushing it back to the author, without changing author behavior or restructure the organization. – CEGRD Sep 12 '19 at 09:17
  • 3
    @CEGRD, the bottom line is there isn't a way to achieve what you want - which is easy code reviews of complex SQL. SQL is capable of doing a lot of work (and encapsulating a lot of analysis and design) efficiently in few words and in a way that causes very efficient execution, and has at times a fiendish subtlety with all parts set in a careful interlock with each other. To review it is often as much work as to write it, and it's not reducible very far with any known tools or methods, because SQL itself already performs a lot of important reduction of the scale of the challenge. – Steve Sep 12 '19 at 09:40
  • Plus, for the record, I don't agree that ORMs have much impact - they often introduce their own challenges (especially in debugging or improving the "automatic generation of code"). If they work at all, I suspect it's because they allow a division of labour and separate specialisms. – Steve Sep 12 '19 at 10:01
  • 3
    @CEGRD, we could say "we don't know", but that doesn't quite emphasise that we *do know* (from our own experience and that of many others) that there is no easy answer to making complex SQL simple. Many queries are complex not for lack of skill by the writer or deficiencies of the language, but simply because they do a lot of inherently complex things with the data (which the firm has stated they must) or manage a lot of constraints in a carefully balanced solution that may take the writer a relatively long time to conceive and refine. – Steve Sep 12 '19 at 10:19
  • 3
    @CEGRD: when code - in whatever language - is written in an unreadable way, the only way I know to improve the situation is by cleaning up the code (and in case of complex SQLs, since usual refactoring techniques often won't work, commenting is often the only thing you can do). If you find a better way, feel free to post an answer of your own, after more than 3 decades of programming I am still open to learn something new. – Doc Brown Sep 12 '19 at 10:38
  • 1
    @CEGRD, we do not say you should not look for an answer. But a "better" answer seems to be one which does not change your colleagues' work processes in the slightest, so that curtails a huge field where feasible answers may lay. (1/2) – Steve Sep 12 '19 at 11:11
  • 1
    Secondly, I suspect you haven't fully understood the problem to which you seek an answer - (to be colourful with metaphor without intending to be insulting....) a child may ask how he can jump high enough to touch the moon, but a fuller appreciation of all the problems involved, leads one to conclude that there almost certainly cannot be an answer within the realms of muscle power alone, and searching for solutions involving springs and ladders will waste intellectual effort because of obvious mismatch between the problem and the means involved in *that* class of solutions. (2/2) – Steve Sep 12 '19 at 11:17
  • 1
    I have rewritten some rather nasty stored procedures using C# and entity framework. IMO an ORM should be mandatory for most projects. I solved quite many bugs due to that rewrite and was rewarded with easily maintainable code. It doesn't have to be exclusive however. Some stuff is better kept as SQL. – 9Rune5 Sep 12 '19 at 17:47
3

SQL queries are complex and brittle code that I experienced myself to be quite difficult to review. We do have a good portion of our codebase written as SQL templates (for good reasons I won't elaborate on), so I completely understand how the review process can become difficult (and yet, especially valuable), despite our continuous efforts to code quality.

What we did already have and helps a bit is to have code standards for SQL queries : normalized indentation levels, table naming conventions and so on.

But what did eased the process tremendously in our case, is to introduce unit testing of all modified/new SQL queries in our codebase alongside with code to be reviewed. This gives you a quick insight to what the query should be doing and confidence that it is correct in all cases. If the test suite doesn't give you this guarantee, the code should be rejected anyway.

Right now, I read SQL queries mostly to ensure conventions are followed and performance looks decent. Code correction and robustness is almost always better covered by an up-to-date test suite.

Diane M
  • 2,046
  • 9
  • 14
  • I hear you but introducing test cases do not solve the entire problem of having to still going through the entire code and verify whether the test cases are accurately testing the change. Am I misunderstanding? – CEGRD Sep 12 '19 at 08:19
  • 1
    @CEGRD A test (or a suite of tests) should inherently give you the guarantee that the implementation is correct, otherwise it's not a good test or it's not sufficiently tested. You should make sure the tests cover the change, not that the implementation cover the change. E.g. if you have a request that should update table A using B and C, in your test mock A, B and C data, check A updated correspond to the perimeter of your functionnal need, you don't need to actually care how it internally was updated (only should so for performance and style like I mentionned) – Diane M Sep 12 '19 at 08:30
  • I believe your assumption is that it is easy to see “a request should update table A using B and C”. That is not the case most of the time in real world queries. I am talking about 100 lines sql queries going over many tables with iterations, temporary tables, stored procedure calls etc. – CEGRD Sep 12 '19 at 09:07
  • 1
    @CEGRD Temporary tables and PL_SQL are implementation details, not functional needs, as long as in the end of the day the query does its job on a minimal example why would you care ? – Diane M Sep 12 '19 at 10:18
  • Because the example you may not fully cover the code. Without understanding the code you cannot ensure that there is an edge case in the query so that when that when that where condition becomes true, then the query would blow up. So you would still need to look and read the code line by line to sure you get enough coverage. – CEGRD Sep 12 '19 at 10:21
  • 1
    You can never ensure 100% code correctness. You can have tooling that makes you more confident that it is for some cases, including known edge cases. I highly doubt humanly compile and execute a query in your head is faster and more reliable than having it written down and done by machines. This is why as a reviewer you should mostlsy assure tests are covering enough of these cases, and that they pass. – Diane M Sep 12 '19 at 10:34
2

I guess my thoughts on reading your question are

  1. Are you defining what a code review is supposed to check?
  2. Why do you perceive a difference between incremental and non-incremental changes?
  3. omg stop putting business logic in sprocs

For me a code review checks a defined list of things, are there tests, is the CI pipeline configured correctly, is the style guide being followed, does the work actually do what was asked etc etc. Whatever, but its a list of checks that anyone can do on any code.

So, if your SQL change has tests, the tests pass, the test cover enough edge cases and the code does what the ticket asked for... Code Review passed!

In terms of understanding the impact of the change, sure you change the sproc you are calling but not the sproc code thats a big change that you can't understand from looking at the code.

But if even a small code change can break everything, a missing bracket or null check, an integer overflow etc Or a business logic change, which would code wise would seem correct either way but one way is required.

This is why you have tests. So you don't have to follow the code through and work out if it right. You run the test and it passes or doesnt and also the test explains what the code is supposed to do. The sproc updates the order, the select returns office around the world etc. You don't have to work out what the code is attempting because it's right there Assert.AreEqual(results, listOfOfficesinTheWorld)

My guess is that you have too much logic in sprocs which are maybe not even in source control and don't have tests, including performance tests for them.

I would suggest that you start to move logic out of the data layer, keep your sql simple, adding and removing data and have the manipulation logic in code with lots and lots of unit tests.

Update in response to changes:

I still think my answer applies. You want a :

tools or methodologies to ease this process (point 2)

Automating point 2 is equivalent to writing tests.

understand the data model,

Compare the before change and after change data models. If you have a test the commit to that test will show the incremental, or ground up change

run the query on some test system to see what it produces;

Run the sql, compare the result against expected. If you automate that there's your test right there and the commit showing any change in the expected result shows you the incremental or group up change. Naming and comments in the test explain the change EnsureCustomerIsAlwaysRight() SharesCanBeSoldInUnder100ms() etc

Expecting a tool to write the test for you is asking a bit much. If your code follows a pattern you might be able to template some basic stuff. But the expected result of your processes is going to be bespoke to your application. You are always going to have to fill in the hard bit manually.

Either fail the review on the basis that these tests aren't present, or as part of your current manual testing process, write the tests yourself. Then at least next time your job is easier.

This is the pragmatic answer. It's hard work and will take time, but it's simple to implement and understand, can be implemented incrementally, forces you to move towards separation of concerns and it's the normal solution to fix this common problem.

Ewan
  • 70,664
  • 5
  • 76
  • 161
  • If we had tests and continous integration for sql side of things, then obviously we would not have the issue. I am hoping, not sure of course if at all possible, for some approach that helps without changing the “whole work place”. – CEGRD Sep 12 '19 at 08:16
  • Add a new Code Review requirement - sql must have tests – Ewan Sep 12 '19 at 08:20
  • Just having tests does not remove the necessity of understanding the changes. You still have to verify whether tests are accurately testing the changes. – CEGRD Sep 12 '19 at 08:22
  • I am saying that 1. understanding the changes is _not_ required. and 2. The tests explain the changes – Ewan Sep 12 '19 at 08:25
  • That assumes tests are written accurately. Right? – CEGRD Sep 12 '19 at 08:27
  • nope. it doesnt – Ewan Sep 12 '19 at 08:28
  • Tests are also code you need to review. You need to make sure they test the right thing and also the changes are covered by the tests. So in my experience, tests by themselves do not reduce the effort of reviewing the changes necessarily. – CEGRD Sep 12 '19 at 09:20
  • 1
    @CEGRD it seems to me like everyone here is giving you basically the same answer "do more tests!" and you are just saying "that won't work!" But we all do the same job and it works for us. – Ewan Sep 12 '19 at 09:26
  • 1
    I'm not, it's a fine question. My answer is correct and will help you. But there's no way for any of us to make you change you mind if you dont think it will work. It's up to you to _test_ each answer and see if they work or not – Ewan Sep 12 '19 at 09:31
  • I did not specifically ask you the question. I asked to the whole community. If you do not have an answer that helps me, then please respect that. But please let’s not try to get into an argument type of “Who knows better”. I did not say the answers given here would not help the reviewer. What I am saying, when the reviewer has no choice but to review the pull request as it is, are there any tools to ease the process? If you do not know any such tools, then I would rather keep looking for it, instead of stopping. – CEGRD Sep 12 '19 at 12:13
  • @CEGRD, to dial down tensions, *how* exactly do you imagine that a tool could ease your task? *What* aspect of your task is routine or amenable to tooling? Your basic complaint was that you have to spend hours *gaining an understanding* of the code in order to perform your role in verifying it. You state you cannot change anything about your organisational processes, you can only control your own working method and time, in which code is sent to you in its current form. What, in your imagination, can a solution possibly look like in these terms? – Steve Sep 12 '19 at 13:36
  • I can enumerate generic tools but I don’t know what I don’t know. Some hypothetical tool examples (not saying that each individually solves the issue completely, also not saying such tools do not already exists or I am describing them specific enough for someone to implement them) A tool that gathers more context about the PR, shows you dependencies and the impact of the changes on the underlying database, lets you run the query from the code review, runs static and dynamic analysis and warns you about issue patterns, sql inaccuracies, performance issues etc... – CEGRD Sep 12 '19 at 13:44
  • @CEGRD, Rumsfeldian "unknown unknowns" in other words. You seem not to have even begun to analyse (in ways appropriate to a software professional) what the tool you want actually needs to do. I've apprehended your problem as being "the time and effort taken to pick apart and gain an understanding of the query". Is your task just a question of gathering explicit information and performing routine tests on the code? Or is it using judgment to decide what information is relevant to consider, infer (from human experience) likely requirements and ensure the code does what it *should* do? – Steve Sep 12 '19 at 15:38
  • For example, you list "performance issues". There is a partial way to check this - either by executing the code outright, or (if necessary) setting up a test rig to provide the base data and reasonable range of parameters. That will provide the performance data. But a human must then still decide whether the performance is *appropriate* to what the query has to do - that cannot be automated, it cannot be boiled down to a mere metric, and it cannot be done without an understanding similar to (or exceeding that of) the query writer. – Steve Sep 12 '19 at 15:47
  • You also list "impact of the changes on the underlying database". Well, clearly, all queries (and all changes to the query) are expected to have an impact of some sort. The question is whether that impact is proper or not. Should a query delete the entire contents of this table? Should it set a flag in particular circumstances? You can only say if you understand the requirements or have a tacit knowledge of how the database should work as a whole. – Steve Sep 12 '19 at 15:54
1

Gotta be honest; I'm not a fan of these kinds of systems. They are very difficult to maintain, for several reasons which you have already articulated. Your review process is not the problem; your architecture is the problem.

As I see it, your choices are twofold:

  1. Begin having architectural reviews on a regular basis so that everyone stays familiar with your tables and queries, and provide adequate documentation and training that explains how it all works, or

  2. Shift your simple (CRUD) queries to the front end or middleware application and do away with the stored procedures, reserving those mostly for batch and server operations.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • 1
    I am sorry but this is not a pragmatic answer. There is not much I can do with this answer practically. – CEGRD Sep 12 '19 at 08:21
  • Have you given some thought to the other community members' observations that the way you characterize you problem isn't exactly realistic? – Robert Harvey Sep 12 '19 at 12:45
  • In what sense? The problem is real. It is not some imaginary issue I am depicting. And often times I do have to review the code changes without the option of sending them back to the author. And I have no authority to change the organizational processes or enforcing things on others. Therefore, I am not saying the answers given so far do not make sense; what I am saying is that they are simply not applicable in my situation. – CEGRD Sep 12 '19 at 12:49