What can I do to let our team have code reviews of branch merges having hundreds of screens worth of Github diffs?

Question

"Code review" (aka "peer review") seems like a really great idea, so my team started practicing it.

For a little while it worked well, but then a co-worker merged a branch in, and asked for a review of the code. When I went to review her code, the Github diff page was about 420k pixels in height. Given that my screen is about 500px, that works out to 840 screens worth of code to review.

To read the code, fully "grok" it, and write appropriate comments, I probably need an average of one minute per screen, which works out to 14 hours. Now to be fair, some libraries got checked in to this commit, so some portion of that can be skipped ... but even if libraries took up 6 hours worth, that still leaves me with an entire day spent reviewing this merge.

That can't be the most effective use of my time. And this is just one merge; we will no doubt have other large merges to review in the future as well.

So, my question is, what can I do (either in terms of procedure or in terms of utilizing review tools) to let our team have code reviews of branch merges, while at the same time not eating up entire days on reviews?

500 pixels high? You guys haven't replaced [these](http://az413224.vo.msecnd.net/img/8573/m_s_p_8573_3.jpg) yet? You need to tell us how many lines of code are in the diff page, otherwise, I don't know how your question can be meaningfully answered. — Robert Harvey, May 22 '14 at 18:44
Well, that's the full height of my monitor (1080px) minus the system toolbar height, the browser tabs/buttons/etc. height, and the developer console height. But why is the exact number of pixels or exact number of lines is relevant to the actual question? What matters is that I've got a big review to do, I'll have more big reviews to do in the future, and I'm trying to figure out how best to handle them. The answer to that should be the same regardless of whether I have 5k, 10k, or 50k lines in this particular merge right? Because no matter what it's not just about this particular merge. — machineghost, May 22 '14 at 19:12
It's relevant because, if you're spending 14 hours to review code that took 14 hours to write, that's one thing, but if you're spending 14 hours to review code that took 1400 hours to write, that's quite another. The size of your screen is a prominent feature of your question, so it makes me wonder if you're asking about your tools or your process. — Robert Harvey, May 22 '14 at 19:14
Are you asking for a more efficient way to review large, sweeping changes / additions to a codebase? — BrandonV, May 22 '14 at 19:15
@BrandonV Yes, I'm asking what I/my team can do so that when someone does a big merge, the reviewer(s) don't lose entire days to the review. I can see this coming in the form of either different tooling (eg. "using X peer review tool makes it easy to mark certain sections as "don't bother reviewing") or a different process (eg. "the developer asking for the review must answer a pre-review questionaire to better target the review") ... or both. — machineghost, May 22 '14 at 19:18
Not that this is really an answer, but ideally you would review the branches separately, and after a merge you would only smoke test to make sure all went as planned. TDD can help with said smoke tests by ensuring everything works as expected. — BrandonV, May 22 '14 at 19:22
@Robert Harvey I get that if a co-worker is only merging once a year, then it's not un-reasonable to expect a review to take a day. But that isn't the case, so without measuring the exact number of lines my team produces per day, suffice it to say that these large merges are frequent enough to be painful, so I'm wondering what I can do to make them less painful. My request for "how can this process be made less painful" applies whether that pain point manifest every day, every week, or every month. — machineghost, May 22 '14 at 19:23
The obvious answer to your question seems to be to get people to merge a bit more often, or merge smaller bits of code. Aren't large merges like this already problematic, irrespective of the code review process? (conflicts, etc) — Robert Harvey, May 22 '14 at 19:25
Well that's certainly one answer, but does it really solve the problem? Ultimately I would think that would result in more review time, because A) each review has a "spin up/down" time cost that's not related to the actual length of the review, and B) if a developer changes X in commit 1, then changes it back in 2, the reviewer has two diffs to review, vs. if they check in a single merge at the end the reviewer wouldn't have any diff to review in that case. — machineghost, May 22 '14 at 19:26
You should probably tell us how your actual review process ran, as I'm not sure many other people are having this issue. — BrandonV, May 22 '14 at 19:34
It seems to me your problem has more to do with how often people are merging. On my team, a pull request with a couple hundred lines of code would be considered large. How long do people go before they are "done"? The other issue you are going to have if you let changes reach the 10k loc level before review is that if the review turns up issues, it's going to take lots of time to fix. One point of reviews is to find issues early before they waste further time. — Gort the Robot, May 22 '14 at 20:03
@BrandonV My process is basically non-existent: developer1 works on some code, whenever they feel like it they either merge it in to mater or make a pull request, then they ask developer2 for a review. Developer2 looks at the commit/merge diff on github, clicks on interesting lines and leaves comments. — machineghost, May 22 '14 at 20:19
@Steven Burnap Thanks, but I still don't see how that reduces review time overall. If a developer changes line 55 one way, then asks for a review, then changes line 55 to something else, then asks for another review, that line has to be reviewed twice. In contrast, if they had just waited and and asked for a review of the combined commits, the reviewer would only have to review the line once. Plus, it takes me 30 seconds (or whatever) to start a review, so more reviews = more time there too. What am I missing that makes more reviews take less time? — machineghost, May 22 '14 at 20:23
It seems like a couple people have a potential answer in their comments; while I'd love to hear any other ideas on how I can reduce the pain of long reviews, at the very least it seems like one legitimate answer to my question is "review/merge more frequently" ... so perhaps that should be submitted as an answer (perhaps along with an answer to my previous comment)? — machineghost, May 22 '14 at 20:28
@machineghost Because if the coder makes a mistake in the first thing they did, then spends three months building on top of it, it make take a hell of a lot of effort to undo the mistake, which would have been a ten minute fix if caught right away. — Gort the Robot, May 22 '14 at 20:33
@Steven Burnap Your comment assumes we have programmers losing weeks to mistakes that could have been caught in review, but we don't. I don't mean to be disrespectful, but my question wasn't "how do I spend more time to improve my code quality?", my question was "how do I get the benefits of code review without losing huge amounts of time?" A programmer produces X lines of code per day; does everyone out there seriously review every last line of code written? That seems completely unsustainable. If not, what process/tools do they use to reduce that time ... that's all I'm asking. — machineghost, May 22 '14 at 20:53
What benefit of code review are you trying to get, if not code quality? — Gort the Robot, May 22 '14 at 21:29
A) inform the reviewer of what the reviewee has been doing, B) educating newer programmers, C) ensure compliance to our coding standards, D) ensure team coherence/consistent designs. Also code quality of course. — machineghost, May 22 '14 at 21:49
Look, this isn't that hard to understand. There's a spectrum. We could require every member of the team to have every line they ever write reviewed by every other member of the team. Code quality would go up, but productivity would go through the floor. Or, we could do no reviews, spend no time, but lose all the benefits of code review. Obviously the sensible thing is to find the sweet spot of time spent vs. benefit gained somewhere in the middle of the spectrum. When I'm saying "I'm too far on the spending time side", you telling me "spend more time" doesn't help. Does that make sense? — machineghost, May 22 '14 at 21:52
You are rejecting advice and defending your 'position'. I suggest holding off from that while evaluating answers. Your basic assumptions and approach have lead you to this position. That's why people have questions about them. There is no "answer" to your question if too tightly constrained. Tons of code is unreasonable to review aty one go (I will say that once a year is unreasonable too, not "ok" due to infrequency). I will venture that you may be separating code review from code writing too much. If I have have code from 9 months ago.... I will need to re-review it myself! — Michael Durrant, May 23 '14 at 11:26
Saying that breaking up code reviews into smaller bits which will ultimately take more time for the same amount of code is absolutely true. The issue is long term quality and responsiveness to user needs. I used to do waterfall development with a 6 month lead. Now I do Agile where we release every week. It works much better for **the business** — Michael Durrant, May 23 '14 at 11:29
@machineghost Code reviews seem like a really expensive way of taking care of A, B, C, and D. The cheapest way to inform the reviewer what the reviewee has been doing is to *just say it*. Newer programmers will learn more from *doing* (e.g. pair programming with a mentor) than passively sitting at a long review of code they're not particularly invested in. Compliance to coding standards can be automated with tools. And isn't coherence of design mostly determined by the division of tasks? — Doval, May 23 '14 at 13:17
@Michael Durrant I'm defending my question's validity, not my position. You want to answer a question about "what's the best way to improve quality for the business", but that's not the question I asked. As the three answers below demonstrate, there certainly are people willing to provide thoughtful responses *to my query*; if you don't like that query I suggest you find a different one to respond to. — machineghost, May 23 '14 at 16:29
@Doval That's a very good point; we should adopt code reviews in the first place to get real benefits, not just to jump on the bandwagon. I was thinking ... A) we need to be able to fix bugs in each other's code, so actually seeing that code after its written provides more value in that regard than just saying "I wrote some validation on page Foo". B) I was thinking more of reviewing the work of newer programmers, and then "educating" them by providing feedback comments on that code. — machineghost, May 23 '14 at 16:38
C) A tool could help with formatting standards (though I personally don't know of any good ones for JS). But abstract issues like "create a class to encapsulate significant logic, don't just make a giant function" can only be caught by reviews. D) This one's kinda nebulous I guess. I was thinking of coherency across the codebase, but that's really the same issue as C). Scatch D). So yeah, I really do think code reviews can provide my team real benefit in ways that other approaches can't, and that this isn't just bandwagon-jumping. But I still don't want to lose time to them unecessarily. — machineghost, May 23 '14 at 16:43
**Please avoid extended discussions using comments. If you would like to discuss the question further then please use our [chat feature](http://chat.stackexchange.com/). Thank you.** — maple_shaft, May 25 '14 at 12:56

score 4 · Accepted Answer · answered May 23 '14 at 10:29

4

This is actually one of the benefits of introducing code reviews in your organization: getting people to make small, incremental changes. Just wait until the guy that made that huge commit has to review a similarly sized commit of yours.

Other, more constructive ways of getting to the same result without starting a "commit war", could be:

setting a soft upper limit to merge sizes, as suggested by others
do not allow changes that break the limit by several orders of magnitude. Enforce breaking them into smaller chunks. Either this is possible, or the change is not well designed by its author.
do not check libraries into your repository, but handle them through a build system like maven or scons. This is such a general recommendation, that it makes me wonder if there is a special constraint in your team to check libraries in; in that case, I would suggest to you keeping them in a separate repository. If they are not your code, then you do not have to review them.
plan and reserve developer time to do code reviews. This should definitely count as development time. At first, the process will slow you down, but in a few months it will keep you moving at constant speed.

I would discourage trying to find tools that automate code reviews. This defeats the two main goals of the review: have a human review the meaning of a change that would escape a machine, and spread the knowledge about the code base among the developers.

answered May 23 '14 at 10:29

logc

2,190
15
19

"Review smaller chunks of code" was also suggested in the comments. My fear is that by doing that I'm going from the frying pan to the fire. Let's say I review one big commit: there's 2 min of context-switching in to "review mode", then 40 minutes of reviewing code. Then instead let's say I review 2 smaller commits. That's 4 min of context-switching, 40 min of review, plus a few more minutes of review because I review some lines in commit 1 that get taken out in commit 2. That seems like more, not less time spent. Could you possibly explain what's wrong with my math/assumptions? – machineghost May 23 '14 at 17:14
And as for tools, I meant some sort of human-assistance code review tool, like Rietvald; I 100% agree automated code reviews can't replace human ones. – machineghost May 23 '14 at 17:15
You are right that code reviews are very time consuming. There is one thing wrong about your "context switching" assumptions, though: in the process of code review, you can ask the requester to do any necessary changes, and this means it would be strange for the same requester to modify the same lines again in a next merge request. You should try to put a fair share of the workload on the requester: the review is only over once a requirement has been met or a bug fixed, all tests pass, and all style conventions are met. This can be also a lot of work for the requester. – logc May 23 '14 at 17:48
As for tools, I am sorry to say I don't have any experience with tools other than diffs and plain conversation. Maybe such support tools are worth checking out! – logc May 23 '14 at 17:50
`complexity ≈ conflicts²` so for multiple commits/reviews we get `complexity ≈ a² + b² + c² + …` and for a single commit/review we get `complexity ≈ (a+b+c+…)²` and it can be seen that `a² + b² + c² < (a+b+c)²` – ctrl-alt-delor Dec 06 '14 at 15:17

score 4 · Answer 2 · answered May 23 '14 at 12:56

Sometimes you can't get away with a small change. You might need to forgo GitHub's code review tools and use Plain Old Git:

$ git remote add somebody https://github.com/somebody/repo.git
$ git fetch somebody
$ git diff --name-only development somebody/topic_branch
# shows list of files that were touched
$ git diff --name-status development somebody/topic_branch
# Shows list of files plus whether they were added, deleted, modifed, renamed, etc.

Then you can use:

$ git difftool development somebody/topic_branch -- ./path/to/file/or/directory

This will help you whittle down the diff.

Or just see the diffs of all non added files:

$ git diff --name-status development somebody/topic_branch | grep -vE '^A'

If you have a comment to make, you can always head back to GitHub and comment on the file or line once you've had a chance to review things.

I'll take a look at using `git difftool`; Github is so convenient (especially since I can make comments right away as I read), but at the same time it'd also be convenient to eliminate the external libraries directory from the diff. — machineghost, May 23 '14 at 16:46

score 3 · Answer 3 · answered May 23 '14 at 07:07

A working solution is to put a soft upper limit on how large merges you accept, but changes have to be reviewed as some point.

If you don't have much experience with source code reviews yet, you could start out recording metrics of the reviews (too):

Merge/commit size
Issues resolved
Errors introduced
Review reports
- Reviewer
- Time used
- Volume of comments
- Conclusion
...

These numbers will help you figure out what sizes of merges/commits are optimal in your organisation.

For your specific example, just committing the external library separately would have made the whole problem much, much smaller. - My basic assumption about merge/commit reviews is that the time it takes is proportional to the square of the size of the "diff" (I am at least certain that review time is not just linear, as you assume). If you collect the metrics, you will be able to see how wrong my assumption is.

If I had a performance bug in my code, I wouldn't start by trying to solve it, I'd profile first to find out for sure where the issue truly was. Since my question (like a performance bug) seems to have a variety of possible causes, I think it makes a lot of sense to narrow it down by collecting more information. Thanks for the suggestion. — machineghost, May 23 '14 at 17:22

What can I do to let our team have code reviews of branch merges having hundreds of screens worth of Github diffs?

3 Answers3

Linked

Related