298

Why does every serious Github repo I do pull requests for want me to squash my commits into a single commit?

I thought the git log was there so you could inspect all your history and see exactly what changes happened where, but squashing it pulls it out of the history and lumps it all into one commit. What is the point?

This also seems to go against the "commit early and commit often" mantra.

Michael Durrant
  • 13,101
  • 5
  • 34
  • 60
hamstar
  • 3,091
  • 2
  • 11
  • 6
  • 2
    For a big repo, it saves a lot of time and space when people want to do anything with the repo. – raptortech97 Nov 18 '14 at 22:50
  • 15
    "*This also seems to go against the "commit early and commit often" mantra.*" - No, you do commit early/often, but you don't necessarily want to PR each tiny change. And the reviewer don't want to look them through, that's for sure. – JensG Nov 18 '14 at 23:34
  • 2
    possible duplicate of [How often to open pull requests](http://programmers.stackexchange.com/questions/258475/how-often-to-open-pull-requests) – JensG Nov 18 '14 at 23:39
  • 11
    One doesn't need to put an edit in the question to summarize what the perceived answer is. That is the place of accepted answers and upvotes. People can draw their own conclusions from reading the answers. –  Nov 19 '14 at 02:18
  • 2
    Note: Review the page edit history, editors and times closely before making any assumptions about the original edit that MichaelT refers to. For example is wasn't either myself or Andrew Hill. – Michael Durrant Nov 23 '14 at 19:18
  • 2
    It's understandable when we're talking about initial PR. But what about review fixes? So, firstly you create PR with one nice "Added feature X" commit. But then reviewers point you typos, mistakes, etc, so you commit and push "Fix typo according to review", "Changed logic a little bit" etc. Does it mean that I should use "push --force" to rewrite history already pushed? – The Godfather May 25 '16 at 09:49
  • 15
    I still don't understand why there is a desire for squashing commits. I think there are far too many cons to squashing with almost no pros. It is a presentation issue masquerading as a data issue. – mkobit Sep 26 '17 at 19:34
  • 11
    Any useful analytics on the log are lost with squashes. So it seems like a developer made a feature with a single commit. It's also tough to see why an odd piece of code exists. The history of a feature can be important, and can go a long way to explain why code is the way it is. The concise view of the log is useful, but can't it be achieved another way? – Tjaart Apr 24 '19 at 14:31

6 Answers6

215

So that you have a clear and concise git history that clearly and easily documents the changes done and the reasons why.

For example a typical 'unsquashed' git log for me might look like the following:

7hgf8978g9... Added new slideshow feature, JIRA # 848394839
85493g2458... Fixed slideshow display issue in ie
gh354354gh... wip, done for the week
789fdfffdf... minor alignment issue
787g8fgf78... hotfix for #5849564648
9080gf6567... implemented feature # 65896859
gh34839843... minor fix (typo) for 3rd test

What a mess!

Whereas a more carefully managed and merged git log with a little additional focus on the messages for this might look like:

7hgf8978g9... 8483948393 Added new slideshow feature
787g8fgf78... 5849564648 Hotfix for android display issue
9080gf6567... 6589685988 Implemented pop-up to select language

I think you can see the point of squashing commits generally and the same principle applies to pull requests - Readability of the History. You may also be adding to a commit log that already has hundreds or even thousands of commits and this will help keep the growing history short and concise.

You want to commit early and often. It's a best practice for many reasons. I find that this leads me to frequently have commits that are "wip" (work-in-progress) or "part A done" or "typo, minor fix" where I am using git to help me work and give me working points that I can go back to if the following code isn't working out as I progress to get things working. However I do not need or want that history as part of the final git history so I may squash my commits - but see notes below as to what this means on a development branch vs. master.

If there are major milestones that represent distinct working stages it's still ok to have more than one commit per feature/task/bug. However this can often highlight the fact that the ticket under development is 'too big' and needs to be broken down into smaller pieces that can standalone, for example:

8754390gf87... Implement feature switches

seems like "1 piece of work". Either they exist or they don't! Doesn't seem to make sense to break it out. However experience has shown me that (depending on organizational size and complexity) a more granular path might be:

fgfd7897899... Add field to database, add indexes and a trigger for the dw group
9458947548g... Add 'backend' code in controller for when id is passed in url.
6256ac24426... Add 'backend' code to make field available for views.
402c476edf6... Add feature to UI

Small pieces mean easier code reviews, easier unit testing, better opportunity to qa, better alignment to Single Responsibility Principle, etc.

For the practicalities of when to actually do such squashes, there are basically two distinct stages that have their own workflow

  • your development, e.g. pull requests
  • your work added to the mainline branch, e.g. master

During your development you commit 'early and often' and with quick 'disposable' messages. You may wish to squash here sometimes, e.g. squashing in wip and todo message commits. It is ok, within the branch, to retain multiple commits that represent distinct steps you made in development. Most of the squashes you choose to do should be within these feature branches, while they are being developed and before merge to master.
When adding to the mainline branch you want to the commits to be concise and correctly formatted according to the existing mainline history. This might include the Ticket Tracker system ID, e.g. JIRA as shown in examples. Squashing doesn't really apply here unless you want to 'roll-up' several distinct commits on master. Normally you don't.

Using --no-ff when merging to master will use one commit for the merge and also preserve history (in the branch). Some organizations consider this to be a best practice. See more at https://stackoverflow.com/q/9069061/631619 You will also see the practical effect in git log where a --no-ff commit will be the latest commit, at the top of the HEAD (when just done), whereas without --no-ff it may be further down in the history, depending on dates and other commits.

Michael Durrant
  • 13,101
  • 5
  • 34
  • 60
  • 100
    I completely disagree with this philosophy. Small regular commits break a task down into smaller tasks and often give useful information on precisely what was being worked on when a line was changed. If you have small "WIP" commits, you should use an interactive rebase to clean that up before pushing them. Squashing PRs seems to completely defeat the point of regular small commits IMHO. It's almost disrespectful to the author who has gone to the effort of giving log messages with specific commit info and you just throw it all away. – Jez Feb 06 '18 at 11:44
  • 12
    Interesting. At the end of the day I base my 'philosophy' on what I have seen works. To be clear - breaking down work into small parts and committed each is great while working on the change. My experience has been that _once_ this work is merged to master, the _main_ thing that tend to get examined is 'what, in its _entirely_ what is this merge commit that (usually) caused issues x... – Michael Durrant Feb 06 '18 at 17:26
  • 3
    ...and Yes another level of detail examination of how the person did the work would be available if you keeo the commits but my _experience_ has told me that, post master merge we really aren't needing that level of detail. That's my exoerience. ymmv and of course you may still disagree :) – Michael Durrant Feb 06 '18 at 17:27
  • 1
    Most of all i look at the actual situation that comes up and it's usually an issue with the whole merge and individual commits are too much detail. You definitely raise a good point about throwing away the detail work provided by others. That does resonate with me. Then again my favorite code review is 'mostly red' where we delete code and reduce the size of the codebase. – Michael Durrant Feb 06 '18 at 17:30
  • one approach might be to keep the branch around, post master merge. Then again I'm not actually sure if the merge is applied to the branch or the merge commit. hmmm. This has the advantage of potentially keeping history but the downside is you end up with a large number of branches which can be confusing itself. – Michael Durrant Feb 06 '18 at 17:33
  • oh I already addressed that in my answer (it's been 4 years...) "Using --no-ff when merging to master will use one commit for the merge and also preserve history (in the branch). Some organizations consider this to be a best practice." – Michael Durrant Feb 06 '18 at 19:04
  • 22
    I'm also anti squash. You shouldn't write silly commit messages in the first place. And in the example (85493g2458... Fixed slideshow display issue in ie) That would be much more helpful if I was to debug the code associated it with. – Thomas Davis Mar 29 '18 at 03:28
  • 2
    I don't see how the first example is `(what) a mess`. I disagree with this disturbing trend. We already have tools which can show composite diffs for a branch against a base branch. Atomic commits can be more "thoughtful" than squashed commits, by isolating functionality they can provide a means to clearly reverse changes. – ctpenrose Nov 27 '18 at 22:53
  • 16
    It is unfortunate that such careless and misplaced dogma has emerged in SCM practice. Filtering tools should be utilized to manage history readability not commit behavior. – ctpenrose Nov 27 '18 at 23:01
  • 11
    disturbing trends and dogma. yow. I'd prefer a more fact based argument presented in a more agreeable manner. You are welcome to post / vote a different opinion. That's the point of community voting – Michael Durrant Nov 28 '18 at 02:34
  • 1
    There is also a question of how many changes and how many files are you grouping within a single commit. I agree some minor tiny changes may be better squashed (maybe, and not in all cases), but if you are changing different files and adding too many changes, perhaps more granular commits would be better, as the rationale/context for some changes may be lost or not clearly evident. I don't think we can apply a general rule to all cases. – Amy Pellegrini Jun 12 '19 at 11:25
  • 4
    For me, the real value in keeping all of that history is when I am understanding the intentions behind a change. There have been numerous times where I've been debugging or simply reading code, and come across an unusual bit of code that _seems_ functional and important, but without any comments or other info to describe _why_. Often, a suitably granular commit history gives precisely that context and intention; squashing branches to merge them often destroys that granularity, and with it the context of individual changes. – chaos95 Jul 17 '19 at 04:19
  • 7
    You've added "a little additional focus on the messages" to your example of squashed commits, but *not* to your example of unsquashed commits. This is a clear bias and creates an unfair comparison. Apply the same sense to your unsquashed commit and you won't have a commit like `wip, done for the week` in the first place. – lance.dolan Oct 01 '19 at 22:03
  • 1
    i am no longer maintaining the answer. others may edit as they wish. regards, Michael. – Michael Durrant Oct 18 '19 at 23:02
  • @Jez I read your comment as perfectly agreeing with this answer- clean up the 7 commits including "wip" by "squashing" them down to 3 good commits, and then keep them in the final merge with --no-ff. The answer says you "can" squash merge into a single commit, but suggests that usually you wouldn't want to do that (which I think you agree with). Did you interpret it differently? – TTT Nov 24 '20 at 18:16
  • 1
    People have these arguments assuming each other's project contexts are more similar than they are. Everyone's talking past each other. That said, the OP is right so nerrr. – Adamantish Jan 21 '21 at 17:47
  • To elaborate, I agree with the OP because here's my context: Everything goes through code review and PRs are kept as small as reasonably possible. In that context one can see work at two useful granularity levels: First, the functionally atomic, which is the level of the small PR. The database change is meaningless without the corresponding API change. For my backend web work this is the appropriate granularity for the "shared record". Then there's technically atomic; making the database change as a separate commit is helpful just a "save game" checkpoint in my process. – Adamantish Jan 21 '21 at 18:13
  • What you are describing as "carefully managed" is likely not what OP meant by squashed. They're likely referring to GitHub's ["squash and merge"](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/about-pull-request-merges#merge-message-for-a-squash-merge) or something similar because they said "squash my commits into a single commit". While I agree that your example is cleaner (and should be what folks strive for) I don't think it addresses the crux of the question. – Captain Man Apr 06 '22 at 22:25
38

The main reason from what I can see is as follows:

  • The GitHub UI for merging pull requests currently (Oct 2015) does not allow you to edit the first line of the commit message, forcing it to be Merge pull request #123 from joebloggs/fix-snafoo
  • The GitHub UI for browsing the commit history currently does not allow you to view the history of the branch from the --first-parent point of view
  • The GitHub UI for looking at the blame on a file currently does not allow you to view the blame of the file with the --first-parent point of view (note that this was only fixed in Git 2.6.2, so we could forgive GitHub for not having that available)

So when you combine all three of these situations above, you get a situation where unsquashed commits being merged look ugly from the GitHub UI.

Your history with squashed commits will look something like

1256556316... Merge pull request #423 from jrandom/add-slideshows
7hgf8978g9... Added new slideshow feature
56556316ad... Merge pull request #324 from ahacker/fix-android-display
787g8fgf78... Hotfix for android display issue
f56556316e... Merge pull request #28 from somwhere/select-lang-popup
9080gf6567... Implemented pop-up to select language

Whereas without squashed commits the history will look something like

1256556316... Merge pull request #423 from jrandom/add-slideshows
7hgf8978g9... Added new slideshow feature, JIRA # 848394839
85493g2458... Fixed slideshow display issue in ie
gh354354gh... wip, done for the week
789fdfffdf... minor alignment issue
56556316ad... Merge pull request #324 from ahacker/fix-android-display
787g8fgf78... hotfix for #5849564648
f56556316e... Merge pull request #28 from somwhere/select-lang-popup
9080gf6567... implemented feature # 65896859
gh34839843... minor fix (typo) for 3rd test

When you have a lot of commits in a PR tracing where a change came in can become a bit of a nightmare if you restrict yourself to using the GitHub UI.

For example, you find a null pointer being de-referenced somewhere in a file... so you say "who did this, and when? what release versions are affected?". Then you wander over to the blame view in the GitHub UI and you see that the line was changed in 789fdfffdf... "oh but wait a second, that line was just having its indent changed to fit in with the rest of the code", so now you need to navigate to the tree state for that file in the parent commit and re-visit the blame page... eventually you find the commit... it's a commit from 6 months ago... "oh **** this could be affecting users for 6 months" you say... ah but wait, that commit was actually in a Pull Request and was only merged yesterday and nobody has cut a release yet... "Damn you people for merging commits without squashing history" is the cry that can usually be heard after about 2 or 3 code archeology expeditions via the GitHub UI

Now let us consider how this works if you use the Git command line (and super-awesome 2.6.2 which has the fix for git blame --first-parent)

  • If you were using the Git command line, you would be able to control the merge commit message completely and thus the merge commit could have a nice summary line.

So our commit history would look like

$ git log
1256556316... #423 Added new slideshow feature
7hgf8978g9... Added new slideshow feature, JIRA # 848394839
85493g2458... Fixed slideshow display issue in ie
gh354354gh... wip, done for the week
789fdfffdf... minor alignment issue
56556316ad... #324 Hotfix for android display issue
787g8fgf78... hotfix for #5849564648
f56556316e... #28 Implemented pop-up to select language
9080gf6567... implemented feature # 65896859
gh34839843... minor fix (typo) for 3rd test

But we can also do

$ git log --first-parent
1256556316... #423 Added new slideshow feature
56556316ad... #324 Hotfix for android display issue
f56556316e... #28 Implemented pop-up to select language

(In other words: the Git CLI lets you have your cake and eat it too)

Now when we hit the null pointer issue... well we just use git blame --first-parent -w dodgy-file.c and we are immediately given the exact commit where the null pointer de-reference was introduced to the master branch ignoring simple whitespace changes.

Of course if you are doing merges using the GitHub UI then git log --first-parent is really crappy thanks to GitHub forcing the first line of the merge commit message:

1256556316... Merge pull request #423 from jrandom/add-slideshows
56556316ad... Merge pull request #324 from ahacker/fix-android-display
f56556316e... Merge pull request #28 from somwhere/select-lang-popup

So to cut a long story short:

The GitHub UI (Oct 2015) has a number of shortcomings with how it merges pull requests, how it presents the commit history and how it attributes blame information. The current best way to hack around these defects in the GitHub UI is to request people to squash their commits before merging.

The Git CLI doesn't have these issues and you can easily choose which view you want to see so that you can both discover the reason why a particular change was made that way (by looking at the history of the unsquashed commits) as well as see the effectively squashed commits.

Post Script

The final reason often cited for squashing commits is to make backporting easier... if you only have one commit to back port (i.e. the squashed commit) then it is easy to cherry pick...

Well if you are looking at the git history with git log --first-parent then you can just cherry pick the merge commits. Most people get confused cherry picking merge commits because you have to specify the -m N option but if you got the commit from git log --first-parent then you know that it is the first parent that you want to follow so it will be git cherry-pick -m 1 ...

  • 1
    Nice answer! But cherry-picking a range is easy enough, not sure I buy it as a reason. – Stiggler Aug 01 '16 at 03:54
  • 2
    @stiggler I *personally* do not buy it as a reason, but I have seen it cited by others as a reason. IMHO the git tooling is what you want and squashing commits is evil... but I would say that having driven the fixing of `--first-parent` myself in order to win an argument against squashing commits ;-) – Stephen Connolly Aug 02 '16 at 12:57
  • 3
    This ought to be the accepted answer. Far less dogma and it shows that history filtering can be used to "have your cake and eat it too". You don't have to annihilate history to bring clarity to git history explorations. – ctpenrose Nov 28 '18 at 00:07
  • It will not be so easy to cherry pick if by squashing your commits you end up grouping many changes to different files into a single commit. I guess this varies a lot according to the code base, what changes were made, how many people is collaborating, and so on. I don't think we can come up with a single general rule to be valid in all cases. – Amy Pellegrini Jun 12 '19 at 11:12
34

Because often the person pulling a PR cares about the net effect of the commits "added feature X", not about the "base templates, bugfix function X, add function Y, fixed typos in comments, adjusted data scaling parameters, hashmap performs better than list"... level of detail

If you think that your 16 commits are best represented by 2 commits rather than 1 "Added feature X, re-factored Z to use X" then that is probably fine to propose a pr with 2 commits, but then it might be best to propose 2 separate pr's in that case (if the repo still insists on single commit pr's)

This doesn't go against the "commit early and commit often" mantra, as in your repo, while you are developing you still have the granular details, so you have minimal chance of losing work, and other people can review/pull/propose pr's against your work while the new pr is being developed.

Andrew Hill
  • 546
  • 4
  • 6
  • 10
    But when you squash commits you lose that history in your fork as well right? Why would you want to throw away that history? – hamstar Nov 18 '14 at 23:43
  • 8
    a) if you want to preserve history in your branch, its easy enough to have a "for merge" branch, and a "dev" branch (which might be what you need anyway, as you could be adding new commits to your main dev branch after you propose the PR) b) you still are preserving the net effect of those commits, it would only be in cases where you wish to reverse, or cherry-pick a sub-component of the pr that the individual commits would be needed. In that case, you are probably doing multiple things (eg:bugfix X, add feature Y) in a single PR and multiple PR's could again be needed. – Andrew Hill Nov 19 '14 at 00:00
  • @AndrewHill what a great concept! I want to introduce this workflow to my team, how has it worked for you? I wrote [a blog post about it](http://andrew.yurisich.com/work/2015/07/25/squashed-branches-for-pull-requests/), and came across this comment while researching it. – yurisich Jul 25 '15 at 08:29
  • 2
    Keeping the history (without squashing commits) and using Pull Requests support both the use cases: the details are available in the original commits; the high-level history can be read by inspecting only the merge commits, e.g. by performing `git log --simplify-by-decoration`. – Arialdo Martini Mar 06 '20 at 13:59
  • 1
    almost every person to look at the repo only cares about the high level history, once it grows to a certain size. – Andrew Hill Apr 30 '20 at 02:27
4

I would see if the code is going public or not.

When projects stay private:

I would recommend to not squash and see the making of the sausage. If you use good and small commits, tools like git bisect are super handy and people can quickly pinpoint regression commits, and see why you did it (because of the commit message).

When projects go public:

Squash everything because some commits can contain security leaks. For example a password that has been commited and removed again.

Vince V.
  • 153
  • 4
  • Amen. There was the following happen early in my career. I had spent a week on debugging some hardware quirks dependent code. I loved `rebasing` back then. So after I broke the code accidentally again, the `reflog` was so contaminated I couldn't find the green point. Treating commits as immutable (most of the time) mitigates this. – Vorac Aug 08 '20 at 06:32
  • Security leaks seems like a very specific issue versus general squash or not squash workflow. – qwr Sep 12 '22 at 19:18
3

Because of the perspective ... The best practice is to have a single commit per single <<issue-management-system>> issue if possible.

You could have as many commits as you want in your own feature branch / repo, but it is your history relevant for what you are doing now for YOUR perspective ... it is not he HISTORY for the whole TEAM / PROJECT or application from their perspective to be kept several months from now ...

So whenever you would like to commit a bug fix or feature to a common repo ( this example is with the develop branch ) what you could do it as follows:

how-to rebase your feature branch into develop quickly

    # set your current branch , make a backup of it , caveat minute precision
    curr_branch=$(git rev-parse --abbrev-ref HEAD); git branch "$curr_branch"--$(date "+%Y%m%d_%H%M"); git branch -a | grep $curr_branch | sort -nr

    # squash all your changes at once 
    git reset $(git merge-base develop $curr_branch)

    # check the modified files to add 
    git status

    # add the modified files dir by dir, or file by file or all as shown
    git add --all 

    # check once again 
    git log --format='%h %ai %an %m%m %s'  | less

    # add the single message of your commit for the stuff you did 
    git commit -m "<<MY-ISSUE-ID>>: add my very important feature"

    # check once again 
    git log --format='%h %ai %an %m%m %s'  | less

    # make a backup once again , use seconds precision if you were too fast ... 
    curr_branch=$(git rev-parse --abbrev-ref HEAD); git branch "$curr_branch"--$(date "+%Y%m%d_%H%M"); git branch -a | grep $curr_branch | sort -nr

    # compare the old backup with the new backup , should not have any differences 
    git diff <<old-backup>>..<<new-backup-branch>>


    # you would have to git push force to your feature branch 
    git push --force 
  • this doesn't even attempt to address the question asked, why squash git commits for pull requests? See [answer] – gnat Apr 10 '18 at 10:34
  • after the explanation attempt it should do it ... – Yordan Georgiev Apr 10 '18 at 10:56
  • I somehow fail to see how added explanation offers anything substantial over points made and explained in top voted answers that were posted several years ago (effort at improving the first revision is appreciated though) – gnat Apr 10 '18 at 11:05
  • the keyword is the "perspective", the perspective concept makes explaining this type of problems in IT much more easier, for example - your perspective for the answer of this question is the one already being provided, my perspective is using the "perspective" keyword and providing a practical example of how-to implement the very same best practice explained via different perspectives ... – Yordan Georgiev Apr 10 '18 at 11:39
2

I agree with the sentiments expressed in other answers in this thread about showing a clear and concise history of features added and bugs fixed. I did however, want to address another aspect which your question eluded to but did not explicitly state. Part of the hangup you may have about some of git's methods of working is that git allows you to rewrite history which seems strange when being introduced to git after using other forms of source control where such actions are impossible. Furthermore, this also goes against the generally accepted principle of source control that once something is committed/checked-in to source control, you should be able to revert to that state no matter what changes you make following that commit. As you imply in your question, this is one of those situations. I think git is a great version control system; however, to understand it you must understand some of the implementation details and design decisions behind it, and as a result it has a steeper learning curve. Keep in mind git was intended to be a distributed version control system and that will help to explain why the designers allow git history to be rewritten with squash commits being one example of this.

Fred Thomsen
  • 239
  • 2
  • 7
  • Strictly speaking, Git doesn’t allow you to change history. Commit *objects* are immutable. It’s only branch *pointers* that are mutable, and only then with a “forced update”, which is generally considered something you only do for development purposes, not on the master branch. You can always get back to a previous state using the reflog, unless `git gc` has been run to discard unreferenced objects. – Jon Purdy Nov 22 '14 at 22:37
  • You are correct, and I perhaps I should have mentioned this above. However I would argue for something like a force push or rebasing commits that have already been pushed to a remote repo, would qualify as rewriting history, as the order and arrangement of commits are as important as the commits themselves. – Fred Thomsen Nov 23 '14 at 04:00
  • That is true for a given commit. In the larger picture I think of interactive rebasing and filter-branch as effectively re-writing history by changing its composition. – Michael Durrant Nov 23 '14 at 15:14
  • 1
    To b fair, SVN keeps the "every commit is sacred" approach, but when merging creates a single commit with that merge and hides the revisions that contributed to that, so it appears like all SVN merges are "rebased". They're not, you just have to tell the history command to show you the merged components. I like this approach the best. – gbjbaanb Jul 23 '15 at 07:32