81

One of the advantages of using a DVCS is the edit-commit-merge workflow (over edit-merge-commit often enforced by a CVCS). Allowing each unique change to be recorded in the repository independent of merges ensures the DAG accurately reflects the true pedigree of the project.

Why do so many websites talk about wanting to "avoid merge commits"? Doesn't merging pre-commit or rebasing post-merge make it more difficult to isolate regressions, revert past changes, etc.?

Point of clarification: The default behavior for a DVCS is to create merge commits. Why do so many places talk about a desire to see a linear development history that hides these merge commits?
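To make the terms concrete, here is a minimal sketch of the two history shapes being compared. It is a toy model, not how any DVCS stores commits; the commit names and the helpers `merge_commits` and `ancestry_is_linear` are made up for illustration.

```python
from typing import Dict, List

# Hypothetical toy history: each commit name maps to the names of its parents.
merged_history: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],       # commit on master
    "C": ["A"],       # commit on a feature branch
    "M": ["B", "C"],  # merge commit: it has two parents
}

rebased_history: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C'": ["B"],      # C replayed on top of B, so no merge commit is needed
}


def merge_commits(history: Dict[str, List[str]]) -> List[str]:
    """Return the commits that have more than one parent."""
    return sorted(name for name, parents in history.items() if len(parents) > 1)


def ancestry_is_linear(history: Dict[str, List[str]], head: str) -> bool:
    """Walk back from `head`; the ancestry is linear if no commit on the
    way has two or more parents."""
    current = head
    while True:
        parents = history[current]
        if len(parents) > 1:
            return False
        if not parents:
            return True
        current = parents[0]
```

In this model the merged history keeps `C` as it was originally written plus an extra commit `M`, while the rebased history replaces `C` with a rewritten `C'` and has no `M` at all.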

Robert Harvey
Jace Browning
  • 23
    Tools don't always get it right. And when they get it wrong, oh boy, do they get it wrong. – Oded Nov 18 '13 at 18:16
  • 2
    @Oded are you referring to automatic merge commits (such as those created by `git pull`)? I can see why that could be problematic, but my question is about **all** merge commits. – Jace Browning Nov 18 '13 at 18:27
  • possible duplicate of [Are Frequent Complicated Merge Conflicts A Sign of Problems?](http://programmers.stackexchange.com/questions/208513/are-frequent-complicated-merge-conflicts-a-sign-of-problems) "Merges and rebases should cause the exact same conflicts, for those **inherent conflicts that a human must resolve** (i.e. two developers changing the same line of code)..." – gnat Nov 18 '13 at 18:27
  • 1
    A thought I had after posting my answer, that IMO doesn't exactly fit as an answer: `git pull --rebase == svn up` (loose-equals, not strict-equals) – Izkata Nov 19 '13 at 05:22
  • in a centralized store like SourceSafe, changes would continue to be checked in in an ongoing effort, but once a stable plateau was reached we would Label all files as of a certain date. Then at least you can correlate bugs to either before or after such and such a version, or *rebase*-line. – Andyz Smith Nov 19 '13 at 14:20
  • Just to be clear, creating a merge commit when merging a branch into `master` (e.g. `feature`, etc.) is still preferred, correct? Our process is: rebase the `feature` branch from `master` right before merging and then merge into `master` with a merge commit. Let's say there were 10 distinct commits on the `feature` branch, they all show that they were done on that `feature` branch and then there is a merge commit showing that they were all merged into `master` at a specific date. I think that works well, no? – Joshua Pinter Jan 22 '21 at 16:22

7 Answers

77

People want to avoid merge commits because it makes the log prettier. Seriously. It looks like the centralized logs they grew up with, and locally they can do all their development in a single branch. There are no benefits aside from those aesthetics, and several drawbacks in addition to those you mentioned, like making it conflict-prone to pull directly from a colleague without going through the "central" server.

Karl Bielefeldt
  • 7
    You are the only one who has understood my question correctly. How can I make it more clear? – Jace Browning Nov 18 '13 at 20:19
  • 13
    Maybe rephrase to "What are the benefits of a linear development history?" – Karl Bielefeldt Nov 18 '13 at 20:43
  • 1
    Yours is the only answer that answers the actual question being asked, but log aesthetics seems like a petty reason to prefer rebasing over merge commits. – Jay Nov 18 '13 at 23:51
  • 13
    "Often, you’ll do this to make sure your commits apply cleanly on a remote branch — perhaps in a project to which you’re trying to contribute but that you don’t maintain. In this case, you’d do your work in a branch and then rebase your work onto origin/master when you were ready to submit your patches to the main project. That way, the maintainer doesn’t have to do any integration work — just a fast-forward or a clean apply." http://git-scm.com/book/en/Git-Branching-Rebasing – Andyz Smith Nov 19 '13 at 01:40
  • 1
    I'm not quite clear how this makes it 'easier' except that the logs look prettier to a third party applier of your patches. – Andyz Smith Nov 19 '13 at 01:41
  • 2
    This is especially important in larger open source projects, where many people will read the logs and perhaps review the changes. Nobody wants to see a contributor's dozens of merges and incomplete intermediate versions that combine into one logically consistent change. – CodesInChaos Nov 19 '13 at 19:25
  • 1
    @CodesInChaos Most of the sites I read are recommending all branches be rebased to a linear history, not just the partial changes within one contributor's logical change. Example: http://www.kerrybuckley.org/2008/06/18/avoiding-merge-commits-in-git/ – Jace Browning Nov 19 '13 at 21:42
  • 22
    You make it sound like it's just cosmetic, but it's actually practical: it's much easier to understand what's going on in a linear history than from a complicated DAG with lines crossing and merging. – Daniel Hershcovich Nov 21 '13 at 19:44
  • 2
    So use one of git log's gazillion history simplifying options, and you can make it look nice and simple. If you rebase, most of those are no longer available to you. You're forced to the one simplification and cannot see the more complex view even if you wish. – Karl Bielefeldt Nov 21 '13 at 20:04
  • 1
    One interesting "mix" of the workflows that I practice is rebasing feature branches on top of master and then doing a non-ff merge back into master, noting the branch name. It keeps the history cleaner, makes bisecting easy, and makes it significantly easier to keep the history independent for each feature. – deterb Nov 22 '13 at 13:34
  • 29
    "There are no benefits aside from those aesthetics" — I don't know about that. It's really hard to understand what's going on in your history when your history looks like a circuit board. (Assuming multiple team members, each with their own branches and workflows.) – Archagon Nov 29 '13 at 11:31
  • Personally, I think that having all the different branches intertwining is prettier than just having one branch "out" at any point in time. And even then, doesn't `git rerere` also serve the purpose of avoiding extraneous merge commits in the final history without having to change one's local history? – JAB Jan 11 '16 at 13:43
  • Now that I think about it, using a workflow involving `git rerere` still works even when you've already pushed your branch to other repositories, too, unlike rebasing. – JAB Jan 11 '16 at 14:40
  • Rebasing becomes a nightmare when branches start to fall behind; you'll waste incredible amounts of time just rebasing along. (Going through all the earlier history of unresolved conflicts...) – Oliver Dixon May 31 '16 at 10:53
  • 1
    This answer is just plain wrong. Sure it looks prettier; that's because the history is much easier to understand when it is linear. A linear history makes many things like cherry-picking, reverting, or bisecting so much simpler. – lukad Jan 21 '20 at 13:24
58

In two words: git bisect

A linear history allows you to pinpoint the actual source of a bug.


An example. This is our initial code:
def bar(arg=None):
    pass

def foo():
    bar()

Branch 1 does some refactoring such that arg is no longer valid:

def bar():
    pass

def foo():
    bar()

Branch 2 has a new feature that needs to use arg:

def bar(arg=None):
    pass

def foo():
    bar(arg=1)

There will be no merge conflicts; however, a bug has been introduced. Luckily, this particular one will get caught on the compilation step, but we're not always so lucky. If the bug manifests as unexpected behaviour rather than a compile error, we may not find it for a week or two. At that point, git bisect to the rescue!
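As a sketch of what the merged tree ends up containing, assume the example is Python and that the merge takes `bar` from branch 1 and `foo` from branch 2. The helper `merged_code_is_broken` is made up for illustration. In Python specifically, the mismatch surfaces as a `TypeError` when `foo()` is actually called, since there is no separate compile step to catch it.

```python
def bar():
    # Branch 1's refactoring: `arg` has been removed.
    pass


def foo():
    # Branch 2's feature: still passes `arg`.
    bar(arg=1)


def merged_code_is_broken() -> bool:
    """Return True if calling foo() on the merged code raises TypeError."""
    try:
        foo()
    except TypeError:
        return True
    return False
```

Neither branch touched the same lines, so a textual merge has nothing to flag; the breakage only exists in the combination.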

Oh crap. This is what it sees:

(master: green)
|             \_________________________
|                \                      \
(master: green)  (branch 1: green)     (branch 2: green)
|                 |                     |
|                 |                     |
(master/merge commit: green)            |
|                         ______________/
|                        /
(master/merge commit: red)
|
...days pass...
|
(master: red)

So when we send off git bisect to find the commit that broke the build, it's going to pinpoint a merge commit. Well, that helps a little, but it's essentially pointing at a package of commits, not a single one. All ancestors are green. On the other hand, with rebasing, you get a linear history:

(master: green)
|
(master: green)
|
(all branch 1 commits: green)
|
(some branch 2 commits: green)
|
(branch 2 commit: red)
|
(remaining branch 2 commits: red)
|
...days pass...
|
(master: still red)

Now, git bisect is going to point at the exact feature commit that broke the build. Ideally, the commit message will explain what was intended well enough to do another refactor and fix the bug right away.

The effect is only compounded in large projects when the maintainers didn't write all the code, so they don't necessarily remember why any given commit was done/what each branch was for. So pinpointing the exact commit (and then being able to examine the commits around it) is a great help.
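The difference between the two diagrams can be sketched as a binary search over a list of commits. This is a deliberately simplified model: it only looks at a single line of history (for the merge case, the view along master), whereas the real `git bisect` walks the full DAG. The function `first_bad` and the commit labels are made up for illustration.

```python
from typing import List, Tuple

Commit = Tuple[str, bool]  # (description, does the build pass?) in a toy model


def first_bad(history: List[Commit]) -> str:
    """Binary-search a linear history (oldest first) for the first failing commit.

    Assumes the first commit is good, the last is bad, and that once the
    build breaks it stays broken, which is what bisection relies on.
    """
    good, bad = 0, len(history) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if history[mid][1]:
            good = mid
        else:
            bad = mid
    return history[bad][0]


# The merge-based history from the first diagram, as seen along master.
merged = [
    ("master", True),
    ("master", True),
    ("merge of branch 1", True),
    ("merge of branch 2", False),
    ("master, days later", False),
]

# The rebased, linear history from the second diagram.
rebased = [
    ("master", True),
    ("master", True),
    ("branch 1 commits", True),
    ("branch 2: earlier commits", True),
    ("branch 2: foo() starts calling bar(arg=1)", False),
    ("branch 2: remaining commits", False),
    ("master, days later", False),
]
```

With these inputs, `first_bad(merged)` lands on the merge of branch 2, while `first_bad(rebased)` lands on the individual branch 2 commit that introduced the call.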


That said, I (currently) still prefer merging. Rebasing onto a release branch will give you your linear history for use with git bisect, while retaining the true history for day-to-day work.

Izkata
  • 3
    Thanks for this explanation. Can you expand on why you still prefer merging? I know a lot of people prefer merging to rebasing in most situations, but I've never been able to understand why. – Philip Nov 19 '13 at 04:53
  • @Philip I'm a little hesitant to do so, as I've only used git for a few months on small personal projects, never on a team. But when tracking what I've done recently, being able to see the flow of commits across branches in `gitk --all` is extremely useful (especially since I usually have 3-ish branches at any given time, and switch between them each day depending on my mood at the time). – Izkata Nov 19 '13 at 04:58
  • For me, I prefer the whole history, because one day I may discover that what I thought was a known good baseline actually contained a change that produced a bug, but it wasn't noticed and was thought to be a good baseline. If I am working on a local repo with only the rebased-up-to-now log, I can't tell where changes to a suspect function were made. They've been rebased out of my history, allowing me to ignore them blithely, conveniently, until it's totally not convenient, because I can't find the change that produced a bug I uncovered. Hmm, I don't even use git, but that's my two cents. – Andyz Smith Nov 19 '13 at 05:01
  • 26
    But the merge is the problem, so you *want* a merge commit as a `bisect` result. It's easy enough to find the individual commits with a `blame` or a different `bisect` if you want to dig deeper. With a linear history, the merge is done off book, so you can no longer tell a bad merge was the problem. It just looks like an idiot using an argument that doesn't exist, especially if the argument was removed several commits previous. Trying to figure out *why* he would do that is much more difficult when you destroy the history. – Karl Bielefeldt Nov 19 '13 at 13:39
  • @KarlBielefeldt so, don't rebase until you are pretty good and sure that what you have is bug-free because once you do, all the previous changes go off-book and changes that occurred in the last six months can't be distinguished from older, almost certainly perfect code. – Andyz Smith Nov 19 '13 at 13:49
  • @KarlBielefeldt Look at the very last sentence in my answer. You can have both, AFAIK. – Izkata Nov 19 '13 at 13:52
  • 4
    Really disagree with this answer. Rebasing destroys bisect's utility at times. Except for the head commits of a rebase, the commits from a rebase may not be runnable. Example: You branched at A and made B, C, D; someone pushed E. Your B and C work, but once rebased onto E, your B and C are unrunnable, or are runnable and don't work. Best case is you give up; worst case you think B or C contains the issue when in reality it is D. – Lan Oct 16 '14 at 16:27
  • @JohnDaniel Assuming commits B, C, and D were made in order, then B and C do contain an issue - they're no longer compatible with the pushed code. D won't be the source of that problem unless you intentionally changed the order of your 3 commits. And, as I describe at the beginning of my answer, your commit message for B should contain enough information for you to fix it in a new commit (and if necessary, `rebase -i` and squash it into B so that the commit passes again) – Izkata Oct 16 '14 at 18:09
  • @Izkata Say the issue is "The program crashes when the user moves their mouse left". Say E removes an old Mouse API. Say commit B says it changed where it called the Mouse Package and commit D says "Now using optional 2.x Mouse Package API". *And D uses it wrong*. If B can't be run, either you're debugging the wrong place or now can't use bisect since you don't know whether to go left or right (trying both directions doesn't work since C is broken too). If B can be run, you're lulled into thinking it was the change of when it called the mouse package instead of interacting with it wrong. – Lan Oct 16 '14 at 20:31
  • @JohnDaniel `And D uses it wrong` why are you ever making commits you haven't tested? That seems like the source of the confusion in your example... – Izkata Oct 16 '14 at 20:40
  • 4
    @Izkata Why are you using bisect? Because bugs/mistakes happen. It's a humbling reality. Maybe "dragging the mouse to the left" was not a part of the automated/manual testing list before and now we notice that that feature is broken but we know it used to work. – Lan Oct 17 '14 at 12:46
  • I don't see why rebasing would help here unless you're not merging regularly, in which case rebasing would also be a problem because you'd end up having to resolve conflicts for multiple commits – 0x777C Feb 02 '21 at 11:30
  • @Faissaloo My examples have nothing to do with resolving conflicts; it's more along the lines of: if a function was simplified and had an argument removed in branch A, while a simultaneous branch B started using that argument in a new call in a different part of the code, there would be no failures until they were merged together (and no merge conflicts at all). Whereas in a rebase workflow, bisect would point at the specific commit in the later-applied branch where it started failing. – Izkata Feb 02 '21 at 15:17
17

In short, because merging is often another place for something to go wrong, and it only needs to go wrong once to make people very afraid of dealing with it again (once bitten twice shy, if you will).

So, let's say we're working on a new Account Management Screen, and it turns out there is a bug discovered in the New Account workflow. OK, we take two separate paths - you finish the Account Management, and I fix the bug with New Accounts. Since we are both dealing with accounts, we've been working with very similar code - perhaps we even had to adjust the same pieces of code.

Now, at this moment we have two different but fully working versions of software. We've both run a commit on our changes, we've both dutifully tested our code, and independently we are very confident we've done an awesome job. Now what?

Well, it's time to merge, but...crap, what happens now? We could very well go from two working sets of software to one, unified, horribly broken piece of newly buggy software where your Account Management doesn't work and New Accounts are broken and I don't even know if the old bug is still there.

Maybe the software was smart and it said there was a conflict and insisted we give it guidance. Well, crap - I sit down to do it and see you've added some complex code I don't immediately understand. I think it conflicts with the changes I've made...I ask you, and when you get a minute you check and you see my code that you don't understand. One or both of us have to take the time to sit down, hash out a proper merge, and possibly retest the whole dang thing to make sure we didn't break it.

Meanwhile 8 other guys are all committing code like the sadists they are, I made a few small bug fixes and submitted them before I knew we had a merge conflict, and man it sure seems like a good time to take a break, and maybe you are off for the afternoon or stuck in a meeting or whatever. Maybe I should just take a vacation. Or change careers.

And so, to escape this nightmare, some people have become very afraid of commitment (what else is new, amiright?). We're naturally risk averse in scenarios like this - unless we think we suck and are going to screw it up anyway, in which case people start acting with reckless abandon. sigh

So there you go. Yes, modern systems are designed to ease this pain, and it's supposed to be able to easily back out and rebase and debase and freebase and hanglide and all that.

But it's all more work, and we just want to push the button on the microwave and have a 4-course meal done before we have time to find a fork, and it all feels so very unfulfilling - code is work, it's productive, it's meaningful, but gracefully handling a merge just doesn't count.

Programmers, as a rule, have to develop a great working memory, and then have a tendency to immediately forget all that junk and variable names and scoping as soon as they've finished the problem, and handling a merge conflict (or worse, a wrongly handled merge) is an invitation to be reminded of your mortality.

gnat
BrianH
  • 5
    Amazingly worded, sir! – Southpaw Hare Nov 18 '13 at 19:00
  • 11
    This answers why people are afraid of merging, not why merge commits are considered bad by so many. – Jay Nov 18 '13 at 23:48
  • 29
    The problem with this answer is that **this is still true for a rebase**. After all, a rebase is still doing a merge where you've changed a line of code that was already changed. I think this is an issue with the way the question was stated. – deworde Nov 18 '13 at 23:48
  • You need to pursue a career in writing, seriously. – Michael Borgwardt Nov 18 '13 at 23:59
  • 3
    I downvoted this answer because, eloquent as it is, it is beside the point, imho. – Eugene Nov 19 '13 at 00:53
  • 7
    This doesn't seem to hit on rebase vs merge-merge at all. – Andyz Smith Nov 19 '13 at 04:52
  • Never have I felt so much love with so many mixed votes :) Both are appreciated, as I do feel from a closer reading of the question that I missed some of what the OP was asking. I will revisit my answer as soon as time permits to try to fix the misses. – BrianH Nov 19 '13 at 16:31
  • A merge workflow can help with the integration problem by allowing a topic branch to be continuously integrated into another. You try that with a rebase workflow and you are up poo creek once that topic branch is rebased or squash merged. Trying to get my branch in while those 8 sadists keep forcing me to continuously rebase, repeatedly re-resolving conflicts, all to maintain that illusion of linear history, now that is what drives me to insanity. Sadly the majority force the rest of us to live out this nightmare because they don't know how to merge correctly. – steinybot Oct 30 '19 at 02:57
  • I have downvoted this as the scenario presented is about conflict resolution. Conflict resolution is not functionally different between rebase and merge. In this scenario, the situation could be avoided in a 'merge' pattern by regularly merging in the target branch into the feature branch, same as the 'rebase' pattern. – jmathew Feb 15 '21 at 01:04
7

Rebasing provides a moving branch point which simplifies the process of pushing changes back to the baseline. This allows you to treat a long-running branch as if it were a local change. Without rebasing, branches accumulate changes from the baseline which will be included in the changes being merged back to baseline.

Merging leaves your baseline at the original branch point. If you merge a few weeks' worth of changes from the line you branched off of, you now have a lot of changes from your branch point, many of which will be in your baseline. This makes it difficult to identify your own changes in your branch. Pushing changes back to the baseline may generate conflicts unrelated to your changes. In the case of conflicts, it is possible to push inconsistent changes. Ongoing merges take effort to manage, and it is relatively easy to lose changes.

Rebase moves your branch point to the latest revision on your baseline. Any conflicts you encounter will be for your change(s) alone. Pushing changes is much simpler. Conflicts are dealt with in the local branch by doing an additional rebase. In the case of conflicting pushes, the last one to push their change will need to resolve the issue with their change.

BillThor
  • 1
    so all it does is demote new changes to a level that they are considered 'the wrong thing' because **they are the only *change***. If all the old changes are included in the merge, all changes are on a level footing, regarding whether they are 'the right thing' today. – Andyz Smith Nov 19 '13 at 03:09
  • @AndyzSmith I wouldn't consider the new changes the _wrong thing_. When trying to determine what is being changed and should be pushed, they are the _right thing_. Once you have rebased you can ignore the old changes as they are part of your baseline. – BillThor Nov 19 '13 at 04:21
  • right, so If you are not sure what is your baseline, ie, whether what you have changed already ready to be merged is 'the right thing' then don't rebase. – Andyz Smith Nov 19 '13 at 04:41
  • And, if you get a change push from someone, and it breaks the build because their function prototype doesn't match an existing prototype, you know instantly, because of the rebase, that the new guy is wrong. You don't have to guess that maybe the new guy has the right prototype and the previous, unrebased, unapplied changeset changes are the ones with the wrong prototype. – Andyz Smith Nov 19 '13 at 04:45
5

The automated tools are getting better at making sure that the merging code will compile and run, thus avoiding syntactic conflicts, but they cannot guarantee the absence of logical conflicts that may be introduced by merges. So a 'successful' merge gives you a sense of false confidence, when in reality it guarantees nothing, and you have to redo all of your testing.

The real problem with branching and merging, as I see it, is that it kicks the proverbial can down the road. It lets you say "I'll just work in my own little world" for a week, and deal with whatever problems come up later. But fixing bugs is always faster/cheaper when they are fresh. By the time all of the code branches start getting merged you may already have forgotten some of the nuances of the things that were done.

Take the two aforementioned problems together, and you might find yourself in a situation where it's simpler and easier to have everyone work out of the same trunk and continuously resolve conflicts as they come up, even if they make active development a little slower.

MrFox
  • 5
    This answers why people are afraid of merging, not why merge commits are considered bad by so many. – Jay Nov 18 '13 at 23:49
  • So, This trunk you refer to, does that involve rebase? Expound please. – Andyz Smith Nov 19 '13 at 04:53
  • @AndyzSmith "trunk" is svn's term for its equivalent to git's "master" branch – Izkata Nov 25 '13 at 17:00
  • @Izkata so this answer advocates not using either merge or rebase then? – Andyz Smith Nov 26 '13 at 02:42
  • @AndyzSmith Yeah, that's how I read it, too. And I disagree with it; after working on a team with 7 other developers in the suggested fashion, I don't suggest it. We should have used feature branches more than we did, but most of them are afraid of merging (as BrianDHall gets into in his answer). – Izkata Nov 26 '13 at 03:21
4

Three things haven't been said in any of the answers:

  • Diffing inside a branch:

    • Diffing between an arbitrary pair of commits in a branch becomes extremely difficult when there are merge commits.
  • Merging one item at a time:

    • When you're resolving the difference between two branches, a merge typically happens all at once, and you're left to resolve any merge conflicts without the context of which specific commit was responsible for the conflict. If you rebase, the rebase will stop at the point the conflict occurred and allow you to resolve it in that context.
  • Cleanup before push:

    • If you made a mistaken commit that you later need to fix and you haven't pushed yet, interactive rebase will allow you to combine/split/change/move commits. While you can still do that if you've merged, it becomes very difficult if you want to combine/split/change/move across a merge boundary.
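The "merging one item at a time" point can be sketched with a toy model. It treats a conflict as "the branch and upstream both touched the same file", which is much coarser than the hunk-level conflicts git actually reports; the functions `merge_conflicts` and `first_rebase_stop` and the sample commits are made up for illustration.

```python
from typing import List, Optional, Set, Tuple

Commit = Tuple[str, Set[str]]  # (commit name, files it touches) in a toy model


def merge_conflicts(branch: List[Commit], upstream_files: Set[str]) -> Set[str]:
    """Merge-style view: all branch changes are combined, so conflicts are
    reported per file with no indication of which commit caused them."""
    touched = set().union(*(files for _, files in branch))
    return touched & upstream_files


def first_rebase_stop(
    branch: List[Commit], upstream_files: Set[str]
) -> Optional[Tuple[str, Set[str]]]:
    """Rebase-style view: commits are replayed one at a time, and the replay
    stops at the first commit that touches a file upstream also changed."""
    for name, files in branch:
        overlap = files & upstream_files
        if overlap:
            return name, overlap
    return None


# Hypothetical feature branch and the set of files changed upstream meanwhile.
branch = [
    ("add form", {"ui.py"}),
    ("tweak validation", {"accounts.py"}),
    ("rename helper", {"utils.py", "accounts.py"}),
]
upstream_files = {"accounts.py", "utils.py"}
```

Here the merge-style view reports both `accounts.py` and `utils.py` at once, while the rebase-style view stops at "tweak validation" with only `accounts.py` to resolve, and would stop again later at "rename helper".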
Catskul
1

An additional relevant point is this: With rebasing I can easily cherry pick or revert a feature in my release branch.

Bruno Schäpper
  • Sure :-) In git flow, we regularly create new release branches. If after creation, a bug is fixed in development branch, we use `git cherry-pick -x ` to apply it to the release branch. Or, we might want to undo something for the release, with `git revert `. If `` is a merge, it gets hairy. It _is_ possible as far as I know, but not easy to do. When using rebase, every 'merge' is one giant but regular commit, easily cherry-pickable and revertable. – Bruno Schäpper Sep 20 '18 at 08:26