23

Possible Duplicate:
When should the VCS history of a project be deleted?

I am experienced using svn and recently started learning git. I was quite shocked to learn that git has features that allow you to "rewrite history".

Coming from svn, I had accepted as sacrosanct the source control principle that once a change is committed, there is no way to "undo" the commit itself... the most one can do is execute a later commit that effectively "reverses" the changes made in the earlier commit, but one can always reproduce the state of any commit that had been done in the past.

What is the rationale for changing this "principle"? Is there something different about git that makes this OK, or does the design of git reflect a "philosophy" of source control that differs from the "philosophy" of svn?

JoelFan
  • 14
    git follows the Marxist approach to history, allowing a bit of revisionism here and there... – yannis Jan 16 '13 at 15:54
  • 1
    for an example of when this may be needed, see: [Open sourcing an internal project that has confidential information in the old revisions.](http://programmers.stackexchange.com/q/181933/31260) – gnat Jan 16 '13 at 16:06
  • 3
    I just thought of another reason... suppose you accidentally commit a HUGE video file... why should the repo have to carry around that baggage? – JoelFan Jan 16 '13 at 16:31
  • 1
    @JoelFan: Yes, and that's a bigger issue for Git (and other distributed VCSs) than for centralized systems, since a huge video file imposes that burden on everyone who clones the repository. – Keith Thompson Jan 16 '13 at 18:09
  • 3
    Really disagree on the close... "exact duplicate"? Really? I'm asking about the design decisions (and design philosophy) going into svn vs. git and the other question is asking about whether to delete history or not. My question got some very interesting answers that had nothing to do with whether to delete history. No comparison in my opinion. Disappointed! – JoelFan Jan 16 '13 at 20:16
  • This is really really not an exact duplicate as marked, did @gnat walter et al actually read this question or were they on a closing binge? – Toby Allen Jan 26 '13 at 22:59
  • @JoelFan as of now it looks like the answers in the duplicate address your question as well, which is why it was closed. If you believe this is not the case, consider editing your question to help readers understand why this is so. After that, flag or vote to re-open to bring it to the attention of moderators and the community – gnat Jan 27 '13 at 13:26

3 Answers

18

The most important reason is security. Suppose you accidentally check in a file that contains a cleartext password, or some other sensitive information. Even if you check in a new version with the information deleted, it's still visible in the history.
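As a rough illustration of that last point, here is a minimal Python sketch under a toy model. The history is assumed to be a plain list of snapshots, and `search_history` and `rewrite_without` are hypothetical helpers invented for this example, not Git commands. It shows that a later commit deleting the password leaves the earlier snapshot untouched, while a rewrite scrubs it from every snapshot.

```python
# Toy model (an assumption for illustration): a history is a list of
# snapshots, each mapping a file name to its contents. Git does not
# store data this way; the point is only what stays reachable.

def search_history(history, needle):
    """Return the indices of commits whose snapshot contains `needle`."""
    return [
        index
        for index, snapshot in enumerate(history)
        if any(needle in text for text in snapshot.values())
    ]


def rewrite_without(history, needle, replacement="<removed>"):
    """Return a new history with `needle` scrubbed from every snapshot."""
    return [
        {name: text.replace(needle, replacement) for name, text in snapshot.items()}
        for snapshot in history
    ]


history = [
    {"app.cfg": "user=admin"},
    {"app.cfg": "user=admin\npassword=hunter2"},  # accidental commit
    {"app.cfg": "user=admin"},  # later commit that deletes the password
]

# The fix-up commit did not remove the secret from the old snapshot.
leaked_before = search_history(history, "hunter2")

# Only a rewritten history is free of it.
leaked_after = search_history(rewrite_without(history, "hunter2"), "hunter2")

print(leaked_before, leaked_after)
```

In Git itself, this kind of rewrite is what a tool such as `git filter-branch` performs across the real commit graph.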

It can also be useful sometimes to make the history "cleaner". For example, when you're working in your own repository, you might check in multiple small changes as you experiment to see what works. When you finally get the change working, you might not want to push all those changes to another repository -- so you can collapse them into a single change. As you're developing in your private sandbox, you probably want a very detailed history of what changes you've made, and the ability to go back to previous versions. Once you've finally solved the problem, and the fix turns out to be, say, a self-explanatory one-line change, there's no need to push all that history to another repo (perhaps one that acts as the central repo for the project).

Keith Thompson
  • 2
    Being an undisciplined code monkey with chaos in your head and fragmented thinking is not an excuse – Lazy Badger Jan 16 '13 at 16:16
  • 7
    @LazyBadger: Not an excuse for what? Mistakes happen. Git gives you ways to correct them. If I accidentally check in a file with sensitive information, what would you suggest I do about it? – Keith Thompson Jan 16 '13 at 16:27
  • 1
    Not an excuse for **not using brain power before instead of after** (excuse my poor English). And there is correction of history, and there is manipulation (juggling) of it. – Lazy Badger Jan 16 '13 at 17:15
  • 10
    SCM tools are there to help me do my job, not punish me for failing to meditate long enough before pressing return. Besides, if you never make mistakes, presumably you don't need one in the first place. – Useless Jan 16 '13 at 18:05
  • @LazyBadger: So if you never make any mistakes in the first place, there's no need for a mechanism to correct them. How's that working out for you? – Keith Thompson Jan 16 '13 at 18:13
  • 2
    @LazyBadger It doesn't have to be a mistake. Let's say I have a personal project, which grows into something that others could benefit from. I want to publish that project with its history (mostly) intact, but the project also contains some personal information (e.g. a poem to a loved one in all header files). Rewriting history helps you clean up regardless of the intent. For me it's a feature that grows naturally from the distributed nature, but it also helps you keep things private (and maybe, as an extension, secure) – oschrenk Jan 16 '13 at 18:48
  • @LazyBadger: If the mistake is putting sensitive information into the history, perhaps information that you're not legally permitted to disclose, then you *need* to be able to remove that information. Even if Git didn't permit this, nothing prevents you from creating a new repo from scratch, applying all the changes one at a time excluding the sensitive information, and then publishing the resulting repo. Git just gives you an easier way to do that. **There are times when leaving information in the history doesn't fix the mistake.** – Keith Thompson Jan 16 '13 at 19:01
  • @LazyBadger: If your advice is to avoid *certain kinds* of mistakes, even that's not always practical. – Keith Thompson Jan 16 '13 at 19:02
  • 1
    @LazyBadger, you realize that git is a distributed version control system, right? That its primary use case is precisely for when you need to allow lesser humans to collaborate with your giant throbbing brain? This functionality is to allow you to clean up *other people's mistakes* when they inevitably make them (because they're mostly "monkeys with grenades"). Think of it that way and maybe you'll feel better. – Carson63000 Jan 16 '13 at 19:57
11

Git follows the principle of not cementing policy into the code: flexibility in the hands of highly technical users is more important than restricting them from making mistakes. Rewriting history is impossible to do in git without people noticing, as the SHA-1 hash of the rewritten commit and of all subsequent commits will change. This is actually an improvement over svn, which doesn't make it easy to rewrite history, but does make it possible, and makes it difficult to detect when you do.
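The hash-chaining effect can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Git's actual object format: `commit_id` and `build_chain` are hypothetical helpers, and each id here is the SHA-1 of the parent id plus a content string, whereas a real Git commit object also records the tree, author, and message. The sketch only illustrates why changing one commit changes the id of every commit after it.

```python
import hashlib


def commit_id(parent_id, content):
    """Hash the content together with the parent's id (simplified model)."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode("utf-8")).hexdigest()


def build_chain(contents):
    """Return the list of commit ids for a linear history of `contents`."""
    ids = []
    parent = ""  # the first commit has no parent
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


original = build_chain(["add README", "add password file", "fix typo"])
rewritten = build_chain(["add README", "add config file", "fix typo"])

# Compare ids position by position: the first commit is untouched, but the
# rewritten second commit and the unchanged third commit both get new ids.
changed = [a != b for a, b in zip(original, rewritten)]
print(changed)
```

Because the third commit's content is identical in both histories, its changed id comes purely from its parent, which is what makes a rewrite visible to anyone holding the old ids.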

Karl Bielefeldt
2

CVCS and DVCS differ in more than the first letter of their abbreviations; they differ in core principles. In a CVCS, every commit is published and immediately becomes part of the common shared history. In a DVCS, a commit that has been made but not yet pushed is local, personal history, which doesn't affect anybody except its author.

In the latter case, mutating history is a problem only for the one person responsible for their own mess and chaotic way of working.

And, finally, it's just another style, one that has a right to exist as well. "Practice is the criterion of truth"; thus, if rewriting has not been abandoned in practice, nothing prevents using it in a tool where it can be done easily.

Lazy Badger