3

Are there any actual business cases that have made a company move from a DVCS to a CVCS (regardless of whether they were on a CVCS originally)?

Other than having a closed mind and rejecting the paradigm shift altogether (for the particular case of companies coming from a CVCS), I cannot think of any cause for this to happen.

Double chocolate cookie for anyone with empirical evidence.

dukeofgaming

2 Answers

12

I do Mercurial consulting and my experience is that big companies spend a lot of time up-front to investigate the pros and cons of DVCS. So when they finally take the jump, they've already been using DVCS for one or two pilot projects and so they're pretty certain that it will work for the rest of the group.

However, I do know of one example where Mercurial was tested in production but then abandoned. It was a client that does hardware design, which involves a huge number of binary files that don't compress or merge very well. We're talking about 10,000+ files, each maybe 10 MB in size, so a checkout alone is more than 100 GB.

Before contacting us, the client spent a lot of effort writing an extension for Mercurial that would externalize the storage of these files. The idea was that files would be downloaded on demand when you do a checkout — not when you do a hg clone. That way, the history of each file would be stored on a central storage server and the clients would only download what they really need. Very much like Subversion, but with the advantages of DVCS for all the other project files. (Mercurial now ships with a largefiles extension that implements this idea.)
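
Enabling that bundled extension today takes only a few lines of configuration. A minimal sketch (the size threshold, glob pattern and file name below are made-up values):

```
# .hg/hgrc (or ~/.hgrc): enable the bundled largefiles extension
[extensions]
largefiles =

[largefiles]
minsize = 10        # auto-track files larger than 10 MB as largefiles
patterns = **.bin   # also track anything matching this glob

# Shell usage: mark a file explicitly and commit as usual. On clone and
# update, only the largefiles needed for the working copy are downloaded;
# largefiles from old revisions are fetched on demand.
$ hg add --large chip-layout.bin
$ hg commit -m "Add chip layout"
```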

However, despite their efforts, they were not able to make their extension run efficiently, and the hardware team eventually moved away from Mercurial and started evaluating more traditional systems such as Perforce. I don't know all the details and I feel that we should have been able to make it work, especially by using the now-standard largefiles extension. Granted, the extension will not be perfect for 10,000 files as it is now (the per-file overhead is too big), but that is something that can be solved one problem at a time by batching queries and streamlining things.

So my advice is:

  • Make sure to do extensive testing before deploying a DVCS. Find a small group of developers who think DVCS is cool and let them run a project or two.

  • Contact someone who knows what works well and what works less well. In the case above, we were only called in pretty late in the process, and at that point the client had already committed to a non-optimal choice. There are a number of support options for Mercurial and I would encourage you to use them.

Martin Geisler
  • Why would these gigantic binary files need to be in version control? Datafiles for a needed tool which do not lend themselves well to versioning? –  Apr 22 '12 at 10:03
  • Sometimes binary files are not necessarily generated from compiling the code. In my company, the projects require a good amount of binary files (navigation) which are sometimes changed. Also, in a sense, our VCS is being abused by versioning compiled binary files too that probably (need to investigate) not even the integrators use – dukeofgaming Apr 22 '12 at 19:34
  • @MartinGeisler Would you say largefiles is not ready for production? Could this problem be partially solved by using subrepos? Personally I'd just solve this by having the standard procedure for cloning (first checkout) be: *clone from a peer, pull from the blessed repo* – dukeofgaming Apr 22 '12 at 19:37
  • Just for clarification, they couldn't use Mercurial because a clone contained all of the history and with that many binary files the storage space is unreasonable? I'm not sure about Mercurial, but with git you can do a `git clone --depth 1` to only obtain the tip. This would make it exactly like a traditional CVCS system. – Andrew T Finnell Apr 22 '12 at 22:36
  • @ThorbjørnRavnAndersen: It was my understanding that these binary files were genuine "source" files used for chip design. It was not the usual problem where people add a ton of `.jar` files to version control because they don't use a proper artifact repository. Maybe one could have restructured the workflow so that they could use an artifact repository — but I was only brought in quite late in the process. – Martin Geisler Apr 23 '12 at 08:12
  • @dukeofgaming: Largefiles is a completely different extension than the one they were working on. Largefiles is used in production by several companies and Fog Creek has been pushing it as a killer feature of Kiln. Largefiles evolved from kbfiles (that's the extension used by Kiln), which evolved from bfiles. Largefiles is not pretty when you look inside, but it's the best we've got and it's being improved with every release. – Martin Geisler Apr 23 '12 at 08:15
  • @AndrewFinnell: Yes, it's the usual story with files that cannot delta-compress well: the history becomes huge and a clone will take ages. You're right that Git can make a clone of only the most recent history, but it's a crippled clone: [git-clone(1) says](http://linux.die.net/man/1/git-clone) "A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it)". My client wanted a functional clone that they could use for new development. – Martin Geisler Apr 23 '12 at 08:22
  • @ThorbjørnRavnAndersen, Because you want to track versions of electronic designs for exactly the same reasons you want to track versions of software code. And EDA software stores the designs in binary files which can be huge. Some EDA tools allow VCS to be as well integrated with them as IDEs do. The major difference is that having a checkout prevent others from checking out is far more common, as merging is often problematic. – AProgrammer Apr 23 '12 at 11:33
  • @AProgrammer I know this perfectly well. The question was why it was so _here_. –  Apr 23 '12 at 13:46
  • @Martin: Can this data be compressed? I'm thinking it would take many thousands of man-years to produce 100GB of engineering output. – kevin cline Apr 23 '12 at 19:34
  • @kevincline: I don't know much about this area, but I can readily believe that files for chip design can grow huge if you need to specify the location of many, many transistors. – Martin Geisler Apr 23 '12 at 20:50
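
To illustrate the shallow-clone trade-off discussed in the comments above, a sketch (the repository URL is hypothetical, and note that later Git releases have relaxed some of these restrictions):

```
$ git clone --depth 1 http://example.com/chip-design.git
$ cd chip-design
$ git log --oneline   # only the most recent commit is present
# At the time of this exchange, such a shallow repository could not be
# cloned or fetched from, nor pushed from or into (see git-clone(1)).
```
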
4

The two aren't mutually exclusive alternatives.

CVCS tools provide strong authorization features.
Some have their own built-in authentication mechanism (SVN with svnserve, RTC and its user registry, Perforce and its P4Admin): they can maintain their own internal user database, dedicated to the tool.
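
For example, svnserve keeps its user database and access policy right inside the repository, with no external service required. A minimal sketch (user names and passwords are made up):

```
# conf/svnserve.conf
[general]
anon-access = none     # anonymous users get no access
auth-access = write    # authenticated users may read and write
password-db = passwd   # internal user database (conf/passwd)
authz-db = authz       # per-path authorization rules (conf/authz)

# conf/passwd
[users]
alice = alice-secret
bob   = bob-secret
```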

DVCS tools don't: they ship with no built-in authentication or authorization mechanism. See "Can we finally move to DVCS in Corporate Software?".

DVCS: No authentication: While a CVCS can interface with external authentication sources (like an LDAP directory), a DVCS has no choice but to delegate authentication to external processes (openssh, httpd, ...), since its own internal lightweight server has no authentication, contrary to SVN's svnserve.
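
A typical deployment therefore puts Apache httpd in front of the repositories and lets it do the authentication. A hedged sketch (the location, the LDAP URL, and the assumption that mod_authnz_ldap is loaded and that hgweb or git-http-backend already serves /repos are all mine):

```
# httpd.conf: Apache authenticates; the DVCS itself never checks a password
<Location /repos>
    AuthType Basic
    AuthName "Source repositories"
    AuthBasicProvider ldap
    AuthLDAPURL "ldap://ldap.example.com/ou=people,dc=example,dc=com?uid"
    Require valid-user
</Location>
```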

DVCS: No authorization: If you have access to the repo path (through file, ssh or http), then you have all privileges on it (read, write, delete, ... on any branch).
If you need finer-grained access control, you need to add an extra layer on top of the tool (RhodeCode for Mercurial or Gitolite for Git).
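
With Gitolite, for example, that extra layer is a plain-text configuration file. A sketch with made-up repo and user names:

```
# gitolite.conf: authorization layered on top of plain Git
@devs = alice bob

repo firmware
    RW+ master  = alice    # alice may push to and rewrite master
    RW  dev/    = @devs    # devs may push branches under dev/
    R           = carol    # carol gets read-only access to the repo
```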

Those constraints mean that a DVCS usually serves different use cases than a CVCS within a large company.
For instance, I have put both in place, with the DVCS used to allow third-party enterprises to collaborate with the company on a restricted shared code-base.
Since a DVCS clone carries the whole repo with its complete history, we had to come up with an import-export mechanism that exports and publishes into the DVCS only certain parts of the (large) CVCS repo, so that said third parties can access only what they need to work on.
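
One way such a partial export can be scripted, assuming Mercurial on the DVCS side and a pre-existing mirror of the CVCS repo as the conversion source (all paths and repo names below are made up), is the bundled convert extension with a filemap:

```
$ cat filemap.txt
include shared/partner-module      # keep only this subtree
rename  shared/partner-module .    # promote it to the repo root

$ hg --config extensions.convert= \
     convert --filemap filemap.txt internal-mirror partner-repo
```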

VonC
  • Doesn't RhodeCode with Mercurial pretty much solve the authentication problem? – pdr Apr 22 '12 at 19:31
  • I was just going to comment this, RhodeCode (http://rhodecode.org/) will be part of the solution/migration/plan I'm proposing – dukeofgaming Apr 22 '12 at 19:40
  • @pdr: my point is: DVCS alone has no authentication (and no authorization). DVCS + an extra layer has (RhodeCode for Mercurial or Gitolite for Git) – VonC Apr 22 '12 at 19:53
  • True, however it is not that hard to achieve transparently (even without something like RhodeCode/Gitolite) http://blog.bleathem.ca/2010/04/from-svn-to-mercurial-hg-rises.html#ldap – dukeofgaming Apr 22 '12 at 20:04
  • @dukeofgaming you achieved authentication. This is not authorization ("process of asking what you want to do and deciding if you're allowed to do it or not"). – VonC Apr 22 '12 at 20:34
  • Yep, you're right. BTW, how did it work out, putting in place a DVCS for external collaborators? – dukeofgaming Apr 22 '12 at 20:47
  • "DVCS tools don't". Depends on the implementation. –  Apr 23 '12 at 08:19
  • @ThorbjørnRavnAndersen: nope: by their very nature (ie distributed), they distribute data, not acl or id. You need to add a front-end to listen to your request (and do authentication), plus add an extra layer to manage authorization (for a given server). – VonC Apr 23 '12 at 08:20
  • @VonC Perhaps we talk across each other. For instance, important git repositories can be put behind ssh, and ssh can authorize against LDAP or similar. Are you thinking of something else than that? –  Apr 23 '12 at 08:25
  • @ThorbjørnRavnAndersen my only point is Git or Mercurial ***alone*** have no authorization or authentication in them. You need to activate an *extra* process for authentication alone (openssh or httpd). If you want LDAP, you need to add your libnss-ldap (for SSH http://www.404blog.net/?p=38), or declare your LDAP in your `httpd.conf`. And that is just authentication. It doesn't address authorization. – VonC Apr 23 '12 at 08:59
  • @VonC: "my only point is Git or Mercurial alone have no authorization or authentication in them." That is not implicit in your original comment, which is why you've had so much reaction to it. Your implication is that by choosing a DVCS, you have to do without authentication. I would argue that wrapping RhodeCode around Hg is much easier than hooking SVN into LDAP via Apache, so if you don't want yet another password to remember, Hg might be the better choice. – pdr Apr 23 '12 at 09:10
  • @pdr: I don't follow: what part of "DVCS alone has no authentication (and no authorization). DVCS + an extra layer has (RhodeCode for Mercurial or Gitolite for Git) " wasn't clear? – VonC Apr 23 '12 at 09:12
  • Your **original** comment. Or maybe I should say "answer". "CVCS tools provides strong authorization and authentication features, facilitating their integration with the existing LDAP ... DVCS tools don't." Both parts of this are misleading at best, untrue at worst. SVN does not hook into LDAP without a wrapper any more than Hg does. – pdr Apr 23 '12 at 09:37
  • @pdr you were right, my initial *answer* was incomplete at best, and I have edited it. My initial intent stands though: no internal authentication and authorization for DVCS. They need more work to be integrated in a large company environment. – VonC Apr 23 '12 at 11:09
  • Remember that comments are intended to help the author of the answer to improve his/her answer, not for extended discussion. Now VonC has edited his answer to address some of the points raised in these comments, then they could be cleaned up (i.e. deleted), to make way for comments on the current answer. – Mark Booth Apr 23 '12 at 12:30
  • I'm not at all sure that the kind of file-level access control being promoted is a reasonable thing to want in a version control system. What's actually important is who has the repository copy from which releases are cut (and deployments done), especially as it's pretty simple to keep some changes local to that system. (Some DVCSs have authN/authZ built in, though each user can — usually — configure them independently for their own system, so the meaning is different to in a CVCS.) – Donal Fellows Apr 23 '12 at 13:03
  • @DonalFellows: fine: file level access removed from the answer. "Some DVCSs have authN/authZ built in"? Not the main ones (Git, Mercurial). And we are not talking about authN/authZ "for each user". In a large company, "authN/authZ" is understood at the level of a centralized server, with backup and DR (Disaster Recovery) services. – VonC Apr 23 '12 at 13:11
  • Please refrain from extended discussions in comments. If you would like to discuss this answer further then please do so in chat. Thank you. – maple_shaft Apr 23 '12 at 15:01
  • @maple_shaft chat is blocked at work. – VonC Apr 23 '12 at 15:44
  • I forgot to mention the ACL extension which comes bundled with mercurial: http://mercurial.selenic.com/wiki/AclExtension – dukeofgaming Apr 23 '12 at 21:30
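
For reference, a minimal sketch of that AclExtension in use on a central server (users and paths are made up; note it restricts what can be pushed, not what can be read):

```
# .hg/hgrc on the central repository
[extensions]
acl =

[hooks]
pretxnchangegroup.acl = python:hgext.acl.hook

[acl]
sources = serve          # enforce only for changesets arriving via ssh/http

[acl.allow]
docs/**  = docwriter     # path glob = comma-separated allowed users
src/**   = alice, bob
```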