13

Currently my company has a Visual Studio solution in an SVN repo that's organized as follows:

SolutionFolder (~3.5 GB)
|-> SolutionName.sln
|-> ...some source code folders... (~250 MB)
|-> ThirdParty (~3 GB)
|-> Tools
    |-> Tool1
    |-> Tool2

Tool1 and Tool2 are built independently (they have their own solutions), but produce executables that are used in the main build. The ThirdParty folder contains all dependencies for the project, including some pre-compiled 100+ MB .lib files and large libraries like Boost.

It's convenient to have it all in one SVN repo so that (1) a developer only has to do one checkout, and (2) we don't need to keep track of which versions of the dependencies go with each version of the build. On the flip side, checking out this repo takes a while.

What would be the best way to move this project structure to git? Presumably it's best to exclude ThirdParty and possibly Tools from the main repo, but we'd like to keep ThirdParty easily downloadable in one step, and we'd like it versioned (version mismatches between the main repo and ThirdParty/Tools would be bad).

At this point I'm not interested in preserving history, just in figuring out how to organize such a project.

ikh
  • Are those sizes above the sizes within the repos, including history, or are those the sizes of the local working copy? – Doc Brown Jan 06 '14 at 06:58
  • 1
    @DocBrown just the local working copy, does not include history. – ikh Jan 06 '14 at 20:01

4 Answers

10

Use the proper tool for the job. In Windows, that means

Use NuGet for third-party dependencies

That way, you keep the third-party dependencies versioned, but you will not bloat your repository with stuff it doesn't need. Checkouts are much faster, and the project is organized as it should be. You can enable an option in Visual Studio so that it always downloads missing packages automatically.
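Roughly, the command-line equivalent of that restore step (handy on a build server too) looks like this; the solution name is taken from the question, and the configuration name is just an example:

```
rem Restore every package referenced by the solution before building.
nuget restore SolutionName.sln

rem Then build as usual.
msbuild SolutionName.sln /p:Configuration=Release
```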

Of course you can use an approach that relies on git alone (another repo, submodules, etc.), but those are just hacks. Doing it the right way will pay off quickly and leave you with a future-proof system.

Edit after comments: The best way to use NuGet is to set up a local NuGet source, either on a shared drive or a full NuGet server. Setup shouldn't take more than a few minutes either way. That way, you can guarantee that all the packages you need are always available, no matter where they originated.
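As a sketch (the share path, feed name and package name below are made up), registering a share-based feed and publishing a package you built yourself is just a couple of commands:

```
rem Register a folder on a network share as a package source (hypothetical path).
nuget sources add -Name InternalFeed -Source \\fileserver\NuGetFeed

rem Package a dependency yourself and drop the .nupkg into the feed folder.
nuget pack Boost.nuspec
copy Boost.1.55.0.nupkg \\fileserver\NuGetFeed
```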

Wilbert
  • Does NuGet support command line builds? I am always looking for a portable build that I can get Jenkins to build and test for me. Does NuGet support CI servers like Jenkins? – uncletall Jan 07 '14 at 00:46
  • One more thought: how long do you need to support your product? If you need to provide support for a very long time, I would not count on the correct version of your third-party libs being available in NuGet. You might get into very big problems relying on tools like NuGet to get the correct combination of third-party tools, even 2-3 years from now. – uncletall Jan 07 '14 at 00:51
  • 3
    @uncletall: yes, NuGet has a complete command line interface. And the idea is to setup a local NuGet repository, which may just be a folder on a network share (called "feed", http://docs.nuget.org/docs/creating-packages/hosting-your-own-nuget-feeds) – Doc Brown Jan 07 '14 at 06:29
  • Yes, I assumed of course that you use a local mirror. I will update the answer. – Wilbert Jan 07 '14 at 09:10
  • @uncletall we may need to support this for up to a few years, so this is indeed a concern. Another concern is a specific external dependency not being in NuGet at all, as well as matching external dependency versions to past build versions. – ikh Jan 07 '14 at 16:02
  • 2
    @ikh it's quite simple and straightforward to build NuGet packages for external dependencies. I needed about half a day to package 9 dependencies with 50 DLLs, having never done it before. – Wilbert Jan 07 '14 at 16:27
5

You can use submodules for the tools. That way you can keep them in a subdirectory like you do now, and use a separate repo for versioning them. That also means you could clone (check out) the tools and develop them separately, and that other projects could rely on those repos - and on specific, updatable versions of them too.
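A minimal sketch of that setup (the repository URLs are placeholders, not real ones):

```
rem Add each tool's repository as a submodule of the main repo (hypothetical URLs).
git submodule add https://example.com/git/Tool1.git Tools/Tool1
git submodule add https://example.com/git/Tool2.git Tools/Tool2
git commit -m "Add Tool1 and Tool2 as submodules"

rem A fresh checkout of the main repo plus its submodules is then a single clone:
git clone --recursive https://example.com/git/MainRepo.git
```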

You could also use submodules for the third party libraries, but if at all possible I would recommend using a dependency manager for those.

Idan Arye
4

The entities that you turn into git repositories are necessarily the entities that you version and branch; if SolutionFolder/Tools/Tool1 is one such thing, that's the level at which to make a repository. This is because git treats the entire state of the directory tree as the versionable entity, whereas with svn it is possible (even if not a good idea) to have a trunk, branches and tags anywhere within the tree.

Derived artefacts should not be kept in the repository, nor should external libraries. There are better ways to handle those. (If you're working with Java, consider using a private Maven repository; they're comparatively easy to work with, and integrate nicely with many other things.)

If you're used to a workflow that has everything in one repo for ease of checkout, consider having a script that sets things up instead.
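For example (every name and URL below is a placeholder), a small bootstrap script can give back the one-step checkout that the single SVN repo used to provide:

```
rem setup.cmd -- hypothetical one-step bootstrap for a new developer machine
git clone https://example.com/git/MainRepo.git SolutionFolder
cd SolutionFolder

rem Pull in the tool repositories if they are kept as submodules.
git submodule update --init --recursive

rem Fetch the third-party packages from an internal feed (path is made up).
nuget restore SolutionName.sln -Source \\fileserver\NuGetFeed
```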

Donal Fellows
  • What are the options for managing external libraries? We work in Visual Studio with C++ and C#, so Maven doesn't look like a good fit. The main issue here is that having the `ThirdParty` folder in the repo is so damn convenient, and it's hard to come up with a good alternative. – ikh Jan 05 '14 at 22:55
  • 2
    @ikh: In a Visual Studio environment, you would typically use Nuget for this, http://docs.nuget.org, which is already included in VS 2012 and newer versions. – Doc Brown Jan 06 '14 at 06:54
2

To be honest, I wouldn't change anything in your setup. It is exactly what we are doing now. I was playing around with setting up a separate git repository to handle the third-party libs that we use, but I don't think the benefit outweighs the cost in portability. Now any developer can just check out and get started without having to do any manual setup steps, and any build server/slave can build the project. Unless you have multiple repos sharing the third-party tools, I would just stick with your current setup.

What I did play around with was setting up the third-party tools in a separate repo. Then I had one simple batch script read a text file with a SHA-1 ref and check out the correct version. This would allow me to have different third-party versions for different projects. I got this idea from the Facebook Buck build tool. But in the end many developers don't like to use command-line tools (we're an MS VC shop here), so I gave up on the idea.
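Something like the following is a rough reconstruction of that idea (the file name and layout are hypothetical, not my actual script):

```
rem update_thirdparty.cmd -- pin the ThirdParty repo to the revision listed
rem in thirdparty.rev (a text file containing a single SHA-1).
set /p THIRDPARTY_REV=<thirdparty.rev
git -C ThirdParty fetch origin
git -C ThirdParty checkout %THIRDPARTY_REV%
```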

One major reason not to download your third-party libs on demand (using NuGet) is long-term support. In my industry, we sometimes need to provide updates for old versions that rely on old third-party libs. We do not want to spend lots of time sorting out which libs we can or cannot upgrade; we just want to use the libs exactly as they were used in that version. Now imagine you use NuGet and, oops... the latest version of the lib you require is 3.98 but you need 2.04... how do you explain to your boss that you need to spend two months upgrading the old version to work with the latest libs when he was expecting a small change?

uncletall
  • 3
    Though I gave you a +1, since "leave everything as it is" is a pragmatic solution, I think "multiple repos" may not be the only problem. DVCSs like Git encourage having multiple local branches, and in each branch a complete local copy of everything. So this may lead to having the same big third-party library (typically the same version!) multiple times as a local copy. This may be feasible in some situations; in others I can imagine that this will have a negative impact on the performance of branching and merging. – Doc Brown Jan 06 '14 at 07:04
  • As far as I know, a branch is a very cheap operation in Git that will only create a pointer and take almost zero space. – uncletall Jan 07 '14 at 00:43
  • 1
    @DocBrown related: [What does “branching is free” mean in Git?](http://programmers.stackexchange.com/questions/202432/what-does-branching-is-free-mean-in-git) –  Jan 07 '14 at 01:05
  • Unless I am missing something, branches are "free" in Git. I just checked my .git/refs/heads and all branches are 1 KB text files; .git/logs/refs/heads contains the logs, where the biggest is 11 KB for master. My normal project structure is around 500 MB in code, third-party libs and other tools. I am very happy to take the 1 KB hit for creating a branch. – uncletall Jan 07 '14 at 01:12
  • 1
    @MichaelT: branching itself is free, of course, but I am talking of the situation where you have multiple *working copies* of different branches on your local workstation in parallel. And if you check the comments below the original question, the OP was referring to 3GB of third party tools as the size of the working copy. – Doc Brown Jan 07 '14 at 06:19
  • @Doc Brown: Even using a package manager you will still end up downloading those 3 GB into your working copy, right? NuGet defaults to ./packages, and each solution will have this, so there is no difference compared to having it in your repository, as each working copy will end up with these 3 GB. – uncletall Jan 07 '14 at 07:08
  • @uncletall: using a package manager, it's much easier to organize things so you have the 3 GB *just once* on your local hard drive, and not "per local working copy". The package manager does not download the third party libs into your working folder structure, it installs them in a central place (for example, the global assembly cache). Of course, if each of the working copies needs different versions of the third party libs, then you end up with the same situation as before, but for a workflow with a lot of different branches sharing the same version of third party libs ... – Doc Brown Jan 07 '14 at 07:22
  • (... which I consider to be the much more frequent case) the 3GB don't have to be installed again and again once they are already on the machine. Of course, one could homegrow such a functionality using a separate centralized git-repository for the third-party libs and some scripts, but in the end you will just reinvent the wheel and recreate the NuGet functionality by yourself that way. – Doc Brown Jan 07 '14 at 07:24
  • @Doc Brown: Indeed, better not to reinvent the wheel. I was assuming that NuGet will download the packages inside your working folder like it does when you use it inside MS VC. I guess there is some way to configure it to use a central location – uncletall Jan 07 '14 at 08:21
  • @uncletall: NuGet (and other package managers) are optimized for packages which change not too often, providing them in a versioned way on your local machine where you just *use* them (but don't want to change them), no difference between binary and text. Git (and other VCS) are optimized for providing you with the artifacts you want to change, frequently, mostly text files. As long as you have more artifacts than third party libs, you probably don't need a package manager, but in a situation like the one in this question, a package manager makes sense. – Doc Brown Jan 07 '14 at 08:39