
In a git environment, where we have modularized most projects, we're facing the design question of one project per repository versus multiple projects per repository. Let's consider a modularized project:

myProject/
   +-- gui
   +-- core
   +-- api
   +-- implA
   +-- implB

Today we have one project per repository. It gives us the freedom to

  • release individual components
  • tag individual components

But it's also cumbersome to branch components, as branching `api` often requires equivalent branches in `core`, and perhaps other components.

Given that we want to release individual components, can we still get similar flexibility with a multiple-projects-per-repository design?

What experiences are there and how/why did you address these issues?

Johan Sjöberg
  • I have a very similar issue right now. I need to release different versions of a project, so they will need to be in different repositories. This is a nightmare to manage, though. It would be great if there was a way to branch just subdirectories. – Andrew T Finnell Aug 18 '12 at 12:09
  • Each module needs to have separate version numbers. And we use `git-describe`. – linquize Sep 02 '13 at 09:26
  • possible duplicate of [Using multiple Git repositories instead of a single one containing many apps from different teams?](http://programmers.stackexchange.com/questions/206668/using-multiple-git-repositories-instead-of-a-single-one-containing-many-apps-fro) – gnat Nov 18 '13 at 03:51
  • http://stackoverflow.com/a/29665889 – Joel Purra Jun 11 '15 at 16:00
  • I'm surprised to see that Bit (https://bitsrc.io) and Lerna (https://github.com/lerna/lerna) are not mentioned! You can learn more here: https://hackernoon.com/5-practical-ways-to-share-code-from-npm-to-lerna-and-bit-732f2a4db512 – JoniS Apr 02 '18 at 17:33
  • You say "one project per repository" and then you list *one* project (named `myProject`) with multiple folders. But then you talk about branching the folders `api` and `core` as if they were repositories rather than folders. – Jack Miller Aug 17 '18 at 09:52
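One way to get per-component version numbers out of a single repository, along the lines of the `git-describe` comment above, is to prefix each tag with its component name. The sketch below is illustrative only: the `<component>-v<major>.<minor>.<patch>` tag convention and the helper names are assumptions, not something the commenters described. `latest_versions` is pure Python; `describe_component` shells out to `git describe --tags --match`, which restricts the search to tags matching a glob.

```python
import re
import subprocess

# Hypothetical tag convention: "<component>-v<major>.<minor>.<patch>", e.g. "core-v1.4.2".
TAG_RE = re.compile(r"^(?P<component>[\w.-]+?)-v(?P<version>\d+\.\d+\.\d+)$")


def latest_versions(tags):
    """Return {component: highest (major, minor, patch)} from a list of tag names."""
    latest = {}
    for tag in tags:
        m = TAG_RE.match(tag)
        if not m:
            continue  # ignore tags that don't follow the convention
        version = tuple(int(part) for part in m.group("version").split("."))
        comp = m.group("component")
        if comp not in latest or version > latest[comp]:
            latest[comp] = version
    return latest


def describe_component(component, repo="."):
    """Ask git for the nearest tag of one component that is reachable from HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "describe", "--tags", "--match", f"{component}-v*"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `latest_versions(["core-v1.0.0", "core-v1.2.0", "api-v0.3.1"])` gives `{"core": (1, 2, 0), "api": (0, 3, 1)}`. One caveat: the `-N-g<hash>` suffix that `git describe` appends counts every commit since the tag, including commits that only touched other components.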

6 Answers


There are three major disadvantages to "one project per repository", the way you've described it above. These are less true if they are truly distinct projects, but from the sounds of it changes to one often require changes to another, which can really exacerbate these problems:

  1. It's harder to discover when bugs were introduced. Tools like `git bisect` become much more difficult to use when you fracture your repository into sub-repositories. It's possible, just not as easy, which makes bug-hunting in times of crisis that much harder.
  2. Tracking the entire history of a feature is much more difficult. History-traversing commands like `git log` just don't output history as meaningfully with fractured repository structures. You can get some useful output with submodules or subtrees, or through other scriptable methods, but it's just not the same as typing `tig --grep=<caseID>` or `git log --grep=<caseID>` and scanning all the commits you care about. Your history becomes harder to understand, which makes it less useful when you really need it.
  3. New developers spend more time learning the Version Control's structure before they can start coding. Every new job requires picking up procedures, but fracturing a project repository means they have to pick up the VC structure in addition to the code's architecture. In my experience, this is particularly difficult for developers new to git who come from more traditional, centralized shops that use a single repository.

In the end, it's an opportunity cost calculation. At one former employer, we had our primary application divided into 35 different sub-repositories. On top of them we used a complicated set of scripts to search history, make sure state (i.e. production vs. development branches) was the same across them, and deploy them individually or en masse.
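For a sense of what even the simplest of those scripts involves, here is a hypothetical Python sketch (not the answerer's actual tooling) that replays `git log --grep=<caseID>` in several sibling repositories and merges the results by commit time. The repository list is assumed to be a set of local checkouts; `%ct`, `%h`, `%s` and the `%x1f` hex escape are standard `git log --format` placeholders.

```python
import subprocess
from pathlib import Path

# Fields are separated by the ASCII unit separator (%x1f) so commit subjects can't break parsing.
SEP = "\x1f"
FORMAT = "%ct%x1f%h%x1f%s"  # committer timestamp, short hash, subject


def parse_log(repo_name, output):
    """Turn `git log --format=FORMAT` output into (timestamp, repo, hash, subject) tuples."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        timestamp, short_hash, subject = line.split(SEP, 2)
        entries.append((int(timestamp), repo_name, short_hash, subject))
    return entries


def grep_all(repos, case_id):
    """Run `git log --grep=<case_id>` in every repository and merge the hits newest-first."""
    merged = []
    for repo in repos:
        out = subprocess.run(
            ["git", "-C", str(repo), "log", f"--grep={case_id}", f"--format={FORMAT}"],
            capture_output=True, text=True, check=True,
        ).stdout
        merged.extend(parse_log(Path(repo).name, out))
    merged.sort(reverse=True)  # highest (newest) timestamp first
    return merged
```

In a single repository all of this collapses to one `git log --grep=<caseID>`, and anything beyond searching (keeping branch state aligned, deploying en masse) adds more scripts of this kind.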

It was just too much; too much for us at least. The management overhead made our features less nimble, made deployments much harder, made teaching new devs take too much time, and by the end of it, we could barely recall why we fractured the repository in the first place. One beautiful spring day, I spent $10 for an afternoon of cluster compute time in EC2. I wove the repos back together with a couple dozen git filter-branch calls. We never looked back.

Christopher
  • As an off-topic aside, there are few things more enjoyable as a repository manager than purchasing time on a system that can do in two hours what your laptop couldn't do in 20, for less than the price of lunch. Sometimes I really love the internet. – Christopher Aug 17 '12 at 15:47
  • How would you release those individual projects as separate releases? Or do you never need to do that? That is the problem I have: what if you need to create a V1 of Project A and a V2 of Project B? – Andrew T Finnell Aug 18 '12 at 12:16
  • For moving between the "one project per repo" and "multiple repos" models, consider git-subtree (good explanation at http://stackoverflow.com/a/17864475/15585) – deterb Dec 06 '13 at 20:30
  • I wrote a script to automate this for common use cases: https://github.com/Oakleon/git-join-repos – chrishiestand Oct 17 '15 at 00:18
  • What is a "VC structure"? – Robert Harvey Mar 22 '18 at 18:19
  • @RobertHarvey My guess is that "VC" means "Version Control". – Eric King Mar 22 '18 at 21:23
  • @RobertHarvey -- yeah, it means "version control" in this context – Christopher Apr 27 '18 at 17:12
  • Is there any way to update the first link in the answer? It seems broken. – D. Ben Knoble May 16 '19 at 19:13
  • @D.BenKnoble -- unfortunately the link looks dead and I can't find a suitable replacement. I've removed the reference. Basically, you need to `git submodule update` after each bisection. – Christopher Jan 07 '20 at 14:51
  • Things get a lot easier if you stop caring about the repository history. The only thing `git bisect` does is find out which developer to tell "You broke this!", and we don't assign blame. If there is a bug, we fix it and move on. – Calmarius Feb 13 '20 at 11:15
  • @Calmarius - The commit message is more important than the user. History tells you why changes were made. It's more difficult to remove potentially dead code if you don't know why it was added in the first place. – Nathan Kovner Apr 06 '20 at 19:58

Christopher did a very good job of enumerating the disadvantages of a one-project-per-repository model. I would like to discuss some of the reasons you might consider a multiple-repository approach. In many environments I have worked in, a multi-repository approach has been a reasonable solution, but the decision of how many repositories to have, and where to make the cuts has not always been an easy one to make.

In my current position, I migrated a behemoth single CVS repository with over ten years of history into a number of git repositories. Since that initial decision, the number of repositories has grown (through the actions of other teams) to the point where I suspect we have more than would be optimal. Some new hires have suggested merging the repositories, but I have argued against it. The Wayland project has had a similar experience. In a talk I saw recently, they said that at one point they had over 200 git repositories, for which the lead apologized. Looking at their website, I see they are now at 5, which seems reasonable. It's important to observe that joining and splitting repositories is a manageable task, and it's okay to experiment (within reason).

So when might you want multiple repositories?

  1. A single repository would be too large to be efficient.
  2. Your repositories are loosely coupled, or decoupled.
  3. A developer typically only needs one, or a small subset of your repositories to develop.
  4. You typically want to develop the repositories independently, and only need to synchronize them occasionally.
  5. You want to encourage more modularity.
  6. Different teams work on different repositories.

Points 2 and 3 are only significant if point 1 holds. By splitting our repositories, I significantly decreased the delays suffered by our offsite colleagues, reduced disk consumption, and reduced network traffic.

Points 4 and 5 are more subtle. When you split the repos of, say, a client and a server, it becomes more costly to coordinate changes between the client and server code. This can be a positive, in that it encourages a decoupled interface between the two.

Even with the downsides of multi-repository projects, a lot of respectable work is done that way (Wayland and Boost come to mind). I don't believe a consensus regarding best practices has evolved yet, and some judgement is required. Tools for working with multiple repositories (git-subtree, git-submodule and others) are still being developed and experimented with. My advice is to experiment and be pragmatic.

Stevoisiak
Spacemoose
  • This answer would be even more helpful with a reference to support the claim: "joining and splitting repositories is a manageable task." – Wildcard Nov 30 '15 at 16:06
  • Multiple repos can also work against modularity because they make it harder to change shared code. Cross-repo dependencies make integration harder, can break code more easily (even if you have good tooling to check this), and the threat of breaking out-of-repo code discourages refactoring interfaces, which is one of your most powerful tools for making things more modular. – cjs Apr 20 '18 at 00:15
  • Everything about microservices and DDD design holds here. You should minimise shared code. – Arwin Feb 28 '19 at 14:01
  • Looking at this a few years later, with some migrations to microservices behind my belt, I firmly stand behind this. The only coupling that now exists is through client packages that are used to communicate with microservices, which is completely transparent. Tiny repositories, lean pipelines, modular business logic that allows us to decide what to host together in a service, etc. Way better than any large repo setup I've ever worked with previously, except perhaps for some really small projects. – Arwin Sep 22 '21 at 13:57

As we use GitHub, we actually have multiple projects in one repo but ensure that those projects/modules are properly modularised (we use -api and -core conventions + Maven + static and runtime checking and might even go to OSGi one day to boot).

What does it save on? Well we don't have to issue multiple Pull Requests if we're changing something small across multiple projects. Issues and Wiki are kept centralised etc.

We still treat each module/project as a proper independent project and build and integrate them separately in our CI server etc.
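Building and integrating each module separately from one repository usually starts with working out which modules a change touched. A hypothetical sketch of that step follows; it uses the module names from the question's tree and an assumed dependency graph, neither of which comes from this answer. `changed_files` wraps `git diff --name-only`, and `modules_to_build` maps paths to top-level modules and then adds every module that transitively depends on one of them.

```python
import subprocess

# Hypothetical module graph for the question's layout: module -> modules it depends on.
DEPENDS_ON = {
    "api": set(),
    "core": {"api"},
    "implA": {"api", "core"},
    "implB": {"api", "core"},
    "gui": {"core"},
}


def changed_files(base, head="HEAD"):
    """List paths changed between two commits."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def modules_to_build(paths, graph=DEPENDS_ON):
    """Map changed paths to top-level modules, then add all (transitive) dependents."""
    dirty = {p.split("/", 1)[0] for p in paths if p.split("/", 1)[0] in graph}
    grew = True
    while grew:  # repeat until no new dependent is added
        grew = False
        for module, deps in graph.items():
            if module not in dirty and deps & dirty:
                dirty.add(module)
                grew = True
    return dirty
```

A CI job could then call `modules_to_build(changed_files("origin/main"))` and build only those modules. If the modules are Maven modules, as in this answer, Maven can do the dependent-propagation part itself: `mvn -pl core -amd` builds `core` plus the modules that depend on it.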

Martijn Verburg
  • Very interesting. I'd suspect this is a common model on GitHub. If you face individual component releases, do you employ something like `submodules` or release/tag the entire repository? – Johan Sjöberg Aug 17 '12 at 14:09
  • Submodules if we have to, but for now we version from the parent down. – Martijn Verburg Aug 17 '12 at 16:40
  • At my current employer we use a similar strategy, and package metadata about the most recent commit in a project into the various manifest files of artifacts (i.e. the results of `git log -1 -- `). It's really quite great. This answer deserves more upvotes. – Christopher Feb 22 '14 at 04:30
  • "What does it save on? Well we don't have to issue multiple Pull Requests if we're changing something small across multiple projects." - could you give an example of this? Intuitively I would say this is a sign that you didn't successfully ensure proper modularization ... – Arwin Mar 30 '20 at 08:59
  • @Arwin it depends on your design. If your modules are vertically sliced along microservices, then you're right: modules can (and should?) match each service. But for more traditional horizontally sliced apps (client, processing, data layers), you need to update each separately. – Martijn Verburg Mar 31 '20 at 12:16
  • @MartijnVerburg well, then we agree. If you practically have a monolith and lots of moving parts that cannot be meaningfully compartmentalized in, say, versioned NuGet packages etc., then it can be better to throw everything together. But then a mono-repo works too. However, in your answer you say "in one repo but ensure that those projects/modules are properly modularised" and then I don't get it. – Arwin Apr 04 '20 at 12:32

For me, the main difference in using one or more than one repository are the answers to the following questions:

  • Are the multiple parts developed by the same team, with the same release cycle and the same customer? Then there are fewer reasons to split the one repository.
  • Are the multiple parts highly dependent on each other? Splitting model, controller and UI (even when they are different parts) is not very sensible, due to their high dependency on each other. But if two parts have only a small dependency, implemented by a stable interface that changes only every few years, it would be wise to divide them into two repositories.

Just as an example, I have a small application (client only) that checks the "quality" of a Subversion repository. There is the core implementation, which can be started from the command line and works well with Java 6. But I have started to implement a UI that uses JavaFX as part of Java 8. So I split the two and created a second repository (with a second build process), with a different schedule, ...

I like the answers above (I voted them up), but I think they are not the whole story, so I wanted to add the arguments for splitting repositories as well. The real answer (when to split) may be somewhere in the middle ...

mliebelt

It might be that git-subtree (see Atlassian blog, medium blog, or kernel link) would be a good fit for what you have. Each of your top-level projects would then use a set of subtrees, possibly at different versions.
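To illustrate a top-level project pinning its own versions of shared components, here is a sketch that turns a per-project pin list into `git subtree add` / `git subtree pull` invocations. The pin format, the example.com URLs and the tag names are all hypothetical; `--prefix` and `--squash` are standard git-subtree options. By default it only returns the commands (`dry_run=True`) rather than running them.

```python
import subprocess

# Hypothetical pin list for one top-level project: subtree prefix -> (remote URL, tag or branch).
PINS = {
    "core": ("https://example.com/acme/core.git", "core-v1.2.0"),
    "api": ("https://example.com/acme/api.git", "api-v0.3.1"),
}


def subtree_command(prefix, url, ref, existing):
    """'add' a subtree the first time; 'pull' to move an existing subtree to a new ref."""
    action = "pull" if existing else "add"
    return ["git", "subtree", action, f"--prefix={prefix}", url, ref, "--squash"]


def sync_subtrees(pins, existing_prefixes, dry_run=True):
    """Return (and, unless dry_run, execute) one git-subtree command per pinned component."""
    commands = [
        subtree_command(prefix, url, ref, prefix in existing_prefixes)
        for prefix, (url, ref) in sorted(pins.items())
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Two top-level projects can keep different pin lists, so one can sit on `core-v1.2.0` while another moves ahead, which is the "different version(s)" flexibility this answer points at.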

From your example, the repositories should be set up according to how interdependent they are. All the reasoning about designing microservices and Domain-Driven Design applies here: in some cases duplicate code is acceptable, work with interfaces, don't break compatibility unless you really have to, etc.

Now in my view a UI should be independent of the backend. So a UI project repository should typically contain the UI code and the Client Controller. The Client Controller will connect with Service Controllers in an abstract manner. They will use a service client/api abstraction that is versioned separately from the service, so that a service can be updated without breaking the client(s) (there could be several different clients).
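A minimal Python sketch of that separation, with every name hypothetical: the UI-side controller depends only on a versioned client contract (`OrderClientV1`, which would be published from its own package), so the service behind it can change as long as something still satisfies that contract. `typing.Protocol` stands in for whatever interface mechanism your stack actually uses.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderClientV1(Protocol):
    """Hypothetical versioned client contract, released separately from the service."""

    def get_order(self, order_id: str) -> Order: ...


class ClientController:
    """UI-side controller: knows the client contract, never the service implementation."""

    def __init__(self, orders: OrderClientV1) -> None:
        self._orders = orders

    def order_summary(self, order_id: str) -> str:
        order = self._orders.get_order(order_id)
        return f"Order {order.order_id}: {order.total_cents / 100:.2f}"


class InMemoryOrderClient:
    """Stand-in implementation (e.g. for tests); an HTTP client could satisfy the same Protocol."""

    def __init__(self, orders: dict[str, Order]) -> None:
        self._orders = orders

    def get_order(self, order_id: str) -> Order:
        return self._orders[order_id]
```

For example, `ClientController(InMemoryOrderClient({"A1": Order("A1", 1999)})).order_summary("A1")` returns `"Order A1: 19.99"`. A breaking change would become an `OrderClientV2` alongside V1, so existing clients keep working until they migrate.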

So a service itself should be its own repository. In my view, the service is just a wrapper around some single-point-of-truth business logic, so the business logic should typically be separate from the service technology that hosts it. On the other hand, the repository implementation is typically so tightly connected to the business logic that it could be integrated into the same repository. But even there your mileage may vary.

Of course, simple projects that are unlikely to change much in terms of technology or supporting multiple stacks, where all UI can be hosted from the same source as the backend and the backend services are typically only used by that same client, can benefit from more tightly integrated repositories.

In that case you would probably be fine with just having the full vertical in one repository, and focus on just making sure your functional domains are properly stand-alone in their own repository. You then still have most advantages of smaller repositories, and little overhead otherwise.

Arwin