How to decide git repo division with workflow considerations?

Question

We are moving from Subversion to Git for branching and distributed development reasons. One struggle with our old repository workflow was that every application had its own repository: a huge application had one, and so did a small application with 15 files that barely ever changed. Most of the applications are interdependent, at least on the database side if not through shared code, as part of our overall product suite. There are just a couple of primary applications, but they are becoming more and more integrated and may soon use the same API for all of their "backend" requests.

As I put together a proposal for moving forward with Git, covering repo organization and workflow, I am trying to decide between a few choices:

monolithic: one primary repository (1 repo)
independent: a separate repository for nearly every application/library (~20 repos total) plus a main repo with submodules
compromise: one repository per major application, one for all minor apps, and one for libraries (~4 repos) plus a main repo with submodules

I initially started with the monolithic approach and the Git workflow and felt comfortable with it, but the more I read, the more I see that this is not considered best practice for Git. I am now evaluating the separate repos and submodules approach. However, it adds many steps to the workflow that are not directly supported in repo systems (e.g., pull requests across multiple repositories), and it appears to be prone to problems in the main repository if you are not careful with the order of operations.

I'm starting to lean towards the compromise approach. However, I've done some testing with the monolithic approach and have not seen any problems, which makes me think that in trying to follow best practices I'll actually make our workflow more complex, slower when dealing with pulls and releases, and more prone to errors.

Am I misunderstanding the best practice of "one repo per project"? Should our codebase of interrelated applications be considered one project, and thus one repo?

Thoughts?

also see http://gregoryszorc.com/blog/2014/09/09/on-monolithic-repositories/ and https://news.ycombinator.com/item?id=10007654 — stijn, Aug 11 '15 at 13:56
see http://programmers.stackexchange.com/questions/161293/choosing-between-single-or-multiple-projects-in-a-git-repository — Winston Ewert, Aug 11 '15 at 14:14

score 1 · Answer 1 · answered Aug 11 '15 at 15:59

Most of the time, if you think you need submodules to keep dependent versions between your separate repos in sync, that's a sign you'd be better off with a single repository. Separate repos should usually be highly independent, and be able to be updated and released without worrying about the versions of other projects.

Even with all the projects in one logical repo, Git's distributed nature means you can still use separate physical repos if that makes sense for your organization. For example, the Linux kernel has one big logical repo, where everything eventually gets merged, but the various subsystems have their own separate physical repos where the day-to-day work gets done.
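As a rough sketch of that arrangement, the script below builds two throwaway local repos: one standing in for the logical repo and one for a subsystem's physical repo, which is then merged back. The names `mainline`, `subsystem-net`, and the file names are made up for illustration, and a real setup would use hosted remotes rather than sibling directories.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every repo and file name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Fixed identity so commits succeed without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# The "logical" repo where everything is eventually merged.
git init -q mainline
cd mainline
echo "core" > core.txt
git add core.txt
git commit -q -m "core: initial commit"
cd ..

# A separate physical repo for one subsystem, sharing mainline's history.
git clone -q mainline subsystem-net
cd subsystem-net
echo "net" > net.txt
git add net.txt
git commit -q -m "net: add driver"
cd ..

# The integrator fetches the subsystem's work and merges it into mainline.
cd mainline
git fetch -q ../subsystem-net HEAD
git merge -q --no-ff -m "Merge subsystem-net" FETCH_HEAD
git --no-pager log --oneline --graph
```

Because both repos share one history, the merge is an ordinary Git merge and no submodule bookkeeping is involved.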

Submodules are mostly intended for situations where you have a strong dependency on third-party code that has no reciprocal dependency. An example would be an application that depends heavily on specific bleeding-edge versions of a widely used library like numpy, to which you contribute regular patches, while the library's maintainers otherwise don't know your application exists.
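A minimal sketch of that one-way dependency, using a local repo as a stand-in for the third-party library: `vendor-lib`, `app`, and `third_party/lib` are hypothetical names. The `protocol.file.allow=always` override is there only because the stand-in upstream is a local path, which recent Git versions refuse for submodules by default; it should not be needed for a normal hosted URL.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every repo and file name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Fixed identity so commits succeed without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Stand-in for the third-party library's upstream repo.
git init -q vendor-lib
cd vendor-lib
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: v1"
cd ..

# The application, which pins one exact commit of the library.
git init -q app
cd app
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"
# Local-path upstream only: allow the file transport for this one command.
git -c protocol.file.allow=always submodule add "$work/vendor-lib" third_party/lib
git commit -q -m "Pin vendor-lib as a submodule"

# Upstream moves on without knowing the app exists.
cd ../vendor-lib
echo "v2" > lib.txt
git commit -q -am "lib: v2"

# The app's checkout stays at the pinned commit until it opts in to an update.
cd ../app
git submodule status
cat third_party/lib/lib.txt
```

The superproject records a specific commit of the library, so upstream changes reach the application only when someone deliberately updates and commits the new pin.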
