
We are doing projects, but we reuse a lot of code between the projects and have lots of libraries that contain our common code. As we implement new projects we find more ways to factor out common code and put it into libraries. The libraries depend on each other, and the projects depend on the libraries. Each project, and all libraries used in that project, need to use the same version of all the libraries they are referring to. If we release a piece of software we will have to fix bugs and maybe add new features for many years, sometimes for decades. We have about a dozen libraries, changes often cut across more than two, and several teams work on several projects in parallel, making concurrent changes to all these libraries.

We have recently switched to git and set up repositories for each library and each project. We use Stash as a common repository, do new work on feature branches, then open pull requests and merge them only after review.

Many of the issues we have to deal with in projects require us to make changes across several libraries and the project's specific code. These often include changes to library interfaces, some of which are incompatible. (If you think this sounds fishy: We interface with hardware, and hide specific hardware behind generic interfaces. Almost every time we integrate some other vendor's hardware we run into cases our current interfaces did not anticipate, and so have to refine them.) For example, imagine a project P1 using the libraries L1, L2, and L3. L1 also uses L2 and L3, and L2 uses L3 as well. The dependency graph looks like this:

   <-------L1<--+
P1 <----+  ^    |
   <-+  |  |    |
     |  +--L2   |
     |     ^    |
     |     |    |
     +-----L3---+

Now imagine a feature for this project requires changes in P1 and L3 which change the interface of L3. Now add projects P2 and P3 into the mix, which also refer to these libraries. We cannot afford to switch them all to the new interface, run all the tests, and deploy the new software. So what's the alternative?

  1. implement the new interface in L3
  2. make a pull request for L3 and wait for the review
  3. merge the change
  4. create a new release of L3
  5. start working on the feature in P1 by making it refer to L3's new release, then implement the feature on P1's feature branch
  6. make a pull request, have it reviewed, and merge it
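Steps 1–4 can be sketched as git commands. This is only an illustration: the branch, file, and version names are hypothetical, a throwaway local repository stands in for L3, and the pull-request review is simulated by a local merge.

```shell
set -e
# Throwaway repository standing in for library L3
rm -rf /tmp/demo-L3 && mkdir -p /tmp/demo-L3 && cd /tmp/demo-L3
git init -q -b master
git config user.email dev@example.com && git config user.name Dev

# 1. implement the new interface on a feature branch
git checkout -q -b feature/new-interface
echo "interface v2" > hal.h
git add hal.h
git commit -q -m "Refine the hardware abstraction interface"

# 2./3. pull request, review, merge (simulated locally here)
git checkout -q -B master
git merge -q feature/new-interface

# 4. cut the release: with semantic versioning, a breaking
#    interface change bumps the major version
git tag -a v2.0.0 -m "New hardware abstraction interface"
```

Step 5 would then update P1's dependency declaration to `v2.0.0` on P1's own feature branch.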

(I just noticed that I forgot to switch L1 and L2 to the new release. And I don't even know where to stick this in, because it would need to be done in parallel with P1...)

This is a tedious, error-prone, and very long process for implementing a single feature. It requires two independent reviews (which makes reviewing much harder), does not scale at all, and is likely to put us out of business because we get so bogged down in process that we never get anything done.

But how do we employ branching and tagging in order to create a process that allows us to implement new features in new projects without too much overhead?

sbi
  • Changing your tooling shouldn't affect the processes you have in place too much. So, how were you dealing with this problem before you switched to git? – Bart van Ingen Schenau Jun 10 '15 at 15:39
  • Is it possible to simply add a new method to the interface without breaking the existing one when too many libraries depend on it? Normally that's not the best idea but at least it would let you "get on with" implementing the new feature and you can properly deprecate the old method whenever there's a spare moment. Or are these interfaces too stateful for "parallel interfaces" like that to work? – Ixrec Jun 10 '15 at 15:41
  • @Bart: It's not just the tool changing, we are also in the process of moving more code into libraries, which makes this problem more obvious. However, in SVN, everything was in one repository. So you could merge everything into the trunk in one big merge operation. The third issue: I have worked with SVN for many years, and know the best practices there. I am, however, new to git, and don't know how to tackle these problems with it. – sbi Jun 10 '15 at 15:42
  • @Ixrec: From the question: _"Almost each time we integrate some other vendor's hardware we run into cases our current interfaces did not anticipate, and so have to refine them."_ And what you describe is doing versioning in the code. However, this is what SCM was invented for. – sbi Jun 10 '15 at 15:42
  • @sbi What about using just one git repository for everything, if that's what you were doing with SVN? – Ixrec Jun 10 '15 at 15:44
  • @Ixrec: I was told that "this is not the git way" of doing things. Everybody uses individual repositories for individual projects, so it was decided we do that, too. – sbi Jun 10 '15 at 15:45
  • I would argue they are not separate projects if they frequently have to be changed in tandem. The boundaries between "projects" should always have some kind of long-term backwards compatibility guarantee imo. – Ixrec Jun 10 '15 at 15:48
  • @Ixrec: Integrating more hardware is similar to porting code to more platforms: the more you have done this, the less you need to change for yet another hardware/platform. So in the long run, the code will stabilize. However, right now we will need to find a process that allows us to stay in the market long enough to get there. – sbi Jun 10 '15 at 15:51
  • Look at semantic versioning – Daenyth Jun 10 '15 at 16:11
  • @Daenyth: Should I be eager to chase after the bone you threw me? You might just as well tell me to look at the moon. _Sigh._ Why do you think semantic versioning is relevant here and how do you think we should employ it to solve the problems I have described? (FWIW, we do use semantic versioning for our stuff. But that we know by looking at a library's version number whether it might be incompatible doesn't seem to help at all here.) – sbi Jun 10 '15 at 16:13
  • I think you should probably not have multiple repositories. See this question: http://programmers.stackexchange.com/questions/161293/choosing-between-single-or-multiple-projects-in-a-git-repository. As long as you are frequently doing work cross repository, the split is working against you. – Winston Ewert Jun 11 '15 at 14:32
  • If you insist on having multiple repositories see: http://stackoverflow.com/questions/816619/managing-many-git-repositories – Winston Ewert Jun 11 '15 at 14:33
  • @Winston: I do not insist at all. See [this comment](http://programmers.stackexchange.com/questions/286400/#comment590702_286400). Thank you very much for the heads-up, though, I will make sure to make our team aware that not everyone believes in the multiple repository approach. – sbi Jun 11 '15 at 14:40

6 Answers

5

Kind of stating the obvious here, but maybe worth mentioning.

Usually, git repos are tailored per lib/project because they tend to be independent. You update your project, and don't care about the rest. Other projects depending on it will simply update their lib whenever they see fit.

However, your case seems to involve highly correlated components, where one feature usually affects many of them, and the whole has to be packaged as a bundle. Since implementing a feature/change/bugfix often requires adapting many different libraries/projects at once, perhaps it makes sense to put them all in the same repo.

There are strong advantages/drawbacks to this.

Advantages:

  • Traceability: the branch shows everything that changed in every project/lib related to this feature/bug.
  • Bundling: just pick a tag, and you'll get all the sources right.

Drawbacks:

  • Merging: ...it's sometimes already tough with a single project. With different teams working on shared branches, be ready to brace for impact.
  • Dangerous "oops" factor: if one employee messes up the repository by making some mistake, it might impact all projects & teams.

It's up to you to know if the price is worth the benefit.

EDIT:

It would work like this:

  • Feature X must be implemented
  • Create branch feature_x
  • All developers involved work on this branch in parallel, probably in dedicated directories related to their project/lib
  • Once it's over, review it, test it, package it, whatever
  • Merge it back into master ...and this may be the tough part, since in the meantime feature_y and feature_z may have been merged too. It becomes a "cross-team" merge. This is why it is a serious drawback.
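The flow above, sketched with a throwaway single repository (all directory, branch, and file names hypothetical):

```shell
set -e
# Throwaway monorepo holding P1 and the libraries side by side
rm -rf /tmp/mono && mkdir -p /tmp/mono && cd /tmp/mono
git init -q -b master
git config user.email dev@example.com && git config user.name Dev
mkdir P1 L1 L2 L3
echo "old interface" > L3/hal.h
git add -A && git commit -q -m "baseline"

# One branch carries the whole cross-cutting change
git checkout -q -b feature_x
echo "new interface" > L3/hal.h
echo "uses new interface" > P1/main.cpp
git add -A && git commit -q -m "feature_x: adapt L3's interface and P1 together"

# Review, test, then merge back; this is where conflicts with
# feature_y / feature_z from other teams would surface
git checkout -q master
git merge -q --no-ff feature_x -m "Merge feature_x"
```

The point of the single repo is visible here: one branch, one review, one merge for a change that spans a library and a project.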

Just for the record: I think this is in most cases a bad idea and should be approached cautiously, because the merge drawback is usually bigger than the overhead of dependency management / proper feature tracking.

dagnelies
  • Thanks, we are indeed currently looking at this. What I do not understand in the light of this is how we'd do branching with git. In SVN, branching means copying a subtree (in the repository) to somewhere else (in the repository). With this, it is easy to create branches of any subtree in your repo, so if you have many projects in there, you can branch off each of them individually. Is there something like this in git, or would we always only be able to branch a whole repo? – sbi Jun 19 '15 at 11:54
  • @sbi: you'd branch the whole repo. You can't work with subtrees in different branches, which would kind of defeat the point in your case. Git won't "copy" anything though, it'll simply track the changes in the branch you're working on. – dagnelies Jun 19 '15 at 12:06
  • So this requires someone creating a feature branch for one library to also merge all the others when merging or rebasing. This is a real drawback. (BTW, SVN also only ever does lazy copies.) – sbi Jun 19 '15 at 12:09
  • @sbi : see edit – dagnelies Jun 19 '15 at 12:15
  • Thanks, but here issues are usually kept small and mostly only one (or rarely two) developers work on one feature. – sbi Jun 19 '15 at 12:17
  • @sbi: I think it all boils down to whether your coworkers are comfortable with git and merging their stuff in a highly shared codebase. – dagnelies Jun 19 '15 at 12:21
  • Well, currently most of us are not comfortable. `:-/` What's more, even those who are (and who pushed for the move to git), do not know how to make our development process fit with git. _Sigh._ It's going to be a few tough months, I'm afraid, until things start to become smoother. Thanks anyway, yours is the most/only helpful answer so far. – sbi Jun 19 '15 at 14:38
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/24995/discussion-between-arnaud-and-sbi). – dagnelies Jun 19 '15 at 15:09
  • I'm now on the train, having promised my oldest I'd take her to Jurassic World. Some other time? – sbi Jun 19 '15 at 16:43
  • I got a bit caught up in the last few days, so now my time for the bounty nearly ran out. I gave it to you because yours was closest to a helpful answer I got. – sbi Jun 23 '15 at 09:17
4

The solution you are looking for is a dependency management tool in coordination with git submodules.

Tools such as:

  • Maven
  • Ant
  • Composer

You can use those tools to define dependencies of a project.

You can require a dependency to be at least a given version (`> 2.x.x`), denote a range of compatible versions (`= 2.2.*`), or cap it below a particular version (`< 2.2.3`).

Whenever you release a new version of one of the packages, you can tag it with the version number; that way you can pull that specific version of the code into all other projects.
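For illustration, such constraints could look like this in a hypothetical Composer manifest (the package names are invented):

```json
{
    "name": "acme/p1",
    "require": {
        "acme/l1": ">=2.0.0",
        "acme/l2": "2.2.*",
        "acme/l3": "<2.2.3"
    }
}
```

Maven and other dependency managers use different syntax, but the idea of pinning or ranging versions per consumer is the same.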

Patrick
  • But dependency management is not our problem, this is solved. We currently regularly make changes across many libraries and need to minimize the overhead of creating new versions while maintaining stable project releases. Your answer doesn't seem to provide a solution to this. – sbi Jun 10 '15 at 18:04
  • @sbi This would manage the overhead of creating new versions and maintaining stable project releases. Since you can dictate that project x relies on project y version 2.1.1, you can create new versions of project y that would not affect project x. – Patrick Jun 10 '15 at 20:06
  • Again, declaring dependencies is not our problem. We can do this already. The problem is how to manage changes cutting across several projects/libraries in an efficient manner. Your answer fails to explain this. – sbi Jun 10 '15 at 20:43
  • @sbi: so what exactly is your issue? You make your changes, bump the version, update dependencies where required, and voila. What you described in your initial post is typical maven & co. stuff. Each distribution is based on clearly defined versioned libs. How could it be more clear? – dagnelies Jun 16 '15 at 14:48
  • @arnaud: The turnaround times for such a process for (currently rather common) changes cutting through three or more layers would kill us. I thought my question described that. – sbi Jun 16 '15 at 16:14
  • @arnaud: Maven is something in the Java world, while we're in C++ land. (I'm not saying it wouldn't work for us, only that we don't know it.) – sbi Jun 16 '15 at 16:15
  • sbi the turnaround time for the process @arnaud suggested is essentially zero. Since all current projects have a particular version they are dependent on, they should always pull from that repo at that version number, if you push a new commit to master on that repo, it should be tagged with a new version number. – Patrick Jun 19 '15 at 18:27
  • @Patrick: I have described the steps in my question, and not only is the turnaround non-zero, it's also rather big. You, OTOH, haven't even bothered to provide an argument for why that seems negligible to you. – sbi Jun 23 '15 at 09:22
  • @sbi I don't know why you're being so antagonistic. I provided an answer to your question that I thought resolved your issue. You obviously disagree, let it go – Patrick Jun 23 '15 at 12:44
  • @Patrick: And again no argument on the subject. HAND. – sbi Jun 25 '15 at 07:45
0

Submodules

You should give git submodules a try, as suggested in one comment.

When project P1 refers to the three submodules L1, L2 and L3, it actually stores a reference to a particular commit in each of the three repositories: those are the working versions of each library for that project.

So multiple projects can work with multiple submodules: P1 might refer to the old version of library L1 while project P2 uses the new version.

What happens when you deliver a new version of L3?

  • implement new interface in L3
  • commit, test, make pull request, review, merge, ... (you cannot avoid this)
  • ensure L2 works with L3, commit, ...
  • ensure L1 works with new L2, ...
  • ensure P1 works with the new versions of all libraries:
    • inside P1's local working copies of L1, L2 and L3, fetch the changes you are interested in
    • `git add L1 L2 L3` to stage the new submodule references, then commit
    • make a pull request for P1, test, review, merge ...
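The submodule-pointer bump described above can be sketched like this. Everything here is illustrative: throwaway local repositories stand in for L3 and P1, and the review steps are omitted. (The `protocol.file.allow` override is only needed because the demo submodule lives on the local filesystem.)

```shell
set -e
base=/tmp/subdemo; rm -rf "$base"; mkdir -p "$base"
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# Throwaway repositories standing in for library L3 and project P1
git init -q -b master "$base/L3"
git -C "$base/L3" commit -q --allow-empty -m "L3: old interface"
git init -q -b master "$base/P1"
cd "$base/P1"
git -c protocol.file.allow=always submodule add -q "$base/L3" L3
git commit -q -m "P1: pin L3 at its current commit"

# L3 moves on and releases a new interface...
git -C "$base/L3" commit -q --allow-empty -m "L3: new interface"

# ...and P1 opts in only when it is ready, by moving its pointer
git -C L3 fetch -q origin master
git -C L3 checkout -q FETCH_HEAD
git add L3
git commit -q -m "P1: adopt L3's new interface"
```

Until that last commit, P1 (and any other project) keeps building against the old L3 commit, which is the decoupling the answer describes.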

Methodology

This is a tedious, error-prone, and very long process to implement this feature, it requires two independent reviews (which makes it much harder to review), does not scale at all, and is likely to put us out of business because we get so bogged down in process we never get anything done.

Yes, it requires independent reviews, because you change:

  • the library
  • libraries that depend on it
  • projects that depend on multiple libraries

Would you be put out of business because you deliver crap? (Maybe not, actually). If yes, then you need to perform tests and review changes.

With appropriate git tools (even gitk), you can easily see which versions of the libraries each project uses, and you can update them independently according to your needs. Submodules are perfect for your situation and won't slow your process down.

Maybe you can find a way to automate part of this process, but most of the steps above require human brains. The most effective way to cut time would be to ensure your libraries and projects are easy to evolve. If your codebase can handle new requirements gracefully, then code reviews will be simpler and take little of your time.

(Edit) Another thing that might help you is to group related code reviews. You commit all changes and wait until you have propagated them down to all the libraries and projects that use them before submitting pull requests (or before you take care of them). You end up doing one bigger review for the whole dependency chain. This can save time if each local change is small.

coredump
  • You describe how to solve the dependency problem (which, as I have said before, we have sorted out) and negate the very problem we are having. Why do you even bother? (FWIW, we write software that drives power plants. Clean, secure, and thoroughly reviewed code is a prime feature.) – sbi Jun 19 '15 at 14:36
  • @sbi What are submodules if not a special case of branching and tagging? You **think** submodules are about dependency management, because they also keep track of dependencies. But sure, please reinvent submodules with tags if you want, I don't mind. I don't understand your problem: if reviewed code is a prime feature, you must budget some time for reviews. You are not bogged down, you go as fast as possible with the constraints put on you. – coredump Jun 19 '15 at 15:04
  • Reviews are very important to us. That's one of the reasons we're concerned about an issue (and its reviews) being split across several reviews for several changes in several repositories. Plus we don't want to get bogged down in administrating us to death over one issue is that we'd rather spend the time writing and reviewing code. Re submodules: so far, the only thing I have heard about them is "don't bother, this isn't the git way". Well, given that our requirements seem to be so unique, maybe we should revisit this decision... – sbi Jun 19 '15 at 16:41
0

So what I understand is: for P1 you want to change L3's interface, but you don't want P2 and P3, which also depend on L3's interface, to have to change right away. This is a typical backward-compatibility case. There is a nice article on this: Preserving Backward Compatibility

There are several ways you can solve this:

  • You have to create new interfaces each time which can extend the old interfaces.

OR

  • If you want to retire an old interface after some time, you can keep several versions of the interface in parallel and, once all dependent projects have moved, remove the older ones.
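A minimal C++ sketch of the second option (all names hypothetical): both interface versions coexist until the dependent projects have migrated.

```cpp
#include <string>

// The original hardware interface stays untouched, so P2 and P3
// keep compiling against it unchanged.
struct IDevice {
    virtual ~IDevice() = default;
    virtual void start() = 0;
};

// P1 programs against the extended version, which adds what the
// new vendor's hardware needs. Once all projects have moved over,
// the old IDevice can be retired.
struct IDeviceV2 : IDevice {
    virtual std::string vendorId() const = 0;
};
```

Because `IDeviceV2` derives from `IDevice`, a driver implementing the new interface can still be handed to old clients that only know the old one.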
Uday Shankar
  • No, backwards compatibility is ensured through release branches and is not our problem. The problem is that we sit on a rapidly changing codebase, which we want to separate into libraries now, despite the fact that the interfaces are still in a phase where they change often. I know how to manage such beasts in SVN, but do not know how to do that in git without being drowned in administration. – sbi Jun 19 '15 at 16:50
0

If I am getting your problem right:

  • you have 4 inter-related modules, P1 and L1 to L3
  • you need to make a change to P1 which ultimately will affect L1 to L3
  • it counts as a process failure if you have to change all 4 together
  • it counts as a process failure if you have to change them all 1 by 1.
  • it counts as a process failure if you have to identify in advance the chunks in which changes have to be made.

So the goal is you can do P1 and L1 in one go, and then a month later do L2 and L3 in another.

In the Java world, this is trivial, and perhaps the default way to work:

  • everything goes in one repository with no relevant use of branching
  • modules are compiled + linked together by maven based on version numbers, not on the fact that they are all in the same directory tree.

So you can have code for L3 on your local disk that wouldn't compile against the copy of P1 in another directory on your disk; luckily, it isn't compiled against that copy. Java can do this straightforwardly because compiling/linking takes place against compiled jar files, not source code.

I'm not aware of a pre-existing widely-used solution to this problem for the C/C++ world, and I'd imagine you hardly want to switch languages. But something could easily be hacked together with makefiles that do the equivalent:

  • installed libraries + headers to known directories with embedded version numbers
  • changed compiler paths per module to the directory for the appropriate version numbers

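A sketch of what such a makefile arrangement could look like; the install prefix, library names, and version variables are all invented for illustration:

```make
# Each library release is installed under a versioned prefix,
# and each consumer pins the versions it builds against.
PREFIX := /opt/acme
L3_VER := 2.0.0
L2_VER := 1.4.2

CXXFLAGS += -I$(PREFIX)/L3/$(L3_VER)/include \
            -I$(PREFIX)/L2/$(L2_VER)/include
LDFLAGS  += -L$(PREFIX)/L3/$(L3_VER)/lib -lL3 \
            -L$(PREFIX)/L2/$(L2_VER)/lib -lL2
```

Bumping a consumer to a new library release is then a one-line change to the version variable, independent of what other projects use.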
You could even use the C/C++ support in maven, although most C developers would look at you strangely if you did...

soru
  • _"it counts as a process failure if you have to change all 4 together"_. Actually it wouldn't. In fact, this is exactly what we did using SVN. – sbi Jun 22 '15 at 21:22
  • In which case then I guess there is no problem with simply putting all the projects in the repository. – soru Jun 23 '15 at 01:53
  • We are now evaluating putting the libraries into only two repositories. That's still more than one, but much less than "one for every project", and the libraries can very well be split into two groups. Thanks for your input! – sbi Jun 23 '15 at 09:25
  • P.S.: _" I'd imagine you hardly want to switch languages."_ This is embedded stuff. `:)` – sbi Jun 23 '15 at 09:25
-1

There is a simple solution: cut release branches across the whole repository, and merge all fixes to all actively shipped releases (it is easy in ClearCase and should be possible in git).

All alternatives will create a horrible mess over time and with project growth.

zzz777
  • Could you please elaborate? I am not sure what you are suggesting. – sbi Jun 18 '15 at 13:31
  • In ClearCase you define a branch-point as a base branch and a timestamp. You have the following hierarchy: base branch -> release-development branch -> private development branch. All development is done on private branches and then merged down the hierarchy. Customer release branches are broken off the release-development branch. I am not that familiar with git, but it seems the closest thing to ClearCase among free source control systems. – zzz777 Jun 18 '15 at 13:41
  • A careful read of my question should have shown you that we have problems with the overhead involved in propagating changes between repositories. This has nothing to do with your answer. – sbi Jun 18 '15 at 14:04
  • @sbi I am sorry I misunderstood your question. And I am afraid you will face a horrible mess sooner or later. – zzz777 Jun 18 '15 at 14:17