149

Currently we have one master branch for our PHP application in a shared repository. We have more than 500 clients who are subscribers of our software, most of whom have some customization for different purposes, each in a separate branch. The customization could be a different text field name, a totally new feature or module, or new tables/columns in the database.

The challenge we face is that as we maintain these hundreds of customized branches and distribute them to clients, from time to time we add new features and update our master branch, and we would like to push the master branch changes to the custom branches in order to update them to the latest version.

Unfortunately this often results in many conflicts in the custom code, and we spend many hours going through every single branch to solve all the conflicts. This is very inefficient, and we've found that mistakes are not uncommon when solving these conflicts.

I am looking for a more efficient way to keep our client release branches up to date with the master branch that will result in less effort during merging.

ctote
Fernando Tan
  • 11
    Sorry not to give a "you can use X tool" answer, but there isn't one. – Lightness Races in Orbit Nov 09 '15 at 16:27
  • 3
    Or during build (which is probably more common). Just.. not entirely separately codebases. – Lightness Races in Orbit Nov 09 '15 at 16:43
  • 1
    Given the general attitude that this is impossible, I'd like to offer a direction to look: consider trying to treat these branches as organic entities rather than cold hard code. Organic life has been solving the "500 individuals which are mostly alike" problem for millions of years, so there should be some good lessons learned that you can leverage to make your life easier. – Cort Ammon Nov 09 '15 at 17:29
  • 17
    @FernandoTan - Your visible symptom may be code, but the root cause of your disease is your product fragmentation, the cure needs to come from product focus/ product capability mapping, not code clean up - that will eventually happen. I've detailed more in my answer - http://programmers.stackexchange.com/a/302193/78582 – Alex S Nov 10 '15 at 05:36
  • 8
    This could also be an economic problem. Do you really make money from all of those 500 clients? If not, you have to rethink your pricing model and reject change requests if the customer doesn't pay an extra fee. – Christian Strempfer Nov 10 '15 at 11:04
  • 1
    Related question: [Pre-processor usage to separate logic to different versions of product](http://programmers.stackexchange.com/questions/262810/pre-processor-usage-to-separate-logic-to-different-versions-of-product) – CodesInChaos Nov 10 '15 at 11:06
  • How large is the company? (incl. subcontractors if some branches are off-loaded) –  Nov 10 '15 at 15:50
  • 1
    HP had the same problem with their printer firmware: http://itrevolution.com/the-amazing-devops-transformation-of-the-hp-laserjet-firmware-team-gary-gruver/ – Patrick Nov 11 '15 at 10:46
  • 14
    This made my heart break just a tiny bit. Fortunately others are already shouting the right answers -- my only additional recommendation is that you write this up and submit it to TheDailyWTF. – zxq9 Nov 11 '15 at 11:00
  • Are there any good resources for learning about technical design issues such as this? There seem to be lots of resources for learning a language, but very few on how to learn it well, and on how to design a large scale application. – Gavin Coates Nov 11 '15 at 16:10
  • 2
    @GavinCoates: Funnily enough, I was asked just today where I picked up my knowledge/expertise on such things. I couldn't answer. It seems to be a combination of experience, OCD and -- frankly -- common sense. I mean, really, you don't need a book to tell you that when you've reached your third or fourth "branch" like this, it's time to stop and re-think what you're doing. – Lightness Races in Orbit Nov 11 '15 at 23:54
  • @LightnessRacesinOrbit you're right, it is obvious you are doing it wrong when you hit 4 branches. But a book to explain how you should be doing it would be handy :) – Gavin Coates Nov 12 '15 at 10:48
  • @GavinCoates: Okay, good point :P I haven't read Lakos yet but I do have a copy and I've been told by some that it is good for this kind of thing. Then again, I've heard from others that it's to be avoided, so who knows. – Lightness Races in Orbit Nov 12 '15 at 11:04
  • 2
    You should fix this situation by integrating one branch after another into the master branch. If you have hundreds, you will need to automate this at first, even if the result will not be pretty. To integrate a branch, first merge all changes from the master branch into it (or rebase accordingly), then create a diff with the master branch using something like the '-D' option of GNU diff. With a little bit of work, you should be able to turn all branches into '#ifdef'-like constructs in the master branch. Then you can clean up your code as a second step. – Sebastian Reichelt Nov 12 '15 at 18:35
  • Thank you for the comments, suggestions, article links and so on. This is very helpful indeed, not only for this question but also for better code design in the future. As suggested, my team is going to consolidate all the branches into a single branch or a few branches. Hopefully this will be a successful journey :) – Fernando Tan Nov 21 '15 at 17:44

9 Answers

327

You are completely abusing branches! You should have the customisation powered by flexibility in your application, not flexibility in your version control (which, as you have discovered, is not intended/designed for this sort of use).

For example, make textfield labels come from a text file, not be hardcoded into your application (this is how internationalisation works). If some customers have different features, make your application modular, with strict internal boundaries governed by stringent and stable APIs, so that features can be plugged-in as needed.
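As a minimal sketch of the label idea, assuming a hypothetical `LabelCatalog` class and illustrative label keys (none of these names come from the original application), the core could ship default labels and let each client supply only the ones it wants changed:

```php
<?php
/**
 * Hypothetical helper: resolves a UI label, preferring a client-specific
 * override and falling back to the shared default.
 */
final class LabelCatalog
{
    /** @var array<string, string> */
    private array $defaults;

    /** @var array<string, string> */
    private array $overrides;

    public function __construct(array $defaults, array $overrides = [])
    {
        $this->defaults  = $defaults;
        $this->overrides = $overrides;
    }

    public function get(string $key): string
    {
        // Client override wins, then the default, then the key itself.
        return $this->overrides[$key] ?? $this->defaults[$key] ?? $key;
    }
}

// Defaults ship with the core. Overrides would normally be loaded from a
// per-client file or table; they are inlined here to keep the sketch short.
$defaults  = ['customer.name' => 'Customer name', 'order.ref' => 'Order reference'];
$overrides = ['customer.name' => 'Patient name']; // illustrative client data

$labels = new LabelCatalog($defaults, $overrides);
```

With this in place, `$labels->get('customer.name')` yields the client's wording while every other label falls through to the shared default, so a renamed text field no longer needs a branch.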

The core infrastructure, and any shared features, then only need be stored, maintained and tested once.

You should have done this from the start. If you already have five hundred product variants (!), fixing this is going to be a huge job … but no more so than ongoing maintenance.

Lightness Races in Orbit
  • 147
    +1 for "You should have done this from the start". This level of technical debt can destroy a company. – Daenyth Nov 09 '15 at 16:05
  • 33
    @Daenyth: Frankly with _five hundred_ custom branches I'm amazed it hasn't already. Who lets things get this bad? lol – Lightness Races in Orbit Nov 09 '15 at 16:06
  • 11
    arghhh, to explain little bit about background, I am the new software architect to this company. Previously I was doing coding related task and this is my first time to turn into high-level design role. – Fernando Tan Nov 09 '15 at 16:33
  • 74
    @FernandoTan I am so, so, so sorry for you... – enderland Nov 09 '15 at 16:36
  • 21
    @FernandoTan: Me too. :( Maybe you should have asked more questions at interview? ;) To be clear, the "you" in my answer is the organisation. It's an abstraction. I'm not looking to assign blame to individuals. – Lightness Races in Orbit Nov 09 '15 at 16:42
  • 58
    First get more insight: Let developers make a diff between the current version and the customized branch, so you at least know what differences there are. That list shows you where you can reduce the number of branches most quickly. If 50 have custom field names, just focus on that and it will save you 50 branches. Then look for the next one. You might also have some which are not recoverable, but then at least the number will be lower and it won't grow further when you get more clients. – Luc Franken Nov 09 '15 at 17:44
  • 9
    Just a general idea of where to look to start fixing this fiasco - Identify some of the smallest "branches" - i.e. a single line different from the norm. Refactor your code to read from a config file (which is NOT in the repository, although a sample config file might be). Instead of just outputting the line from the code, output the line as a default - and if there's a replacement line in the config file, use that instead. Once that's committed, close the branch. Now every customer has 1 thing, and the one customer has their customization. 499 branches to go. – Jake Nov 09 '15 at 18:56
  • 5
    @Jake you should have all your config files for the different customers somewhere under version control, though. Doesn't have to be the same repository, and shouldn't be 500 branches in one, certainly. – Paŭlo Ebermann Nov 09 '15 at 20:51
  • 1
    True - always good to have source control for the configs themselves, I just meant the main application repo should probably be client agnostic. As for the last line, I didn't quite catch what you meant - if you're saying that you don't necessarily need to go down to just 1 branch, you're right there as well! But the branches should have purposes - staging, experimental, test, production - and not be one branch per customer. – Jake Nov 09 '15 at 22:09
  • @Jake You'll probably need to _ping_ Paulo in order to reply to him. Use the "@" symbol. – Lightness Races in Orbit Nov 09 '15 at 22:29
  • 7
    @FernandoTan It's critical to make sure the abstractions you make actually hold. I've been working with a system that adapts to thousands of different customers using configs for a few years now, and it's an absolute nightmare. Understand the reasoning for the differences, find out what they have in common, and slowly move the differences to settings - but only if it makes sense, and if it's something you can reasonably generalize. It's better to have 10 branches (ideally as modules) that make sense than force yourself into a single branch with thousands of interacting settings. – Luaan Nov 10 '15 at 09:11
  • 8
    @FernandoTan You really need to avoid the fun "a function with 20 different `if`s based on configs", otherwise you've just moved your original VCS problem to a config problem - in the end, you could even make it worse. Find pieces that can be handled with customer-specific modules. Find pieces that make sense as configs. Make sure the configs do not interact too much with each other - it's already quite bad to have 1000 different settings, but if many of those interact with others, it can get as bad as if you had 100k's of settings. Complex interactions work well in modules, not configs. – Luaan Nov 10 '15 at 09:14
  • 5
    I'm going to disagree in one substantial way: Configuration doesn't have to come from a data file, as this answer suggests. It can be performed by code callbacks (think `Form.Load` event in C#). The important thing is that the customizations are (1) stored separately from the main logic, and (2) interact with a fairly stable public interface of the main logic. The rest of the guidance is perfect. – Ben Voigt Nov 10 '15 at 14:52
  • 2
    @BenVoigt: Well I did say "for example"... – Lightness Races in Orbit Nov 10 '15 at 17:43
  • 1
    I just came across [this article on Future Proofing from Uncle Bob Martin](http://blog.8thlight.com/uncle-bob/2015/10/30/futureproof.html), which seems appropriate. – AShelly Nov 12 '15 at 21:33
  • @AShelly: _"Cordless garden hoses are impossible"_ pfft just you wait I'll show you – Lightness Races in Orbit Nov 12 '15 at 21:57
  • @AShelly: Yeah I agree that's certainly relevant here. A developer who doesn't somehow innately comprehend the benefit of due diligence in designing reusable interfaces — even _internal_ to my code — is frankly of little use to me. And I have no idea how to explain that, so perhaps it's best just to link everyone to that article. :) – Lightness Races in Orbit Nov 12 '15 at 21:59
95

Having 500 clients is a nice problem to have. If you had spent the time up front to avoid this problem with branches, you might never have stayed in business long enough to get any clients.

Firstly, I hope you charge your clients enough to cover ALL the costs of maintaining their custom versions. I am assuming that clients expect to get new versions without having to pay for their customizations to be done again. I would start by finding all the files that are the same in 95% of your branches. That 95% is the stable part of your application.

Then, find all the files that only have a few lines different between the branches – try to introduce a configuration system such that these differences can be removed. So, for example, rather than having 100s of files with textfield labels that are different, you have 1 config file that can override any text label. (This does not have to be done in one go, just make a text field label configurable the first time a client wants to change it.)

Then move onto the harder issues using the Strategy pattern, dependency injection etc.

Consider storing json in the database rather than adding columns for client’s own fields – this may work for you if you don’t need to search these fields with SQL.
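A rough sketch of the JSON idea, assuming a hypothetical `custom_fields` text column and made-up field names; the row is a plain array here so the example runs without a database:

```php
<?php
// Hypothetical row as it might come back from the database: fixed columns
// plus one text column ("custom_fields") holding client-specific data as JSON.
$row = [
    'id'            => 42,
    'name'          => 'Acme Ltd',
    'custom_fields' => '{"loyalty_tier":"gold","fax":"555-0100"}',
];

/** Decode the JSON column; a missing or empty column means no custom fields. */
function readCustomFields(array $row): array
{
    $json = $row['custom_fields'] ?? '';
    if ($json === '') {
        return [];
    }
    $fields = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    return is_array($fields) ? $fields : [];
}

/** Return a copy of the row with one custom field set and re-encoded. */
function withCustomField(array $row, string $field, $value): array
{
    $fields = readCustomFields($row);
    $fields[$field] = $value;
    $row['custom_fields'] = json_encode($fields, JSON_THROW_ON_ERROR);
    return $row;
}

$updated = withCustomField($row, 'vat_id', 'X1');
```

A client that wants an extra field then needs no schema change and no branch. The trade-off named above still applies: these values are awkward to filter on in SQL.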

Every time you check a file into a branch, you MUST diff it with main and justify every single change, including white space. Lots of changes will not be needed and can be removed before the checkin. This may just be down to one developer having different settings in their editor for how code is formatted.

You are aiming to first go from 500 branches with lots of files that are different, to most branches only having a few files that are different, while still making enough money to live.

You may still have 500 branches in many years time, but if they are a lot easier to manage, then you have won.


Based on the comment by br3w5:

  • You could take each class that is different between clients
  • Make a “xxx_baseclass” that defines all the methods that are called in the class from outside of it
  • Rename the class so that xxx is called xxx_clientName (as sub class of xxx_baseclass)
  • Use dependency injection so that the correct version of the class is used for each client
  • And now for the clever insight br3w5 came up with! Use a static code analysis tool to find the now duplicated code, and move it into the base class etc
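The steps above might look roughly like this in PHP. All class names, the client key `acme` and the formatting rule are invented for illustration, and the factory function is a stand-in for whatever dependency injection container the application actually uses:

```php
<?php
// Shared behaviour lives in the base class ("xxx_baseclass" in the steps above).
abstract class InvoiceFormatterBase
{
    public function format(float $amount): string
    {
        return $this->currencyPrefix() . number_format($amount, 2, '.', ',');
    }

    // The only part that differs between clients in this sketch.
    abstract protected function currencyPrefix(): string;
}

class InvoiceFormatter_Default extends InvoiceFormatterBase
{
    protected function currencyPrefix(): string
    {
        return '$';
    }
}

// Client-specific subclass ("xxx_clientName").
class InvoiceFormatter_Acme extends InvoiceFormatterBase
{
    protected function currencyPrefix(): string
    {
        return 'EUR ';
    }
}

// Hand-rolled wiring: pick the implementation for a client, else the default.
function makeInvoiceFormatter(string $client): InvoiceFormatterBase
{
    $map   = ['acme' => InvoiceFormatter_Acme::class];
    $class = $map[$client] ?? InvoiceFormatter_Default::class;
    return new $class();
}

// Calling code depends only on the base type, never on a client name.
class InvoicePrinter
{
    private InvoiceFormatterBase $formatter;

    public function __construct(InvoiceFormatterBase $formatter)
    {
        $this->formatter = $formatter;
    }

    public function totalLine(float $amount): string
    {
        return 'Total: ' . $this->formatter->format($amount);
    }
}

$printer = new InvoicePrinter(makeInvoiceFormatter('acme'));
```

Once several client subclasses exist side by side in one branch, duplicated method bodies become visible to a static analysis tool and can be pulled up into the base class.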

Only do the above after you have made the easy gains, and trial it with a few classes first.

user310579
Ian
  • 28
    +1 for attempting to provide an approach for the actual problem – Ian Nov 10 '15 at 11:38
  • 38
    I was really worried that you were congratulating yourself on your answer, until I realized you weren't the same @Ian that wrote the answer. – Theron Luhn Nov 11 '15 at 17:48
  • 2
    Maybe they should use a static code analysis tool to narrow down what parts of the code are duplicated (after identifying all the files that are the same) – br3w5 Nov 11 '15 at 19:07
  • 1
    Also creating versioned packages to help the team track which client has which version of the code – br3w5 Nov 11 '15 at 19:08
  • @br3w5, great idea, see edit – Ian Nov 12 '15 at 10:26
  • 1
    It sounds like a long winded way of saying "just refactor your code" – Roland Tepp Nov 12 '15 at 21:37
  • 1
    @RolandTepp, Saying "just refactor your code" does not help, as it may take years and without a careful plan would not give any benefit until it is completed. The refactored code is then likely to be in yet another branch! – Ian Nov 24 '15 at 13:02
40

In the future, ask the Joel test questions in your interview. You'd be more likely not to walk into a trainwreck.


This is an, ah, how shall we say... really, really bad problem to have. The "interest rate" on this technical debt is going to be very, very high. It might not be recoverable...

How integrated with the "core" are these custom changes? Can you make them their own library and have a single "core" and each specific customer having its own "add-on?"

Or are these all very minor configurations?

I think the solution is a combination of:

  • Changing all the hardcoded changes into configuration based items. In this case everyone has the same core application, but users (or you) turn on/off functionality, set namings, etc. as needed
  • Moving "client specific" functionality/modules to separate projects, so instead of having one "project" you have one "core project" with modules you can add/remove easily. Alternatively you can make these configuration options too.

Neither will be trivial: if you ended up here with 500+ clients, you likely made no real distinction between core and client-specific code. I expect separating them is going to be a very time-consuming task.

I also suspect you are going to have significant problems in easily separating out and categorizing all your client-specific code.

If most of your changes are specifically wording differences, I suggest reading questions like this about language localization. Whether you are doing multiple languages entirely or just a subset, the solution is the same. This is specifically PHP and localization.

enderland
  • 1
    Also, since this will be a huge task (to say the least), it will be a significant challenge to even convince your managers to throw great amounts of time and money at this problem. @FernandoTan There may be questions + answers on this site which can help with this specific issue. – Radu Murzea Nov 11 '15 at 21:17
  • 10
    Which question of the joel test would have told you that the company is abusing branches? – SpaceTrucker Nov 12 '15 at 08:01
  • 4
    @SpaceTrucker: Well, "Do you make daily builds?" might have helped. With 500 branches, they probably did not have them, or might have mentioned that they only do it for some branches. – sleske Nov 13 '15 at 12:23
17

This is one of the worst anti-patterns you can hit with any VCS.

The correct approach here is to turn the custom code into something driven by configuration, and then each customer can have their own configuration, either hardcoded in a config file, or in a database or some other location. You can enable or disable entire features, customize how responses look, and so on.
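One possible shape for this, sketched with invented feature names and a plain array standing in for the config file or database: shared defaults are overlaid with whatever a customer has changed, and the code asks the merged config whether a feature is on.

```php
<?php
// Shared defaults shipped with the one master branch (illustrative values).
$defaults = [
    'features' => ['export_pdf' => false, 'audit_log' => true],
    'branding' => ['title' => 'Our Product'],
];

/** Overlay a customer's settings on the defaults, key by key. */
function loadCustomerConfig(array $defaults, array $customer): array
{
    return array_replace_recursive($defaults, $customer);
}

/** Unknown features are treated as disabled. */
function featureEnabled(array $config, string $feature): bool
{
    return (bool) ($config['features'][$feature] ?? false);
}

// A customer file or database row only needs to contain what differs.
$acme = loadCustomerConfig($defaults, ['features' => ['export_pdf' => true]]);
```

Here `featureEnabled($acme, 'export_pdf')` is true while everything else stays at its default, so adding a customer means adding a small config entry instead of a branch.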

This allows you to keep one master branch with your production code.

Daenyth
  • 3
    If you do this, do yourself a favor and try to use the [Strategy pattern](https://en.wikipedia.org/wiki/Strategy_pattern) as much as possible. This will make it much easier to maintain your code than if you simply slather on `if(getFeature(FEATURE_X).isEnabled())` throughout. – TMN Nov 09 '15 at 17:17
13

The purpose of branches is to explore one possible avenue of development without risking the stability of the main branch. They should eventually be merged back at a suitable time, or be discarded if they lead to a dead end. What you have are not so much branches as 500 forks of the same project, and trying to apply the vital changesets to all of them is a Sisyphean task.

What you should do instead is have your core code live in its own repository, with the necessary entry points to modify behavior through configuration and to inject behavior as allowed by inverted dependencies.

The different setups you have for clients can then either differ from each other merely by some externally configured state (e.g. a database) or, if necessary, live as separate repositories, which add the core as a submodule.
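To make "entry points" concrete, here is a small sketch of a hook registry. The `Hooks` class, the event name and the order example are all hypothetical; a real core would expose whichever extension points its clients actually need.

```php
<?php
/** Core-side registry of extension points (hypothetical). */
final class Hooks
{
    /** @var array<string, callable[]> */
    private array $listeners = [];

    public function on(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    /** Pass $value through every listener registered for $event, in order. */
    public function filter(string $event, $value)
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $value = $listener($value);
        }
        return $value;
    }
}

// Core code: knows nothing about any particular client.
function renderOrderSummary(Hooks $hooks, array $order): string
{
    $lines = ['Order #' . $order['id']];
    $lines = $hooks->filter('order.summary.lines', $lines);
    return implode("\n", $lines);
}

// Client module, living in its own repository: injects behaviour via the hook.
$hooks = new Hooks();
$hooks->on('order.summary.lines', function (array $lines): array {
    $lines[] = 'Delivered by Acme Logistics';
    return $lines;
});

$summary = renderOrderSummary($hooks, ['id' => 7]);
```

The dependency points from the client module to the core and never the other way round, which is what lets the core be updated once for everyone.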

back2dos
7

All the important things have been proposed by good answers here. I'd like to add my five pence as a process suggestion.

I'd like to suggest that you solve this problem over the mid to long term and adapt your policy on how you develop code. Try to become a flexible, learning team. If somebody allowed 500 repos to exist instead of making the software configurable, then it is time to ask yourselves how you have worked so far and how you will work from now on.

Which means:

  1. Clarify the change management responsibilities: if a customer needs some adaptations, who sells them, who approves them and who decides how the code will be changed? Where are the screws to turn if some things must be changed?
  2. Clarify the role, who in your team is allowed to make new repos, and who is not.
  3. Try to make sure that everybody in your team sees the necessity of patterns that allow flexibility to software.
  4. Clarify your management tool: how do you quickly find out which customer has which code adaptations? I know, a "list of 500" sounds annoying, but there is some "emotional economy" here, if you want. If you can't quickly tell what a customer's changes are, you will feel even more lost and drained than if you had to start a list. Then use that list to group the features the way other people's answers here have shown you:
    • group customers by minor/major changes
    • group by subject related changes
    • group by changes easy to merge and changes difficult to merge
    • find groups of equal changes made to several repos (oh yes, there will be some).
    • maybe the most important in order to talk to your manager/investor: group by expensive changes and cheap changes.

This is in no way meant to make a bad pressure atmosphere in your team. I rather suggest that you clarify these points first for yourself and, wherever you feel the support, organize this together with your team. Invite people friendly to the table in order to improve all your experience.

Then, try to establish a long-term time window in which you cook this thing on a small flame. Suggestion: try to merge at least two repos every week, and so remove at least one. You may find that you can often merge more than two branches as you gain routine and an overview. That way, in one year you can deal with the worst (most expensive?) branches, and in two years you can reduce this problem enough to have clearly better software. But don't expect more: in the end nobody will "have time" for this, but you are the one who won't allow this any longer, as you are the software architect.

This is how I would try to handle it if I were in your position. However I don't know how your team is going to accept such things, how the software really allows this, how you are supported and also what you still have to learn. You are the software architect - just go for it :-)

peter_the_oak
  • 2
    Good points about addressing the social/organizational issues lurking behind the technical problems. This is too often overlooked. – sleske Nov 13 '15 at 12:24
5

Contrasting all the nay-sayers, let's assume real business need.

(for example, the deliverable is source code, the customers are in the same line of business and are therefore competitors to each other, and your business model promises to keep their secrets secret)

Furthermore, let's assume that your company has the tools to maintain all the branches, that is either manpower (let's say 100 developers dedicated to merging, assuming 5-day release delay; or 10 devs assuming 50-day release lag is OK), or such awesome automated testing that automated merges are truly tested both to core spec and extension spec in every branch, and thus only changes that don't merge "cleanly" require human intervention. If your customers pay not only for customisations but for maintenance thereof, this may be a valid business model.

My (and the nay-sayers') question is: do you have a dedicated person responsible for delivery to each customer? If you are, say, a 10,000-person company, it may be the case.

This could be handled by plugin architecture in some cases, let's say your core is trunk, plugins could be held in trunk or branches, and configuration for each customer is either a uniquely named file or is held in customer branch.

Plugins could be loaded at run time, or built in at compile time.

Many projects really are done like this, but fundamentally the same problem still applies: simple core changes are trivial to integrate, while conflicting changes must either be rolled back or require changes to many plugins.

There are cases when plugins are not good enough, that's when so many internals of the core must be tweaked that plugin interface count becomes too large to handle.

Ideally this would be handled by aspect-oriented programming, where trunk is core code, and branches are aspects (that is extra code and instructions how to connect extras to core)

A simple example: you can specify that custom foo is run before or after core klass.foo, or that it replaces it, or that it wraps it and can change the input or output.
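The before/after/wrap idea can be imitated with plain closures. This is only an illustration of the concept and does not use any real AOP library; the helper names `adviseBefore`, `adviseAfter` and `adviseAround` and the stand-in `$coreFoo` are made up:

```php
<?php
// Core behaviour (a stand-in for "klass.foo").
$coreFoo = function (int $x): int {
    return $x * 2;
};

/** Run $advice with the arguments, then call the core unchanged. */
function adviseBefore(callable $core, callable $advice): callable
{
    return function (...$args) use ($core, $advice) {
        $advice(...$args);
        return $core(...$args);
    };
}

/** Call the core, then show the result to $advice. */
function adviseAfter(callable $core, callable $advice): callable
{
    return function (...$args) use ($core, $advice) {
        $result = $core(...$args);
        $advice($result);
        return $result;
    };
}

/** Let $advice decide how (or whether) to call the core. */
function adviseAround(callable $core, callable $advice): callable
{
    return function (...$args) use ($core, $advice) {
        return $advice($core, ...$args);
    };
}

// "Branch" side: customer-specific extras attached to the core function.
$log = [];
$foo = adviseBefore($coreFoo, function (int $x) use (&$log) {
    $log[] = "before $x";
});
$foo = adviseAfter($foo, function (int $result) use (&$log) {
    $log[] = "after $result";
});
$foo = adviseAround($foo, function (callable $next, int $x) {
    return $next($x + 1) + 100; // changes both the input and the output
});

$value = $foo(5);
```

The core function is never edited; each customer's extras are layered on from outside. If the core changes the signature of `foo`, though, every piece of advice attached to it still has to be revisited by hand.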

There's a ton of libraries for that, however the problem of merge conflicts does not go away -- clean merges are handled by AOP and conflicts still need human intervention.

Finally, such a business truly has to concern itself with branch maintenance: namely, is customer-specific feature X so common that it's cheaper to move it to core, even though not all customers are paying for it?

3

You are not solving the root cause of the disease by looking at the symptom. Using a 'code management' approach is symptomatic, but will not solve things for you long term. The root cause is lack of 'well managed' product capabilities, features & their extensions and variations.

Your 'custom' code represents nothing but extensions of the product's features & capabilities in some places, and data field changes in others.

How extensive the custom features are, how different they are, and how contextually similar they are will play a large part in 'sanitizing' your product's code base.

More than how you code and version, this is a place where product management, product architecture and data architecture come into play. Seriously.

Because, at the end of the day, the code is nothing but your offering of business and product features/ services to your clients. That is what your company is getting paid for.

Getting a better handle on this must come from the 'capabilities' standpoint and not code standpoint.

You, your company, and product can not be everything to everyone. Now that you have a decent revenue base of 500 clients, it's time to productize on what you intend to be.

And if you are offering several things, it would make sense to modularize your product capabilities in an organized fashion.

How broad and deep will your products go? Or else this will lead to 'quality of service' issues & 'product dilution and fragmentation' as you go down the line.

Will you be a CRM or ERP or order processing/ dispatch or Microsoft Excel?

Your existing extensions need to roll up and harmonize, the way a large software major pulls in and merges products acquired from a startup.

You will need to have a strong product management and data architecture person map the following:

  • Master branch, its product capabilities, and features base
  • Custom extension features, types, and variations
  • Significance and variation of 'custom fields'

...to create a road map for the assimilation and harmonization of all these loose product threads / branches in the grand context of your core application.

PS: Connect with me, I know a person who can help you fix this :)

Alex S
-5

I can relate to this. I have taken on many such projects. In fact, 90% of our development work is fixing such things. Not everyone is perfect, so I suggest you start using version control in the correct way from where you are now. If possible, you can do the following.

  • From now on, when a customer asks for an update, move them into a new forked repository.
  • If you want to merge them to master do it as the first thing and resolve conflicts.
  • Then manage their issues and sprints with their repository and keep those in master that you want to launch in master. This might put more strain on release cycles, but that will save you over time.
  • Maintain a master branch of the main repository for new customers and the main repository should only have those branches you are working on for future stuff. Legacy branches can then be deleted once they are migrated to customer repositories.

I have personally imported a repository from GitHub with 40 branches to Bitbucket and created 40 repositories. It took only four hours. This was WordPress theme variations so push and pull was quick.

There are many reasons for "not doing it right first time", and I think those who accept them quickly and move on to "do it right this time" will always be successful.

Farrukh Subhani
  • 16
    How would multiple repositories make maintenance any easier? – Evan Davis Nov 10 '15 at 03:21
  • In some cases like ours, the customers need to have access to each repo and manage their own issues once it becomes a customised solution, so they have their own repo, which makes it easier to manage. As I said, these are WordPress theme variations, so it worked well. It may not work in many cases. – Farrukh Subhani Nov 14 '15 at 16:27