45

I'm working on a very large research-led open-source project with a bunch of other regular contributors. Because the project is now quite big, a consortium (composed of two full-time employees and a few members) is in charge of maintaining the project, the continuous integration (CI), etc. They just don't have time to integrate external contributions, though.

The project is composed of a "core" framework of about half a million lines of code, a bunch of "plugins" that are maintained by the consortium, and several external plugins, most of which we aren't even aware of.

Currently, our CI builds the core and the maintained plugins.

One of the big issues we face is that most contributors (and especially the occasional ones) aren't building 90% of the maintained plugins, so when they propose refactoring changes in the core (which these days happens on a quite regular basis), they check that the code compiles on their machine before making a pull request on GitHub.

The code works, they're happy, and then the CI finishes building and the problems start: compilation fails in a consortium-maintained plugin that the contributor did not build on his/her machine.

That plugin might have dependencies on third-party libraries, such as CUDA for instance, and the contributor does not want to, does not know how to, or for hardware reasons simply can't compile that broken plugin.

So then one of three things happens:

  • The PR stays forever in the limbo of never-to-be-merged PRs.
  • The contributor greps for the renamed variable in the source of the broken plugin, changes the code, pushes to his/her branch, waits for the CI to finish compiling, usually gets more errors, and repeats the process until the CI is happy.
  • One of the two already-overbooked permanent members of the consortium gives a hand and tries to fix the PR on their machine.

None of those options are viable, but we just don't know how to do it differently. Have you ever been confronted with a similar situation in your projects? If so, how did you handle it? Is there a solution I'm not seeing here?

Peter Mortensen
lagarkane
  • 85
    The topmost rule of providing a plugin API to a system is that **it is kept stable**, or at least backwards compatible. Changes to the core without intentional changes to the plugin API should never break the compilation of any plugin (they may break functionality by accident, but not the compilation). If a simple change of a variable name *inside the core* can lead to a broken compilation of *a plugin*, the separation between plugins and core seems to be completely broken. – Doc Brown Aug 08 '19 at 10:51
  • [Martin Fowler's "Public versus Published Interfaces"](https://martinfowler.com/ieeeSoftware/published.pdf) might be a useful read. – Schwern Aug 08 '19 at 18:23
  • @DocBrown Please inform the Visual Studio team. – Kevin Krumwiede Aug 09 '19 at 04:23
  • 1
    @KevinKrumwiede: I am sure they know this already ;-) If you experienced incompatibilities, I am pretty sure they changed the API intentionally. – Doc Brown Aug 09 '19 at 04:56
  • 3
    I would rephrase the question, since it is really misleading. Something like _How can I manage PRs when they break our current CI?_ captures your situation better, I think. – bracco23 Aug 09 '19 at 13:40
  • 2
    How difficult/complex is your build/test process? It should just be a matter of running a single command, or clicking a single button. At that point, it becomes reasonable to expect users to run all tests for themselves before submitting a PR. – Alexander Aug 09 '19 at 17:47
  • Is this an open source project? You'll get better answers if you post a link to the core API. – Navin Aug 09 '19 at 20:49
  • The alternative is to merge their changes without any CI, and then everyone's stuff breaks. That's clearly worse. – OrangeDog Aug 10 '19 at 10:28
  • It took me a while to get that the OP doesn't mean _CV_-driven development... – keuleJ Aug 10 '19 at 17:08

9 Answers

68

CI-driven development is fine! This is a lot better than not running tests and including broken code! However, there are a couple of things to make this easier on everyone involved:

  • Set expectations: Have contribution documentation that explains that CI often finds additional issues, and that these will have to be fixed before a merge. Perhaps explain that smallish, local changes are more likely to work well – so splitting a large change into multiple PRs can be sensible.

  • Encourage local testing: Make it easy to set up a test environment for your system. A script that verifies that all dependencies have been installed? A Docker container that's ready to go? A virtual machine image? Does your test runner have mechanisms that allow more important tests to be prioritized?

  • Explain how to use CI for themselves: Part of the frustration is that this feedback only comes after submitting a PR. If the contributors set up CI for their own repositories, they'll get earlier feedback – and produce fewer CI notifications for other people.

  • Resolve all PRs, either way: If something cannot be merged because it is broken, and if there's no progress towards getting the problems fixed, just close it. These abandoned open PRs just clutter up everything, and any feedback is better than just ignoring the issue. It is possible to phrase this very nicely, and make it clear that of course you'd be happy to merge when the problems are fixed. (see also: The Art of Closing by Jessie Frazelle, Best Practices for Maintainers: Learning to say no)

    Also consider making these abandoned PRs discoverable so that someone else can pick them up. This may even be a good task for new contributors, if the remaining issues are more mechanical and don't need deep familiarity with the system.

For the long-term perspective, the fact that changes so often seem to break unrelated functionality could mean that your current design is a bit problematic. For example, do the plugin interfaces properly encapsulate the internals of your core? C++ makes it easy to accidentally leak implementation details, but it also makes it possible to create strong abstractions that are very difficult to misuse. You can't change this overnight, but you can shepherd the long-term evolution of the software towards a less fragile architecture.
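As an illustration (a minimal sketch with made-up names, not the project's actual API), one way to get that kind of encapsulation in C++ is to let plugins see nothing but an abstract interface, so that renames and refactorings behind it cannot even be observed by a plugin, let alone break its compilation:

```cpp
// plugin_api.h -- hypothetically, the only header a plugin may include.
// Note that it mentions no core implementation types at all.
#pragma once
#include <string>

namespace core_api {

// Abstract view of the core that plugins program against.
class Context {
public:
    virtual ~Context() = default;
    virtual std::string parameter(const std::string& key) const = 0;
    virtual void log(const std::string& message) = 0;
};

// Base class every plugin implements; the core only ever sees this type.
class Plugin {
public:
    virtual ~Plugin() = default;
    virtual void initialize(Context& context) = 0;
    virtual void run() = 0;
};

} // namespace core_api
```

With a boundary like this, a contributor can rename or restructure anything inside the core without touching plugin builds, because plugins never include core headers.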

amon
34

Building a sustainable plugin model requires that your core framework expose a stable interface that plugins can rely on. The golden rule is that you can introduce new interfaces over time but you can never modify an already published interface. If you follow this rule, you can refactor the implementation of the core framework all you want without fear of accidentally breaking plugins, whether it is a consortium-maintained one or an external one.

From what you described, it sounds like you don't have a well-defined interface, and that makes it difficult to tell if a change will break plugins. Work towards defining this interface and making it explicit in your codebase, so that contributors will know what they should not modify.
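As a minimal sketch of that golden rule (hypothetical names, not taken from the project): once an interface is published it stays frozen, and new capabilities arrive as new interfaces alongside it.

```cpp
#pragma once
#include <cstdint>

namespace plugin_api {

// Published in release 1.0 -- frozen forever after.
class IRendererV1 {
public:
    virtual ~IRendererV1() = default;
    virtual void render() = 0;
};

// Added in release 2.0. Plugins written against IRendererV1 keep compiling
// and working unchanged; new plugins can opt in to the richer interface.
class IRendererV2 : public IRendererV1 {
public:
    virtual void renderRegion(std::int32_t x, std::int32_t y,
                              std::int32_t width, std::int32_t height) = 0;
};

} // namespace plugin_api
```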

casablanca
  • Yes, hopefully that will only be a temporary issue (until the API for the core is defined well enough)... But 12 years of Ph.D. students, interns, and research development does not usually lead to well-defined interfaces, sadly... – lagarkane Aug 08 '19 at 09:16
  • 20
    CI should have automated tests. If you want to ensure plugins have the same interface, every plugin should contribute tests that express the interface they need. Come at it this way, and when the interface changes, which it will, you'll know which plugins you're breaking. Give me these tests to run locally and I'll know what I'm breaking before I issue the PR. – candied_orange Aug 08 '19 at 10:18
  • 1
    @lagarkane Well-definedness is more a policy issue than a technical one. There is software out there that, like yours, simply abandons previous behaviour in an upgrade: Perl 5 is not compatible with Perl 6, Python 2.7 is not fully compatible with Python 3.4, etc. Then there is software that, whatever happens, still supports old code. You can still run almost all JavaScript code written for Netscape Navigator 4 in modern browsers. The Tcl programming language is backwards compatible way back to the original version, etc... – slebetman Aug 08 '19 at 23:47
  • ... the key is never modifying old APIs (fixing bugs is OK, but they must maintain published behavior) and only adding new APIs. – slebetman Aug 08 '19 at 23:48
  • 3
    @lagarkane: Forming the consortium was a step in the right direction, and if the core members focus their energy on carving out these interfaces, then you can harness the power of future PhDs and interns to keep your project going strong while minimizing breakages. :) – casablanca Aug 09 '19 at 03:01
  • Taking a high-level view, though: Apple is, simply, the most successful thing of any type in human civilization (well, the largest anyway, by value), and their most basic, fundamental mantra is "break the past, don't support anything" (in contrast to the Windows way of supporting the past). – Fattie Aug 09 '19 at 14:45
  • 4
    @Fattie: That works for Apple because they build successful consumer-facing products and developers are forced to play along if they want to be part of it. It's unlikely that those developers actually like breaking changes, and it's definitely not a good model for an open-source project. – casablanca Aug 10 '19 at 04:13
  • @casablanca - I disagree. You can choose the "Windows model" or the "Apple model". Truth and philosophy does not change with scale. – Fattie Aug 10 '19 at 16:07
  • @casablanca - BTW thanks for the thoughtful comment. Everything else on the page is drivel :) :) – Fattie Aug 10 '19 at 16:08
  • @Fattie: I'd argue that you need a successful product before you can dictate the rules. Windows (on the desktop) and iPhone are both examples of such products. Windows Phone was an example of how repeated API changes can kill an up-and-coming product. – casablanca Aug 10 '19 at 20:13
  • 1
    @casablanca Both the macOS and Windows lineages are hugely successful. (Arguably, the two greatest products in sheer dollar terms in human existence.) Over the decades they have taken absolutely opposite approaches. Apparently both were successful! – Fattie Aug 10 '19 at 20:15
8

To be honest, I don't think you can handle this in a better way: if changes break maintained parts of your project, the CI should fail.

Does your project have a contributing.md or something similar to help new and occasional contributors prepare their contributions? Do you have a clear list of which plugins are part of the core and need to stay compatible?

If it is hard to build everything on one machine due to dependencies etc., you could think about creating ready-to-use Docker images as build environments for your contributors to use.

mhr
  • 1
    Thanks for the reply! Yes, we do have contributing guidelines publicly available, but they don't list the plugins as you suggest, which would already be a good idea. Making Docker images sounds like a great improvement to the current contributing process already! Thanks for the input – lagarkane Aug 08 '19 at 09:14
8

so when they propose refactoring changes in the core (which these days happens on a quite regular basis), they check that the code compiles on their machine before making a pull request on GitHub.

So I think this is where the loose style of open source projects can fall down; most centrally-organised projects are wary of core refactoring, especially when it crosses an API boundary. If they do refactor an API boundary, it's usually a "big bang" where all the changes are scheduled at once with an increment to the API major version, and the old API is maintained.

I would propose a rule "all API changes must be planned in advance": if a PR comes in that makes a backward incompatible change to the API, from someone who has not been in contact with the maintainers to agree their approach in advance, it simply gets closed and the submitter pointed at the rule.

You will also need explicit versioning of the plugin API. This allows you to develop v2 while all the v1 plugins continue to build and work.
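One lightweight way to express such versioning in C++ (a sketch with invented names; whether it fits this project's build setup is a separate question) is inline namespaces, so that existing plugins keep resolving against v1 while v2 is developed next to it:

```cpp
#pragma once

namespace plugin_api {

inline namespace v1 {            // what existing plugins resolve to today
    struct Event { int id; };
    void dispatch(const Event& event);
}

namespace v2 {                   // under development, opt-in only
    struct Event { int id; const char* payload; };
    void dispatch(const Event& event);
}

} // namespace plugin_api

// A v1 plugin keeps calling plugin_api::dispatch(...) and is unaffected;
// a v2 plugin explicitly calls plugin_api::v2::dispatch(...).
```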

I would also question a bit more why so many core refactorings and API changes are being made. Are they really necessary, or are people just imposing their personal taste on the project?

pjc50
2

Sounds like the CI process needs to be tighter, more comprehensive, and more visible to contributors before they raise a PR. As an example, Bitbucket has a Pipelines feature that allows this: you give it a file that defines the CI build process in code, and if the build fails, the branch is prevented from being merged.

Regardless of the technology, providing automatic builds when a contributor pushes to a branch will give them much quicker feedback about what gotchas to look out for when making changes, and will lead to PRs that don't need fixing up after the fact.

Design issues would be good to fix, but are orthogonal to this problem.

Nathan Adams
2

The code works, they're happy, and then the CI finishes building and the problems start: compilation fails in a consortium-maintained plugin that the contributor did not build on his/her machine.

That plugin might have dependencies on third-party libraries, such as CUDA for instance, and the contributor does not want to, does not know how to, or for hardware reasons simply can't compile that broken plugin.

Your solution is simple: lower the barrier to contribution.

The simplest way to (1) speed up the edit-compile-test cycle and (2) smooth environment differences is to provide build servers:

  • Pick beefy machines: 24, 48 or 96 cores, 2 GB RAM/core, SSDs, to speed up compilation.
  • Ensure they have the right hardware: FPGAs, graphics cards, whatever is needed.
  • Create a Docker image with all the necessary software libraries pre-installed.

And then open those build servers to contributors. They should be able to remotely log in to a fresh Docker container and remotely edit-compile-test on that machine.

Then:

  • They have no excuse for not building/testing the maintained plugins: they have everything available.
  • They do not have to wait for lengthy feedback with CI-driven PRs: they have incremental compilation, and the ability to debug (rather than guess).

In general, build servers can be shared across multiple contributors; however, when special hardware peripherals are involved, a contributor may need exclusive use of said peripheral.


Source: I work on software using FPGAs; given the price of the beasts and the variety of models we need, you don't find every model of FPGA installed on each developer's machine.

Matthieu M.
1

If contributing to the core without changing any contract can break dependent software, it suggests that either:

  • The contracts of your interfaces may be ambiguous. Maybe adding attributes to your functions and function parameters would help expose additional constraints to client code and make the contracts clearer (see the sketch below). Or, if you're applying contract-breaking changes, maybe adopting semantic versioning can help.
  • The unit tests are not covering enough of the possible call scenarios.

Either problem should be easy to solve, but you mention the core team might not have the capacity to do so. One option would be to ask the community for help in addressing the issue.
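To illustrate the first bullet (a hypothetical sketch, not the project's actual API): standard C++ attributes and assertion-based precondition checks can turn parts of a contract that used to live only in documentation into something the compiler or test runs can flag.

```cpp
#pragma once
#include <cassert>
#include <cstddef>

namespace core_api {

// The returned handle must be used; silently dropping it is almost certainly a bug.
[[nodiscard]] int open_channel(std::size_t buffer_size) noexcept;

// Still compiles, but steers plugin authors away from the old entry point.
[[deprecated("use open_channel(std::size_t) instead")]]
int open_channel();

// Preconditions that previously lived only in the documentation.
inline void write_block(int channel, const char* data, std::size_t length) {
    assert(channel >= 0 && "channel must come from open_channel()");
    assert((data != nullptr || length == 0) && "null data only allowed when length == 0");
    // ... forward to the real implementation ...
}

} // namespace core_api
```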

jcayzac
1

No one else seems to have raised this as a potential solution.

  • List all the plugins that you can access.
  • Run all the tests that these plugins define.
  • Record all requests/responses/interactions between the core and each plugin.
  • Store those recordings; these are now rough compatibility tests.

When developing the core, encourage developers to run these compatibility tests (one possible recording setup is sketched below). If the tests fail, do not check in.

This will not ensure 100% compatibility, but it will catch a lot more issues, and catch them early.

A secondary benefit is that these recordings can highlight which interfaces, and which of their features, are actively used.
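A rough sketch of the recording idea (assuming, hypothetically, that plugins talk to the core through an abstract interface): wrap that interface in a decorator that logs every call, run each plugin's tests against the wrapper, and later diff the log against the stored recording.

```cpp
#include <ostream>
#include <string>

// Hypothetical core-facing interface that plugins call into.
class Context {
public:
    virtual ~Context() = default;
    virtual std::string query(const std::string& key) = 0;
};

// Decorator that records every interaction as one line of text.
class RecordingContext : public Context {
public:
    RecordingContext(Context& inner, std::ostream& log) : inner_(inner), log_(log) {}

    std::string query(const std::string& key) override {
        std::string result = inner_.query(key);
        log_ << "query(" << key << ") -> " << result << '\n';
        return result;
    }

private:
    Context& inner_;
    std::ostream& log_;
};

// A compatibility test then replays a plugin against RecordingContext and
// compares the produced log with the recording checked in earlier, e.g.:
//   assert(current_log == read_file("recordings/my_plugin.txt"));
```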

Kain0_0
0

I'm having trouble understanding the situation as described: the CI only builds one branch?

Is there a reason you can't build more than one branch with the CI?

The simplest solution to this problem would be to make it possible for any contributor to run the CI build on his/her feature branch.

Then you simply require a successful CI build on the feature branch in order for that branch's pull request to be accepted.

Kyralessa
  • This seems to sum up the issue. – Fattie Aug 09 '19 at 14:48
  • 1
    The Question says "Or the contributor [...] changes the code, pushes on his/her branch, waits for the CI to finish compiling, usually gets more errors, and reiterates the process until CI is happy" - so I think this is already the case, but the problem is that it's somewhat painful to develop with such a long edit-debug cycle. – npostavs Aug 09 '19 at 15:11
  • @npostavs Thanks, I guess that's what I missed the first time or two I read it. Even so... I guess I don't see the problem. There are a lot of dependencies, they can't be broken, so a contributor needs to stay compatible with all of them. That's the nature of big software. Certainly work could be done to make the build faster, perhaps, but otherwise, what shortcut could there be? – Kyralessa Aug 09 '19 at 19:39