39

Given a repository which contains two different applications A and B (e.g. bootloader and RTOS), is it OK to copy source code from A to B in order to avoid dependencies between them (includes, adding A's source files to the B compilation), so they stay completely independent both at build time and at runtime?

Note: In addition, let's suppose that the logic to be copied from A is private (that is, it's only meant to be used by certain internal functions in A).

Dan
  • 3
    It is fine with me if after copying, the true dependency has been removed, which is to say you have not substituted an explicit dependency for an implicit dependency, as that would be worse. – Erik Eidt Oct 05 '20 at 15:53
  • It's fine, but you can find a common way for that. – Ishan Shah Oct 06 '20 at 04:36
  • 5
    My rule of thumb: Copy and paste if you're reusing the code for the second time, refactor into a library/function/class/whatever if you need it for the third time. – cyco130 Oct 06 '20 at 05:25
  • Any code copying should be heavily automated and not a manual procedure – Thorbjørn Ravn Andersen Oct 06 '20 at 11:11
  • Heads up, sometimes this concept is called ["vendoring"](https://stackoverflow.com/q/26217488/1858327) :) – Captain Man Oct 06 '20 at 23:00
  • 2
    You're violating the DRY principle - Don't Repeat Yourself – vikingsteve Oct 07 '20 at 08:23
  • If the logic is only meant to be used by certain internal functions in `A`, why do you need to copy it in `B`? – jcaron Oct 07 '20 at 14:50
  • We did it. One of our applications was being rewritten to cloud. The old non-cloud version is still being maintained but not actively developed. We just copied the whole app to a new repository, including change history and started new implementation. There is nothing wrong about that if you have a good reason. – Sulthan Oct 07 '20 at 20:59
  • It sounds like you are asking if it is okay to fork code. Yes, of course it is okay to fork code. Before you do so, make sure you understand why you want to fork the code, and if you intend to ever merge them. The approach you are using would make merging them later difficult. But maybe that's okay. If you are starting a new project, and you have no intention of keeping compatible with the old project, and you aren't maintaining it and don't want changes to the new code to be backported - then yes you are forking the code. – Moby Disk Oct 08 '20 at 00:41
  • While making a separate library that would be a dependency of both applications seems like a lot of overhead, I think mid to long term you will get the benefits of it. It's very rare that only 1 such functionality will need to be duplicated. It's also very rare that it will be only copied once, because sooner or later somebody will need the functionality. Better start now... – Laurent S. Oct 08 '20 at 15:53

8 Answers

104

It is acceptable if the copied code can change independently from the original code.

If you are copying code and every future change has to be maintained in two different code bases, you would be better off creating a shared library. Then both applications have a dependency on the library, but not on each other.
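
The dependency shape described here can be sketched as a small check. This is only an illustration under assumed names: `app_a`, `app_b`, `common_lib`, and the helpers `reachable` and `are_independent` are hypothetical and not part of the question.

```python
from typing import Dict, Set

# Hypothetical dependency graph: each application depends only on the
# extracted library, never on the other application.
DEPENDS_ON: Dict[str, Set[str]] = {
    "app_a": {"common_lib"},
    "app_b": {"common_lib"},
    "common_lib": set(),
}


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return every module that `start` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def are_independent(graph: Dict[str, Set[str]], a: str, b: str) -> bool:
    """True if neither module can reach the other through its dependencies."""
    return b not in reachable(graph, a) and a not in reachable(graph, b)
```

A check like `are_independent(DEPENDS_ON, "app_a", "app_b")` could run in CI to catch the moment one application starts including files from the other.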

Rik D
  • 4
    Ideally (although real world complexities get in the way), the shared library could be generalized at least to the point where the two clients' needs can be met without needing to divergently specialize the copied code. – Alexander Oct 05 '20 at 13:43
  • If you don't want a shared library, you can also use symlinks to include the shared code into both projects. In this case you get most of the benefits of a single codebase, without any includes or references between the two codebases. – Falco Oct 06 '20 at 07:33
  • 6
    I think **shared** library is the wrong term as it refers to dynamically linked libraries (.so/.dylib/.dll), while it's also OK to have a static library. – ljrk Oct 06 '20 at 07:43
  • 2
    @ljrk Or even an included file. – user253751 Oct 06 '20 at 09:14
  • 8
    The proviso of "can change independently" isn't strong enough. I'd say this sort of copy/paste is only acceptable if 1) the copied code _will_ change independently and 2) updates to the original code _will not_ be incorporated into the copied version. Or in other words, only if any and all relationship between the copied code and the original is completely severed at the time of copying. – aroth Oct 06 '20 at 09:36
  • 8
    A shared library may introduce additional complexities and overhead, which may overshadow the usefulness of centralizing the code. Would you introduce a shared library to avoid duplicating 10-20 lines of code? – Thorbjørn Ravn Andersen Oct 06 '20 at 11:13
  • 2
    @aroth Taking that kind of thinking to the extreme causes a trivial file to become a shared library with a dozen different switches and configuration options. Also creating a shared library for every small feature has a non-trivial performance overhead as well both in terms of size and startup performance. – Voo Oct 06 '20 at 13:02
  • 1
    @ThorbjørnRavnAndersen It depends on the programming culture. In the Perl and Node.js world, yes. Plenty of CPAN and npm modules are just 5-10 lines of code long but people publish them and others use them anyway because they avoid mistakes and/or encapsulate tricky code in a tested implementation – slebetman Oct 06 '20 at 15:00
  • @slebetman And they introduce additional complexities and overhead which usually overshadow the usefulness of centralizing the code. – user253751 Oct 06 '20 at 15:04
  • @user253751 "Usually"? I can't get behind that. Everyone in the js world is using npm modules anyway, so there's not really much additional complexity to worry about here. – corsiKa Oct 06 '20 at 15:49
  • 3
    @corsiKa And luckily, the npm modules system is robust enough that an upstream maintainer can't introduce a bug, or withdraw a library, and the entire house of cards comes tumbling down. As anything else would be additional complexities, which obviously aren't introduced by this style of inclusion of code. – Yakk Oct 06 '20 at 17:34
  • 4
    @Yakk, I thought that very exact thing happened a couple years ago with the left-pad library that vanished from NPM without any notice, breaking a ton of widely used libraries that depended on it, because its creator had a disagreement with the NPM folks. – A. R. Oct 06 '20 at 19:32
  • 2
    @AndrewRay Obviously not, because huge graphs of micro-dependencies don't cause additional complexities. It must never have happened, nor could less popularly known related instances (like upstream code injecting exploits) happen. – Yakk Oct 06 '20 at 19:35
  • @Yakk I would like to strongly recommend the article [Surviving Software Dependencies](https://dl.acm.org/doi/pdf/10.1145/3347446) by Russ Cox from the August 2019 edition of _Communications of the ACM_, which they have kindly made open-access. – A. R. Oct 06 '20 at 19:40
  • 1
    @AndrewRay Yes, it happened.. and then.. nothing happened. People moved on and kept doing the same thing. Because the usefulness of using tested, trusted code outstrips any small internet drama. California and Japan don't get abandoned just because there are occasional earthquakes. Meanwhile in the C/C++ world projects still grind slowly. It's worth noting that node.js did not start this culture. It started mostly from the crypto world back in the 90s (maybe earlier but I wasn't around) where everyone would tell you to never write your own crypto library. – slebetman Oct 07 '20 at 00:34
  • 1
    ... Then the Perl community wholeheartedly embraced it with CPAN where there are libraries with sometimes just a single regexp expression being used widely. CPAN has been around for more than 20 years now and still no drama – slebetman Oct 07 '20 at 00:36
  • 1
    @slebetman If something introduces a critical vulnerability or severe bug into an OS or a bootloader, that's a big problem that will last a long time. There's more reasons than just 'language culture' for avoiding micro libraries - those tradeoffs aren't good ones in some scenarios. – Iron Gremlin Oct 07 '20 at 00:52
  • @slebetman The crypto guys would be the first people to break out into laughter when being told "hey you said we shouldn't write a leftstrip function ourselves so now we use a million packages for the most trivial things that we never review or even know about, that come from dubious sources which we don't care about and that have been used to insert malware into popular projects". – Voo Oct 07 '20 at 07:40
  • 1
    @slebetman But now your "build-from-scratch"-pipeline depends on CPAN being up and available. Unless you build an internal cache or mirror. That may be considered overkill to avoid copying a file or two. – Thorbjørn Ravn Andersen Oct 07 '20 at 08:10
  • 2
    @ThorbjørnRavnAndersen "Would you introduce a shared library to avoid duplicating 10-20 lines of code?" - absolutely, and that should never be a question. Not liking "micro-packages" or many small dependencies isn't a valid reason to avoid code reuse, nor is the possibility of third-party dependency hosts being unavailable (the latter is an easily solvable devops issue using an internal mirror, and is mostly irrelevant in an always-connected cloud world anyway). – Ian Kemp Oct 07 '20 at 12:17
  • @Voo Quality of third-party micro-packages a la NPM is entirely orthogonal to this discussion, since we're talking about something in-house. – Ian Kemp Oct 07 '20 at 12:17
  • @IanKemp Other reasons to avoid micropackages? Build performance (node build performance is simply awful), debugging complexities (now if you have a compiled language you have to worry about symbols and source servers, debug vs. release problems), shared code makes changes/new features much harder to implement (changing the API or behavior of a library that's used by other projects not under your control? yay), versioning challenges, oh and obviously the runtime hit both in startup performance and memory consumption when loading dozens of additional assemblies. – Voo Oct 07 '20 at 13:06
  • @Voo If the code being copy/pasted will be maintained/modified in its original location _and_ those changes are required/desired in the destination project then it's not "trivial" code, imo. Trivial code can't/shouldn't create a logical coupling between the two copies. Nontrivial code should be shared via a more robust mechanism than copy/paste. – aroth Oct 07 '20 at 14:22
  • 1
    @ThorbjørnRavnAndersen "Would you introduce a shared library to avoid duplicating 10-20 lines of code?" - Yes, if there's any chance that, after copying 10-20 lines of code from Project A to Project B, some update/maintenance to A required applying a corresponding patch to those lines in B, absolutely. Whether or not something should be modularized is less a question of SLOC and more a question of where/how the functionality is used and whether a consistent/canonical implementation is required. Continually patching snippets from A to B gets real old, real fast. And is error-prone to boot. – aroth Oct 07 '20 at 14:29
  • @aroth Yes. It is part of the balance. You may also consider that the copied bits are part of a released program where updating the code library means that a complete manual test needs to be run again. There are always pros and cons for everything – Thorbjørn Ravn Andersen Oct 07 '20 at 17:50
16

In theory it's the best practice to put any significant common piece of code in a separate library that both applications use, rather than duplicating the code across both applications.

In reality I would say the choice is a trade-off between:

  • Avoid code duplication

    Having duplicated code means there's more code that needs to be understood to understand what's going on in the applications. You also need to maintain both pieces of code, which means duplicating changes to the code. Even copying a one-line method might require both versions to be changed, whereas you can copy entire packages without ever having to change them. If some bit of code has been stable and unchanged for years, that might be a decent sign that it's not going to need changes any time soon (although that's still not a great reason to duplicate it, and extending the scope of what it's used for is a good way to find things that require changes).

    If you copy the code, but then end up significantly changing it so it doesn't resemble the original all that closely, this might be a sign that you simply have two applications that do similar things and there may not be anything you can really separate out. It may also be a sign that you need to reconsider what your classes and applications do and how you structure them.

  • A library that makes sense out of context

    If you have, for example, a general-purpose Array class, that can make sense in a library (assuming you're using a language that doesn't provide that built-in, obviously). If, on the other hand, you have some class that only makes sense given the specifics of your applications, that's not a great candidate for the library.

    Generally you want a library to have some well-defined purpose or set of functionality it provides (like say to provide common data structures). If the class just does some intermediate step that requires something each of your applications would do first, that probably also shouldn't be in a separate library.

    You also don't really want every change you make in either the application or the library to also require a change to the other because the two are too closely linked (but of course changing the public interface of classes in the library is going to require changes to applications using it).

  • The effort of maintaining a separate library

    This shouldn't matter much if the library holds a significant piece of code that is distinct from your applications.

    But if you just have like one small file in there, that's probably not going to make much sense as a separate library.

I would try to avoid having one application import from the other, unless you have a particularly compelling argument in favour of that.

Bernhard Barker
10

A one-time copy is reasonable, but in my experience, if you don't set up a pattern for sharing code between builds, you will end up copying a lot more.

I used to work in a code base that used copying regularly for common code. One time I made some changes in code, but they didn't take effect. I discovered I was working in the wrong copy, so I made my changes in another place. Oops wrong copy again. That got me curious, and I found seven exact copies of that same code. Later, I did an analysis and found that a solid majority of our source files were exact duplicates of other files.

That amount of duplication didn't happen overnight, but it also took several years to fix. With common libraries you always have to think about how changes affect other builds, but having to constantly verify that you've fixed a bug in all the copies is much worse, trust me. It feels like more work up front to set up a common library, but it will save you time and hassle in the long run.
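
An analysis like the one described above could be approximated with a short script. This is a minimal sketch, not the tool actually used: the function name `find_exact_duplicates` is hypothetical, and it only detects byte-for-byte identical files, not near-copies that have drifted.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def find_exact_duplicates(root: str) -> List[List[str]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Hash the full contents; identical files share a digest.
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    # Keep only groups with more than one file; sort for stable output.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)
```

Running it over a source tree returns one list of paths per group of identical files, which gives a rough measure of how far copying has spread.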

Karl Bielefeldt
5

Duplication is better than premature abstraction.

I have wasted countless hours in the early part of my career/hobby of programming pulling duplicate code out into a separate class, function, or module (it's DRY! it's good!) only to have to add on more and more special handling of slightly different behavior followed by Dark Places in My Code I Dare Not Tread followed by pulling the **** thing back apart again to save my sanity.

You can definitely be too DRY.

The heuristic I mostly follow now (and it is a heuristic, not a hard-and-fast rule) is the rule of 3: if something is similar/duplicated in three places in a codebase I will think about factoring it out. This is again meant to be a guide and not a substitute for thought: you still have to exercise good judgement (same as with being DRY) but you will be less likely to shoot yourself in the foot.

This warning might seem overly dire, but I think the idea that if you have to change the same thing in more than one place you will inevitably forget (i.e. DRY is good) is already in the water supply. I don't think you have to make an argument in its favor, so I'm giving caution against the opposite extreme.

Jared Smith
  • 4
    Your mistake was in thinking _code_ shouldn't repeat, when actually it's _knowledge_ that shouldn't repeat. This comes down to whether "this code is identical to that code" is true for a reason that constitutes the knowledge inherent in the repeated code, or whether it's an accident. – J.G. Oct 06 '20 at 18:07
  • @J.G. totally fair. – Jared Smith Oct 07 '20 at 00:29
  • 3
    Not sure if you're aware, but the rule of three is very much a software development pattern introduced by Fowler - https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming) – Ian Kemp Oct 07 '20 at 12:21
  • @IanKemp thanks for the reference. After years of reading books and blogs I don't always remember the original source of stuff. – Jared Smith Oct 07 '20 at 13:37
4

The question to ask is whether the two pieces of code really represent the same thing or they just happen to look identical.

Can you imagine needing a change in that code for one client (client as in calling code) but not for the other?

Is the same person responsible for both clients?

If we were neighbors, technically speaking we could share a wife and children. It could be most convenient. It could also become most complicated, depending on your point of view.

So as often, there is no straight answer.

Martin Maat
2

As I'm sure you know, code duplication is generally considered a code-smell, i.e. something to be avoided.

If possible you ideally want to break out the common code into a separate class library within your repository and have both of your applications reference that class library.

However, the shared library approach can then make things more difficult, because (amongst other things) you will need to consider how changes needed to the shared library by one of your applications may then impact on all the other applications that use the shared library.

To get round that you ideally want a programming environment where you can create versioned packages from that shared code (e.g. NPM in JavaScript or NuGet in .NET) so that each application can reference a specific version of your shared code. You can then make changes to that shared code safely and introduce those changes to one application at a time by changing which version of the shared package each application references.

(Those versioned packages would typically be published only within your organisation, not on public NPM/NuGet/etc.)
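
The version-pinning idea can be sketched in a few lines. Everything here is a hypothetical illustration: the `checksum` package, its two versions, the `REGISTRY` and `PINNED` tables, and the `resolve` helper stand in for what a real package manager such as NPM or NuGet does for you.

```python
from typing import Callable, Dict


# Hypothetical internal package "checksum", published in two versions.
def checksum_v1(data: bytes) -> int:
    """Version 1.0.0: additive checksum truncated to 8 bits."""
    return sum(data) % 256


def checksum_v2(data: bytes) -> int:
    """Version 2.0.0: changed behaviour, truncated to 16 bits."""
    return sum(data) % 65536


REGISTRY: Dict[str, Callable[[bytes], int]] = {
    "1.0.0": checksum_v1,
    "2.0.0": checksum_v2,
}

# Each application pins the version it has been tested against.
PINNED: Dict[str, str] = {"app_a": "2.0.0", "app_b": "1.0.0"}


def resolve(app: str) -> Callable[[bytes], int]:
    """Look up the implementation for the version an application pins."""
    return REGISTRY[PINNED[app]]
```

Here `app_a` has already moved to the new behaviour while `app_b` keeps the old one until its pin is changed, which is the one-application-at-a-time rollout described above.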

tomRedox
2

Let us assume the code is identical because it does the same task for the same reason, not due to happenstance. Otherwise, there is nothing to talk about anyway.

A dependency can be a heavy burden.
It increases the need for coordinating any changes, hinders tailoring to the specific use-case where appropriate, and much of it will be or become useless for any one of the projects. This is exacerbated for non-compiled code, where unused code is an especially heavy dead weight.

Managing independent duplicates is also a heavy burden.
How will you track down (or even remember you should) all of them if you fix or improve any?

In all things, balance:

  1. Is the functionality sufficiently complex and / or commonly needed?
    Put it into the appropriate common library, which need not be a SO / DLL. The overhead is worth it and the extra scrutiny is welcome.

  2. Is it small and easy enough, or should be tailored to the use-case?
    Duplication might be a code smell, but that doesn't mean it isn't the smart choice.

Take the time to get it right.
Remember YAGNI and refactoring: the fewer things that depend on an interface, the easier it is to change, move, remove, or replace.
Retracting an interface is much more costly than promoting one, and having to keep it around is a drain.

Deduplicator
2

It has been my experience that the ideal solution to this specific problem is to create a static library in the same repository as your other two apps.

This resolves MOST OF the awkwardness of maintaining library versions, and ensures the code does not diverge.

This works in scenarios in which the systems in question are tightly coupled by nature and the relative likelihood that they will encounter a different build of their counterpart in the wild is quite low.

If the systems are loosely coupled and/or can interact with different builds of their counterparts with moderate frequency, this could be a bad plan, as one becomes less incentivized to think about (and test) backward-compatibility scenarios. This, too, can be managed, but requires vigilance, and may be better supported by a typical 'versioned software package' approach.

Iron Gremlin