
I'm working on a software project where we have to build three APIs: one for the home banking channel, one for the agency channel and a third for the mobile channel.

The agency API is the most complete one, as it has all the functionality; the Home API is a bit smaller, and the mobile API is smaller still.

The architects here made a common layer (cross-channel EJB services shared by all APIs), but on top of that layer each API is a separate codebase.

There is no big difference between the APIs for now. The big team started with the agency channel, and we are now adapting it for the home channel. We are just enriching objects specifically for our home app; otherwise, the code is 95% identical between the APIs. The APIs are built on top of Spring MVC and consist of controllers, models and some utilities.

Basically, the controllers do the mapping from BO to ChannelObject (which seems to me the wrong place for it), plus some extra utilities and serializers. All of it is duplicated for now. They say the reason for the duplication is that they want the APIs to be independent: "If tomorrow we want a different behaviour for home than for agency or mobile, we won't struggle!!"
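For illustration only (this is not the project's actual code), here is a minimal Java sketch of moving the BO-to-ChannelObject mapping out of the controller into a mapper class, so that Home can enrich the common mapping instead of duplicating it. `AccountBO`, `AccountDto`, `AccountMapper`, `HomeAccountMapper` and the `displayName` field are hypothetical names. The sketch uses Java 16+ records and plain constructor injection in place of Spring MVC annotations so that it runs on its own.

```java
import java.math.BigDecimal;

// Hypothetical business object returned by the shared EJB layer.
record AccountBO(String id, String holder, BigDecimal balance) {}

// Hypothetical channel object that the API would serialize.
record AccountDto(String id, String holder, String balance, String displayName) {}

// Common mapping, written once and kept out of the controllers.
class AccountMapper {
    AccountDto toDto(AccountBO bo) {
        return new AccountDto(bo.id(), bo.holder(), bo.balance().toPlainString(), null);
    }
}

// Home-specific enrichment reuses the common mapping instead of copying it.
class HomeAccountMapper extends AccountMapper {
    @Override
    AccountDto toDto(AccountBO bo) {
        AccountDto base = super.toDto(bo);
        return new AccountDto(base.id(), base.holder(), base.balance(),
                bo.holder() + "'s account");
    }
}

// Stand-in for a Spring MVC controller: it only delegates to the mapper it was given.
class AccountController {
    private final AccountMapper mapper;

    AccountController(AccountMapper mapper) {
        this.mapper = mapper;
    }

    AccountDto getAccount(AccountBO bo) {
        return mapper.toDto(bo);
    }
}
```

In a real application the controller would fetch the BO from the shared EJB service, and Spring would inject the mapper bean that belongs to the channel.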

Is there a case where we should accept duplicate code?

Mohamed Amine Mrad
  • And if three tomorrows down the road they decide they want consistent data access and representation between all the APIs ... well ... "Struggle!" – Becuzz Aug 10 '17 at 13:47
  • Duplicate code isn't necessarily a bad thing. The saying "DRY is the enemy of decoupling" cannot be emphasised enough. Having said that, designing for the future, rather than for now, really really is a very bad thing. That future almost never comes to pass. Instead, design a highly decoupled solution, covered by good automated tests for what's needed now. Then, in the future, if something different is needed, it'll be easier to change. – David Arno Aug 10 '17 at 14:14
  • Possible duplicate of [Is it always a best practice to write a function for anything that needs to repeat twice?](https://softwareengineering.stackexchange.com/questions/269882/is-it-always-a-best-practice-to-write-a-function-for-anything-that-needs-to-repe) – gnat Aug 10 '17 at 14:43
  • I think the two cases where I have never regretted duplicating code are (a) where the duplicated code is a very small and not very important part of the total, and (b) where I'm copying code from a system that is already dying into a new system designed for longevity. There are many other cases where I have wept bitter tears after forking code. – Michael Kay Aug 10 '17 at 17:58
  • Where I have worked, it has often been noted that one should (often) duplicate once, and abstract out commonality the third time. This removes the *zeitgeist* for early, possibly inappropriate, abstraction that increases coupling instead of cohesion. When future requirements are actually well understood, exceptions can of course be made. – Pieter Geerkens Aug 10 '17 at 19:48
  • One trivial case where duplicate code can be acceptable is if it is auto-generated. – samgak Aug 11 '17 at 01:18
  • Sounds like it's just not designed very well if you're making tweaks to shared code in order to cater for the specifics of its consumers. Distribute common code with NuGet, and with SOLID code (more specifically open/closed) you'll be able to plug in what you need to make it work. – JᴀʏMᴇᴇ Aug 11 '17 at 10:00
  • Every example here saying "Here's a good reason not to be DRY" looks like over time it will add to a project's complexity and require more to maintain. These days I view "Quality Code" as DRY code and "Poor Code" as duplicate code, as pretty much the sole criterion. If it's hard to use but DRY I can fix it easily; if it's easy to use but there are 5 divergent copies it's an absolute nightmare. – Bill K Aug 11 '17 at 19:53
  • **[Two is Too Many.](http://www.codesimplicity.com/post/two-is-too-many/)** Though this isn't a hard and fast rule (as is explained), it's pretty close. You should have a darn good reason for duplicating code. – Wildcard Aug 12 '17 at 03:07
  • Two is too many, but there is always a cost. When you think about adding the third copy, that's definitely when you have to act because the cost of duplicates will grow faster than the cost of fixing it. – gnasher729 Aug 12 '17 at 17:59
  • I'm not sure whether Java works exactly the same way, but (to the best of my understanding) in C++ the usual way would be to write a backend with common functions that all 3 APIs would call upon. If one API had to handle a case differently, the function in question could simply be overloaded in that branch's scope without breaking the backend function for the other two. Duplicating the same code three times sounds like it would be absolutely hellish to maintain. – sig_seg_v Aug 12 '17 at 22:36
  • why couldn't you have a single API with different methods / parameters when they decide they want unique functionality for the various channels - seems like a nightmare in overhead / management as it stands now – NKCampbell Aug 13 '17 at 15:40
  • [point 2](https://medium.com/@rdsubhas/10-modern-software-engineering-mistakes-bc67fbef4fc8) is a good read for this – Abhishek Bhatia Aug 17 '17 at 17:02

4 Answers


Sandi Metz, a renowned software engineer and author in the Ruby ecosystem, has a great blog post and a talk where she also discusses the relationship between duplication and abstraction. She comes to the following conclusion:

duplication is far cheaper than the wrong abstraction

And I completely agree with her. Let me give you more context to this quote. Sometimes finding the right abstraction is very difficult. In such cases it is tempting to just go for any abstraction to reduce duplication. But later you might find out that your abstraction doesn't hold for all cases. However, it is costly to change everything again and to go a different route (for a far better explanation watch her talk!).
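To make the idea concrete, here is a hypothetical Java sketch; the fee rules, `Channel` and `Fees` are invented for illustration and do not come from the question. `sharedFee` is a "shared" method that has accumulated a channel flag and a boolean parameter, which is the kind of drift her post describes. The three small methods below it are the same behaviour inlined back into each caller, which, as I understand her post, is the first step she recommends before looking for a better abstraction.

```java
// Hypothetical channel enum for the example.
enum Channel { AGENCY, HOME, MOBILE }

class Fees {
    // The "shared" abstraction after a few rounds of special cases:
    // every caller passes flags that only some callers care about.
    static long sharedFee(long amountCents, Channel channel, boolean premium) {
        long fee = amountCents / 100; // 1% base fee
        if (channel == Channel.HOME && premium) {
            fee = 0;
        } else if (channel == Channel.MOBILE) {
            fee = Math.min(fee, 50); // mobile fee capped at 50 cents
        }
        return fee;
    }

    // The same behaviour inlined back into each caller: a little duplication,
    // but each version only contains the logic its channel actually needs.
    static long agencyFee(long amountCents) {
        return amountCents / 100;
    }

    static long homeFee(long amountCents, boolean premium) {
        return premium ? 0 : amountCents / 100;
    }

    static long mobileFee(long amountCents) {
        return Math.min(amountCents / 100, 50);
    }
}
```

Behaviour is identical before and after inlining; what changes is that a new requirement for one channel no longer has to thread another flag through the code of the other two.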

So yes, for me there are exceptional cases where accepting duplication is the right decision, especially when you are not sure what is to come and requirements are likely to change. From your post I take it that there is a lot of duplication now, but your colleagues suggest that this might change and that the two applications should not be coupled to each other. In my opinion this is a valid argument and can't be dismissed in general.

larsbe
  • Ah yes, if you cannot deduce the generalized rules, trying to abstract them can only fail. And if correspondence is coincidental, it might be short-lived. – Deduplicator Aug 10 '17 at 14:31
  • It may be costly to change an abstraction once it's in place, but getting rid of _duplication_ once it's in place can be even more trouble. Of course this depends a lot on the language etc. – modern strong-static type systems can do a good job at helping you to get even large-scale changes to some abstraction right. Keeping duplicated features in sync however is not something a type system can help you much with (because the duplicates are, _to the type system_, just different). But I suppose in dynamic, duck-typed languages it's rather the other way around; so it might make sense for Ruby. – leftaroundabout Aug 10 '17 at 15:49

Duplication can be the right thing to do, but not for this reason.

Saying "we might want these three places in the code base to behave different even though right now, they're identical" is not a good reason for large-scale duplication. This notion could apply to every system, and it could be used to justify any duplication, which is obviously not reasonable.

Duplication should be tolerated only when removing it would be overall more costly now for some other reason (can't think of a good one right now, but be assured there can be one - virtually everything in programming is a trade-off rather than a law).

For what you're doing, the right solution could be e.g. extracting the behaviour that is duplicated right now into a Strategy or some other pattern that models behaviour as classes, and then using three instances of the same class. When you do want to change the behaviour in one of the three places, you only have to create a new Strategy and instantiate it in that one place. You add a few classes and leave the rest of the code base almost completely untouched.
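A minimal Java sketch of that suggestion follows. `CustomerView`, `ChannelApi`, `Channels` and the `greeting` field are hypothetical names, and the wiring already shows the state after Home has diverged. In a Spring MVC application the three `ChannelApi` instances would more likely be beans wired by configuration than static fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// The behaviour that is duplicated today, modelled as a strategy.
interface CustomerView {
    Map<String, Object> render(String customerId);
}

// While all channels behave identically, they share this one strategy.
class DefaultCustomerView implements CustomerView {
    @Override
    public Map<String, Object> render(String customerId) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("customerId", customerId);
        return body;
    }
}

// When Home diverges, this class plus one wiring change is all that is new.
class HomeCustomerView extends DefaultCustomerView {
    @Override
    public Map<String, Object> render(String customerId) {
        Map<String, Object> body = super.render(customerId);
        body.put("greeting", "Welcome back");
        return body;
    }
}

// One API class, instantiated once per channel with the strategy it needs.
class ChannelApi {
    private final CustomerView view;

    ChannelApi(CustomerView view) {
        this.view = view;
    }

    Map<String, Object> customer(String customerId) {
        return view.render(customerId);
    }
}

// The only place that knows which channel uses which strategy.
class Channels {
    static final ChannelApi AGENCY = new ChannelApi(new DefaultCustomerView());
    static final ChannelApi HOME = new ChannelApi(new HomeCustomerView());
    static final ChannelApi MOBILE = new ChannelApi(new DefaultCustomerView());
}
```

Agency and mobile keep sharing `DefaultCustomerView`, so a fix there still lands in both places at once, while Home's difference is isolated in a single class.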

Kilian Foth
  • If two things happen to behave the same, that's duplication. Though unless they behave the same for a reason, they should not be merged. Can one extract some common part of the implementations? Sometimes there are building-blocks not abstracted yet, who knows. – Deduplicator Aug 10 '17 at 14:59
  • As an example, we had a chunk of code used by three parts of our application, and it was pretty terribly written with lots of little conditional branches everywhere to cover exceptions for each of those three. *But*, and this was the important bit, it had been heavily used for two years without any significant bug reports of any kind. So when a fourth part of the application needed to do something with it, but again slightly different, the decision was made to just *not touch* the flawed code that ran flawlessly, and to just copy it and create a better-written, flexible base for the future. – KRyan Aug 10 '17 at 15:25
  • Duplication is the right thing to do when the readability costs go up too much when compared with the maintenance costs which end up being tied together. I don't think this applies at all in this scenario, and is often seen at the smaller scale in a project, not something like this. Even then it is very rare that you get into a situation where this is ever the case, it will happen if you have some niche nested duplication which would force you to use Template Method pattern or the like but in small places. This will typically end up obfuscating code. – Krupip Aug 10 '17 at 15:43
  • An example where duplication is useful (and "removing it would be overall more costly *now*") is input validation in web applications: We employ a first (possibly simplified) validation at the client so that the user gets immediate feedback about problems; and then we do the same (or more thorough) validation at the server because clients can't be trusted. – Hagen von Eitzen Aug 10 '17 at 18:40
  • @HagenvonEitzen But that need not even be duplication, depending on the tech stack. E.g., you might have the JavaScript check input in the browser, and then Java check on the server, so you have duplicated functionality. But if you were to run Node.js on the server, you could use the same JavaScript validation in the browser and on the server, eliminating the duplication. You still want the code to *run* in multiple places (a trusted and an untrusted environment), but the *code* doesn't necessarily have to be duplicated. – Joshua Taylor Aug 10 '17 at 20:41
  • A real-life example I have been a part of: We had an API which should, in the future, diverge into two APIs. We made a conscious choice to keep two duplicate APIs during development. Why? Because we expected that the project would be shelved and unshelved before this feature was needed. When it was unshelved, it would be handed to a junior developer to make these changes. The duplicate APIs were a not-so-subtle form of documentation of this intent to have two APIs. If we had perfect knowledge transfer, it would be a waste, but perfect knowledge transfer is just an illusion. – Cort Ammon Aug 10 '17 at 21:03
  • There are other special reasons why code duplication is necessary. I once wrote a GDB debug stub for an RTOS. The first time I set a breakpoint in a stopped task, the entire machine was brought down to its knees. Why? Well, turns out most RTOS tasks are stopped in a take-semaphore wait function. But what was the breakpoints list protected with? A semaphore... The golden rule of a debugger is that _A debugger must be **outside** that which it debugs_. – Iwillnotexist Idonotexist Aug 13 '17 at 00:19

If people start reasoning about design with the words "if tomorrow", this is often a big warning sign for me, especially when the argument is used to justify a decision which involves extra work and effort, for which no one really knows whether it will ever pay off, and which is harder to change or revert than the opposite decision.

Duplication of code reduces the effort only in the short term, but it increases the maintenance effort almost immediately, in proportion to the number of duplicated lines of code. Note also that once code is duplicated, it becomes hard to remove the duplication if it turns out to have been the wrong decision, whilst if one does not duplicate code now, it is still easy to introduce duplication later if sticking to DRY turns out to have been the wrong decision.

That said, in larger organizations it is sometimes beneficial to favor the independence of different teams over the DRY principle. If removing the duplication by extracting the 95% common parts of the APIs into a new component couples two otherwise independent teams, this might not be the wisest decision. On the other hand, if you have limited resources and only one team will maintain both APIs, I am sure it will be in their own interest not to create any double effort and to avoid unnecessary code duplication.

Note further that it makes a difference whether the "Home" and "Agency" APIs are used exclusively by completely different applications, or whether someone might write a component built on top of those APIs which can be used in a "Home" context as well as in an "Agency" context. In the latter situation, having the common parts of the APIs exactly identical (which you can only guarantee if the common parts are not duplicated) will probably make developing such a component much easier.
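A hypothetical Java sketch of that situation: the common part is implemented once in `CommonAccountsApi`, each channel adds its own operations on top, and a component written against the common type works unchanged with both. All class names and the hard-coded account IDs are invented for illustration, and inheritance is only one option; delegating to a shared component would work equally well.

```java
import java.util.List;

// The common part of the APIs, implemented once.
abstract class CommonAccountsApi {
    List<String> accountIds(String customerId) {
        return List.of(customerId + "-CHK", customerId + "-SAV");
    }
}

// Channel-specific operations are added on top of the shared part.
class AgencyAccountsApi extends CommonAccountsApi {
    String branchFor(String accountId) {
        return "BR-001"; // placeholder agency-only data
    }
}

class HomeAccountsApi extends CommonAccountsApi {
    String nicknameFor(String accountId) {
        return "Everyday account"; // placeholder home-only data
    }
}

// A component written once against the shared part works in both contexts,
// because there is only one copy of the common behaviour to depend on.
class AccountSummary {
    static String summarize(CommonAccountsApi api, String customerId) {
        return customerId + " has " + api.accountIds(customerId).size() + " accounts";
    }
}
```

With duplicated code, `AccountSummary` would instead need one variant per API, or would have to trust that two copies of `accountIds` never drift apart.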

So if it turns out there will really be different sub-teams, each responsible for one of the APIs and each with its own schedule and resources, then it is time to duplicate the code, but not "just in case".

Doc Brown
  • Even bigger warning sign than "if tomorrow" for me is "That would never change". – abuzittin gillifirca Aug 11 '17 at 06:39
  • @abuzittingillifirca: and what has this to do with the question or my answer? – Doc Brown Aug 11 '17 at 06:41
  • I'm challenging your first paragraph. – abuzittin gillifirca Aug 11 '17 at 07:04
  • @abuzittingillifirca: well, read the question again: it is about reasoning with an "if tomorrow" argument to justify a decision for duplication which is *hard to revert*, thus actually making the software harder to change in certain cases. It might be a little counterintuitive, but the best way to keep software changeable for the future is not to make any (probably wrong) assumptions about the future, but to keep the software as small and SOLID as possible. – Doc Brown Aug 11 '17 at 07:38

Duplication to prevent coupling. Let's say that you have two big systems and you force them to use the same library. You may be coupling the release cycles of both systems. This may not be too bad, but let's say that one system needs to introduce a change. The other needs to analyze the change and may be affected. Sometimes it may break things. Even if both parties are able to coordinate the changes, it could mean a lot of meetings, going through managers, testing and dependency management, and the end of the small autonomous team.

So you are paying the price of duplicated code to gain autonomy and independence.

Borjab
  • So this is inter-component duplication to avoid coupling. But you would still want to avoid intra-component duplication, right? – TemplateRex Aug 11 '17 at 19:59
  • Well, yes, you want to minimize duplicated code as it will make your code easier to understand and modify. But remember that there are good answers with viable exceptions. The right abstraction to avoid duplication may be hard to find. – Borjab Aug 13 '17 at 19:02