29

I'm looking for a good analogy or metaphor that could illustrate the problems of copy-paste programming to non-programmers. I occasionally do code/system reviews for potential clients, and one of the common problems I see is vast amounts of copy-paste code all over their code bases. It's something I routinely call out in the reviews, and each time I have to explain why this is a problem (this is especially difficult with clients who know just enough about programming to understand that reuse is a good thing, but not enough to understand why copy-paste isn't a good form of reuse). Obviously, I can (and do) explain the problem in terms of code maintenance, but it would be nice to have a good, concise analogy for this problem that would hit home with non-programmers. Bonus if the analogy illustrates why search-and-replace is not an effective solution for this problem. Any suggestions?

Just to clarify (based on Jaroslav's answer below) - I'm not talking about using code snippets here; what I see (disturbingly often) is copying-and-pasting of vast swaths of code, or a ten-line piece of code to get some user data (complete with inline SQL query) pasted into dozens of PHP or ASP.NET pages. So, duplicate code from elsewhere in the same project.

Update: There are several really good answers here; I've explained in the comments why I chose Scott Whitlock's answer, but I would also highly, highly recommend whatsisname's answer if you're dealing with customers who are familiar with manufacturing at all.

E.Z. Hart
  • Hmmm, that's a tough one. It doesn't translate well to classic car/building/factory analogies..... – whatsisname Mar 29 '11 at 16:09
  • 3
    Imagine having references to the Republican and the Democrat party in US common law, and then renaming one of the parties while adding a third ... many of the laws will have to be rewritten. – Job Mar 29 '11 at 16:09
  • How about the analogy of: copy-pasting code (insecure, badly structured, etc.) that you don't understand from wikis, forums, etc. is like opening e-mail attachments (viruses, spyware, spam, etc.) from third parties? – sakisk Mar 29 '11 at 16:34
  • @faif: Copy-pasted code isn't necessarily garbage code. It could be good code the guy in the office next to you wrote. The problem with copy-pasted code is that it very quickly becomes an unmanageable maintenance/debugging nightmare. – whatsisname Mar 29 '11 at 16:36
  • @whatsisname That's why I said **that you don't understand** – sakisk Mar 29 '11 at 16:48
  • 1
    @faif: then zap the parenthesized section – whatsisname Mar 29 '11 at 16:50
  • @Hart - Why do you say explain to non-programmers? Surely the guys writing the php/asp pages are programmers? – apoorv020 Mar 29 '11 at 19:24
  • Have them watch the Michael Keaton movie, [Multiplicity](http://www.google.com/url?sa=t&source=web&cd=1&ved=0CBQQFjAA&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0117108%2F&rct=j&q=michael%20keaton%20%20movie%20clone&ei=SjuSTYCwDeWa0QHjyenMBw&usg=AFQjCNH0W2iK54midEvMmJeg7LV2zbn2yA&cad=rja). – oosterwal Mar 29 '11 at 20:05
  • I use the `cut-and-paste` method of programming. If I need to reuse it, I cut it from the method, and paste it into its own method with generalized parameters. – zzzzBov Mar 29 '11 at 21:45
  • @apoorv020 - It's not the programmers I have to explain it to; it's their supervisors/bosses/CEOs. I have to explain to them what their programmers are doing wrong. – E.Z. Hart Mar 30 '11 at 07:21
  • I'll give a somewhat simple real world example from my own life which shows why copy-pasting can be bad. Also see the comments below the selected answer - https://stackoverflow.com/questions/23285572/insert-5000-records-in-sql-server-2008-with-query I am only a learner and luckily the inefficiency of my code was pointed out. I left my answer for the benefit of the community. Imagine the consequences if your developers do these kind of things repeatedly. Perhaps you could give them a road map so that they just know the RIGHT code to copy paste. – Erran Morad May 11 '14 at 06:46
  • Ignore the small amount of code in my link in the above comment. The main concept is that you should avoid loading data into a database row by row. Do it in bulk whenever possible. A person who does not know this might copy paste code to do a row by row load which can be VERY slow. – Erran Morad May 11 '14 at 06:49
  • @E.Z.Hart - Here is a lame analogy I can think of - Guy X did not know that you can use a hammer to insert a nail into a wall. A guy Y used a heavy metal pipe for nails because someone borrowed his hammer. Guy X asked how to bang a nail and was given the pipe. X now foolishly carries a heavy metal pipe along with saw, chisel and nails to all his projects. The metal pole is analogous to bad quality copy pasted code. – Erran Morad May 11 '14 at 06:58

17 Answers

39

Imagine you are designing an aircraft. You've got a single engine jet. It sells well. Now you are going to design a 4 engine aircraft for long hauls across the ocean.

Now, you don't create a full set of engineering specifications and drawings for each individual engine, do you? No, you use the same engine in all four places. Now imagine if you had 4 sets of drawings, and you have to change something. Now you must change it in all four engine drawings. What happens if you accidentally forget to change something in the 4th engine because you were spacing out?

So say you are changing the length of a screw, or a pipe threading. Now you can't just "search and replace" on your database of engineering drawings: you might accidentally change the mounting screws in the fuel pumps because they happened to be the same size. Or the hydraulic line powering the tail rudder used the same thread, but now it's different and you can't power the tail anymore.

Now imagine you get hassled by the NTSB because your engines randomly throw turbine blades and explode while flying south of Florida. Now which engine drawings do you look at? All of them, one of them? How do you know that all four are the same? Perhaps the corrections are made, but they are only applied to engine one, because the guy who designed the engines left a year back to play in a reggae band and was the only one who remembered that the four engines are in separate files, and the guy who fixed the exploding turbine was his replacement.

Copying and pasting code is analogous to having duplicate drawings of component parts, whether it's a screw or an engine. You want to abstract components down to fundamental pieces that are reused as much as possible.

Don't duplicate the engines, just write the code that mounts the engines to the wing.

whatsisname
  • 12
    Now, imagine that you find number 4 engine is different from the other three. Was this difference intended? Is it designed to counter a certain torque issue caused by turning left immediately after takeoff? Or was it a mistake in copying? – David Thornley Mar 29 '11 at 16:37
  • 5
    Great analogy... but if someone has difficulty understanding copy/paste code... jet engines might be just as difficult :) – Steven Evers Mar 29 '11 at 16:47
  • You should talk about solid fuel rockets instead of jet engines for this analogy. That way, you can finish with, "See? Just like in rocket science." – detly Mar 30 '11 at 04:08
  • This is not an analogy. Blueprints are literally code for mechanical artifacts. – intuited Apr 03 '11 at 09:30
38

It's like this... you have one clock in your house. Great! You know what time it is, but you always have to go to that one room to look at it.

But of course you want to know what time it is without going to that room all the time, so you buy some more clocks, and you distribute them around your house. Each of these clocks is independent. They all keep their own time. This means:

  • When the time changes due to daylight saving time, you have to change all of them
  • Even when they're all set, they're all a bit different and rarely agree perfectly. Over time they drift.

Now imagine the same problem in a large facility with dozens or hundreds of clocks. That's why you need something like a networked clock that keeps itself in sync with a central time base. That way the time is defined once and only once.

Copy-paste programming is like buying more independent clocks. It doesn't scale.

Scott Whitlock
  • 1
    I picked this answer because I think it works best for the situations I'm usually in - most of the software I look at is for people in the service sector, and manufacturing analogies are often difficult for them to comprehend. But pretty much everyone has multiple clocks in their house. I also like it because I can use the fact that each of the clocks in your house probably has a different process for changing the time (and is fast/slow by a different amount) as a way to explain why search-and-replace isn't an option for maintenance of copy-paste code. – E.Z. Hart Apr 02 '11 at 15:44
7

You have to explain it in terms of sharing the same resource versus duplicating the same resource.

For instance, would it make sense for every house in a big city to have a dedicated power station providing electricity to the house or would it make more sense that every house shares the same power station? If something goes wrong with a particular component used at the power station(s) and repairs are required, it would be easier to make the repairs in one place and everyone benefits from these repairs versus making the repairs at each dedicated power station and only each house benefits individually.

Bernard
7

"Hey, look, all surgery is somewhat similar, right? So you wouldn't mind if I randomly copy surgical instructions for different procedures from different surgeons for your operation?"

Darknight
6

Copy and paste is like trying to manufacture parts without a mold. It's slow, and you'll get a one-time use from each part, since once it's determined to be defective or broken, you can't just fix the mold to create a suitable replacement.

In the search for an analogy, first we have to consider the dangers of copy and paste programming:

  • Bugs introduced because the copy isn't an exact fit (unnecessary variables and code paths not cleaned up)
  • Increased testing requirements — abstraction helps remove the need for regression testing as you test only what you changed, and you only change the leaves, not the branches.
  • Duplication duplicates everything, bugs included. Every bug fix, or feature that applies to both sections of code now costs twice as much to implement and there is a high likelihood of forgetting it completely.
  • Search and replace exacerbates the above problem, since you can't easily find the duplicated code.
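The duplicated-bug point above can be sketched in a few lines of Python. Everything here is hypothetical (the function names, the prices, the tax rate): it assumes a pricing calculation was pasted into two places and a rounding fix later reached only one of them.

```python
# Hypothetical example: the same pricing logic pasted into two call sites.

def invoice_total_checkout(prices, tax_rate):
    # Copy 1: the bug fix (round once, at the end) was applied here.
    return round(sum(prices) * (1 + tax_rate), 2)


def invoice_total_email(prices, tax_rate):
    # Copy 2: this pasted copy was forgotten and still rounds per item,
    # so the e-mailed total can disagree with the checkout page.
    return round(sum(round(p * (1 + tax_rate), 2) for p in prices), 2)


def invoice_total(prices, tax_rate):
    # The abstraction: one shared definition, so a fix made here
    # reaches every caller at once.
    return round(sum(prices) * (1 + tax_rate), 2)


prices = [0.20, 0.20, 0.20]
print(invoice_total_checkout(prices, 0.07))  # fixed copy
print(invoice_total_email(prices, 0.07))     # stale copy, one cent off
```

A search-and-replace would only have caught the second copy if it still looked textually identical to the first, which pasted code rarely does after a few edits.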

The main weapon in the fight against copy and paste programming is abstraction. So to find a good analogy, look for examples of abstraction in the world around us.

Abstraction is based around the idea of setting up definitions and then proceeding to use those definitions in execution. What would the world be like without definitions?

  • Definitions are a key part of legal language. Imagine a contract that had no core definitions but fully defined every term every time it was used.
  • Definitions and templates are used in construction. A common problem in construction is making each new cut based on the last rather than on a single measurement taken at the beginning. This can result in wildly varying lengths over time.
  • Company organization is based around abstracts and definitions. What if every time your company had to expand, they had to define the new role from scratch? That wouldn't work. So what if they decide to just pick a similar job role and slightly modify it to suit? Everyone would be locked into place because it would be impossible to move resources around.

Copying only has a place when the piece being copied is permanent. Otherwise, every copy makes a whole new branch to be dealt with - tested, maintained, and upgraded separately.

Abstraction fights this by tying all the branches together into one trunk, and isolating modifications to smaller branches or even leaves.

Nicole
  • 2
    I like the mold analogy, the rest, I am afraid, are not going to help much with non-tech users. – Matthieu M. Mar 29 '11 at 17:32
  • @Matthieu - I don't know if you are referring to the first bullet points, but I wasn't saying those were analogies, I was describing what I think to be the thought process for a developer to think of good analogies. – Nicole Apr 02 '11 at 19:20
4

I think you are talking about duplicate code, not copy-pasting (using snippets and similar).

Here is an analogy from a history book that illustrates it very well. Before Gutenberg's press, monks sat and wrote books by hand, copying the same book over and over again. The books the monks wrote often had bugs, and thanks to Gutenberg this problem was eliminated.

Another analogy: cash machines. You have one cash machine that can serve various cards and always serves them well. Duplicating code creates different cash machines, so everybody would have to go to a different one and sometimes the machine would even give you a BSOD.

There is an awesome article about copy pasting from Jeff http://www.codinghorror.com/blog/2009/04/a-modest-proposal-for-the-copy-and-paste-school-of-code-reuse.html

P.S. I know there were printing presses before Gutenberg.

Marcus Maxwell
2

By non-programmers I assume we are talking about business people, so I would be brief and involve the realities of money.

  1. Every line of code costs you money (whether written or copied)
  2. Every bug costs you much more than every line.
  3. Every line of code adds potential bugs
  4. Duplicated code = duplicated bugs
  5. Duplicated bugs are almost never found in the same testing cycle.

Cut and paste = Burning Money.

Stephen Bailey
1

There are also security and code integrity concerns.

As demonstrated here, it is possible to embed malicious data in Unicode characters that are transferred to the clipboard.

Depending on how your editor responds to Unicode characters, this may result in unexpected changes to your source code, unexpected compiler output, or some things I haven't thought of yet.
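As a rough illustration of one such case, here is a minimal Python sketch that scans pasted text for invisible format-control characters (Unicode general category `Cf`, which includes zero-width spaces and right-to-left overrides). The function name and the sample string are made up for this example, and the check covers only this one class of character, not every way pasted text can be malicious.

```python
import unicodedata


def find_invisible_characters(text):
    """Return (index, code point, name) for each format-control character.

    Unicode category "Cf" covers characters such as zero-width spaces
    and bidirectional overrides, which most editors do not display.
    """
    findings = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf":
            name = unicodedata.name(char, "UNKNOWN")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings


# Hypothetical pasted line: it looks like `is_admin = False` on screen,
# but a zero-width space is hiding inside the identifier.
pasted = "is_\u200badmin = False"
for finding in find_invisible_characters(pasted):
    print(finding)
```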

makerofthings7
1

Can I not answer the question but say you really don't need an analogy here, and trying to find the right analogy for each development idiom or pattern seems perverse and is often counter productive. It's like trying to do yoga with flat feet...

There are a few reasons why copy/paste leads to problems. It propagates existing bugs into newly pasted areas, and in some environments where it used to be considered a performance enhancement, it's actually now slower (I can provide examples if anyone is interested, but it comes down to JIT, and do you really think you're smarter than a modern compiler?).

It shows the developer is either lazy or selfish or both. If this is a battle you're fighting in a team at the moment, depending on your position in this team (team lead, jnr dev, snr dev, whatever) you need to get it fixed, possibly by arbitration within your organisation.

EDIT: In light of the comment below, that this is code reviewing third party code on behalf of the third party (or maybe even a fourth party :) ) There are some useful things I can add hopefully.

First, when the code was produced for the third party, did they have any metrics in place? Lines of Code (LoC) for example.

I still think some of what I said above still counts. I should probably have also asked what was the goal of the review. If it's to get a quote to maintain it, or replace it you have to ask a whole lot of different questions.

Either way, you're assessing the quality of code. Well, copy and paste falls under the category of "Developer showed adequate understanding of abstraction and/or program flow control design":

Comment: Developer failed to show any understanding of abstraction, and their approach to program flow control was prone to errors. You could introduce "Cyclomatic complexity" here. It's actually quite easy to understand, and in a round about way I think I might have found an answer :D Yay for me.

OK, cyclomatic complexity is like this. You have a map. It has your start position and every possible destination. It doesn't have to be lots. Think car park, cafe, toilet. Cyclomatic complexity is a measure of the number of different routes there are to get from your start position to any of the destinations.

Copied and pasted code will probably increase cyclomatic complexity because it will include repeated logic that could have been abstracted out into its own named block (or method).
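To put a rough number on that, here is a small Python sketch. It is only an approximation made up for illustration: it estimates total complexity as "one per function plus one per branching statement", which is simpler than what a real metrics tool reports, and the two source snippets are hypothetical.

```python
import ast

# Node types counted as one extra decision point each (a simplification).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic_complexity(source):
    """Rough estimate: number of functions plus number of decision points."""
    nodes = list(ast.walk(ast.parse(source)))
    functions = sum(isinstance(n, ast.FunctionDef) for n in nodes)
    decisions = sum(isinstance(n, DECISION_NODES) for n in nodes)
    return functions + decisions


# Hypothetical pages, each with its own pasted copy of the same checks.
PASTED = '''
def show_profile(user):
    if user is None:
        raise PermissionError("not logged in")
    if not user.get("active"):
        raise PermissionError("inactive account")
    return user["name"]

def show_orders(user):
    if user is None:
        raise PermissionError("not logged in")
    if not user.get("active"):
        raise PermissionError("inactive account")
    return user["orders"]

def show_settings(user):
    if user is None:
        raise PermissionError("not logged in")
    if not user.get("active"):
        raise PermissionError("inactive account")
    return user["settings"]
'''

# The same pages with the checks moved into one named block.
SHARED = '''
def require_active(user):
    if user is None:
        raise PermissionError("not logged in")
    if not user.get("active"):
        raise PermissionError("inactive account")
    return user

def show_profile(user):
    return require_active(user)["name"]

def show_orders(user):
    return require_active(user)["orders"]

def show_settings(user):
    return require_active(user)["settings"]
'''

print(approx_cyclomatic_complexity(PASTED))  # 3 functions + 6 branches
print(approx_cyclomatic_complexity(SHARED))  # 4 functions + 2 branches
```

Under this rough measure the gap widens with every additional page that pastes the checks, while the shared version grows by one per page.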

Seem reasonable?

Ian
  • To be clear, this is code that other organizations have written, and it's being brought to our organization for review. So it's not battle within my organization, but something I need to make people (non-programmers) from another organization understand. – E.Z. Hart Mar 29 '11 at 16:47
  • That's useful to know and makes it a lot easier for me to be useful hopefully :) I'll add an edit. – Ian Mar 29 '11 at 18:01
  • Sorry, long edit, but I think the tldr is copy and pasted code is a code smell which indicates an increase in cyclomatic complexity (amongst other things) and cyclomatic complexity is very easy to describe using a metaphor being single faceted. – Ian Mar 29 '11 at 18:15
1

Take an English word for something. Now imagine every time you wanted to describe that thing, you used the complete dictionary definition instead of just the word. How easy would it be for others to understand you?

I form a mental image of something that is not present or that is not the case (imagine) it Indicating an action or state that is conditional on another; Simple past of will. Indicating futurity relative to a past time. Indicating an action in the past that happened repeatedly or commonly (would) be quite not easy; requiring great physical or mental effort to accomplish or comprehend or endure (difficult).

It also wouldn't hurt to show an actual before and after example of real code that has been refactored to remove duplication.
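For instance, a before-and-after along these lines might do. This is a minimal Python sketch, not real client code: the `users` table, the page functions and `get_user` are all hypothetical, and an in-memory SQLite database stands in for whatever the pages actually query.

```python
import sqlite3

# Hypothetical schema and data, created in memory so the sketch runs as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")


# Before: every page carries its own pasted copy of the lookup.
def profile_page_before(user_id):
    row = conn.execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return f"Profile: {row[0]} <{row[1]}>"


def settings_page_before(user_id):
    row = conn.execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return f"Settings for {row[0]}"


# After: the lookup is defined once and every page calls it.
def get_user(user_id):
    row = conn.execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        # A check added here now protects every page at once.
        raise LookupError(f"no user with id {user_id}")
    return {"name": row[0], "email": row[1]}


def profile_page(user_id):
    user = get_user(user_id)
    return f"Profile: {user['name']} <{user['email']}>"


def settings_page(user_id):
    return f"Settings for {get_user(user_id)['name']}"


print(profile_page(1))
print(settings_page(1))
```

If the table is renamed or a column is added, the "after" version changes in one place; the "before" version changes in every page that pasted the query.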

Karl Bielefeldt
0

Let's say you have 5 girlfriends (you sly dog you) and you wish to send all of them a valentine's message. You type the first letter, add her name, and mention something memorable you guys shared. You then copy and paste the letter four times, each time missing an instance of girlfriend #1's name because you made a typo. Now, 4 of your five girlfriends are on their way to girlfriend #1's house.

0

There are a couple of different routes I could see taking here:

  1. Plagiarism - Some may remember this from school, where theft of intellectual property is a big no-no. Copy-paste programming can be just like this, as someone may not understand the source or what gotchas may come from using a particular solution that was blindly copied and pasted without analyzing how well it works or understanding why it may or may not be an effective solution to the problem.

  2. Blindly following directions - Most people have probably had the experience of having to get to some place they haven't been previously. Some may have used MapQuest or Google Maps to find a place and then followed the directions given. There have been stories of people getting lost or just not finding where they were supposed to be even though the software gave specific instructions for how to get there. The other big danger of copy-paste is that it is just like someone handing you the directions to get from A to B without letting you see any map of the area, which may make a trip slightly harder. If that doesn't seem hard, you could up the ante by asking the person to get from A to B wearing a blindfold so that they have to rely on other senses to determine which direction they are facing and get to a target.

Data, Information, Knowledge and Wisdom may be a good model that could be referenced to show why search and replace isn't effective as a solution because the copy and paste is very mechanical and without a lot of thinking so that the data transferred may be without the knowledge and wisdom of using it properly. One could look at nuclear energy for examples of how understanding the difference can be quite powerful. Contrast a nuclear reactor with a nuclear bomb in terms of safety and use to see how knowing just what goes where isn't enough to safely harness the power of the atom.

JB King
0

Imagine you have a group of students and a set of rules for the school. Instead of posting the rules in a common place all the students must reference, you hand each one a copy of the rules. Each student is told they must follow their copy of the rules to the letter.

Now modify one of the rules, saying that in the case of a disaster you should go to the new disaster shelter. You have to go to each student and modify their set of rules. If one of the students gets missed and a tornado hits, the student will go to the old place and die a horrible death.

ElGringoGrande
0

Someone sends you an email with an attached document template. Feel free to keep using it until the template changes. Don't worry, they won't forget to send you a refreshed copy.

JeffO
0

The CoCoMo cost model.

http://en.wikipedia.org/wiki/COCOMO

Effort Applied (E) = a*(KLOC)**b, where b > 1.0

That exponent means effort to build/maintain/support/rewrite grows faster than the number of lines of code.
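A quick way to see the effect, as a minimal Python sketch: the defaults `a=2.4` and `b=1.05` are the commonly cited Basic COCOMO coefficients for "organic" projects, and the two code-base sizes are invented for illustration.

```python
def cocomo_effort(kloc, a=2.4, b=1.05):
    """Basic COCOMO effort estimate in person-months: E = a * KLOC ** b."""
    return a * kloc ** b


# Hypothetical code base: 20 KLOC if logic is shared,
# 40 KLOC if the same logic is pasted everywhere it is needed.
lean = cocomo_effort(20)
bloated = cocomo_effort(40)

print(f"lean:    {lean:.1f} person-months")
print(f"bloated: {bloated:.1f} person-months")
print(f"ratio:   {bloated / lean:.2f}")  # above 2, because b > 1
```

Doubling the line count multiplies the estimated effort by 2 ** b, so with b = 1.05 the pasted version costs a little more than twice as much, and the gap grows with larger exponents.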

S.Lott
0

There's another important aspect to this bad practice which no one took into consideration yet: by blindly copying (full or partial) code from somebody else (without their permission) you might be breaking copyright laws.

karlphillip
0

The copy-paste coding I see is one where the developer doesn't understand or want to reason out what they're doing, and copies together different parts that already do "more or less" what they need, randomly jiggling them at the end to make them fit together.

There are three major problems with that:

  1. It never results in bug-free code. Ever.
  2. If they didn't understand the code while writing it, they could never figure it out while debugging. Only someone else can clean up the mess they made, at additional cost.
  3. If they avoid thinking about the code they're writing, they avoid learning. If they avoid learning, they'll never be a good programmer. If they're never going to be a good programmer, why are they on your team?
Joeri Sebrechts