6

This often happens in my projects. Sometimes I have one piece of code that is very similar to another, yet a few differing lines make it complicated to keep the code clean and free of duplication. Here is an example from a recent project.

I am working on a program that can do two things: choice A generates a single image projected in one way, and choice B generates several images in some other projection.

Choice A:

void generateImage(width, height)
{
    vector<Pixel> pixels;

    for(x = 0; x < width; x++)
    {
        for(y = 0; y < height; y++)
        {
            Position pos = projectPixel(x, y);
            Pixel p = someCrazyFunc(pos);
            pixels.add(p);
        }
    }
}

Choice B:

void generateImage(width, height, images)
{
    vector<Pixel> pixels;

    foreach(image in images)
    {
        for(x = 0; x < width; x++)
        {
            for(y = 0; y < height; y++)
            {
                Position pos = projectPixel2(x, y, image);
                Pixel p = someCrazyFunc(pos);
                pixels.add(p);
            }
        }
    }
}

As you can see, it's pretty much the same. The only differences from A are that we don't compute pos the same way, and that we need to generate several images.

In practice this is a bit different, since there is more code. Basically, there are more loops, because we need to compute all the positions of a loop before calling someCrazyFunc, and because we compute images in slices (the first 1000 rows, then 1000 more, and so on). I mention this just to say that it isn't just 4 lines that get duplicated, but whole loops and function calls.
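Roughly, the real structure is closer to this hypothetical sketch (simplified; only Pixel, Position, projectPixel and someCrazyFunc are the real names, everything else is made up for illustration):

#include <algorithm>
#include <vector>

// Hypothetical sketch of the real structure: positions for a whole slice of
// rows are collected first, then someCrazyFunc is called for each of them.
void generateImageInSlices(int width, int height)
{
    const int sliceHeight = 1000;        // rows per slice (illustrative value)
    std::vector<Pixel> pixels;

    for (int y0 = 0; y0 < height; y0 += sliceHeight)
    {
        const int yEnd = std::min(y0 + sliceHeight, height);
        std::vector<Position> positions; // all positions of this slice

        for (int x = 0; x < width; x++)
            for (int y = y0; y < yEnd; y++)
                positions.push_back(projectPixel(x, y));

        for (const Position& pos : positions)
            pixels.push_back(someCrazyFunc(pos));
    }
}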

I fixed it this way:

void generateImage(width, height, images, projectPixelFunction)
{
    vector<Pixel> pixels;

    foreach(image in images)
    {
        for(x = 0; x < width; x++)
        {
            for(y = 0; y < height; y++)
            {
                Position pos = projectPixelFunction(x, y, image);
                Pixel p = someCrazyFunc(pos);
                pixels.add(p);
            }
        }
    }
}

So basically, for A the images collection will contain a single dummy entry, and the old projectPixel function will take a dummy parameter it never uses.
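Concretely, choice A then ends up looking something like this (just a sketch; Image and generateImageA are placeholder names):

#include <vector>

// The old projectPixel is forced to accept an image argument it never uses:
Position projectPixel(int x, int y, const Image& /*ignored*/);

void generateImageA(int width, int height)
{
    // ...and choice A has to fake a collection containing a single dummy image:
    std::vector<Image> images = { Image{} };
    generateImage(width, height, images, projectPixel);
}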

I don't think this is an acceptable solution to the problem. Honestly, I would even prefer to have two mostly identical functions. But as the complexity grows, I'll end up with either a lot of duplicated code or a lot of code full of weird workarounds.

How do I get over this? What am I not considering?

Edit: just to be clear, I'm not against the use of functions as parameters; what I don't like is the use of a dummy parameter that has no purpose in the context of choice A.

    I don't think this is the right way to apply DRY. I would call the Choice A version from inside the outer loop in the Choice B version. – James McLeod Nov 19 '17 at 14:26
    "I don't think this is an acceptable solution to the problem" - I disagree, this makes a lot of sense to me, and I would not prefer to have 2 mostly identical functions. It seems you only have objections because you did this not often enough to get used to it. Apply functional solutions more often, and your objections will vanish. – Doc Brown Nov 19 '17 at 15:06
    Please explain WHY you think parametrizing by the projection function is not an acceptable solution. Without that crucial information, this question is unclear and we can't tell you how to get over this or what you are not considering. – amon Nov 19 '17 at 17:56
  • Just to be clear, I am not against using a function as a parameter. What I was against was the dummy parameter. However using lambda capture it sounds like this isn't a problem anymore, as someone answered below. – Aulaulz Nov 20 '17 at 06:11
    It might help us to know what language you're writing in as features of the language could affect what patterns you may be able to use. – Danikov Nov 20 '17 at 16:24
  • Follow-up on @Danikov's comment: does your language support varargs or default parameters? That alleviates the dummy-parameter concern. – user949300 Nov 20 '17 at 18:57
  • I was more thinking about functional programming and/or streaming paradigms, they could significantly change the way I'd implement the given functionality – Danikov Nov 21 '17 at 09:57
    Unrelated to the question at hand, but check your loop order: to use rows as slices, the inner loop must be over width. That is also the most cache-friendly ordering in C/C++. – Wes Toleman Nov 28 '17 at 10:39

6 Answers

19

DRY absolutely does not mean "use the minimum number of lines possible" or "do not write code that looks like other code".

DRY refers to having code that does the same thing in two different places. But "the same" doesn't mean "the code looks the same"; it means "the code does the same conceptual task". How the code looks is irrelevant; what it does is what matters. Don't do the same thing in different places.

Sometimes you'll run into code that started out as "accidentally" the same, where two different concepts happened to have the same basic implementation, and then the programmer added extra conditions to handle the differences. This isn't a huge problem until the requirements start drifting and the code block becomes a growing pile of conditions mixed up between one task and another.

In the opposite direction, there are always exactly one bajillion different ways to express a given concept -- don't let that fool you into thinking that different code means a different concept. Just because it's implemented with lambda predicates in one place and a for loop with condition branches in another does not mean you are not repeating yourself.
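As a hypothetical illustration of that last point, here is the same conceptual task ("keep only the active users") written in two ways that look nothing alike, yet still repeat each other:

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

struct User { std::string name; bool active; };

// One place expresses "keep only the active users" with an explicit loop...
std::vector<User> activeUsersLoop(const std::vector<User>& users)
{
    std::vector<User> result;
    for (const User& u : users)
    {
        if (u.active)
            result.push_back(u);
    }
    return result;
}

// ...while another expresses the exact same concept with an algorithm and a
// lambda. Different-looking code, same conceptual task: still repetition.
std::vector<User> activeUsersCopyIf(const std::vector<User>& users)
{
    std::vector<User> result;
    std::copy_if(users.begin(), users.end(), std::back_inserter(result),
                 [](const User& u) { return u.active; });
    return result;
}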

jmoreno
  • So you are actually questioning if it makes sense to apply the DRY principle here? – Doc Brown Nov 20 '17 at 16:28
  • @DocBrown: Sorta. If he thinks he is going to end up with a lot of weird solutions, that sounds to me like he has confused near-identical code with identical concepts. Exactly identical code doesn't mean identical purpose, let alone nearly identical code. I don't think we have enough context to say for sure (and if we did, it would belong on another Stack Exchange site). So my answer is intended to let the OP decide whether the code violates DRY, not to say how best to make the code readable (whether it violates DRY or not). – jmoreno Nov 21 '17 at 02:01
  • Absolutely agree. "Sort of does the same thing" is not the same as "does exactly the same". – frostings Nov 21 '17 at 09:07
    Well, my impression is that the OP has a valid case for duplication (maybe he has simplified it too much to post it here), and there is no need to look for "excuses" for not removing the duplication. – Doc Brown Nov 21 '17 at 16:05
5

@JacquesB's answer is fine; however, there is an equivalent approach in C# if you want to keep the loop over all images in your generalized generateImage. In context A, just pass the lambda expression

 (x,y,image) => projectPixel(x,y)

as the function parameter into generateImage; then there will be no need to add a dummy parameter to projectPixel. This way, you won't even need a closure.
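In C++ (which the question's code seems closest to), the same adapter idea would look roughly like this (just a sketch, assuming the merged generateImage(width, height, images, projectPixelFunction) from the question; Image and generateImageA are placeholder names):

#include <vector>

// Sketch: choice A passes an adapter that matches the (x, y, image) shape the
// generalized generateImage expects, but simply ignores the image argument.
// Nothing is captured, so this is not even a closure.
void generateImageA(int width, int height, const std::vector<Image>& images)
{
    generateImage(width, height, images,
                  [](int x, int y, const Image&) { return projectPixel(x, y); });
}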

Doc Brown
4

You can avoid the ugly dummy parameter if you use a closure in B. Something like this (in C#, but you can translate to the language of your choice):

List<Pixel> generateImage(int width, int height, Func<int, int, Position> projectPixelFunction)
{
    List<Pixel> pixels = new List<Pixel>();
    for (int x = 0; x < width; x++)
    {
        for (int y = 0; y < height; y++)
        {
            Position pos = projectPixelFunction(x, y);
            Pixel p = someCrazyFunc(pos);
            pixels.Add(p);
        }
    }
    return pixels;
}

A:

void generateImage(width, height)
{
    List<Pixel> pixels = generateImage(width, height, projectPixel);
}

B:

void generateImage(width, height, numImages)
{
    List<Pixel> pixels = new List<Pixel>();

    for (int image = 0; image < numImages; image++)
    {
        pixels.AddRange(generateImage(width, height, (x, y) => projectPixel2(x, y, image)));
    }
}

I do agree this does not necessarily make the code easier to understand. I think what you really want (sketched below) is:

  • use a Position type rather than x, y
  • separate the looping from the processing of each individual pixel.
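In C++, that separation could look roughly like this (a sketch; generatePositions and processPixels are made-up names, while Position, Pixel and someCrazyFunc come from the question):

#include <functional>
#include <vector>

// Sketch: the looping only produces positions, and the per-pixel processing is
// a separate step that knows nothing about loops or projections.
std::vector<Position> generatePositions(int width, int height,
                                        std::function<Position(int, int)> projectPixelFunction)
{
    std::vector<Position> positions;
    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++)
            positions.push_back(projectPixelFunction(x, y));
    return positions;
}

std::vector<Pixel> processPixels(const std::vector<Position>& positions)
{
    std::vector<Pixel> pixels;
    for (const Position& pos : positions)
        pixels.push_back(someCrazyFunc(pos));
    return pixels;
}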
JacquesB
3

I might be missing something, but it feels as if projectPixel and projectPixel2 only exist separately because their parameters are different. Why not just abstract the parameters into an object? Then at least most of the code (from the for(x...) loop onwards) would be common.

So:

MyObj obj = new MyObj();
obj.X = x;
obj.Y = y;
obj.Image = image; // Optional

Then:

Position pos = projectPixel(obj);
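In C++ terms, that parameter object could be sketched like this (ProjectionInput and makePixel are hypothetical names; the image member is simply left null in the choice A case):

// Sketch: a parameter object that carries everything a projection might need.
struct ProjectionInput {
    int x = 0;
    int y = 0;
    const Image* image = nullptr;   // optional: stays null in the choice A case
};

// One projection function instead of projectPixel / projectPixel2:
Position projectPixel(const ProjectionInput& input);

// Shared per-pixel step, usable by both choices (pass nullptr for choice A):
Pixel makePixel(int x, int y, const Image* image)
{
    ProjectionInput input;
    input.x = x;
    input.y = y;
    input.image = image;            // left null for choice A
    return someCrazyFunc(projectPixel(input));
}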
Robbie Dee
3

I'm gonna tackle this question from a different angle and express a love for repeating myself, for repeating myself, for repeating myself, when it doesn't cause maintenance issues. It's easy for overzealous developers with their hearts in the right place to develop a hatred of repeating oneself to the extent that, in trying to fight a perceived evil, they create something just as bad or worse.

It's most intuitive for humans to see redundancy as the greatest and most immediate form of waste, yet it isn't necessarily so in computation: it isn't necessarily wasteful to do redundant arithmetic if it results in less branching, just as it isn't necessarily wasteful to duplicate some simple boilerplate here and there if it simplifies things.

And of course redundancy sometimes is genuinely wasteful, like a codebase that duplicates complex, highly error-prone code which wasn't even correct or properly tested in the first place, where you fix a bug in one place only to find that, instead of the issue being corrected centrally, the same bug is duplicated in 5 other places you have yet to fix, all applying the exact same logic. Likewise, you might have an application that allocates and frees many chunks of memory ten times more often than it needs to, and that type of computational redundancy can be far from trivial. Yet these are extreme cases where complexity is being duplicated, not simplicity.

Further, as jmoreno pointed out, just because code looks the same doesn't mean it's repetitive. DRY is most prone to being abused if you try to apply it at a basic syntactical level, where you can end up fighting the tools and the language you're using and get tempted to write the most exotic code that you'll come to hate years later. Still, even when you apply DRY to eliminate high-level, logical redundancies, not all logical redundancies are so bad, as I see it.

If I had the choice between a third-party library providing some image operations that duplicates some mathematical functions, like vector and matrix operations, and one that depends on a separate, huge mathematical library, I'd prefer the code duplication for things like vector/matrix multiplication and matrix transposition in favor of having a nice, independent, self-contained image library with the minimum number of external dependencies. In that case, code duplication can become a decoupling mechanism and a way to achieve isolated minimalism. That's provided it is well-tested, it works, and it isn't the type of buggy, inefficient, or narrowly applicable code that constantly needs to be modified.

If it's stable and works great, then I don't care if it duplicates hundreds of lines of code that are offered elsewhere in some other section of my codebase. There's timeless code like that, such as image routines I have for parallelized, vectorized convolution filters which I haven't needed to update in over a decade. That might not have been the case if, instead of duplicating a small amount of code to make the library more independent, it had relied on a central math library, a central image library, and so on, all of which could have needed updates or replacements and could have become obsolete in ways this small, independent library has not. It's nice to have some isolated code here and there which is complex but stands the test of time, remaining widely applicable without requiring any further changes even as the surrounding code ages, requires changes, and drifts towards obsolescence. That requires such code to be isolated away from unstable code that keeps wanting further changes, and that isolated quality tends to imply some modest duplication for the code to fully achieve its independence.

Anyway, I'm playing devil's advocate a bit here by trying to make a case in favor of some code duplication, but that might help balance your mindset towards one that isn't tempted to zealously stamp out every single type of redundancy you find in your system, which could be even worse than accepting some harmless redundancy here and there. I never cared for the term "overengineering", but there are bigger priorities in software engineering, like making sure that things are thoroughly tested to minimize the reasons for them to change, over making sure they reuse as much code as possible (which could actually maximize the reasons for them to change -- maximizing instability).

Further, if you are dealing with image processing, there are some characteristics that typically emerge:

  1. Efficiency tends to be a big factor, considering that you're typically looping over millions of pixels. When efficiency is a big enough concern, flatter code, even if it's not the most syntactically elegant, tends to be easier to optimize and to work with against a profiler. Likewise, optimizing often requires making trade-offs that make common-case execution paths more efficient in exchange for making rare-case paths less efficient. It's easier to skew the code and make those trade-offs if the changes apply only to the particular operation or related operations you are profiling, and not to some central library of highly reused code. Such a central library, trying to juggle disparate needs, might not share the common-case requirements of the image operation(s) you are optimizing.
  2. Image processing generally isn't the type of thing that causes very difficult, incomprehensible bugs. Image operations can be complex and sometimes difficult to write the first time around, but they're self-contained and, once tested to a decent degree, tend to work just fine without tricky edge cases and gotchas. As a result, it isn't typically the type of field where you have to worry about creating the most maintainable code possible for, say, an image filter, since once the filter works, it tends to have few reasons to change, and few or no reasons to change for correctness. You're typically not coming back to the code you wrote time and time again. If your testing is good, you can apply a checklist mindset and just move from writing one image operation to the next, typically without going back and forth all over the place.

Given these two characteristics, there are even more reasons not to get too obsessed with DRY. Get the image operations done, test them, make sure they're efficient enough and work well, and call it a day. If they required writing a bit more code than conceptually necessary and don't implement things in the most elegant way possible, but are still fast and work well in testing, it's not the end of the world. The two most common reasons to revisit old image-processing code are efficiency (so it might pay to benchmark, profile, and tune in advance in favor of greater stability) or a dependency on external types and functions from libraries that became obsolete (which some modest code duplication and relying on simple PODs can actually guard against, helping you reach a more stable solution with fewer reasons to change).

0

A different way to write this without repeating too much is to write choice A like this:

void generateImage(image, width, height, projectionFunc)
{
    vector<Pixel> pixels = image.getPixels();

    for(x = 0; x < width; x++)
    {
        for(y = 0; y < height; y++)
        {
            Position pos = projectionFunc(x, y);
            Pixel p = someCrazyFunc(pos);
            pixels.add(p);
        }
    }
}

Then write choice B like:

void generateImages(width, height, numImages)
{
    for(image = 0; image < numImages; image++)
    {
        generateImage(image, width, height, (x, y) => projectPixel2(x, y, image));
    }
}

Now the common stuff is in a single function, and you don't loop over a single image in the Choice A case.

user1118321