3

At the moment I am developing three scientific software projects (computer vision) in parallel in Python, and I am scratching my head about organization, clean code, and code that is easy to extend and maintain.

One question I have is whether to keep all three projects separate. They share similar functions, which I thought of collecting in a shared API (with many arguments). Or should I keep the projects entirely separate, which means I end up with redundant code, which is usually not recommended?

I have a function to stack images. This function is used in all three projects.

def stack_images(img_lst):
    # do some stacking of the img_lst with algorithm_max
    return stacked_image

Now, in one of the projects I need another algorithm for the stacking (let's call it algorithm_min), but the rest of the code inside the function will not change. Should I add the algorithm as an argument, or should I write three stack_images functions, one for each project, each of which can be easily adapted to the needs of its project?
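To make the first option concrete, here is a minimal sketch. It assumes that algorithm_max and algorithm_min can be represented by a per-pixel maximum and minimum (NumPy's np.max and np.min stand in for them), and the parameter name reduce_fn is hypothetical. The real algorithms may well need a different interface.

```python
import numpy as np


def stack_images(img_lst, reduce_fn=np.max):
    """Stack equally shaped images into one image.

    reduce_fn is a hypothetical parameter: any callable that accepts an
    array and an ``axis`` keyword, e.g. np.max (standing in for
    algorithm_max) or np.min (standing in for algorithm_min).
    """
    stack = np.stack(img_lst, axis=0)  # shape: (n_images, height, width, ...)
    return reduce_fn(stack, axis=0)    # collapse the image axis per pixel


# Two tiny 2x2 grayscale "images" for illustration
a = np.array([[1, 5], [3, 0]], dtype=np.uint8)
b = np.array([[4, 2], [3, 9]], dtype=np.uint8)

stacked_max = stack_images([a, b])                    # default behaviour
stacked_min = stack_images([a, b], reduce_fn=np.min)  # project-specific variant
```

With a default value, the two projects that only need the original behaviour keep calling stack_images(img_lst) unchanged, and only the third project passes a different algorithm.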

My fear with the first option: it is very likely that I will have to make small changes now and then, which means that in a year I might end up with many arguments.

My fear with the second option: if I change something fundamental in my function that concerns all three projects, I have to change the code in three places (and I am sure that sooner or later I will forget to do it for the other projects).

Second example: I have to switch the color spaces of my images all the time. This is one line of code in OpenCV:

# RGB to HSV 
hsv_image = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
# RGB to LAB 
lab_image = cv2.cvtColor(img, cv2.COLOR_RGB2LAB)

Now, should I write one function with the color space as an argument, or a separate function for each color space? Should I unit test such basic functions, or is that excessive?

snowflake
  • 1
    There are a few different questions here which do not seem related. You should probably split your question into multiple more focused questions instead. – JacquesB Feb 28 '17 at 14:43
  • 1
    It's good to see that you recognise you have a problem and you're working toward fixing it! =) Unfortunately you're asking a few different questions and they're a bit broad to answer specifically. I recommend you look into "version control software" and practices for your experimental code. There are ways to share functionality between projects in Python, but depending on the nature of your program and how generic that shared functionality is, it may be unnecessary. – Erdrik Ironrose Feb 28 '17 at 15:02
  • Ok, I see the point and I apologize. I will ask the other question separately and only concentrate on the shared code question. – snowflake Mar 01 '17 at 09:18
  • Note to anyone else who sees this in the close vote review queue: please read the question as I edited it since it was placed in the queue. It should no longer be too broad. –  Mar 08 '17 at 00:44
  • 1
    @snowflake by the way, you can never grow up. There is only room for one... Snowman... around here ;-) –  Mar 08 '17 at 00:45

2 Answers

2

Try to read up on agile methods (I liked "Continuous Delivery", but that might be a step too far for now).

If there is significant shared code, implement a shared back-end library. Probably have this under separate configuration control because sooner or later one of the projects is going to race ahead of the others. You want to be able to specify which version of the library you want to load.

An API with "many" arguments sounds like a bad idea. The "many" makes me nervous.

Separate test project for the experimental stuff?

Many small classes is the way to go. Unit test heavily. Automate system test as much as possible. Favour clean simple functions.
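As a rough illustration of a small, simple function with its own unit tests, the sketch below uses Python's built-in unittest module. The per-pixel maximum is only an assumption standing in for the question's algorithm_max, and the test class and method names are hypothetical.

```python
import unittest

import numpy as np


def stack_images(img_lst):
    """Per-pixel maximum over equally shaped images (assumed stand-in for algorithm_max)."""
    return np.max(np.stack(img_lst, axis=0), axis=0)


class StackImagesTest(unittest.TestCase):
    def test_takes_per_pixel_maximum(self):
        a = np.array([[1, 5], [3, 0]])
        b = np.array([[4, 2], [3, 9]])
        np.testing.assert_array_equal(stack_images([a, b]), [[4, 5], [3, 9]])

    def test_single_image_is_unchanged(self):
        a = np.array([[1, 2], [3, 4]])
        np.testing.assert_array_equal(stack_images([a]), a)

    def test_mismatched_shapes_raise(self):
        # np.stack refuses inputs whose shapes differ
        with self.assertRaises(ValueError):
            stack_images([np.zeros((2, 2)), np.zeros((3, 3))])
```

Saved in a test file, these can be run with `python -m unittest`. Tests this small cost little to write and flag an accidental change to shared code before it reaches all three projects.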

Don't go overboard on documentation. I'd favour a small overview document and little else but clean source code and copious unit tests.

You do have configuration control, don't you?

Nick Keighley
  • Thank you, your answer already brings some light into my darkness. I like the idea of Continuous Delivery because I have programming breaks now and then, and it would be very convenient for me to always leave the projects in a deployable state. Sorry about this dumb question, but I am not 100% sure what your question "You do have configuration control, don't you?" means. Could you quickly explain it to me? – snowflake Mar 01 '17 at 09:24
  • "Many small classes is the way to go." I see, at the moment I am using only functions in separate files (one file for one task, like stacking, stitching, prediction etc.) – snowflake Mar 01 '17 at 09:26
  • By "configuration control" I mean tools like Git. – Nick Keighley Mar 01 '17 at 10:56
  • OK, I just started to use GitHub (although I am not very confident with it yet). – snowflake Mar 01 '17 at 11:31
1

This is a common place to get tripped up when starting a project of new scope. Books like Code Complete and The Pragmatic Programmer help bring clarity to these situations. I would suggest looking at a method like "tracer bullets".

Essentially, don't try to make your project perfect. Aim for where you think you want to hit, and adjust the aim if you notice it needs adjusting. Especially when working in the sciences, you WILL have changing requirements throughout the project. Over-engineering a new project like this in its early stages will only set you up for wasted work and designing yourself into a corner.

Just get it working first, follow sane principles like keeping things simple and small, and refactor if things start getting hazy. KISS

Mike Lui