Refactoring and documenting a big monolithic system

Question

I have started working on a new project and, to my surprise, it was written by a single developer with almost no tests. The few tests that exist are either buggy or fragile, throwing a lot of NullPointerExceptions whenever anything is changed. A release was also scheduled just after I started, so I hardly had time to understand the code base and its functionality.

Now that the release is done, I am thinking about ways to improve the code quality and my understanding of the system. I know there is going to be some heavy refactoring, from the database to the server code to the frontend. I also have to document things properly, both for future use and as part of my job.

My plan is to start with integration tests so that I get to know the use cases, and then refactor one piece at a time while writing proper unit tests. I am used to writing integration tests, but after some googling I got confused because there are acceptance tests too.

Given my situation, which of the above testing strategies is most helpful, and how should I proceed? Input from anyone who has been in the same position is welcome.

I see this sort of problem time and time again. Really the smartest approach in my humble opinion is to force bottlenecks in the program that obliges common program operations to have to call a single method, effectively allowing you to test / change program behavior by modifying the single method. If it is as large as you say it is, don't expect it to be fixed overnight either. Blood. Sweat. And Tears. — Neil, Oct 06 '17 at 10:02
Possible duplicate of [I've inherited 200K lines of spaghetti code -- what now?](https://softwareengineering.stackexchange.com/questions/155488/ive-inherited-200k-lines-of-spaghetti-code-what-now) — Ben Aaronson, Oct 06 '17 at 12:52
What does your copy of "Working Effectively with Legacy Code" tell you? — Doc Brown, Oct 06 '17 at 13:21
IMHO you should already have read it if you are a professional who has to maintain and evolve a large code base on a regular basis. If you do not currently have a copy, now would be a good opportunity to get one. — Doc Brown, Oct 06 '17 at 20:21

score 6 · Accepted Answer · answered Oct 06 '17 at 14:37

I would strongly advise you not to try to refactor this code.

You are in a no-win situation here: any new feature you add may introduce a bug in an existing feature for which there is no test. However, if you refactor that existing feature, you are just as likely to create bugs in other features.

If you sell refactoring to the business as "we need to spend some time making the system less buggy and more testable", then in three months you will end up with an even buggier system and a few more tests. The business will not be pleased!

Furthermore you will probably find that the requirements for the system are "whatever it does now". Writing tests for it will not be easy.

Your best bet is to get firm requirements for new features and make sure you write tests for them. For bugs in old features, you can ask: "Do you want me to write tests for that feature? What are the requirements for how it should work?"

Only once you are more familiar with the project should you start replacing bits as they are upgraded with new features.

Advanced user advice:

If you really want the product working 100%, then you absolutely need the business to buy in to a "big rewrite" project.

This goes well beyond the scope of the development team, as before you can start programming you will need to gather and understand both the original requirements of the system and also how it is actually used day to day.

You'll need to set up a separate test environment for the existing system. Assume that normal business-as-usual development will continue while the rewrite project is in progress.

Once you have these, you can start writing functional tests for the current system. Don't touch the code until these are done. You're bound to find a number of existing bugs, or shall we say differences between what people thought the requirements were and what the system does. These will require extensive meetings to resolve: maybe the system is right and the requirements are wrong, maybe they are both wrong, or maybe they are right but no longer what the business needs.
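As a rough sketch of what such a functional test can look like when the requirements are effectively "whatever it does now", one option is a characterization test: record what the current system returns for a set of inputs, then assert that the output never changes. Everything here is an assumption for illustration. `LegacyPriceCalculator` is a hypothetical stand-in for an untested legacy class, and the recorded rows are invented, not taken from any real system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an untested legacy class (not from the original project).
class LegacyPriceCalculator {
    // Undocumented legacy rule: 10% off (integer division) for 10 or more items.
    static long priceInCents(int quantity, long unitPriceInCents) {
        long total = quantity * unitPriceInCents;
        if (quantity >= 10) {
            total -= total / 10;
        }
        return total;
    }
}

class PriceCharacterization {
    // Each row: quantity, unit price in cents, output recorded from the current system.
    // The values are illustrative; in practice they come from running the real code.
    static final long[][] RECORDED = {
        {1, 250, 250},
        {9, 100, 900},
        {10, 100, 900},
        {12, 999, 10790},
    };

    // Returns a description of every input whose output no longer matches the recording.
    static List<String> mismatches() {
        List<String> failures = new ArrayList<>();
        for (long[] row : RECORDED) {
            long actual = LegacyPriceCalculator.priceInCents((int) row[0], row[1]);
            if (actual != row[2]) {
                failures.add("quantity=" + row[0] + " unitPrice=" + row[1]
                        + " expected=" + row[2] + " actual=" + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = mismatches();
        System.out.println(failures.isEmpty() ? "behaviour unchanged" : failures.toString());
    }
}
```

A test like this does not say whether the behaviour is correct, only that it has not changed, which is why the surprising rows are the ones worth taking to the business.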

Only once you have a full suite of functional tests all passing can you even think about changing the code. And remember, while you're doing that, another team is plugging away adding more features that you will have to duplicate on the "new system" before you can go live.

If you manage to do all this, overtake the business-as-usual team, and release, then you have a product with the exact same features it had at the start of the process. It's a very hard sell to make to a business if the product is basically working OK.

Good answer, but a "big rewrite" is not always necessary in the long term. An incremental rewrite may be possible, by changing as little as possible at once, and testing only the changed portions. It depends on how extensive the data dependencies are that exist across the program as a whole. — Frank Hileman, Oct 06 '17 at 21:20
essentially that is what I suggest. rewrite as features are upgraded — Ewan, Oct 06 '17 at 21:21
@Ewan Regarding your first paragraph, I don't think writing on top of the old system is an easy task; the code is really spaghetti and hardly follows OOP principles at all. — CodeYogi, Oct 09 '17 at 10:59
@Ewan What I was thinking is to write a couple of integration tests for the existing requirements, and to write unit tests for new features. I would then do the refactoring in parallel if I get spare time. — CodeYogi, Oct 09 '17 at 11:00
@Ewan And I hardly see anyone mentioning the use of version control; I think it would be a good idea to write some `git-hooks`. — CodeYogi, Oct 09 '17 at 11:02
If there are some obvious tests to write, it can't do any harm, but you need to focus on business value. I think we all assumed you already have the code in source control and a build server. — Ewan, Oct 09 '17 at 11:10
I think no one noticed the documentation part. How should I start documenting the project? — CodeYogi, Oct 15 '17 at 18:46
Same answer: don't. It's a no-win, crappy job that you won't get thanked for. No one reads docs anyway. — Ewan, Oct 15 '17 at 19:09

score 3 · Answer 2 · answered Oct 06 '17 at 10:27

We have all been there; I had to refactor some messy projects once. What I can tell you is not to underestimate how long it can take. Having the system tested first will definitely make your task easier.

The choice between integration and acceptance tests depends on how big your refactoring is. Acceptance tests are written to test the whole system. They are generally expressed as an example or a usage scenario. For example: as a user, I enter my username and password and press the login button. Then I should see the welcome screen.
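The login scenario above could be sketched as a given/when/then test along these lines. This is only an illustration under stated assumptions: `LoginApp` is a hypothetical in-memory facade invented for the example, whereas a real acceptance test would drive the actual UI or HTTP layer of the deployed system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical facade standing in for the whole application (an assumption for this sketch).
class LoginApp {
    private final Map<String, String> passwordsByUser = new HashMap<>();
    private String currentScreen = "login";

    void register(String username, String password) {
        passwordsByUser.put(username, password);
    }

    // Simulates typing the credentials and pressing the login button.
    void submitLogin(String username, String password) {
        boolean ok = password != null && password.equals(passwordsByUser.get(username));
        currentScreen = ok ? "welcome" : "login";
    }

    String currentScreen() {
        return currentScreen;
    }
}

class LoginAcceptanceScenarios {
    // Scenario: a registered user logs in and sees the welcome screen.
    static boolean registeredUserSeesWelcomeScreen() {
        LoginApp app = new LoginApp();          // given a running application
        app.register("alice", "s3cret");        // and a registered user
        app.submitLogin("alice", "s3cret");     // when she logs in with valid credentials
        return "welcome".equals(app.currentScreen()); // then she sees the welcome screen
    }

    // Scenario: a wrong password keeps the user on the login screen.
    static boolean wrongPasswordStaysOnLoginScreen() {
        LoginApp app = new LoginApp();
        app.register("alice", "s3cret");
        app.submitLogin("alice", "wrong");
        return "login".equals(app.currentScreen());
    }

    public static void main(String[] args) {
        System.out.println("valid login: " + registeredUserSeesWelcomeScreen());
        System.out.println("wrong password: " + wrongPasswordStaysOnLoginScreen());
    }
}
```

Because the scenarios only talk to the outermost surface of the application, they should keep passing no matter how the internals are reorganised.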

Once you have acceptance tests for all your features, you can start refactoring with a lot of confidence. You can split the project into different services or even change the programming language itself. Writing these tests from scratch can be very time-consuming if the system is big.

If you want to write integration tests for some of the functionality and then refactor, you are assuming that some parts of the project won't change. For example, if you want to test an endpoint, you assume that this endpoint will not change and that it will have the same interface after refactoring.

The decision depends on how much time you have and how messy the project is. What I suggest is to be realistic and not aim too high. Start with some integration tests, do the refactoring, and keep adding new features while you refactor. In the end, there is no perfect software anyway.

Good luck

score 1 · Answer 3 · answered Oct 06 '17 at 14:00

The prior answer is very good and should be followed, so I will try to cover the aspects of your question that are not strictly unit tests - the "and how to proceed?" part.

Ask yourself how well the project is organized. Is it properly divided into classes and functions that make sense? Are parent classes and interfaces used appropriately? Has unused or redundant code been removed? Is there a clear workflow as data moves through the program and is processed? Is the code well-commented, at both the individual function level and the higher class level to indicate what each class is used for? And is there good documentation like requirements and design plans? The project may be great or horrible in these regards; it all depends on the prior developer and the conditions under which they had to make the program. And even a good developer can make a poorly-implemented program if forced to work with changing requirements and limited time.

If you are using Visual Studio or an IDE with similar intellisense-like features, then your job is easier.

A first step is to put the project in source control, if it isn't already. That is most important. The next is to remove any code and/or variables that are unused. After that, read through at least a few of the classes and understand what each one does; add comments explaining the classes at a 'strategic' level if such comments don't exist, saying when the class is used and what it does for the overall program. Do this for several classes before you do anything else. Now you can start building unit tests for those classes. If any class seems bloated, you can consolidate or reorganize its private and internal functions, then maybe work on its public functions. If you see repeated or redundant code, you can break it out into separate classes/functions/etc. as needed.

After you go through enough classes, you will probably see several that could share a parent class or interface - you can make that interface, and update the references and calls as needed so they refer to the parent/interface where possible. This will probably allow you to remove many conditional statements that were used to handle the variation between the classes.
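A minimal sketch of that idea, assuming two hypothetical exporter classes that the calling code used to tell apart with a type check: once both implement a shared interface, the caller no longer needs the conditional. All names here (`ReportExporter`, `CsvExporter`, `LineExporter`, `ReportService`) are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Shared interface extracted from classes that turned out to do the same job.
interface ReportExporter {
    String export(List<String> rows);
}

// Joins the rows with commas.
class CsvExporter implements ReportExporter {
    @Override
    public String export(List<String> rows) {
        return String.join(",", rows);
    }
}

// Joins the rows with newlines, one row per line.
class LineExporter implements ReportExporter {
    @Override
    public String export(List<String> rows) {
        return String.join("\n", rows);
    }
}

// The caller depends only on the interface, so the former
// "if CSV do this, else do that" branch disappears.
class ReportService {
    private final ReportExporter exporter;

    ReportService(ReportExporter exporter) {
        this.exporter = exporter;
    }

    String render(List<String> rows) {
        return exporter.export(rows);
    }
}

class ReportDemo {
    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        System.out.println(new ReportService(new CsvExporter()).render(rows));
        System.out.println(new ReportService(new LineExporter()).render(rows));
    }
}
```

Adding another export format then means adding one class rather than touching every conditional that used to switch on the type.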

Finally, remember that (if the prior developer was any good at all), there is a reason that everything was done the way that it was done. Never assume that you know more than the person who originally wrote the program. Sometimes a segment may be poor because the developer didn't know a better way to program it, but sometimes it's because of a specific requirement or a quirk of the language or system.

Could you be more clear about which prior answer you're referencing, please? This will become increasingly vague as answers accumulate. — , Oct 06 '17 at 17:09
