So I'm responsible for maintaining and enhancing a PHP codebase that runs to around 1,100 files and 120,000 lines of code.
In general, it's not terrible, although since it was written over some 10+ years starting in the PHP 4 era, some parts are more terrible than others. However, this entire codebase has a grand total of two unit tests, one of which I wrote yesterday.
All the persistent data is stored in a database.
If the codebase has one big, obvious problem, it's the giant 1700-line "god object" that sits in the middle of everything and acts as combination class factory, database abstraction, and session data storage.
The god object then has different factories depending on whether it's being called from a web page (used by end users), from the command line (used by operators), or from an operational script responsible for interfacing with the rest of the world to actually make things happen based on what users and operators do.
So it looks a little like this mockup example: https://gist.github.com/jdavidlists/f2abeaefa58a9a823f71bb42fbd3f49d
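For readers who don't want to follow the link, here's a condensed illustration of the shape (this is not the actual gist code; all names are invented): one class acting as class factory, database abstraction, and session store at once.

```php
<?php

class House
{
    public function __construct(public int $id, private God $god) {}
}

class God
{
    private array $session = [];

    // Role 1: class factory (every model object gets a back-reference)
    public function newHouse(int $id): House
    {
        return new House($id, $this);
    }

    // Role 2: database abstraction (stubbed out here)
    public function query(string $sql, array $params = []): array
    {
        return [];  // the real version talks to the database
    }

    // Role 3: session storage
    public function set(string $key, mixed $value): void
    {
        $this->session[$key] = $value;
    }

    public function get(string $key): mixed
    {
        return $this->session[$key] ?? null;
    }
}
```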
Where this turns ugly is that in addition to the Houses table in the database, there are another dozen or so house-related tables (Doors, Windows, Bookcases, Books, Rooms, Cabinets, Furniture, etc.), all with their own WhateverTable and WhateverRow classes. And there's also a Streets table. And a Sidewalks table. And a Driveways table (and of course a house can have more than one driveway, and a driveway can be built where there's not a house yet, and a house's driveway can connect to more than one street, and a driveway might connect to more than one house). And there's a Cars table. And an Owners table. And Streets, Driveways, Cars, and Owners all have another dozen or so related tables.
What I'm getting at here is that the system being modeled is an existing real world system that is insanely complex and full of interdependencies, many-to-many relationships, special cases and exceptions that have to be accommodated.
For example, to add a Door to the House, you'll wind up:
- Creating a BuildingPermit object.
- Creating a Door object.
- Possibly creating a Sidewalk object. (Or a Driveway object if it's a GarageDoor subclass.)
- Adding a Door-to-House relationship.
- Adding a Door-to-Sidewalk relationship.
- Possibly adding Sidewalk or Driveway relationships to one or more Streets.
- Modifying (or possibly adding) a Room object.
- Waiting for the BuildingPermit to reach the Approved state.
- Issuing an asynchronous command to have the actual door built.
- Waiting for that to finish.
Of course, any of these steps might fail, which means they all have to be rolled back, so your add-a-door method (in whichever class you decide all this actually belongs), sitting at the top of the stack, has to be aware of transaction safety at the database level all the way at the bottom. And the is-this-valid tests include some 2 KiB SQL queries with 8 joins, 2 subqueries, and a HAVING condition that make the tester in you go home early, hide in a closet, and sob.
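Stripped of the real SQL, the rollback dance amounts to running steps and remembering how to undo them. The UnitOfWork below is a hypothetical sketch (not code we actually have) of that shape:

```php
<?php

// Hypothetical sketch: run each step; on failure, run the completed
// steps' compensating actions in reverse order -- the same shape as
// the add-a-door sequence above, minus the database.
class UnitOfWork
{
    private array $undo = [];

    /** Run $do; if it succeeds, remember $undo for rollback. */
    public function step(callable $do, callable $undo): void
    {
        $do();
        $this->undo[] = $undo;
    }

    public function rollback(): void
    {
        foreach (array_reverse($this->undo) as $undo) {
            $undo();
        }
        $this->undo = [];
    }
}

// Usage: the add-a-door flow becomes a list of step/undo pairs.
$log = [];
$uow = new UnitOfWork();
try {
    $uow->step(function () use (&$log) { $log[] = 'permit'; },
               function () use (&$log) { $log[] = 'void permit'; });
    $uow->step(function () use (&$log) { $log[] = 'door'; },
               function () use (&$log) { $log[] = 'remove door'; });
    $uow->step(function () { throw new RuntimeException('permit denied'); },
               function () {});
} catch (RuntimeException $e) {
    $uow->rollback();   // undoes 'door', then 'permit'
}
```

The catch is exactly the problem described above: something at the top of the stack still has to own the whole transaction.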
Currently, this is handled by passing the God object around all over the place. The theory is that each object can do its bit and pass off to the next object. The practice is that every object is now aware of and dependent on every other object. There's almost no such thing as encapsulation because the relationships between the real-world objects being modeled are so tight and interdependent.
All the dependency injection and inversion-of-control material I see says things like "If your House needs a Door, pass the Door in the constructor and have a HouseFactory. That way you can pass a mock Door or NULL for testing." Which is great for a simple one-house : one-door model. But in this case, the user (or the factory) has no idea how many doors the house might have, as that's buried in the database.
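Concretely, the advice seems to map onto injecting something that can fetch the doors rather than the doors themselves, so the count can stay in the database while tests supply a fake. A sketch (all names hypothetical), which still leaves open where the repository itself comes from:

```php
<?php

// Inject a repository interface instead of pre-built Door objects.
interface DoorRepository
{
    /** @return string[] door identifiers for the given house */
    public function doorsFor(int $houseId): array;
}

class House
{
    public function __construct(
        private int $id,
        private DoorRepository $doors   // real implementation queries the DB
    ) {}

    public function doorCount(): int
    {
        return count($this->doors->doorsFor($this->id));
    }
}

// In a test, an in-memory fake stands in for the database-backed repository:
$fake = new class implements DoorRepository {
    public function doorsFor(int $houseId): array
    {
        return ['front', 'back', 'garage'];
    }
};
$house = new House(42, $fake);
```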
Given the above, it's easy to see how things evolved to their current state over a long period of time. But what's less clear is what to do about it now.
The good news is this code works remarkably well, all things considered. The bad news is that doing a total rewrite is out of the question.
It is possible to make big changes, eventually. It's just really slow and requires backwards-compatibility code loaded down with "trigger_error('Don\'t do this anymore.', E_USER_DEPRECATED);" calls. Using this approach, I've already managed to consolidate the three nearly-but-not-quite-identical database abstractions down to just one, which was a big project and a big win, with minimal disruption.
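The shim pattern in question looks something like this (LegacyApi/NewApi are illustrative names, not our real classes): the old entry point keeps working but flags itself while forwarding to the replacement.

```php
<?php

class NewApi
{
    public function fetchHouse(int $id): array
    {
        return ['id' => $id];  // stand-in for the real lookup
    }
}

class LegacyApi
{
    /** @deprecated use NewApi::fetchHouse() */
    public function getHouse(int $id): array
    {
        trigger_error(
            'LegacyApi::getHouse() is deprecated; use NewApi::fetchHouse().',
            E_USER_DEPRECATED
        );
        return (new NewApi())->fetchHouse($id);
    }
}

// Callers see identical behavior, plus a deprecation notice in the log:
$house = @(new LegacyApi())->getHouse(7);   // @ silences the notice for this demo
```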
Now I'd like to figure out where to go from here. Over time I would like to make things better and not worse. This seems likely to come in two flavors:
1) Modifications that make testing easier. (Read: possible.) And, of course, creating the tests.
2) When making enhancements, push things that are being touched in the right direction. I.e. slowly phase out use of the God object in favor of whatever the right answer is. (While making the God object pass things through to the new model so existing code continues to work.)
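For point 2, I picture the pass-through working something like this sketch (HouseService and the method names are invented): as responsibilities move out of the God object into focused services, the God object keeps its old method names but forwards, so untouched code keeps working while new code can ask for the service directly.

```php
<?php

class HouseService
{
    public function paint(string $color): string
    {
        return "painted $color";
    }
}

class God
{
    private ?HouseService $houses = null;

    // New code asks for the service directly and skips God entirely.
    public function houseService(): HouseService
    {
        return $this->houses ??= new HouseService();
    }

    // Old call sites still say $god->paintHouse(...); God just forwards.
    public function paintHouse(string $color): string
    {
        return $this->houseService()->paint($color);
    }
}

$god = new God();
```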
So, although it's not an option, if this were being written from scratch today, what would be the right approach for such a complex, interdependent, database-backed model? And then the real question is: what's the best way to safely ease things in that direction over time?
Note: Thus far, experiments with registry-based IoC approaches haven't fared well, because they wind up registering hundreds of classes on every page load, of which only a handful actually get used, and that hurts performance. At least with the God object, the thing sits in the opcache all day long, ready to go.
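One lazy variant I haven't tried yet (a sketch with hypothetical names, not one of the registries from those experiments) would register factory closures instead of instances, so a page load that uses 3 of 300 services only ever constructs those 3:

```php
<?php

// Registration stores a closure, which is cheap; construction happens
// on first get() and the instance is cached afterwards.
class LazyContainer
{
    /** @var array<string, callable> */
    private array $factories = [];
    /** @var array<string, object> */
    private array $instances = [];

    public function register(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;   // no object built yet
    }

    public function get(string $name): object
    {
        return $this->instances[$name]
            ??= ($this->factories[$name])($this);
    }
}

$built = 0;
$c = new LazyContainer();
$c->register('doors', function () use (&$built) {
    $built++;
    return new ArrayObject(['front', 'back']);
});
$c->register('windows', function () use (&$built) {
    $built++;
    return new ArrayObject([]);
});

$doors  = $c->get('doors');   // built on first use
$doors2 = $c->get('doors');   // cached; not rebuilt
```

Whether that actually beats the opcache-resident God object under real load is exactly the kind of thing I'd like opinions on.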
Thanks for any suggestions!