So I'm responsible for maintaining and enhancing a PHP codebase that runs to around 1,100 files and 120,000 lines of code.
In general, it's not terrible, although since it was written over some 10+ years starting in the PHP 4 era, some parts are more terrible than others. However, this entire codebase has a grand total of two unit tests, one of which I wrote yesterday.
All the persistent data is stored in a database.
If the codebase has one big, obvious problem, it's the giant 1700-line "god object" that sits in the middle of everything and acts as combination class factory, database abstraction, and session data storage.
The god object then has different factories depending on whether it's being called from a web page (used by end users), from the command line (used by operators), or from an operational script responsible for interfacing with the rest of the world to actually make things happen based on what users and operators do.
So it looks a little like this mockup example: https://gist.github.com/jdavidlists/f2abeaefa58a9a823f71bb42fbd3f49d
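For readers who don't want to follow the link, here's a condensed illustration of the shape (this is not the actual gist code; all names are invented): one class acting as class factory, database abstraction, and session store at once.

```php
<?php

class House
{
    public function __construct(public int $id, private God $god) {}
}

class God
{
    private array $session = [];

    // Role 1: class factory (every model object gets a back-reference)
    public function newHouse(int $id): House
    {
        return new House($id, $this);
    }

    // Role 2: database abstraction (stubbed out here)
    public function query(string $sql, array $params = []): array
    {
        return [];  // the real version talks to the database
    }

    // Role 3: session storage
    public function set(string $key, mixed $value): void
    {
        $this->session[$key] = $value;
    }

    public function get(string $key): mixed
    {
        return $this->session[$key] ?? null;
    }
}
```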
Where this turns ugly is that in addition to the Houses table in the database, there are another dozen or so house-related tables (Doors, Windows, Bookcases, Books, Rooms, Cabinets, Furniture, etc.), all with their own WhateverTable and WhateverRow classes. And there's also a Streets table. And a Sidewalks table. And a Driveways table (and of course a house can have more than one driveway, and a driveway can be built where there's not a house yet, and a house's driveway can connect to more than one street, and a driveway might connect to more than one house). And there's a Cars table. And an Owners table. And Streets, Driveways, Cars, and Owners all have another dozen or so related tables.
What I'm getting at here is that the system being modeled is an existing real world system that is insanely complex and full of interdependencies, many-to-many relationships, special cases and exceptions that have to be accommodated.
For example, to add a Door to the House, you'll wind up:
- Creating a BuildingPermit object.
- Creating a Door object.
- Possibly creating a Sidewalk object. (Or a Driveway object if it's a GarageDoor subclass.)
- Adding a Door-to-House relationship.
- Adding a Door-to-Sidewalk relationship.
- Possibly adding Sidewalk or Driveway relationships to one or more Streets.
- Modifying (or possibly adding) a Room object.
- Waiting for the BuildingPermit to reach the Approved state.
- Issuing an asynchronous command to have the actual door built.
- Waiting for that to finish.
Of course, any of these steps might fail, which means they all have to be rolled back, so your add-a-door method (in whichever class you decide all this actually belongs), sitting at the top of the stack, has to be aware of transaction safety at the database level all the way at the bottom. And the is-this-valid tests include some 2 KiB SQL queries with 8 joins, 2 subqueries, and a HAVING condition that make the tester in you go home early, hide in a closet, and sob.
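Stripped of the real SQL, the rollback dance amounts to running steps and remembering how to undo them. The UnitOfWork below is a hypothetical sketch (not code we actually have) of that shape:

```php
<?php

// Hypothetical sketch: run each step; on failure, run the completed
// steps' compensating actions in reverse order -- the same shape as
// the add-a-door sequence above, minus the database.
class UnitOfWork
{
    private array $undo = [];

    /** Run $do; if it succeeds, remember $undo for rollback. */
    public function step(callable $do, callable $undo): void
    {
        $do();
        $this->undo[] = $undo;
    }

    public function rollback(): void
    {
        foreach (array_reverse($this->undo) as $undo) {
            $undo();
        }
        $this->undo = [];
    }
}

// Usage: the add-a-door flow becomes a list of step/undo pairs.
$log = [];
$uow = new UnitOfWork();
try {
    $uow->step(function () use (&$log) { $log[] = 'permit'; },
               function () use (&$log) { $log[] = 'void permit'; });
    $uow->step(function () use (&$log) { $log[] = 'door'; },
               function () use (&$log) { $log[] = 'remove door'; });
    $uow->step(function () { throw new RuntimeException('permit denied'); },
               function () {});
} catch (RuntimeException $e) {
    $uow->rollback();   // undoes 'door', then 'permit'
}
```

The catch is exactly the problem described above: something at the top of the stack still has to own the whole transaction.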
Currently, this is handled by passing the God object around all over the place. The theory is that each object can do its bit and pass off to the next object. The practice is that every object is now aware of and dependent on every other object. There's almost no such thing as encapsulation because the relationships between the real-world objects being modeled are so tight and interdependent.
All the dependency injection and inversion-of-control material I see says things like "If your House needs a Door, pass the Door in the constructor and have a HouseFactory. That way you can pass a mock Door or NULL for testing." Which is great for a simple one-house : one-door model. But in this case, the user (or the factory) has no idea how many doors the house might have, as that's buried in the database.
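Concretely, the advice seems to map onto injecting something that can fetch the doors rather than the doors themselves, so the count can stay in the database while tests supply a fake. A sketch (all names hypothetical), which still leaves open where the repository itself comes from:

```php
<?php

// Inject a repository interface instead of pre-built Door objects.
interface DoorRepository
{
    /** @return string[] door identifiers for the given house */
    public function doorsFor(int $houseId): array;
}

class House
{
    public function __construct(
        private int $id,
        private DoorRepository $doors   // real implementation queries the DB
    ) {}

    public function doorCount(): int
    {
        return count($this->doors->doorsFor($this->id));
    }
}

// In a test, an in-memory fake stands in for the database-backed repository:
$fake = new class implements DoorRepository {
    public function doorsFor(int $houseId): array
    {
        return ['front', 'back', 'garage'];
    }
};
$house = new House(42, $fake);
```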
Given the above, it's easy to see how things evolved to their current state over a long period of time. But what's less clear is what to do about it now.
The good news is this code works remarkably well, all things considered. The bad news is that doing a total rewrite is out of the question.
It is possible to make big changes, eventually. It's just really slow and requires backwards-compatibility code loaded down with "trigger_error('Don\'t do this anymore.', E_USER_DEPRECATED);" calls. Using this approach, I've already managed to consolidate the three nearly-but-not-quite-identical database abstractions down to just one, which was a big project and a big win, with minimal disruption.
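The shim pattern in question looks something like this (LegacyApi/NewApi are illustrative names, not our real classes): the old entry point keeps working but flags itself while forwarding to the replacement.

```php
<?php

class NewApi
{
    public function fetchHouse(int $id): array
    {
        return ['id' => $id];  // stand-in for the real lookup
    }
}

class LegacyApi
{
    /** @deprecated use NewApi::fetchHouse() */
    public function getHouse(int $id): array
    {
        trigger_error(
            'LegacyApi::getHouse() is deprecated; use NewApi::fetchHouse().',
            E_USER_DEPRECATED
        );
        return (new NewApi())->fetchHouse($id);
    }
}

// Callers see identical behavior, plus a deprecation notice in the log:
$house = @(new LegacyApi())->getHouse(7);   // @ silences the notice for this demo
```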
Now I'd like to figure out where to go from here. Over time I would like to make things better and not worse. This seems likely to come in two flavors:
1) Modifications that make testing easier. (Read: possible.) And, of course, creating the tests.
2) When making enhancements, push things that are being touched in the right direction. I.e. slowly phase out use of the God object in favor of whatever the right answer is. (While making the God object pass things through to the new model so existing code continues to work.)
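For point 2, I picture the pass-through working something like this sketch (HouseService and the method names are invented): as responsibilities move out of the God object into focused services, the God object keeps its old method names but forwards, so untouched code keeps working while new code can ask for the service directly.

```php
<?php

class HouseService
{
    public function paint(string $color): string
    {
        return "painted $color";
    }
}

class God
{
    private ?HouseService $houses = null;

    // New code asks for the service directly and skips God entirely.
    public function houseService(): HouseService
    {
        return $this->houses ??= new HouseService();
    }

    // Old call sites still say $god->paintHouse(...); God just forwards.
    public function paintHouse(string $color): string
    {
        return $this->houseService()->paint($color);
    }
}

$god = new God();
```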
So, although it's not an option, if this were being written from scratch today, what would be the right approach for such a complex, interdependent, database-backed model? And then the real question is: what's the best way to safely ease things in that direction over time?
Note: Thus far, experiments with registry-based IoC approaches haven't fared well, because they wind up registering hundreds of classes on every page load, of which only a handful actually get used, and that hurts performance. At least with the God object, the thing sits in the opcache all day long, ready to go.
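One lazy variant I haven't tried yet (a sketch with hypothetical names, not one of the registries from those experiments) would register factory closures instead of instances, so a page load that uses 3 of 300 services only ever constructs those 3:

```php
<?php

// Registration stores a closure, which is cheap; construction happens
// on first get() and the instance is cached afterwards.
class LazyContainer
{
    /** @var array<string, callable> */
    private array $factories = [];
    /** @var array<string, object> */
    private array $instances = [];

    public function register(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;   // no object built yet
    }

    public function get(string $name): object
    {
        return $this->instances[$name]
            ??= ($this->factories[$name])($this);
    }
}

$built = 0;
$c = new LazyContainer();
$c->register('doors', function () use (&$built) {
    $built++;
    return new ArrayObject(['front', 'back']);
});
$c->register('windows', function () use (&$built) {
    $built++;
    return new ArrayObject([]);
});

$doors  = $c->get('doors');   // built on first use
$doors2 = $c->get('doors');   // cached; not rebuilt
```

Whether that actually beats the opcache-resident God object under real load is exactly the kind of thing I'd like opinions on.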
Thanks for any suggestions!