Achieving Zero Downtime Deployment touched on the same issue but I need some advice on a strategy that I am considering.
Context
A web-based application with Apache/PHP for server-side processing and MySQL DB/filesystem for persistence.
We are currently building the infrastructure. All networking hardware will have redundancy and all main network cables will be used in bonded pairs for fault-tolerance. Servers are being configured as high-availability pairs for hardware fault-tolerance and will be load-balanced for both virtual-machine fault-tolerance and general performance.
My intent is that we can apply updates to the application without any downtime. I have taken great pains when designing the infrastructure to ensure that I can provide 100% uptime; it would be extremely disappointing to then have 10-15 minutes of downtime every time an update is applied. This is particularly significant as we intend to have a very rapid release cycle (sometimes one or more releases per day).
Network Topology
This is a summary of the network:
                    Load Balancer
           |----------------------------|
          /           /       \           \
        /           /           \           \
| Web Server | DB Server  | Web Server | DB Server  |
|-------------------------|-------------------------|
|   Host-1   |   Host-2   |   Host-1   |   Host-2   |
|-------------------------|-------------------------|
          Node A        \   /        Node B
            |             /             |
            |           /   \           |
  |---------------------|   |---------------------|
         Switch 1                  Switch 2
And onward to VRRP-enabled routers and the internet.
Note: DB servers use master-master replication
Suggested Strategy
To achieve this, I am currently thinking of breaking the DB schema upgrade scripts into two parts. The upgrade would look like this:
- Web server A (on node A) is taken offline; traffic continues to be processed by web server B (on node B).
- Transitional Schema changes are applied to the DB servers.
- Web server A's code base is updated, caches are cleared, and any other upgrade actions are taken.
- Web server A is brought online and web server B is taken offline.
- Web server B's code base is updated, caches are cleared, and any other upgrade actions are taken.
- Web server B is brought online.
- Final Schema changes are applied to the DB servers.
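As a sanity check on the ordering, the steps above can be modelled in a few lines of Python. This is only a sketch of the sequence, not a deployment script: it touches no real servers, and the names `COMPATIBLE`, `rolling_upgrade` and the "old"/"new" labels are my own placeholders. It assumes that old code can run against the old and transitional schemas, and new code against the transitional and final schemas.

```python
# Which schema states each code version can work against.
# This mapping is an assumption taken from the plan, not a measured fact.
COMPATIBLE = {
    "old": {"old", "transitional"},
    "new": {"transitional", "final"},
}


def rolling_upgrade():
    """Replay the planned steps and return a log of (step, online, schema)."""
    web = {"A": {"online": True, "code": "old"},
           "B": {"online": True, "code": "old"}}
    state = {"schema": "old"}
    log = []

    def record(step):
        online = sorted(name for name, s in web.items() if s["online"])
        # At least one web server must be serving traffic after every step.
        assert online, "no web server online during: " + step
        # Every live server must be able to use the current schema.
        for name in online:
            assert state["schema"] in COMPATIBLE[web[name]["code"]], step
        log.append((step, online, state["schema"]))

    web["A"]["online"] = False
    record("A offline")
    state["schema"] = "transitional"
    record("transitional schema applied")
    web["A"]["code"] = "new"
    record("A upgraded")
    web["A"]["online"] = True
    web["B"]["online"] = False
    record("A online, B offline")
    web["B"]["code"] = "new"
    record("B upgraded")
    web["B"]["online"] = True
    record("B online")
    state["schema"] = "final"
    record("final schema applied")
    return log


if __name__ == "__main__":
    for step, online, schema in rolling_upgrade():
        print(f"{step:30} online={online} schema={schema}")
```

The model only confirms that the order of steps never leaves zero servers online or a live server facing a schema it cannot use; it says nothing about what happens to in-flight queries while the schema is changing.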
'Transitional Schema' would be designed to establish a cross-version compatible DB. This would mostly make use of table views that simulate the old version schema whilst the table itself would be altered to the new schema. This allows the old version to interact with the DB as normal. The table names would include schema version numbers to ensure that there won't be any confusion about which table to write to.
'Final Schema' would remove the backwards compatibility and tidy the schema.
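To make the idea concrete, here is a minimal sketch of a transitional and a final schema change. I have used Python's built-in `sqlite3` as a stand-in for MySQL purely so that the example is self-contained; the table `users_v1`/`users_v2`, its columns and the trigger name are all made up for illustration. One difference to be aware of: SQLite views are read-only, so the sketch needs an `INSTEAD OF` trigger for old-style inserts, whereas MySQL treats some simple views as updatable, which I would still need to verify for each real table.

```python
import sqlite3


def run_demo():
    conn = sqlite3.connect(":memory:")

    # Version 1 schema, as the old code base expects it (hypothetical table).
    conn.executescript("""
        CREATE TABLE users_v1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO users_v1 (id, name) VALUES (1, 'alice');
    """)

    # Transitional schema: alter the real table to the v2 shape, then put a
    # view with the old name and the old columns in front of it.
    conn.executescript("""
        ALTER TABLE users_v1 RENAME TO users_v2;
        ALTER TABLE users_v2 ADD COLUMN email TEXT NOT NULL DEFAULT '';
        CREATE VIEW users_v1 AS SELECT id, name FROM users_v2;
        -- SQLite-specific: route inserts on the view to the real table.
        CREATE TRIGGER users_v1_insert INSTEAD OF INSERT ON users_v1
        BEGIN
            INSERT INTO users_v2 (id, name) VALUES (NEW.id, NEW.name);
        END;
    """)

    # Old code keeps reading and writing "users_v1" as before.
    conn.execute("INSERT INTO users_v1 (name) VALUES ('bob')")
    old_rows = conn.execute(
        "SELECT id, name FROM users_v1 ORDER BY id").fetchall()

    # New code talks to the versioned table directly.
    conn.execute(
        "INSERT INTO users_v2 (name, email) VALUES ('carol', 'carol@example.com')")
    new_rows = conn.execute(
        "SELECT id, name, email FROM users_v2 ORDER BY id").fetchall()

    # Final schema: remove the backwards-compatibility layer.
    conn.executescript("""
        DROP TRIGGER users_v1_insert;
        DROP VIEW users_v1;
    """)
    remaining = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name")]
    conn.close()
    return old_rows, new_rows, remaining


if __name__ == "__main__":
    for part in run_demo():
        print(part)
```

The sketch runs the schema statements one after another on a single connection, so it does not address what concurrent writers would see between the `ALTER TABLE` and the `CREATE VIEW`.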
Question
In short, will this work?
More specifically:
Will there be problems due to the potential for concurrent writes at the specific point of the transitional schema change? Is there a way to make sure that the group of queries that modify the table and create the backwards-compatible view are executed consecutively, i.e. with any other queries held in a buffer until the schema changes are complete (which will generally take only milliseconds)?
Are there simpler methods that provide this degree of stability whilst also allowing updates without downtime? I would also prefer to avoid the 'evolutionary' schema strategy, as I do not wish to become locked into backwards schema compatibility.