51

Background

I'm working in a team that's looking to implement zero-downtime deployments. We're planning on using a blue/green deployment strategy in order to achieve this. One of the things I'm realising in doing the research is how complicated it becomes to make database changes. A simple operation like renaming a column can take 3 full release cycles until it's completed!

It seems to me that having the full rollout of a change take multiple release cycles introduces a lot of potential for human error. In the linked article it shows that code changes are necessary for 2 releases and a database migration is needed for 3 releases.

What I'm looking for

Currently, if we want to remember to do something, we can create a ticket in our issue management system, which creates clutter and also might get moved to a later sprint or the backlog by management; or we can create a TODO comment, which will probably be forgotten about completely.

What I'm looking for is a way that a TODO comment can have a deadline against it, and our Continuous Integration system (current undecided which we'll use) would reject the build if this deadline was expired.

For example, if we rename a column we could create the initial migration for it, and then two TODO comments to ensure that the remaining two migrations are created:

// TODO by v55: Create migration to move constraints to new column, remove references to old column in app
// TODO by v56: Create migration to drop old column

This seems fairly simple to implement, but I'm wondering if something like this already exists, because I don't want to re-invent the wheel.

Additional thoughts

I feel like I might be suffering from XY problem here, given that rolling deployments and blue/green deployments are considered a best-practice it seems strange that I can't find a solution for making database updates less painful. If you think I'm looking into the wrong thing entirely, please let me know in a comment! That said, the database example I gave is just one example, and I think TODO comments with due dates would be useful in other situations too, so even if I'm approaching this specific situation all wrong I'd really like to answers to my actual question too. Thanks!

EDIT: I just thought of another situation where this could be helpful. If you use Feature Toggles to turn on parts of your app when they are ready, you have to be careful to clean them up, otherwise you may end up with Toggle Debt. Comments with deadlines could be a good way of remembering this.

Joshua Walsh
  • 565
  • 4
  • 8
  • 19
    The TODO issue is more a matter of discipline than tooling. – Brandon May 30 '17 at 00:46
  • 16
    I think humans all make mistakes, and tooling can be a good way of mitigating this. – Joshua Walsh May 30 '17 at 00:47
  • 1
    Don't http://blog.schauderhaft.de/2014/02/16/your-code-aint-your-to-do-list/ – Jens Schauder May 30 '17 at 05:35
  • 1
    what's the question? are you asking for tooling recommendation? – cat May 30 '17 at 14:13
  • Are you worried that step 3 will be deployed before step 2, or that you will deploy step 2, your managers will say "yay it works", and step 3 will never happen? – user3067860 May 30 '17 at 17:18
  • 3
    What about doing this programmatically like. I mean write in a class member your version. Fail the app from starting if version is equal == 56 with message "class y" need to have this feature you can have a list of such messages. What do you think? – Tomer Ben David May 30 '17 at 18:12
  • 6
    Directed at the nay-sayers, I disagree: Our code base relies on a lot of other components which we don't work on, so we use `TODO :` to track workarounds for issues with other components. When a bug is cleared on one of those components, you can easily find & address the relevant workarounds. It does not replace an issue tracker, it makes it easier to maintain. – TemporalWolf May 30 '17 at 18:57
  • I've written a C macro in the past that parsed a `__DATE__` string into a `20170531` numerical format, so I could do `char TODO [20170601 - PARSE(__DATE__)]`. This failed as soon as the array size became negative. The chief challenge was that `__DATE__` is a string, and the months are _named_ not numbered (!). Had to do a compile-time hash of the month names. – MSalters May 31 '17 at 08:04
  • I have seen renaming a column on a large database take more then 24hrs! – Ian May 31 '17 at 17:14

7 Answers7

53

This question is really two questions in one.

Todo comments

Of all the ways to track action items, this is the worst. TODO comments are good during active work or as a way of suggestion to a maintainer, "here is something that could maybe be improved on in the future". But if you rely on TODO comments for getting work done, you're doomed to fail.

What to do about it

TODO comments are basically technical debt, so they should be handled like any other technical debt. Either tackle them right away, if you have time, or put them in the backlog so they can be tracked and prioritized.

Generally speaking, and this is totally opinionated and open for debate, TODO comments could be considered a code smell. If a TODO comment makes it as far as being checked into version control, you have to ask yourself, are you actually going to follow through on it right now? If not, that's ok. Just be honest with yourself and put it in the backlog.

How you manage this backlog comes down to business process, company politics, and perhaps some personal autonomy. But you still need a tracked and prioritized backlog to make sure it happens.

Database changes

Yes, database changes are tricky with a zero-downtime policy. Some tricks to help make it less painful:

Post-deploy process

Create a post-deploy process that runs as part of the same release. However you want it to work. On the last system I worked on, I designed a 4-phase deployment:

  1. preapp database scripts
  2. web apps
  3. postapp database scripts
  4. maintenance window database scripts

The idea was that wherever possible, we would put as much of the database changes into preapp as possible.

Postapp was reserved for the unusual cases where we needed to make incompatible schema changes. In those cases, preapp would make enough of a change to make the new application code compatible (maybe creating a temporary view for compatibility), and postapp would cleanup any such temporary artifacts.

Maintenance window phase was reserved for changes which truly required downtime or where the risk or cost of a live deployment was not worth it. For example, scripts that change massive amounts of data may need to lock an entire table.

Deploy frequently

If you deploy new releases frequently enough, you can reach a point where carrying a change across 2 or 3 releases is trivial. Long release cycles amplify the cost of database changes.

Brandon
  • 4,555
  • 19
  • 21
  • 18
    Todo comments are a terrible way to track and prioritize work. They are a valid way to explain why a half finished bit of code is flapping in the wind. In a perfect world no code ever does that. Meanwhile, in this one... – candied_orange May 30 '17 at 01:48
  • 6
    ... it's sometimes nice to have a way to track technical debt that no amount of deprioritizing from to boss can hide. Sure you'll get no credit for fixing it. Sometimes you fix it anyway. – candied_orange May 30 '17 at 01:54
  • 3
    So the strategy with post-app is that those migrations run once the app deployment has all gone successfully? What about the code? Let's say you're renaming a column from last_name to surname. Your old code uses last_name. You migrate the DB to add surname and change your code to use surname if it's available, otherwise last_name. After the deployment has been fully rolled out, you run the next migration drop the old last_name column. But your code still contains the code for last_name, which is now unused and therefore technical debt. How do you enforce cleaning this up? – Joshua Walsh May 30 '17 at 02:21
  • 3
    While managing action items in comments is indeed a terrible way, having the comments automatically create issues in a tracking system can be a good tool to not forget doing this because you are currently in the middle of coding something and don't want to hard switch context to the issue tracking system. – PlasmaHH May 30 '17 at 09:34
  • 1
    @YM_Industries: I'd assume that Brandon's deployment process typically involves setting up such temporary compatibility hacks on the database side in the "preapp" phase, and tearing them down in "postapp". So, for example, preapp might add a surname column alongside last_name, and set up triggers to make sure they stay in sync; postapp would then delete the now obsolete last_name column and the triggers. Or, depending on the DBMS one is using, it might be easier to do that with views or something; in any case, in this process, the temporary stuff lives on the DB side, not in the app. – Ilmari Karonen May 30 '17 at 09:39
  • 6
    IMHO this answer is missing the point. The OP asked for a solution where the CI notifies the team when an important clean-up was forgotten, without cluttering the "issue management system" (TODO comments were only an example, maybe not the ideal solution for this). The OP gave some good reasons why he does not want to use it herefore. Nevertheless this answer suggests to rely completely on the backlog, which is in case of the OP nothing but his "issue management system". So IMHO this answer ignores the core of the question and does not present a solution. – Doc Brown May 30 '17 at 10:33
  • @IlmariKaronen Precisely! – Brandon May 30 '17 at 11:07
  • @DocBrown I get what you're saying. I think my point was I recommend against trying use source code and the build process to manage action items. I don't have any advice on how to do it well. I don't believe it can be done well. (How would you get a reproducible build or handle hot fixes if suddenly your CI system is failing due to an old TODO comment?). My answer is basically saying, here is an alternative that should work much better. – Brandon May 30 '17 at 11:13
  • @Brandon: I currently don't have a good solution at hand as well, maybe JensSchauder's advice is the best here. – Doc Brown May 30 '17 at 11:33
  • 2
    The nice part about `TODO` comments is that you can flag them easily with an IDE or with SonarQube analysis and in Sonar they are treated as technical debt. – Archimedes Trajano May 30 '17 at 17:47
  • I do agree with most of the answer; only will contribute that the way a team I worked with has made sure they are recorded. All TODO have a user story # associated. That allows to link the code, with real work to be prioritized; then, before closing a user story, you search all the code base for any reference to it. – Bruno Guardia May 30 '17 at 19:10
  • @IlmariKaronen Interesting, we don't currently use triggers in our database but I can see how these would be one way to remove the need for code in our app. Unfortunately the migration system we use, Phinx, doesn't seem too well geared for setting up and pulling down triggers, it's less than convenient. Thanks for your well thought out comment though! – Joshua Walsh May 31 '17 at 10:15
24

Do not use TODOs. You already have a TODO list in your project. It's called the issue tracker.

I think the real problem is in this sentence:

we can create a ticket in our issue management system, which creates clutter and also might get moved to a later sprint or the backlog by management.

If your issue tracker creates to much clutter, find ways to fix that. Maybe a special issue type/tag that involves less ceremony. Maybe sub-issues. Maybe less ceremony altogether. We can't really tell. But if your issue tracker creates so much work, that people rather formulate an elaborate question on a public forum than to just add that issue, something is seriously wrong.

If your management unduly delays the last part of a task you have two options:

  1. talk to your management why this is a bad idea.

  2. handle it as a single task. This might be the gold standard solution. In a perfect world you should be able to make the three changes needed in each step. Apply one to the master branch, let it build and deploy. In the meantime apply the second to the master branch, let it build and deploy and so on so that everything happens in the same sprint, and if it doesn't it's not done. Maybe even something automatic makes sense where you logically do one deployment, but it is actually split into 3.

Jens Schauder
  • 1,503
  • 8
  • 13
  • Good advice, I'll have a think about ways that we can make the issue management system work for us for this purpose. I also really like the idea of "Maybe even something automatic makes sense where you logically do one deployment", I'm trying to think of ways we can do that. I'm not sure it's realistically possible though. – Joshua Walsh May 30 '17 at 06:13
  • 11
    It's entirely reasonable to have comments of the form `// TODO(#12345): Frobnicate the sprocket before passing it along`, provided that bug #12345 is a "real" issue number and the issue is assigned to somebody. This makes the source easier to read by clarifying: "No, the frobnicate step is not hiding in one of the helper methods, it's just flat-out unimplemented. Go look at bug #12345 for more context." Ideally, you'd have a daily linter run over the codebase looking for closed or invalid issue numbers, of course. – Kevin May 30 '17 at 19:47
9

What I'm looking for is a way that a TODO comment can have a deadline against it, and our Continuous Integration system (current undecided which we'll use) would reject the build if this deadline was expired.

What your asking for is doable if you're willing to do the work and follow through.

// TODO by v55: Create migration to move constraints to new column, remove references to old column in app // TODO by v56: Create migration to drop old column

grep for //TODO by v55 when it's time to deploy v55. Deploy build runs a script that does that as an integration test.

You can tie 55 into your version tracking or just prompt for it.

It gets interesting if you want to check for //TODO by v54 when doing 55. Rather then search the code base 55 times just search for //TODO by. Then filter that result for 1 to 55. Now 56 won't trigger a fail.

You might think "oh we won't need that. We'll fix these every time so long as we have the check". No. No you won't.

candied_orange
  • 102,279
  • 24
  • 197
  • 315
  • Yep, I'm comfortable with writing such a system myself if nothing is available, but I was wondering if an existing solution exists. – Joshua Walsh May 30 '17 at 00:45
  • 4
    If it does we don't do recommendations here. – candied_orange May 30 '17 at 00:47
  • Really? Based on https://softwareengineering.stackexchange.com/help/on-topic it seems my question is on-topic for this site. I can't find anything that suggests that examples of tools can't be provided. – Joshua Walsh May 30 '17 at 00:50
  • 3
    If there is a generic name for this kind of thing that can be provided but if you read the page you linked the line about recommendations points you to this: https://softwareengineering.meta.stackexchange.com/a/6487/131624 – candied_orange May 30 '17 at 01:01
  • 6
    To be clear, it's your comment that I'm objecting to rather than the whole question. – candied_orange May 30 '17 at 01:04
  • 1
    Thanks, I wasn't aware. I do think that the intention of that answer is more to prevent specific recommendations ("tool A is the best/my favourite") than to prevent any mentions entirely. ("tools A, B and C exist, up to you which you use") – Joshua Walsh May 30 '17 at 01:13
  • 2
    @YM_Industries SE sites tend to be self contained, recommandation are basically simple answers with links to external sites, or invite you to google it instead of a link but it's the same in the end. They may expire and become dead. So a question about recommendation is off-topic, however is someone wants to mention a tool as a complement of an answer or a simple comment, he can do it. – Walfrat May 30 '17 at 12:42
  • 2
    " I was wondering if an existing solution exists" - try asking us over at https://softwarerecs.stackexchange.com/ – Mawg says reinstate Monica May 31 '17 at 11:49
4

We had a very similar problem in our team. To solve this we wrote a static analysis check which handles these TODO's by checking the JIRA issue or Git Issue that they reference. Our build fails when the specified Issue moves past the "In Development" column.

Therefore we can comfortably have TODO's without worrying about them getting forgotten about.

I created an open-source implementation of this, in Java. Yes, a disclaimer is that I wrote this, but like I said it is completely open sourced and licensed.

The tool is called Westie and an example of the the Jira issue checker is on the README.md. See also the GitIssueAnalyser.

To prevent self promoting if you have any further questions, send me a message. If you decide to use it and have any suggestions, please raise any issues on github.

tjheslin1
  • 149
  • 1
  • 1
    That's cool! We use JIRA too, I might look into using this. It doesn't really resolve my concerns about creating clutter in our issue-management system, but at least it will guarantee that they can't be forgotten. – Joshua Walsh May 31 '17 at 10:18
  • @YM_Industries I'm glad. I'd be happy to accept any contributions or work on any issues raised. – tjheslin1 May 31 '17 at 17:02
4

Don't ToDo. Do it now.

TLDR: Write (and test) your DB scripts now, not later; just code them so their execution is contingent on DB version.

Example

For an example, let's imagine that you wish to change a column name from SSN to TaxID, a common requirement when going international.

To make this happen, maybe you will temporarily have both a TaxID and an SSN column. And while supporting both versions, you will have a trigger to update one from the other. But you don't want to keep that trigger around indefinitely, so later, when backward compatiblity is no longer needed, you want that trigger removed (and the SSN column dropped). We are going to code all that up front without any need for ToDo items.

In our example, we will be deploying build 102 (which has the new column) while maintaining compatibility with build 101 (which doesn't).

Here are the steps.

1. Set up versioning table

  1. Add a single table called Configuration with two columns, Name and Value.

  2. Add a row with a Name of "TargetVersion" and set the Value to the version of the new build to be deployed.

  3. Add a row with a Name of "CompatibleWith" and set the Value to the minimum version number with which the deployment must be compatible.

Inspect and update these rows before every deployment.

2. Modify deployment scripts

  1. Add a script that creates a new column of TaxID, side by side with SSN, and populates it from the SSN column. Enclose this code in an If statement that checks TargetVersion; if the target version is too low (i.e. the TaxID isn't needed yet), skip.

    SELECT @TargetVersion = TargetVersion FROM Configuration
    IF @TargetVersion < '102' THEN RETURN
    ALTER TABLE Customer ADD COLUMN taxID VarChar(12) NOT NULL
    UPDATE Customer SET TaxID = SSN
    
  2. Add a script that creates a trigger that populates TaxID when inserting or updating SSN and vice versa. Enclose this code in an If statement that checks the target version and compatible version; skip if TargetVersion is too low (the TaxID isn't needed) or if the CompatibleWith version is too high (the SSN field isn't needed).

    SELECT @TargetVersion  = TargetVersion,
           @CompatibleWith = CompatibleWith 
    FROM Configuration
    IF @TargetVersion  < '102' THEN RETURN
    IF @CompatibleWith > '101' THEN RETURN
    CREATE TRIGGER SSNAndTaxIDTrigger ON Customer etc.
    
  3. Add a script to remove the SSN column. Enclose in an If statement that removes the column only if the CompatibleWith version is high enough (SSN is no longer needed).

    SELECT @CompatibleWith = CompatibleWith FROM Configuration
    IF @CompatibleWith <= '101' THEN RETURN
    IF OBJECT_ID('SSNAndTaxIDTrigger') IS NOT NULL DROP TRIGGER SSNAndTaxIDTrigger
    IF EXISTS (SELECT * FROM syscolumns c JOIN sysobject o ON o.id = c.is WHERE o.Name = 'Custeomr' AND c.Name = 'SSN') BEGIN
        ALTER TABLE Customer DROP COLUMN SSN
    END
    

3. Testing

Be sure to test your deployment with any combination of Blue/Green version numbers that you wish to be able to support in production. You can test as soon as the code is ready, by manipulating the Configuration table in your QA environment.

4. In your deployment playbook

Add a step for an engineer to update the CompatibleWith version and TargetVersion rows. If you are deploying to Blue, set TargetVersion to Blue's version number and the CompatibleWith version to Green's version number; reverse them if you're deploying Green.

Pitfalls

It's OK for your deployment scripts to reference and rely on those version numbers held in that DB table. NOT runtime code.

If you start writing your runtime code to inspect version numbers, you are introducing a new level of complexity in your application that could potentially become a huge maintainability problem. Each runtime execution path must be tested; if you carry around these conditions going forward, QA will have to put together a matrix of pain to validate them with each and every release. My advice is to keep conditions like these in deployment scripts only.

The outcome of all this

In the end, you should be able to write all the code up front (and test it, too) without fear that it will be executing too early. Also, the code will clean up the backward-compatibility trigger when the time comes without you needing to worry about it further.

This way you can write and test all the code up front, when you're thinking about it, and you don't need to deal with those messy ToDo comments.

John Wu
  • 26,032
  • 10
  • 63
  • 84
  • I really like this approach, it's more elegant than ToDo comments. I thought of it shortly after asking this question and I was thinking about making another post asking about how best to implement this, but figured I'd do my own research first. The trick for us is that we're using Phinx for our database migrations, and it doesn't really support this. When I get time I'll look for a way to extend it to support this kind of workflow. This approach doesn't solve the issue of how to ensure that the backwards compatibility code gets removed from my app-tier, but it is elegant for the DB issue. – Joshua Walsh May 31 '17 at 10:24
1

You're getting a lot of pushback on your TODO idea, but I personally see no problem with it. In the end, the best (and easiest) way to make sure the migration goes into production is by failing a unit test if it doesn't. It will literally take you less than a minute to stub out an empty migration function that throws an exception if the version is 55 or more (or whatever the requirements are).

Then if you try to release it, you'll end up with a failed test, and someone will have to turn that exception into an actual migration code.

Eternal21
  • 1,584
  • 9
  • 11
  • 1
    Yep, I was ideally hoping to treat an expired TODO as a failed test. The amount of pushback against TODOs has surprised me a bit, I know they aren't a substitute for an issue management system but given the prevalence of TDD/BDD it's clear that there's no real issue with defining requirements in code and using code to enforce feature completion. – Joshua Walsh Jun 04 '17 at 12:48
-2

No one seems to be focusing on the root of his complaint which is the fact that database changes can take too many release cycles. He wants to carry on with his blue/green deployment schedule and the solution should already be there but unless I'm missing something his description seems to indicate that there is only one database which is shared by both systems. Not a true Blue/Green system if that is the case. Since it appears that the database is the long pole in the tent it should be duplicated as well so that no matter how long or how many release cycles it takes to implement the database changes on the offline system they don't go live until completed and fully tested. In the interim offline system scripts can keep the offline database fully updated daily.

  • 1
    Replicating the database in a blue/green deployment causes a lot of headache. When my prod env is somewhere between blue and green, (50% load distributed to each, for example) I need to have replication code keep both databases in sync, even if their schemas are different. From the research I have done it seems most people in the real world have a shared DB instance between their blue and green stacks. I don't see this as a major issue as long as database migrations are fairly swift. Blue/green stacks inherently need to share some resources, at the very least the load balancer/reverse proxy. – Joshua Walsh May 31 '17 at 10:21