66

I was recently writing a small piece of code which would indicate in a human-friendly way how old an event is. For instance, it could indicate that the event happened “Three weeks ago” or “A month ago” or “Yesterday.”

The requirements were relatively clear and this was a perfect case for test driven development. I wrote the tests one by one, implementing the code to pass each test, and everything seemed to work perfectly. Until a bug appeared in production.

Here's the relevant piece of code:

now = datetime.datetime.utcnow()
today = now.date()
if event_date.date() == today:
    return "Today"

yesterday = today - datetime.timedelta(1)
if event_date.date() == yesterday:
    return "Yesterday"

delta = (now - event_date).days

if delta < 7:
    return _number_to_text(delta) + " days ago"

if delta < 30:
    weeks = math.floor(delta / 7)
    if weeks == 1:
        return "A week ago"

    return _number_to_text(weeks) + " weeks ago"

if delta < 365:
    ... # Handle months and years in similar manner.

The tests were checking the cases of an event happening today, yesterday, four days ago, a week ago, two weeks ago, etc., and the code was built accordingly.

What I missed is that an event can be one day ago without falling on yesterday's date: for instance, an event that happened twenty-six hours ago is one day ago, yet not yesterday if now is 1 a.m. More precisely, the delta is one point something days, but since it is an integer, it's just one. In this case, the application displays “One days ago,” which is obviously unexpected and unhandled in the code. It can be fixed by adding:

if delta == 1:
    return "A day ago"

just after computing the delta.
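To make the failure concrete, here is a minimal reproduction of the 1 a.m. scenario (the timestamps are arbitrary examples):

```python
import datetime

now = datetime.datetime(2018, 7, 13, 1, 0)        # 1 a.m.
event_date = now - datetime.timedelta(hours=26)   # 2018-07-11 23:00

yesterday = now.date() - datetime.timedelta(1)

# The event's date is the day *before* yesterday...
assert event_date.date() != yesterday
# ...yet the integer delta is 1, so the code falls through
# to the "N days ago" branch and prints "One days ago".
assert (now - event_date).days == 1
```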

While the only negative consequence of the bug is that I wasted half an hour wondering how this case could happen (and believing it had to do with time zones, despite the uniform use of UTC in the code), its presence troubles me. It indicates that:

  • It's very easy to commit a logical mistake, even in such simple source code.
  • Test driven development didn't help.

Also worrisome is that I can't see how such bugs could be avoided. Aside from thinking more before writing code, the only way I can think of is to add lots of asserts for the cases that I believe could never happen (like my belief that one day ago is necessarily yesterday), and then to loop through every second for the past ten years, checking for any assertion violation, which seems too complex.

How could I avoid creating this bug in the first place?

Arseni Mourzenko
  • 40
    By having a test case for it? That seems like how you discovered it afterwards, and meshes with TDD. – Οurous Jul 12 '18 at 21:44
  • 1
    @Οurous: the problem is that I discovered it through a bug report, which, from a perspective of a developer, is a failure. I suppose there should be a way to *change the process* to avoid this situation in the first place. The loop I mentioned in my question would work; formal proof would work as well, but both seem overly complex to use for every piece of logic. – Arseni Mourzenko Jul 12 '18 at 22:02
  • 2
    @ArseniMourzenko the root here is failure to produce readable code. Loops, proofs, or unit tests are only useful when they result in readable code. Don't avoid them because they are tedious. Avoid them when they aren't helping you make the code more readable. Readable code makes the bugs obvious. Use any dirty trick you can find to make the code readable. Pro tip: you are the worst judge of how readable your code is. So ask someone. – candied_orange Jul 12 '18 at 22:45
  • 6
    Writing unit tests is easy. Writing useful unit tests is another matter entirely. You only managed to do the first part, not the "useful" part which would require one to think about the edge cases more than the 'easy' cases. The 'easy' cases tend to require little thought. – Dunk Jul 12 '18 at 22:52
  • 64
    You've just experienced why I'm not a fan of test driven development--in my experience most bugs caught in production are scenarios that nobody thought of. Test driven development and unit tests do nothing for these. (Unit tests have value in detecting bugs introduced through future edits, though.) – Loren Pechtel Jul 13 '18 at 01:39
  • 8
    To me your case looks like "2 days ago" still. – max630 Jul 13 '18 at 03:38
  • 105
    Repeat after me: "There are no silver bullets, including TDD." There is no process, no set of rules, no algorithm you can robotically follow to produce perfect code. If there was, we could automate the whole process and be done with it. – jpmc26 Jul 13 '18 at 05:57
  • 46
    Congratulations, you rediscovered the old wisdom that no tests can prove the absence of bugs. But If you are looking for techniques to create a better coverage of the possible input domain, you need to make a thorough analysis of the domain, the edge cases and the equivalence classes of that domain. All the old, well known techniques long known before the term TDD was invented. – Doc Brown Jul 13 '18 at 06:03
  • 11
    Personally, you have a larger flaw in your algorithm here: You refer to "Today", but the date-math is done against UTC. No customer is going to be in UTC, so the "Today" designation is just going to appear to arbitrarily change, and be unrelated to my wall clock. I'd either just work off of 24-hour periods ("4 hours ago", "2 days ago"), or get the user's timezone and use that to construct what you do here. – Clockwork-Muse Jul 13 '18 at 07:11
  • 5
    I think what you gain from it is that, now you have lots of test cases for it, you are less likely to end up in an endless loop of fixing one bug and introducing a new one. These kinds of functions are very prone to that otherwise. – Teimpz Jul 13 '18 at 08:04
  • 6
    Please at least allow me to read the exact time – RiaD Jul 13 '18 at 09:22
  • 11
    This is a specification problem - not a code problem. The behavior for 26 hours ago wasn't well specified from the get-go. When you deliver such a feature ask your stakeholders for behaviours and use cases and gather the tests from those. It sounds like that step is missing. Not posting an answer since this is not a 'coding' suggestion but a human one. – Benjamin Gruenbaum Jul 13 '18 at 11:46
  • 83
    I'm not trying to be snarky, but your question could seemingly be rephrased as "how do I think of things I didn't think of?". Not sure what that has to do with TDD. – Jared Smith Jul 13 '18 at 12:12
  • 5
    I'm confused... what's the difference between "Yesterday" and "A day ago"? – FGreg Jul 13 '18 at 15:59
  • 13
    The biggest problem here is one you apparently still haven't thought of. Showing times in this way is NOT human-friendly (at least to this human), it's annoying. – jamesqf Jul 13 '18 at 17:11
  • 5
    @jamesqf. I agree. From my perspective, knowing that something happened "a day ago" is pretty much useless. I cannot think of a case where I would be able to actually **do** something useful with this information. – Michael J. Jul 13 '18 at 17:51
  • 10
    @LorenPechtel Your logic to justify the implied "no TDD > TDD" approach is flawed. The idea that it's "OK" to avoid TDD just because _some_ issues devs didn't think of made it to production is like saying that there's no point in checking both sides of the road before crossing because you could always get run over by a car you didn't see coming. What OP should learn here is not that "TDD is bad", but rather, and as others have mentioned, that it's not a silver bullet. TDD is still very useful in its context, especially when trying to check for dozens, or hundreds, of possible/past regressions. – code_dredd Jul 13 '18 at 21:32
  • 6
    Unit testing is about the ability to refactor code without introducing new bugs. It isn't about writing bug free code from scratch. – Shane Jul 13 '18 at 22:02
  • 15
    You know the solution -- you say `loop through every second for the past ten years, checking for any assertion violation`, and you then **dismiss the solution as being too complex**. You're a computer programmer. You're in the business of *managing complexity*. There are only 315 million seconds in ten years! I have a cheap machine on my desk right now that does **four billion operations per second**, so **get on it**. Write a program that tests every second for the last ten years and asserts if there's a problem, and then you'll know if there's a problem! – Eric Lippert Jul 13 '18 at 23:09
  • 5
    I write compilers for a living; the first thing I do when I'm trying out a new parser is *get an intern to write a program that generates a few hundred thousand test cases at random* and see what crashes. We have *tremendous resources* at our fingertips; don't shy away from using them! They are there for you to use. – Eric Lippert Jul 13 '18 at 23:12
  • 1
    @ray I'm not saying TDD is useless. I'm saying that TDD doesn't have anywhere near the value it's proponents say it does. Look at Shane's comment--he gets it about testing. It's about safe refactoring. – Loren Pechtel Jul 13 '18 at 23:18
  • 1
    @EricLippert Sure, a million test cases is no problem. Evaluating whether those million test cases are right isn't so simple. – Loren Pechtel Jul 13 '18 at 23:18
  • @LorenPechtel Shane's comment is right and you're also right that it's about safe refactoring. The problem is that that's not what your original comment was saying, IMHO. It came across more like a rant against TDD than anything else. I'm not going to defend whatever it is that TDD marketers say, but the _implication_ that it's probably not worth the effort or something similar (I know you didn't _directly_ state that it was useless) is what I was replying to. – code_dredd Jul 13 '18 at 23:24
  • @LorenPechtel - evaluating whether the large number of test cases "are right" is entirely simple, in both the case of the present question and the case of the parser (though it would be more accurate to speak of the code being right with respect to the test cases). In the case of the present question, the test cases "are right" if no assertions occur for any of them. In the case of the parser, the test cases "are right" if the parser does not crash when fed any of them. – Hammerite Jul 14 '18 at 22:05
  • @Hammerite If you're going to make up a million test cases you also need a million answers for them--and that means you have a piece of code that does what the code you're writing does. Unless you're porting the code that's unlikely. – Loren Pechtel Jul 14 '18 at 22:07
  • @LorenPechtel - that is not true if all you're trying to test is whether some piece of code blows up when you present it with each test case. As is the situation in both of the examples we are concerned with. There is no "answer" to each test case, merely the desire that each one should run benignly. – Hammerite Jul 14 '18 at 22:11
  • @Hammerite No, the problem was it produced the wrong message in a case he didn't think about. That's going to take either a working piece of code or human intelligence to create the test cases. – Loren Pechtel Jul 14 '18 at 22:19
  • @LorenPechtel - your message that I originally responded to was a response to Eric Lippert's comment about automatically generating test cases to test the stability of a parser, which he used as an analogy to the OP's idea (that they had dismissed out of hand) of testing their code against every second for the past ten years. This is unrelated to curated tests of what message should be produced by the software component under development by the OP in any particular case. That is not the kind of test that Lippert was discussing. – Hammerite Jul 14 '18 at 22:32
  • Let me ask: if you didn't use TDD, how many bugs would you find in production? TDD is a tool; your implementation is the solution! – O.Badr Jul 15 '18 at 05:28
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/80188/discussion-on-question-by-arseni-mourzenko-how-to-avoid-logical-mistakes-in-code). – yannis Jul 15 '18 at 13:27
  • It's not a question that should be answered, since it's predicated on false assumptions; "when TDD didn't help" - how do you know? You had a bug, granted, but that doesn't mean TDD didn't help. It doesn't mean that at all. – JᴀʏMᴇᴇ Jul 17 '18 at 09:22
  • One problem I see off the bat, is the fact that you are not mocking your timer, and using utcnow() instead. That means your test may not be reproducible. Instead you should mock it and return the same time for now. – Eternal21 Jul 17 '18 at 18:07

16 Answers

151

Test driven development didn't help.

It seems like it did help; it's just that you didn't have a test for the "a day ago" scenario. Presumably, you added a test after this case was found; this is still TDD: when bugs are found, you write a unit test to detect the bug, then fix it.

If you forget to write a test for a behavior, TDD has nothing to offer: the test never gets written, so the implementation for it never gets written either.
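Presumably the regression test looks something like the sketch below (the function name, the injected `now` parameter, and the simplified body are assumptions, not the original code):

```python
import datetime

def describe_age(event_date, now):
    # Simplified stand-in for the function under test; `now` is
    # injected so the test does not depend on the wall clock.
    today = now.date()
    if event_date.date() == today:
        return "Today"
    if event_date.date() == today - datetime.timedelta(days=1):
        return "Yesterday"
    delta = (now - event_date).days
    if delta == 1:
        return "A day ago"  # the fix the bug report prompted
    return "%d days ago" % delta

# The missing test: 26 hours ago, observed at 1 a.m.
now = datetime.datetime(2018, 7, 13, 1, 0)
assert describe_age(now - datetime.timedelta(hours=26), now) == "A day ago"
```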

esoterik
  • 3
    A point could be made that if the developer hadn't used TDD, they would have been much more likely to miss other cases as well. – Caleb Jul 13 '18 at 06:27
  • 77
    And, on top of that, think about how much time was saved when they were fixing the bug. By having the existing tests in place, they knew instantly that their change didn't break existing behavior, and they were free to add the new test cases and refactor without having to run extensive manual tests afterward. – Caleb Jul 13 '18 at 06:29
  • 15
    TDD is only as good as the tests written. – Mindwin Remember Monica Jul 13 '18 at 20:09
  • 1
    Another observation: adding the test for this case will improve the design, by forcing us to take that `datetime.utcnow()` out of the function, and instead to pass `now` as a (reproducible) argument instead. – Toby Speight Jul 16 '18 at 12:21
115

an event happening twenty six hours ago would be one day ago

Tests won't help much if a problem is poorly defined. You're evidently mixing calendar days with days reckoned in hours. If you stick to calendar days, then at 1 AM, 26 hours ago is not yesterday. And if you stick to hours, then 26 hours ago rounds to 1 day ago regardless of the time.
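The two reckonings give different answers for the very case in question; a quick sketch (timestamps arbitrary):

```python
import datetime

now = datetime.datetime(2018, 7, 13, 1, 0)    # 1 a.m.
event = now - datetime.timedelta(hours=26)    # 2018-07-11 23:00

# Calendar-day reckoning: subtract the dates.
calendar_days = (now.date() - event.date()).days   # 2 -> "two days ago"

# Hour-based reckoning: truncate the elapsed time to whole days.
elapsed_days = (now - event).days                  # 1 -> "a day ago"

assert calendar_days == 2 and elapsed_days == 1
```

Pick one reckoning in the specification and the ambiguity disappears.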

Kevin Krumwiede
  • 45
    This is a great point to make. Missing a requirement does not necessarily mean that your process for implementation failed. It just means that the requirement was not well defined. (Or you simply made a human error, which will happen from time to time.) – Caleb Jul 13 '18 at 06:23
  • This is the answer I wanted to make. I'd define the spec as "if event was this calendar day, present delta in hours. Else use dates only to determine delta" Testing hours is only useful within a day, if beyond that your resolution is meant to be days. – Baldrickk Jul 13 '18 at 07:52
  • 1
    I like this answer because it points out the real problem: points in time and dates are two different quantities. They are related, but when you start comparing them, things go south real fast. In programming, date and time logic is some of the hardest stuff to get right. I really dislike that a lot of date implementations basically store the date as a 0:00 point in time. It makes for a lot of confusion. – Pieter B Jul 16 '18 at 09:00
58

These are the kinds of errors you typically find in the refactor step of red/green/refactor. Don't forget that step! Consider a refactor like the following (untested):

def pluralize(num, unit):
    if num == 1:
        return unit
    else:
        return unit + "s"

def convert_to_unit(delta, unit):
    factor = 1
    if unit == "week":
        factor = 7 
    elif unit == "month":
        factor = 30
    elif unit == "year":
        factor = 365
    return delta // factor

def best_unit(delta):
    if delta < 7:
        return "day"
    elif delta < 30:
        return "week"
    elif delta < 365:
        return "month"
    else:
        return "year"

def human_friendly(event_date):
    now = datetime.datetime.utcnow()
    date = event_date.date()
    today = now.date()
    yesterday = today - datetime.timedelta(1)
    if date == today:
        return "Today"
    elif date == yesterday:
        return "Yesterday"
    else:
        delta = (now - event_date).days
        unit = best_unit(delta)
        converted = convert_to_unit(delta, unit)
        pluralized = pluralize(converted, unit)
        return "{} {} ago".format(converted, pluralized)

Here you've created 3 functions at a lower level of abstraction which are much more cohesive and easier to test in isolation. If you left out a time span you intended, it would stick out like a sore thumb in the simpler helper functions. Also, by removing duplication, you reduce the potential for error. You would actually have to add code to implement your broken case.

Other more subtle test cases also more readily come to mind when looking at a refactored form like this. For example, what should best_unit do if delta is negative?

In other words, refactoring isn't just for making it pretty. It makes it easier for humans to spot errors the compiler can't.
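For example, boundary tests for the threshold logic become one-liners (repeating `best_unit` from above so the sketch runs standalone):

```python
def best_unit(delta):
    # Copy of the helper above, so this snippet is self-contained.
    if delta < 7:
        return "day"
    elif delta < 30:
        return "week"
    elif delta < 365:
        return "month"
    else:
        return "year"

# Test both sides of every threshold -- exactly where off-by-one
# bugs like the original one tend to hide.
assert best_unit(6) == "day" and best_unit(7) == "week"
assert best_unit(29) == "week" and best_unit(30) == "month"
assert best_unit(364) == "month" and best_unit(365) == "year"
```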

Karl Bielefeldt
  • 12
    Next step is to internationalize, and there `pluralize` only working for a subset of english words will be a liability. – Deduplicator Jul 16 '18 at 16:15
  • @Deduplicator sure, but then depending on which languages/cultures you target, you might get away with only modifying `pluralize` using `num` and `unit` to build a key of some kind to pull a format string from some table/resource file. OR you might need a complete rewrite of the logic, because you need different units ;-) – Hulk Jul 17 '18 at 08:01
  • 5
    A problem remains even with this refactorization, which is that "yesterday" doesn't make much sense in the very wee hours of the morning (shortly after 12:01 AM). In human friendly terms, something that happened at 11:59 PM doesn't suddenly change from "today" to "yesterday" when the clock slips past midnight. It instead changes from "1 minute ago" to "2 minutes ago". "Today" is too coarse in terms of something that happened but minutes ago, and "yesterday" is fraught with problems to night owls. – David Hammen Jul 17 '18 at 11:59
  • @DavidHammen This is a usability issue and it depends on how precise you need to be. When you want to know at least down to the hour, I would not think "yesterday" is good. "24 hours ago" is much clearer and is a commonly used human expression to emphasize the number of hours. Computers that are trying to be "human friendly" almost always get this wrong and over-generalize it to "yesterday" which is too vague. But to know this you'll need to interview users to see what they think. For some things you really want the exact date and time, so "yesterday" is always wrong. – Brandin Jul 18 '18 at 13:18
38

You can't. TDD is great at protecting you from issues you are aware of. It doesn't help when you run into issues you've never considered. Your best bet is to have someone else test the system; they may find the edge cases you never considered.

Related reading: Is it possible to reach absolute zero bug state for large scale software?

Philipp
Ian Jacobs
  • 2
    Having tests written by someone other than the developer is always a good idea, it means that both parties need to overlook the same input condition for the bug to make it into production. – Michael Kay Jul 15 '18 at 07:52
35

There are two approaches I normally take that I find can help.

First, I look for the edge cases. These are places where the behavior changes. In your case, behavior changes at several points along the sequence of positive integer days. There is an edge case at zero, at one, at seven, etc. I would then write test cases at and around the edge cases. I'd have test cases at -1 days, 0 days, 1 hour, 23 hours, 24 hours, 25 hours, 6 days, 7 days, 8 days, etc.

The second thing I'd look for is patterns of behavior. In your logic for weeks, you have special handling for one week. You probably have similar logic in each of your other intervals not shown. This logic is not present for days, though. I would look at that with suspicion until I could either verifiably explain why that case is different, or I add the logic in.
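A table-driven sketch of such boundary tests (the helper and the fixed `now` are illustrative assumptions):

```python
import datetime

def age_in_days(event, now):
    # Hypothetical helper under test: whole days elapsed.
    return (now - event).days

now = datetime.datetime(2018, 7, 13, 12, 0)

# Probe both sides of each behavioural edge.
cases = [
    (datetime.timedelta(hours=23), 0),
    (datetime.timedelta(hours=24), 1),
    (datetime.timedelta(hours=25), 1),
    (datetime.timedelta(days=6), 6),
    (datetime.timedelta(days=7), 7),
    (datetime.timedelta(days=8), 8),
]
for ago, expected in cases:
    assert age_in_days(now - ago, now) == expected
```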

cbojar
  • 9
    This is a really important part of TDD that is often overlooked and I've rarely seen talked about in articles and guides - it's *really* important to test edge cases and boundary conditions as I find that's the source of 90% of bugs - of-by-one errors, over and underflows, last day of the month, last month of the year, leap-years etc etc – GoatInTheMachine Jul 13 '18 at 08:51
  • 2
    @GoatInTheMachine - and 90% of those 90% bugs are around daylight savings time transitions..... Hahaha – Caleb Jul 14 '18 at 06:06
  • 1
    You can first divide the possible inputs in [*equivalence classes*](https://en.wikipedia.org/wiki/Equivalence_partitioning) and then determine the edge cases at the classes' borders. Of course that's an effort which may be larger than the development effort; whether that's worth it depends on how important it is to deliver software as error-free as possible, what the deadline is and how much money and patience you have. – Peter - Reinstate Monica Jul 16 '18 at 12:04
  • 2
    This is the correct answer. A lot of business rules require you to divide a range of values into intervals that have to be handled in different ways. – abuzittin gillifirca Jul 16 '18 at 12:24
15

You cannot catch logical errors that are present in your requirements with TDD. But still, TDD helps. You found the error, after all, and added a test case. But fundamentally, TDD only ensures that the code conforms to your mental model. If your mental model is flawed, test cases will not catch the flaws.

But keep in mind that, while fixing the bug, the test cases you already had ensured that no existing, functioning behavior was broken. That is quite important: it is easy to fix one bug but introduce another.

In order to find those errors beforehand, you usually try to use equivalence-class based test cases. Using that principle, you would choose one case from every equivalence class, and then all the edge cases.

You would choose a date from today, yesterday, a few days ago, exactly one week ago and several weeks ago as the examples from each equivalence class. When testing dates, you would also make sure that your tests do not use the system date, but a pre-determined date for comparison. This would also highlight some edge cases: you would run your tests at some arbitrary time of day, directly after midnight, directly before midnight, and exactly at midnight. This means each test would be run against four base times.
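Injecting the comparison time makes those midnight cases trivial to pin down; a sketch for the "Today" class (helper name is an assumption):

```python
import datetime

def is_today(event, now):
    # One equivalence class: "Today" means the same calendar date.
    # `now` is a parameter, not the system clock, so the test is
    # reproducible at any time of day.
    return event.date() == now.date()

midnight = datetime.datetime(2018, 7, 13, 0, 0)

assert is_today(midnight, midnight)                                      # exactly at midnight
assert not is_today(midnight - datetime.timedelta(seconds=1), midnight)  # just before
assert is_today(midnight + datetime.timedelta(seconds=1), midnight)      # just after
```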

Then you would systematically add edge cases to all the other classes. You have the test for today. So add a time just before and after the behavior should switch. The same for yesterday. The same for one week ago etc.

Chances are that by enumerating all edge cases in a systematic manner and writing down test cases for them, you will find out that your specification is lacking some detail, and add it. Note that handling dates is something people often get wrong, because they forget to write their tests so that they can be run with different times.

Note, however, that most of what I have written has little to do with TDD. It's about writing down equivalence classes and making sure your own specifications are detailed enough about them. That is the process with which you minimize logical errors. TDD just makes sure your code conforms to your mental model.

Coming up with test cases is hard. Equivalence-class based testing is not the end of it all, and in some cases it can significantly increase the number of test cases. In the real world, adding all those tests is often not economically viable (even though in theory, it should be done).

Polygnome
12

The only way I can think of is to add lots of asserts for the cases that I believe would never happen (like I believed that a day ago is necessarily yesterday), and then to loop through every second for the past ten years, checking for any assertion violation, which seems too complex.

Why not? This sounds like a pretty good idea!

Adding contracts (assertions) to code is a pretty solid way of improving its correctness. Generally we add them as preconditions on function entry and postconditions on function return. For example, we could add a postcondition that all returned values are either of form "A [unit] ago" or "[number] [unit]s ago". When done in a disciplined way, this leads to design by contract, and is one of the most common ways of writing high-assurance code.

Critically, the contracts aren't intended to be tested; they are just as much specifications of your code as your tests are. However, you can test via the contracts: call the code in your test and, if none of the contracts raise errors, the test passes. Looping through every second of the past ten years is a bit much. But we can leverage another testing style called property-based testing.

In PBT instead of testing for specific outputs of the code, you test that the output obeys some property. For example, one property of a reverse() function is that for any list l, reverse(reverse(l)) = l. The upside of writing tests like this is you can have the PBT engine generate a few hundred arbitrary lists (and a few pathological ones) and check they all have this property. If any don't, the engine "shrinks" the failing case to find a minimal list that breaks your code. It looks like you're writing Python, which has Hypothesis as the main PBT framework.

So, if you want a good way to find more tricky edge cases you might not think of, using contracts and property-based testing together will help a lot. This doesn't replace writing unit tests, of course, but it does augment it, which is really the best we can do as engineers.
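With Hypothesis, the loop below would be replaced by `@given(st.datetimes(...))`; as a dependency-free sketch of the same idea, here is a property check over randomly generated instants, essentially the "loop through the past ten years" approach from the question (the function body is a toy stand-in, not the original code):

```python
import datetime
import random
import re

NOW = datetime.datetime(2018, 7, 13, 1, 0)

def describe_age(event, now=NOW):
    # Toy stand-in for the function under test.
    if event.date() == now.date():
        return "Today"
    if event.date() == now.date() - datetime.timedelta(days=1):
        return "Yesterday"
    delta = (now - event).days
    if delta == 1:
        return "A day ago"
    return "%d days ago" % delta

# The specification, expressed as a property: every output must
# match one of the valid message shapes.
VALID = re.compile(r"^(Today|Yesterday|A day ago|\d+ days ago)$")

# Brute force over random instants from the past ten years.
random.seed(0)
for _ in range(100000):
    event = NOW - datetime.timedelta(seconds=random.randrange(10 * 365 * 86400))
    assert VALID.match(describe_age(event)), event
```

If any instant produces a malformed message, the assertion reports the offending timestamp, which plays the role of Hypothesis's shrunken counterexample.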

Hovercouch
  • 2
    This is exactly the right solution to this kind of problem. The set of valid outputs is easy to define (you could give a regular expression very simply, something like `/(today)|(yesterday)|([2-6] days ago)|...`) and then you can run the process with randomly selected inputs until you find one that isn't in the set of expected outputs. Taking this approach *would* have caught this bug, and *would not* require realising that the bug might exist beforehand. – Jules Jul 13 '18 at 16:29
  • @Jules See also [property checking/testing](https://en.wikipedia.org/wiki/QuickCheck). I usually write property tests during development, to cover as many unforeseen cases as possible and force me to think of general properties/invariants. I save one-off tests for regressions and such (which the author's issue is an instance of) – Warbo Jul 14 '18 at 20:27
  • 1
    If you do that much looping in tests, it will take a very long time, which defeats one of the main goals of unit testing: run the tests **fast**! – CJ Dennis Jul 16 '18 at 03:20
5

This is an example where adding a bit of modularity would have been useful. If an error-prone code segment is used multiple times, it's good practice to wrap it in a function if possible.

def time_ago(delta, unit):
    delta_str = _number_to_text(delta) + " " + unit
    if delta == 1:
        return delta_str + " ago"
    else:
        return delta_str + "s ago"

now = datetime.datetime.utcnow()
today = now.date()
if event_date.date() == today:
    return "Today"

yesterday = today - datetime.timedelta(1)
if event_date.date() == yesterday:
    return "Yesterday"

delta = (now - event_date).days

if delta < 7:
    return time_ago(delta, "day")

if delta < 30:
    weeks = math.floor(delta / 7)
    return time_ago(weeks, "week")

if delta < 365:
    months = math.floor(delta / 31)
    return time_ago(months, "month")
5

Test driven development didn't help.

TDD works best as a technique if the person writing the tests is adversarial. This is difficult if you are not pair-programming, so another way to think about this is:

  • Don't write tests to confirm the function under test works as you made it. Write tests that deliberately break it.

This is a different art, one that applies to writing correct code with or without TDD, and one perhaps as complex as actually writing the code, if not more so. It's something you need to practice, and something there is no single, easy answer for.

The core technique to writing robust software, is also the core technique to understanding how to write effective tests:

Understand the preconditions for a function: the valid states (i.e. what assumptions you are making about the state of the class the function is a method of) and the valid input parameter ranges. Each data type has a range of possible values, only a subset of which will be handled by your function.

If you do nothing more than explicitly test these assumptions on function entry, ensuring that a violation is logged or thrown and/or the function errors out with no further handling, you can quickly learn when your software is failing in production, make it robust and error-tolerant, and develop your adversarial test-writing skills.
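A minimal sketch of such entry checks (the function and argument names are illustrative):

```python
import datetime

def days_ago(event_date, now):
    # Preconditions checked explicitly on entry: a violated assumption
    # fails loudly here instead of silently producing garbage output.
    assert isinstance(event_date, datetime.datetime), "expected a datetime"
    assert event_date <= now, "event_date must not be in the future"
    return (now - event_date).days

now = datetime.datetime(2018, 7, 13)
assert days_ago(now - datetime.timedelta(days=3), now) == 3
```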


NB. There is a whole literature on pre- and postconditions, invariants and so on, along with libraries that can apply them using attributes. Personally I am not a fan of going that formal, but it's worth looking into.

Chris Becke
1

This is one of the most important facts about software development: It is absolutely, utterly impossible to write bug-free code.

TDD won't save you from introducing bugs corresponding to test cases you didn't think of. It also won't save you from writing an incorrect test without realizing it, then writing incorrect code that happens to pass the buggy test. And every other single software development technique ever created has similar holes. As developers, we are imperfect humans. At the end of the day, there is no way to write 100% bug-free code. It never has and never will happen.

This isn't to say that you should give up hope. While it's impossible to write completely perfect code, it's very possible to write code that has so few bugs that appear in such rare edge cases that the software is extremely practical to use. Software that does not exhibit buggy behavior in practice is very much possible to write.

But writing it requires us to embrace the fact that we will produce buggy software. Almost every modern software development practice is at some level built around either preventing bugs from appearing in the first place or protecting ourselves from the consequences of the bugs we inevitably produce:

  • Gathering thorough requirements allows us to know what incorrect behavior looks like in our code.
  • Writing clean, carefully-architected code makes it easier to avoid introducing bugs in the first place and easier to fix them when we identify them.
  • Writing tests allows us to produce a record of what we believe many of the worst possible bugs in our software would be and prove that we avoid at least those bugs. TDD produces those tests before the code, BDD derives those tests from the requirements, and old-fashioned unit testing produces tests after the code is written, but they all prevent the worst regressions in the future.
  • Peer reviews mean that every time code is changed, at least two pairs of eyes have seen the code, decreasing how frequently bugs slip into master.
  • Using a bug tracker or a user story tracker that treats bugs as user stories means that when bugs appear, they're kept track of and ultimately dealt with, not forgotten about and left to consistently get in users' ways.
  • Using a staging server means that before a major release, any show-stopper bugs have a chance to appear and be dealt with.
  • Using version control means that in the worst-case scenario, where code with major bugs is shipped to customers, you can perform an emergency rollback and get a reliable product back into your customers' hands while you sort things out.

The ultimate solution to the problem you've identified is not to fight the fact that you can't guarantee you'll write bug-free code, but rather to embrace it. Embrace industry best practices in all areas of your development process, and you will consistently deliver code to your users that, while not quite perfect, is more than robust enough for the job.

Kevin

You simply had not thought of this case before, and therefore didn't have a test case for it.

This happens all the time and is perfectly normal. How much effort you put into creating test cases is always a trade-off: you could spend infinite time trying to cover every possible case.

For an aircraft autopilot you would spend much more time than for a simple tool.

It often helps to think about the valid ranges of your input variables and test these boundaries.
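In Python terms, and using the thresholds from the question's code, such a boundary sweep might look like this (the helper name `boundary_events` is made up for illustration):

```python
import datetime

def boundary_events(now):
    """Events placed just on either side of each threshold in the
    question's code (today/yesterday, 7 days, 30 days)."""
    hours = [
        1, 23,                 # around the today/yesterday boundary
        26, 47,                # integer delta of 1 without being "yesterday"
        6 * 24, 7 * 24 + 1,    # around a week
        29 * 24, 30 * 24 + 1,  # around a month
    ]
    return [now - datetime.timedelta(hours=h) for h in hours]

now = datetime.datetime(2018, 7, 17, 1, 0)
events = boundary_events(now)
# The 26-hour case sits between the branches: an integer delta of 1,
# yet not yesterday's calendar date.
assert (now - events[2]).days == 1
assert events[2].date() != now.date() - datetime.timedelta(days=1)
```

Feeding each of these events to the formatter and asserting on the output would have surfaced the "One days ago" case immediately.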

In addition, if the tester is a different person from the developer, significant cases that the developer missed are often found.

Simon

(and believing that it has to do with time zones, despite the uniform use of UTC in the code)

That's another logical mistake in your code for which you don't have a unit test yet :) - your method will return incorrect results for users in non-UTC timezones. You need to convert both "now" and the event's date to the user's local timezone before calculating.

Example: in Australia, an event happens at 9 a.m. local time. At 11 a.m. local time it will already be displayed as "Yesterday", because the UTC date has changed in between.
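A sketch of the conversion Sergey describes, using `zoneinfo` from Python 3.9+ (`to_local` is a hypothetical helper; the question's code keeps naive UTC datetimes):

```python
import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def to_local(utc_naive, tz_name):
    """Attach UTC to a naive datetime, then convert to the user's timezone."""
    aware = utc_naive.replace(tzinfo=datetime.timezone.utc)
    return aware.astimezone(ZoneInfo(tz_name))

# Sydney is UTC+10 in July: an event at 9 a.m. local is 11 p.m. the
# previous day in UTC, so comparing UTC dates misclassifies it.
event_utc = datetime.datetime(2018, 7, 16, 23, 0)  # 9 a.m. July 17 in Sydney
now_utc = datetime.datetime(2018, 7, 17, 1, 0)     # 11 a.m. July 17 in Sydney
assert event_utc.date() != now_utc.date()          # UTC dates say "Yesterday"
assert (to_local(event_utc, "Australia/Sydney").date()
        == to_local(now_utc, "Australia/Sydney").date())  # locally it's "Today"
```

Doing all comparisons on the converted, timezone-aware values makes the "Today"/"Yesterday" branches agree with the user's calendar.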

Sergey
  • Let somebody else write the tests. This way somebody unfamiliar with your implementation might check for rare situations that you haven't thought of.

  • If possible, inject test cases as collections. This makes adding another test as easy as adding another line like yield return new TestCase(...). This can go in the direction of exploratory testing, automating the creation of test cases: "Let's see what the code returns for all the seconds of one week ago".
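`yield return new TestCase(...)` is the C#/NUnit idiom; the Python equivalent is `pytest.mark.parametrize` or, without any framework, a plain table of cases. A sketch (`time_ago` here is a stub of the question's formatter with the `delta == 1` fix included, not the real code):

```python
import datetime

def time_ago(event, now):
    """Stub of the question's formatter, with the "A day ago" fix included."""
    today = now.date()
    if event.date() == today:
        return "Today"
    if event.date() == today - datetime.timedelta(days=1):
        return "Yesterday"
    delta = (now - event).days
    if delta == 1:
        return "A day ago"
    if delta < 7:
        return f"{delta} days ago"
    weeks = delta // 7
    return "A week ago" if weeks == 1 else f"{weeks} weeks ago"

NOW = datetime.datetime(2018, 7, 17, 1, 0)

# Adding a test case is adding one line to this table.
CASES = [
    (NOW - datetime.timedelta(hours=2), "Yesterday"),   # 11 p.m. the day before
    (NOW - datetime.timedelta(hours=26), "A day ago"),  # the case TDD missed
    (NOW - datetime.timedelta(days=9), "A week ago"),
]

for event, expected in CASES:
    assert time_ago(event, NOW) == expected
```

Because each case is one line of data, exploratory additions ("what about every hour of the past week?") become loops over generated inputs rather than new test functions.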

null

You appear to be under the misconception that if all of your tests pass, you have no bugs. In reality, if all of your tests pass, all the known behaviour is correct. You still don't know if the unknown behaviour is correct or not.

Hopefully, you are using code coverage with your TDD. Add a new test for the unexpected behaviour. Then you can run just the test for the unexpected behaviour to see what path it actually takes through the code. Once you know the current behaviour, you can make a change to correct it, and when all the tests pass again, you'll know you've done it properly.

This still doesn't mean that your code is bug free, just that it is better than before, and once again all the known behaviour is correct!

Using TDD correctly doesn't mean you will write bug free code, it means you will write fewer bugs. You say:

The requirements were relatively clear

Does this mean that the more-than-one-day-but-not-yesterday behaviour was specified in the requirements? If you missed a written requirement, it's your fault. If you realised the requirements were incomplete as you were coding it, good for you! If everybody who worked on the requirements missed that case, you're no worse than the others. Everyone makes mistakes, and the more subtle they are, the easier they are to miss. The big takeaway here is that TDD does not prevent all errors!

CJ Dennis

It's very easy to commit a logical mistake even in a such simple source code.

Yes. Test driven development does not change that. You can still create bugs in the actual code, and also in the test code.

Test driven development didn't help.

Oh, but it did! First of all, when you noticed the bug you already had the complete test framework in place, and just had to add a test for the bug (and fix the actual code). Secondly, you don't know how many more bugs you would have had if you had not done TDD from the beginning.

Also worrisome is that I can't see how could such bugs be avoided.

You can't. Not even NASA has found a way to avoid bugs; we lesser humans certainly won't find one either.

Aside thinking more before writing code,

That is a fallacy. One of the greatest benefits of TDD is that you can code with less thinking, because all those tests at least catch regressions pretty well. Also, even (or especially) with TDD, you are not expected to deliver bug-free code on the first attempt (or your development speed will simply grind to a halt).

the only way I can think of is to add lots of asserts for the cases that I believe would never happen (like I believed that a day ago is necessarily yesterday), and then to loop through every second for the past ten years, checking for any assertion violation, which seems too complex.

This would clearly conflict with the tenet of only coding what you actually need right now. You thought you needed those cases, and so it was. It was a non-critical piece of code; as you said there was no damage except you wondering about it for 30 minutes.

For mission-critical code, you actually could do what you said, but not for your everyday standard code.
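For code where the sweep actually is justified, the question's own idea can be made cheap by stepping coarser than one second and asserting invariants rather than exact strings. A sketch (the function name and the invariants are illustrative, not part of the answer above):

```python
import datetime

def sweep_invariants(formatter, now, span_days=30,
                     step=datetime.timedelta(minutes=17)):
    """Walk the recent past and assert properties that must always hold,
    whatever the exact wording - a lightweight, hand-rolled form of
    property-based testing."""
    event = now - datetime.timedelta(days=span_days)
    while event < now:
        text = formatter(event, now)
        assert text, "must never return an empty string"
        assert "One days" not in text, "singular number with plural unit"
        event += step

# A deliberately buggy formatter: the sweep catches it.
def buggy(event, now):
    return "One days ago" if (now - event).days == 1 else "ok"

try:
    sweep_invariants(buggy, datetime.datetime(2020, 1, 1))
    caught = False
except AssertionError:
    caught = True
assert caught
```

An odd step size (17 minutes here) keeps the sweep from always landing on round clock values, which is exactly where hand-picked test cases tend to cluster.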

How could I avoid creating this bug in the first place?

You don't. You trust your tests to find most regressions; you keep to the red-green-refactor cycle, writing tests before/during actual coding, and (important!) you implement the minimum amount necessary to make the red-green switch (not more, not less). This will leave you with great test coverage, at least of the positive cases.

When, not if, you find a bug, you write a test to reproduce that bug, and fix the bug with the least amount of work to make said test go from red to green.

AnoE

You just discovered that no matter how hard you try, you'll never be able to catch all possible bugs in your code.

So what this means is that even attempting to catch all bugs is an exercise in futility, and so you should only use techniques such as TDD as a way of writing better code, code that has fewer bugs, not 0 bugs.

That in turn means you should spend less time on these techniques, and spend the time saved on alternative ways to find the bugs that slip through the development net: integration testing, a test team, system testing, and logging and analysing those logs.

If you cannot catch all bugs, then you must have a strategy in place for mitigating the effects of the bugs that slip past you. If you have to do this anyway, then putting more effort into this makes more sense than trying (in vain) to stop them in the first place.

After all, it's pointless to spend a fortune in time writing tests if, on the first day you give your product to a customer, it falls over anyway - particularly if you then have no clue how to find and resolve that bug. Post-mortem and post-delivery bug resolution is so important that it deserves more attention than most people spend on writing unit tests. Save the unit testing for the complicated bits and don't try for perfection up front.

gbjbaanb
  • This is extremely defeatist. `That in turn means you should spend less time using these techniques` - but you just said it'll help with fewer bugs?! – JᴀʏMᴇᴇ Jul 17 '18 at 09:21
  • @JᴀʏMᴇᴇ More a pragmatic attitude of which technique gets you the most bang for your buck. I know people who are proud that they spent 10 times as long writing tests as they did on their code, *and they still have bugs*. So being sensible, rather than dogmatic, about testing techniques is essential. And integration tests have to be used anyway, so put more effort into them than into the unit testing. – gbjbaanb Jul 17 '18 at 15:04