129

Is there any engineering reason why it is like that? I was wondering, in the case of an RDBMS, whether it had something to do with performance, since a "YEAR" is more specific than a "MONTH"; for instance: you only have one year 2000, but every year has a "January", which would make it easier/faster to filter/sort something by year first, and that's why the year comes first.

But I don't know if that really makes sense... Is there any reason at all?

TylerH
  • 105
  • 4
lucaswxp
  • 1,379
  • 2
  • 8
  • 10
  • 3
    Performance is not an issue in this case: dates are not stored as strings internally. – IMil Sep 25 '18 at 03:23
  • 5
    Sorting is not performed on the internal representation but on the presented one. – candied_orange Sep 25 '18 at 03:24
  • 14
    @IMil We may not like it, but quite often they are stored as strings. – Honza Brabec Sep 25 '18 at 07:00
  • 5
    @HonzaBrabec sure, they often do, but the OP mentioned RDBMS, and any decent DB has a separate date type – IMil Sep 25 '18 at 07:19
  • 14
    @candied_orange That would be strange, especially in the case of dates. – glglgl Sep 25 '18 at 07:40
  • 2
    @IMil - Even so, it's still common for programmers to use date strings instead of the database's dedicated date type. – Dave Sherohman Sep 25 '18 at 07:55
  • 4
    @lucaswxp - it has been mentioned already, but have a look at what Lexicographical order is about. It is the default in many places and is also easy to implement and very fast, therefore preferred. – gorn Sep 25 '18 at 08:56
  • 47
    https://xkcd.com/1179/ – 0x2b3bfa0 Sep 25 '18 at 13:19
  • 1
    @Helio That xkcd comic has a mistake. One of the "discouraged" formats is allowed by the ISO standard. For historical reasons there are actually locale issues for some cultures with the "usual" format for date-only values in some dialects of SQL (namely Sql Server), such that the unseparated format is actually preferred. DateTime values should still use the full separated format -- though even a lot of Sql Server people don't know about this. – Joel Coehoorn Sep 25 '18 at 18:18
  • 19
    As a side note, this format is not *that* alien. For example, in Hungarian language (and probably some others too) YYYY. MM. DD. is the default written date format, and has been a long time before computers. – Neinstein Sep 25 '18 at 20:50
  • 1
    @HonzaBrabec I DO like it, a LOT. Enough so I tend to use it when not programming :) – jwenting Sep 26 '18 at 07:01
  • 4
    I want to address one area of this question that has been neglected: the OP asked about dates in the context of an RDBMS. In this case the storage and presentation are different, and most database tools allow the presentation to be configured to your local preference. That said, Pieter's answer is complete and correct in specifying the reason for writing dates year/month/day: https://softwareengineering.stackexchange.com/a/378936/15931 – Robert Sep 25 '18 at 09:56
  • 1
    I'd say the main reason is probably drilling down. You always start from something broad and then down to the specific. A year is a long range of time, a month is shorter and a day is shortest. From there we go to an hour-min-sec, largest to smallest. Just like if we read a number: 284.301, we start with the largest number and then as we read to the right, it gets smaller and more specific. – Sensoray Sep 26 '18 at 14:24
  • 4
    It follows alphanumerical order. That's really all there is to it. Since 19990514 is greater than 19990122, it's obvious that it's a later date. Makes comparisons super simple. – user91988 Sep 26 '18 at 18:16
  • 31
    In programming, the default date format is "YYYYMMDD"? It would be nice if that were true, but that's definitely not the case everywhere. RFC 822 and RFC 850, as well as ANSI C's `asctime`, are still widely used in a lot of places. It's nice that RFC 3339 and ISO 8601 are gradually displacing the older formats, and they're certainly what should be used going forward. More generally, I would say the ISO 8601 basic form (plain YYYYMMDD without separator characters) is actually *less* common than some other forms, like YYYY-MM-DD. – Daniel Pryden Sep 26 '18 at 19:08
  • 1
    As @Robert says, in RDBMSs the internal and presentation formats are often different, not least because it is currently the year 2018 only in the Gregorian calendar. It is currently the year 1440 in the Islamic calendar, 4715 in the traditional Chinese, and 5779 in the Hebrew calendar. They are all different presentation formats, and some RDBMS will let you use various non-Gregorian calendars: https://docs.oracle.com/database/121/NLSPG/applocaledata.htm#NLSPG594 – David Aldridge Sep 26 '18 at 19:48
  • 8
    “a YEAR is more specific than a MONTH” - might be semantics, but you have that backwards. If your boss said “we’re on a deadline. I need your code submitted in 2018”, you might say “can you be more specific?” I think maybe you mean “more unique”. – Blair Fonville Sep 27 '18 at 05:32
  • 2
    Technically speaking, the default date _storage_ format is (or at least, should be) epoch milliseconds (or seconds, if you prefer) UTC. Years/months/dates are all purely presentation concerns, because humans are bad at parsing epoch timestamps. – aroth Sep 27 '18 at 15:32
  • 1
    Judging from https://en.wikipedia.org/wiki/Date_format_by_country this is also the primary, or one of two or three primary formats, for around one third of the world's population – leo Sep 28 '18 at 06:59
  • I don't see any mention of this additional reason: Of the three alternative orderings (y-m-d, m-d-y and d-m-y), only y-m-d is unambiguous. – ikegami Oct 02 '18 at 09:48
  • @DanielPryden nowhere in either RFC 822 or 850 nor in asctime do you have anything which resembles YYYYMMDD, so when you see 20181002 as a date you can be very sure it's not a date from those specs, there is no ambiguity. – Pieter B Oct 02 '18 at 13:42
  • @PieterB: I think that's my point. ISO 8601 is a good standard, but it's not "the default". RFC 822 was the internet standard for a long time, and lots of people wrote lots of dates in ways that were incompatible with ISO 8601. Not to mention using epoch timestamps, which I think is probably even more of a "default" than any text-based format. I'm not arguing whether ISO 8601 is a good idea, or whether the question is indirectly referencing ISO 8601, I'm just saying that the premise of the question is incorrect. Perhaps the question should be phrased "why is ISO 8601 a good idea?" instead. – Daniel Pryden Oct 02 '18 at 15:37
  • 1
    Because it is ordered by its significance. When you write down five-hundred thirty one dollars in numbers, you don't write 135, in reverse significance order, but 531. And since it is ordered by significance, talking about YYYYMMDD, you can simply compare them to each other. – Aphton Oct 04 '18 at 17:49
  • 1
    An equally valid, non-programming question, is "Why does the US use the dumbest of all date formats - Month Date Year? – Greg Woods Mar 23 '20 at 17:18

14 Answers

397

This way, the dates can easily be sorted as strings using the default sorting rules (i.e. lexicographical sorting).

This is also why both month and day are specified using two digits (adding a leading zero if needed).

In fact it is one of the date formats defined by ISO 8601. That standard also defines a date-and-time format, 2015-03-27T15:26:40Z, which is also sortable as strings.

However, YYYYMMDD has an added benefit: the string can be parsed as an integer directly (no substrings or character replacements involved), and the default ordering on integers still gives chronological order.
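
A minimal Python sketch of both points (made-up dates): string sorting and plain integer ordering both give chronological order, as long as month and day are zero-padded.

dates = ["2015-03-27", "2014-12-01", "2015-03-02"]
print(sorted(dates))
# ['2014-12-01', '2015-03-02', '2015-03-27'] - lexicographic order is chronological order

compact = ["20150327", "20141201", "20150302"]
print(sorted(int(d) for d in compact))
# [20141201, 20150302, 20150327] - no parsing beyond int() needed

# Without the leading zeros the property breaks: "2015-3-27" sorts after "2015-12-01".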

Ed Avis
  • 105
  • 1
Arseni Mourzenko
  • 134,780
  • 31
  • 343
  • 513
  • OK, but couldn't the sort be just as fast as long as you know where the year is located in the string? Because the whole date must be loaded into memory anyway, not only the year. – lucaswxp Sep 24 '18 at 23:50
  • 13
    @lucaswxp It also helps in Windows Explorer, Finder, etc. If you have a list of files or folders named by date then you don't have to sort them when enumerating. – Dan Wilson Sep 25 '18 at 00:34
  • 91
    @lucaswxp: If you write a special-case comparison for strings following a specific schema, you can of course make it as baroque as you want. The thing here is that the schema is designed such that lexical order (as well as lexical number-aware order) is also logical order, so no need for customizing. – Deduplicator Sep 25 '18 at 00:48
  • 19
    @lucaswxp Your date string might not be in memory. Practical example: You have a csv file already sorted by ISO date and millions+ of rows per year. And you want to return only the rows between certain dates. You can read the file line by line (row by row) until you reach your first date, then load rows into memory until you reach your last date. You can skip the rest of the file. But if you save the date as some other format, or sorted by year only, you would have to read through an entire year's worth of records before closing the file. – emptyother Sep 25 '18 at 05:00
  • 49
    Note that the dashes are optional in ISO 8601, so YYYYMMDD *is* ISO 8601. – Martin Ba Sep 25 '18 at 11:14
  • 5
    "This way, the dates can easily be sorted as strings: year first, then month, then day" - this is true... until 31st December 9999. The 1st [January 10000](https://en.wikipedia.org/wiki/Year_10,000_problem) will not be sorted correctly with this format. But none of us will be here to check that ;-) – Benoit Sep 25 '18 at 13:57
  • 33
    @Benoit A proposal has already been made to solve the Y10K problem. If we're still using the same era then, we'll go to AYYYYYMMDD until Y100K, which will be BYYYYYYMMDD, CYYYYYYYMMDD, DYYYYYYYYMMDD, EYYYYYYYYYMMDD. This leading alpha prefix assures correct sort order (provided that "A0YYYY..." etc. are invalid representations if any YYYY... dates are still used). At some point when the number of year digits are divisible by three, we'll start adding three digits each time we change alpha prefix, in order to assure we don't run out of letters before the heat death of the universe. – Monty Harder Sep 25 '18 at 15:00
  • 35
    It's important to note that with this format, sorting isn't just "easier." The lexical (character based) sort becomes equivalent to the temporal sort, which means you can **sort temporally without parsing**. – jpmc26 Sep 25 '18 at 15:39
  • @MontyHarder I can't wait to see the standards discussion on the day they finally decide to add one more digit than absolutely necessary, thinking ahead for 1000 billion years in the future. (See also the NTP datestamps) – Cort Ammon Sep 25 '18 at 17:25
  • 6
    @Benoit: Monty seems to be referring to [RFC 2550 - Y10K and Beyond](https://tools.ietf.org/html/rfc2550). – dan04 Sep 25 '18 at 19:44
  • You can also sort "DDMMYYYY" as strings, simply by starting at the end for comparison. What you really mean is lexicographic sorting. – phresnel Sep 26 '18 at 07:56
  • 6
    It can also be confusing if you don't put the year first. This `YYYYMMDD` standard (though I prefer the ISO dashes) makes it easier to know which is the month and which is the day. If your application runs in multiple locales, your US users may be used to month/day/year, while the UK is used to day/month/year. – ps2goat Sep 26 '18 at 15:36
  • 3
    @CortAmmon Even if they are not thinking ahead for use past universal heat death, there could still be a need for dates beyond the heat death which will not be experienced personally. Even if the end date of that format coincided exactly with the end of everything, imagine someone getting a 30 year mortgage loan 10 years before the end and requesting an amortization table from the software. That customer won't take "the universe is ending in 20 years" as an excuse for the failure to provide payment details. – Aaron Sep 26 '18 at 19:32
  • It's true that YYYYMMDD can be sorted as a string correctly, but an RDBMS such as Oracle does not store, nor sort, date values as a string. Oracle in particular has an internal representation of the date as seven bytes representing century, year, month, day, hour, minute, second, to which the sort is applied (so it includes time as well, which would take 14 bytes if it were a string). So it is in YYYYMMDDhhmmss order, but sort-of numerical. https://stackoverflow.com/questions/13568193/how-are-dates-stored-in-oracle – David Aldridge Sep 26 '18 at 19:54
  • 1
    @Aaron I don't think so, with the heat death we might also experience that time doesn't flow linearly anymore (just like at the big bang, the most accepted theory is that time reaches a singularity at the big bang, and thus there is no "negative time"). If time is also part of the fundamental forces, it would mean that it too breaks down and there is literally an "end date" that we approach but can never reach. – paul23 Sep 26 '18 at 21:43
  • 2
    @paul23 My point was that even with such an end date, you might need to represent a date which does not exist. If there is a time you cannot reach, you might still need an abstract representation of that non-existing time in your software. Although, I suppose your suggestion of non-linear time might change how we even view the algorithm in that software. Still, I suggest you might have such a need. "What do you mean I can't take out a conventional loan just because my payments will be due at a time that doesn't exist? You still need to abide by the law, and I qualify for this 30 year loan!" – Aaron Sep 26 '18 at 22:13
  • 1
    @Aaron when you are no longer talking about time in a linear fashion it also starts meaning that if you t=x and a delta-t, dt=y, t+dt != x+y, so yes a "30 year loan" at heat death (if the hypothesis that time itself is also a fundamental field holds) would be over at the instant it is "given", as at that point t+t2 = t for all t2. – paul23 Sep 26 '18 at 22:19
  • 1
    @paul23 But that makes the hypothetical thought less humorous. :( I was enjoying thinking about the end of time as if it would be business as usual. Sad but funny. – Aaron Sep 26 '18 at 22:30
  • 4
    @Aaron I'd love to see a movie or something about IT admins and/or programmers at work in the last moments before the universe ends as they try to do everything they can to keep the systems working, while the laws of physics stop working. – VLAZ Sep 27 '18 at 13:09
  • @paul23 Well, without doing anything fancy, the arrow of time starts breaking down as you approach heat death. Arrow of time as we experience it is the gradient of entropy, and heat death is a universe with no entropy gradients left. So as you approach it going "forward" and "backward" in time becomes increasingly symmetrical. Having a "you" present to experience it becomes impossible, as a "you" implies narrative consciousness, and that requires an arrow of time to experience consciousness. Forward and backward in time becomes symmetric, with the exception of some far off (invisible) event. – Yakk Sep 27 '18 at 13:54
  • 2
    @dan04 I took that RFC as a starting point and modified it slightly to avoid the need to add caret prefixes, by adding the bit about adding digits three at a time to conserve alpha prefixes. Yes, I realize it was originally a joke, but it's also a practical solution to the problem. – Monty Harder Sep 27 '18 at 17:39
  • 2
    DanWilson's point is key: If I'm writing a tool that sorts by date, I can make it intelligent about how to parse the date; but if I'm using someone else's tools, they're much more likely to be able to sort alphabetically (for ASCII characters at least) without special configuration, than to be able to sort according to a specific date format. – LarsH Sep 27 '18 at 17:56
  • @MontyHarder it's unfair that over the next million years 99% of humans should have to prefix their date with a nonsensical character just so 1% of humans (us and people alive in the next 8000 years) wouldn't have to. – Boris Verkhovskiy Oct 02 '18 at 13:59
  • @boris They can certainly adopt a different scheme, but it will cost them backward compatibility. Neither of us will be around to undertake the task of converting ISO-8601 dates to a new scheme, so it's also unfair for us to saddle those future humans with Y10K problems. – Monty Harder Oct 03 '18 at 13:51
138

Not mentioned yet, but you quickly gloss over the order inside YYYY. That's already millennia, centuries, decades, years. That is to say, YYYY is already ordered from longest period to shortest period. The same goes for MM and DD; that's just how the positional number system works.

So to keep the order between fields consistent with the order within fields, the only option is YYYYMMDD.

As zahbaz and Arseni Mourzenko noted, the YYYYMMDD format sorts easily. That is not a lucky coincidence; it's a direct consequence of putting the field for the longest duration first (and keeping the length fixed - yes, we are knowingly introducing a Y10K problem here).

MSalters
  • 8,692
  • 1
  • 20
  • 32
  • 34
    While you may be joking, this code may seriously come to haunt us in 8000 years. Code lives longer than anyone expects… – deceze Sep 25 '18 at 15:42
  • 16
    @deceze ISO8601 already has provisions for a 5 digit year, but it would be interesting to see which DateTime implementations currently allow for it. – Zac Faragher Sep 26 '18 at 02:16
  • 4
    @ZacFaragher, I'm sure we'll have plenty of time to implement that later, no need to rush, right...? – ilkkachu Sep 26 '18 at 17:46
  • 52
    @deceze Why did you unfreeze me--have you figured out how to cure cancer? No, it is the year 9999 and you know COBOL. – user3067860 Sep 28 '18 at 18:54
  • 3
    @deceze: we already have [RFC2550](https://tools.ietf.org/html/rfc2550) to solve the issue once and for all! I think we should tackle that problem as soon as we fixed the global warming problem (which we have to solve before year 3000 btw). – Michael Sep 28 '18 at 20:02
  • 6
    You may wish to fix your typo. The word *millennia*, the plural of *millennium*, is obligatorily spelled with a double-N to match the double-N in *annual* from Latin *annus* for year. When you misspell it with just a single-N, it now unhappily matches the single-N of *anal* from Latin *anus* with the same meaning as its loanword into English sports. In short, you always need to spell it in a way that means you’re talking about thousands of years, not thousands of butt-holes. :) – tchrist Sep 29 '18 at 16:28
  • 4
    @user3067860 "oh, that? We cured that kind of cancer almost 8 millennia ago. What changed is now, you have a skill marketable enough to afford the treatment!" – Harper - Reinstate Monica Oct 03 '18 at 02:39
  • "While you may be joking, this code may seriously come to haunt us in 8000 years". It certainly won't come to haunt "us" in 8000 years. We will all have been dust for a long, long, time. Who knows what different CPU architectures will be in place by then, they'd probably only run code form this era in x86 emulators, for kicks. – Doctor Jones Oct 05 '18 at 09:09
58

Is there any reason at all?

Yes. Those pieces of software will be using ISO 8601.

ISO 8601 has a number of advantages over other date formats:

  • It's a standard with a spec document :)
  • It's unambiguous. mm/dd/yyyy and dd/mm/yyyy can be confusing unless the day is greater than 12.
  • It lexicographically sorts into ascending time order, so no special date-sorting logic is required. This is especially useful in filenames, where lexicographical number sorting is often confusing (e.g. 1_file, 10_file, 2_file); see the sketch after this list.
  • It mandates a 4-digit year and zero-padded month and day. This avoids the year 2000 problem and other ambiguities.
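
A small Python sketch of the filename point (the file names are hypothetical):

files = ["report-2018-10-01.txt", "report-2018-02-13.txt", "report-2018-09-25.txt"]
print(sorted(files))
# ['report-2018-02-13.txt', 'report-2018-09-25.txt', 'report-2018-10-01.txt'] - chronological

numbered = ["file1", "file10", "file2"]
print(sorted(numbered))
# ['file1', 'file10', 'file2'] - the 1, 10, 2 confusion mentioned above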

As for why ISO 8601 exists in the first place, it's because people were finding date-formats ambiguous and confusing when swapping data between countries/systems, and they needed something unambiguous.

For the rationale see the spec's introduction.

Although ISO Recommendations and Standards in this field have been available since 1971, different forms of numeric representation of dates and times have been in common use in different countries. Where such representations are interchanged across national boundaries misinterpretation of the significance of the numerals can occur, resulting in confusion and other consequential errors or losses. The purpose of this International Standard is to eliminate the risk of misinterpretation and to avoid the confusion and its consequences.

...

This International Standard retains the most commonly used expressions for date and time of the day and their representations from the earlier International Standards and provides unique representations for some new expressions used in practice. Its application in information interchange, especially between data processing systems and associated equipment will eliminate errors arising from misinterpretation and the costs these generate. The promotion of this International Standard will not only facilitate interchange across international boundaries, but will also improve the portability of software, and will ease problems of communication within an organization, as well as between organizations.

The standard defines “basic” variations as minimizing the use of delimiters. So, YYYYMMDD is the basic alternate to the extended format YYYY-MM-DD.

Basil Bourque
  • 1,000
  • 5
  • 10
Pod
  • 734
  • 4
  • 7
  • 4
    I didn't know that ISO 8601 also allows YYYYMMDD besides YYYY-MM-DD. – keuleJ Sep 26 '18 at 19:12
  • https://www.iso.org/iso-8601-date-and-time-format.html seems to indicate that the "Extended format" of YYYY-MM-DD is the only format for 8601? – Oskar Austegard Sep 27 '18 at 19:08
  • 3
    @keuleJ Minimizing the use of delimiters such as YYYYMMDD rather than YYYY-MM-DD is called “basic” format variation in the ISO 8601 standard. – Basil Bourque Sep 27 '18 at 21:55
  • Two more benefits of the ISO 8601: (a) Easy to parse by machine with no SPACE character and no localized text, and (b) Easy to intuit by humans across cultures with the year coming first being easy to recognize (if contemporary), and without assuming English language. – Basil Bourque Sep 27 '18 at 22:06
56

It's because all the other ways to do it are ambiguous.

01/02/2003 what does that mean? January second 2003? Or in Europe: February 1st 2003? It gets even worse if you use two digits for the year, as 01/02/03.

That is why you use YYYYMMDD, it's the convention which enables us to communicate clearly about dates, 20030201 as a date is always clear. (and it makes it easier to sort)

(Now don't go storing that as the integer 20 million 30 thousand 2 hundred and 1. please ok? pretty please?)
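
To illustrate the ambiguity, a small Python sketch (made-up date; both readings of the slashed form parse happily, while the YYYYMMDD form has only one reading once ISO 8601 is assumed):

from datetime import datetime

s = "01/02/2003"
print(datetime.strptime(s, "%d/%m/%Y").date())   # 2003-02-01 - 1 February
print(datetime.strptime(s, "%m/%d/%Y").date())   # 2003-01-02 - 2 January

print(datetime.strptime("20030201", "%Y%m%d").date())  # 2003-02-01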

Ed Avis
  • 105
  • 1
Pieter B
  • 12,867
  • 1
  • 40
  • 65
  • 2
    Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/83679/discussion-on-answer-by-pieter-b-is-there-any-technical-reason-why-in-programmi). – maple_shaft Sep 26 '18 at 13:44
  • 15
    _"20030201 as a date is always clear"_: That is absolutely not the case. It is just as ambiguous as "01/02/2003" unless you know that YYYYMMDD (or is it YYYYDDMM or DDMMYYYY?...) is the format being used. You ALWAYS need to know the format of the date; there is no "convention" that makes things unambiguous. – skomisa Sep 27 '18 at 17:20
  • 6
    @skomisa that is quite incorrect. ISO 8601 defined the international standard date format specifically for the reasons you stated. None of the other formats are valid date formats and have not been since 19880605 – K. Alan Bates Sep 27 '18 at 20:25
  • 11
    @K.AlanBates Your date is ambiguous unless we assume that it should be parsed according to ISO 8601. – Stop harming Monica Sep 27 '18 at 21:00
  • @Goyo Exactly so. – skomisa Sep 27 '18 at 21:16
  • 16
    20030201 is the 20th of March 201AD, right? – David Richerby Sep 27 '18 at 23:07
  • 3
    This date format may be technically ambiguous if the standard is not specified (as the smartasses have pointed out), but to a human observer, in practical settings, it is far _less likely_ to be ambiguous, whereas the other ways are highly likely to be ambiguous. That may not help you with parsing, but it certainly has its benefits. – Peter Sep 28 '18 at 12:28
  • 2
    2003FEB01 is a convention that is rather unambiguous – Martijn Sep 28 '18 at 13:30
  • 10
    @Martijn, but language-specific. In Turkey, it is Şubat rather than February (Before you think your code works, [always check Turkey](https://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/)). – NH. Sep 28 '18 at 16:43
  • 2
    @Goyo disagree. ISO8601 is the reference date format and anyone attempting to define a separate format -no matter how familiar to a specific culture- are communicating invalid dates. Period. They are string garbage that "sometimes happen to communicate the intended moment in time." – K. Alan Bates Sep 28 '18 at 20:20
  • 3
    @K.AlanBates Well I guess there are 10 types of people, those who take standards as references and those who don't. – Stop harming Monica Sep 28 '18 at 20:40
  • 3
    @Peter I would say only with separators (YYYY-MM-DD) it is less likely to be ambiguous. The main reason for using YYYYMMDD seems to be because it uses less characters. That also applies for many other formats that have separators though. If I read 20012002 I don't see why ISO8601 is more likely than any other format with omitted separators. (Of course I could make guess based on the context, locale and maybe format adoption statistics which format is more likely, but that is not an inherent value of the formats themselves) – kapex Sep 29 '18 at 12:30
  • @kapex That is a good example, but a contrived one. Compare this with 02-03-1999. To figure out which is the day and which is the month (even with separators), I need to know the cultural context. 19990302, however, is very unlikely to refer to the year 0302. No need to look for specific examples, this happens for over one third of the days in every year. That's what I mean by "more likely to be ambiguous": the probability over all dates. The reason ISO8601 is less ambiguous is that there is no common convention that goes year-day-month. – Peter Oct 01 '18 at 03:44
  • 3
    @Peter How is it contrived? If you see a bunch of dates, it's usually not hard to figure out what format they're in (your example doesn't need the assumption that 1999 is a more common year to be talking about than 302: 19990302 is not a valid date if read as DDMMYYYY or MMDDYYYY). But you might have only one data item to inspect, and you might be unlucky, and it might be 20012002 rather than 02031999. – David Richerby Oct 01 '18 at 10:59
  • @HN It's not a perfect format to be sure, but it's pretty unambiguous, something claimed to be impossible. – Martijn Oct 01 '18 at 13:42
  • 1
    @DavidRicherby The point, I guess, is that you can usually recognize the year (dates where you can't guess are rare). If that's the case, and it's at the start, you know it must be ISO8601. With formats with the year at the end, you still have to figure out which way round the day and month are written. It's not _inherently_ more unambiguous, but given the available conventions, it is. – Peter Oct 01 '18 at 14:46
19

Let t1 and t2 be distinct integers that represent two times written in YYYYMMDD formatting. Then t1 < t2 implies that t2 occurred after t1.

You lose this ordering with DD and MM first formatting.
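
To make that concrete, a minimal sketch with made-up values:

t1 = 19990122   # 22 Jan 1999 as a YYYYMMDD integer
t2 = 19990514   # 14 May 1999
print(t1 < t2)  # True - numeric order matches chronological order

d1 = 22011999   # the same dates as DDMMYYYY integers
d2 = 14051999
print(d1 < d2)  # False - the ordering is lost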

ISO is, IMO, the only sensible format.

zahbaz
  • 315
  • 1
  • 4
  • 1
    Except you would never store this as an integer, at least I've never seen it nor considered it. – pipe Sep 25 '18 at 10:54
  • 5
    @pipe: Believe me, some people would. We maintain a legacy system that stores YYYYMMDD as integers. The design probably originated in some old database system without an explicit date type and was kept for backwards compatibility. It's not pretty. Don't do it. – Heinzi Sep 25 '18 at 11:19
  • 19
    @pipe it has been my experience in the software industry that whenever a reasonable person would want to say "But you would never do X" there is always at least one counter example – Joseph Rogers Sep 25 '18 at 13:09
  • @pipe Good point. By emphasizing integer, I was more trying to draw a connection between YYYYMMDD formatting and mathematical ordering, than assert anything about storage or engineering considerations. – zahbaz Sep 25 '18 at 16:29
  • 5
    @pipe In data warehousing it's not uncommon to use a yyyymmdd integer as the primary/surrogate key for a table of dates. – soapygopher Sep 25 '18 at 16:55
  • 1
    Yeah in a DW context it actually makes sense. – Guran Sep 25 '18 at 18:14
  • My programming language of choice has nice functionality to work with dates. But because some ... decided to store dates as integers we now also have to support our own library of date functions that work on integer-dates. All those functions are readily available out of the box if only he would have used a date type. – Pieter B Sep 26 '18 at 08:06
  • 4
    @pipe, well, the sequence number of a DNS zone is a 32-bit integer, which must be increased when the zone changes. While it could be just a plain number, a common idiom is to use numbers like 2018092601... Then there are some curious definitions of magic numbers in described in [`feature_test_macros(7)`](http://man7.org/linux/man-pages/man7/feature_test_macros.7.html), like having `_POSIX_C_SOURCE > 200809L` means that features from POSIX.1-2008 are supported... – ilkkachu Sep 26 '18 at 17:56
  • 1
    This is a pretty good reason to be honest. To choose a representation that gives sensible comparison operators. – mathreadler Sep 27 '18 at 21:28
  • I'm a DBA who works with a lot of time series and SCADA data. Storing dates as integers in YYYYMMDD format in a database is not uncommon, and generally works well. They work especially well when transporting between systems. The biggest downfall is the lack of native validation and arithmetic. Proper date formats are better at that. However, ints take less space and sort better, giving them a performance advantage in indexes, sorts and joins. – Sir Swears-a-lot Oct 03 '18 at 03:56
12

One point not mentioned yet is that, for interactive input, this format allows the input to be validated as it is entered.

The system cannot know whether a month has 28, 29, 30 or 31 days without knowing the specific year and month. When the interactive input mandates that the year and month come first, the system can check whether the day (entered last) is in the allowed range.

Granted, the question was largely about the date format, but it can be argued that the date format follows the formatting presented to the user.
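
A Python sketch of the idea, assuming the year and month have already been entered:

import calendar

year, month = 2000, 2                            # entered first; 2000 is a leap year
last_day = calendar.monthrange(year, month)[1]   # 29

day = 30                                         # entered last
if not 1 <= day <= last_day:
    print("Day out of range for", year, month)   # rejected as soon as it is typed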

Martin
  • 231
  • 2
  • 8
7

YYYYMMDD orders dates the same way you order numbers: most significant portion first. MMDDYYYY would be like writing "one hundred twenty-three" as "twenty and one hundred three". When storing numbers for machine use, there are significant advantages to keeping the ordering from most significant to least significant.

In our culture, we have a natural understanding of MMDDYYYY because, as humans, we have an awareness of time, and years progress slowly. We generally know what year it is. Seeing the year rarely matters, so we push it to the back. Months change over just fast enough to retain their importance. Other cultures handle this differently; much of the world prefers DDMMYYYY.

Joel Coehoorn
  • 1,895
  • 4
  • 17
  • 22
  • 63
    You may want to rephrase "our culture" because in my culture it's DDMMYYYY so it's not "our" culture just yours – slebetman Sep 25 '18 at 05:38
  • 67
    Comprehensive map of all countries that use the MMDDYYYY date format http://img-9gag-fun.9cache.com/photo/a2mXmGd_700b.jpg – Peregrine Sep 25 '18 at 07:02
  • 9
    Seems like an odd argument: "Months change over just fast enough to retain their importance" -> Why not then put the day first since that changes even faster? – Wim Deblauwe Sep 25 '18 at 08:59
  • When I say, "our culture", I refer to myself and OP. – Joel Coehoorn Sep 25 '18 at 17:56
  • 7
    @JoelCoehoorn, it's easy to make that explicit ("In our U.S. culture"). "our"/"we" is often used to mean "the stackexchange community" here. – AnoE Sep 25 '18 at 18:52
  • 15
    Exactly. Stackoverflow is **international**. That you are US based does not say, imply, or even make it more likely that others are as well. You cannot make any assumption about the locality of your readers here, they are all around the world. And most of your readers will be neither you, nor the OP, but other people who find your answer on Google. This very comment is written on a different continent than the one that you happen to live on. And while we have our own - ehm - *interesting* habits, we most certainly don't use MM/DD/YYYY here... – cmaster - reinstate monica Sep 25 '18 at 22:06
  • 1
    @Peregrine You know what's worse than using MM-DD-YYYY though? Using all three of MM-DD-YYYY, DD-MM-YYYY, and YYYY-MM-DD: https://www.theguardian.com/news/datablog/2013/dec/16/why-do-americans-write-the-month-before-the-day Welcome to Canada! – jbch Sep 25 '18 at 22:20
  • 3
    I generally assume that the short date format grew out of the long date format -- in the US you would typically say "March 14 2015" but elsewhere you'd say "14th of March 2015". I further assume that the US reordering is to save time (much like the removal of "u" in many words). (I don't know if this is actually what happened, but it seems sufficiently plausible to keep me happy. And FWIW I do not use the US format myself.) – Miral Sep 26 '18 at 07:46
  • 1
    "Natural understanding": I find this misleading. It sounds as if DDMMYYYY-ordering is more "natural", when in fact it is not even clear whether nature has a concept of time, plus when in fact not all human cultures have a concept of relative direction, i.e. left and right. – phresnel Sep 26 '18 at 07:51
  • 2
    Thanks for the explanation of the reasons why such date formats are used. I wondered because here in Hungary we put the year first as in ISO 8601 and the western formats seem quite convoluted to me, not natural at all. – Peter Tölgyesi Sep 26 '18 at 14:53
  • 1
    Even in USA, genealogists are smart enough to use dd mmm yyyy, and the military used to be. That format is not as nice for programming, but is also unambiguous. – WGroleau Sep 26 '18 at 18:07
  • 3
    @WGroleau - that's why in programming you should always use native date / datetime values and only convert to a string at the point you want to display something to to the user. Transfers across processes (eg xml or json) should always use ISO 8601 format. – Peregrine Sep 27 '18 at 07:37
  • 2
    Of course several European languages count with constructs like "one hundred, three and twenty", and not even always using multiples of ten(?!?). Thankfully, however, they use decimal positional representation for written communications. – dmckee --- ex-moderator kitten Sep 30 '18 at 17:53
6

Sorting has been mentioned, but by far the most useful reason for doing this is to compare dates as "strings"; and yes, a 26-character timestamp is ordered in the same way.

I am aware such comparisons are essential for sorting, but the property is also generally useful for a simple two-element comparison.

I have worked on projects where this was not adopted, and yes, programmers tried (with mixed results) to compare the dates as strings.

Pretty formatting is for the client side or typesetting.

mckenzm
  • 177
  • 5
5

This format makes alphabetical order of the strings identical to chronological order of the dates. This is useful because many tools provide alphabetical ordering of e.g. files by name, but no way to parse arbitrarily-formatted dates from file names and sort by those.

4

It's about restrictiveness. Imagine YEAR, MONTH and DAY as parameters, in the format YYYYMMDD each parameter is more restrictive than the previous one.

So if you want to search for something that happened in 1970 you can do it by searching for a string starting with "1970*", but if you remember which month it was you can add the month, as in "197005*". This way every "parameter" of the date gives you more specific information.

It's the only way to go from less specific info ("1970*") to more specific info ("19700523").
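
A minimal Python sketch of that narrowing, with hypothetical date strings:

dates = ["19691231", "19700523", "19700601", "19710523"]

print([d for d in dates if d.startswith("1970")])    # ['19700523', '19700601'] - everything from 1970
print([d for d in dates if d.startswith("197005")])  # ['19700523'] - narrowed to May 1970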

mrvinent
  • 57
  • 3
  • 3
    Not really a great argument - it's just as common to search for things happening in specific months rather than specific years. – Cubic Sep 25 '18 at 14:25
  • 1
    If `1970*` and `197005*` represent "glob" wildcard syntax, then you could search a bunch of MMDDYYYY dates by searching for the glob `*1970` or `05*1970`. Your answer might be implicitly assuming some extra constraint that you didn't explicitly mention, and could be improved by explaining your assumption. – Quuxplusone Sep 26 '18 at 00:01
  • 3
    This is sort of a side-effect or another way of describing the sort-key order mentioned by other answers. But this explanation falls apart unless you restrict it to searching on prefixes. (Easier to index for, but by no means required). – Peter Cordes Sep 26 '18 at 08:13
  • It also means you can select a sequence of dates with relatively simple regexp's... – Harper - Reinstate Monica Oct 03 '18 at 02:43
1

Why, in programming, the default date format is YYYYMMDD ...

It's a human readable format for input and output, it's not necessarily stored that way.

Over a third of all programming languages were developed in a country with English as the primary language and most of the modern ones adhere to a Standard of some description - the international Standard for dates is ISO 8601.

More info: (TMI?)

As time changes (usually forward), days increment first, then months, and lastly years. It might be easier to understand if we had decimal dates (and decimal time): as time passes, the number gets bigger. It's simply easier for humans to look at such a number and compare it to another date at a glance.

The computer doesn't care what structure you want to use, and most (but not all) computers use binary logic - base e actually has the lowest radix economy, but it isn't the most efficient nor the easiest choice for a complete sequence.

The actual input and output format for dates varies by country and is set by localization. While YYYYMMDD may seem to make the most sense and may be what you are used to, it isn't universal today, nor was it in the past for the longest time; even today Roman numerals are commonly used for dates.

Knowing the year up front tells you the number of days in that year, the biggest variation in duration a year can undergo. It also tells you up front the number of days in each month to follow (useful for error checking during entry); permitting input of the day first might force the system to back you up if the subsequently entered month and year did not agree with that day - possibly making accessible input more difficult. It also has importance with regard to the calendar format. See also the geek calendar, with its decimal stardates.

As far as the computer is concerned, it's likely to use UNIX Epoch time: the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, where every day is treated as if it contains exactly 86400 seconds. See also the Julian day. The YYYYMMDD format is simply preferred by egocentric humans; the IAU regards a year as a Julian year of 365.25 days (31.5576 million seconds) unless otherwise specified.
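
For what it's worth, going from the epoch representation to a YYYYMMDD rendering is a one-liner in most environments; a Python sketch:

import time

now = time.time()                                  # seconds since 1970-01-01 00:00:00 UTC
print(int(now))                                    # e.g. 1538000000
print(time.strftime("%Y%m%d", time.gmtime(now)))   # e.g. 20180927 - presentation only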

Rob
  • 286
  • 1
  • 3
  • 11
0

Another use I've seen for this representation is that you can store dates as integers (e.g. in a database), using only 4 bytes per date. Using YYYYMMDD then means that integer comparisons (often a single machine instruction) give the same result as comparisons of the dates they represent. And it prints moderately human-readably. And none of this requires any code or special support at all, in any mainstream programming environment.

If those things are most of what you need to do with dates, and you need to do a lot of it, then this format has a lot of appeal.

By comparison, dates in common formats like DD/MM/YYYY take 10 bytes as strings of ASCII characters. YYYYMMDD strings reduce that to 8 and gain the "comparing the representations has the same result as comparing the dates" advantage, but even then string-based comparison is character-by-character rather than a single integer comparison.
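
A rough Python sketch of the trade-off, assuming dates really are kept as plain integers rather than a dedicated date type:

import struct

d1 = 20030201                       # 1 Feb 2003 as a YYYYMMDD integer
d2 = 20181225                       # 25 Dec 2018

print(d1 < d2)                      # True - a single integer comparison
print(len(struct.pack(">I", d1)))   # 4 - fits in four bytes
print(str(d1))                      # 20030201 - still moderately human-readable

print(len("25/12/2018"))            # 10 bytes as an ASCII string, and not comparable as-is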

Ben
  • 1,017
  • 6
  • 10
  • 2
    It's trivial to pack a date into three bytes. The range 0000~9999 requires 14 bits, 01~12 requires 4 bits, and 01~31 requires 5 bits, for a sum of 23 bits. By using also the remaining bit in a three-byte quantity, you can represent dates over a period of 32,768 years maintaining a one-day resolution. This could be used, for example, to allow representing dates in the range year 8191 BC through 24576 AD. By packing the bits as, say, yyyyyyyyyyyyyyymmmmddddd, the decimal representation remains directly comparable (though not directly human readable, but who cares in database physical storage?). – user Sep 27 '18 at 18:57
0

Same reason the Moon is made of green cheese: it is not. In most cases the default format is some kind of localized string. Sometimes ISO format is used but usually with dashes for better readability. YYYYMMDD(or %Y%m%d in strftime parlance) is seldom the default. To be fair I am sure I have seen it but I cannot think of an example right now.

Unix date (GNU core utilities)

date

Output:

Wed Sep 26 22:20:57 CEST 2018

Python

import time
print(time.ctime())

output:

Wed Sep 26 22:27:20 2018

C

#include <stdio.h>
#include <time.h>

int main () {
   time_t curtime;

   time(&curtime);
   printf("%s", ctime(&curtime));
   return(0);
}

Output:

Wed Sep 26 22:40:01 2018

C++

#include <ctime>
#include <iostream>

int main()
{
    std::time_t result = std::time(nullptr);
    std::cout << std::ctime(&result);
}

Output:

Wed Sep 26 22:51:22 2018

Javascript

current_date = new Date();
current_date;

Output:

Wed Sep 26 2018 23:15:22 GMT+0200 (CEST)

SQLite

SELECT date('now');

Output:

2018-09-26

LibreOffice Calc

(screenshot of the default date display omitted)

Gnumeric

(screenshot of the default date display omitted)

OnlyOffice

(screenshot of the default date display omitted)

Python + numpy

import numpy as np
np.datetime64('now')

Output:

numpy.datetime64('2018-09-26T21:31:55')

Python + pandas

import pandas as pd
pd.Timestamp('now', unit='s')

Output:

Timestamp('2018-09-26 21:47:01.277114153')

Software Engineering

(screenshot of the site's default date display omitted)

apport.log

ERROR: apport (pid 9742) Fri Sep 28 17:39:44 2018: called for pid 1534, signal 6, core limit 0, dump mode 2

alternatives.log

update-alternatives 2018-05-08 15:14:24: run with --quiet --install /usr/bin/awk awk /usr/bin/mawk 5 --slave /usr/share/man/man1/awk.1.gz awk.1.gz /usr/share/man/man1/mawk.1.gz --slave /usr/bin/nawk nawk /usr/bin/mawk --slave /usr/share/man/man1/nawk.1.gz nawk.1.gz /usr/share/man/man1/mawk.1.gz

cups/access.log

localhost - - [28/Sep/2018:16:41:58 +0200] "POST / HTTP/1.1" 200 360 Create-Printer-Subscriptions successful-ok

syslog

Sep 28 16:41:46 pop-os rsyslogd:  [origin software="rsyslogd" swVersion="8.32.0" x-pid="946" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Stop harming Monica
  • 835
  • 1
  • 8
  • 11
  • 10
    to add to your argument, how many of them are formatted that way because of user settings on the computer you ran the script on? – Topher Brink Sep 27 '18 at 11:05
  • The first one is not really "bash", it is the date program (and it outputs `Do 27. Sep 22:27:09 CEST 2018` here.) – Paŭlo Ebermann Sep 27 '18 at 20:27
  • @PaŭloEbermann You are right, I hope it is better now. As I said many of these formats are localized so the actual format you see will depend on your localization options. – Stop harming Monica Sep 27 '18 at 20:42
  • 4
    While the point of this Answer is true for end-user-oriented apps, not so for data exchange between systems, data serialization, message/data protocols, logging, tracing, debuggers, and so on. The ISO 8601 standard is rapidly becoming the norm for such uses aimed at system admins and programmers. Ditto for international or locale-agnostic scenarios. – Basil Bourque Sep 27 '18 at 22:01
  • @BasilBourque Thanks, I added a random sample of the logs I found in my own system. I don't have examples of the other types handy. But I don't think that a trend towards defaulting to ISO 8601 in specific domains make its basic variant "the default in programming" in the face of the vast amount of software that defaults to other formats. – Stop harming Monica Sep 28 '18 at 21:39
  • 1
    This is a greatly under-appreciated answer, if only because it shows that many programs and libraries handling date output will do the wrong thing by default. I can excuse the UNIX program "date" incorrectly assuming that its common use case is to display dates in the supposedly more human readable (it's not, even when setup correctly) local format. But for the internals of programming languages this makes no sense whatsoever. – Kafein Oct 03 '18 at 08:27
-1

An additional benefit not mentioned so far is that desirable quantization (assigning a precise value to the broader range of values it belongs to) becomes a single, relatively easy and fast operation.

Suppose you're writing a report that summarises today's events, like the sum and number of sales. The sale date and time is stored as YYYYMMDDHHMISS; you simply keep the leftmost 8 characters (if it's a string) or integer-divide (i.e. floor) by 1,000,000 to reduce the datetime to the day of the sale.

Similarly, if you want the month's sales, you keep only the leftmost 6 digits, or divide by 100,000,000.
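
For instance, a Python sketch with a made-up YYYYMMDDHHMISS value:

sale_time = 20181225123456          # 2018-12-25 12:34:56

day   = sale_time // 1_000_000      # 20181225 - drop HHMISS
month = sale_time // 100_000_000    # 201812   - drop DDHHMISS

print(day, month)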

Sure, you could argue that any string manipulation is possible: a sale datetime of "12-25-2018 12:34pm" could be substringed and manipulated multiple times to get the month and the year, and in numeric form 122520181234 could be divided and modded, and multiplied, and divided some more, and eventually also produce a month and a year... but the code would be really difficult to write, read, maintain and understand.

And even sophisticated database optimizers might not be able to use an index on a column in a where clause if the date form was MM/DD/YYYY and had to be cut up and pieced back together. In comparison, storing a YYYYMMDD representation and wanting December 2018 leads to where clauses of the ilk dateasstring LIKE '201812%' or dateasint BETWEEN 20181200 AND 20181299 - something an index can easily be used for.

Thus, if there isn't a dedicated datatype for dates and a string/numeric representation is the only choice, using and storing times in some longest-interval-on-the-left-to-shortest-interval-on-the-right representation has quite a few benefits for ease of understanding, manipulation, storage, retrieval and code maintenance.

Caius Jard
  • 685
  • 4
  • 9