-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicitly disallow year 0 #1000
base: main
Are you sure you want to change the base?
Conversation
Year "0" doesn't exist; it goes from 1 BC to 1 AD. RFC 3339 is somewhat unclear if this should be allowed, at least in my reading of it: > This document defines a [..] representation of dates and times using > the Gregorian calendar. "Gregorian calendar" has no "year zero", so it should be forbidden. But also: > All dates and times are assumed to be in the "current era", > somewhere between 0000AD and 9999AD. So meh. Practically speaking, supporting this across the board is rather tricky. Python's datetime has no way to represent this (other than None, maybe? Ugh.), PostgreSQL doesn't support it, Go's time.Time behaves oddly (e.g. IsZero() is false, which is rather surprising), etc. ISO 8601 defines year 0 as "1 BC", which is even worse since most datetime implementations don't really do BC dates. Just forbidding it is by far the easiest for everyone; for implementations with a datetime that supports it, it's just a single `if`, and for e.g. Python nothing needs to be done. The only potential downside is that people may have `d = 0000-01-01`. We already broke compatibility "for sanity" by disallowing table overrides, so I think that's okay. The alternative is making it implementation dependent. Meh. RFC 3339 is supposed to be a "ISO 8601, without obscure edge cases"; this seems to fit with the intended purpose.
739a0fd
to
20e87a0
Compare
Is there a reason to not leave this unspecified and treat this as a case of "garbage datetime inputs give you a garbage output"? |
I can foresee people using 0000-01-01 as a sort of "null value", causing interoperability problems. Is it a huge practical problem? I don't know, probably not? But we forbid a bunch of things that are not huge practical problems, so... It came up when I copied some extra tests go-toml has to toml-test; we didn't have anything to test the boundaries of dates so that seemed useful, but then a bunch of parsers failed with that new test (they were also in the "failed fuzzer" regression tests, although I'm not sure if they caused a crash or something else – the commit that added it changed a bunch of things). I'm not strictly opposed to leaving it as-is I suppose. |
However, given that we KNOW it will cause interoperability problems, at the very least a line mentioning it wouldn't be a bad idea, IMHO. At least Python, C#, and Common Lisp will always fail, because their stdlib doesn't support this concept. |
It is concerning that 0000-01-01 is not properly handled by many systems. I wasn't even aware that year zero caused Python problems until @arp242 mentioned it. We've said a lot about the use of sentinels to avoid nulls. We want sentinels to be interpreted as normal values and not rely on placeholders that users would not expect to see. In this case, we ought to note that for each of the three datetime formats that include dates, the minimum date value is necessarily 0001-01-01, and not anything in year zero. The concept gets a +1 from me. I'll review the code in this PR and make suggestions if needed. |
Let's not mention the Gregorian calendar. Even though RFC 3339 explicitly mentions it, that standard doesn't help matters when it mentions year 0000 twice. A lot of people, myself included, just assume that it's the proleptic Gregorian calendar and that year 0000 is the same as 1 BC. The real reason that we would exclude 0000 is the previously mentioned technical limitations: year 0000 does not work with some systems. So, let's just tell it like it is: For the sake of interoperability, the year 0000 is invalid. Note that the ABNF permits a well-formed year 0000, so the parser must catch and invalidate year 0's if it normally wouldn't. And we must add tests to make sure any year 0 is caught as invalid. To drive the point home, let's include odt5ok = 0001-01-01 00:00:00Z # The earliest valid timestamp
# odtX = 0000-12-31 00:00:00Z # INVALID Since we don't have a section describing common facets of all date-time types, we will need to repeat the "year 0 is invalid" line, however it's formulated, in the following three sections explicitly: Offset Date-Time, Local Date-Time, and Local Date. |
I'm fine with the idea of this PR, and I would in any case mention the Gregorian calendar; it's a better justification than mere interoperability alone (though the latter could be mentioned too).
True, but I'd keep it short, just saying something as "The allowed year range is the same as for Offset Date-Time" in the other three sections. |
@ChristianSi Sounds good, though getting too deep into the specifics of the Gregorian calendar, both regular and proleptic, can be a mindtrap. Don't mention October 1582! Or better yet, just let RFC 3339 cover any details we leave out. Let's say "Valid years run from 0001 through 9999." That would work for all three sections, and we can follow up with details, about the calendar and the reasons for why we do not allow non-positive years, in the Offset Date-Time section only. |
I've begun work on a PR to disallow year 0. @arp242, I may need your help to submit the appropriate tests. (We just need three files in |
My apologies @arp242. I only just now realized that you already submitted this as a PR. My approach in #1016 is different from yours, in that I took the conversation here into account when I created it. Like it or not, RFC 3339 doesn't forbid year 0. But a lot of languages do, at least in their most fundamental implementations. And I just now realized that my PR got really janked up. I'll be making some corrections later tonight. |
#1016 is updated. Take a look and see if this passes muster. |
@eksortso I think #1016 can be made a bit shorter (see my comments), otherwise it looks good for what it does. However, especially considering that RFC 3339 does allow the year 0000, I now wonder again whether this is really the right way to approach this problem. It makes more work for RFC 3339-compatible parser writers, with no particularly relevant benefit as far as I can see. Hence I wonder whether it wouldn't be better to just point out that the year 0000 doesn't have to supported and so is better avoided in documents, writing something like:
That way, the potential pitfall is documented and the guaranteed supported year range is clearly documented, but nobody is forced to adapt their implementation. |
@ChristianSi This makes a lot of sense, though I think we can make it even simpler. It isn't up to us to document the potential pitfall. We just need to acknowledge its existence, and to stay out of the way when its use is justifiable. We should just note that we are making this special case for "technical reasons." We need not say what those technical reasons are. They could be intrinsic errors for year
odtN = 0001-01-01 00:00:00Z # NOTE: Earliest date-time guaranteed to be valid. The example mirrors the sentiment that exists in |
Please review the latest updates in #1016. |
In #1016, I said that I was closing the |
Year "0" doesn't exist; it goes from 1 BC to 1 AD.
RFC 3339 is somewhat unclear if this should be allowed, at least in my reading of it:
"Gregorian calendar" has no "year zero", so it should be forbidden. But also:
So meh.
Practically speaking, supporting this across the board is rather tricky. Python's datetime has no way to represent this (other than None, maybe? Ugh.), PostgreSQL doesn't support it, Go's time.Time behaves oddly (e.g. IsZero() is false, which is rather surprising), etc.
ISO 8601 defines year 0 as "1 BC", which is even worse since most datetime implementations don't really do BC dates.
Just forbidding it is by far the easiest for everyone; for implementations with a datetime that supports it, it's just a single
if
, and for e.g. Python nothing needs to be done.The only potential downside is that people may have
d = 0000-01-01
. We already broke compatibility "for sanity" by disallowing table overrides, so I think that's okay. The alternative is making it implementation dependent. Meh.RFC 3339 is supposed to be a "ISO 8601, without obscure edge cases"; this seems to fit with the intended purpose.