Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly disallow year 0 #1000

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Explicitly disallow year 0 #1000

wants to merge 1 commit into from

Conversation

arp242
Copy link
Contributor

@arp242 arp242 commented Oct 13, 2023

Year "0" doesn't exist; it goes from 1 BC to 1 AD.

RFC 3339 is somewhat unclear if this should be allowed, at least in my reading of it:

This document defines a [..] representation of dates and times using
the Gregorian calendar.

"Gregorian calendar" has no "year zero", so it should be forbidden. But also:

All dates and times are assumed to be in the "current era",
somewhere between 0000AD and 9999AD.

So meh.

Practically speaking, supporting this across the board is rather tricky. Python's datetime has no way to represent this (other than None, maybe? Ugh.), PostgreSQL doesn't support it, Go's time.Time behaves oddly (e.g. IsZero() is false, which is rather surprising), etc.

ISO 8601 defines year 0 as "1 BC", which is even worse since most datetime implementations don't really do BC dates.

Just forbidding it is by far the easiest for everyone; for implementations with a datetime that supports it, it's just a single if, and for e.g. Python nothing needs to be done.

The only potential downside is that people may have d = 0000-01-01. We already broke compatibility "for sanity" by disallowing table overrides, so I think that's okay. The alternative is making it implementation dependent. Meh.

RFC 3339 is supposed to be a "ISO 8601, without obscure edge cases"; this seems to fit with the intended purpose.

Year "0" doesn't exist; it goes from 1 BC to 1 AD.

RFC 3339 is somewhat unclear if this should be allowed, at least in my
reading of it:

> This document defines a [..] representation of dates and times using
> the Gregorian calendar.

"Gregorian calendar" has no "year zero", so it should be forbidden. But
also:

> All dates and times are assumed to be in the "current era",
> somewhere between 0000AD and 9999AD.

So meh.

Practically speaking, supporting this across the board is rather tricky.
Python's datetime has no way to represent this (other than None, maybe?
Ugh.), PostgreSQL doesn't support it, Go's time.Time behaves oddly (e.g.
IsZero() is false, which is rather surprising), etc.

ISO 8601 defines year 0 as "1 BC", which is even worse since most
datetime implementations don't really do BC dates.

Just forbidding it is by far the easiest for everyone; for
implementations with a datetime that supports it, it's just a single
`if`, and for e.g. Python nothing needs to be done.

The only potential downside is that people may have `d = 0000-01-01`. We
already broke compatibility "for sanity" by disallowing table overrides,
so I think that's okay. The alternative is making it implementation
dependent. Meh.

RFC 3339 is supposed to be a "ISO 8601, without obscure edge cases";
this seems to fit with the intended purpose.
@pradyunsg
Copy link
Member

Is there a reason to not leave this unspecified and treat this as a case of "garbage datetime inputs give you a garbage output"?

@arp242
Copy link
Contributor Author

arp242 commented Oct 13, 2023

I can foresee people using 0000-01-01 as a sort of "null value", causing interoperability problems. Is it a huge practical problem? I don't know, probably not? But we forbid a bunch of things that are not huge practical problems, so...

It came up when I copied some extra tests go-toml has to toml-test; we didn't have anything to test the boundaries of dates so that seemed useful, but then a bunch of parsers failed with that new test (they were also in the "failed fuzzer" regression tests, although I'm not sure if they caused a crash or something else – the commit that added it changed a bunch of things).

I'm not strictly opposed to leaving it as-is I suppose.

@arp242
Copy link
Contributor Author

arp242 commented Oct 13, 2023

I'm not strictly opposed to leaving it as-is I suppose.

However, given that we KNOW it will cause interoperability problems, at the very least a line mentioning it wouldn't be a bad idea, IMHO. At least Python, C#, and Common Lisp will always fail, because their stdlib doesn't support this concept.

@eksortso
Copy link
Contributor

eksortso commented Oct 16, 2023

It is concerning that 0000-01-01 is not properly handled by many systems. I wasn't even aware that year zero caused Python problems until @arp242 mentioned it.

We've said a lot about the use of sentinels to avoid nulls. We want sentinels to be interpreted as normal values and not rely on placeholders that users would not expect to see. In this case, we ought to note that for each of the three datetime formats that include dates, the minimum date value is necessarily 0001-01-01, and not anything in year zero.

The concept gets a +1 from me. I'll review the code in this PR and make suggestions if needed.

@eksortso
Copy link
Contributor

Let's not mention the Gregorian calendar. Even though RFC 3339 explicitly mentions it, that standard doesn't help matters when it mentions year 0000 twice. A lot of people, myself included, just assume that it's the proleptic Gregorian calendar and that year 0000 is the same as 1 BC.

The real reason that we would exclude 0000 is the previously mentioned technical limitations: year 0000 does not work with some systems.

So, let's just tell it like it is: For the sake of interoperability, the year 0000 is invalid. Note that the ABNF permits a well-formed year 0000, so the parser must catch and invalidate year 0's if it normally wouldn't. And we must add tests to make sure any year 0 is caught as invalid.

To drive the point home, let's include 0001-01-01 as a valid example:

odt5ok = 0001-01-01 00:00:00Z  # The earliest valid timestamp
# odtX = 0000-12-31 00:00:00Z  # INVALID

Since we don't have a section describing common facets of all date-time types, we will need to repeat the "year 0 is invalid" line, however it's formulated, in the following three sections explicitly: Offset Date-Time, Local Date-Time, and Local Date.

@ChristianSi
Copy link
Contributor

ChristianSi commented Oct 21, 2023

I'm fine with the idea of this PR, and I would in any case mention the Gregorian calendar; it's a better justification than mere interoperability alone (though the latter could be mentioned too).

@eksortso:

Since we don't have a section describing common facets of all date-time types, we will need to repeat the "year 0 is invalid" line, however it's formulated, in the following three sections explicitly: Offset Date-Time, Local Date-Time, and Local Date.

True, but I'd keep it short, just saying something as "The allowed year range is the same as for Offset Date-Time" in the other three sections.

@eksortso
Copy link
Contributor

@ChristianSi Sounds good, though getting too deep into the specifics of the Gregorian calendar, both regular and proleptic, can be a mindtrap. Don't mention October 1582! Or better yet, just let RFC 3339 cover any details we leave out.

Let's say "Valid years run from 0001 through 9999." That would work for all three sections, and we can follow up with details, about the calendar and the reasons for why we do not allow non-positive years, in the Offset Date-Time section only.

@eksortso
Copy link
Contributor

eksortso commented Feb 28, 2024

I've begun work on a PR to disallow year 0. @arp242, I may need your help to submit the appropriate tests. (We just need three files in invalid, since you already have valid tests in edge.toml.)

@eksortso
Copy link
Contributor

eksortso commented Mar 1, 2024

My apologies @arp242. I only just now realized that you already submitted this as a PR.

My approach in #1016 is different from yours, in that I took the conversation here into account when I created it. Like it or not, RFC 3339 doesn't forbid year 0. But a lot of languages do, at least in their most fundamental implementations.

And I just now realized that my PR got really janked up. I'll be making some corrections later tonight.

@eksortso
Copy link
Contributor

eksortso commented Mar 2, 2024

#1016 is updated. Take a look and see if this passes muster.

@ChristianSi
Copy link
Contributor

@eksortso I think #1016 can be made a bit shorter (see my comments), otherwise it looks good for what it does.

However, especially considering that RFC 3339 does allow the year 0000, I now wonder again whether this is really the right way to approach this problem. It makes more work for RFC 3339-compatible parser writers, with no particularly relevant benefit as far as I can see. Hence I wonder whether it wouldn't be better to just point out that the year 0000 doesn't have to supported and so is better avoided in documents, writing something like:

Valid years run from 0001 to 9999. Parsers may also accept dates with the year 0000, but don't have to do so. Since not all parsers will accept this year and since it doesn't exist in the standard Gregorian calendar, it's better to avoid it in TOML documents.

That way, the potential pitfall is documented and the guaranteed supported year range is clearly documented, but nobody is forced to adapt their implementation.

@eksortso
Copy link
Contributor

eksortso commented Mar 4, 2024

That way, the potential pitfall is documented and the guaranteed supported year range is clearly documented, but nobody is forced to adapt their implementation.

@ChristianSi This makes a lot of sense, though I think we can make it even simpler. It isn't up to us to document the potential pitfall. We just need to acknowledge its existence, and to stay out of the way when its use is justifiable.

We should just note that we are making this special case for "technical reasons." We need not say what those technical reasons are. They could be intrinsic errors for year 0000 in the implementation languages. Or they could be externally generated errors, produced because the (standard) Gregorian calendar has no year zero. They could be both.

Dates with the year 0000 may throw errors for technical reasons. Avoid them in TOML documents.

odtN = 0001-01-01 00:00:00Z  # NOTE: Earliest date-time guaranteed to be valid.

The example mirrors the sentiment that exists in edge.toml in toml-test. Year 0 may not exist, but year 1 most certainly will.

@eksortso
Copy link
Contributor

eksortso commented Mar 8, 2024

Please review the latest updates in #1016.

@eksortso
Copy link
Contributor

eksortso commented Mar 17, 2024

In #1016, I said that I was closing the issue PR because nobody reported a problem, and we shouldn't add prohibitive or ineffectual language to the spec prematurely. To this end, I believe that this PR should also be closed. We'll revisit the matter again if we really need to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants