How to define "canonical time zone identifier" in the spec? #8
Comments
Here's the text we now have in the 402 section of the Temporal proposal. Unless we hear implementer or user concerns here in this issue, we'll assume that this text below is OK. If CLDR ends up defining a clearer requirement, we can revise the text below in a normative PR to ECMA-402, outside of the scope of this proposal.
|
Do we want/need to make a clearer distinction between resolving links to zones and actual time zone transition data? For example, ICU doesn't provide any time zone transition data for any time zone rule from
js> new Temporal.TimeZone("Atlantic/Reykjavik").id
"Atlantic/Reykjavik"
js> new Temporal.TimeZone("Atlantic/Reykjavik").getNextTransition("1800-01-01T00:00:00Z").toString()
"1912-01-01T00:16:08Z"
This is the first transition from Africa/Abidjan, not from Atlantic/Reykjavik. Kind of related: https://mm.icann.org/pipermail/tz/2023-June/032998.html |
My weakly-held opinion is that it would be ideal if ICU used backzone data, and I think that it's reasonable to encourage them to do so. But they may have good reasons not to do this. For example, if it would add 10MB to browser downloads, then I'd probably want to live with the current weird pre-1970 data. I'd want to understand more about why they've chosen to not use backzone, especially if it is intentional on their part.
But I'm also not sure that inaccuracy of pre-1970 data is a huge problem. First, it's been this way for quite a while and doesn't seem to be generating as many bugs as the Calcutta/Kyiv issue. I suspect this is because almost all ECMAScript programs deal with post-1970 data only. And for those that do deal with pre-1970 data, it's likely that all they care about are dates, for which Temporal.PlainDate can be used, which avoids TZDB completely. So I suspect that this problem gets better with Temporal, and I'd probably want to wait until Temporal is widely adopted to see if it really needs to be fixed.
That's why I suspect that the current spec text, which makes a recommendation but does not require backzone, is probably OK for now. The recommendation was written to give implementers (or ICU!) leeway to choose alternate ways to build TZDB as long as there's at least one Zone per country. Do you think this is OK? Or do you think that we should use different spec text?
If at some point later we get ICU and implementers to align on a particular solution, then I think we can (outside of this proposal) open a normative PR against 402 to change that text. For example, once ICU surfaces the IANA canonical ID for all CLDR zones, then I'd support changing that text to refer to ICU instead of IANA build options. |
My concerns were about differences between time zone id canonicalisation and time zone transition rules. ICU basically builds TZDB two times, using different build options (*):
So we end up with a Zone per country, but the Zone's transition rules are always from a non- (*) This isn't really what happens, the canonical time zone identifiers are actually derived from CLDR (plus ICU specific overrides), but let's ignore that difference for now. |
Does IANA guarantee that? AFAIU from IANA's perspective (or Paul's) the only thing that backzone does is clarify pre-1970 transitions. Also, I don't think IANA guarantees "at least one time zone per country". Do you really want to use a geopolitically sensitive term such as "country" in the spec? Also
backzone does not introduce new time zones, I think that's inaccurate.
I don't think IANA makes such claims. They also don't use terms as primary. Personally, I find this note confusing. |
The spec deliberately avoids "country" and instead says "ISO 3166-1 Alpha-2 country code", so I think it's OK because we're delegating the difficult geopolitical decision of "what is a country" to ISO, and simply agreeing to follow their decisions. Is there different wording that you think would work better?
My understanding is that
Yep, this matches my understanding of how ICU works. I agree that this is a problem, and is something that we should engage with ICU to see if they're willing to change it to be more consistent. I think the first priority is to get CLDR to report the current IANA canonical ID. Once that change is made, then I think we should engage with ICU (perhaps as part of the planning for the API to expose the IANA canonical ID in ICU's APIs?) to figure out how to solve the backzone data issues. What do you think? |
zone.tab (and zone1970.tab) groups existing time zones. The Theory page does not even mention countries. Stephen Colebourne suggested adding an "at least one time zone per ISO country" guarantee, but AFAIK it is still only a proposal.
Please use different term. IANA has no canonical IDs and it might mean different thing to different people.
Build options do not change the set of existing / available / acceptable time zones. Also, this spec does not define time zones, nor can implementers create their own.
I don't understand why spec needs to mention country / country codes in the first place. What is the idea behind proposed note? |
The goal is to emphasize that TZDB's default build options, which currently merge multiple countries into one Zone, are not recommended for ECMAScript, because it'd mean that time zone changes in one country can affect time zone data in a completely different country. For example, it'd be bad if I show up at the wrong time for my meeting in Reykjavik next year just because Cote d'Ivoire changed its time zone rules. Instead, we want implementations to follow CLDR's approach: do not deprecate all time zones for a particular country code, even if all those Zones are turned into Links in the IANA TZDB. Also, because most (all?) major ECMAScript implementations rely on CLDR (via ICU), our assumption is that this recommendation won't require implementations to make any changes. Without mentioning country codes, I'm not sure how we can make the same points above. Do you have other text in mind that could do it better?
As I understand it, zone.tab is the data source for programs that allow users to map from an ISO 3166-1 Alpha-2 country code to one or more time zones that are used with that country code. I suspect for backwards compatibility reasons, it doesn't use the same time zone ID (which may be a Zone or a Link) for multiple country codes. Which is good for ECMAScript use.
Yep, in the ECMAScript spec we've been calling this "primary time zone identifier" and "non-primary time zone identifier". Do you like these terms? Here's the relevant part of the ECMA-262 current spec that defines these terms: https://tc39.es/ecma262/#sec-time-zone-identifiers. And here's the proposed spec text in the Temporal proposal that describes how the IANA TZDB relates to these terms: https://tc39.es/proposal-temporal/#sec-use-of-iana-time-zone-database. Would you like to see changes in either of those spec text sections? If yes, what changes do you think are needed? |
I am sure you know that it's not how timezones work after the merge :)
Theory page explicitly says that:
and you've seen outcome of the recent merges. And there is nothing about
Is your goal to treat
Yeah, it sounds good. One can nit-pick whether GMT / GMT+0 / GMT-0 / Etc/GMT / Etc/GMT+0 / Etc/GMT-0 / Etc/UTC are non-primary for UTC (I am referring to |
Yes. Even if Germany and Norway use the same time zone rules today, they might not do so in the future. So we believe that there's value in allowing end-users and developers to use the most granular time zone identifier that corresponds to a desired location. If that per-country ID is persisted somewhere, then it's immune to time zone rule changes that happen in another country. It's still not a perfect solution, because rules can change in any country, but by retaining per-country-code IDs (and exposing them as primary in ECMAScript APIs like
Yep, per above I'm less concerned about the current state (which after the merge is identical between merged IDs) and more concerned about making persisted IDs tolerate future changes. So "show up at the wrong time for my meeting in Reykjavik next year just because Cote d'Ivoire changed its time zone rules" really is the failure case that we're trying to prevent. For example, imagine a calendar app that serialized a Was that what you meant? Or did I misunderstand what you were trying to say?
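To make the persisted-ID failure case concrete, here is a minimal sketch. It assumes the calendar app stores a wall-clock time plus a bracketed time zone ID; the Link table and both canonicalizer functions are hypothetical stand-ins written for this illustration, not any implementation's actual behavior.

```javascript
// Hypothetical Link table imitating TZDB's default build options, where
// Atlantic/Reykjavik is a Link to Africa/Abidjan. Illustrative only.
const DEFAULT_BUILD_LINKS = new Map([
  ['Atlantic/Reykjavik', 'Africa/Abidjan'],
]);

// Hypothetical canonicalizer that follows the merges.
function canonicalizeFollowingMerges(id) {
  return DEFAULT_BUILD_LINKS.get(id) ?? id;
}

// Hypothetical canonicalizer that keeps per-country-code IDs primary.
function canonicalizeKeepingCountryIds(id) {
  return id;
}

// Serialize a meeting the way a calendar app might: wall-clock time plus
// whatever ID the canonicalizer hands back.
function serializeMeeting(localDateTime, timeZoneId, canonicalize) {
  return `${localDateTime}[${canonicalize(timeZoneId)}]`;
}

const merged = serializeMeeting(
  '2030-06-01T09:00', 'Atlantic/Reykjavik', canonicalizeFollowingMerges);
const kept = serializeMeeting(
  '2030-06-01T09:00', 'Atlantic/Reykjavik', canonicalizeKeepingCountryIds);

console.log(merged); // 2030-06-01T09:00[Africa/Abidjan]
console.log(kept);   // 2030-06-01T09:00[Atlantic/Reykjavik]
```

Once the first string is persisted, the Reykjavik meeting is tied to whatever rules Africa/Abidjan has in the future; the second string keeps following Iceland's rules.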
Yeah, I agree. But backwards compatibility does seem to be really important to Paul, and he's bent over backwards to ensure that there are build options that can replicate the global-tz fork. And even if zone.tab were to be somehow eliminated (which seems unlikely), we're insulated by CLDR from direct impact. So the worst possible outcome if zone.tab goes bad (where "bad" means either it's removed or it stops having one row per country code) would be that we'd need to change the ECMAScript spec to point to CLDR instead of zone.tab. So I'm not really worried about it for the time being. What do you think? |
I have a feeling that these problems stem from treating
Are you sure something that 1) over which you have no control and 2) which does not promise anything should be used in the spec? Are country codes needed when there are primary timezone identifiers? |
Yep, this proposal will reduce the scope of the problem considerably by stopping canonicalization in all cases where IDs are provided by users. But canonicalized cases will still remain where ECMAScript is providing the IDs:
Given that canonicalization is still exposed in these two cases, I don't think we should be following IANA's merges because those IDs will be used in ECMAScript programs and sometimes persisted, leading to the "meeting in Reykjavik" problem noted above. I'm not sure how the "show me this user's current time zone" and "show me a list of time zone IDs I can localize and then use in a GUI time zone chooser" cases can work as expected without ignoring the merges in IANA's default build options. Do you have another solution in mind?
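As a rough sketch of the two cases where ECMAScript itself supplies the IDs: this assumes they correspond to the existing `Intl.DateTimeFormat().resolvedOptions().timeZone` and `Intl.supportedValuesOf('timeZone')` APIs. The returned values depend on the implementation and its time zone data, so no specific output is shown.

```javascript
// Case 1: "show me this user's current time zone".
// The ID comes from the implementation, not from user input.
const currentTimeZone = new Intl.DateTimeFormat().resolvedOptions().timeZone;

// Case 2: "show me a list of time zone IDs" for a chooser UI.
// Intl.supportedValuesOf is missing in older engines, hence the guard.
const availableTimeZones =
  typeof Intl.supportedValuesOf === 'function'
    ? Intl.supportedValuesOf('timeZone')
    : [];

console.log(currentTimeZone);
console.log(availableTimeZones.length);
```

Any ID surfaced this way can end up persisted by a program, which is why the choice of primary identifiers matters here even if user-provided IDs are no longer canonicalized.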
I'm not sure I understand the question. Could you explain a bit more what you mean?
The current spec language (in https://tc39.es/proposal-temporal/#sec-use-of-iana-time-zone-database) offers a lot of wiggle room in case zone.tab breaks backwards compatibility: it is recommended that ECMAScript implementations instead use build options such as I think this language is fairly clear that "at least one primary identifier for each ISO 3166-1 Alpha-2 country code" is the goal and there can be many ways to achieve that goal, only one of which is using zone.tab. So even if zone.tab stops satisfying the "at least one primary identifier for each ISO 3166-1 Alpha-2 country code" recommendation, I'm not worried that implementers will get it wrong. Especially given that all major current implementations are going through CLDR and ICU (which have similar output to the zone.tab list and which also won't follow IANA's merges). What do you think? |
As I understand it, merge according to IANA means no more than "rules for timezone X are exactly the same as for Y". So when they added
I don't think that timezone merges and timezone picker UI are related. On Android we maintain such file. It is probably somewhat close to proposed primary timezones list grouped by country code (it is not a spec and we don't use
Oh, you give some explanation on how primary IDs are defined. It will be hard to do it w/o mentioning them then.
Text looks fine. Though
is incorrect. You assume that
"won't follow merges" - is it regarding pre-1970 data?
|
Oh, I forgot to mention that there is no guarantee that a timezone is mapped to one single country. Be careful with assumptions! |
Thanks, this helps me understand your feedback better. I *do* think that timezone merges and timezone picker UI are related, because in order to build a simple timezone picker UI in a web app (for example, a "what time zone do you want to use for your results?" dropdown box in a reporting UI), a developer needs a way to get a list of IDs that are available to be chosen. Those IDs then need to be localized into the user's language and shown in a picker UI, like a dropdown box. There are a few ways for an ECMAScript developer to get that list of available IDs into their web app:
I assume that (3) is the best solution, because it avoids the problems of 1/2/4, because it's the easiest DX, and because the API already exists. Do you agree? And if not, what alternative do you think would work better?
A challenge with (3) is that ECMAScript needs to pick which IDs are primary and which are non-primary. In a perfect world, primary IDs could just be the Zone names in a TZDB built with the default build options. But those defaults merge multiple countries together, which is generally not what most implementations prefer. Instead, most implementations seem to be converging on a list of canonical IDs that is identical to (or very close to) the list of IDs in zone.tab. This is true for CLDR's list, which is identical to zone.tab except for 19 outdated IDs like Asia/Calcutta and Asia/Saigon. We're working with CLDR to extend CLDR data so that ECMAScript could surface the current IDs for those 19. See https://unicode-org.atlassian.net/browse/CLDR-14453 (which will hopefully result in CLDR exposing the current IANA Zone name for those 19) and tc39/ecma402#806 (which will hopefully result in ECMAScript overriding CLDR's outdated IDs until a long-term CLDR solution is available).
Firefox uses backzone instead of zone.tab, but FF's list is also quite similar to zone.tab's and I'm checking to see if FF can move to use zone.tab instead. The Android list you sent seems pretty closely aligned to zone.tab too. How is Android's list maintained? So my current opinion is that every ID in zone.tab should be primary. What do you think?
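A minimal sketch of a picker built on an API that already exists, assuming option (3) refers to `Intl.supportedValuesOf('timeZone')`. The helper names `localizedZoneName` and `buildPickerEntries` are hypothetical, and the labels produced depend on the implementation's locale data.

```javascript
// Look up a localized display name for a zone via formatToParts.
// Falls back to the raw ID if no timeZoneName part is produced.
function localizedZoneName(timeZoneId, locale, date = new Date()) {
  const parts = new Intl.DateTimeFormat(locale, {
    timeZone: timeZoneId,
    timeZoneName: 'long',
  }).formatToParts(date);
  const namePart = parts.find((part) => part.type === 'timeZoneName');
  return namePart ? namePart.value : timeZoneId;
}

// Build { id, label } entries for a dropdown. The ID list defaults to
// whatever the implementation reports as its primary identifiers.
function buildPickerEntries(locale, ids) {
  const zoneIds = ids ??
    (typeof Intl.supportedValuesOf === 'function'
      ? Intl.supportedValuesOf('timeZone')
      : []);
  return zoneIds.map((id) => ({
    id,
    label: `${id} (${localizedZoneName(id, locale)})`,
  }));
}

const entries = buildPickerEntries('en');
console.log(entries.slice(0, 3));
```

The `id` field is what a web app would persist, so whether Atlantic/Reykjavik appears in this list or is folded into another zone is decided entirely by which IDs the implementation treats as primary.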
Yeah, it's an interesting question about whether
By "won't follow merges" I mean ignoring the changes like turning Atlantic/Reykjavik and Europe/Oslo into non-primary identifiers, even though the default build options of the current TZDB make these Links. I don't have a strong opinion about pre-1970 data, because very little software written in ECMAScript deals with time-of-day info before 1970. I care a lot more about minimizing the impact of future changes to TZDB. For example, if Cote d'Ivoire changed its time zone offset but Iceland didn't (or vice versa), then it'd be really helpful if both countries had different primary IDs in ECMAScript. Also, pre-1970 is already broken in all major ECMAScript implementations, where "broken" means that data from
Temporal.ZonedDateTime.from('1800-01-01[Europe/Berlin]').offset
// => '+00:53:28'
Temporal.ZonedDateTime.from('1800-01-01[Europe/Oslo]').offset
// => '+00:53:28'
// Europe/Oslo is 0:43:00 in backzone file.
This omission of backzone data does occasionally cause some user complaints, but not nearly as many as the complaints that come from Chrome and Safari using outdated IDs like Asia/Calcutta, Europe/Kiev, and Asia/Saigon. So if the best answer is to omit backzone, I'd be OK with that (and we can update the recommendation in the spec accordingly). What do you think?
In the current zone.tab, no IDs listed in that file are used across multiple country codes. This is why I am proposing using zone.tab. I realize that zone1970.tab does use the same ID for multiple country codes, which is why I'm proposing to not use that file to determine ECMAScript's primary IDs. Or were you suggesting something different and I'm not understanding what you mean? |
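The difference between the two files can be checked mechanically. The sketch below uses a few sample rows written for illustration in the tab-separated layout both files share (country code or comma-separated codes, coordinates, time zone ID); the exact rows in any given TZDB release may differ, and `idsSharedAcrossCountryCodes` is a hypothetical helper.

```javascript
// Sample rows only, not a copy of either file.
const zoneTabSample = [
  '# comment lines start with "#"',
  'CI\t+0519-00402\tAfrica/Abidjan',
  'IS\t+6409-02151\tAtlantic/Reykjavik',
].join('\n');

const zone1970TabSample = [
  'CI,BF,GH,GM,GN,IS,ML,MR,SH,SL,SN,TG\t+0519-00402\tAfrica/Abidjan',
].join('\n');

// Return the IDs that are associated with more than one country code.
function idsSharedAcrossCountryCodes(tabText) {
  const codesById = new Map();
  for (const line of tabText.split('\n')) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks/comments
    const [codes, , id] = line.split('\t');
    const seen = codesById.get(id) ?? new Set();
    for (const code of codes.split(',')) seen.add(code);
    codesById.set(id, seen);
  }
  return [...codesById]
    .filter(([, seen]) => seen.size > 1)
    .map(([id]) => id);
}

console.log(idsSharedAcrossCountryCodes(zoneTabSample));     // []
console.log(idsSharedAcrossCountryCodes(zone1970TabSample)); // [ 'Africa/Abidjan' ]
```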
We probably have different assumptions / guarantees around IANA data and APIs provided to users and their behaviour. Stock Android's timezone picker does not care about merges and we've made no code changes when we followed ICU's decision to accept merges.
A timezone picker is not a trivial problem, and a dropdown list is a simple yet not always elegant solution. If you see it as a UI-centric problem to make the majority of developers happy, then (3) sounds good. Though I have no data on browser update saturation or how bad version skew issues might be. And I am of course biased - I like Android's current "select region > select timezone" approach more than a long dropdown list :)
It is maintained manually. We used to validate it against zone.tab, but since aosp/1909188 it is not.
So your understanding of IANA data is different from Paul Eggert's (not blaming!).
Nope, I just mentioned it because it came in an internal conversation. I don't think you rely on that, but something to keep in mind.
I don't have an opinion on that. What matters is how Link-s are treated, and maybe even that is not important with proposed |
This proposal should define what "canonical time zone identifier" means in ECMAScript. Creating the text for this definition is part of Step 4 as described in the README.
Unfortunately "A canonical identifier is a Zone in TZDB" is too vague because it depends on the TZDB build options. Some build options, like the default MAKEFILE run without options, generate Links (like Atlantic/Reykjavik => Africa/Abidjan) that are not appropriate for ECMAScript use. So a more precise definition is needed that will push implementers to avoid those aggressive Links.
One possible starting point for this precise definition is capturing whatever Firefox is doing now as described in this comment and later comments. Another possible starting point is to align with the output of the global-tz fork of TZDB.
What both of these seem to have in common is using zone.tab to back out the controversial, problematic merges like Atlantic/Reykjavik => Africa/Abidjan, Europe/Stockholm => Europe/Berlin, Europe/Zagreb => Europe/Belgrade. So if an identifier is listed as a Zone in zone.tab, then it should be canonical in ECMAScript.
What I'm unsure about is whether we want or need to use the old data in backzone as well. We'll be looking to @anba (from Firefox), @Yqwed (from Android), and others for advice.
Once we understand what we want "canonical" to mean, the next step will be crafting specific spec text for that definition suitable for a PR into the ECMAScript spec. One possible definition will be including specific MAKEFILE options (for building TZDB) into the spec. Another possibility is describing it in words (e.g. "every identifier that's a Zone in zone.tab must be canonical in ECMAScript") and leaving it up to implementers to translate that into build options. The global-tz README could be a starting point for that kind of text.
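The "listed in zone.tab means canonical" rule can be sketched as a lookup that consults zone.tab before following Links. Everything here is a simplified assumption: the Link table and the zone.tab ID set are tiny hand-written samples, `toCanonicalId` is a hypothetical name, and case-insensitive matching is ignored.

```javascript
// Sample Links as produced by a default TZDB build (illustrative subset).
const defaultBuildLinks = new Map([
  ['Atlantic/Reykjavik', 'Africa/Abidjan'],
  ['Europe/Stockholm', 'Europe/Berlin'],
  ['Europe/Zagreb', 'Europe/Belgrade'],
  ['Asia/Calcutta', 'Asia/Kolkata'],
]);

// Sample of IDs listed in zone.tab (illustrative subset).
const zoneTabIds = new Set([
  'Africa/Abidjan', 'Atlantic/Reykjavik',
  'Europe/Berlin', 'Europe/Stockholm',
  'Europe/Belgrade', 'Europe/Zagreb',
  'Asia/Kolkata',
]);

// An ID listed in zone.tab stays canonical even if the default build
// turned it into a Link; any other Link resolves to its target.
function toCanonicalId(id) {
  if (zoneTabIds.has(id)) return id;
  return defaultBuildLinks.get(id) ?? id;
}

console.log(toCanonicalId('Atlantic/Reykjavik')); // Atlantic/Reykjavik
console.log(toCanonicalId('Europe/Stockholm'));   // Europe/Stockholm
console.log(toCanonicalId('Asia/Calcutta'));      // Asia/Kolkata
```

Under this rule the merges named above are backed out, while a renamed ID such as Asia/Calcutta still resolves to its current name because the old name is not listed in zone.tab.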