Add CalendarDuration type #1282

Open
11 tasks
pitdicker opened this issue Sep 11, 2023 · 14 comments

Comments

@pitdicker
Collaborator

pitdicker commented Sep 11, 2023

The goal is to add a type that can describe durations with the flexibility of an ISO 8601 duration. Because we already have enough confusion with types named Duration, I propose CalendarDuration as the name.

It will have components that can describe a 'nominal duration' (to use the terms of ISO 8601), and an 'accurate duration'.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CalendarDuration {
    // Components with a nominal duration
    months: u32,
    days: u32,
    // Components with an accurate duration
    seconds: u32,
    nanos: u32,
}

ISO 8601 describes more possible components: years, weeks, hours and minutes. But these can all be expressed as a multiple of the components proposed above: years as 12 months, weeks as 7 days, hours as 3600 seconds, and minutes as 60 seconds.

ISO 8601 does not have a nanosecond component, but allows the smallest component present to have a fraction. So we can map fractions of a second to nanos.

In theory a minute may not always be equal to 60 seconds, in the presence of a leap second. We currently can't do calculations with leap seconds. But if/when we do, this type may need a little revising.
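As a minimal sketch of the mapping described above (not the proposed API): a constructor could fold years, weeks, hours and minutes into the four stored fields. The name `from_iso_components` is made up for illustration, and the choice to return `None` on overflow or on `nanos >= 1_000_000_000` is an assumption.

```rust
/// The struct from the proposal, with a hypothetical constructor added.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CalendarDuration {
    // Components with a nominal duration
    months: u32,
    days: u32,
    // Components with an accurate duration
    seconds: u32,
    nanos: u32,
}

impl CalendarDuration {
    /// Hypothetical helper (not part of chrono): fold every ISO 8601
    /// component into months, days, seconds and nanoseconds.
    /// Returns `None` on overflow or if `nanos` is not below one second.
    pub fn from_iso_components(
        years: u32,
        months: u32,
        weeks: u32,
        days: u32,
        hours: u32,
        minutes: u32,
        seconds: u32,
        nanos: u32,
    ) -> Option<Self> {
        if nanos >= 1_000_000_000 {
            return None;
        }
        // years as 12 months, weeks as 7 days
        let months = years.checked_mul(12)?.checked_add(months)?;
        let days = weeks.checked_mul(7)?.checked_add(days)?;
        // hours as 3600 seconds, minutes as 60 seconds (no leap seconds)
        let seconds = hours
            .checked_mul(3600)?
            .checked_add(minutes.checked_mul(60)?)?
            .checked_add(seconds)?;
        Some(Self { months, days, seconds, nanos })
    }
}
```

With this sketch, `P1Y2M1W3DT1H30M15S` would be stored as 14 months, 10 days and 5415 seconds.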

Checklist for implementation:

  • Add CalendarDuration type
  • Initialization methods
  • Implement From Day, Month, std::time::Duration
  • Methods to get the components of a CalendarDuration
  • Display implementation
  • Parsing from ISO 8601 format with designators
  • Parsing from ISO 8601 format with date/time-like syntax
  • Various diff_* methods on NaiveDate, NaiveDateTime and DateTime to create a CalendarDuration
  • add_calendar_duration methods on NaiveDate, NaiveDateTime and DateTime
  • sub_calendar_duration methods on NaiveDate, NaiveDateTime and DateTime
  • Other methods?
@pitdicker
Collaborator Author

I would like to start small, to keep things reviewable:

  • methods to create a CalendarDuration
  • methods to get the data back out
  • a Display implementation (to make checking the CalendarDuration easier)
  • some methods on NaiveDate to show how a CalendarDuration can be created from the difference between two dates.
  • methods on NaiveDate, NaiveDateTime and DateTime<Tz> that show how a CalendarDuration should be added to a date(time).

@djc does this sound reasonable? Or would you like to start with something smaller or larger?

@pitdicker
Collaborator Author

The natural order to add a CalendarDuration to some value is to first add the months component, then the days component, and then the accurate component.

What to do if a result does not exist? That can be the case if the date is for example 2023-08-31 and you try to add one month. The result, 2023-09-31, does not exist. Returning 2023-10-01 does not seem right; adding one month to a date in August should not return a date in October. Returning 2023-09-30 does not seem right either, because the period in between is less than a month. In those cases we should return None.

The tricky part is: what to do if an intermediate value does not exist, but there can be a reasonable end result? For example adding one month and one day to 2023-08-31. In that case I find it reasonable to return 2023-10-01, just like how adding two months and one day would return 2023-11-01.

Why go through the trouble of supporting non-existing intermediate values? Because otherwise, in my opinion, a CalendarDuration that includes a month component becomes too tricky for users to work with.
I already have most of the code, and it does not add too much complexity for us. We have to insert the last day of the month instead, and later check that the end result falls on a later (valid) day.
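To make the rule concrete, here is a rough std-only sketch. It is not the code mentioned above: `Date` is a stand-in for `NaiveDate`, and `add_months_days` is a hypothetical name. It adds months first, replaces a non-existing intermediate date with the last day of that month, and only returns a result if at least one day is added afterwards.

```rust
use std::convert::TryFrom;

/// A plain (year, month, day) date; a stand-in for `NaiveDate` in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Date {
    pub year: i32,
    pub month: u32,
    pub day: u32,
}

fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => if is_leap_year(year) { 29 } else { 28 },
    }
}

/// Add months first, then days. Assumes `date` is a valid date.
pub fn add_months_days(date: Date, months: u32, days: u32) -> Option<Date> {
    // Step 1: add the months component.
    let total = i64::from(date.year) * 12 + i64::from(date.month) - 1 + i64::from(months);
    let mut year = i32::try_from(total.div_euclid(12)).ok()?;
    let mut month = total.rem_euclid(12) as u32 + 1;
    let last = days_in_month(year, month);
    // A non-existing intermediate date is only acceptable if the days
    // component moves the end result to a later, valid day.
    if date.day > last && days == 0 {
        return None; // e.g. 2023-08-31 + 1 month
    }
    let mut day = date.day.min(last);
    // Step 2: add the days component, one month at a time.
    let mut remaining = days;
    loop {
        let left_in_month = days_in_month(year, month) - day;
        if remaining <= left_in_month {
            day += remaining;
            break;
        }
        remaining -= left_in_month + 1;
        day = 1;
        if month == 12 {
            month = 1;
            year = year.checked_add(1)?;
        } else {
            month += 1;
        }
    }
    Some(Date { year, month, day })
}
```

In this sketch 2023-08-31 plus one month gives `None`, while 2023-08-31 plus one month and one day gives 2023-10-01.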

Another tricky part is that we add the months and days components to a DateTime<Tz> in local time, convert it to UTC, and then add the accurate components. The conversion to UTC may fall in a DST transition. In that case we should forward LocalResult::Ambiguous or LocalResult::None when adding the accurate components.

@pitdicker
Collaborator Author

About parsing ISO 8601 durations with a fractional component:

b) If necessary for a particular application, the lowest order components may have a decimal fraction. The decimal fraction shall be divided from the integer part by the decimal sign specified in ISO 31-0, i.e. the comma [,] or full stop [.]. Of these, the comma is the preferred sign. The decimal fraction shall at least have one digit, the maximum number of digits in the decimal component needs to be agreed by the partners in information interchange. If the magnitude of the number is less than unity, the decimal sign shall be preceded by a zero (see ISO 31-0).

The terms 'if necessary' and 'may' do not read like we have to support fractions.

A couple of fractions make sense to me to support, because there is a common understanding of what they should mean:

  • Seconds: we want to have those anyway to encode nanoseconds.
  • Hours and minutes: those can be translated to seconds and nanoseconds, and I already have code to do so in ISO 8601 parsers #1143.
  • Years: ¼ (P0.25Y), ½ (P0.5Y) and ¾ (P0.75Y) map cleanly to a number of months, and ⅓ (P0.33Y) and ⅔ (P0.67Y) also seem reasonable to map to months.

Not really sensible:

  • A fraction of a day. For example P1.5D, 1½ day. It will usually map to 1 day and 12 hours. But what if it falls in a day with a DST transition, which is only 23 hours long? Does the duration become 1 day and 11½ hours, or remain 12? And what if the duration falls partly in a regular day, and partly in a shorter day? Should it become 1 day and, say, 11:40 hours?
  • A fraction of a month is also hard to define except in a fuzzy sense. Especially when the duration crosses from one month into another.
  • A fraction of a week seems like a quite unlikely duration.
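For the fractions that do make sense, a small sketch of the conversions with integer arithmetic. All names here are hypothetical, the input is only the digits after the decimal sign, and rounding a fraction of a year to the nearest whole month is my assumption (it is what makes P0.33Y and P0.67Y come out as 4 and 8 months).

```rust
use std::convert::TryFrom;

/// Parse the digits after the decimal sign into (numerator, denominator),
/// e.g. "25" -> (25, 100). At most 9 digits are accepted in this sketch.
fn parse_fraction(digits: &str) -> Option<(u64, u64)> {
    if digits.is_empty() || digits.len() > 9 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let numerator = digits.parse::<u64>().ok()?;
    Some((numerator, 10u64.pow(digits.len() as u32)))
}

/// Fraction of a year -> months, rounded to the nearest whole month
/// (the rounding is an assumption of this sketch).
pub fn year_fraction_to_months(digits: &str) -> Option<u32> {
    let (num, den) = parse_fraction(digits)?;
    Some(((num * 12 + den / 2) / den) as u32)
}

/// Fraction of an hour (`unit_secs = 3600`), a minute (`unit_secs = 60`) or
/// a second (`unit_secs = 1`) -> whole seconds and nanoseconds.
/// Anything below one nanosecond is truncated.
pub fn fraction_to_secs_nanos(digits: &str, unit_secs: u64) -> Option<(u32, u32)> {
    let (num, den) = parse_fraction(digits)?;
    let total_nanos =
        u128::from(num) * u128::from(unit_secs) * 1_000_000_000 / u128::from(den);
    let secs = u32::try_from(total_nanos / 1_000_000_000).ok()?;
    Some((secs, (total_nanos % 1_000_000_000) as u32))
}
```

For example `year_fraction_to_months("25")` gives 3 months, and `fraction_to_secs_nanos("5", 3600)` turns half an hour into 1800 seconds.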

@djc
Member

djc commented Sep 12, 2023

What about signedness? You proposed a type with u32 fields, but IIRC an ISO 8601 duration serialization can contain negative values?

@pitdicker
Collaborator Author

They define it to be unsigned: https://www.iso.org/obp/ui/en/#iso_std_iso_8601-1_ed-1_v1_en_term_3.1.1.8

Duration
non-negative quantity of time equal to the difference between the final and initial instants (3.1.1.3) of a time interval (3.1.1.6)

@djc
Member

djc commented Sep 12, 2023

Okay! Do you want to make the fields public or keep them private? It feels like once we have this, we can maybe deprecate the Days and Months types? I'm happy to review an initial PR, please make small PRs.

@pitdicker
Collaborator Author

Do you want to make the fields public or keep them private?

I prefer to make them private, like in std::time::Duration and chrono::Duration. Then we can validate nanos < 1,000,000,000 on creation. And I may want to put the 2 free bits to use.

It feels like once we have this, we can maybe deprecate the Days and Months types?

Yes, maybe? I won't miss the Days type. Months is a bit more opinionated on what to return if the resulting month has fewer days.

I'm happy to review an initial PR, please make small PRs.

I'll do my best. A first PR needs ca. 2 more days to polish.

@pitdicker
Collaborator Author

Currently I am reading up a bit more on our leap seconds issues. It seems like it really shouldn't be much work to implement a TAI timezone and do correct calculations with leap seconds (but definitely not in the first PR).

As I see it now we can prepare the type to support working with them. It should encode whether the duration is expressed with an accuracy in hours and/or minutes, or in seconds.

With an accuracy in minutes the expectation is probably that we paper over leap seconds. With an accuracy in seconds the expectation is probably that passing seconds are accurately counted, including leap seconds. In any case it is a property of the duration how to deal with leap seconds. Of course the type should have a method to override this.

@demurgos
Contributor

demurgos commented Sep 13, 2023

Continuing the discussion about durations from #954 (comment)

This brings me back to the meaning of adding chrono::Duration::hours(5). Is it "add 18,000 historical seconds" or is it "add 18,000 SI seconds"?

#1282 is about creating a new CalendarDuration that can encode separate components. months can take into account that months have a different number of days, and days can take into account that days are not always 24 hours (because of DST and other timezone transitions). What do you think of the plan for seconds to encode the question of how to count a duration in the presence of leap seconds in that duration type?

First of all, I appreciate the effort of adding CalendarDuration: I think that this is a great step to help disambiguate between adding a day to the calendar and adding 86400 SI seconds. In particular, it addresses the issue at the right level by separating calendar semantics from the underlying timestamp.

My goal would be for chrono::Duration to always represent SI seconds, and thus match the ISO definition of a duration. This would allow treating it as a difference on a continuous and constant timeline.


Here are two scenarios I described in the linked issue. There was a leap second on December 31st 2016. The current time is 22:00 UTC:

  1. Time measurement: I start a process that will last exactly 5 hours (18000 SI seconds), when will it end?
  2. Scheduling: I want to send a notification every 5 calendar hours (o'clock), when should I send the next notification?

I expect the answer to the first scenario to be:

// Time measurement: work with SI seconds and "absolute" time
let now = Utc::now(); // 2016-12-31T22:00:00Z
let duration = Duration::hours(5);
let result = now + duration; // 2017-01-01T02:59:59Z

And the answer to the second scenario to be:

// Scheduling: work with calendar components
let now = Utc::now(); // 2016-12-31T22:00:00Z
let duration = CalendarDuration::hours(5);
let result = now + duration; // 2017-01-01T03:00:00Z

For calendar durations, I expect them to be applied as follows:

  • start with the internal representation (timestamp + Tz)
  • compute calendar components
  • update each component
  • compute back the new internal representation (timestamp + Tz)

This feels to me like a natural interpretation of calendar durations; but it brings its fair share of edge cases.

ISO 8601 describes more possible components: years, weeks, hours and minutes. But these can all be expressed as a multiple of the components proposed above: years as 12 months, weeks as 7 days, hours as 3600 seconds, and minutes as 60 seconds.

I do not have access to ISO 8601 (I think it's not freely available?) but based on the simple scenario described above, I think that you can't convert hours into seconds but have to keep an explicit hours field. Otherwise you can't model my scheduling scenario above. (I think? depends on what the calendar seconds represent).

The main concerns I have are discontinuities around leap seconds and DST, situations such as "January 30th + 1 month", or if the order of application is important or not (is adding calendar durations commutative?). I also know that chrono durations are way easier to handle than std durations because of their signedness. I'm not sure about calendar duration, but I would still consider having a sign.


EDIT The January 30th + 1 month is actually pretty interesting: it's the same thing as 2016-12-31 23:59:60 + 1 minute, just on a different scale. Could you share how you plan to support it?

@pitdicker
Collaborator Author

pitdicker commented Sep 14, 2023

Your comments made me realize the situation with leap seconds is different from days that are not always 24h, months that differ in length, and leap years. For those three it is just a question of: in what unit do you want to count? For leap seconds we have a second question: does a leap second exist for you?

For many devices, and thus many people, leap seconds don't exist. Watches, analogue clocks and most digital clocks don't have the concept. They just don't have enough accuracy to be synced to within a fraction of a second with UTC, and they don't need to. They count 'historical seconds', to use your definition. The same is true for most computers. And if you do want to acknowledge leap seconds, for most software that means pretending, because it doesn't have a clock source that can return a leap second.

We should strive for correctness, but also not forget this reality exists.

So we have three cases:

  1. "add/subtract this duration, act like leap seconds don't exist"
  2. "add/subtract this duration by first adding/subtracting full minutes, and then the remaining seconds"
  3. "add/subtract this duration as a fixed number of seconds"
  • The results of 1 and 2 are almost indistinguishable. They only differ if the remaining seconds to add or subtract cross a minute boundary that has a leap second.
  • If a duration does not contain seconds, 1 and 2 are always equal.
  • If a duration does not contain minutes, 2 and 3 are equal.
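A toy model of the three cases, to check the claims above. Everything here is hypothetical: time is a minute counter plus a second within the minute, exactly one minute (`leap_minute`) is 61 seconds long, and only addition is shown. Returning `None` in case 2 when the intermediate time does not exist is an assumption, not a decision.

```rust
/// Toy timeline: minutes are numbered from an arbitrary start, and exactly
/// one of them (`leap_minute`) is 61 seconds long.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Time {
    pub minute: u64,
    pub second: u32,
}

fn minute_len(minute: u64, leap_minute: u64) -> u32 {
    if minute == leap_minute { 61 } else { 60 }
}

/// Case 1: act like leap seconds don't exist; every minute has 60 seconds.
pub fn add_ignoring_leap_seconds(t: Time, seconds: u64) -> Time {
    let total = t.minute * 60 + u64::from(t.second.min(59)) + seconds;
    Time { minute: total / 60, second: (total % 60) as u32 }
}

/// Case 3: add a fixed number of seconds, counting the leap second.
/// Assumes `t.second` is at most 60.
pub fn add_fixed_seconds(mut t: Time, mut seconds: u64, leap_minute: u64) -> Time {
    loop {
        // Seconds left until the start of the next minute.
        let to_next_minute = u64::from(minute_len(t.minute, leap_minute) - t.second);
        if seconds < to_next_minute {
            t.second += seconds as u32;
            return t;
        }
        seconds -= to_next_minute;
        t.minute += 1;
        t.second = 0;
    }
}

/// Case 2: add full minutes first, then the remaining seconds.
/// Returns `None` if the intermediate time does not exist (second 60 in a
/// minute without a leap second); that choice is an assumption.
pub fn add_minutes_then_seconds(
    t: Time,
    minutes: u64,
    seconds: u64,
    leap_minute: u64,
) -> Option<Time> {
    let shifted = Time { minute: t.minute + minutes, second: t.second };
    if shifted.second >= minute_len(shifted.minute, leap_minute) {
        return None;
    }
    Some(add_fixed_seconds(shifted, seconds, leap_minute))
}
```

If minute 0 is 2016-12-31 22:00 UTC, the leap second sits in minute 119 (23:59). Adding 18000 fixed seconds then ends at minute 299, second 59 (02:59:59), while adding 300 minutes ends at minute 300, second 0 (03:00:00), which matches the two scenarios from the earlier comment.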

I am not sure where to go with this. It just corrected my thinking.

Combinations of hours, minutes and seconds

It seems to me it is best to carve out two reasonable modes to specify the accurate components of a duration:

  1. Minutes and seconds, where seconds <= 60.
    The duration of minutes is flexible, and can be 59, 60 or 61 seconds depending on the presence of a leap second.
    The value of minutes can be very large, spanning years.
    However in this mode seconds must always be less than a full minute.
  2. Seconds only.
    This counts purely the number of seconds, including leap seconds, between two points in time.
    The value of seconds can be very large, spanning years.
    But minutes and hours are not allowed or must be zero.

This is kind of allowed by ISO 8601. The choice for maximum number of digits for a component, and by extension the maximum value, is left up to mutual agreement (4.4.3.2):

In both basic and extended format the complete representation of the expression for duration shall be Pnn̲W or Pnn̲Ynn̲Mnn̲DTnn̲Hnn̲Mnn̲S.

In these representations the maximum number of digits in a component needs to be agreed by the partners in information interchange.

Instead of this proposal of having two modes we could allow any arbitrary amount of minutes and seconds. That may even be a little easier to do.
But then we get combinations that make very little sense. In my opinion it is important for the type and operations to be explainable. Suppose you have one duration that has 1000 minutes and 60,000 seconds as components: why is it choosing a different strategy for leap seconds during the first 1000 minutes than for the next ca. 1000?

A second advantage is that it keeps the size of the type smaller. In one u32 we can encode seconds in the range 0..61 with 6 bits, and use the rest to store minutes. Or we can use the entire integer to hold seconds (only need to stuff one bit somewhere to tell the difference). I would quite like to keep the type fitting in 16 bytes.
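A possible encoding, as a sketch only. The `Accurate` enum, `pack` and `unpack` are hypothetical names, and the mode flag is returned as a separate `bool` here because where to stuff that bit is left open above.

```rust
/// The two modes for the accurate part of a duration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Accurate {
    /// Minutes of flexible length plus `seconds <= 60`.
    MinutesSeconds { minutes: u32, seconds: u32 },
    /// A pure count of seconds.
    Seconds(u32),
}

const SECONDS_BITS: u32 = 6; // enough for 0..=63, we only need 0..=60
const SECONDS_MASK: u32 = (1 << SECONDS_BITS) - 1;
const MAX_MINUTES: u32 = (1 << (32 - SECONDS_BITS)) - 1; // 26 bits for minutes

/// Pack into one `u32` plus a one-bit mode flag (`true` = seconds only).
/// Returns `None` if the value does not fit the chosen mode.
pub fn pack(value: Accurate) -> Option<(u32, bool)> {
    match value {
        Accurate::MinutesSeconds { minutes, seconds } => {
            if minutes > MAX_MINUTES || seconds > 60 {
                return None;
            }
            Some(((minutes << SECONDS_BITS) | seconds, false))
        }
        Accurate::Seconds(seconds) => Some((seconds, true)),
    }
}

/// Inverse of `pack`.
pub fn unpack(bits: u32, seconds_only: bool) -> Accurate {
    if seconds_only {
        Accurate::Seconds(bits)
    } else {
        Accurate::MinutesSeconds {
            minutes: bits >> SECONDS_BITS,
            seconds: bits & SECONDS_MASK,
        }
    }
}
```

With 26 bits left for minutes, the minutes-and-seconds mode in this sketch tops out at 67,108,863 minutes, which is a bit over 127 years.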

@pitdicker
Collaborator Author

pitdicker commented Sep 14, 2023

I do not have access to ISO 8601 (I think it's not freely available?)

Beyond the definitions that are freely available, it does not do much more than specify how to format the values as a string.

I think that you can't convert hours into seconds but have to keep an explicit hours field. Otherwise you can't model my scheduling scenario above. (I think? depends on what the calendar seconds represent).

You are right. We should at least have separate minutes and seconds components.

The main concerns I have are discontinuities around leap seconds and DST, situations such as "January 30th + 1 month",

Discontinuities are the tricky part. #1282 (comment) describes most of how I plan to work with them. That does not describe leap seconds yet. As long as we know the type is capable, I think it is best to implement the basic operations without support first and slowly grow the complexity.

or if the order of application is important or not (is adding calendar durations commutative?).

It is important. If you add a duration with the highest order components first, the inverse would be to subtract with the lowest order components first. And even then I am pretty sure that we can't guarantee it will always round-trip.

I hope we don't need to implement inverse_add and inverse_sub methods, but it is of course possible to do so if it turns out people need them.

I also know that chrono durations are way easier to handle than std durations because of their signedness. I'm not sure about calendar duration, but I would still consider having a sign.

This stuff is turning out somewhat tricky with all the edge cases. Throwing signedness into the mix is currently too much for me 🤣. And we would lose the ability to always format a duration with the ISO 8601 format.

EDIT The January 30th + 1 month is actually pretty interesting: it's the same thing as 2016-12-31 23:59:60 + 1 minute, just on a different scale. Could you share how you plan to support it?

The idea is to make January 30th + 1 month return None, and make January 30th + 1 month + 1 day return March 1st. I.e. to have as rule that a non-existing intermediate value is assumed to be the last day of the month for the rest of the calculation, and we return a result if the resulting date does exist.

For consistency we may have to do the same for 2016-12-31 23:59:60 + 1 minute: return None. We would be trying to add 61 seconds, and end up with a non-existing result. But I lean strongly towards returning 2017-01-01 00:00:59 instead, by only adding 60 seconds. Otherwise it seems like a footgun users are not going to like.

@pitdicker
Collaborator Author

Creating a duration from the difference between two dates

We should have methods to create a CalendarDuration from the difference between two dates. The methods to do so should be as flexible as the CalendarDuration type: they should allow you to specify how the duration should be divided into components.

impl NaiveDate {
    fn diff(&self, other: NaiveDate, date_components: DateComponents);
}

impl NaiveDateTime {
    fn diff(&self, other: NaiveDateTime, date_components: DateComponents, time_components: TimeComponents);
}

impl<Tz: TimeZone> DateTime<Tz> {
    fn diff(&self, other: DateTime<Tz>, date_components: DateComponents, time_components: TimeComponents);
}

enum DateComponents {
    YearsDays,
    MonthsDays,
    Days,
    None, // defaults to `TimeComponents::Minutes` if used with `NaiveDate`
}

enum TimeComponents {
    Minutes,
    Seconds,
}

Again the order of operations is important. To quote from #1247

For a project I needed to know the number of whole months and remaining days between two dates. This turns out to be a bit difficult, until you know how to look at the problem.

The correct answer is different if you are counting down towards a date, compared to counting the number of months and days since an event. In both cases there is a reference date, and when using this method as reference.diff(other) you get the right answer without having to think about it much.
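One way to read the quoted point, as a rough sketch: count whole months starting from the reference date and moving towards the other date, then count the remaining days. The `Date` stand-in, the name `diff_months_days` and the clamping to the last day of the month are my assumptions here, not the code from #1247.

```rust
/// A plain (year, month, day) date; a stand-in for `NaiveDate` in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date {
    pub year: i32,
    pub month: u32,
    pub day: u32,
}

fn days_in_month(year: i32, month: u32) -> u32 {
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => if leap { 29 } else { 28 },
    }
}

/// Days since 1970-01-01 in the proleptic Gregorian calendar.
fn day_number(d: Date) -> i64 {
    let (m, day) = (i64::from(d.month), i64::from(d.day));
    // Treat January and February as the end of the previous year.
    let y = i64::from(d.year) - if m <= 2 { 1 } else { 0 };
    let era = y.div_euclid(400);
    let yoe = y - era * 400;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + day - 1;
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Whole months counted away from `reference` towards `other`, plus the
/// remaining days. Returns `(months, days)`. Assumes both dates are valid.
pub fn diff_months_days(reference: Date, other: Date) -> (u32, u32) {
    let forward = other >= reference;
    let raw = (i64::from(other.year) - i64::from(reference.year)) * 12
        + i64::from(other.month)
        - i64::from(reference.month);
    // Would stepping `raw` months from the reference go past `other`?
    let overshoot = if forward {
        other.day < reference.day
    } else {
        other.day > reference.day
    };
    let months = raw.abs() - if overshoot { 1 } else { 0 };
    // Step `months` away from the reference; clamp to the last day of the month.
    let signed = if forward { months } else { -months };
    let total = i64::from(reference.year) * 12 + i64::from(reference.month) - 1 + signed;
    let (year, month) = (total.div_euclid(12) as i32, total.rem_euclid(12) as u32 + 1);
    let anchor = Date { year, month, day: reference.day.min(days_in_month(year, month)) };
    let days = (day_number(other) - day_number(anchor)).abs();
    (months as u32, days as u32)
}
```

With an event on 2023-01-15 as reference and 2023-03-10 as the other date this gives 1 month and 23 days. Counting down with 2023-03-10 as reference and 2023-01-15 as the other date gives 1 month and 26 days, so the direction does matter.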


I have not thought much about calculating the difference between two DateTimes that are in different time zones. It does not seem sensible...

@djc
Member

djc commented Sep 15, 2023

I'm open in principle to accommodating the use cases of those who care very much about leap seconds, but one rule that I think makes sense is that the API for those who don't care (who are in the majority by far) probably shouldn't be allowed to suffer.

@pitdicker
Collaborator Author

pitdicker commented Sep 18, 2023

ISO 8601 specifies two ways to serialize a duration: a default format using designators, and a more familiar format similar to dates and times.

Examples:

| duration | designator format | alternate format |
|---|---|---|
| 1 hour and 30 minutes | PT1H30M or PT90M | PT1:30 or PT01:30:00 |
| 15 seconds | PT15S | PT00:00:15 |
| 1 year and two months | P1Y2M | P0001-02-00 |
| 1 year and 200 days | P1Y200D | P0001-200 |
| 5 days and 12 hours | P5DT12H | P0000-005T12:00 |

I don't think the second syntax is great. But compared to the regular time formats, the designator format is quite alien until you get to know it. In some cases, especially the time-only ones, there is something to be said for the second format.

Worth noting that the alternate format is not as flexible as the designator format. It can't express hours ≥ 24, days (within a month) ≥ 31, and similar values beyond their carry-over point to the next unit.
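For the designator format, a Display-style formatter could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, it takes the four components of the original proposal (months, days, seconds, nanos), it writes a zero duration as `PT0S`, and it uses a full stop as decimal sign even though ISO 8601 prefers the comma.

```rust
/// Format months, days, seconds and nanoseconds with ISO 8601 designators.
/// Assumes `nanos < 1_000_000_000`.
pub fn format_designators(months: u32, days: u32, seconds: u32, nanos: u32) -> String {
    let mut out = String::from("P");
    // Nominal components: split months back into years and months.
    let (years, months) = (months / 12, months % 12);
    if years > 0 {
        out.push_str(&format!("{}Y", years));
    }
    if months > 0 {
        out.push_str(&format!("{}M", months));
    }
    if days > 0 {
        out.push_str(&format!("{}D", days));
    }
    // Accurate components: split seconds into hours, minutes and seconds.
    let (hours, rest) = (seconds / 3600, seconds % 3600);
    let (minutes, secs) = (rest / 60, rest % 60);
    if seconds > 0 || nanos > 0 {
        out.push('T'); // time components must follow the `T` designator
        if hours > 0 {
            out.push_str(&format!("{}H", hours));
        }
        if minutes > 0 {
            out.push_str(&format!("{}M", minutes));
        }
        if secs > 0 || nanos > 0 {
            out.push_str(&secs.to_string());
            if nanos > 0 {
                // Nine fractional digits, without trailing zeros.
                let fraction = format!("{:09}", nanos);
                out.push('.');
                out.push_str(fraction.trim_end_matches('0'));
            }
            out.push('S');
        }
    }
    if out == "P" {
        out.push_str("T0S"); // a duration needs at least one component
    }
    out
}
```

For example 14 months formats as `P1Y2M`, and 5 days plus 43200 seconds formats as `P5DT12H`.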

If we have ISO 8601 date and time parsers the alternate format is easy to add.

By mutual agreement of the partners in information interchange, duration may be expressed in conformity with the format used for time points, as specified in 4.1.2, 4.1.3, 4.2.2.5 and 4.3, where the formats of 4.3 are restricted for the date component to the formats in 4.1.2 and 4.1.3 and for the time of day component to the formats in 4.2.2.2 through 4.2.2.4. The values expressed must not exceed the “carry over points” of 12 months, 30 days, 24 hours, 60 minutes and 60 seconds. Since weeks have no defined carry over point (52 or 53), weeks should not be used in these applications. In these expressions a possible value of the year time element is [0000], of the calendar month and calendar day of the month time elements [00] and of the calendar day of the year time element [000].

The complete representation of the expression for duration in the alternative format is as follows:

Basic format: PYYYYMMDDThhmmss or PYYYYDDDThhmmss
Extended format: PYYYY-MM-DDThh:mm:ss or PYYYY-DDDThh:mm:ss
