
Removing osxtime_t? #10

matthijskooijman opened this issue Nov 23, 2020 · 5 comments

@matthijskooijman

In the Lacuna version / Arduino HAL, we're running into a HAL timer-related problem that I'd like to pick your brain on.

The problem arises because we didn't implement the 64-bit hal_getxticks(), since we don't have an easily accessible 64-bit counter (and I was a bit lazy). We could implement it by extending the 32-bit timer we have (which we already do a bit, to extend 32-bit micros() to 32-bit 16μs ticks), but I've seen that this xticks handling actually adds complexity and probably code size, and IIRC wasn't implemented perfectly everywhere (i.e. truncating back to 32 bits in some places, negating the advantages of using a 64-bit timer elsewhere).

So instead, I'm pondering removing hal_xticks() / osxtime_t (nearly) completely and just simplifying things. The main question is: when is it actually used / needed? I hadn't checked thoroughly before, but doing so now I find it is used for:

  • Dutycycle available times.

    For these, 2³¹ ticks × 16μs = 9.5 hours should be sufficient (at 0.1% DC, this allows up to 34 s of airtime for a single packet, which is over 500 bytes at SF12BW125 IIUC). Moreover, the current code uses osxtime_t in updateTx and nextTx, but truncates to ostime_t in the nextTx return value anyway, so if the avail time were > 9.5 h in the future, I expect things would already break with the current code. Also, the actual available times are stored as 16-bit seconds anyway (from a base osxtime_t), so we could keep that and, with some special care, support up to 2¹⁶ seconds (18 hours) of (un)avail time as well (but maybe that's not even needed).

    One advantage of storing the base timestamp as osxtime_t currently is that the availability times remain valid even when there is no activity for > 9.5 hours (i.e. no activity that causes setAvail to update the base timestamp), but this can probably also be handled by just updating the base timestamp once every 2³¹ or so ticks (maybe using a dedicated "overflow" task?).

  • Tracking the wall time using MCMD_TIME_ANS / LMIC.gpsEpochOff.

    This code uses the wall time from the MCMD_TIME_ANS returned by the network to store an offset, which allows translating the result of hal_xticks() to wall time later. This can still work with 32-bit ticks, but then the translation is only valid for 19 hours, after which the calculated wall time loops back. This could be solved by using a longer external timer. Alternatively, given there is no documented API for this (other than direct LMIC.gpsEpochOff access), maybe we can come up with some alternative, such as a getWallTime(ostime_t) function that works for timestamps within 9.5 hours of the current time as long as you call it at least once every 9.5 h (see the sketch at the end of this comment), or maybe the offset can be updated automatically internally whenever the timer overflows, using the same overflow task suggested above.

  • "Extended" scheduled callbacks (i.e. callbacks after 9.5 hours or more). It seems these are not used by BasicMac itself, only maybe by user code.

    If we want to keep this (which I guess could make sense if you do all scheduling of tasks in your application using the BasicMac scheduler), then we can still implement it even without keeping hal_xticks(). This would require changing the API slightly (i.e. specifying a relative callback rather than an absolute time, which should be acceptable since these callbacks are approximate anyway), and then the implementation can be changed to just count off multiples of 2³¹ ticks instead of comparing against hal_xticks() on every intermediate callback; see the sketch right after this list. If implemented properly, this should be doable without stacking timing errors, so total accuracy is similar to the current implementation.
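To make that last point concrete, here is a minimal sketch of such a relative extended callback, assuming the usual os_getTime() / os_setTimedCallback() API and LMIC's OSTICKS_PER_SEC constant; osjobext_t, ext_step() and ext_schedule() are hypothetical names, not existing BasicMac code:

    // Count off the wait in fixed-size chunks, so the 32-bit timer never
    // has to express more than EXT_CHUNK ticks at once.
    typedef struct {
        osjob_t job;      // regular BasicMac job (must be the first member)
        ostime_t next;    // tick value of the next intermediate wakeup
        uint32_t chunks;  // full EXT_CHUNK-sized chunks still to count off
        ostime_t rest;    // remainder after the last chunk
        osjobcb_t cb;     // user callback, run once everything has elapsed
    } osjobext_t;

    #define EXT_CHUNK ((ostime_t) 1 << 30)  // well below the 2^31 wrap limit

    static void ext_step (osjob_t* j) {
        osjobext_t* x = (osjobext_t*) j;    // ok: job is the first member
        if (x->chunks != 0) {
            x->chunks -= 1;
            x->next += EXT_CHUNK;           // advance from the *scheduled* time,
            os_setTimedCallback(j, x->next, ext_step);  // so errors don't stack
        } else {
            os_setTimedCallback(j, x->next + x->rest, x->cb);
        }
    }

    // Schedule cb to run secs seconds from now; this may exceed 2^31 ticks.
    static void ext_schedule (osjobext_t* x, uint32_t secs, osjobcb_t cb) {
        uint64_t ticks = (uint64_t) secs * OSTICKS_PER_SEC;
        x->chunks = (uint32_t) (ticks / EXT_CHUNK);
        x->rest   = (ostime_t) (ticks % EXT_CHUNK);
        x->next   = os_getTime();
        x->cb     = cb;
        os_setTimedCallback(&x->job, x->next, ext_step);
    }

Since x->next is always advanced by the intended amount rather than re-read from the clock, per-chunk scheduling latency does not accumulate.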

From your experience with the code, does the above seem plausible? Or am I missing something?

Summarizing, I think the availability time tracking can be internally improved to not require 64-bit timestamps without any user-visible changes, so I'll probably have a go at that.

Then, it might be useful for BasicMac to still offer a 64-bit timer, but maybe it can be made optional (and then the Arduino core could just not provide it, or we could add it anyway later, but people could disable it to save code space). If the 64-bit timer is enabled, the GPS offset and absolute "extended" callbacks can be offered as they are now. If the 64-bit timer is omitted, these can either just be omitted entirely (initially), or get limited / relative implementations as suggested above (implemented later, maybe).
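As for the getWallTime(ostime_t) idea mentioned above, a minimal sketch of what it could look like; the reference pair wallRefTicks / wallRefSecs would be set from MCMD_TIME_ANS, and none of these names exist in BasicMac today:

    // Only correct for timestamps within +/- 2^31 ticks of the reference,
    // so the reference must be refreshed at least once per half rollover
    // period (~9.5 h at 16 us ticks), e.g. by the overflow task above.
    static ostime_t wallRefTicks;  // a recent tick value...
    static uint64_t wallRefSecs;   // ...and the GPS-epoch seconds at that tick

    uint64_t getWallTime (ostime_t t) {
        // wrapping unsigned subtraction followed by a signed interpretation
        // yields the correct (possibly negative) delta across a rollover
        int32_t d = (int32_t) ((uint32_t) t - (uint32_t) wallRefTicks);
        return wallRefSecs + d / (int32_t) OSTICKS_PER_SEC;  // whole seconds
    }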

@matthijskooijman (Author)

Thinking on this some more, I realized I was assuming a 16μs tick, as that's what the Arduino HAL uses, but other values are also allowed by BasicMAC, so that should be kept in mind here.

However, the source documents an allowed range of 15.5μs - 100μs, so the above conclusions with respect to how long an interval an ostime_t can express are still pretty accurate. I also suspect the lower limit here might be related to this maximum interval; the upper limit is probably due to the accuracy needed for various timing values.

As for the available times, I realized that one other advantage of passing an osxtime_t to setAvail could be that it allows setting an avail time that is more than 2³¹ ticks in the future (which can be stored, since available times are expressed in 16-bit seconds, which allows expressing times up to a little bit less than 2³² 16μs ticks). However, in practice this is not actually true, since the available timestamp is calculated before converting to osxtime_t, so if airtime * DC were more than 2³¹ ticks, the variable would already overflow:

basicmac/lmic/lmic.c, lines 952 to 954 in 982b514:

    setAvail(&LMIC.dyn.bandAvail[b], os_time2XTime(txbeg +
            airtime * REGION.bands[b].txcap,
            xnow));

So in that sense, we won't lose any precision by simply skipping the conversion to osxtime_t for available times entirely.

I also considered not storing available times in 16-bit seconds, but instead storing the upper 16 bits of the ostime_t value (which is a lot simpler and faster, and avoids having to store a base value). The main point here is compressing the value into 16 bits, at the cost of lowering precision to whole seconds. If you take the upper 16 bits of an ostime_t value, you get a precision of 2¹⁶ ticks, which is 1.05 seconds with 16μs ticks, so pretty much the same as whole seconds. However, with 100μs ticks, this produces 6.55-second precision, which might be too coarse.
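To make the trade-off concrete, the "upper 16 bits" variant would be just (hypothetical helpers, not existing code):

    // Rounding up on compression keeps avail times conservative: an
    // expanded timestamp is never earlier than the original one.
    static uint16_t avail_compress (ostime_t t) {
        return (uint16_t) (((uint32_t) t + 0xFFFF) >> 16);
    }

    static ostime_t avail_expand (uint16_t a) {
        return (ostime_t) ((uint32_t) a << 16);
    }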

I also wondered if some kind of hybrid approach could work, where short times are more precise than longer times, but that can only be made to work with intervals relative to a base value, and probably requires updating the base value (and thus all available times) more often, which might end up being counterproductive.

Then, for something completely different but related: I noticed ostime_t is signed, but in C signed overflow is undefined, so I think that might cause compilers to misoptimize the current code in some circumstances. I already spent some time converting ostime_t to unsigned (and adding the needed casts in places where a subtraction result must be interpreted as signed again); I'll share that patch later.

@matthijskooijman (Author)

I decided to just see what it would look like to not use xtime for avail times, without changing the "store seconds" approach used now, and came up with https://github.com/LacunaSpace/basicmac/commits/no-xtime

In addition to some small cleanups, there are two main changes:

  • I've made the timestamp types unsigned in LacunaSpace/basicmac@a6ccbfb as suggested above. It seems to have no significant impact on the (size of the) generated code, but should be safer (though I recently heard someone argue that "signed overflow is undefined" is not really mandated by the C spec and is just GCC's interpretation of it; but since we're using GCC, I guess that's what we have to deal with). This change is pretty broad, but I think it's good.
  • I've rewritten the availability handling to not use 64-bit timestamps, but instead use 32-bit timestamps and properly handle overflow (sketched below). This removes the conversions between timestamp types and simplifies the code, but somewhat complicates timestamp comparison (since it must now handle overflow, which is a bit more delicate). It also adds a new job that runs periodically to advance the avail timers (to prevent issues from the ticks silently overflowing when we're not paying attention). Overall, I'm not entirely sure the source really becomes better from this. But it does simplify the generated code. A lot. I noted a difference of 300 bytes of flash on STM32, and 1000 (!) bytes of flash saved on AVR. So there is a significant gain here...
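For reference, the two ingredients boil down to something like this (illustrative names, not the actual code from the no-xtime branch; bandAvail[] and MAX_BANDS are assumptions):

    // Rollover-safe "a is at or before b" for free-running 32-bit ticks;
    // valid while the two timestamps are less than 2^31 ticks apart.
    static int time_leq (uint32_t a, uint32_t b) {
        return (int32_t) (b - a) >= 0;
    }

    // Periodic maintenance job: clamp any avail timestamp that has already
    // passed, so a stale value cannot be misread as far in the future once
    // the 32-bit counter wraps. Runs every 2^30 ticks (~4.8 h at 16 us).
    static uint32_t bandAvail[MAX_BANDS];
    static osjob_t maintJob;

    static void maint_cb (osjob_t* j) {
        uint32_t now = os_getTime();
        for (int b = 0; b < MAX_BANDS; b++) {
            if (time_leq(bandAvail[b], now))
                bandAvail[b] = now;   // band already available; pin to now
        }
        os_setTimedCallback(j, now + ((uint32_t) 1 << 30), maint_cb);
    }

The job would be kicked off once at init with os_setTimedCallback(&maintJob, os_getTime(), maint_cb).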

I haven't done long-term testing of that branch yet (just finished it), I'll be doing that next.

@mkuyper (Owner) commented Nov 30, 2020

Thanks for raising this -- really it's two issues, although they are interrelated.

Definition of terms

  • ticks - LMiC's base time unit, in the range 15.5μs - 100μs. Ticks roll over every 2³² ticks.
  • xticks - Same as ticks, but they never roll over.

Some history
Your forensic work is absolutely spot on -- the xticks concept was first added to deal with DC timestamps becoming invalid (interpreted as being far in the future) if there is no activity for more than half the clock rollover time, which is about 18 hours at 32.768 kHz, or 9.5 hours with 16 µs ticks. At the time, we decided to go with this method because a 48-bit time was already available in our STM32 implementation, and there are -- as you pointed out -- other advantages to having a clock that doesn't roll over. The other obvious solution would have been to add a maintenance task that runs at the half-way point to clock rollover and fixes up the timestamps.

Extended time availability
On the STM32 implementation, ticks are implemented using a 16 bit timer running at 32.768 kHz, complemented by a 32 bit variable in RAM that counts the number of rollovers of the timer. That is, the counter in RAM is updated every time the hardware timer rolls over. This gives a total of 48 bits and a roll-over period of over 272 years. Would it be possible/feasible to implement something like this on the Arduino as well to make a non-rollover clock available?
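For illustration, that scheme boils down to something like this (a sketch; hw_timer_count() and timer_overflow_isr() are stand-ins for the actual STM32 HAL code):

    // A 16-bit hardware timer at 32.768 kHz plus a 32-bit rollover counter
    // in RAM gives 48 bits; at 32768 ticks/s that wraps after 2^33 s,
    // i.e. roughly 272 years.
    static volatile uint32_t rollovers;  // bumped by the timer-overflow IRQ

    void timer_overflow_isr (void) {
        rollovers++;
    }

    uint64_t hal_xticks (void) {
        uint32_t hi;
        uint16_t lo;
        do {                  // retry if an overflow slips between the reads
            hi = rollovers;
            lo = hw_timer_count();     // current 16-bit hardware counter
        } while (hi != rollovers);
        return ((uint64_t) hi << 16) | lo;
    }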

osxtime_t usage by the stack
While I think that the availability of xticks is useful, I am not at all opposed to avoiding their use in the core stack -- especially since this has the potential to simplify quite a few things and should save some memory as well. I also think having a regularly-running maintenance task to fix up timestamps is perfectly acceptable.


ostime_t signed vs. unsigned arithmetic
Yeah. I've been aware of this for a long time. Technically, signed integer overflow is UB in C. Fortunately, we can pretty much count on being on a platform that uses two's complement and that doesn't raise an exception on overflow. The other, more tricky bit is that overflow being UB means that the compiler can pretend that overflow can't occur, and optimize accordingly. The way we are using ostime_t arithmetic pretty much precludes any of these optimizations. But you definitely need to keep your fingers crossed -- or enable -fwrapv on GCC.

Changing ostime_t to unsigned
This is a tough one, as it is a major breaking change to the API. In principle, it feels like the right thing to do, but this will break pretty much any application code that uses the runtime and does time stamp comparisons. Some of those might generate warnings, if enabled, but not all. We might need to rename the type if we make such a fundamental change.

As a side note, while unsigned overflow is well defined in C, a cast from unsigned to signed, e.g. from ostime_t to ostimediff_t in your no-xtime branch, is implementation-defined behavior. Admittedly, that's not as bad as UB, and thanks to the ubiquity of two's complement hardware it does the right thing pretty much everywhere.
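Spelled out, the pattern in question is roughly (illustrative; the actual no-xtime code may differ):

    typedef uint32_t ostime_t;      // unsigned: subtraction wraps, no UB
    typedef int32_t  ostimediff_t;  // signed view of a time delta

    // The subtraction itself is fully defined (unsigned arithmetic is
    // modular); it is the conversion of an out-of-range result back to
    // ostimediff_t that is implementation-defined -- benign on
    // two's-complement targets.
    #define TIME_DIFF(a, b)   ((ostimediff_t) ((ostime_t) (a) - (ostime_t) (b)))
    #define TIME_BEFORE(a, b) (TIME_DIFF(a, b) < 0)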

A while ago I started an effort to get rid of global state in LMiC. As part of that, I also created a clean version of the runtime without global state that also fixed the signed overflow issues and included macros for creating time deltas and doing time stamp comparisons. I'll dig that up and throw it onto GitHub, maybe that can serve as some inspiration.


In any case, I think we should separate the two issues: reducing/removing the use of xticks in the stack, and the signed-overflow UB issue. What do you think?

@mkuyper (Owner) commented Nov 30, 2020

Here's my no-global runtime concept I mentioned earlier: https://github.com/mkuyper/rtic/blob/master/rtic.h
Note that ticks are unsigned and there are macros for comparisons, making it easier to do the right thing.

@matthijskooijman (Author)

> Thanks for raising this -- really it's two issues, although they are interrelated.

Agreed, probably best to open another issue for the signed vs unsigned issue. I'll do that next and then respond there as well.

> On the STM32 implementation, ticks are implemented using a 16 bit timer running at 32.768 kHz, complemented by a 32 bit variable in RAM that counts the number of rollovers of the timer. That is, the counter in RAM is updated every time the hardware timer rolls over. This gives a total of 48 bits and a roll-over period of over 272 years. Would it be possible/feasible to implement something like this on the Arduino as well to make a non-rollover clock available?

Arduino does something like that internally (extending a 16-bit microsecond timer to 32 bits) to produce the 32-bit micros() timer. The BM Arduino HAL then reduces this to 28 bits to get 16μs ticks, and extends it again to 32 bits by tracking overflows. However, there is no portable API to get an ISR or callback when an overflow happens, so the current code only works when hal_ticks() is called at least once every 2³² μs. But I guess that any code using a maintenance task requires polling the task queue anyway, so hal_ticks() would then be called regardless. Your observation that stm32 uses just 48 bits is valuable, though; I hadn't realized that xticks doesn't need to be 64 bits, as long as it is big enough to practically never overflow. So I suspect it wouldn't be too hard to extend the current code to produce a 56-bit xticks (28-bit counter and 32-bit overflow variable, which in the current code share one overlapping byte to allow handling overflow without branching and with minimal bit-shifting, which can be slow on AVR); see the sketch below. Alternatively, the overlap could be removed to get a 44-bit xticks (28-bit counter and 16-bit overflow variable), which lasts almost 9 years without overflowing. Should probably be enough?
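A sketch of this extend-on-read approach (hal_ticks28() stands in for the existing 28-bit value derived from micros(); the overlapping-byte trick is omitted for clarity):

    // Widen the 28-bit tick value by counting rollovers observed between
    // successive reads. Only correct if called at least once per 28-bit
    // rollover (~72 min at 16 us ticks), e.g. from the job-queue loop.
    static uint32_t overflows;   // number of 2^28-tick rollovers seen
    static uint32_t last28;      // previous 28-bit reading

    uint64_t hal_xticks (void) {
        uint32_t t = hal_ticks28() & 0x0FFFFFFF;  // current 28-bit ticks
        if (t < last28)          // counter wrapped since the last call
            overflows++;
        last28 = t;
        return ((uint64_t) overflows << 28) | t;
    }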

Btw, you say 272 years, but I get 142 years for a 48-bit timer (at 16μs ticks)?

>>> (2**48)*16/1e6 /3600/24/365
142.80820736207812

> While I think that the availability of xticks is useful, I am not at all opposed to avoiding their use in the core stack -- especially since this has the potential to simplify quite a few things and should save some memory as well. I also think having a regularly-running maintenance task to fix up timestamps is perfectly acceptable.

This is what I implemented in https://github.com/LacunaSpace/basicmac/commits/no-xtime; what do you think of that?
