Improve System Timer Usage #363

mole99 · 2025-01-08T08:46:59Z

Magic has (ideally) two timers running, the backup timer creates a backup every ~5 minutes or after a certain number of changes, and some long running functions set up a timer for progress checks. When a new timer is set up, it interferes with the one that is already setup (at least on Linux). This can cause the progess checks to stop, or worse, the crash backup to stop.

Possible solution: use create_timer() instead of setitimer()

The text was updated successfully, but these errors were encountered:

dlmiles · 2025-02-10T22:06:33Z

This "backup timer" what are you claiming it is doing ? Like an auto-save of changed *.mag file ? Does this feature already exist (or is this issue also a feature request)

Can you provide a sequence of events and actions where behaviour changes from that which is expected ?

The most obvious solution is not to use create_timer() which maybe Linux platform specific and stick with the POSIX compatible SIGALARM (so MacOS and other platforms work). But change the management of SIGALRM to track all active timers independently, then manage re-arm timeout and the signal delivery and notification accordingly. From what I can see there are only approx 10 timer event consumers, many of which only operate mutually-exclusively (as they report progress on demand to a specific command and only one command can run at a time).

RTimothyEdwards · 2025-02-10T22:23:02Z

@dlmiles : I requested Leo to post the issue after we discussed it. It is an item that has been on my "to do" list for a long time without being addressed.

I originally put in the backup file timer a long time ago using setitimer() and then slightly more recently added some "progress bar"-type output for some potentially long-running functions like plotting, extraction, and GDS generation. I did not realize at the time that setitimer( ) can access only one system timer, so that the introduction of any other timer interferes with the existing timer. The actions for which the "progress report" is needed can only be run one at a time (I think), so it's really the backup timer that is the main issue, since failing to make a timely backup and then crashing the program can have very demoralizing consequences (I speak from recent experience).

I have not tried to find a reproducible result. The crash backup file handling is done with the crashbackups command, but I did not add any command option to return the amount of time left on the timer, which makes it impossible to know what state the timers are in. I have seen several apparently-related issues; one is that the crash backups stopped being written (or never got written in the first place). Another is that the extraction progress will just stop printing after a while for long extractions. An unrelated issue that needs correcting is that the progress report does not get invoked when the function being timed finishes, so it never outputs "100%" complete, which is a minor issue but one which confuses people.

RTimothyEdwards · 2025-02-10T22:25:38Z

@dlmiles : An interesting feature of the crash backup code is that it will write and read a .mag file which is a concatenation of all the cells in memory. The ability to put multiple cells in one .mag file could be very useful for other applications, like keeping all generated devices used in a layout in one file.

dlmiles · 2025-02-10T23:39:35Z

timer_create() does look to be POSIX but part of real-time extensions that platforms don't have to support. On Linux to get support -lrt is needed and maybe much of the implementation is user-space based, but the Linux kernel does appear to support multiple timers on the kernel side for delivery. The problem is unclear if other POSIX platforms supports it. This is to clarify the crux of my bias away from this API call and managing the 10 timer consumers inside the application.

I see the issue there are TCL command tcltk/tools.tcl:86:proc magic::crashbackups that setup a timeout via the after https://wiki.tcl-lang.org/page/after command to run the magic command crash save. It is not clear how TCL itself manages such interval timers, such as automatic event-loop unblocking or SIGALRM (that enqueues event).

There is reference to a after timer id to allow query and cancellation of active timer. But it is unclear how to get the id in the first place, (except by reading and finding from after info output, and it is unclear if there is a better mechanism).
So maybe it is possible check if an id is still active and if not rearm timer.

Considerations that might just require minor tcl changes:

create a tcl proc magic::rearmcrashbackup that performs the rearming by using after (refactor existing proc magic::crashbackups and magic::makecrashbackup to use this call)
use exception/catch handler to wrap the active part of magic::crashbackups to try to ensure the rearm proc is always called, it is unclear if *bypass crash save fails, if the proc aborts further processing
maybe a cheat here is to use after 0 magic::rearmcrashbackup before running *bypass crash save which should cause it to run it the next time tcl gets back to the main event loop.

It should be possible to see the current active timer with after info (re your comment on seeing how long left on timer, but really you'd want to just know it is active)

bonus now you have a new proc magic::rearmcrashbackup when you can't see the timer is currently active you can manually rearm it. Now if the id is save into a global $Opt(crashbackup_timerid) maybe you can make the command only add new timer if the old one is no longer active.

Obviously the other approach is to remove the tcl procs above and do it inside C code:

maintain expiry timer (current_time + 600 seconds) = expiry_time
check expiry time on every SIGALARM
enqueue command, so it can run at the next available moment, recompue new expiry_time

dlmiles · 2025-02-10T23:54:45Z

# a sketch
proc magic::makecrashbackup {} {
   if {[catch {
      *bypass crash save
   } result]} {
      magic::rearmcrashbackup
   } else {
      magic::rearmcrashbackup
   }
}

# another sketch
proc magic::makecrashbackup {} {
   after 0 magic::rearmcrashbackup
   *bypass crash save
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve System Timer Usage #363

Improve System Timer Usage #363

mole99 commented Jan 8, 2025

dlmiles commented Feb 10, 2025

RTimothyEdwards commented Feb 10, 2025

RTimothyEdwards commented Feb 10, 2025

dlmiles commented Feb 10, 2025 •

edited

Loading

dlmiles commented Feb 10, 2025

Improve System Timer Usage #363

Improve System Timer Usage #363

Comments

mole99 commented Jan 8, 2025

dlmiles commented Feb 10, 2025

RTimothyEdwards commented Feb 10, 2025

RTimothyEdwards commented Feb 10, 2025

dlmiles commented Feb 10, 2025 • edited Loading

dlmiles commented Feb 10, 2025

dlmiles commented Feb 10, 2025 •

edited

Loading