Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve System Timer Usage #363

Open
mole99 opened this issue Jan 8, 2025 · 5 comments
Open

Improve System Timer Usage #363

mole99 opened this issue Jan 8, 2025 · 5 comments

Comments

@mole99
Copy link

mole99 commented Jan 8, 2025

Magic has (ideally) two timers running, the backup timer creates a backup every ~5 minutes or after a certain number of changes, and some long running functions set up a timer for progress checks. When a new timer is set up, it interferes with the one that is already setup (at least on Linux). This can cause the progess checks to stop, or worse, the crash backup to stop.

Possible solution: use create_timer() instead of setitimer()

@dlmiles
Copy link
Contributor

dlmiles commented Feb 10, 2025

This "backup timer" what are you claiming it is doing ? Like an auto-save of changed *.mag file ? Does this feature already exist (or is this issue also a feature request)

Can you provide a sequence of events and actions where behaviour changes from that which is expected ?

The most obvious solution is not to use create_timer() which maybe Linux platform specific and stick with the POSIX compatible SIGALARM (so MacOS and other platforms work). But change the management of SIGALRM to track all active timers independently, then manage re-arm timeout and the signal delivery and notification accordingly. From what I can see there are only approx 10 timer event consumers, many of which only operate mutually-exclusively (as they report progress on demand to a specific command and only one command can run at a time).

@RTimothyEdwards
Copy link
Owner

@dlmiles : I requested Leo to post the issue after we discussed it. It is an item that has been on my "to do" list for a long time without being addressed.

I originally put in the backup file timer a long time ago using setitimer() and then slightly more recently added some "progress bar"-type output for some potentially long-running functions like plotting, extraction, and GDS generation. I did not realize at the time that setitimer( ) can access only one system timer, so that the introduction of any other timer interferes with the existing timer. The actions for which the "progress report" is needed can only be run one at a time (I think), so it's really the backup timer that is the main issue, since failing to make a timely backup and then crashing the program can have very demoralizing consequences (I speak from recent experience).

I have not tried to find a reproducible result. The crash backup file handling is done with the crashbackups command, but I did not add any command option to return the amount of time left on the timer, which makes it impossible to know what state the timers are in. I have seen several apparently-related issues; one is that the crash backups stopped being written (or never got written in the first place). Another is that the extraction progress will just stop printing after a while for long extractions. An unrelated issue that needs correcting is that the progress report does not get invoked when the function being timed finishes, so it never outputs "100%" complete, which is a minor issue but one which confuses people.

@RTimothyEdwards
Copy link
Owner

@dlmiles : An interesting feature of the crash backup code is that it will write and read a .mag file which is a concatenation of all the cells in memory. The ability to put multiple cells in one .mag file could be very useful for other applications, like keeping all generated devices used in a layout in one file.

@dlmiles
Copy link
Contributor

dlmiles commented Feb 10, 2025

timer_create() does look to be POSIX but part of real-time extensions that platforms don't have to support. On Linux to get support -lrt is needed and maybe much of the implementation is user-space based, but the Linux kernel does appear to support multiple timers on the kernel side for delivery. The problem is unclear if other POSIX platforms supports it. This is to clarify the crux of my bias away from this API call and managing the 10 timer consumers inside the application.

I see the issue there are TCL command tcltk/tools.tcl:86:proc magic::crashbackups that setup a timeout via the after https://wiki.tcl-lang.org/page/after command to run the magic command crash save. It is not clear how TCL itself manages such interval timers, such as automatic event-loop unblocking or SIGALRM (that enqueues event).

There is reference to a after timer id to allow query and cancellation of active timer. But it is unclear how to get the id in the first place, (except by reading and finding from after info output, and it is unclear if there is a better mechanism).
So maybe it is possible check if an id is still active and if not rearm timer.

Considerations that might just require minor tcl changes:

  • create a tcl proc magic::rearmcrashbackup that performs the rearming by using after (refactor existing proc magic::crashbackups and magic::makecrashbackup to use this call)
  • use exception/catch handler to wrap the active part of magic::crashbackups to try to ensure the rearm proc is always called, it is unclear if *bypass crash save fails, if the proc aborts further processing
  • maybe a cheat here is to use after 0 magic::rearmcrashbackup before running *bypass crash save which should cause it to run it the next time tcl gets back to the main event loop.

It should be possible to see the current active timer with after info (re your comment on seeing how long left on timer, but really you'd want to just know it is active)

bonus now you have a new proc magic::rearmcrashbackup when you can't see the timer is currently active you can manually rearm it. Now if the id is save into a global $Opt(crashbackup_timerid) maybe you can make the command only add new timer if the old one is no longer active.

Obviously the other approach is to remove the tcl procs above and do it inside C code:

  • maintain expiry timer (current_time + 600 seconds) = expiry_time
  • check expiry time on every SIGALARM
  • enqueue command, so it can run at the next available moment, recompue new expiry_time

@dlmiles
Copy link
Contributor

dlmiles commented Feb 10, 2025

# a sketch
proc magic::makecrashbackup {} {
   if {[catch {
      *bypass crash save
   } result]} {
      magic::rearmcrashbackup
   } else {
      magic::rearmcrashbackup
   }
}
# another sketch
proc magic::makecrashbackup {} {
   after 0 magic::rearmcrashbackup
   *bypass crash save
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants