Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mutex savenexus calls to prevent segfaults #544

Merged
merged 5 commits into from
Feb 18, 2025
Merged

Conversation

walshmm
Copy link
Collaborator

@walshmm walshmm commented Feb 14, 2025

Description of work

This adds a mutex for SaveNexus in MantidSnapper to mitigate the race condition we are experiencing

To test

NOTE: Test script points to data in MY directory, update paths before you run!

Attempt to reduce 64486, default calibration and artificial normalization is fine.
run cis script in pr at the same time.
Repeat till self assured.
Observe no segfault.

Link to EWM item

EWM#8534

Verification

  • the author has read the EWM story and acceptance critera
  • the reviewer has read the EWM story and acceptance criteria
  • the reviewer certifies the acceptance criteria below reflect the criteria in EWM

Acceptance Criteria

This list is for ease of reference, and does not replace reading the EWM story as part of the review. Verify this list matches the EWM story before reviewing.

  • acceptance criterion 1
  • acceptance criterion 2

Copy link

codecov bot commented Feb 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.02%. Comparing base (c5e5379) to head (6b43602).
Report is 1 commits behind head on next.

Additional details and impacted files
@@           Coverage Diff           @@
##             next     #544   +/-   ##
=======================================
  Coverage   96.02%   96.02%           
=======================================
  Files          71       71           
  Lines        5463     5463           
=======================================
  Hits         5246     5246           
  Misses        217      217           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

add back the sleeps

remove algo from manager when complete

added timeout in case of long async algo queue

reorg where we check and wait for algos to complete, share a lock between potential cross interference

both reentrant and nonconcurrent mutexes
@walshmm walshmm force-pushed the ewm8534_save_nexus_segfault branch from e155126 to de1412c Compare February 17, 2025 18:07
Copy link
Contributor

@ekapadi ekapadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm running the tests now, but you might as well fix the spelling error. That is the only comment that I had.

Copy link
Contributor

@ekapadi ekapadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that the provided cis-test script definitely produces a SEGFAULT on "next". (The script needs to be run in workbench at the same time as the reduction workflow is running in SNAPRed.) The SEGFAULT occurs during the final SaveNexus write of the reduced output workspaces.

On this branch however, the same test completes successfully without any SEGFAULT. I can definitely notice a slowdown in certain sections, usually where a RenameWorkspace operation is waiting on one of theSaveNexus calls. This behavior is reassuring as this indicates that the work-around is clearly being applied.

@walshmm walshmm merged commit 53e9326 into next Feb 18, 2025
8 checks passed
@walshmm walshmm deleted the ewm8534_save_nexus_segfault branch February 18, 2025 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants