Update Context to be thread-safe #44

Closed
4 tasks done
krlberry opened this issue Mar 6, 2025 · 4 comments

Comments

@krlberry
Contributor

krlberry commented Mar 6, 2025

Objective

Update the Context mechanism for the Prefect workflow to be thread-safe. Testing during the previous sprint showed that the current design is not thread-safe.

Some potential ideas for an approach are here: context

Requirements

  • The context mechanism has been updated to a thread-safe implementation
  • Testing to ensure that the context is thread-safe
  • Update the wiki page to include the new design for the context and documentation of the thread-safety issue that triggered the design change
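
A test for the thread-safety requirement could look roughly like the sketch below. It is only an illustration under stated assumptions: the table layout (`stage`, `kind`, `value`), the function names, and the use of SQLite with one connection per worker are hypothetical and not taken from the actual implementation.

```python
import json
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_entry(db_path, stage, key):
    """Insert one context entry using a connection owned by this worker."""
    # Assumption: each thread opens its own connection; the timeout lets a
    # writer wait for the database lock instead of failing immediately.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO context (stage, kind, value) VALUES (?, ?, ?)",
                (stage, "qa", json.dumps({key: 1.0})),
            )
    finally:
        conn.close()


def run_concurrent_writes(n_writers=8, n_entries=40):
    """Write entries from several threads and return the set of keys stored."""
    db_path = str(Path(tempfile.mkdtemp()) / "context.db")

    # Hypothetical schema: one row per (stage, kind, JSON payload).
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE context ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "stage TEXT NOT NULL, kind TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.commit()
    conn.close()

    with ThreadPoolExecutor(max_workers=n_writers) as pool:
        futures = [
            pool.submit(write_entry, db_path, f"stage_{i % 4}", f"key_{i}")
            for i in range(n_entries)
        ]
        for future in futures:
            future.result()  # re-raises any exception from a worker

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT value FROM context").fetchall()
    finally:
        conn.close()
    return {key for (value,) in rows for key in json.loads(value)}
```

The check is that no write is lost: every key submitted by a worker should be present when the table is read back.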

Key Decision Points

  • Design for a thread-safe context.

Some possibilities include:

  • Use the Prefect database
  • Use a different database
  • Redis
@amcnicho
Member

amcnicho commented Mar 6, 2025

Executing pipeline.py produced errors because the contents of the context did not match the keys written by previous stages. When a subsequent parameterized flow requested a key it expected to be present, the lookup failed:

  File "RADPS/prefect_workflow/stage_image_cont_selfcal.py", line 183, in solve
    n_field = datashape[src]['n_field']
KeyError: 'J1752-2956'

This was seen when the pipeline was submitted to a Helm deployment of prefect-server, but not when running against a local server process (prefect server start &).
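
When debugging a mismatch like this, it can help to make the failing lookup report which keys the context actually holds. The helper below is a hypothetical sketch (the name `lookup_n_field` is not from the repository); it wraps the same `datashape[src]['n_field']` access shown in the traceback.

```python
def lookup_n_field(datashape, src):
    """Return n_field for src, listing the available sources if src is missing."""
    if src not in datashape:
        # A bare KeyError only names the missing key; including the keys that
        # are present shows what earlier stages actually wrote.
        raise KeyError(
            f"source {src!r} not in context datashape; available: {sorted(datashape)}"
        )
    return datashape[src]["n_field"]
```

With a context that only holds `J1851+0035`, asking for `J1752-2956` would then name both the missing source and the one that is present.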

amcnicho added a commit that referenced this issue Mar 21, 2025
- Add prefect_sqlalchemy dependencies (#44)
@krlberry
Contributor Author

krlberry commented Apr 1, 2025

The context was updated to store its information in a database (SQLite by default). Values are written to the database by each stage and read back from it as needed.
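
As a rough illustration of the idea, a database-backed context could be sketched as follows. This is not the code from the repository: the class name `SqliteContext`, the column names (`stage`, `kind`, `value`), and the per-thread connection handling are assumptions made for the example, using only the standard-library `sqlite3` module.

```python
import json
import sqlite3
import threading


class SqliteContext:
    """Hypothetical context store: one row per (stage, kind, JSON payload)."""

    def __init__(self, path="context.db"):
        self._path = path
        self._local = threading.local()  # holds one connection per thread
        with self._conn() as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS context ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "stage TEXT NOT NULL, kind TEXT NOT NULL, value TEXT NOT NULL)"
            )

    def _conn(self):
        # sqlite3 connections should not be shared across threads by default,
        # so each thread lazily opens its own.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._path, timeout=30)
            self._local.conn = conn
        return conn

    def put(self, stage, kind, value):
        """Append a JSON-serializable value written by a stage."""
        with self._conn() as conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO context (stage, kind, value) VALUES (?, ?, ?)",
                (stage, kind, json.dumps(value)),
            )

    def get(self, kind):
        """Merge all payloads of one kind, later rows overriding earlier ones."""
        merged = {}
        rows = self._conn().execute(
            "SELECT value FROM context WHERE kind = ? ORDER BY id", (kind,)
        )
        for (value,) in rows:
            merged.update(json.loads(value))
        return merged
```

Because every write is a committed row in a shared file rather than a mutation of in-process state, any stage that opens the same database sees what earlier stages stored.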

Here is an example of the resulting stored information:

sqlite> .open context.db
sqlite> select * from context; 
1|calibrator_data_import_and_prep|datashape|{"J1752-2956": {"n_field": 1, "n_spw": 3, "n_scan": 27}}
2|calibrator_data_import_and_prep|datashape|{"J1851+0035": {"n_field": 1, "n_spw": 3, "n_scan": 27}}
3|calibrator_data_import_and_prep|qa|{"calibrator_data_import_and_prep_J1752-2956": 0.06997418511464948}
4|calibrator_data_import_and_prep|qa|{"calibrator_data_import_and_prep_J1851+0035": 0.7180763930764049}
5|bandpass|qa|{"bandpass_qa_score_J1752-2956": 0.9763887191935213}
6|gaincal|qa|{"gaincal_qa_score": 0.46173659562622504}
7|data_import_and_prep|datashape|{"source_0": {"n_field": 1, "n_spw": 3, "n_scan": 27}, "source_1": {"n_field": 1, "n_spw": 3, "n_scan": 27}}
8|data_import_and_prep|qa|{"data_import_and_prep": 0.8378984900777396}
9|findcont|datashape|{"source_0": {"n_field": 1, "n_spw": 3, "n_scan": 27}, "source_1": {"n_field": 1, "n_spw": 3, "n_scan": 27}}
10|findcont|datashape|{"bcal": {"n_field": 1, "n_spw": 3, "n_scan": 1}, "gcal": {"n_field": 1, "n_spw": 3, "n_scan": 4}, "target": {"n_field": 1, "n_spw": 3, "n_scan": 5}}
11|calibrator_imaging|qa|{"imaging_qa_score": 0.33969560899224105}
12|image_cube|qa|{"uvcontsub_qa_score spw0": 0.6863697885630929, "uvcontsub_qa_score spw1": 0.7539950165131493, "uvcontsub_qa_score spw2": 0.8190173226429013}
13|calibrator_imaging|qa|{"imaging_qa_score": 0.2217944813256103}
14|image_cube|datashape|{"target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2}}
15|image_cube|qa|{"cubeimage_qa_score": 0.3496019656415892}
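
Each row in this dump is pipe-delimited, with a JSON payload in the last field. The sketch below shows how one such line can be decoded and how the lookup from the earlier traceback resolves against it. The field names `stage` and `kind` are labels chosen for this example, since the dump does not show the column names.

```python
import json


def parse_row(line):
    """Split one pipe-delimited sqlite3 CLI row into (id, stage, kind, payload)."""
    # maxsplit=3 keeps the JSON payload intact even if it contains a '|'.
    row_id, stage, kind, payload = line.split("|", 3)
    return int(row_id), stage, kind, json.loads(payload)


# Row 1 from the dump above.
row = (
    "1|calibrator_data_import_and_prep|datashape|"
    '{"J1752-2956": {"n_field": 1, "n_spw": 3, "n_scan": 27}}'
)
row_id, stage, kind, value = parse_row(row)

# The access that previously raised KeyError: 'J1752-2956'.
n_field = value["J1752-2956"]["n_field"]
```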

amcnicho added a commit that referenced this issue Apr 2, 2025
@amcnicho
Member

amcnicho commented Apr 3, 2025

On the throwaway branch context_validation, I removed the except (OSError, KeyError) as e: handler from prefect_workflow/stage_bandpass_solve.py (cacf144). That handler had been added to work around the difference in context content seen with the previous implementation.

I tested the new context backend by running pipeline.py in both a k3d deployment and a local deployment, and the exception no longer occurs.

@krlberry
Contributor Author

krlberry commented Apr 3, 2025

I ran the same tests as @amcnicho on my machine and can confirm that the exception no longer occurs.

krlberry closed this as completed Apr 3, 2025

2 participants