Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add PFNs for subworkflow input files #4617

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

spxiwh
Copy link
Contributor

@spxiwh spxiwh commented Feb 2, 2024

While looking over #4535 I realised that we still had some oddities in how subworkflow file inheritance worked

In particular, consider our separate jobs (think minifollowups) that use input files generated in the main workflow (e.g. TRIGGER_MERGE).

When these jobs run, they use tools like resolve_url_to_file to create file objects for the input files. This also adds a PFN into the minifollowsup dax for that file. If these tools are run standalone this is exactly what you want. However, if the minifollowups code is run within a workflow, you do not want to add a PFN because pegasus (and our own code around it) will handle the file transfer for you. Especially when the minifollowup dag creation job runs on a condor node, the PFNs it generates will be invalid and we get things like:

 { "type": "transfer",
   "linkage": "input",
   "lfn": "H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml",
   "id": 2,
   "src_urls": [
     { "site_label": "condorpool_symlink", "url": "file:///home/gareth.cabourndavies/test_codes/pycbc/offline_gracedb_prep_workflow/testing/output_test_chunk1_v2/local-site-scratch/work/./o4-main.dax_o4-main/./H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml", "priority": 300, "checkpoint": "false" },
     { "site_label": "condorpool_symlink", "url": "file:///local/condor/execute/dir_1600786/pegasus.2V5qg27hB/H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml", "priority": 300, "checkpoint": "false" }
   ],
   "dest_urls": [
     { "site_label": "condorpool_symlink", "url": "symlink://$PWD/H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml" }
   ] }

Now this doesn't break anything because the first PFN is the right one, and the one coming from pegasus, but it worries me that the second one is always there, and I want it gone.

This patch will not create for files like this that are being passed to subworkflows. resolve_url_to_file now has a new add_pfn option, which these codes set. There is also some sanity checking to avoid some illegal use cases, but hopefully people don't touch this unless they know what it does! I used the dax-file-directory option to decide when to set this new option to False. dax-file-directory needs to be set when running in this subworkflow mode anyway, and should never be set otherwise, so if it's set, you can assume this is a subworkflow.

I also noted that the seed option for the sbank workflow wasn't working as it should if that workflow is run as a top-level workflow. There will be cases where pegasus will not know where the actual file is. This should also use resolve_url_to_file.

@spxiwh spxiwh requested a review from ahnitz February 2, 2024 17:11
@ahnitz
Copy link
Member

ahnitz commented Feb 2, 2024

@spxiwh I'm not sure it's true that dax-file-directory won't be set outside a subworkflow. If you run the minifollowups by hand it will be since it's hardcoded there, no? Maybe an usual useage at the moment. I don't have a better idea at the moment, this may be the right patch for now.

In the longer term, I'm not sure this is something that should really be handled on our end. I think in concept workflows should be idempotent. This may require changes in Pegasus though. I think what's needed here is some way to indicate that the PFN should be overriden if it is in a subworkflow (and pegasus itself determines that relationship when it runs rather than anything in the workflow needing to know if this instance is a top level or sub workflow.).

@spxiwh
Copy link
Contributor Author

spxiwh commented Feb 6, 2024

@ahnitz A few points to answer (I won't bother trying to fix the tests until we're at least happy with the motivation):

  • The --dax-file-directory option should never be set on the command line. If the minifollowups are being run by hand this option would not be given at all .... However, if someone does copy paste the .sh command that a workflow generates for the minifollowups it would have this in it, and then it would likely be used.
  • There's no fix that should be made in pegasus here. I think it is just doing what we tell it (up to the normal caveats that subworkflows are a bit fiddly). We told it that /local/condor/execute/dir_1600786/pegasus.2V5qg27hB/H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml is a valid PFN for the input file, so it should be expected to think that this is a valid PFN.
  • HOWEVER, note that pegasus does also do the right thing in realising that the PFN coming from the parent workflow is preferred, so it never actually tries to transfer /local/condor/execute/dir_1600786/pegasus.2V5qg27hB/H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml because /home/gareth.cabourndavies/test_codes/pycbc/offline_gracedb_prep_workflow/testing/output_test_chunk1_v2/local-site-scratch/work/./o4-main.dax_o4-main/./H1L1-PAGE_FOREGROUND_XMLALL-1368975466-1121610.xml is looked for and found first.

There is a genuine fix in the sbank workflow here, which would need to do in even if this isn't approved, the rest of it thought is just me not liking the dax files containing "invalid" PFNs, even if they're not used.

An alternative would be to add something like a --dont-add-input-file-pfns option, but in some way this is basically equivalent to --is-subworkflow which we didn't want. However, I think the choice of whether or not to add an input-file PFN is only possible if the workflow knows if it is a subworkflow or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants