-
Notifications
You must be signed in to change notification settings - Fork 108
T0 to produce skimmed RAW data through Repack Workflow #12298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
98804d2
to
bf3e6cf
Compare
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Hello @anpicci @todor-ivanov I have modified the unit tests for the repack workflow, and seems like I got those to work. However I see many other tests that failed and I don't understand their relationship with the changes I made: https://cmssdt.cern.ch/dmwm-jenkins/job/WMCore-PR-Report/560/#showFailuresLink There I dont see any of the StdBase.py and any of the Repack.py tests failing. Could you please have a look and let me know what I can do to improve those test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LinaresToine the 2 failing unit tests are unstable, so there is nothing to worry with that.
For the Repack unit test, if there was no complain, that means the status of the unit test did not change (but to be on the safe side, perhaps we should ensure that it is succeeding in jenkins).
output['dataTier']) | ||
moduleLabel = "write_%s_%s" % (output['primaryDataset'], | ||
output['dataTier']) | ||
output['moduleLabel'] = moduleLabel.replace("-", "_") # For T0 Raw Skims, PDs will contain a "-", so here we replace for "_" for the moduleLabel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Antonio, can you please move this comment to the line above instead of inline?
Being really honest, I fear that this change can cause us "hidden" problems in the future. Is there any strong reason not to create a PD named with underscore instead of a dash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, given that it is a skim dataset, why changing the primary dataset and not the processing string (what goes between primary dataset and datatier)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comments @amaltaro. It all comes down to creating RAW data. The value of the Raw Skim datasets is that we are producing RAW data that can be Prompt Reconstructed if desired. This means that this feature takes effect in the Repack workflow. In short, we are creating two RAW outputs from the same Repack workflow, and we must distinguish between them.
Lets say then
/PD/Era-v1/RAW
/PD/Era-RawSkim-v1/RAW
which has a few inconveniences:
a). It is not trivial at all for T0 to give the new output an independent prompt reco configuration. This has the additional limitation of sending both sets of RAW data to the same destinations, which is not wanted. Allowing us to configure all these details only for the skimmed RAW is really what makes this project possible. Treating it as a primary dataset gives us this freedom.
b). It does not save us from the module label problem, since those two outputs would have the same module label by definition.
May I please ask what your concern with the dash is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to add that we do produce PDs with a dash (the error PDs), and they are processed without a problem through the system (Tier-0/WMCore/Rucio/DBS). We don't think accepting dashes in the skimmed PDs would cause a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I guess the tape families is a good argument to make this change at the PD level instead of the PS.
My only concern is that this moduleLabel
could be used downstream and no longer be consistent with the output module. However, as that naming conversion only happens at the Repack factory, the risk is much smaller.
Thank you for the follow up, it looks good to me.
The repack tests were failing before my changes in the commit 0b888e3 Please see
However, it was not clear to me if my development required modification or addition of unit tests for the StdBase module. Existing tests were successful. |
Jenkins results:
|
These changes are looking good to me. Can you please squash these commits? See some information on this in "Step 10" at https://github.com/dmwm/WMCore/blob/master/CONTRIBUTING.rst#contributing |
Fixes #12297
Status
Ready
Description
In order to support the new
raw skim
datasets, we need to allow two things:moduleLabel
in Repack.py can't have-
in its name, so we introduce the parameterparentDataset
inOutputs
. This way we can create a legal name formoduleLabel
:We then delete the new attribute similarly to what is done in setupProcessingTask.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
T0 PR: dmwm/T0#5041
CMSSW PR:
External dependencies / deployment changes
NO