Definition of LFNs for the custom datasets in columnflow #412
Dear users, maintainers, all!
Then we obtain LFNs by running a function like this:
Then, somewhere in the framework (probably in the GetDatasetLFNs task), this function is executed. My question is how to handle the case where my files are not stored in a single directory, so that I need to define different paths for different datasets.

Best,
Replies: 1 comment 1 reply
Hi Stepan,

Indeed, what you describe is correct. The `get_dataset_lfns` function is called by the framework and is used to create JSON files that contain the final paths to the individual nanoAOD files with their logical file names (LFNs). After this point, only these JSON files are used in subsequent tasks, so once the list of files has been collected, the `get_dataset_lfns` function is not used anymore.

As a user, you have complete freedom in how you obtain the paths to these LFNs. In principle, you could write a function that derives the file paths from the name of a given dataset; this is a design choice you can make when writing your code. IMHO, this is not the most efficient way to go about it, though, since it doesn't scale well if you consider a lot of datasets in your analysis.

I agree that it would be better to attach the information about the location of a given dataset to the dataset itself. You can do this in many ways, e.g. via the auxiliary dictionary that most of the order objects provide. In this dictionary, you can assign basically any key you want, and then later access it, for example, in the function that retrieves the LFNs.

So the bottom line is: I think you got the structure right, and the optimal way to implement a dataset-dependent location when retrieving the paths to the LFNs is a matter of taste. Personally, I agree that it's most efficient/sustainable to attach this information to the datasets themselves, for example with the auxiliary dictionary, which you can access/fill either within the cmsdb or even in your analysis config at runtime.

Hope this helps!

Cheers,
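
The suggestion in the reply, attaching the storage location to each dataset and reading it back when collecting the file list, could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Dataset` class is a hypothetical stand-in for a dataset object with an auxiliary dictionary (it is not the order API), and the key name `lfn_base_dir` and the function name `get_dataset_lfns_from_aux` are made up for this example. The actual signature columnflow expects for `get_dataset_lfns` is not reproduced here.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset object that carries an auxiliary dictionary."""
    name: str
    aux: dict = field(default_factory=dict)


def get_dataset_lfns_from_aux(dataset: Dataset) -> list[str]:
    """Return the sorted paths of all .root files directly inside the dataset's directory."""
    # "lfn_base_dir" is an illustrative key name, not a columnflow or order convention
    base_dir = dataset.aux.get("lfn_base_dir")
    if base_dir is None:
        raise KeyError(f"dataset '{dataset.name}' has no 'lfn_base_dir' in its aux data")
    return sorted(str(p) for p in Path(base_dir).glob("*.root"))


def demo() -> dict[str, list[str]]:
    """Create two datasets stored in different directories and collect their files."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, n_files in [("signal", 2), ("background", 3)]:
            directory = Path(tmp) / name
            directory.mkdir()
            for i in range(n_files):
                (directory / f"nano_{i}.root").touch()
            # each dataset carries its own location, so no global path is needed
            dataset = Dataset(name=name, aux={"lfn_base_dir": str(directory)})
            # keep only file names so the result does not depend on the temp path
            result[name] = [Path(p).name for p in get_dataset_lfns_from_aux(dataset)]
    return result


if __name__ == "__main__":
    print(demo())
```

In a real analysis, the auxiliary dictionary of the actual dataset objects would presumably take the place of the `aux` field above, filled either in the cmsdb or in the analysis config, as the reply suggests.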