uproot.dask behavior for partially readable files #1046
Comments
This might be related to #1048, with the Daskified version encountering errors where an eager version does not. If the eager version does not encounter errors, the Daskified version should not encounter errors either: it's calling the same code, just at a later time on a Dask worker instead of the head node. Oh!!! Maybe the Dask worker has an outdated version of Uproot? Maybe that's why you see different errors when running eagerly or lazily, because the two cases are running different versions of Uproot?
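One way to check the version-mismatch hypothesis is to collect package versions on the head node and on each worker and compare them. The following is only a sketch: the helper names `installed_versions` and `version_mismatches` and the package list are illustrative, not part of Uproot or Dask. With `dask.distributed`, the same helper could presumably be sent to the workers through `Client.run`; that step is shown only as a comment because it assumes a running cluster.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(head, workers):
    """Return {worker: {package: (head_version, worker_version)}} for every difference."""
    report = {}
    for worker, versions in workers.items():
        diff = {
            name: (head.get(name), versions.get(name))
            for name in sorted(set(head) | set(versions))
            if head.get(name) != versions.get(name)
        }
        if diff:
            report[worker] = diff
    return report


# Illustrative package list; adjust to whatever the analysis actually imports.
PACKAGES = ["uproot", "awkward", "dask", "dask-awkward"]

head = installed_versions(PACKAGES)

# Assumption: with a dask.distributed Client available as `client`, one could use
#     workers = client.run(installed_versions, PACKAGES)
# which returns one result per worker address.
# In a purely local run there is only one environment to compare against:
workers = {"local": installed_versions(PACKAGES)}

print(version_mismatches(head, workers))
```

An empty report means every worker sees the same versions as the head node, which would rule out this explanation.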
Asking for the Daskified mode to raise or not raise the same interpretation or deserialization errors (something that has nothing to do with delaying computations) as the eager mode is not a feature request.
I was not sure how I ran this originally, but I just reproduced this locally on my laptop so I think it is not an issue with different uproot versions. A list of possibly relevant package versions (just did a new install):
We now also have a publicly available file in the same format that can be used to reproduce the behavior above, sitting on EOS: xrdcp root://eosuser.cern.ch//eos/user/f/feickert/physlite_public_testing/DAOD_PHYSLITE.34858087._000001.pool.root.1 .
It seems that the EOS link in my previous comment still needs some permissions to work. Here is another way to access the file, over CERNBox, that should work correctly: curl -sLO https://cernbox.cern.ch/remote.php/dav/public-files/wJGWzAyirlWE6QV/DAOD_PHYSLITE.34858087._000001.pool.root.1
The current behavior of uproot.open and uproot.dask differs when dealing with files that are partially unreadable by uproot. In my concrete example, I am dealing with ATLAS PHYSLITE files. The following snippet:

works just fine to access this specific array. A similar version with Dask fails (before a .compute()):

because parts of the file are not understandable to uproot:

@lgray pointed out that this is expected behavior and coffea removes branches to address this (_remove_not_interpretable).

What I would like to raise for discussion here is making the uproot.dask behavior match more closely that of uproot.open. As long as I only need data that uproot can read, the Dask interface should be able to supply it without too much additional effort for the user. Concretely that might mean for example: _remove_not_interpretable, possibly with an accompanying warning, or UnknownInterpretation error with information of how to use it to achieve _remove_not_interpretable-like behavior.

I do not know what the worst case scenario looks like with partially unreadable files: can this ever imply that the interpretation of the other (to uproot appearing as "readable") columns becomes wrong? If so, it is dangerous to automatically handle such files without the user potentially being aware of course.
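The branch-dropping idea discussed above can be sketched as a predicate that rejects branches whose interpretation is of an "unreadable" type and warns about each one it skips. This is not coffea's `_remove_not_interpretable`; `make_branch_filter` is a hypothetical helper, and the sketch assumes that an uninterpretable branch exposes an instance of some known type through its `interpretation` attribute. The demonstration uses stand-in classes so that it runs without any ROOT file.

```python
import warnings


def make_branch_filter(unreadable_types, warn=True):
    """Build a predicate that keeps only branches with a usable interpretation.

    `unreadable_types` is a tuple of classes marking an interpretation as unusable.
    """

    def keep(branch):
        if isinstance(branch.interpretation, unreadable_types):
            if warn:
                warnings.warn(
                    f"skipping branch {branch.name!r}: no known interpretation",
                    stacklevel=2,
                )
            return False
        return True

    return keep


# Stand-ins for demonstration only; these are not Uproot classes.
class FakeUnknown(Exception):
    pass


class FakeBranch:
    def __init__(self, name, interpretation):
        self.name = name
        self.interpretation = interpretation


branches = [
    FakeBranch("pt", "float32 array"),
    FakeBranch("opaque", FakeUnknown("cannot interpret")),
]

keep = make_branch_filter((FakeUnknown,))
print([b.name for b in branches if keep(b)])

# Assumed usage with Uproot (untested here): pass the predicate as the
# filter_branch argument, e.g.
#     uproot.dask(files, filter_branch=make_branch_filter(
#         (uproot.interpretation.identify.UnknownInterpretation,)))
```

Emitting a warning per dropped branch keeps the user aware that part of the file was skipped, which matters given the open question of whether the remaining columns are always safe to read.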