

"ValueError: I/O operation on closed file" uploading to Zenodo #81

Open
Tracked by #61
zaneselvans opened this issue Feb 28, 2023 · 1 comment
Labels
inframundo mshamines Mining Safety and Health Administration Data zenodo
Milestone
2023Q2

Comments

@zaneselvans
Member

Running the MSHA Mines archiver locally, I'm frequently getting I/O errors during upload, and they don't seem to trigger a retry. By process of elimination, this seems to happen on the larger files (a couple hundred MB each), but I'm not sure which upload is actually failing.

I can't tell whether there is any issue with the file itself, because the temporary download directory is cleaned up at the end of the archiver run. However, they're zipfiles, so they should have been verified as valid zipfiles upon download.
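For reference, here is what such a validity check could look like if it uses Python's standard zipfile module. This is an assumption: the issue does not show how the archiver actually validates downloads, and `is_valid_zip` is a hypothetical name. The check only inspects the bytes on disk. The traceback, by contrast, complains about the state of a Python file handle, so a file that passes a check like this is consistent with the archive itself being fine.

```python
import os
import tempfile
import zipfile


def is_valid_zip(path: str) -> bool:
    """Return True if path is a zip archive whose members pass zipfile's CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None if all are OK.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.zip")
        with zipfile.ZipFile(good, "w") as archive:
            archive.writestr("data.csv", "a,b\n1,2\n")
        bad = os.path.join(tmp, "bad.zip")
        with open(bad, "wb") as out:
            out.write(b"not a zip file")
        print(is_valid_zip(good), is_valid_zip(bad))  # True False
```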

2023-02-27 20:36:34 [    INFO] catalystcoop.pudl_archiver.depositors.zenodo:92 PUT https://sandbox.zenodo.org/api/files/a2ac85a6-1aa3-4339-9c07-51163bedffe9/mshamines-assessed_violations.zip - Uploading mshamines-assessed_violations.zip to bucket
2023-02-27 20:41:35 [    INFO] catalystcoop.pudl_archiver.utils:46 Error while executing <coroutine object ZenodoDepositor._make_requester.<locals>.requester.<locals>.run_request at 0x28dfa9f50> (try #1, retry in 10s):
Encountered exceptions, showing traceback for last one: ["('mshamines', ValueError('I/O operation on closed file'))"]
Traceback (most recent call last):
  File "/Users/zane/mambaforge/envs/pudl-cataloger/bin/pudl_archiver", line 8, in <module>
    sys.exit(main())
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/cli.py", line 58, in main
    asyncio.run(archive_datasets(**vars(args)))
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/__init__.py", line 81, in archive_datasets
    raise exceptions[-1][1]
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/orchestrator.py", line 195, in run
    await self._apply_changes()
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/orchestrator.py", line 258, in _apply_changes
    await self.depositor.create_file(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 346, in create_file
    return await self.request(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 104, in requester
    response = await retry_async(
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/utils.py", line 41, in retry_async
    return await coro
  File "/Users/zane/code/catalyst/pudl-archiver/src/pudl_archiver/depositors/zenodo.py", line 95, in run_request
    response = await session.request(method, url, **kwargs)
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/client.py", line 508, in _request
    req = self._request_class(
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 313, in __init__
    self.update_body_from_data(data)
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 517, in update_body_from_data
    size = body.size
  File "/Users/zane/mambaforge/envs/pudl-cataloger/lib/python3.10/site-packages/aiohttp/payload.py", line 379, in size
    return os.fstat(self._value.fileno()).st_size - self._value.tell()
ValueError: I/O operation on closed file
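The last frame shows where the error originates. While building the request, aiohttp computes the body size of a file-like payload as `os.fstat(self._value.fileno()).st_size - self._value.tell()`, and `fileno()` raises as soon as the handle has been closed. The minimal stdlib-only sketch below reproduces the same error with no network I/O (`remaining_size` is a hypothetical helper mirroring that frame). Since the failure happens in `ClientRequest.__init__`, this suggests the handle was already closed before the request was even constructed, rather than being closed mid-transfer.

```python
import os
import tempfile


def remaining_size(fileobj) -> int:
    """Bytes left to read from fileobj, computed like the last traceback frame."""
    return os.fstat(fileobj.fileno()).st_size - fileobj.tell()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.zip")
        with open(path, "wb") as out:
            out.write(b"\0" * 1024)
        fileobj = open(path, "rb")
        print(remaining_size(fileobj))  # 1024 while the handle is open
        fileobj.close()  # e.g. closed by an earlier upload attempt
        try:
            remaining_size(fileobj)
        except ValueError as err:
            print(err)  # I/O operation on closed file
```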
@zaneselvans zaneselvans added inframundo mshamines Mining Safety and Health Administration Data labels Feb 28, 2023
@zaneselvans zaneselvans changed the title "ValueError: I/O operation on closed file" when uploading to Zenodo "ValueError: I/O operation on closed file" uploading to Zenodo Feb 28, 2023
@zaneselvans zaneselvans moved this from 🆕 New to 🔖 Backlog in Catalyst Megaproject Feb 28, 2023
@jdangerx
Member

jdangerx commented Mar 6, 2023

Scope:

  • The MSHA mines archiver doesn't fail with this error over the course of 10 runs

Next steps:

  • Which file are we trying to read here? When is it opened and closed?
    • Wrap the await self.depositor.create_file call in a try/except that catches the ValueError and drops into pdb so we can investigate
    • Does this happen every time an upload to Zenodo fails? Force retry_coro to fail on its first try and see whether this consistently fails
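One hypothesis worth testing is an assumption, not something confirmed in this issue. aiohttp's file payloads appear to close the file after sending it. If so, a retry that reuses the same open handle would hit exactly this ValueError on attempt #2 and hide whatever went wrong on attempt #1. The sketch below uses a fake uploader instead of Zenodo or aiohttp, and every name in it (`retry_with_factory`, `FlakyUploader`, `upload_shared_handle`, `upload_reopening`) is hypothetical. It forces the first attempt to fail, as the second next step suggests, and contrasts sharing one handle across retries with reopening the file inside each attempt. For the first next step, wrapping the call in `try: ... except ValueError: pdb.post_mortem()` would drop into the debugger at the failing frame.

```python
import asyncio
import os
import tempfile
from typing import Awaitable, BinaryIO, Callable, TypeVar

T = TypeVar("T")


async def retry_with_factory(
    make_attempt: Callable[[], Awaitable[T]],
    retry_count: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Hypothetical retry helper: builds a fresh awaitable for every attempt."""
    for attempt in range(1, retry_count + 1):
        try:
            return await make_attempt()
        except Exception as err:
            if attempt == retry_count:
                raise
            print(f"try #{attempt} failed with {err!r}, retrying")
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("retry_count must be at least 1")


class FlakyUploader:
    """Fake uploader: fails its first call and closes the file it was given."""

    def __init__(self) -> None:
        self.calls = 0

    async def put(self, fileobj: BinaryIO) -> int:
        self.calls += 1
        try:
            # Same size computation as the last frame of the traceback above.
            size = os.fstat(fileobj.fileno()).st_size - fileobj.tell()
            fileobj.read()  # pretend to stream the body
            if self.calls == 1:
                raise ConnectionError("simulated failure on the first try")
            return size
        finally:
            fileobj.close()  # assumption: the client closes file payloads after use


async def upload_shared_handle(path: str, uploader: FlakyUploader) -> int:
    # Suspected bug pattern: one handle opened up front and reused by every retry.
    fileobj = open(path, "rb")
    return await retry_with_factory(lambda: uploader.put(fileobj))


async def upload_reopening(path: str, uploader: FlakyUploader) -> int:
    async def attempt() -> int:
        # Fix pattern: every attempt opens its own handle.
        with open(path, "rb") as fileobj:
            return await uploader.put(fileobj)

    return await retry_with_factory(attempt)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.zip")
        with open(path, "wb") as out:
            out.write(b"\0" * 1024)
        try:
            asyncio.run(upload_shared_handle(path, FlakyUploader()))
        except ValueError as err:
            print("shared handle:", err)
        print("reopening:", asyncio.run(upload_reopening(path, FlakyUploader())))
```

If the shared-handle variant matches what the archiver sees, the change to try would be giving the retry logic something that reopens the file on each attempt. This has not been verified against the archiver's actual request code.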

@zaneselvans zaneselvans added this to the 2023Q2 milestone Mar 12, 2023
@zaneselvans zaneselvans moved this from 🔖 Backlog to 🥶 Icebox in Catalyst Megaproject Mar 12, 2023
Projects
Status: Icebox
Development

No branches or pull requests

2 participants