tropo_pyaps3: parallel downloads #1195

Open
wants to merge 3 commits into
base: main
from

ritwika21

Description of proposed changes

Reminders

  • Fix #xxxx
  • Pass Pre-commit check (green)
  • Pass Codacy code review (green)
  • Pass Circle CI test (green)
  • Make sure that your code follows our style. Use the other functions/files as a basis.
  • If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

yunjunz self-requested a review May 29, 2024 03:18

Member

yunjunz commented May 29, 2024

Thank you @ritwika21 for contributing!

Could you add some description to the PR?

And some questions on the ERA5 parallel downloading:

  1. How much time does this PR save compared with the current version? A concrete example would help.
  2. As far as I knew a couple of years ago, ECMWF (via the Copernicus Climate Data Store) allowed a maximum of 3 submitted jobs per user at a time. If that 3-job limit still exists, would it make more sense to set the number of parallel jobs to 3 instead of 64?
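
A minimal sketch of how both questions could be checked together. It assumes a stand-in `fake_download` that only sleeps (no real CDS request is made) and a hypothetical `MAX_CDS_JOBS = 3` cap taken from the limit recalled above, which may no longer apply. None of these names come from the PR itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CDS_JOBS = 3  # assumption: per-user job cap recalled above; may be outdated


def fake_download(grib_file, delay=0.05):
    """Stand-in for one ERA5 download; sleeps instead of contacting CDS."""
    time.sleep(delay)
    return grib_file


def download_serial(grib_files):
    """Download one file after another and return the elapsed seconds."""
    start = time.perf_counter()
    for grib_file in grib_files:
        fake_download(grib_file)
    return time.perf_counter() - start


def download_parallel(grib_files, max_workers=MAX_CDS_JOBS):
    """Download with at most `max_workers` threads and return elapsed seconds."""
    num_workers = max(1, min(max_workers, len(grib_files)))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # list() forces every download to finish before the timer stops
        list(executor.map(fake_download, grib_files))
    return time.perf_counter() - start


grib_files = [f'ERA5_2024010{i}.grb' for i in range(1, 7)]  # hypothetical names
t_serial = download_serial(grib_files)
t_parallel = download_parallel(grib_files)
print(f'serial: {t_serial:.2f} s, parallel ({MAX_CDS_JOBS} workers): {t_parallel:.2f} s')
```

With real downloads the speed-up would depend on how many requests the server actually runs at once, so a worker count above the server-side limit would mostly add queued requests rather than save time.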

Member

yunjunz commented May 29, 2024

pre-commit.ci autofix

yunjunz mentioned this pull request May 29, 2024 (7 tasks)
yunjunz changed the title from "parallel downloads" to "tropo_pyaps3: parallel downloads" May 29, 2024

Contributor

falkamelung commented May 29, 2024

I heard that downloading one date as one file is not the right approach. ECMWF might have an option to download all the required days for one SAR dataset as a single file. I asked some atmospheric scientists for help, and they were shocked at how we download the data, but I have not pursued this yet.

But until the overall approach is fixed, any improvement is of course greatly appreciated!
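
One way the idea above might start is to group the acquisition dates by month, so that each group could become a single request instead of one request per date. The sketch below only does the grouping. Whether and how ECMWF/CDS accepts such a multi-day request is not confirmed in this thread, and `group_dates_by_month` is a hypothetical helper, not something in this repository.

```python
from collections import defaultdict
from datetime import datetime


def group_dates_by_month(date_list):
    """Group YYYYMMDD date strings by month.

    Returns a dict mapping 'YYYY-MM' to a sorted list of day strings ('DD').
    """
    groups = defaultdict(set)
    for date_str in date_list:
        d = datetime.strptime(date_str, '%Y%m%d')  # raises ValueError on bad input
        groups[d.strftime('%Y-%m')].add(d.strftime('%d'))
    return {key: sorted(days) for key, days in sorted(groups.items())}


sar_dates = ['20240105', '20240117', '20240129', '20240210']  # hypothetical dates
for month, days in group_dates_by_month(sar_dates).items():
    print(month, days)
```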

Member

yunjunz commented May 29, 2024

A link to any code or documentation for the all-at-once download approach would be very helpful.

codeautopilot bot commented Feb 9, 2025

PR Summary

This Pull Request introduces parallel downloading of GRIB files in the tropo_pyaps3.py module by utilizing Python's ThreadPoolExecutor. The change aims to improve the efficiency of downloading weather re-analysis data by processing multiple files concurrently. A new helper function, dload_grib_files_worker, is introduced to handle the download logic for individual files. Additionally, a minor modification in readfile.py changes the date parsing logic to split on underscores instead of colons, which likely aligns with a change in the input data format.

Review Checklist

  • Fix #xxxx (No issue number provided)
  • Pass Pre-commit check (green) (Ensure pre-commit checks are passing)
  • Pass Codacy code review (green) (Ensure Codacy checks are passing)
  • Pass Circle CI test (green) (Ensure Circle CI tests are passing)
  • Make sure that your code follows our style. Use the other functions/files as a basis. (Verify code style consistency)
  • If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration. (Ensure function behavior changes are documented)
  • If adding new functionality, add a detailed description to the documentation and/or an example. (Ensure new functionality is documented)

Suggestion

Consider adding error handling within the dload_grib_files_worker function to manage potential exceptions during file downloads. This could prevent the entire download process from failing if a single file encounters an issue. Additionally, it might be beneficial to include logging to track the progress and status of each file download, which can aid in debugging and monitoring.
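
A rough sketch of what this suggestion could look like, assuming a hypothetical worker `download_worker` that receives the actual download callable as `fetch`. The real signature of `dload_grib_files_worker` is not shown in this thread, so the names and arguments below are illustrative only.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
logger = logging.getLogger('grib_download')


def download_worker(grib_file, fetch):
    """Download one file; return (grib_file, error), with error None on success."""
    try:
        fetch(grib_file)
    except Exception as exc:  # keep one bad file from aborting the rest
        logger.warning('failed: %s (%s)', grib_file, exc)
        return grib_file, exc
    logger.info('done: %s', grib_file)
    return grib_file, None


def download_all(grib_files, fetch, max_workers=3):
    """Run the workers concurrently and return (succeeded, failed) file lists."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(download_worker, f, fetch) for f in grib_files]
        for future in as_completed(futures):
            grib_file, error = future.result()
            if error is None:
                succeeded.append(grib_file)
            else:
                failed.append(grib_file)
    return sorted(succeeded), sorted(failed)


def flaky_fetch(grib_file):
    """Hypothetical fetch function that fails for one file."""
    if 'bad' in grib_file:
        raise RuntimeError('simulated download error')


ok, bad = download_all(['a.grb', 'bad.grb', 'c.grb'], flaky_fetch)
print('succeeded:', ok, 'failed:', bad)
```

Returning the error instead of re-raising lets the caller decide whether to retry the failed files or stop.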

This comment was generated by AI. Information provided may be incorrect.

3 participants