GISAID ingest keeps running out of memory #217

Open
eharkins opened this issue Oct 1, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@eharkins
Contributor

eharkins commented Oct 1, 2021

These workflows keep running out of memory on AWS and getting killed, e.g. https://github.com/nextstrain/ncov-ingest/runs/3764464129?check_suite_focus=true.

This likely happens during the run of https://github.com/nextstrain/ncov-ingest/blob/master/bin/transform-gisaid, since it takes gisaid.ndjson (the raw GISAID full dataset, which is over 100 GB) as input and performs a series of operations on it.
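For context, the memory-friendly pattern here is line-by-line streaming. A minimal sketch of that pattern, not the actual transform-gisaid code (the transforms are elided):

```python
import json

def transform_records(path):
    """Stream an NDJSON file one record at a time.

    Peak memory stays near the size of a single record, even for a
    100+ GB input, because no list of records is ever materialized.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            # ... per-record transforms would go here ...
            yield record

# Usage: iterate lazily and write out as you go.
# for record in transform_records("gisaid.ndjson"):
#     print(json.dumps(record))
```

If the pipeline follows this shape end to end, memory use should be flat; any step that accumulates records (sorting, deduplication across the whole file, etc.) is a candidate for the growth we're seeing.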

To avoid continually increasing the resources we request for the batch job, here are some ideas:

@tsibley said:

We should also understand why the memory needs have increased even though the core of the ETL is streaming, and maybe also consider running these on the m5 instance family instead of c5 (which could be as small a change as adding m5 instances to the job queue used).
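For reference, instance families are configured on the compute environment behind the Batch job queue. A hedged boto3 sketch of an environment that allows both families; every name, ID, and size below is a placeholder, not the actual Nextstrain infrastructure:

```python
import boto3

batch = boto3.client("batch")

# Sketch only: all names/IDs are placeholders. m5 instances carry
# 4 GiB of memory per vCPU versus c5's 2 GiB, so allowing them gives
# the scheduler room to place memory-hungry jobs.
batch.create_compute_environment(
    computeEnvironmentName="ncov-ingest-c5-m5",
    type="MANAGED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 64,
        "instanceTypes": ["c5", "m5"],
        "subnets": ["subnet-..."],
        "securityGroupIds": ["sg-..."],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="AWSBatchServiceRole",
)
```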

@rneher said:

There are some lengthy compression steps that could happen in parallel with the rest of the pipeline. The gzip compression takes about 1 hour; changing the compression of the ndjson to xz -2 already saved a lot.
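A minimal sketch of the parallelization idea, assuming the downstream steps don't consume the compressed artifact (filenames are illustrative):

```python
import lzma
import shutil
import threading

def compress_xz2(src, dst):
    # Equivalent of `xz -2`: preset 2 trades compression ratio for speed.
    with open(src, "rb") as fin, lzma.open(dst, "wb", preset=2) as fout:
        shutil.copyfileobj(fin, fout)

# Kick compression off in the background so its ~1 h of CPU time
# overlaps with the rest of the pipeline instead of blocking it.
worker = threading.Thread(
    target=compress_xz2, args=("gisaid.ndjson", "gisaid.ndjson.xz")
)
worker.start()
# ... run pipeline steps that only need the uncompressed ndjson ...
worker.join()  # wait before uploading/removing the compressed file
```

In a Snakemake-style workflow the same effect could fall out of making compression its own rule that no other rule depends on, so the scheduler runs it concurrently.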

@eharkins added the "bug" label on Oct 1, 2021
@eharkins changed the title from "GISAID fetch-and-ingest keeps running out of memory" to "GISAID ingest keeps running out of memory" on Oct 1, 2021
@eharkins
Contributor Author

eharkins commented Oct 1, 2021

The last increase in the memory we request was earlier this week: 5db5d25. Maybe we should raise it again for now while we implement a more scalable solution?
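If we do bump it again, the knob is the memory on the Batch job definition. A hedged boto3 sketch; the job name, image, and numbers are placeholders, not what 5db5d25 actually changed:

```python
import boto3

batch = boto3.client("batch")

# Placeholder values throughout; the memory field is the point.
batch.register_job_definition(
    jobDefinitionName="ncov-ingest",
    type="container",
    containerProperties={
        "image": "nextstrain/ncov-ingest:latest",  # assumed image name
        "vcpus": 16,
        "memory": 64000,  # MiB; raise this while a streaming fix lands
    },
)
```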
