Normalize path values in configuration data within context of job workers #407

Open
robertbartel opened this issue Jul 28, 2023 · 5 comments
Labels: bug (Something isn't working), maas (MaaS Workstream)

Comments

@robertbartel
Contributor

Some configuration data applicable to various DMOD jobs involves paths. Examples include paths to other configuration files, BMI module shared libraries, data files/directories, etc.

Paths in configuration data need to be valid within the context of the job worker's file system. Because users are generally not expected to be aware of this file system structure, functionality must be added to DMOD so that it can normalize these path values.

The normalization must be done at some point between receiving the data from the user (via form, direct file upload, etc.) and the worker's use of that data, but the best place for it still needs to be determined. Two initial possibilities are:

  • within the InitialDataAdder class (or subclasses) and executed within the data service
  • within the worker (potentially within just one worker if a job has several) at the beginning of its execution

There are also implications to consider from the ngen-config dependency, in how it validates paths (in particular when applicable types are used within the data service, which won't always have the backing config data loaded).
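
To make the idea concrete, here is a minimal sketch of what such normalization might look like, wherever it ends up running. Everything in it is an assumption for illustration: the `WORKER_DATA_ROOT` mount point, the `normalize_path` and `normalize_config` helpers, and the convention of identifying path values by key name are all hypothetical and are not existing DMOD or ngen-config APIs.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Any

# Hypothetical location where dataset contents are mounted inside a worker.
WORKER_DATA_ROOT = PurePosixPath("/dmod/datasets")


def normalize_path(value: str, user_root: str,
                   worker_root: PurePosixPath = WORKER_DATA_ROOT) -> str:
    """Re-root ``value`` from the user's ``user_root`` onto ``worker_root``.

    Relative paths are treated as relative to ``user_root``. Absolute paths
    outside ``user_root`` are returned unchanged.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        try:
            path = path.relative_to(user_root)
        except ValueError:
            # Not under the user's root, so there is nothing safe to rewrite.
            return value
    return str(worker_root / path)


def normalize_config(config: Any, path_keys: set[str], user_root: str) -> Any:
    """Return a copy of ``config`` with string values under ``path_keys`` normalized."""
    if isinstance(config, dict):
        return {
            key: normalize_path(val, user_root)
            if key in path_keys and isinstance(val, str)
            else normalize_config(val, path_keys, user_root)
            for key, val in config.items()
        }
    if isinstance(config, list):
        return [normalize_config(item, path_keys, user_root) for item in config]
    return config


if __name__ == "__main__":
    # Illustrative config only; the key names are assumed for this example.
    user_config = {
        "library_file": "/home/user/proj/lib/libcfe.so",
        "modules": [{"init_config": "configs/cat-1.ini"}],
    }
    print(normalize_config(user_config, {"library_file", "init_config"},
                           "/home/user/proj"))
```

A pure string rewrite like this never touches the file system, so it could in principle run in the data service as well as in a worker. It does not address the ngen-config validation concern, since validated path types may still reject paths that do not exist where the validation runs.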

@aaraney
Member

aaraney commented Feb 22, 2024

@aaraney
Member

aaraney commented Apr 19, 2024

This is also now blocking this work:

@robertbartel
Contributor Author

@aaraney, can you clarify the blocking relationship with NOAA-OWP/ngen-cal#119; i.e.:

  • this issue strictly cannot be completed without ngen-cal-119
  • this issue is best addressed in a way that depends on the changes in ngen-cal-119, assuming ngen-cal-119 is in progress
  • something else

@aaraney
Member

aaraney commented Apr 30, 2024

This issue can be completed without NOAA-OWP/ngen-cal#119. However, if NOAA-OWP/ngen-cal#119 is introduced, changes to the DMOD source will be required, including to the feature discussed in this issue.

@robertbartel
Contributor Author

@aaraney, in thinking about #637, #654, #593, and similar issues, I'm considering whether we need to adapt/extend dataset formats and data orchestration behavior in certain ways that will be significant to this issue. We might need to discuss some of this further.

For one, we may need to store certain data that comes in the form of a large number of individual files as archives for performance reasons. Also, we may need a way for containerized workers to download data to local disk on the actual host node, perhaps to a simple Docker volume or something. Both likely have implications for how we set up paths in configs to find the data within the workers.

Archiving files before uploading to an object store (e.g., generated BMI init configs) is orders of magnitude faster, but means workers would need to extract things. Retrieving all data files at once, even without archiving, also appears to be orders of magnitude faster than retrieving files one at a time, at least for forcings CSV.
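
As a rough sketch of the archive-then-extract flow, assuming a gzip-compressed tar as the archive format: `archive_directory` and `extract_archive` are hypothetical helper names, not existing DMOD functions, and the object store upload/download step between them is left out entirely.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir: Path, archive_path: Path) -> Path:
    """Pack every file under ``src_dir`` into one gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for file in sorted(src_dir.rglob("*")):
            if file.is_file():
                # Store member names relative to src_dir so the archive can be
                # unpacked under any root inside the worker.
                tar.add(file, arcname=file.relative_to(src_dir).as_posix())
    return archive_path


def extract_archive(archive_path: Path, dest_dir: Path) -> Path:
    """Unpack the archive into ``dest_dir`` (e.g. a worker-local volume)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    return dest_dir
```

Because member names are stored relative to the source directory, the layout after extraction is determined by whatever destination the worker chooses, which is exactly the root that paths in the configs would need to be normalized against.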
