Normalize path values in configuration data within context of job workers #407
Comments
This is blocked until the following are merged:
This is also now blocking this work:
@aaraney, can you clarify the blocking relationship with NOAA-OWP/ngen-cal#119; i.e.:
This issue can be completed without NOAA-OWP/ngen-cal#119. However, if NOAA-OWP/ngen-cal#119 is introduced, changes will be required to the DMOD source, including to the feature discussed in this issue.
@aaraney, in thinking about #637, #654, #593, and similar issues, I'm considering whether we need to adapt or extend dataset formats and data orchestration behavior in ways that will be significant to this issue. We might need to discuss some of this further. For one, we may need to store certain data that comes as a large number of individual files in archives, for performance reasons. We may also need a way to truly download data to local disk (i.e., on the actual host node) for containerized workers, perhaps to a simple Docker volume or something similar. Both likely have implications for how we set up paths in configs so that workers can find the data. Archiving files before uploading to an object store (e.g., generated BMI init configs) is orders of magnitude faster, but it means workers would need to extract things. Retrieving all data files at once, even without archiving, also appears to be orders of magnitude faster than retrieving files one at a time, at least for forcings CSV.
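To make the archive idea concrete, here is a rough sketch of the round trip using only the Python standard library. The function names `archive_directory` and `extract_archive` are hypothetical and not part of DMOD, and the object store upload and download steps are left out entirely. The point is only that member names are stored relative to the dataset root, so the layout a worker sees after extraction matches the original directory.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir: str, archive_path: str) -> int:
    """Pack every file under ``src_dir`` into one gzip-compressed tar archive.

    Hypothetical helper, not DMOD API. Returns the number of files archived.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in sorted(src.rglob("*")):
            if item.is_file():
                # Store names relative to the dataset root so extraction
                # reproduces the same layout under any destination directory.
                tar.add(item, arcname=item.relative_to(src).as_posix())
                count += 1
    return count


def extract_archive(archive_path: str, dest_dir: str) -> None:
    """Unpack an archive made by ``archive_directory`` into ``dest_dir``."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data"
            # rejects members that would escape the destination directory.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

If something like this were adopted, path values in configs would need to refer to the extracted location inside the worker rather than to individual objects in the store.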
Some configuration data applicable to various DMOD jobs involves paths. Examples include paths to other configuration files, BMI module shared libraries, data files/directories, etc.
Paths in configuration data need to be valid within the context of the job worker's file system. Because users are generally not expected to be aware of this file system structure, functionality must be added to DMOD so that it can normalize these path values.
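As a minimal sketch of what such normalization could look like, assuming path values are interpreted relative to the root of the dataset they belong to and that the dataset is mounted at a known directory inside the worker: the names `normalize_path`, `normalize_config`, the key names, and the mount point `/dmod/datasets/config` are all hypothetical and only for illustration. This is not how DMOD currently does it.

```python
from pathlib import PurePosixPath
from typing import Any, Set


def normalize_path(value: str, worker_root: str) -> str:
    """Re-root a user-supplied path under ``worker_root``.

    ``worker_root`` is an assumed mount point inside the job worker.
    """
    path = PurePosixPath(value)
    # Drop the leading anchor ("/") so user-side absolute paths are re-rooted too.
    parts = path.parts[1:] if path.anchor else path.parts
    if ".." in parts:
        raise ValueError(f"path escapes the dataset root: {value!r}")
    return str(PurePosixPath(worker_root, *parts))


def normalize_config(config: Any, path_keys: Set[str], worker_root: str) -> Any:
    """Return a copy of ``config`` with string values under ``path_keys`` normalized."""
    if isinstance(config, dict):
        return {
            key: normalize_path(val, worker_root)
            if key in path_keys and isinstance(val, str)
            else normalize_config(val, path_keys, worker_root)
            for key, val in config.items()
        }
    if isinstance(config, list):
        return [normalize_config(item, path_keys, worker_root) for item in config]
    return config


user_config = {
    "forcing": {"path": "./forcings/cat-1.csv"},
    "modules": [{"name": "bmi_c", "init_config": "/configs/cat-1.ini"}],
}
worker_config = normalize_config(
    user_config, {"path", "init_config"}, "/dmod/datasets/config"
)
```

Which keys hold paths, and which mount point applies to each, would in practice depend on the config format and on how datasets are made available to the worker.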
The normalization must be done at some point between receiving the data from the user (via form, direct file upload, etc.) and its use within a job worker, but the best place for this still needs to be determined. Two initial possibilities are:
There are also implications to consider from the ngen-config dependency, in how it validates paths (in particular when applicable types are used within the data service, which won't always have the backing config data loaded).