
Implement CMR query to get collection-level metadata elements that can be used in UMM-G construction. #15

Open
lisakaser opened this issue Jul 3, 2024 · 13 comments

@lisakaser

lisakaser commented Jul 3, 2024

Pre-met and UMM-G files may require information already available from collection metadata. Determine whether there are any granule metadata building blocks in a dataset's collection-level metadata, and if so write a new story to implement a simple retrieval mechanism from CMR.

Information available from collection metadata:

  • GranuleSpatialRepresentation (may be set to GEODETIC, CARTESIAN, ORBIT)
  • Collection-level spatial coverage
  • Collection-level temporal coverage
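A minimal sketch of pulling these three items out of a UMM-C record, assuming the record has already been fetched and parsed into a plain dict (for example, the `umm` part of a CMR search result). The field names follow the UMM-C schema; the function name `collection_spatial_temporal` is hypothetical.

```python
from typing import Any


def collection_spatial_temporal(umm: dict[str, Any]) -> dict[str, Any]:
    """Return the collection-level items MetGenC might reuse in UMM-G.

    `umm` is assumed to be a parsed UMM-C record (a plain dict).
    Missing elements come back as None rather than raising.
    """
    spatial_extent = umm.get("SpatialExtent", {})
    geometry = spatial_extent.get("HorizontalSpatialDomain", {}).get("Geometry")
    return {
        # UMM-C allows GEODETIC, CARTESIAN, ORBIT, or NO_SPATIAL here
        "granule_spatial_representation": spatial_extent.get(
            "GranuleSpatialRepresentation"
        ),
        # Collection-level geometry (bounding rectangles, polygons, ...)
        "spatial": geometry,
        # List of temporal extents, each possibly holding RangeDateTimes
        "temporal": umm.get("TemporalExtents"),
    }
```

Returning `None` for missing elements lets the caller print whatever is present, which matches the "print but don't use yet" acceptance criterion below without failing on sparse records.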

Authentication: an Earthdata Login (EDL) account is needed. Consider whether Operators will enter their personal credentials.

Sample data: NSIDC-0081

Acceptance criteria for implementation:

When app is run:

  • UMM-C metadata is retrieved from CMR
  • GranuleSpatialRepresentation, spatial and temporal will be printed out but not used in this issue
  • Spatial representation in UMM-G metadata files reflects the spatial representation associated with the collection. "GEODETIC" will result in GPolygons representing the spatial coverage for a granule.
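A sketch of how the collection's representation could drive the granule geometry, assuming the granule's boundary is already available as (longitude, latitude) pairs. The GEODETIC to GPolygon mapping comes from the criterion above; mapping CARTESIAN to a single bounding rectangle is an assumption, and `granule_horizontal_domain` is a hypothetical name. The returned dict is meant to sit under the UMM-G `SpatialExtent` element.

```python
from typing import Any

# (longitude, latitude) in decimal degrees
Point = tuple[float, float]


def granule_horizontal_domain(representation: str, points: list[Point]) -> dict[str, Any]:
    """Build a UMM-G HorizontalSpatialDomain for one granule.

    GEODETIC -> one GPolygon; CARTESIAN -> one bounding rectangle (assumption).
    GEODETIC points are assumed to already be in counter-clockwise order;
    this sketch does not reorder them.
    """
    if representation == "GEODETIC":
        if len(points) < 3:
            raise ValueError("A GPolygon needs at least three points")
        ring = list(points)
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # close the ring: first point == last point
        geometry = {
            "GPolygons": [
                {"Boundary": {"Points": [{"Longitude": lon, "Latitude": lat} for lon, lat in ring]}}
            ]
        }
    elif representation == "CARTESIAN":
        if not points:
            raise ValueError("At least one point is needed for a bounding rectangle")
        lons = [lon for lon, _ in points]
        lats = [lat for _, lat in points]
        # Naive min/max: does not handle granules that cross the antimeridian
        geometry = {
            "BoundingRectangles": [
                {
                    "WestBoundingCoordinate": min(lons),
                    "NorthBoundingCoordinate": max(lats),
                    "EastBoundingCoordinate": max(lons),
                    "SouthBoundingCoordinate": min(lats),
                }
            ]
        }
    else:
        # ORBIT and anything else are out of scope for this sketch
        raise NotImplementedError(f"Unsupported representation: {representation}")
    return {"HorizontalSpatialDomain": {"Geometry": geometry}}
```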

Version padding: should not be needed for MetGenC. Ignore it for now, and don't retrieve the version number from CMR.

Separate issue: allow Ops to use the collection level metadata for the file level UMM-G

@lisakaser lisakaser added this to the Research phase milestone Jul 3, 2024
@juliacollins
Contributor

I didn't think through the use case of a SIPS provider when I suggested utilizing collection-level metadata. In that case, I believe there will be no collection-level metadata available when granule-metgen executes. I suggest working through the SIPS use case in more detail, along with other use cases (e.g. the Data Production Team generating UMM-G as part of the data production workflow, and data producers who provide us with only data files), to clarify where different bits of information will be needed.

My current feeling is that Ops users configuring the data ingest process may benefit most from information existing in collection-level metadata files. Any regexes we have to produce to scrape metadata from data input files (in the case where the data producer only delivers data files) would also be candidates for re-use in both the UMM-G generation and data ingest steps.

@lisakaser lisakaser changed the title Identify collection-level metadata elements that can be used in UMM-G construction. Implement CMR query to get collection-level metadata elements that can be used in UMM-G construction. Sep 9, 2024
@lisakaser
Author

lisakaser commented Oct 7, 2024

The SIPS use case does not need to be covered by MetGenC.

@lisakaser lisakaser added the enhancement New feature or request label Oct 7, 2024
@lisakaser lisakaser removed this from the DPT Minimum Viable Product milestone Oct 7, 2024
@lisakaser lisakaser added the question Further information is requested label Dec 4, 2024
@lisakaser lisakaser added this to the Dec-Jan-Feb milestone Dec 5, 2024
@lisakaser lisakaser added high priority and removed enhancement New feature or request question Further information is requested labels Dec 5, 2024
@lisakaser
Author

Ops would prefer that MetGenC use an application identity (credential) rather than each operator using their own. @eigenbeam will add more details.

@eigenbeam
Contributor

Conversation with Troy:

K: MetGenC question: we are going to add a feature where the tool will query CMR to get some collection-level metadata. To do so, the tool will need an EDL userid/pwd or token when it makes the query.
Would it make more sense for MetGenC to use an application credential or an operator credential?

T: Application. Would it be like OA, where we need to maintain a valid service-account/shared-account token in something like Vault, or can it dynamically request a token before each call to CMR?

K: Yeah, either way you prefer. If you want MetGenC to get the app userid/pwd from Vault (or wherever), we can then get or create a valid token. That might cost extra, though.

@eigenbeam
Contributor

So this translates into:

We could:

  1. Provide a way for the user to configure an EDL token that MetGenC will use. MetGenC wouldn't care which account the token belongs to, and it would be up to the user to renew the token.
  2. Provide a way for the user to configure an EDL userid & pwd. MetGenC would use these to retrieve one of the user's EDL tokens, and could also include code to renew the token if it has expired or will soon expire.
  3. In either case, we could also extend MetGenC to retrieve either (1) the EDL token or (2) the EDL userid/pwd from an external provider like Vault or AWS Secrets Manager.
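The renewal logic in option 2 could be sketched as "reuse an existing token unless it has expired or is about to, otherwise create one". In this sketch `list_tokens` and `create_token` are hypothetical callables standing in for whatever EDL token API calls MetGenC would make (not shown), `EdlToken` is a hypothetical record type, and the one-day safety margin is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class EdlToken:
    access_token: str
    expires_at: datetime  # timezone-aware, UTC


def get_or_renew_token(
    list_tokens: Callable[[], list[EdlToken]],
    create_token: Callable[[], EdlToken],
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(days=1),
) -> EdlToken:
    """Reuse an existing token unless it has expired or expires within `margin`."""
    if now is None:
        now = datetime.now(timezone.utc)
    usable = [t for t in list_tokens() if t.expires_at - margin > now]
    if usable:
        # Prefer the token that stays valid the longest
        return max(usable, key=lambda t: t.expires_at)
    # Nothing usable: ask EDL for a fresh token
    return create_token()
```

Injecting the two callables keeps the expiry decision testable without network access; the same function works whether the userid/pwd came from local config or from an external secret store (option 3).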

@eigenbeam
Contributor

There's no free lunch, obviously, so those options all boil down to:

  1. A user being able to provide EDL secrets, or
  2. A user being able to provide a secret that allows the app to go to some other service and get the EDL secrets
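Both options can share one lookup path if every secret source exposes the same small interface. A sketch under these assumptions: `SecretProvider`, `MappingProvider`, `resolve_edl_credential`, and the key names `EDL_TOKEN`, `EDL_USERNAME`, `EDL_PASSWORD` are all hypothetical, and a Vault or AWS Secrets Manager provider would be another class implementing the same `get` method.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Protocol, Sequence, Union


class SecretProvider(Protocol):
    def get(self, name: str) -> Optional[str]: ...


class MappingProvider:
    """Reads secrets from a mapping such as os.environ or a parsed config file."""

    def __init__(self, values: Mapping[str, str]) -> None:
        self._values = values

    def get(self, name: str) -> Optional[str]:
        return self._values.get(name)


@dataclass(frozen=True)
class TokenCredential:
    token: str


@dataclass(frozen=True)
class LoginCredential:
    username: str
    password: str


def resolve_edl_credential(
    providers: Sequence[SecretProvider],
) -> Union[TokenCredential, LoginCredential]:
    """Return the first EDL secret found, preferring a token over userid/pwd."""
    for provider in providers:
        token = provider.get("EDL_TOKEN")
        if token:
            return TokenCredential(token)
        username, password = provider.get("EDL_USERNAME"), provider.get("EDL_PASSWORD")
        if username and password:
            return LoginCredential(username, password)
    raise LookupError("No EDL token or userid/pwd found in any configured provider")
```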

@juliacollins
Contributor

@eigenbeam API docs are at https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html in case you don't have them on speed dial. But! You might be able to ignore those if earthaccess magic is usable!
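For reference, a direct collection search against the CMR search API (without earthaccess) could be sketched as below, passing the EDL token as a bearer token in the `Authorization` header and asking for UMM-JSON results. The helper name `collection_search_request` is hypothetical; the search roots and the `short_name`/`version` parameters should be double-checked against the API docs linked above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# CMR search roots for production and UAT
CMR_SEARCH_ROOTS = {
    "PROD": "https://cmr.earthdata.nasa.gov/search",
    "UAT": "https://cmr.uat.earthdata.nasa.gov/search",
}


def collection_search_request(
    short_name: str, version: str, edl_token: str, environment: str = "PROD"
) -> Request:
    """Build (but do not send) a UMM-JSON collection search request."""
    query = urlencode({"short_name": short_name, "version": version})
    url = f"{CMR_SEARCH_ROOTS[environment]}/collections.umm_json?{query}"
    # The EDL token is sent as a bearer token
    return Request(url, headers={"Authorization": f"Bearer {edl_token}"})
```

Sending the request with `urllib.request.urlopen` and parsing the body with `json.load` should yield an `items` list whose entries each carry a `meta` part and a `umm` part.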

Here's my conversation with Luis from August:

For example, if user123 has permission to read from UAT, then with the latest earthaccess version we should be able to:

```python
import earthaccess

earthaccess.login(system=earthaccess.UAT)
```

earthaccess will look for credentials in the environment, then in a .netrc file, and ultimately prompt for them interactively.
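The lookup order described above (environment first, then .netrc) can be sketched with the standard library as follows. This is an illustration of that order, not earthaccess's own code: `find_edl_login` is a hypothetical name, while `EARTHDATA_USERNAME`/`EARTHDATA_PASSWORD` are the environment variables earthaccess reads and `urs.earthdata.nasa.gov` is the production EDL host (UAT uses `uat.urs.earthdata.nasa.gov`).

```python
import netrc
from pathlib import Path
from typing import Mapping, Optional


def find_edl_login(
    env: Mapping[str, str],
    netrc_path: Path,
    host: str = "urs.earthdata.nasa.gov",
) -> Optional[tuple[str, str]]:
    """Return (username, password) from the environment, else from .netrc."""
    username = env.get("EARTHDATA_USERNAME")
    password = env.get("EARTHDATA_PASSWORD")
    if username and password:
        return username, password
    if netrc_path.exists():
        entry = netrc.netrc(str(netrc_path)).authenticators(host)
        if entry is not None:
            login, _account, netrc_password = entry
            if login and netrc_password:
                return login, netrc_password
    # Nothing found: this is where earthaccess would prompt interactively
    return None
```

Typical use would be `find_edl_login(os.environ, Path.home() / ".netrc")`.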

Getting metadata:

```python
# This returns a list of collection-level (UMM-C) records from CMR
umm_records = earthaccess.search_datasets(
    short_name="ATL06",
    version="006",
    cloud_hosted=True,
)
# Each record has a "meta" part and a "umm" part
umm_records[0]["meta"]
```

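earthaccess is updated frequently, so the above example might need refreshing. The collection metadata fields we want (e.g. SpatialExtent and TemporalExtents) are in the umm part of each record; the meta part holds CMR bookkeeping such as the concept-id and provider-id.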
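A small sketch of reading both parts of one record, assuming each record behaves like a plain dict with `meta` and `umm` keys (as CMR UMM-JSON items and earthaccess results do at the time of writing). `describe_record` is a hypothetical helper.

```python
from typing import Any, Mapping


def describe_record(record: Mapping[str, Any]) -> dict[str, Any]:
    """Summarize one CMR collection record that has "meta" and "umm" parts."""
    meta, umm = record["meta"], record["umm"]
    return {
        # CMR bookkeeping lives under "meta"
        "concept_id": meta.get("concept-id"),
        "provider_id": meta.get("provider-id"),
        # The UMM-C metadata itself lives under "umm"
        "short_name": umm.get("ShortName"),
        "version": umm.get("Version"),
        "granule_spatial_representation": umm.get("SpatialExtent", {}).get(
            "GranuleSpatialRepresentation"
        ),
    }
```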

@eigenbeam
Contributor

Do you know if it uses EDL tokens, or just a userid/pwd?

@juliacollins
Contributor

I don't know what options earthaccess supports -- in the cmr-mediator we use EDL tokens (or Launchpad tokens, but those are for writing metadata to CMR which we don't need to worry about here). You'll have to check with the earthaccess experts.

@eigenbeam
Copy link
Contributor

eigenbeam commented Dec 20, 2024

I looked through the code: it first authenticates with the userid/pwd, then asks EDL for an existing user token or creates a new one.

It doesn't seem that I can provide my own token to the library, though. I have to put the EDL userid/pwd somewhere it can read them, which I guess is 'okay-ish' but not great from a security point of view.

@juliacollins
Contributor

Notes from standup on 2/4/25: Don't worry about supporting padded datasets. Issue an error saying the dataset couldn't be found, perhaps with a note about checking version-number padding.
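A sketch of that behavior, assuming the search step returns a (possibly empty) list of collection records. `CollectionNotFoundError` and `first_collection_or_error` are hypothetical names, and the message wording is only a suggestion.

```python
from typing import Any, Sequence


class CollectionNotFoundError(LookupError):
    """Raised when CMR returns no collection for a short name and version."""


def first_collection_or_error(records: Sequence[Any], short_name: str, version: str) -> Any:
    """Return the first matching record, or fail with a hint about version padding."""
    if not records:
        raise CollectionNotFoundError(
            f"Collection {short_name} version {version} could not be found in CMR. "
            "Check the version number: some collections use zero-padded versions "
            "(for example '002' rather than '2')."
        )
    return records[0]
```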

@lisakaser
Author

Originally 3 story points; 1 point is left, so I am changing the value.
We expect more issues to be created to close this ticket out.

@lisakaser
Author

Changing story points from 1 to 0 in this sprint.
