Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a use-cases notebook and a top level function to manage calls to different classes. #2

Merged
merged 7 commits into from
Jul 26, 2022

Conversation

zoghbi-a
Copy link
Contributor

  • The use-cases notebook documents the different access cases discussed in Cloud access use cases #1.

  • Added a top level function get_data_product in fornax.py to handle calls to different DataHandler classes.

@zoghbi-a
Copy link
Contributor Author

@bsipocz,
In 6f9f4d7, I added a simple example of accessing Spitzer images on the bucket share with daskhub team. The example first calls the IPAC SIA server, and then adds the cloud_access column by hand.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a couple of comments, most of them are either long-term rules of thumb or nitpick, so I go ahead and merge the PR as is now.

I may also raise the idea of switching to markdown (myst markdown) for the notebooks, it's a bit easier to handle them in pull requests than when they are in the ipynb format.

column returned with the data product

Returns
-------
The same input dict with any standardization applied.
"""

# TODO; more rigorous checks
keys = list(info.keys())
assert('region' in keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long term we should remove the asserts from the code



class AWSDataHandler(DataHandler):
"""Class for managaing access to data in AWS"""

def __init__(self, product, profile=None, **kwargs):
def __init__(self, product, access_url_column='access_url', profile=None, requester_pays=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use keyword only for all optional parameters, it will make life down the line easier if/when we need to extend the API, etc.

Suggested change
def __init__(self, product, access_url_column='access_url', profile=None, requester_pays=False):
def __init__(self, product, *, access_url_column='access_url', profile=None, requester_pays=False):


try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a little bit long try/except, long term should be refactored.

info['data_region'] = data_region
info['data_access'] = data_access

except AWSDataHandlerException as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extreme nitpick, use longer than one letter variables, easier to grep on them, etc.

try:
header_info = s3_client.head_object(Bucket=bucket_name, Key=path)
accessible, msg = True, ''
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long term comment: we should be more specific what type of exceptions to expect

@bsipocz bsipocz merged commit 2951764 into nasa-fornax:main Jul 26, 2022
@zoghbi-a zoghbi-a deleted the top_level_function branch September 6, 2022 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants