-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a use-cases notebook and a top level function to manage calls to different classes. #2
Conversation
…any returns in process_data_info
…oved old simple-use-cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a couple of comments, most of them are either long-term rules of thumb or nitpick, so I go ahead and merge the PR as is now.
I may also raise the idea of switching to markdown (myst markdown) for the notebooks, it's a bit easier to handle them in pull requests than when they are in the ipynb format.
column returned with the data product | ||
|
||
Returns | ||
------- | ||
The same input dict with any standardization applied. | ||
""" | ||
|
||
# TODO; more rigorous checks | ||
keys = list(info.keys()) | ||
assert('region' in keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long term we should remove the asserts from the code
|
||
|
||
class AWSDataHandler(DataHandler): | ||
"""Class for managaing access to data in AWS""" | ||
|
||
def __init__(self, product, profile=None, **kwargs): | ||
def __init__(self, product, access_url_column='access_url', profile=None, requester_pays=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use keyword only for all optional parameters, it will make life down the line easier if/when we need to extend the API, etc.
def __init__(self, product, access_url_column='access_url', profile=None, requester_pays=False): | |
def __init__(self, product, *, access_url_column='access_url', profile=None, requester_pays=False): |
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a little bit long try/except, long term should be refactored.
info['data_region'] = data_region | ||
info['data_access'] = data_access | ||
|
||
except AWSDataHandlerException as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extreme nitpick, use longer than one letter variables, easier to grep on them, etc.
try: | ||
header_info = s3_client.head_object(Bucket=bucket_name, Key=path) | ||
accessible, msg = True, '' | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long term comment: we should be more specific what type of exceptions to expect
The use-cases notebook documents the different access cases discussed in Cloud access use cases #1.
Added a top level function
get_data_product
in fornax.py to handle calls to different DataHandler classes.