Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for user provided archive access url #75

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions tools/archives/pyvo_integration/astronomical_archives.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import functools
import json
import os
import re
import signal
import sys
import urllib
Expand Down Expand Up @@ -507,6 +508,20 @@ def _set_archive(self):
self._archives.append(
TapArchive(access_url=self._service_access_url))

elif self._archive_type == 'custom':
self._service_access_url = \
self._json_parameters['archive_selection']['access_url']

if Utils.is_valid_url(self._service_access_url):
self._archives.append(
TapArchive(access_url=self._service_access_url))
else:
error_message = "archive access url is not a valid url"
Logger.create_action_log(
Logger.ACTION_ERROR,
Logger.ACTION_TYPE_ARCHIVE_CONNECTION,
error_message)

else:
keyword = \
self._json_parameters['archive_selection']['keyword']
Expand Down Expand Up @@ -1305,6 +1320,17 @@ def collect_resource_keys(urls_data: list) -> list:
resource_keys.append(key)
return resource_keys

@staticmethod
def is_valid_url(url: str) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can use here a function from urllib, e.g. urlparse()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I saw some people using urlparse but apparently it's too permissive in some cases. The one I got is what is or was used to validate url in Django (and I removed local and ip addresses + other protocols).

I will investigate a bit more

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also https://github.com/python-validators/validators which you may consider to use here.
But if regex covers all cases, we can go with it, I think. However, the current regex seems to be a bit wrong, it allows for empty hostname due to | in the end of line 1328.
This could be covered by some kind of test. @bgruening is there an established practice of adding unit-tests (with e.g. pytest) to the tool's script, or it should be incorporated in planemo tests ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit test will not be run. Planemo test is the only tests that we run automatically. That is, we usually don't want to ship complex code with the wrappers. The usual way is that we depend on packages and more complex scripts should be packaged etc ...

However, we could run unit-tests here on CI for all .py scripts, if that helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also https://github.com/python-validators/validators which you may consider to use here. But if regex covers all cases, we can go with it, I think. However, the current regex seems to be a bit wrong, it allows for empty hostname due to | in the end of line 1328. This could be covered by some kind of test. @bgruening is there an established practice of adding unit-tests (with e.g. pytest) to the tool's script, or it should be incorporated in planemo tests ?

I saw that the validators package seemed to be widely used but it's not built in and you have to install it separately if I'm not mistaken.

Thanks for the find I will correct the regex and then we can see if we keep it or use another solution

regex_url = re.compile(
r'^(?:http)s?://'
r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+'
r'(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|)'
r'(?::\d+)?'
r'(?:/?|[/?]\S+)$', re.IGNORECASE)

return re.match(regex_url, url) is not None


class Logger:
_logs = []
Expand Down
4 changes: 4 additions & 0 deletions tools/archives/pyvo_integration/astronomical_archives.xml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
<param name="archive_type" type="select" label="Archive Selection">
<option value="archive">Query specific IVOA archive</option>
<option value="registry">Query all matching IVOA archives</option>
<option value="custom">Query custom TAP archive</option>
</param>
<when value="registry">
<param name="keyword" type="text" label="Keyword" />
Expand All @@ -47,6 +48,9 @@
<options from_data_table="astronomical_archives" />
</param>
</when>
<when value="custom">
<param name="access_url" type="text" label="TAP archive access url" />
</when>
</conditional>
<section name="query_section" title="Query selection" expanded="true">
<conditional name="query_selection">
Expand Down
Loading