-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update es with dataservices objects #48
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a good first iteration for dataservice search! 🚀
udata_search_service/container.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add a changelog entry :)
I think you could add the commands needed to make the dataservice search work, ie udata-search-service init-es
on udata-search-service and udata search index dataservice
on udata
for key, value in filters.items(): | ||
s = s.filter('term', **{key: value}) | ||
|
||
dataservices_score_functions = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can update the score functions factors depending on search results in production
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that was a big one, congratz!
Good job, a couple of nits that you man want to ignore ;)
"public_cible": | ||
[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is it the auto-formatter that splits that line on two lines?
index_resp = client.post(url_for('api.dataservice_index'), json=query) | ||
assert index_resp.status_code == 200 | ||
|
||
time.sleep(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment for a future change: we might want to do some polling or check if there's an api call or something to avoid sleeps in test code, it can quickly become unwieldy and slow tests a lot for no good reason, while at the same time making the test flaky (on very slow connections or computers).
dataservice = { | ||
"id": faker.md5(), | ||
"title": faker.sentence(), | ||
"description": faker.text(), | ||
"created_at": faker.past_datetime().isoformat(), | ||
"followers": faker.random_int(), | ||
'organization': { | ||
'id': faker.md5(), | ||
'name': faker.company(), | ||
'public_service': faker.random_int(min=0, max=1), | ||
'followers': faker.random_int() | ||
}, | ||
"owner": None, | ||
"tags": [], | ||
"is_restricted": False, | ||
"extras": | ||
{ | ||
"availability_url": "", | ||
"is_franceconnect": False, | ||
"public_cible": | ||
[] | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: would it make sense to define this "test dataservice" at the class level, or through a pytest fixture? It seems to be the exact same as the one in the previous test, and would avoid duplicating (while still allowing it to be modified if it's generated for each test in a fixture).
Or use the domain.factories.DataserviceFactory
?
@@ -339,6 +359,54 @@ def query_reuses(self, query_text: str, offset: int, page_size: int, filters: di | |||
res = [hit.to_dict(skip_empty=False) for hit in response.hits] | |||
return results_number, res | |||
|
|||
def query_dataservices(self, query_text: str, offset: int, page_size: int, filters: dict, sort: Optional[str] = None) -> Tuple[int, List[dict]]: | |||
s = SearchableDataservice.search() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could we possible not use single letter variables? Maybe something like search
?
choices = sorts + ['-' + k for k in sorts] | ||
if value not in choices: | ||
raise ValueError('Sort parameter is not in the sorts available choices.') | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is this for future sorts? It feels a bit overkill for the moment 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand, we're just validating the sorts
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorts are just one element, created
, right? So this whole code could be shortened to
sorts = ["created", "-created"]
if value not in sorts:
...
No big deal, just looked a bit funny ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it's the same logic in other models, Datasets having more sorts for example
|
||
next_url = url_for('api.dataservice_search', q=request_args.q, page=request_args.page + 1, | ||
page_size=request_args.page_size, _external=True) | ||
prev_url = url_for('api.dataservice_search', q=request_args.q, page=request_args.page - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should this be clamped to make sure we never get a page=0
or worse, a negative one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think edge cases are dealth in make_response
|
||
results, results_number, total_pages = dataservice_service.search(request_args.dict()) | ||
|
||
next_url = url_for('api.dataservice_search', q=request_args.q, page=request_args.page + 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should this be checked so we return None
when request_args.page == total_pages
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think edge cases are dealth in make_response
I think we don't want to refactor the code too much since we're targeting a major refacto of this. |
Completely agree! 🚢 |
Issue datagouv/data.gouv.fr#1405