Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update es with dataservices objects #48

Merged
merged 12 commits into from
Nov 5, 2024
Merged

Conversation

geoffreyaldebert
Copy link
Contributor

@geoffreyaldebert geoffreyaldebert commented Oct 17, 2024

Copy link
Contributor

@maudetes maudetes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a good first iteration for dataservice search! 🚀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to add a changelog entry :)
I think you could add the commands needed to make the dataservice search work, ie udata-search-service init-es on udata-search-service and udata search index dataservice on udata

for key, value in filters.items():
s = s.filter('term', **{key: value})

dataservices_score_functions = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can update the score functions factors depending on search results in production

Copy link

@magopian magopian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that was a big one, congratz!
Good job, a couple of nits that you man want to ignore ;)

Comment on lines +371 to +372
"public_cible":
[]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is it the auto-formatter that splits that line on two lines?

index_resp = client.post(url_for('api.dataservice_index'), json=query)
assert index_resp.status_code == 200

time.sleep(2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for a future change: we might want to do some polling or check if there's an api call or something to avoid sleeps in test code, it can quickly become unwieldy and slow tests a lot for no good reason, while at the same time making the test flaky (on very slow connections or computers).

tests/test_api.py Show resolved Hide resolved
tests/test_api.py Outdated Show resolved Hide resolved
Comment on lines +426 to +448
dataservice = {
"id": faker.md5(),
"title": faker.sentence(),
"description": faker.text(),
"created_at": faker.past_datetime().isoformat(),
"followers": faker.random_int(),
'organization': {
'id': faker.md5(),
'name': faker.company(),
'public_service': faker.random_int(min=0, max=1),
'followers': faker.random_int()
},
"owner": None,
"tags": [],
"is_restricted": False,
"extras":
{
"availability_url": "",
"is_franceconnect": False,
"public_cible":
[]
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would it make sense to define this "test dataservice" at the class level, or through a pytest fixture? It seems to be the exact same as the one in the previous test, and would avoid duplicating (while still allowing it to be modified if it's generated for each test in a fixture).

Or use the domain.factories.DataserviceFactory?

udata_search_service/domain/factories.py Show resolved Hide resolved
@@ -339,6 +359,54 @@ def query_reuses(self, query_text: str, offset: int, page_size: int, filters: di
res = [hit.to_dict(skip_empty=False) for hit in response.hits]
return results_number, res

def query_dataservices(self, query_text: str, offset: int, page_size: int, filters: dict, sort: Optional[str] = None) -> Tuple[int, List[dict]]:
s = SearchableDataservice.search()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could we possible not use single letter variables? Maybe something like search?

choices = sorts + ['-' + k for k in sorts]
if value not in choices:
raise ValueError('Sort parameter is not in the sorts available choices.')
return value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is this for future sorts? It feels a bit overkill for the moment 🤣

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand, we're just validating the sorts here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorts are just one element, created, right? So this whole code could be shortened to

sorts = ["created", "-created"]
if value not in sorts:
...

No big deal, just looked a bit funny ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it's the same logic in other models, Datasets having more sorts for example


next_url = url_for('api.dataservice_search', q=request_args.q, page=request_args.page + 1,
page_size=request_args.page_size, _external=True)
prev_url = url_for('api.dataservice_search', q=request_args.q, page=request_args.page - 1,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should this be clamped to make sure we never get a page=0 or worse, a negative one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think edge cases are dealth in make_response


results, results_number, total_pages = dataservice_service.search(request_args.dict())

next_url = url_for('api.dataservice_search', q=request_args.q, page=request_args.page + 1,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should this be checked so we return None when request_args.page == total_pages?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think edge cases are dealth in make_response

@maudetes
Copy link
Contributor

maudetes commented Nov 4, 2024

I think we don't want to refactor the code too much since we're targeting a major refacto of this.
Thus I would not modify styling or tests for now if it's okay with you.
See opendatateam/udata#3109

@magopian
Copy link

magopian commented Nov 4, 2024

Completely agree! 🚢

@maudetes maudetes merged commit 34ea069 into main Nov 5, 2024
1 check passed
@maudetes maudetes deleted the feat/dataservices-es branch November 5, 2024 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants