Tests sometimes fail/timeout in github actions #210

Closed
dsavchenko opened this issue Nov 13, 2024 · 8 comments · Fixed by #216

@dsavchenko
Member

A flaky issue, but an annoying one. It has started to occur very often lately.

Example

Any ideas on how to debug it?

@volodymyrss
Member

First, I propose enabling logging by default, so we can see what fails. It looks like pytest does not dump the log on a timeout.
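
For reference, enabling pytest's live logging (and an overall per-test timeout, if the pytest-timeout plugin were added) could look roughly like this in pytest.ini; the 600 s value is only an illustrative choice, not a project setting:

```ini
# sketch: stream log records to the console while tests run,
# and fail a hung test with a traceback instead of letting the
# whole CI job be cancelled ("timeout" needs the pytest-timeout plugin)
[pytest]
log_cli = true
log_cli_level = INFO
timeout = 600
```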

@dsavchenko
Member Author

One of the errors that were dumped was this one

@dsavchenko
Member Author

I managed to reproduce one of the failures locally: test_service_async_repo fails, not consistently, but sometimes.

This is not the only test I have seen failing in actions, but I think it is the one that the run hangs on (or after?) most often.
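
For reference, one way to chase an intermittent failure like this locally is to rerun just the suspect test in a loop until it trips; this is only a sketch, and the 50-run budget and selector are arbitrary choices, not project conventions:

```python
# Rerun only the flaky test in a fresh process each time until it fails,
# so the failure (and its logs) can be inspected locally.
import subprocess
import sys

for attempt in range(1, 51):
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "-x", "-k", "test_service_async_repo",
         "-o", "log_cli=true", "-o", "log_cli_level=INFO"],
    )
    if proc.returncode != 0:
        print(f"failure reproduced on attempt {attempt}")
        break
else:
    print("no failure in 50 runs")
```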

@volodymyrss
Member

Timeout seems to happen here, at least sometimes:

2024-11-13T12:20:14.5346885Z 2024-11-13 12:20:14,534 - papermill - INFO - Ending Cell 11-----------------------------------------
2024-11-13T12:20:14.5401029Z 2024-11-13 12:20:14,539 - papermill - INFO - Executing Cell 12--------------------------------------
2024-11-13T12:20:14.5828225Z 2024-11-13 12:20:14,582 - papermill - INFO - glueing file spec.png
2024-11-13T12:20:14.5830444Z 
2024-11-13T12:20:14.5961585Z 2024-11-13 12:20:14,595 - papermill - INFO - glueing file energies.fits
2024-11-13T12:20:14.5963341Z 
2024-11-13T12:20:14.6061938Z 2024-11-13 12:20:14,605 - papermill - INFO - Ending Cell 12-----------------------------------------
2024-11-13T12:20:15.0683717Z 2024-11-13 12:20:15,067 - nb2workflow.service - INFO - exceptions: []
2024-11-13T12:20:15.0981749Z 2024-11-13 12:20:15,097 - nb2workflow.service - INFO - completed, output length 6
2024-11-13T12:20:15.0991097Z 2024-11-13 12:20:15,098 - nb2workflow.service - INFO - updating key d8c70ed8368b4ef2686bb67fa9d8e5c97799b2de2f09dcb88338ef46
2024-11-13T12:20:15.0995006Z 2024-11-13 12:20:15,099 - nb2workflow.service - INFO - will perform callback: file://callback.json
2024-11-13T12:20:15.0999129Z 2024-11-13 12:20:15,099 - nb2workflow.service - INFO - stored callback in a file file://callback.json
2024-11-13T18:09:44.1238716Z ##[error]The operation was canceled.
2024-11-13T18:09:44.1315815Z Post job cleanup.
2024-11-13T18:09:44.2222897Z [command]/usr/bin/git version
2024-11-13T18:09:44.2262497Z git version 2.47.0
2024-11-13T18:09:44.2306116Z Temporarily overriding HOME='/home/runner/work/_temp/38a41250-7d7e-417e-b3b0-b8e72c9e1aa6' before making global git config changes
2024-11-13T18:09:44.2307283Z Adding repository directory to the temporary g

@volodymyrss
Member

> I managed to reproduce one of the failures locally: test_service_async_repo fails, not consistently, but sometimes.
>
> This is not the only test I have seen failing in actions, but I think it is the one that the run hangs on (or after?) most often.

I see this too, and it happens in a variety of ways. But the cause I see is always an anomaly during the SPARQL query: a dict changing size during iteration, or something similar.

Dumping the graph to Turtle and querying it directly does not seem to show any problems. I suspect some issue in pyparsing/rdflib when dealing with the in-memory object.

Is this what you see too?
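
As an aside, a minimal way to stress a shared in-memory graph from several threads, to see whether this kind of anomaly can be triggered outside the service, might look like this (the data and query are made up for illustration, not taken from the project):

```python
# Sketch: hammer one shared rdflib Graph with SPARQL queries from several
# threads to try to provoke the intermittent "dict changed during iteration"
# style errors seen in CI.
from concurrent.futures import ThreadPoolExecutor

import rdflib

EX = rdflib.Namespace("http://example.org/")

TTL = """
@prefix ex: <http://example.org/> .
ex:a ex:p ex:b .
ex:b ex:p ex:c .
"""

graph = rdflib.Graph()
graph.parse(data=TTL, format="turtle")

def query_once(_):
    # each call parses and evaluates the SPARQL query against the shared graph
    rows = graph.query(
        "SELECT ?s ?o WHERE { ?s ex:p ?o }",
        initNs={"ex": EX},
    )
    return len(list(rows))

with ThreadPoolExecutor(max_workers=8) as pool:
    counts = list(pool.map(query_once, range(200)))

# any raised exception or an inconsistent row count hints at a thread-safety problem
print(set(counts))
```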

@volodymyrss
Member

If this is, as it seems, an issue in rdflib and pyparsing, it has a broader impact than just our project.
We also use rdflib in ESG and solidipes.

@dsavchenko
Member Author

pyparsing, and hence rdflib, is not thread-safe; this is a known fact. I created a related issue for oda_hub some time ago.
Meanwhile, I tried to mitigate it by using the ontology helper only where it should be safe, but I don't remember all the details.

Maybe the async test doesn't completely mimic how it works in the real world?
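
For context, one commonly suggested workaround for a non-thread-safe library is to serialize all access behind a single lock; a minimal sketch (not the project's actual ontology-helper code) could look like:

```python
# Sketch of a possible mitigation: wrap every SPARQL query on a shared graph
# in a module-level lock, so pyparsing/rdflib internals are never touched from
# two threads at once.
import threading

import rdflib

_graph_lock = threading.Lock()
_graph = rdflib.Graph()

def safe_query(sparql, **kwargs):
    # hold the lock for the whole parse + evaluation of the query
    with _graph_lock:
        return list(_graph.query(sparql, **kwargs))
```

This of course serializes the queries, so it trades throughput for safety.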

@volodymyrss
Member

> pyparsing, and hence rdflib, is not thread-safe; this is a known fact. I created a related issue for oda_hub some time ago. Meanwhile, I tried to mitigate it by using the ontology helper only where it should be safe, but I don't remember all the details.
>
> Maybe the async test doesn't completely mimic how it works in the real world?

Ah ok. In what way would it not?
