Unexpected termination of scrapy-playwright jobs; caused by node:events:496 #316
Comments
Please provide a minimal reproducible example. Also, additional information about your environment (e.g. …)
Hi @elacuesta, thank you for getting back to me. Regarding the environment, I'm running Ubuntu 22.04.1 LTS on the VPS. Unfortunately, I don't know how to provide an MRE, given that the issue is intermittent and I can't pinpoint the cause. I can provide the spider (which is pretty standard: …). So far there isn't a specific page on which this seems to happen; to reproduce, I just let it run until the process unexpectedly dies (anywhere from minutes to hours). BTW, I mentioned the event loop because it seems to be the cause of the problem for that particular linked issue; of course, it may not be related at all to this issue.
For some time I've suspected the request & response loggers might be causing some trouble by keeping references to the requests and responses longer than strictly necessary. This is by no means a tested hypothesis, but at this point I don't have any other explanation about this specific subject. Would you be able to try the code from the disable-request-response-logger branch (5de5a52) and set …
Thank you @elacuesta. I've executed three jobs so far with the branch you linked; a quick summary of my findings: …

For more details: …
I've been debugging this problem for a while; it's intermittent, which makes it harder to reproduce.

When running some jobs with `scrapy-playwright`, the jobs get abruptly terminated. If you observe the log of a job, it doesn't even acknowledge the termination, as it would in a SIGTERM case. The process apparently gets killed.

As an example, for a simple spider (with `scrapy-playwright`) scraping webstaurant.com, here is how the log terminates (literally the last 3 lines): …

I first noticed the problem when running the jobs with `scrapyd`; here is what scrapyd logs when the problem happens: … This is just extra data; the problem is unrelated to `scrapyd`, since it's reproducible without it.

In all occurrences, the error that seems to be the cause is a node error: …
Setting `PLAYWRIGHT_MAX_PAGES_PER_CONTEXT` and `PLAYWRIGHT_MAX_CONTEXTS` all the way down to 1 had no effect.

Finally, I found two issues in
`playwright-python` that bear some resemblance; the first one logs the same exception and appears to be caused by the handling of the event loop.
microsoft/playwright-python#2275
microsoft/playwright-python#2454
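For reference, the concurrency experiment mentioned above corresponds to a Scrapy settings fragment like the following. The setting names are real scrapy-playwright settings; the surrounding project context is assumed.

```python
# Scrapy settings fragment restricting scrapy-playwright concurrency,
# mirroring the experiment described above (both limits lowered to 1).
custom_settings = {
    "PLAYWRIGHT_MAX_CONTEXTS": 1,           # at most one browser context
    "PLAYWRIGHT_MAX_PAGES_PER_CONTEXT": 1,  # at most one page in that context
}
```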