
My scrapy program cannot keep pulling data from redis queue #148

Open
Cehae opened this issue Jun 25, 2019 · 1 comment
Cehae commented Jun 25, 2019

When I start the scrapy program, it runs normally. But after tens of minutes my program log kept printing:

2019-06-25 17:42:12 [scrapy.extensions.logstats] INFO: Crawled 64 pages (at 0 pages/min), scraped 64 items (at 0 items/min)
2019-06-25 17:43:12 [scrapy.extensions.logstats] INFO: Crawled 64 pages (at 0 pages/min), scraped 64 items (at 0 items/min)
2019-06-25 17:44:12 [scrapy.extensions.logstats] INFO: Crawled 64 pages (at 0 pages/min), scraped 64 items (at 0 items/min)
2019-06-25 17:45:12 [scrapy.extensions.logstats] INFO: Crawled 64 pages (at 0 pages/min), scraped 64 items (at 0 items/min)

But at this point my redis queue still has data, so all I can do is close the program and restart it. What should I do to solve this problem? Thank you!
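One way to at least catch this condition automatically, instead of watching the log by hand: feed the cumulative "Crawled N pages" count from the logstats lines into a small stall detector, and restart the spider when the count stops moving. This is only a sketch of the idea (the `StallWatchdog` class is hypothetical, not part of Scrapy or scrapy-redis); you would have to wire it into a Scrapy extension or an external monitor yourself.

```python
class StallWatchdog:
    """Hypothetical stall detector: call update() with the cumulative
    crawled-page count on each logstats interval (once per minute).
    It reports a stall once the count has stayed unchanged for
    `patience` consecutive checks."""

    def __init__(self, patience=5):
        self.patience = patience      # how many static checks before flagging
        self.last_count = None        # last crawled-page count seen
        self.static_checks = 0        # consecutive checks with no progress

    def update(self, crawled_pages):
        """Return True when the crawl looks stalled."""
        if crawled_pages == self.last_count:
            self.static_checks += 1
        else:
            self.static_checks = 0
            self.last_count = crawled_pages
        return self.static_checks >= self.patience
```

For example, with `patience=5` the log excerpt above (64 pages repeated minute after minute) would trip the watchdog after five unchanged readings, at which point your supervisor script could restart the crawler process.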


hewm commented Aug 16, 2019

I have the same problem: the program runs normally at first, but after dozens of minutes my log kept printing the same idle lines.
I found that Scrapy went through the downloader middleware and obtained a proxy, but no parse callback was ever invoked. In other words, Scrapy may be getting an empty URL, or not getting a URL at all, and it stalls right after acquiring a proxy; I don't know what to do. I added a log line at the top of every parse callback to print the URL of the current request, and that log never appeared: every request disappears after DOWNLOADER_MIDDLEWARES without any error being reported. My log level is INFO.

2019-08-16 14:54:57 [scrapy.extensions.logstats] INFO: Crawled 29795 pages (at 0 pages/min), scraped 28644 items (at 0 items/min)
2019-08-16 14:55:57 [scrapy.extensions.logstats] INFO: Crawled 29795 pages (at 0 pages/min), scraped 28644 items (at 0 items/min)
2019-08-16 14:56:23 [dgk_update_detail] INFO: [proxy] https://3.113.251.65:3128
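To narrow down where the requests die, it may help to log every request as it enters the downloader chain, not just in the parse callbacks. Below is a minimal sketch of such a tracing middleware (the class name `RequestTraceMiddleware` is my own; you would register it in `DOWNLOADER_MIDDLEWARES` with a priority after your proxy middleware, so it shows whether requests survive that step):

```python
import logging

logger = logging.getLogger(__name__)


class RequestTraceMiddleware:
    """Hypothetical downloader middleware that logs every request URL
    passing through it. If these log lines stop appearing while the
    proxy middleware still logs, the requests are being dropped
    somewhere between the two."""

    def __init__(self):
        self.seen = 0  # total requests observed

    def process_request(self, request, spider):
        self.seen += 1
        logger.info("[trace] request #%d: %s", self.seen, request.url)
        return None  # returning None lets the request continue down the chain
```

Comparing the `[trace]` lines against the `[proxy]` lines in the log above would show whether a request that received a proxy ever actually reached the downloader again.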
