First Check
I used the GitHub search to find a similar issue and didn't find it.
I have verified that this issue is not related to the underlying library
hhursev/recipe-scrapers by 1) checking the debugger and confirming that data is returned,
2) verifying that there are errors in the log related to application-level code, or
3) verifying that the site provides recipe data, or is otherwise supported by
hhursev/recipe-scrapers
This issue can be replicated on the demo site (https://demo.mealie.io/)
Please provide 1-5 example URLs that are having errors
https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/
Please provide your logs for the Mealie container
docker logs <container-id> > mealie.logs
theloopywhisk.com was added as a supported site in hhursev/recipe-scrapers#1220
INFO 2024-11-18T15:47:33 - HTTP Request: GET https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/ "HTTP/1.1 403 Forbidden"
INFO 2024-11-18T15:47:34 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO 2024-11-18T15:47:34 - [xx.xx.xx.xx:0] 400 Bad Request "POST /api/recipes/create/url HTTP/1.1"
INFO 2024-11-18T15:47:58 - [127.0.0.1:43332] 200 OK "GET /api/app/about HTTP/1.1"
It seems like the site is checking the User-Agent header and blocking requests based on it.
Running curl https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/ returns "Sorry, you have been blocked", but curl --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15" https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/ returns the complete recipe.
The site seems to be behind Cloudflare, so it is most likely rejecting requests from specific user agents (the log shows a 403 Forbidden response rather than a dropped connection).
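For reference, the curl comparison translates to roughly the following in Python. This is only a sketch using the standard library: `build_request`, `fetch_html`, and `BROWSER_UA` are names made up for illustration, and this is not how Mealie or recipe-scrapers actually fetch pages. Whether a browser-like User-Agent keeps working depends on the site's Cloudflare rules.

```python
import urllib.request

# Assumption: the Safari string from the curl test; any mainstream
# browser User-Agent may behave the same, but that is untested here.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/14.1.1 Safari/605.1.15"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = BROWSER_UA, timeout: float = 10.0) -> str:
    """Fetch a page with the given User-Agent and return the decoded body.

    urlopen raises urllib.error.HTTPError on a 403, which is what the
    site returned for the default client in the logs above.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/")` should then correspond to the second curl command, while passing a different `user_agent` makes it easy to compare responses.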
Should support for using a non-standard user-agent for scraping be requested here, or in the upstream scraping library?
Deployment
Docker (Linux)