[SCRAPER] - Scraping recipe from theloopywhisk.com fails #4575

janchrillesen · 2024-11-18T15:08:50Z

First Check

I used the GitHub search to find a similar issue and didn't find it.
I have verified that this issue is not related to the underlying library hhursev/recipe-scrapers by 1) checking the debugger and confirming that data is returned, 2) verifying that there are errors in the log related to application-level code, or 3) verifying that the site provides recipe data, or is otherwise supported by hhursev/recipe-scrapers
This issue can be replicated on the demo site (https://demo.mealie.io/)

Please provide 1-5 example URLs that are having errors

https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/

Please provide your logs for the Mealie container `docker logs <container-id> > mealie.logs`

theloopywhisk.com was added as a supported site in hhursev/recipe-scrapers#1220

INFO 2024-11-18T15:47:33 - HTTP Request: GET https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/ "HTTP/1.1 403 Forbidden"
INFO 2024-11-18T15:47:34 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO 2024-11-18T15:47:34 - [xx.xx.xx.xx:0] 400 Bad Request "POST /api/recipes/create/url HTTP/1.1"
INFO 2024-11-18T15:47:58 - [127.0.0.1:43332] 200 OK "GET /api/app/about HTTP/1.1"

It seems the site checks the User-Agent header and blocks requests based on it.

Running `curl https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/` returns "Sorry, you have been blocked", but `curl --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15" https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/` returns the complete recipe.
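For comparison, the working curl command can be reproduced in Python, the language Mealie and recipe-scrapers are written in. This is a minimal standard-library sketch, not Mealie's actual scraping code; the helper names `build_request` and `fetch_html` are hypothetical, and the User-Agent string is the Safari one from the curl command.

```python
import urllib.request

# Safari User-Agent string copied from the working curl command in this issue.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/14.1.1 Safari/605.1.15"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Hypothetical helper: a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = BROWSER_UA, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and decode it as text.

    urlopen raises urllib.error.HTTPError for a 403, so a blocked
    request surfaces as an exception rather than as block-page HTML.
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://theloopywhisk.com/2024/06/29/gluten-free-burger-buns/")` would then correspond to the second curl command. The HTML could be handed to recipe-scrapers instead of letting it download the page itself (recent versions appear to expose `scrape_html(html, org_url=url)`; check the installed version). Whether Cloudflare accepts the request may also depend on signals other than this header, so a matching User-Agent alone is not guaranteed to get through.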

The site seems to be behind Cloudflare, so it most likely rejects requests from specific user agents.
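Because the blocked response combines a 403 status (as in the Mealie log) with the text "Sorry, you have been blocked" (as seen with plain curl), a scraper could recognise this case and report it explicitly instead of falling through to a generic 400. A minimal heuristic sketch: the function name and marker list are hypothetical and based only on the responses described in this issue, not on documented Cloudflare behaviour.

```python
# Marker text taken from the block page returned to plain curl in this issue.
BLOCK_MARKERS = ("Sorry, you have been blocked",)


def looks_like_bot_block(status: int, body: str) -> bool:
    """Hypothetical heuristic: True if a response resembles the block page seen here.

    Requires a 403 status plus a known marker in the body, so an
    ordinary 403 without the marker is not classified as a bot block.
    """
    if status != 403:
        return False
    return any(marker in body for marker in BLOCK_MARKERS)
```

Requiring both signals keeps the check narrow; a site that serves a different block page would simply not be detected rather than misreported.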

Should support for scraping with a custom User-Agent be requested here, or in the upstream scraping library?

Deployment

Docker (Linux)


janchrillesen added the bug, scraper, and triage labels on Nov 18, 2024

outlying mentioned this issue on Nov 23, 2024 in:

[SCRAPER] - (some?) www.kwestiasmaku.com recipes are not working with Mealie #4600 (open)
