Scrape tv.yandex.ru using Puppeteer. #2851
base: master
Signed-off-by: Toha <[email protected]>
freearhey left a comment:
I am getting an error message:
npm run grab --- --site=tv.yandex.ru
> grab
> tsx scripts/commands/epg/grab.ts --site=tv.yandex.ru
◐ starting... 5:54:37 PM
ℹ config: 5:54:37 PM
output: guide.xml
maxConnections: 1
gzip: false
curl: false
site: tv.yandex.ru
ℹ loading channels... 5:54:37 PM
ℹ found 91 channel(s) 5:54:37 PM
ℹ run: 5:54:37 PM
ℹ [1/182] tv.yandex.ru (ru) - 22 - Aug 31, 2025 (0 programs) 5:54:43 PM
ℹ ERR: Failed to launch the browser process! 5:54:43 PM
Reason: image not found.
It seems Puppeteer can't find the Chrome binary. On non-Windows systems, Puppeteer v24 (which uses Chrome 139) requires some dependencies, such as …
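"Failed to launch the browser process" usually means Puppeteer found no usable Chrome binary. A minimal sketch, assuming the grabber optionally reads a path from the PUPPETEER_EXECUTABLE_PATH environment variable (which Puppeteer itself also honours) and checks that the file exists before launching, so a missing binary yields a readable error. `resolveChromePath` is a hypothetical helper, not code from this PR.

```typescript
import { existsSync } from "node:fs";

// Hypothetical helper: returns an explicitly configured Chrome path if it
// exists, or undefined so Puppeteer falls back to its own downloaded browser.
// Throws early with a clear message if the configured path is wrong.
function resolveChromePath(
  env: Record<string, string | undefined> = process.env
): string | undefined {
  const configured = env.PUPPETEER_EXECUTABLE_PATH?.trim();
  if (!configured) return undefined; // let Puppeteer use its managed Chrome
  if (!existsSync(configured)) {
    throw new Error(
      `PUPPETEER_EXECUTABLE_PATH points to "${configured}", but no file exists there`
    );
  }
  return configured;
}

// Usage (sketch):
// const browser = await puppeteer.launch({ headless: true, executablePath: resolveChromePath() });
```

Failing before `launch()` separates "wrong path" from "Chrome is there but cannot start", which is the distinction the error above leaves unclear.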
Resolves conflict
Changed line endings to CRLF
I still couldn't run the script, even after specifying the path to Chrome via executablePath. The process just freezes. @BellezaEmporium @PopeyeTheSai10r @CasperMcFadden95 Am I the only one experiencing this?
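When a step hangs instead of failing, the whole grab stalls with no error. Puppeteer's `launch()` has its own `timeout` option for browser start-up, but a generic guard around any awaited step (launch, navigation, content extraction) turns a freeze into an error that can be logged. A minimal sketch; `withTimeout` is a hypothetical helper, not part of the PR or of Puppeteer.

```typescript
// Hypothetical helper: rejects if `task` does not settle within `ms`
// milliseconds, so a hung step surfaces as an error instead of a silent freeze.
function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared so Node can exit.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Usage (sketch):
// const browser = await withTimeout(puppeteer.launch({ headless: true }), 60_000, "browser launch");
```

Note that the timeout only abandons the wait; the underlying task keeps running, so a real grabber would still need to close the browser afterwards.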
I'm going to check what's going on, and whether we can actually switch that out.
Puppeteer works for me, but maybe make it headless to use less memory and potentially save some time.
EDIT: it was already headless, my bad. The runtime is considerable (about 6 to 7 seconds per channel, per date).
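To put 6 to 7 seconds per channel per date into perspective: the first log reports 91 channels and 182 jobs ("[1/182]"), apparently 2 dates per channel, with maxConnections: 1. A back-of-the-envelope sketch; `estimateGrabSeconds` is hypothetical, and it assumes a fixed cost per request, perfect parallelism across connections, and no browser start-up overhead.

```typescript
// Hypothetical estimator: wall-clock seconds for a grab, assuming each
// channel/date pair costs a fixed number of seconds and up to
// `maxConnections` pairs run in parallel with no extra overhead.
function estimateGrabSeconds(
  channels: number,
  days: number,
  secondsPerRequest: number,
  maxConnections = 1
): number {
  const requests = channels * days; // one request per channel per date
  const rounds = Math.ceil(requests / maxConnections);
  return rounds * secondsPerRequest;
}

// 91 channels x 2 dates = 182 requests, matching the log above.
const low = estimateGrabSeconds(91, 2, 6); // 1092 s
const high = estimateGrabSeconds(91, 2, 7); // 1274 s
console.log(`${Math.round(low / 60)}-${Math.round(high / 60)} minutes`); // "18-21 minutes"
```

So a single-connection run of this site would take roughly 18 to 21 minutes under these assumptions.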
Same for me. It won't start: it loads the config, then gets stuck.
npm run grab --- --site=tv.yandex.ru
> grab
> tsx scripts/commands/epg/grab.ts --site=tv.yandex.ru
o starting... 11:34:19 AM
i config: 11:34:19 AM
output: guide.xml
maxConnections: 1
gzip: false
curl: false
site: tv.yandex.ru
i loading channels... 11:34:19 AM
i found 91 channel(s) 11:34:19 AM
i run: 11:34:19 AM
# Conflicts:
#	package-lock.json
#	package.json
#	scripts/commands/channels/parse.ts
Signed-off-by: Toha <[email protected]>
What OS do you use?
Windows
What's the output of npx puppeteer browsers list?
I can confirm that it seems difficult to get through Yandex's TV page without Puppeteer.
By using Puppeteer, the cookie requirement is now removed entirely.
This fixes #2803.
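The idea behind the PR is that a real browser performs the page's own requests and stores whatever cookies the site sets, so the grabber no longer has to supply cookies by hand. A minimal sketch of that fetch step, assuming a page object exposing Puppeteer's `goto(url, options)` and `content()` methods. `PageLike` and `fetchRenderedHtml` are hypothetical names, typed structurally so the logic can be exercised without launching Chrome; this is not the PR's actual code.

```typescript
// Minimal structural type covering the two Puppeteer Page methods used here.
interface PageLike {
  goto(
    url: string,
    options?: {
      waitUntil?: "load" | "domcontentloaded" | "networkidle0" | "networkidle2";
      timeout?: number;
    }
  ): Promise<unknown>;
  content(): Promise<string>;
}

// Hypothetical helper: navigate like a normal visitor (the browser handles any
// cookies and redirects the site sets) and return the rendered HTML.
async function fetchRenderedHtml(page: PageLike, url: string, timeoutMs = 30_000): Promise<string> {
  // "networkidle2" waits until at most 2 network connections remain for 500 ms.
  await page.goto(url, { waitUntil: "networkidle2", timeout: timeoutMs });
  return page.content();
}

// Usage (sketch, with real Puppeteer):
// const browser = await puppeteer.launch({ headless: true });
// const page = await browser.newPage();
// const html = await fetchRenderedHtml(page, "https://tv.yandex.ru/");
// await browser.close();
```

Keeping one page open across channels would also let cookies from the first visit be reused, which may help with the per-request runtime discussed above.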