
Can Zendriver Work in Headless Mode with Cloudflare? #35


Closed
afkarxyz opened this issue Jan 11, 2025 · 8 comments
Labels
question (Further information is requested), stale

Comments

@afkarxyz

afkarxyz commented Jan 11, 2025

Is it possible to run Zendriver in headless mode on websites protected by Cloudflare? I've tried headless mode, but it failed; without headless, it works fine. If headless mode is indeed possible, please help me fix my code. Thank you.

```python
import asyncio
import json
import random
import re

import zendriver as zd

SPOTIFY_URLS = [
    "https://open.spotify.com/track/2plbrEY59IikOBgBGLjaoe",
    "https://open.spotify.com/track/4wJ5Qq0jBN4ajy7ouZIV1c",
    "https://open.spotify.com/track/6dOtVTDdiauQNBQEDOtlAB",
    "https://open.spotify.com/track/7uoFMmxln0GPXQ0AcCBXRq",
    "https://open.spotify.com/track/2HRqTpkrJO5ggZyyK6NPWz",
]


async def wait_for_element(page, selector, timeout=30):
    # Tab.wait_for() takes its timeout in seconds, not milliseconds.
    try:
        return await page.wait_for(selector, timeout=timeout)
    except asyncio.TimeoutError as e:
        raise Exception(f"Timeout waiting for element: {selector}") from e
    except Exception as e:
        raise Exception(f"Error finding element {selector}: {e}") from e


async def wait_for_token(page, max_attempts=10, check_interval=0.5):
    for _ in range(max_attempts):
        # Serialize in the page so an empty or missing list still comes back as "[]".
        raw = await page.evaluate("JSON.stringify(window.requests || [])")
        for req in json.loads(raw):
            if "api.spotifydown.com/download" in req["url"]:
                token_match = re.search(r"token=(.+)$", req["url"])
                if token_match:
                    return token_match.group(1)
        await asyncio.sleep(check_interval)
    raise Exception("Token not found within timeout period")


async def fetch_token(url, delay=5, headless=False):
    # headless=True is the mode that failed against Cloudflare in this report.
    browser = await zd.start(headless=headless)
    try:
        page = await browser.get("https://spotifydown.com/en")

        # Wrap window.fetch so every response URL is recorded in window.requests.
        await page.evaluate("""
            window.requests = [];
            const originalFetch = window.fetch;
            window.fetch = function() {
                return new Promise((resolve, reject) => {
                    originalFetch.apply(this, arguments)
                        .then(response => {
                            window.requests.push({
                                url: response.url,
                                status: response.status,
                                headers: Object.fromEntries(response.headers.entries())
                            });
                            resolve(response);
                        })
                        .catch(reject);
                });
            };
        """)

        # Give the page (and any Cloudflare check) time to settle.
        await asyncio.sleep(delay)

        print("Finding input element...")
        input_element = await wait_for_element(page, ".searchInput")
        await input_element.send_keys(url)

        print("Clicking submit button...")
        submit_button = await wait_for_element(page, "button.flex.justify-center.items-center.bg-button")
        await submit_button.click()

        print("Clicking download button...")
        download_selector = "div.flex.items-center.justify-end button.w-24.sm\\:w-32.mt-2.p-2.cursor-pointer.bg-button.rounded-full.text-gray-100.hover\\:bg-button-active"
        download_button = await wait_for_element(page, download_selector)
        await download_button.click()

        print("Waiting for token...")
        return await wait_for_token(page)
    finally:
        await browser.stop()


async def main():
    try:
        url = random.choice(SPOTIFY_URLS)
        print(f"Using URL: {url}")

        token = await fetch_token(url)
        print(f"Token retrieved: {token}")
        return token
    except Exception as e:
        print(f"Error: {e}")
        return None


if __name__ == "__main__":
    token = asyncio.run(main())
```

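
The token-matching loop in `wait_for_token()` can be pulled out into a pure function, which makes it testable without launching a browser. A minimal sketch follows; `extract_token` and `DOWNLOAD_ENDPOINT` are hypothetical names, and it swaps the `token=(.+)$` regex for `urllib.parse`, which percent-decodes the value (and turns `+` into a space), so check that this matches what the API expects.

```python
from typing import Iterable, Mapping, Optional
from urllib.parse import parse_qs, urlsplit

# Host/path that wait_for_token() looks for in the captured fetch() URLs.
DOWNLOAD_ENDPOINT = "api.spotifydown.com/download"


def extract_token(requests: Optional[Iterable[Mapping[str, object]]]) -> Optional[str]:
    """Return the `token` query parameter of the first captured download request.

    `requests` is the list stored in `window.requests` by the injected fetch
    wrapper; None is treated as an empty list.
    """
    for req in requests or []:
        url = str(req.get("url", ""))
        if DOWNLOAD_ENDPOINT not in url:
            continue
        # parse_qs percent-decodes values and ignores any other query parameters.
        values = parse_qs(urlsplit(url).query).get("token")
        if values:
            return values[0]
    return None


if __name__ == "__main__":
    captured = [
        {"url": "https://example.com/other"},
        {"url": "https://api.spotifydown.com/download/abc?token=xyz%3D"},
    ]
    print(extract_token(captured))  # xyz=
```

In `wait_for_token()`, each poll would pass the decoded `window.requests` list to `extract_token()` and keep sleeping while it returns `None`.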
@ZenulAbidin

I use zendriver on many Docker hosts to scrape a lot of pages from a website with Cloudflare protection, so yes, it should stay undetected in headless mode.
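
Whether a headless run actually got past Cloudflare can be checked from the page itself instead of by eye. The sketch below is a heuristic, not an official detection method: the title prefixes and HTML markers (`Just a moment`, `_cf_chl_opt`, ...) are assumptions based on commonly seen challenge pages and may change at any time, and `looks_like_cloudflare_challenge` / `wait_past_challenge` are hypothetical helpers.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

# Heuristic markers of a Cloudflare interstitial (challenge or block page).
# These strings are assumptions, not an official or stable list.
CHALLENGE_TITLES = ("just a moment", "attention required")
CHALLENGE_HTML_MARKERS = ("_cf_chl_opt", "cf-browser-verification")


def looks_like_cloudflare_challenge(title: str, html: str) -> bool:
    """Return True if the page title or HTML resembles a Cloudflare interstitial."""
    t = (title or "").strip().lower()
    if any(t.startswith(marker) for marker in CHALLENGE_TITLES):
        return True
    h = (html or "").lower()
    return any(marker in h for marker in CHALLENGE_HTML_MARKERS)


async def wait_past_challenge(
    get_page_state: Callable[[], Awaitable[Tuple[str, str]]],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll (title, html) until the challenge is gone; False if the timeout expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        title, html = await get_page_state()
        if not looks_like_cloudflare_challenge(title, html):
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```

With zendriver, `get_page_state` could be an async function that returns `json.loads(await page.evaluate("JSON.stringify([document.title, document.documentElement.outerHTML])"))`, using the same `Tab.evaluate()` call as the script above. Running this once headful and once headless shows whether the headless session is the one stuck on the interstitial.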

@afkarxyz
Author

> I use zendriver on many docker hosts to scrape a lot of pages from a website with cloudflare protection, so yes, it should stay undetected in headless.

Sorry if my question is very basic. Does the Docker container still need Chrome installed even though it runs headless?
I don't know what the concept of Docker is; is it like a virtual machine?

@ZenulAbidin

ZenulAbidin commented Jan 15, 2025 via email

@fbtariq

fbtariq commented Jan 21, 2025

> On Wednesday, January 15th, 2025 at 8:22 AM, afkarxyz wrote:
>
> > Does the docker still need chrome installed even though it is headless?
>
> Headless just means Chrome without a GUI, so no display-server libraries such as X11 or Wayland have to be installed: just Chrome itself, with fewer dependencies. You'd use zendriver to control a headless Chrome either way.
>
> Docker is similar to a VM hypervisor, except that the "VMs" (actually containers) share the host's kernel instead of booting their own OS, so they use much less RAM, CPU, and disk than a virtual machine would.

Could you share the Dockerfile you use to run zendriver in Docker in headless mode?
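
On the question of whether the image still needs Chrome: yes, headless mode drops the display server but still runs the Chrome binary. A quick pre-flight check that fails fast when the binary is missing is sketched below. zendriver normally looks for Chrome on its own, so this only gives a clearer error; `find_chrome` is a hypothetical helper, and the candidate executable names are assumptions about typical Debian/Ubuntu images.

```python
import shutil
from typing import Iterable, Optional

# Common Chrome/Chromium executable names on Linux images. This list is an
# assumption; adjust it to whatever your Dockerfile actually installs.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome(names: Iterable[str] = CANDIDATE_NAMES) -> Optional[str]:
    """Return the path of the first browser executable found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If `find_chrome()` returns `None`, the container has no browser installed and headless mode will not help. Otherwise the path could be passed to `zd.start()` through a `browser_executable_path` argument, which nodriver's `start()` has and zendriver (a nodriver fork) is assumed to keep; verify against your installed version.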

@stephanlensky stephanlensky added the question Further information is requested label Jan 22, 2025
@stephanlensky
Owner

stephanlensky commented Jan 22, 2025

I have an example repository here, zendriver-docker, which shows how to use Docker and Zendriver with both headful (Wayland) and headless Chrome (just add `headless=True` to the example code; it should work the same way).

There is a lot of added complexity in the Docker image which is required in order to run Chrome in headful mode, though I find it quite helpful since it allows you to VNC into the container and actually interact with the running browser.

If you only want to run in headless mode, the image could likely be substantially simplified. I'd be happy to accept a PR in that repo to add a Dockerfile for a simplified headless image if anyone is interested 🙂
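
Since the same code is meant to run both headful (with VNC) and headless, one option is to pick the mode from the environment instead of editing the script. A minimal sketch, assuming hypothetical `HEADLESS` and `NO_SANDBOX` environment variables; `headless` is the `zd.start()` argument used in this thread, while `sandbox` is assumed from nodriver's `start()` signature.

```python
import os
from typing import Any, Dict, Mapping

TRUTHY = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean environment variable, falling back to `default` if unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def launch_options(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for zd.start() from (hypothetical) env variables."""
    opts: Dict[str, Any] = {"headless": _flag(env, "HEADLESS", True)}
    if _flag(env, "NO_SANDBOX", False):
        # Chrome refuses to run as root with its sandbox enabled, which is
        # common in containers; disable it only if you accept the risk.
        opts["sandbox"] = False
    return opts
```

The script would then call `browser = await zd.start(**launch_options())`, with `HEADLESS=0` set in the VNC-enabled image and the default left alone in a slimmer headless one.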

@chompie
Copy link

chompie commented Feb 18, 2025

> If you only want to run in headless mode, the image could likely be substantially simplified. I'd be happy to accept a PR in that repo to add a Dockerfile for a simplified headless image if anyone is interested 🙂

An image that could (also) be used on AWS Lambda would be great, e.g. with a default config that just does JavaScript rendering and returns the page content, the resulting URL (after any redirects), and the HTTP status code.
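
The response shape suggested here (page content, final URL after redirects, HTTP status) could be modelled as below. This is only a sketch of the output format, assuming an API Gateway or Lambda function URL proxy integration; `RenderResult` and `to_lambda_response` are hypothetical names, and how the renderer obtains each value from zendriver is not shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class RenderResult:
    """Output of the proposed renderer (field names are hypothetical)."""

    content: str      # rendered HTML after JavaScript has run
    final_url: str    # URL after any redirects
    status_code: int  # HTTP status of the main document


def to_lambda_response(result: RenderResult) -> Dict[str, Any]:
    """Wrap a RenderResult in the Lambda proxy-integration response format."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(asdict(result)),
    }
```

The target page's status goes in the body rather than in `statusCode`, so a 404 from the scraped site is not confused with the renderer itself failing.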

Contributor

This issue has been marked stale because it has been open for 30 days with no activity. If there is no activity within 7 days, it will be automatically closed.

@github-actions github-actions bot added the stale label Mar 23, 2025
Contributor

This issue was automatically closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Mar 30, 2025

5 participants