Need help with python -m ichrome.web #140
Thanks @ClericPy. It opens the browser on the page, and when the browser stops loading the reCAPTCHA loads, but the response I get back doesn't contain the reCAPTCHA URL. I tried this:

from bs4 import BeautifulSoup
from torequests import tPool
from inspect import getsource

req = tPool()


async def tab_callback(task, tab, data, timeout):
    # Only this function's source text is sent to ichrome.web, which runs it against the tab
    await tab.wait_loading(5000)
    await tab.screenshot(save_path='./screenshot.png')
    return await tab.html


json = {
    'tab_callback': getsource(tab_callback),
    "timeout": 5000,
    "incognito_args": {
        "url": "https://oficinajudicialvirtual.pjud.cl/home/index.php",
        "proxyServer": "http://37.19.220.129:8443"
    }
}
response = req.post('http://127.0.0.1:8080/chrome/do', json=json)
soup = BeautifulSoup(response.text, 'html.parser')
# Raises IndexError when the returned HTML has no reCAPTCHA iframe
recaptcha_url = soup.select('iframe[title="reCAPTCHA"]')[0]["src"]
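Indexing `[0]` turns "the iframe isn't in the HTML yet" into an `IndexError`, which hides what actually came back. A minimal sketch of a check that returns `None` instead; to stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, and `find_recaptcha_src` / `RecaptchaFrameFinder` are hypothetical names. It assumes, as the selector above does, that the widget's iframe carries `title="reCAPTCHA"`.

```python
from html.parser import HTMLParser
from typing import Optional


class RecaptchaFrameFinder(HTMLParser):
    """Records the src of the first <iframe title="reCAPTCHA"> it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case
        if self.src is not None or tag != "iframe":
            return
        attributes = dict(attrs)
        if attributes.get("title") == "reCAPTCHA":
            self.src = attributes.get("src")


def find_recaptcha_src(html: str) -> Optional[str]:
    """Return the reCAPTCHA iframe URL, or None if the iframe is not in the HTML."""
    finder = RecaptchaFrameFinder()
    finder.feed(html)
    finder.close()
    return finder.src
```

With the request above, `find_recaptcha_src(response.text)` returning `None` means the HTML was captured before the widget was injected (or the site served a different page), which is easier to act on than a traceback.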
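What do you see if you use this callback and check the HTML in the real Chrome window?

import asyncio


async def tab_callback(task, tab, data, timeout):
    # Keep the tab open for a long time so the page can be inspected in the real Chrome window
    await asyncio.sleep(10000)
    return await tab.html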
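Once the page has been inspected, a fixed sleep can be replaced by polling. A minimal sketch of a callback that re-reads the rendered HTML until the reCAPTCHA iframe marker appears or a time budget runs out. It assumes `await tab.html` returns the current DOM as a string (as in the callbacks above), that the serialized iframe contains `title="reCAPTCHA"`, and that the 30-second budget is a hypothetical value; the request's own `timeout` would need to leave room for it.

```python
async def tab_callback(task, tab, data, timeout):
    # Imports live inside the function because only this function's source text
    # is shipped to ichrome.web (via inspect.getsource), not the module around it.
    import asyncio
    import time

    marker = 'title="reCAPTCHA"'  # assumed attribute of the widget's iframe
    deadline = time.monotonic() + 30  # hypothetical polling budget in seconds
    html = await tab.html
    while marker not in html and time.monotonic() < deadline:
        await asyncio.sleep(0.5)  # give the page's scripts time to inject the iframe
        html = await tab.html
    return html
```

If the marker never shows up, the last snapshot is returned anyway, so the caller can still see what the site actually served.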
@ClericPy, could you someday add an API request like this one, where the proxy is passed as a parameter in the request payload? It would be simpler, because the caller wouldn't need async/await:

import requests
from bs4 import BeautifulSoup
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'es-ES,es;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
}
params = (
    ('url', "https://oficinajudicialvirtual.pjud.cl/home/index.php"),
)
data = {
    "proxyServer": "http://37.19.220.129:8443"
}
response = requests.get('http://127.0.0.1:8080/chrome/preview', headers=headers, params=params, data=data)
soup = BeautifulSoup(response.text, 'html.parser')
recaptcha_url = soup.select('iframe[title="reCAPTCHA"]')[0]["src"]
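The `/chrome/do` request earlier in this thread already passes the proxy per request through `incognito_args.proxyServer`, so the async part can be hidden behind a plain function call. A minimal sketch, assuming the endpoint, port, and payload keys used in that request (not taken from ichrome's documentation); it uses the standard library's `urllib.request` instead of `requests`, ships the callback as a source string instead of calling `getsource`, and `build_do_payload` / `fetch_html` are hypothetical names.

```python
import json
import urllib.request

# Callback sent to ichrome.web as source text; the server runs it, so the caller
# never writes async/await. The body is copied from the callback earlier in the thread.
CALLBACK_SOURCE = '''
async def tab_callback(task, tab, data, timeout):
    await tab.wait_loading(5000)
    return await tab.html
'''


def build_do_payload(url, proxy_server, timeout=5000, callback_source=CALLBACK_SOURCE):
    """Build the JSON body for POST /chrome/do, with the proxy passed per request."""
    return {
        "tab_callback": callback_source,
        "timeout": timeout,
        "incognito_args": {"url": url, "proxyServer": proxy_server},
    }


def fetch_html(url, proxy_server, endpoint="http://127.0.0.1:8080/chrome/do"):
    """Synchronous wrapper: POST the payload and return the response body as text."""
    body = json.dumps(build_do_payload(url, proxy_server)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Usage would then be a single blocking call, e.g. `html = fetch_html("https://oficinajudicialvirtual.pjud.cl/home/index.php", "http://37.19.220.129:8443")`. Browser headers are deliberately not sent, since ichrome.web does not use them.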
ichrome does not use the headers yet.
If I launch a browser as a service:
python -m ichrome.web
then I get the reCAPTCHA URL.
But if I do it like this:
I don't get the fully loaded soup. I guess it could be some security measure of the website I'm scraping.
Any help?