Sites with HTTP/2 do not work #17

Open
trupf opened this issue May 14, 2022 · 5 comments
trupf commented May 14, 2022

Many sites do not work through the proxy, but some do.

Working examples: https://taz.de/ ; https://www.world.de

Not working: https://www.spiegel.de ; https://www.google.com ; https://www.gmx.net
The error is always "Proxy: 1014 - No Content", although the pages load fine without the proxy. Here is the output for one page:

Dumping knHTTP object:
knHTTP request type: GET
knHTTP request: https://www.gmx.net(19 bytes)
knHTTP content type: text/html;charset=UTF-8
knHTTP is HTTPS mode: True
knHTTP form post(urlencoded): (0 bytes)
knHTTP cookies: AEC=AakniGOXlCVM--XdYnzddkFexNtHKXmY-3b6TL695vnB51XgGcFY3Pyk2-0;_pk_id_1_c2a3=016d031118e2de39.1652542765.;_sp_enable_dfp_personalized_ads=false(144 bytes)
knHTTP headers:
HTTP/2 200
date: Sat, 14 May 2022 16:12:54 GMT
server: Apache
strict-transport-security: max-age=31536000; includeSubdomains; preload
expires: Thu, 01 Jan 1970 00:00:00 GMT
pragma: no-cache
cache-control: no-cache, no-store
x-frame-options: deny
content-security-policy: frame-ancestors 'none'
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
referrer-policy: origin-when-cross-origin
feature-policy: microphone 'none'; camera 'none'
vary: Accept-Encoding
content-type: text/html;charset=UTF-8
set-cookie: clktype=; Max-Age=0; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/; Secure; HttpOnly
set-cookie: ui_cid=OPTOUT; Max-Age=31536000; Expires=Sun, 14-May-2023 16:12:54 GMT; Path=/; Secure; HttpOnly
set-cookie: SSLB=.0; domain=.gmx.net ;path=/
p3p: CP="{}"
set-cookie: TS72888fff027=08105a8158ab2000544d44c81066627bd9933497aaef713437369d880345fb683dfe2dacac64f894082efae5051130009a977db83968db88760b90916960d1448fc4c5cf6e9abb99031426363478499608fbcb8618dc2e14c94d237fa75262fa;Path=/(1018 bytes)
knHTTP parsed headers:
UNKNOWN: (Truncated)
DATE: Sat, 14 May 2022 16:12:54 GMT
EXPIRES: Thu, 01 Jan 1970 00:00:00 GMT
CACHE_CONTROL: no-cache, no-store
CONTENT_TYPE: text/html;charset=UTF-8
HTTP_COOKIES: (Truncated)
knHTTP return body length: 279116 bytes
knHTTP return body(html encoded): Show/Hide

Owner

jabbany commented May 17, 2022

Ohhh, from a brief look, the HTTP/2 200 status line seems to be the problem. When this code was written, HTTP/2 did not exist yet, so parts of the code assume they will only ever see HTTP/1.0 and HTTP/1.1 responses. If the server speaks HTTP/2, the proxy cannot parse the response code, which results in a 1014 error even though the request itself succeeded.

To fix this (maybe), consider adding the following code:

curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);

before every line that contains curl_exec($ch); in includes/module_http.php and see if it works. (If it does, please tell me and I'll update the code.)
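
For readers who want to see where that option sits relative to curl_exec, here is a minimal sketch. The helper names build_http11_options and fetch_with_http11 are hypothetical and are not taken from includes/module_http.php; only the CURLOPT_HTTP_VERSION line comes from the suggestion above, and the other options are assumptions chosen so that the raw status line stays visible in the result.

```php
<?php
// Hypothetical helpers for illustration; KnProxy's own code is organized differently.

// Collect the cURL options in one place so the HTTP version is set before curl_exec().
function build_http11_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_HEADER         => true, // keep the raw status line and headers in the result
        // Ask libcurl to speak HTTP/1.1, so the status line looks like "HTTP/1.1 200 OK"
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
    ];
}

// Perform the request; returns the raw response string, or false on failure.
function fetch_with_http11(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_http11_options($url));
    return curl_exec($ch);
}
```

Calling fetch_with_http11('https://www.gmx.net') would then perform the request with the HTTP version pinned. Whether the rest of the proxy accepts the result still has to be checked against the actual code.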

jabbany added the Bugs label May 17, 2022
jabbany changed the title from "Many sites not working" to "Sites with HTTP/2 do not work" May 17, 2022
Author

trupf commented May 17, 2022

The modification works, at least for the first two pages mentioned. The third one does not show the error 1014 any more but the page is completely empty (no content shown).

As far as I understand your code, the modification makes the server answer with an HTTP/1.1 response. Wouldn't it be better to update the code to understand HTTP/2 instead?

Owner

jabbany commented May 17, 2022

> Wouldn't it be better to update the code to understand http/2 instead?

For the purposes of the proxy, HTTP/2 doesn't seem to offer any particular benefits. Semantically it's largely the same protocol, with the main addition being multiplexing of several requests over a single TCP connection, which the proxy can't take advantage of anyway. Instead you'd configure the server hosting the proxy to negotiate HTTP/2 with the browser.

That said, it also shouldn't be too hard to make the proxy HTTP/2 tolerant by just making HTTP/2 a valid response type. Will need to do a bit more research on what other complications there may be for the proxy and will keep this bug open in the meantime.
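
As a rough idea of what "HTTP/2 tolerant" could mean for the status line, here is a minimal sketch. The function name parse_status_line and the shape of the returned array are made up for illustration; KnProxy's actual parsing code is not shown in this thread and may work differently. The sketch only relies on two facts visible in the dump above: an HTTP/2 status line has no minor version and no reason phrase ("HTTP/2 200"), while HTTP/1.x lines have both ("HTTP/1.1 200 OK").

```php
<?php
// Hypothetical parser for illustration; not KnProxy's actual code.
// Accepts "HTTP/1.0 200 OK", "HTTP/1.1 404 Not Found" and "HTTP/2 200".
function parse_status_line(string $line): ?array
{
    // Version: digits with an optional ".minor" part. Reason phrase: optional.
    $pattern = '#^HTTP/(\d+(?:\.\d+)?)\s+(\d{3})(?:\s+(.*))?$#';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // not a status line at all
    }
    return [
        'version' => $m[1],
        'code'    => (int) $m[2],
        'reason'  => $m[3] ?? '', // absent for HTTP/2 responses
    ];
}
```

With a parser like this, the response code can be read the same way regardless of the protocol version, so the HTTP/1.1 pin would become optional rather than required.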

> The third one does not show the error 1014 any more but the page is completely empty (no content shown)

Judging from the website, this is probably because the site uses JavaScript to load most of its functionality, and JS support is very experimental at the moment (and possibly forever). KnProxy currently combines rewriting JS files with a "translation" layer for XHRs that injects hooks into the JS. It's the best I could come up with for what is essentially a rewriting proxy (not a proper forwarding proxy). Also, KnProxy was built before HTML5 was a thing, so some HTML5 features like srcset are largely unsupported as well.

I'm quite surprised there is still interest in this code... As can be seen in the repo I last touched this like 5 years ago (and started the project like a decade ago as a high school student).

Some other things on the list of improvements that may or may not ever happen:

  • Move to PHP 7 or 8. This codebase was written for PHP 5...
  • Proper HTML parser. Maybe involving some 3rd party libraries instead of the current regex hack.
  • Support all the new HTML5 stuff.
  • Better JS translation layer (i.e. hooks on fetch() in addition to XHR) to make more dynamic Ajax-based sites work. (Will likely never be perfect)
  • Better cookie handling. Cookies are dealt with in an extremely insecure way right now.
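
To illustrate the "proper HTML parser" item, here is a small sketch of attribute rewriting with PHP's bundled DOM extension instead of regexes. Everything in it is an assumption made for illustration: the function name rewrite_links, the proxy prefix format, and the choice of DOMDocument as the parser. It does not resolve relative URLs, does not touch srcset, and ignores character-set handling, all of which a real implementation would need.

```php
<?php
// Hypothetical sketch; not KnProxy's code and not a complete rewriter.
// Rewrites href and src attributes so they point back through a proxy URL.
function rewrite_links(string $html, string $proxyPrefix): string
{
    $doc = new DOMDocument();

    // libxml warns about HTML5 tags it does not know; collect those warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href or @src]') as $element) {
        foreach (['href', 'src'] as $attribute) {
            if ($element->hasAttribute($attribute)) {
                $target = $element->getAttribute($attribute);
                // Encode the original URL so it survives as a query parameter.
                $element->setAttribute($attribute, $proxyPrefix . rawurlencode($target));
            }
        }
    }

    return $doc->saveHTML();
}
```

A call such as rewrite_links($page, '/proxy.php?url=') would return the full serialized document, including the html and body wrappers that the parser adds when they are missing.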

Author

trupf commented May 17, 2022

Just for info: I'm using PHP 7.4 without errors, so it obviously works already. There are also other proxy servers out there (I get the best results with phpr0xi or censordodge), so I could use another one if you think updating the code isn't worth it.

Owner

jabbany commented May 17, 2022

The code is still maintained, just updates to it could be slow. Feel free to keep using the previous fix.

Also, this code is actually intentionally meant to be a complementary implementation to the phproxy family (which is also quite old now). The phproxy family of implementations use PHP sockets for fetching remote resources, which back in the day some web hosts banned due to security issues (some exploits using sockets to edit/view local passwords etc. on shared PHP webhosts). That's why this code mainly focuses on implementing everything through the cURL extension, which was quite available on free webhosts back in the day despite being optional.
