
Katana JSONL file Issue on raw request field #876

Open
exploit-io opened this issue May 3, 2024 · 3 comments · May be fixed by #918
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@exploit-io

Katana's JSONL output has an issue when saving POST requests: the "raw" field records the request as a GET without the body. For example:

{"timestamp":"2024-05-02T16:28:07.690561274Z","request":{"method":"POST","endpoint":"http://testphp.vulnweb.com/secured/newuser.php","body":"signup=signup\u0026uuname=katana\u0026upass=katanaP@assw0rd1\u0026upass2=katanaP@assw0rd1\u0026urname=katana\u0026ucc=katana\u0026uemail=katana\u0026uphone=katana","headers":{"Content-Type":"application/x-www-form-urlencoded"},"tag":"form","attribute":"action","source":"http://testphp.vulnweb.com/signup.php","raw":"GET /secured/newuser.php HTTP/1.1\r\nHost: testphp.vulnweb.com\r\nUser-Agent: Go-http-client/1.1\r\nCookie: PHPSESSIONID=XXXXXXXXX\r\nHost-Header: hostname.tld\r\nX-Api-Key: XXXXX\r\nX-Powered-By: Raider\r\nAccept-Encoding: gzip\r\n\r\n"},"response":{"status_code":200,"headers":{"x_powered_by":"PHP/5.6.40-38+ubuntu20.04.1+deb.sury.org+1","content_encoding":"gzip","server":"nginx/1.19.0","date":"Thu, 02 May 2024 16:28:05 GMT","content_type":"text/html; charset=UTF-8","transfer_encoding":"chunked","connection":"keep-alive"},"body":"\u003chtml\u003e\u003chead\u003e\n\u003ctitle\u003eadd new user\u003c/title\u003e\n\u003cmeta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\"\u003e\n\u003clink href=\"style.css\" rel=\"stylesheet\" type=\"text/css\"\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003cdiv id=\"masthead\"\u003e \n  \u003ch1 id=\"siteName\"\u003eACUNETIX ART\u003c/h1\u003e \n\u003c/div\u003e\n\u003cdiv id=\"content\"\u003e\n\t\u003c/div\u003e\n\n\n\u003c/body\u003e\u003c/html\u003e","technologies":["PHP:5.6.40","Nginx:1.19.0","Ubuntu"],"raw":"HTTP/1.1 200 OK\r\nContent-Length: 415\r\nConnection: keep-alive\r\nContent-Encoding: gzip\r\nContent-Type: text/html; charset=UTF-8\r\nDate: Thu, 02 May 2024 16:28:05 GMT\r\nServer: nginx/1.19.0\r\nX-Powered-By: PHP/5.6.40-38+ubuntu20.04.1+deb.sury.org+1\r\n\r\n\u003c!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"\u003e\r\n\u003chtml\u003e\r\n\u003chead\u003e\r\n\u003ctitle\u003eadd new 
user\u003c/title\u003e\r\n\u003cmeta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\"\u003e\r\n\u003clink href=\"style.css\" rel=\"stylesheet\" type=\"text/css\"\u003e\r\n\u003c/head\u003e\r\n\u003cbody\u003e\r\n\u003cdiv id=\"masthead\"\u003e \r\n  \u003ch1 id=\"siteName\"\u003eACUNETIX ART\u003c/h1\u003e \r\n\u003c/div\u003e\r\n\u003cdiv id=\"content\"\u003e\r\n\t\u003c/div\u003e\r\n\u003c/body\u003e\r\n\u003c/html\u003e\r\n"}}

but it looks correct for GET requests:

{"timestamp":"2024-05-02T16:28:03.490823562Z","request":{"method":"GET","endpoint":"http://testphp.vulnweb.com/artists.php?artist=1","tag":"a","attribute":"href","source":"http://testphp.vulnweb.com/artists.php","raw":"GET /artists.php?artist=1 HTTP/1.1\r\nHost: testphp.vulnweb.com\r\nUser-Agent: Go-http-client/1.1\r\nCookie: PHPSESSIONID=XXXXXXXXX\r\nHost-Header: hostname.tld\r\nX-Api-Key: XXXXX\r\nX-Powered-By: Raider\r\nAccept-Encoding: gzip\r\n\r\n"},"response":{"status_code":200,"headers":{"server":"nginx/1.19.0","date":"Thu, 02 May 2024 16:28:01 GMT","content_type":"text/html; charset=UTF-8","transfer_encoding":"chunked","connection":"keep-alive","x_powered_by":"PHP/5.6.40-38+ubuntu20.04.1+deb.sury.org+1","content_encoding":"gzip"},"body":"\u003chtml\u003e\u003chead\u003e\u003c/head\u003e\u003cbody\u003eWarning: mysql_connect(): Connection refused in /hj/var/www/database_connect.php on line 2\nWebsite is out of order. Please visit back later. Thank you for understanding.\u003c/body\u003e\u003c/html\u003e","technologies":["PHP:5.6.40","Nginx:1.19.0","Ubuntu"],"raw":"HTTP/1.1 200 OK\r\nContent-Length: 170\r\nConnection: keep-alive\r\nContent-Encoding: gzip\r\nContent-Type: text/html; charset=UTF-8\r\nDate: Thu, 02 May 2024 16:28:01 GMT\r\nServer: nginx/1.19.0\r\nX-Powered-By: PHP/5.6.40-38+ubuntu20.04.1+deb.sury.org+1\r\n\r\n\nWarning: mysql_connect(): Connection refused in /hj/var/www/database_connect.php on line 2\nWebsite is out of order. Please visit back later. Thank you for understanding."}}
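For reference, the inconsistency is easy to detect mechanically: the method field should match the first token of request.raw. A minimal, hypothetical checker (not part of katana) could look like this:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// entry models only the JSONL fields needed for the check.
type entry struct {
	Request struct {
		Method string `json:"method"`
		Raw    string `json:"raw"`
	} `json:"request"`
}

// rawMatchesMethod reports whether the raw request line starts with the
// method recorded in the same JSONL entry.
func rawMatchesMethod(line []byte) (bool, error) {
	var e entry
	if err := json.Unmarshal(line, &e); err != nil {
		return false, err
	}
	return strings.HasPrefix(e.Request.Raw, e.Request.Method+" "), nil
}

func main() {
	// Minimal stand-in for the buggy record above: method POST, raw GET.
	buggy := []byte(`{"request":{"method":"POST","raw":"GET /secured/newuser.php HTTP/1.1\r\n\r\n"}}`)
	ok, _ := rawMatchesMethod(buggy)
	fmt.Println(ok) // false
}
```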
@exploit-io exploit-io added the Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label May 3, 2024
@dogancanbakir dogancanbakir self-assigned this May 6, 2024
@alban-stourbe-wmx

Hi there,
This bug was also preventing me from getting the desired behavior.

Moreover, on my side, in addition to the method, the body content is also missing from the raw field.

I investigated the code to find the source of the bug.

First, it seems that this bug only affects headless mode.

The bug seems to come from the navigateRequest function in pkg/engine/hybrid/crawl.go, more precisely here:
httpreq, err := http.NewRequest(e.Request.Method, URL.String(), strings.NewReader(e.Request.PostData))

Here, the crawler attempts to recreate the navigation request as an http.Request object. To do so, the function relies on the object e *proto.FetchRequestPaused.

When testing its behavior, it turns out that regardless of the method and body of the input navigation request, e.Request.Method and e.Request.PostData do not reflect them. They are always equal to:
e.Request.Method = "GET"
e.Request.PostData = ""

As a result, the initial input request is incorrectly reconstructed: any request that is not a GET and/or carries a body will be wrong in the output.

To correct this bug, I changed the function to
httpreq, err := http.NewRequest(request.Method, URL.String(), strings.NewReader(request.Body))
so that the reconstruction is based on the input request. This change gives me the correct raw value in my outputs.

I'm new here and have no advanced knowledge of Go, so I haven't opened a pull request yet. Hopefully this comment can confirm whether this solution is acceptable.

Have a nice day!

Proof:
{ "timestamp": "2024-05-24T16:45:37.945813+02:00", "request": { "method": "POST", "endpoint": "http://127.0.0.1:4000/contributions", "body": "preTax=0\u0026roth=0\u0026afterTax=0", "headers": { "Content-Type": "application/x-www-form-urlencoded" }, "tag": "form", "attribute": "action", "source": "http://127.0.0.1:4000/contributions", "raw": "POST /contributions HTTP/1.1\r\nHost: 127.0.0.1:4000\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 26\r\nContent-Type: application/x-www-form-urlencoded\r\nCookie: XXXX\r\nAccept-Encoding: gzip\r\n\r\npreTax=0\u0026roth=0\u0026afterTax=0" }, "response": { "status_code": 200, "headers": { "etag": "W/\"24de-hS4ySFk8FPabFlMHgOaA7UxITPk\"", "x_powered_by": "Express", "date": "Fri, 24 May 2024 14:45:36 GMT", "content_type": "text/html; charset=utf-8", "content_length": "9438" }, "technologies": [ "Express", "Node.js", "Bootstrap" ], "raw": "HTTP/1.1 200 OK\r\nContent-Length: 9438\r\nContent-Type: text/html; charset=utf-8\r\nDate: Fri, 24 May 2024 14:45:36 GMT\r\nETag: W/\"24de-hS4ySFk8FPabFlMHgOaA7UxITPk\"\r\nX-Powered-By: Express\r\n\r\n\u003c!DOCTYPE html\u003e\n\u003chtml lang=\"en\"\u003e\n\n\u003chead\u003e\n \u003cmeta charset=\"utf-8\"\u003e\n \u003cmeta name=\"viewport\" ....", "forms": [ { "method": "POST", "action": "http://127.0.0.1:4000/contributions", "enctype": "application/x-www-form-urlencoded", "parameters": [ "preTax", "roth", "afterTax" ] } ] } }

@dogancanbakir dogancanbakir linked a pull request Jun 5, 2024 that will close this issue
@ehsandeep ehsandeep linked a pull request Jun 5, 2024 that will close this issue
@alban-stourbe-wmx

@dogancanbakir, @ehsandeep Sorry, but I made a mistake. Changing this line solves the output problem, but it introduces another bug.
I'll write a detailed explanation of what's going on and how we should (or shouldn't) change the function.

@alban-stourbe-wmx

alban-stourbe-wmx commented Jun 6, 2024

Why the raw request is corrupted

First of all,
the bug in the JSON output occurs when the -automatic-form-fill option is enabled with the hybrid crawler.

When a form is detected in a response, the crawler creates a navigation.Request and queues it.
This request is then processed by the navigateRequest function.

For example, let's imagine a form request is created

"request": {
  "method": "POST",
  "endpoint": "https://test.com",
  "body": "pincode=katanaP@assw0rd1\u0026pincode_confirm=katanaP@assw0rd1\u0026submit-pin=Faire+une+demande+d'activation+du+code+pin"
}
At this point the raw field is empty; it will be rebuilt while the navigateRequest function processes the request.

First, the function loads the web page:
page.Navigate(request.URL)
which is equivalent to the request
GET https://test.com
This navigation is performed regardless of the method of the input request.

After this, the initial request is recreated in the goroutine go pageRouter.Start(func(e *proto.FetchRequestPaused) ...), which intercepts the page's traffic looking for requests sent from the loaded page (important for what follows).
The line responsible for this reconstruction is
httpreq, err := http.NewRequest(e.Request.Method, URL.String(), strings.NewReader(e.Request.PostData))
Since the request was sent via the page.Navigate function, e.Request.Method can only be "GET" and e.Request.PostData is empty.
var rawBytesRequest, rawBytesResponse []byte
if r, err := retryablehttp.FromRequest(httpreq); err == nil {
    rawBytesRequest, _ = r.Dump()
}
...
if matchOriginalURL {
    request.Raw = string(rawBytesRequest)
    response = resp
}
In this case, the raw value is
request.Raw = "GET https://test.com"
This is why it reads as a bug: the initial POST request with a body was never sent; it was replaced by a GET, hence the wrong raw value in the JSONL.

Why my change does not work well

To obtain a raw value matching the input request in the output file, I proposed the change:
httpreq, err := http.NewRequest(request.Method, URL.String(), strings.NewReader(request.Body))
Based directly on the values of the initial request, the raw field becomes consistent with the other fields.
We obtain
request.Raw = "POST https://test.com\r\n\r\npincode=katanaP@assw0rd1\u0026pincode_confirm=katanaP@assw0rd1\u0026submit-pin=Faire+une+demande+d'activation+du+code+pin"
However, from a functional point of view, the bug remains: the recorded response no longer matches the raw value, because the request actually sent is still a GET without a body, as it is performed by the page.Navigate function.

This line is also very important:
httpreq, err := http.NewRequest(e.Request.Method, URL.String(), strings.NewReader(e.Request.PostData))
because it captures POST and other requests sent from a page loaded in the browser.
Example:
(Screenshot attached: 2024-06-06, 11:00:19)

So my solution isn't viable.

Room for improvement

The bug is linked to the use of the automatic-form-fill option with the hybrid crawl. All requests are processed dynamically, whereas form requests are static and cannot be sent independently via a browser page.

There are two possibilities in this case:

  1. Do not use this option with the hybrid crawler, only with the standard crawler. This is not the best solution, as I think it is possible to do better.
  2. Filter all requests with the "form" tag and send them via retryablehttp rather than through page.Navigate. Since such a request is a form submission, it is not necessary (nor possible) to emulate it in a browser.

Comments

I hope my answer is clear enough.
I apologize for the loss of time caused by my previous error.
I hope we'll be able to improve this feature.

If you'd like to discuss the matter, or if there's anything I haven't clarified, please don't hesitate to get in touch. Have a nice day!

@dogancanbakir
@ehsandeep
