Crashes/502 error when act runner is running #33777

Open
NadeemSadiq opened this issue Mar 3, 2025 · 15 comments
Labels
issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail type/bug

Comments

@NadeemSadiq

Description

When an act runner is connected to my Gitea instance, Gitea (or Cloudflare, I'm not 100% sure) returns a 502 error. I am unable to pull any packages while the act runner is connected, and the web page randomly crashes for a few seconds. Below is all the info I can think of.

Logs from Gitea Monitor/stack page:
gitea-diagnosis-20250303-112721.zip

Logs from console output:
console-log-output.txt

View of resources on Kubernetes:
Image
Image
Image

Gitea Version

1.23.4

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

1.23.4

Operating System

Arm64-Kubernetes (k3s)-Linux

How are you running Gitea?

Environment:

  • Kubernetes Version:k3s Kubernetes
  • Arm64 CPU Arch

Version:

  • Gitea-1.23.4
  • Act runner-v0.2.11
  • DockerInDocker-docker:20.10-dind

Deployment YAML for Gitea:
deployment-yaml.txt

Database

SQLite

@wxiaoguang
Contributor

Is it a public instance?

@NadeemSadiq
Author

NadeemSadiq commented Mar 3, 2025

It's technically private, but it is exposed. You can view it here:
https://git.areaq.xyz/

You can just refresh the page a couple of times to see it 502. You don't even need to sign up to see the problem.

@wxiaoguang
Contributor

From your log, I didn't see any "ERROR" entries. And Gitea itself doesn't respond with 502 (it only responds with 500 if an internal error occurs).

Could you try to curl your server directly, without Cloudflare, to see whether there are still any errors?

@wxiaoguang
Contributor

And, is it 100% related to "when act runner is running"? For example: when no runner, no 502, when runner starts, then randomly 502?

@NadeemSadiq
Author

Yes, when I stop the act runner, it works fine. I've been using Gitea for a long while and wanted to add an act runner now, which is when I started getting this issue. I did get this issue on an older version and upgraded since I assumed it was a version issue, which was not the case. I can disable it now if you want to view the site with the 502?

@wxiaoguang
Contributor

wxiaoguang commented Mar 3, 2025

Hmm, that's quite strange. If it is surely related to the runner, then maybe we need to figure out more details. So let's keep the runner running:

  • Did you see any ERROR logs in Gitea's log when the 502 error occurs? Or other reverse proxies?
  • Could you try to curl your server directly without cloudflare to see whether there would be still any errors?

I can see that even some simple static files randomly return 502 (for example: https://git.areaq.xyz/assets/img/favicon.png, which only serves a static file with no git/db operation, but it can also be 502). I guess it's somehow related to the HTTP reverse proxy or Gitea's builtin HTTP server?
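One way to run the comparison suggested above is to request the same static file many times through each path (via Cloudflare and directly) and count the status codes. The sketch below is only an illustration: `fetch_status` and `tally_statuses` are names invented here, and the URLs in the trailing comment are placeholders rather than addresses from this thread.

```python
from collections import Counter
from typing import Callable
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is what we want.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout: nothing answered over HTTP.
        return 0


def tally_statuses(url: str, attempts: int,
                   fetch: Callable[[str], int] = fetch_status) -> Counter:
    """Request url `attempts` times and count how often each status code appears."""
    counts: Counter = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts


# Example usage (hypothetical addresses, replace with your own):
#   print(tally_statuses("https://git.example.com/assets/img/favicon.png", 50))
#   print(tally_statuses("http://10.0.0.10:3000/assets/img/favicon.png", 50))
```

If 502/504 only shows up in the tally for the proxied URL, that points at a layer in front of Gitea rather than at Gitea itself.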

@NadeemSadiq
Author

Odd, I redirected the web page in Kubernetes to expose it directly as well, and it seems not to crash at all when talking to it directly. I am a bit confused, since I do see events show up in Gitea when it crashes while going through the Cloudflare link. I think the issue may be on my ingress, but I'm not 100% sure. I will investigate a bit further to make sure the act runner is still connected after I changed the service, since it may not be connected correctly (hence the lack of crashing).

@wxiaoguang wxiaoguang added the issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail label Mar 3, 2025

@NadeemSadiq
Author

Sorry, it's been a little while. I had other, self-inflicted issues that forced me to set up the k3s cluster again. Sadly the issue is still there.

When I get a 504 error, I do see an event show up in the log at the time of the crash, but I don't see a request for the page that was trying to load:

2025/03/06 21:41:33 ...eb/routing/logger.go:102:func1() [I] router: completed POST /api/actions/runner.v1.RunnerService/FetchTask for 10.42.0.25:59068, 200 OK in 41.8ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)
2025/03/06 21:41:35 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 10.42.0.12:33454, 200 OK in 150059.1ms @ events/events.go:18(events.Events)
2025/03/06 21:41:35 ...eb/routing/logger.go:102:func1() [I] router: completed POST /api/actions/runner.v1.RunnerService/FetchTask for 10.42.0.25:59068, 200 OK in 42.1ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)
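To check which requests actually reached Gitea, the "router: completed" entries can be pulled apart mechanically. This is a rough sketch that assumes the log format matches the entries quoted above (durations always in `ms`); `RouterEntry` and `parse_router_log` are names made up for illustration, not part of Gitea.

```python
import re
from typing import NamedTuple


class RouterEntry(NamedTuple):
    method: str
    path: str
    client: str
    status: int
    millis: float


# Only matches "router: completed ..." entries shaped like the ones quoted above.
ROUTER_RE = re.compile(
    r"router: completed (?P<method>\S+) (?P<path>\S+) "
    r"for (?P<client>[^,\s]+), (?P<status>\d{3}) .*? in (?P<millis>[\d.]+)ms"
)


def parse_router_log(text: str) -> list[RouterEntry]:
    """Extract every completed-request entry found in the given log text."""
    return [
        RouterEntry(m["method"], m["path"], m["client"],
                    int(m["status"]), float(m["millis"]))
        for m in ROUTER_RE.finditer(text)
    ]
```

Filtering the result by `path` shows whether the page that failed in the browser ever appears in Gitea's log, and sorting by `millis` makes the 150-second `/user/events` long-poll stand out from the short `FetchTask` calls.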

This happens when I go through the traefik ingress. I looked into the traefik logs, but I get nothing. When I stop the act runner, it doesn't crash any more. I'm not sure why the act runner would have such an effect.

Any ideas? I can't think of anything else to check on Kubernetes, and the fact that I see the event show up during a 503 crash suggests Gitea is receiving the message, but I'm not sure how to verify this further.

@wxiaoguang
Contributor

It's highly likely related to the traefik ingress or k3s.

If Cloudflare reports 504 but you can connect to your local k3s/Gitea, it means Cloudflare is not able to create a TCP connection to your ingress's port. There could be various reasons, for example: exhausted system resources (connection number hard limit), firewall rules, etc.

Still, that's just a guess. And since the problem can only be reproduced through Cloudflare, I think it's not really related to Gitea.

@NadeemSadiq
Author

Yeah, the issue could be traefik related, but I'm unable to find any errors or anything else to show it. The only thing that makes me think it is a Gitea issue is that when it returns 504, there is an event created in the Gitea logs (not sure how I can access this event though), as seen in my previous post. If it were a traefik issue, I would expect Gitea not to show this event at all. It's also very odd that this happens only when the act runner is running.

@wxiaoguang
Contributor

wxiaoguang commented Mar 7, 2025

The only thing that makes me think it is a gitea issue is when it does 504, there is an event created in gitea logs

I do not think it is related, because these 50x errors happen randomly.

Suppose there are 3 requests to your k3s: if 2 succeed and 1 fails, you still see 2 entries in Gitea's log.

The key point is that the 50x errors happen "randomly". And Gitea never responds with any 50x error other than 500: every server failure reported by Gitea is "500 Internal Server Error".

502 "Bad Gateway" / 504 "Gateway Timeout" errors are from your "reverse proxies".
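The rule of thumb stated here can be written down as a small helper for triaging observed status codes. This is just a sketch of the reasoning in this comment; `likely_origin` is a hypothetical name, and a proxy could in principle also emit a 500 of its own.

```python
def likely_origin(status: int) -> str:
    """Guess which layer produced a status code, following the rule described above."""
    if status == 500:
        # Per the comment above, this is the only 50x Gitea itself sends
        # (assumption: no proxy in the chain generates its own 500).
        return "gitea"
    if 501 <= status <= 599:
        # 502 Bad Gateway, 504 Gateway Timeout, etc. come from a reverse proxy.
        return "reverse proxy"
    return "not a server error"
```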


Very odd that this happens only when act runner is running as well.

There could be various other reasons for the 502/504 problem: unstable network, exhausted system resources (connection number hard limit), firewall rules, etc.

@NadeemSadiq
Author

Hi, I showed an example of the event popping up right after I get the gateway error. Wouldn't this mean it is hitting Gitea? I'm not sure if there is any other data I can get to validate whether it's actually hitting Gitea or not.

event_showing_in_gitea_log.mp4

@wxiaoguang
Contributor

wxiaoguang commented Mar 10, 2025

No, it doesn't hit Gitea. The /user/events endpoint is for long-polling, but you are accessing /-/admin/xxx. There is no /-/admin/xxx request to Gitea in your screenshot.

Maybe you could try to disable the /user/events endpoint (related: Slow browsing on http2 enabled reverse proxy (apache2), long-polling /user/events blocks other requests #19265)

;; This setting determines how often the db is queried to get the latest notification counts.
;; If the browser client supports EventSource and SharedWorker, a SharedWorker will be used in preference to polling notification. Set to -1 to disable the EventSource
;EVENT_SOURCE_UPDATE_TIME = 10s

@NadeemSadiq
Author

NadeemSadiq commented Mar 10, 2025

git_pull_error.mp4

I removed the ingress (traefik) and I am now talking directly to the Gitea server. Web pages seem fine and I no longer see any errors on the page itself.

However, when pulling, I randomly get errors. See the video for an example of how git pull reacts when I stop the act runner service and when I start it.

@wxiaoguang
Contributor

"Failed to connect to ..." means a network error. Not Gitea's error. Gitea didn't receive that request.

Again, I suggest for the third time: check your network and system.
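A "Failed to connect" error happens before any HTTP request is sent, so it can be reproduced with a bare TCP probe that involves neither git nor Gitea's HTTP layer. The following is a minimal sketch under that assumption; `can_connect` and `count_failures` are illustrative names, and the host and port in the comment are placeholders.

```python
import socket
import time


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def count_failures(host: str, port: int, attempts: int, pause: float = 0.0) -> int:
    """Probe host:port `attempts` times and return how many probes failed."""
    failures = 0
    for _ in range(attempts):
        if not can_connect(host, port):
            failures += 1
        time.sleep(pause)
    return failures


# Example usage (placeholder address): run once with the act runner stopped
# and once with it started, then compare the failure counts.
#   print(count_failures("10.0.0.10", 3000, attempts=100, pause=0.2))
```

If probes fail while the runner is active, the request never reached Gitea, which matches the point made above about checking the network and the host system.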
