[CSIT-1947] Rare VPP crash in nat avf tests #4029
Comments
... and so is the clib_dlist_remove symptom [7].
1C2T failure is also possible, seen in soak [5].
In rls2406 I see this happening also in TCP (small scale CPS AVF 4C); I assume it is the same issue. Core [3] points to nat44_ed_in2out_fast_path_node_fn_inline.
Seems hard to reproduce in verify jobs. So far I got one crash [2] with a debug image, but the mechanism is not clear to me yet.
I do not see this crash in rls2502 results, but it still occasionally happens [8] in periodic jobs (without core).
Description
So far seen only in small scale CPS tests, originally UDP and 4C only. [0] [1]
Probably not a duplicate of CSIT-1901: although that issue also involves 4C and AVF, it needs high traffic.
Also unlikely to be a duplicate of CSIT-1937: although that issue also involves UDP, it appears on a different NIC and driver and only causes a small packet drop, not a crash.
RC1 testing is in progress, so I will try to get core dumps later.
[0] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-spr/45/log.html.gz#s1-s1-s1-s2-s8-t3-k3-k5-k1-k1
[1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-spr/45/log.html.gz#s1-s1-s1-s2-s22-t3-k3-k5-k1-k1
Assignee
Unassigned
Reporter
Vratko Polak
Comments
[7] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-2n-icx/51/log.html.gz#s1-s1-s1-s2-s4-t1-k3-k4-k1
The nat44_ed_in2out_fast_path_node_fn_inline symptom is still present [6] in rls2410. Ticket VPP-2117 may describe the same underlying cause as a different symptom.
[6] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-2n-icx/35/log.html.gz#s1-s1-s1-s2-s10-t3-k3-k4-k1
[5] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-2n-clx/94/log.html.gz#s1-s1-s1-s2-s5-t1-k3-k4
Also seen on non-small scale and only 2C. Core [4] points to clib_dlist_remove called by nat44_session_update_lru (slow path). I still assume this is all just one issue that somehow corrupts NAT state, so the crash does not happen in a single place.
Still happening only rarely, most iterative runs have no failure.
[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-2n-clx/79/log.html.gz#s1-s1-s1-s2-s37-t2-k3-k4-k1
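To illustrate why corrupted list state can crash far from where the corruption happened, here is a minimal sketch of an index-linked doubly linked list. This is not VPP code: `elt_t`, `list_init`, `list_remove`, `demo_double_remove`, `POOL_SIZE` and `INVALID_INDEX` are hypothetical names, and the assumption is only that elements are linked by pool indices and that a removed element gets its links set to an invalid index. Under that assumption, a second removal of the same element (for example via a stale session reference) follows invalid links, and an unchecked implementation would touch memory outside the pool.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_INDEX UINT32_MAX /* hypothetical "not linked" marker */
#define POOL_SIZE 4              /* hypothetical fixed pool size */

/* Hypothetical list element: links are pool indices, not pointers. */
typedef struct {
    uint32_t prev;
    uint32_t next;
} elt_t;

/* Link all elements into one circular list: 0 <-> 1 <-> 2 <-> 3 <-> 0. */
static void list_init(elt_t *pool)
{
    for (uint32_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next = (i + 1) % POOL_SIZE;
        pool[i].prev = (i + POOL_SIZE - 1) % POOL_SIZE;
    }
}

/*
 * Unlink element `index`. Returns 0 on success, or -1 if the element's
 * links are out of range. An unchecked version would instead index the
 * pool with those links and read or write unrelated memory.
 */
static int list_remove(elt_t *pool, uint32_t index)
{
    assert(index < POOL_SIZE);
    elt_t *elt = &pool[index];

    if (elt->prev >= POOL_SIZE || elt->next >= POOL_SIZE)
        return -1; /* stale or corrupted element detected */

    pool[elt->next].prev = elt->prev;
    pool[elt->prev].next = elt->next;
    elt->prev = elt->next = INVALID_INDEX;
    return 0;
}

/* Returns 1 if the first removal succeeds and the second is rejected. */
static int demo_double_remove(void)
{
    elt_t pool[POOL_SIZE];
    list_init(pool);

    int first = list_remove(pool, 2);  /* valid removal */
    int second = list_remove(pool, 2); /* same element again: links invalid */
    return first == 0 && second == -1;
}
```

In this sketch the fault shows up at the second removal, not at whatever earlier step left a stale reference behind, which would be consistent with seeing crashes in more than one function.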
[3] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-2n-clx/72/log.html.gz#s1-s1-s1-s2-s20-t3-k3-k4-k1
[2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/vpp-csit-verify-perf-master-ubuntu2204-x86_64-2n-spr/37/csit_current/0/log.html.gz#s1-s1-s1-s1-s1-t1-k3-k4-k1
Original issue: https://jira.fd.io/browse/CSIT-1947