[CSIT-1963] 3n-icxd: Various symptoms pointing to hardware (cable/nic/driver) issues #4044
Description
Sometimes TRex says [0] "trex.common.trex_exceptions.TRexError: action requires at least one port", or even [1] "trex.common.trex_exceptions.TRexError: Port 1 : *** please acquire the port before modifying port state".
Sometimes VPP says [2] "Interfaces still not in link-up state: ['TwentyFiveGigabitEthernet89/0/2']".
There may be other symptoms, depending on the test type and on when exactly the issue happens.
In the past, we have seen similar issues on another testbed; those got fixed by connecting the cable properly, and even then the failures were not very frequent.
On 3n-icxd (both testbeds), the failing tests actually outnumber the passing ones. As the failure frequency is similar on both, maybe the issue is not in the NIC/cable but in the driver?
Either way, this does not look like a VPP (or CSIT or TRex) issue.
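For orientation, below is a minimal sketch of the usual TRex client sequence in which errors like [0] and [1] surface. It assumes the trex.stl.api Python client and ports 0 and 1; it is not the actual CSIT driver code.
```python
# A minimal sketch of a typical TRex STL client sequence (assumed trex.stl.api
# usage and ports 0/1; NOT the actual CSIT driver code). The errors quoted
# above surface when this sequence finds no usable ports [0] or when port
# acquisition does not succeed before a port-state change [1].
from trex.stl.api import STLClient
from trex.common.trex_exceptions import TRexError

client = STLClient(server="localhost")
try:
    client.connect()                          # RPC handshake with the TRex server
    client.acquire(ports=[0, 1], force=True)  # ports must be acquired before changing port state
    client.reset(ports=[0, 1])                # clear streams and stats on the acquired ports
except TRexError as err:
    print(f"TRex reported: {err}")            # e.g. "action requires at least one port"
finally:
    if client.is_connected():
        client.disconnect()
```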
[0] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t2-k2-k11-k9-k13-k9-k2
[1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t1-k2-k11-k9-k13-k7-k2
[2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t6-k2-k6-k3-k2-k1
Assignee
Vratko Polak
Reporter
Vratko Polak
Comments
Maybe fixed in infra, need more runs to confirm both testbeds work reliably, not just sometimes. For rls2410 iterative results, the failures are still very frequent and can happen at any point; for example, here [5] the test failed only in the third latency trial.
[5] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-3n-icxd/33/log.html.gz#s1-s1-s1-s1-s18-t5-k2-k11-k28-k11
DPDK tests are also affected in rls2410. One more symptom [4] from a testpmd test: trex.common.trex_exceptions.TRexError: *** [RPC] - Failed to get server response from tcp://localhost:4501
[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2410-3n-icxd/4/log.html.gz#s1-s1-s1-s1-t3-k2-k5-k14
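As a side note, a quick way to tell whether the TRex server process is still listening when this RPC error shows up is a plain TCP connect to the RPC port. The sketch below is a generic reachability check (host and port taken from the error message); it is not part of CSIT or TRex.
```python
# Generic TCP reachability check for the TRex RPC port (assumed host/port
# from the error message; not part of CSIT or TRex code).
import socket

def rpc_port_reachable(host: str = "localhost", port: int = 4501, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the given port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("TRex RPC port reachable:", rpc_port_reachable())
```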
> the failing tests actually outnumber the passing ones
True for rls2406 iterative tests, but trending shows [3] that NDRPDR tests mostly pass, so maybe the frequency depends on the "idle time" before the run? Soak tests (almost) never pass, so test duration and trial length may also be factors.
[3] https://csit.fd.io/stats/#eNrLyk-yTS7OLNEtKyjQLUgtStPNSykqSCnSLU9Nzc6p1M1NLC5JLdI1ztPNTK5IAQCbihFa
Original issue: https://jira.fd.io/browse/CSIT-1963