[CSIT-1963] 3n-icxd: Various symptoms pointing to hardware (cable/nic/driver) issues #4044

Open
vvalderrv opened this issue Feb 4, 2025 · 4 comments

Comments

@vvalderrv
Contributor

Description

Sometimes TRex says [0] "trex.common.trex_exceptions.TRexError: action requires at least one port", or even [1] "trex.common.trex_exceptions.TRexError: Port 1 : *** please acquire the port before modifying port state".

Sometimes VPP says [2] "Interfaces still not in link-up state: ['TwentyFiveGigabitEthernet89/0/2']".

There may be other symptoms, depending on the test type and the exact time the issue occurs.

In the past, we have seen similar issues on another testbed, which were fixed by connecting the cable properly, but even then the failures were not very frequent.

On 3n-icx (both of them), the failing tests actually outnumber the passing ones. Since the frequency is similar on both testbeds, maybe the issue is not in the NIC or cable but in the driver?

Either way, this does not look like a VPP (or CSIT or TRex) issue.

[0] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t2-k2-k11-k9-k13-k9-k2

[1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t1-k2-k11-k9-k13-k7-k2

[2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t6-k2-k6-k3-k2-k1
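Since the same underlying problem shows up under several different messages, a small triage helper can make it easier to count how often each symptom occurs. The following is only a sketch: the substrings are copied from the messages quoted in this issue (including the RPC one reported in the comments), while the labels and the function names `classify` and `tally` are hypothetical and not part of CSIT, TRex, or VPP.

```python
import re
from collections import Counter

# Substrings taken from the messages quoted in this issue.
# The labels on the left are hypothetical names chosen for this sketch.
SYMPTOMS = [
    ("trex_no_ports", re.compile(r"action requires at least one port")),
    ("trex_port_not_acquired",
     re.compile(r"please acquire the port before modifying port state")),
    ("vpp_link_down", re.compile(r"Interfaces still not in link-up state")),
    ("trex_rpc_no_response",
     re.compile(r"Failed to get server response from")),
]


def classify(message):
    """Return the label of the first matching symptom, or "other"."""
    for label, pattern in SYMPTOMS:
        if pattern.search(message):
            return label
    return "other"


def tally(messages):
    """Count how many messages fall under each symptom label."""
    return Counter(classify(m) for m in messages)
```

Feeding it the failure messages extracted from a run would give a per-symptom count, which may help tell whether one symptom dominates on a given testbed.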

Assignee

Vratko Polak

Reporter

Vratko Polak

Comments

  • vrpolak (Fri, 15 Nov 2024 09:56:22 +0000): Maybe fixed in infra, need more runs to confirm both testbeds work reliably, not just sometimes.

For rls2410 iterative results, the failures are still very frequent and can happen at any point. For example, here [5] the test failed only in the third latency trial.

[5] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-3n-icxd/33/log.html.gz#s1-s1-s1-s1-s18-t5-k2-k11-k28-k11

  • vrpolak (Wed, 6 Nov 2024 14:21:42 +0000): DPDK tests are also affected in rls2410. One more symptom [4], from a testpmd test:

trex.common.trex_exceptions.TRexError: *** [RPC] - Failed to get server response from tcp://localhost:4501

[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2410-3n-icxd/4/log.html.gz#s1-s1-s1-s1-t3-k2-k5-k14

  • vrpolak (Fri, 26 Jul 2024 14:27:52 +0000): > the failing tests actually outnumber the passing ones

True for rls2406 iterative tests, but trending shows [3] NDRPDR tests mostly pass, so maybe the frequency depends on "idle time" before the run? Soak tests (almost) never pass, so test duration and trial length may also be factors.

[3] https://csit.fd.io/stats/#eNrLyk-yTS7OLNEtKyjQLUgtStPNSykqSCnSLU9Nzc6p1M1NLC5JLdI1ztPNTK5IAQCbihFa
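To check whether the failure frequency really depends on the test type (NDRPDR versus soak, for example), the pass/fail outcomes could be grouped per type. This is a minimal sketch that assumes the results are already available as `(test_type, passed)` pairs; the function name `failure_rate_by_type` and that input shape are hypothetical, not an existing CSIT interface.

```python
from collections import defaultdict


def failure_rate_by_type(results):
    """Compute the fraction of failed tests for each test type.

    results: iterable of (test_type, passed) pairs, where passed is a bool.
    Returns a dict mapping test_type to a failure rate between 0.0 and 1.0.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for test_type, passed in results:
        totals[test_type] += 1
        if not passed:
            failures[test_type] += 1
    return {t: failures[t] / totals[t] for t in totals}
```

The same grouping could be applied to other candidate factors mentioned above, such as trial length or idle time before the run, provided that information is recorded alongside each result.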

Original issue: https://jira.fd.io/browse/CSIT-1963

@vrpolakatcisco
Contributor

vrpolakatcisco commented Mar 17, 2025
