[CSIT-1963] 3n-icxd: Various symptoms pointing to hardware (cable/nic/driver) issues #4044

vvalderrv · 2025-02-04T22:47:02Z

Description

Sometimes TRex says [0] "trex.common.trex_exceptions.TRexError: action requires at least one port", or even [1] "trex.common.trex_exceptions.TRexError: Port 1 : *** please acquire the port before modifying port state".

Sometimes VPP says [2] "Interfaces still not in link-up state: ['TwentyFiveGigabitEthernet89/0/2']".

There may be other symptoms, depending on test type and exact time the issue happens.
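When triaging many failed runs, it may help to bucket failure messages by which of the symptoms above they match. The following is only a minimal sketch: the patterns are copied from the messages quoted in this description, while the labels and the function names `classify` and `tally` are hypothetical and not part of CSIT, VPP, or TRex.

```python
import re
from collections import Counter

# Patterns copied from the messages quoted in this issue.
# The labels (dictionary keys) are hypothetical names chosen for this sketch.
SYMPTOMS = {
    "trex_no_port": re.compile(r"action requires at least one port"),
    "trex_port_not_acquired": re.compile(
        r"please acquire the port before modifying port state"
    ),
    "vpp_link_down": re.compile(r"Interfaces still not in link-up state"),
}


def classify(message):
    """Return the label of the first matching symptom, or "other"."""
    for label, pattern in SYMPTOMS.items():
        if pattern.search(message):
            return label
    return "other"


def tally(messages):
    """Count how many messages fall into each symptom bucket."""
    return Counter(classify(message) for message in messages)


if __name__ == "__main__":
    sample = [
        "trex.common.trex_exceptions.TRexError: action requires at least one port",
        "Interfaces still not in link-up state: ['TwentyFiveGigabitEthernet89/0/2']",
    ]
    for label, count in tally(sample).items():
        print(label, count)
```

Messages that match none of the known patterns land in the "other" bucket, which is where any additional symptoms would first show up.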

In the past, we have seen similar issues on another testbed, which were fixed by connecting the cable properly, but even then the failures were not very frequent.

On 3n-icx (both of them), the failing tests actually outnumber the passing ones. As the frequency is similar on both testbeds, maybe the issue is not in the NIC or cable but in the driver?

Either way, this does not look like a VPP (or CSIT or TRex) issue.

[0] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t2-k2-k11-k9-k13-k9-k2

[1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t1-k2-k11-k9-k13-k7-k2

[2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icxd/58/log.html.gz#s1-s1-s1-s1-s18-t6-k2-k6-k3-k2-k1

Assignee

Vratko Polak

Reporter

Vratko Polak

Comments

vrpolak (Fri, 15 Nov 2024 09:56:22 +0000): Maybe fixed in infra; more runs are needed to confirm that both testbeds work reliably, not just sometimes.

For rls2410 iterative results, the failures are still very frequent and can happen at any point. For example, here [5] the test failed only in the third latency trial.

[5] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-3n-icxd/33/log.html.gz#s1-s1-s1-s1-s18-t5-k2-k11-k28-k11

vrpolak (Wed, 6 Nov 2024 14:21:42 +0000): DPDK tests are also affected in rls2410. One more symptom [4], from a testpmd test:

trex.common.trex_exceptions.TRexError: *** [RPC] - Failed to get server response from tcp://localhost:4501

[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2410-3n-icxd/4/log.html.gz#s1-s1-s1-s1-t3-k2-k5-k14

vrpolak (Fri, 26 Jul 2024 14:27:52 +0000): > the failing tests actually outnumber the passing ones

True for rls2406 iterative tests, but trending shows [3] NDRPDR tests mostly pass, so maybe the frequency depends on "idle time" before the run? Soak tests (almost) never pass, so test duration and trial length may also be factors.

[3] https://csit.fd.io/stats/#eNrLyk-yTS7OLNEtKyjQLUgtStPNSykqSCnSLU9Nzc6p1M1NLC5JLdI1ztPNTK5IAQCbihFa

Original issue: https://jira.fd.io/browse/CSIT-1963

vrpolakatcisco · 2025-03-17T11:56:03Z

While the failures seem less frequent than before, some jobs are still largely affected. Release data shows [6] that MRR tests are affected the least, while trending shows [7] that soak tests are mostly failing.
Symptoms are similar to #4018 but even more frequent. For example, here [8] the "trex.common.trex_exceptions.TRexError: action requires at least one port" symptom appeared late in the main search.

[6] https://csit.fd.io/report/#eNrtlk1OwzAQhU8TNmgQmdiEDYuW3AO59pRGTRrLNhHl9DhV0DRCQaiotAsv8vtexpP59CT70Dl68dQ8ZXKZlcsMy9rEU1YsbuPFNR7lPUJvLaC8i3eOGlKeoNhBrd9NfPtKaHN6xEKD6tdQWwEPYgW5Bgqb4Skeq-ET36ntUBqfx9Lf1mHVvAVW4-oTpSfH4qQtttnNnj1zzbJdOVLsjx2zFMgf9TL9J3atnWrJ1x_E1jgF1nWcM0u5ntYPe3ukfg2qrA6Wv1Ix1mx_wNI6d0VUhmavFMs4qP-hYk2i8isq46DOQQVTVuap4KWygikrp1I5Y1ZEyso8FXGprIiUlVOpcFZkdbPrXHvYIcvqE7Ys_T4
[7] https://csit.fd.io/stats/#eNrLyk-yTS7OLNFNKUjJ1i1ILUrTzS0q0i1PTc3OqdTNTSwuSS3SNc7TzUyuSAEAdFIQbg
[8] https://logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2502-3n-icxd/42/log.html.gz#s1-s1-s1-s4-s1-t1-k2-k9-k14
