
Is ConnectX-6 supported? #19

Open
jslouisyou opened this issue Dec 13, 2022 · 6 comments

@jslouisyou

Dear,

I'm currently using a Mellanox ConnectX-6 adapter (HPE InfiniBand HDR/Ethernet 200Gb 2-port QSFP56 PCIe4 x16 MCX653106A-HDAT Adapter) and trying to use sriov-network-metrics-exporter in a Kubernetes cluster, but none of the sriov-network-metrics-exporter pods can get metrics from the InfiniBand Physical & Virtual Functions when I run kubectl exec -it -n monitoring sriov-metrics-exporter-lj8rq -- wget -O- localhost:9808/metrics (sriov-metrics-exporter-lj8rq is the exporter deployed in the Kubernetes cluster).

BTW, I dug into the code in collectors/sriovdev.go, and netClass is only defined for 0x020000.

var (
	sysBusPci             = flag.String("path.sysbuspci", "/sys/bus/pci/devices", "Path to sys/bus/pci on host")
	sysClassNet           = flag.String("path.sysclassnet", "/sys/class/net/", "Path to sys/class/net on host")
	netlinkEnabled        = flag.Bool("collector.netlink", true, "Enable or disable use of netlink for VF stats collection in favor of driver specific collectors.")
	totalVfFile           = "sriov_totalvfs"
	pfNameFile            = "/net"
	netClassFile          = "/class"
	driverFile            = "/driver"
	netClass        int64 = 0x020000
	vfStatSubsystem       = "vf"
	sriovDev              = "vfstats"
	sriovPFs              = make([]string, 0)
)

It seems that, in the case of ConnectX-6, 0x020000 is used only for the Ethernet ports, while 0x020700 is used for the InfiniBand ports.
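For illustration, here is a minimal sketch of how the class check could accept both PCI classes instead of the single netClass value. This is not the exporter's current code; supportedClasses and isSupportedNetDevice are hypothetical names, and the standard /sys/bus/pci/devices/<addr>/class sysfs layout is assumed.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// Hypothetical set of accepted PCI classes: 0x020000 (Ethernet controller)
// and 0x020700 (InfiniBand controller).
var supportedClasses = map[int64]bool{
	0x020000: true,
	0x020700: true,
}

// isSupportedNetDevice reads /sys/bus/pci/devices/<addr>/class and reports
// whether the device class is one of the accepted ones.
func isSupportedNetDevice(sysBusPci, pciAddr string) (bool, error) {
	raw, err := os.ReadFile(filepath.Join(sysBusPci, pciAddr, "class"))
	if err != nil {
		return false, err
	}
	// The file contains a hex string such as "0x020700"; base 0 lets
	// ParseInt handle the 0x prefix.
	class, err := strconv.ParseInt(strings.TrimSpace(string(raw)), 0, 64)
	if err != nil {
		return false, err
	}
	return supportedClasses[class], nil
}

func main() {
	ok, err := isSupportedNetDevice("/sys/bus/pci/devices", "0000:c5:00.0")
	fmt.Println(ok, err)
}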

Here's my environment below - the ibs* interfaces are the InfiniBand ports and the ens* interfaces are the Ethernet ports.

$ mst status -v
.....
PCI devices:
------------
DEVICE_TYPE             MST                           PCI       RDMA            NET                       NUMA  
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf5      c5:00.0   mlx5_2          net-ibs20                 5     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf4.1    b0:00.1   mlx5_6          net-ibs21f1               6     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf4      b0:00.0   mlx5_5          net-ens21f0np0            6     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf3.1    af:00.1   mlx5_4          net-ibs22f1               6     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf3      af:00.0   mlx5_3          net-ens22f0np0            6     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf2      85:00.0   mlx5_7          net-ibs19                 7     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf1      45:00.0   mlx5_0          net-ibs18                 1     
ConnectX6(rev:0)        /dev/mst/mt4123_pciconf0      0e:00.0   mlx5_1          net-ibs17                 3     

and the ens* interfaces have the 0x020000 class when I check them as below:

$ cat /sys/bus/pci/devices/0000\:b0\:00.0/class -> for net-ens21f0np0
0x020000

$ cat /sys/bus/pci/devices/0000\:af\:00.0/class -> for net-ens22f0np0
0x020000

but all InfiniBand ports have the 0x020700 class:

$ cat /sys/bus/pci/devices/0000\:c5\:00.0/class -> for net-ibs20
0x020700

$ cat /sys/bus/pci/devices/0000\:b0\:00.1/class -> for net-ibs21f1
0x020700

... and so on

So I changed netClass from 0x020000 to 0x020700, and then sriov-network-metrics-exporter could find all IB PFs and VFs.
Before the change, the sriov-metrics-exporter pod showed that only the Ethernet adapters were picked up:

2022/12/13 06:15:08 The kubepoddevice collector is enabled
2022/12/13 06:15:08 The vfstats collector is enabled
2022/12/13 06:15:08 listening on :9808
2022/12/13 06:15:26 using netlink for ens22f0np0
2022/12/13 06:15:26 PerPF called for ens22f0np0
2022/12/13 06:15:26 using netlink for ens21f0np0
2022/12/13 06:15:26 PerPF called for ens21f0np0
2022/12/13 06:15:56 using netlink for ens22f0np0
2022/12/13 06:15:56 PerPF called for ens22f0np0
2022/12/13 06:15:56 using netlink for ens21f0np0
2022/12/13 06:15:56 PerPF called for ens21f0np0
...

After the change, the sriov-metrics-exporter pod picks up the IB adapters:

2022/12/13 07:39:38 The vfstats collector is enabled
2022/12/13 07:39:38 The kubepoddevice collector is enabled
2022/12/13 07:39:38 listening on :9808
2022/12/13 07:39:39 using netlink for ibs21f1
2022/12/13 07:39:39 PerPF called for ibs21f1
2022/12/13 07:39:39 using netlink for ibs20
2022/12/13 07:39:39 PerPF called for ibs20
2022/12/13 07:39:39 using netlink for ibs17
2022/12/13 07:39:39 PerPF called for ibs17
2022/12/13 07:39:39 using netlink for ibs18
2022/12/13 07:39:39 PerPF called for ibs18
2022/12/13 07:39:39 using netlink for ibs19
2022/12/13 07:39:39 PerPF called for ibs19
2022/12/13 07:39:39 using netlink for ibs22f1
2022/12/13 07:39:39 PerPF called for ibs22f1
...

But all metrics except sriov_kubepoddevice show 0 in Prometheus, even when I attach the VFs to each of 2 pods and run ib_send_bw between them.

# HELP sriov_vf_tx_packets Statistic tx_packets.
# TYPE sriov_vf_tx_packets counter
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.1",pf="ibs18",vf="0"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.2",pf="ibs18",vf="1"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.3",pf="ibs18",vf="2"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.4",pf="ibs18",vf="3"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.5",pf="ibs18",vf="4"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.6",pf="ibs18",vf="5"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:00.7",pf="ibs18",vf="6"} 0
sriov_vf_tx_packets{numa_node="1",pciAddr="0000:45:01.0",pf="ibs18",vf="7"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.1",pf="ibs17",vf="0"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.2",pf="ibs17",vf="1"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.3",pf="ibs17",vf="2"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.4",pf="ibs17",vf="3"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.5",pf="ibs17",vf="4"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.6",pf="ibs17",vf="5"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:00.7",pf="ibs17",vf="6"} 0
sriov_vf_tx_packets{numa_node="3",pciAddr="0000:0e:01.0",pf="ibs17",vf="7"} 0
...

I think these PFs are not recognized by the current pkg/vfstats/netlink.go.
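For context, here is a minimal sketch of how per-VF counters are typically read over rtnetlink with the github.com/vishvananda/netlink package, which I understand is roughly what the netlink-based collector relies on (a recent version of that library is assumed for the VfInfo counter fields). If the IB PF's driver does not populate the VF info list over rtnetlink, the counters come back empty or zeroed, which would match the all-zero metrics above.

package main

import (
	"fmt"
	"log"

	"github.com/vishvananda/netlink"
)

func main() {
	// Illustrative only: dump rtnetlink VF counters for one IB PF.
	link, err := netlink.LinkByName("ibs18")
	if err != nil {
		log.Fatalf("failed to look up link: %v", err)
	}
	// LinkAttrs.Vfs is filled from the kernel's per-VF info list; if the
	// driver does not report VF stats for IB ports, this loop prints
	// nothing or only zero counters.
	for _, vf := range link.Attrs().Vfs {
		fmt.Printf("vf %d: tx_packets=%d rx_packets=%d tx_bytes=%d rx_bytes=%d\n",
			vf.ID, vf.TxPackets, vf.RxPackets, vf.TxBytes, vf.RxBytes)
	}
}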

So, is ConnectX-6 supported by the current sriov-network-metrics-exporter? And if not, are there any plans to support ConnectX-6 later?

Thanks!

@SchSeba
Collaborator

SchSeba commented Dec 18, 2022

@Eoghan1232 I think right now it's not supported, but it should be with the net implementation you are working on, right?

@eoghanlawless
Collaborator

Hey @jslouisyou,

The current version does not officially support Mellanox InfiniBand interfaces, though with the Netlink collector enabled and your device class change, it might work.

We are planning to support Mellanox cards with the latest Mellanox EN and OFED drivers.

@jslouisyou
Author

Thanks @eoghanlawless.
Are there any specific plans or a release date?

@eoghanlawless
Collaborator

We have a few changes coming soon, but haven't looked at implementing Mellanox InfiniBand support just yet.

The next release should be in the new year, and the following release should include InfiniBand support.

@jslouisyou
Author

@eoghanlawless
Hello, I recently noticed that #24 (the new 1.0 version) is going to be updated soon.
Does it include Mellanox InfiniBand (e.g. ConnectX-6) support?

Thanks.

@Eoghan1232
Collaborator

Hi @jslouisyou - the 1.0 version focuses on common functionality across vendors and prioritizes Ethernet as the common use case; Mellanox-specific stats for InfiniBand are outside the current scope.

Intel does not currently support InfiniBand and has no way to validate its functionality.

The metrics exporter provides an extensible interface for others to contribute, which could include InfiniBand support.
