Problem using SwitchML with PCI NIC virtual function #31

@kfertakis

Hi,

I am running the SwitchML allreduce_benchmarks on a cluster where some nodes have MLX5 NICs and others have Intel 82599 ES 10G NICs, so I'm using DPDK as the communication backend. I need to share the NIC on each host with other traffic, so I'm virtualizing it by creating a virtual function (VF) of the PCI device: the original device carries general-purpose traffic and the virtual device runs the SwitchML app. However, when I try to run SwitchML on the virtual device, I get the following error:

Submitting 5 warmup jobs.
EAL: Detected 20 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:10.1 on NUMA socket 0
EAL:   probe driver: 8086:10ed net_ixgbe_vf
EAL:   using IOMMU type 1 (Type 1)
E1116 19:08:30.807562 74629 dpdk_master_thread_utils.inc:277] Flow isolated mode failed: 1 Function not implemented
F1116 19:08:31.361418 74629 dpdk_master_thread_utils.inc:154] Flow rule can't be added: 1Function not implemented
*** Check failure stack trace: ***
    @     0x7f2291e280cd  google::LogMessage::Fail()
    @     0x7f2291e29f33  google::LogMessage::SendToLog()
    @     0x7f2291e27c28  google::LogMessage::Flush()
    @     0x7f2291e2a999  google::LogMessageFatal::~LogMessageFatal()
    @     0x564504203fb2  switchml::InsertFlowRule()
    @     0x5645042048ca  switchml::InitPort()
    @     0x564504205ae6  switchml::DpdkMasterThread::operator()()
    @     0x7f2291ae34c0  (unknown)
    @     0x7f22915766db  start_thread
    @     0x7f229015761f  clone

The fatal error seems to come from this check in the InsertFlowRule function: struct rte_flow_error error; LOG_IF(FATAL, rte_flow_validate(port_id, &attr, pattern, action, &error) != 0) << "Flow rule can't be added: " << error.type << (error.message ? error.message : "(no stated reason)"); Any ideas on why this is happening and whether I can work around it? Much appreciated.

Thank you.
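
For reference, here is a minimal sketch of how the fatal log line appears to decode. The check quoted above streams error.type directly in front of error.message with no separator, so "1Function not implemented" reads as type 1 followed by the message. The helper names below (FlowErrorTypeName, DescribeFlowError) are hypothetical and belong to neither SwitchML nor DPDK. The numeric mapping (0 = NONE, 1 = UNSPECIFIED) is an assumption that mirrors the start of DPDK's enum rte_flow_error_type and should be checked against rte_flow.h for the DPDK version in use.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of SwitchML or DPDK.
// Assumption: mirrors only the first two values of DPDK's
// enum rte_flow_error_type (NONE = 0, UNSPECIFIED = 1).
// Every other value is reported as "OTHER" rather than guessed.
std::string FlowErrorTypeName(int type) {
  switch (type) {
    case 0:
      return "NONE";
    case 1:
      return "UNSPECIFIED";
    default:
      return "OTHER";
  }
}

// Formats the same two fields the original LOG_IF streams
// (error.type and error.message), but with explicit separators
// so the type code and the message cannot run together.
std::string DescribeFlowError(int type, const char* message) {
  std::ostringstream out;
  out << "type=" << type << " (" << FlowErrorTypeName(type) << "), reason: "
      << (message ? message : "(no stated reason)");
  return out.str();
}
```

Calling DescribeFlowError(1, "Function not implemented") on the values from the log yields a line that separates the type code from the message. "Function not implemented" is the standard text for ENOSYS, which would be consistent with the net_ixgbe_vf driver not providing the flow operation being called, though that is an inference from the log and not something confirmed here.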
