Interface state change causes other nexthop group to get deleted #16896

Closed
2 tasks done
fzakfeld opened this issue Sep 23, 2024 · 3 comments
Labels
triage Needs further investigation

Comments

@fzakfeld

fzakfeld commented Sep 23, 2024

Description

I have a setup with two links that are using BGP to receive a route. My routing table looks like this:

# ip r
default via 10.123.168.1 dev enp131s0 proto dhcp src 10.123.168.184 metric 100
10.123.168.0/22 dev enp131s0 proto kernel scope link src 10.123.168.184 metric 100
10.123.168.1 dev enp131s0 proto dhcp scope link src 10.123.168.184 metric 100
10.123.172.0/23 nhid 32 proto bgp src 10.123.172.184 metric 20
	nexthop via inet6 fe80::b66a:d4ff:fe76:49e9 dev enp193s0f0np0 weight 1
	nexthop via inet6 fe80::b66a:d4ff:fe76:55e9 dev enp193s0f1np1 weight 1

enp193s0f1np1 and enp193s0f0np0 are unnumbered BGP neighbours. FRR receives the prefix 10.123.172.0/23 and installs it in the routing table. enp131s0 is a management connection which receives routes via DHCP.

When enp131s0 goes down, the route 10.123.172.0/23 is also somehow withdrawn.

I can see this in the bgpd/zebra log:

Sep 23 10:19:10 compute005 bgpd[5487]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF enp131s0 in VRF 0
Sep 23 10:19:10 compute005 bgpd[5487]: [YNQCS-MR20J][EC 100663301] INTERFACE_VRF_UPDATE: Cannot find IF enp131s0 in VRF 0
Sep 23 10:19:10 compute005 zebra[5482]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: No such file or directory, type=RTM_DELNEXTHOP(105), seq=31, pid=3654003404
Sep 23 10:19:10 compute005 zebra[5482]: [YA619-S7J5M][EC 4043309075] Failed to uninstall Nexthop ID (15) from the kernel
Sep 23 10:19:10 compute005 zebra[5482]: [N2AGF-TSFB9][EC 4043309102] Kernel deleted a nexthop group with ID (34) that we are still using for a route, sending it back down

I am not sure what the cause is. It may be noteworthy that this only occurs when I have multiple neighbours, so I assume it has something to do with nexthop groups. Any help is appreciated.

Version

FRRouting 8.1 (compute005).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

How to reproduce

  • Have one host with two BGP neighbours and a third interface for management
  • Receive a prefix via both peers
  • Set the third, non-BGP interface down
  • Observe routing and nexthop table
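
The steps above can be sketched as a small shell script. This is only a rough sketch, assuming the interface name and prefix from this report; the helper names (`route_present`, `route_nhid`, `reproduce`) are made up for illustration, and the two-second wait is an arbitrary guess. It needs root and takes the management link down, so it does nothing unless called with `--run`.

```shell
#!/bin/sh
# Hypothetical helper names; interface and prefix values come from this report.
MGMT_IF="${MGMT_IF:-enp131s0}"
PREFIX="${PREFIX:-10.123.172.0/23}"

# Succeeds if the prefix given as $1 is the first field of any line on stdin.
route_present() {
    awk -v p="$1" '$1 == p { found = 1 } END { exit !found }'
}

# Prints the nexthop group ID (nhid) of the route for prefix $1, if it has one.
route_nhid() {
    awk -v p="$1" '$1 == p { for (i = 2; i < NF; i++) if ($i == "nhid") print $(i + 1) }'
}

# Takes the management interface down and reports whether the BGP route survived.
# Needs root and will disrupt connectivity over the management link.
reproduce() {
    echo "before: nhid $(ip route show | route_nhid "$PREFIX")"
    ip nexthop show
    ip link set dev "$MGMT_IF" down
    sleep 2   # arbitrary wait (assumption) to let the daemons react
    if ip route show | route_present "$PREFIX"; then
        echo "after: $PREFIX still installed"
    else
        echo "after: $PREFIX missing"
    fi
    ip nexthop show
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Running it as `sh repro.sh --run` (the file name is arbitrary) prints the nexthop table before and after the link goes down, which makes it easy to compare the IDs with those in the zebra log.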

For reference, here is my config:

!
frr version 8.1
frr defaults traditional
hostname compute005
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 4222000184
 bgp router-id 10.123.172.184
 no bgp ebgp-requires-policy
 bgp bestpath as-path multipath-relax
 neighbor enp193s0f1np1 interface remote-as external
 neighbor enp193s0f0np0 interface remote-as external

 !
 address-family ipv4 unicast
  redistribute connected route-map bgp_out
  neighbor enp193s0f1np1 route-map bgp_out out
  neighbor enp193s0f0np0 route-map bgp_out out
  maximum-paths 4
 exit-address-family
exit
!
route-map bgp_out permit 10
 match interface dummy0
exit
!
route-map RM_SET_SRC4 permit 10
 set src 10.123.172.184
exit
!
ip protocol bgp route-map RM_SET_SRC4
!
end

I checked whether this is related to BGP unnumbered. Changing the neighbours to use IPv4 addresses did not help.

Expected behavior

I expect the prefix 10.123.172.0/23 to still be installed if one or both nexthops are still present and up.

Actual behavior

Prefix 10.123.172.0/23 is deleted from the routing table.

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
@fzakfeld fzakfeld added the triage Needs further investigation label Sep 23, 2024
@ne-vlezay80
Contributor

Please provide linux kernel version.

@donaldsharp
Member

This has been fixed in newer versions of FRR. Please try the latest master.

@fzakfeld
Author

I found out that this is most likely related to systemd/systemd#29034 and not to FRR/zebra.
