R2S rousette crash on boot if LAN not connected #751

Open
minexn opened this issue Oct 22, 2024 · 6 comments · May be fixed by #771
Labels
bug Something isn't working

Comments


minexn commented Oct 22, 2024

Current Behavior

The system booted with WAN connected and LAN disconnected.

lan             ethernet   DOWN        b2:b2:41:23:71:82                        
                ipv4                   192.168.2.1/24 (static)
wan             ethernet   UP          b2:b2:41:23:71:83                        
                ipv4                   10.10.3.101/24 (dhcp)

Rousette keeps crashing

Oct 22 06:47:02 r2s rousette[1883]: [2024-10-22 06:47:02.245] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:47:02 r2s rousette[1883]: [2024-10-22 06:47:02.248] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 
Oct 22 06:47:07 r2s finit[1]: Service rousette keeps crashing, not restarting.

The system booted with WAN and LAN connected.

lan             ethernet   UP          b2:b2:41:23:71:82                        
                ipv4                   192.168.2.1/24 (static)
wan             ethernet   UP          b2:b2:41:23:71:83                        
                ipv4                   10.10.3.101/24 (dhcp)

Rousette starts and responds to queries.

Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.645] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.662] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 

Expected Behavior

Rousette starts and responds to queries.

Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.645] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.662] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 

Steps To Reproduce

1. Load v24.10.1
2. Unplug LAN
3. Reboot
4. Check the log

Additional information

Factory configuration

minexn added the bug and triage labels Oct 22, 2024

troglobit commented Oct 23, 2024

Reproduced on my R2S:

Oct 23 05:35:26 r2s finit[1]: Service rousette[2080] died, restarting in 5000 msec (10/10)
Oct 23 05:35:27 r2s finit[1]: Starting rousette[2163]
Oct 23 05:35:27 r2s rousette[2163]: [2024-10-23 05:35:27.538] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 23 05:35:27 r2s rousette[2163]: [2024-10-23 05:35:27.541] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 
Oct 23 05:35:27 r2s rousette[2163]: terminate called after throwing an instance of 'std::runtime_error'
Oct 23 05:35:27 r2s rousette[2163]:   what():  Server error: Host not found (authoritative)
Oct 23 05:35:32 r2s finit[1]: Service rousette keeps crashing, not restarting.

troglobit removed the triage label Oct 23, 2024
troglobit commented

Workaround, as suggested by @mattiaswal, helps. Add an empty "ietf-ip:ipv6" container to the wan interface in startup-config.cfg:

admin@r2s:/cfg$ diff backup.cfg startup-config.cfg 
--- backup.cfg
+++ startup-config.cfg
@@ -39,7 +39,8 @@
       },
       {
         "name": "wan",
-        "type": "infix-if-type:ethernet"
+        "type": "infix-if-type:ethernet",
+        "ietf-ip:ipv6": {}
       }
     ]
   },

troglobit commented

If I try to mimic the same setup in QEMU, using the x86_64 build, by disabling IPv6 on all Ethernet interfaces, I cannot reproduce the problem. Very odd; I need to discuss this further with @mattiaswal.

troglobit commented

After discussions with @mattiaswal and the rest of the core team, we decided yesterday to check whether this is also an issue with the standard aarch64 builds on tier one customer HW (Marvell CRB derivatives).

These tests were concluded this morning, without any problems.

So, it seems this issue is localized to the R2S build.


sgsx3 commented Oct 26, 2024

Also had that issue with rousette bailing out with:

rousette[1957]: terminate called after throwing an instance of 'std::runtime_error'
rousette[1957]: what(): Server error: Host not found (authoritative)

Turns out that Boost.Asio is not willing to resolve a numeric IPv6 host (::1) because its resolver flags default to address_configured, which only returns IPv6 addresses if a non-loopback IPv6 address is configured on the system.
See https://www.boost.org/doc/libs/1_83_0/doc/html/boost_asio/reference/ip__resolver_base.html for more info.
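
For anyone wanting to see the flag behaviour outside of rousette: Boost.Asio's address_configured and numeric_host correspond to the POSIX getaddrinfo() flags AI_ADDRCONFIG and AI_NUMERICHOST, so a rough probe can be written with plain libc. This is only a sketch; resolve_status is a made-up helper name, and any port passed to it is a placeholder, not the port rousette listens on.

```cpp
#include <cassert>
#include <netdb.h>
#include <string>
#include <sys/socket.h>

// Hypothetical helper (not part of Boost or nghttp2): resolve host:port
// with the given AI_* flags and return getaddrinfo()'s status code,
// which is 0 on success and an EAI_* value on failure.
int resolve_status(const std::string &host, const std::string &port, int flags) {
  assert(!port.empty());  // an empty service string is not a useful probe

  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;      // accept both IPv4 and IPv6 results
  hints.ai_socktype = SOCK_STREAM;  // TCP, as used by the HTTP server
  hints.ai_flags = flags;

  addrinfo *result = nullptr;
  const int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
  if (rc == 0) {
    freeaddrinfo(result);  // only the status is of interest here
  }
  return rc;
}
```

Calling resolve_status("::1", "8080", AI_NUMERICHOST) should return 0, since the literal is parsed without consulting interface state. Comparing that with resolve_status("::1", "8080", AI_ADDRCONFIG) on an affected board may show whether the resolver flags are what trips it up; the result of that second call depends on the libc and on which addresses are configured.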

The following patch resolved it for me:

--- nghttp2-asio-e877868abe06a83ed0a6ac6e245c07f6f20866b5/lib/asio_server.cc
+++ nghttp2-asio-e877868abe06a83ed0a6ac6e245c07f6f20866b5/lib/asio_server.cc
@@ -82,8 +82,13 @@ boost::system::error_code server::bind_and_listen(boost::system::error_code &ec,
   // Open the acceptor with the option to reuse the address (i.e.
   // SO_REUSEADDR).
   tcp::resolver resolver(io_service_pool_.get_io_service());
+
   tcp::resolver::query query(address, port);
   auto it = resolver.resolve(query, ec);
+  if (ec) {
+    tcp::resolver::query query(address, port, boost::asio::ip::resolver_query_base::numeric_host);
+    it = resolver.resolve(query, ec);
+  }
   if (ec) {
     return ec;
   }
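
The retry idea in the patch can also be sketched standalone with getaddrinfo(), assuming AI_ADDRCONFIG as the stand-in for Boost's default and AI_NUMERICHOST for the fallback. resolve_with_fallback, AddrInfoPtr and AddrInfoDeleter are hypothetical names, not part of nghttp2-asio. Both attempts write to the same result variable, so whatever runs after the retry sees the addresses from the fallback.

```cpp
#include <cassert>
#include <memory>
#include <netdb.h>
#include <string>
#include <sys/socket.h>

// Frees the linked list returned by getaddrinfo() when the owner goes away.
struct AddrInfoDeleter {
  void operator()(addrinfo *p) const { freeaddrinfo(p); }
};
using AddrInfoPtr = std::unique_ptr<addrinfo, AddrInfoDeleter>;

// Hypothetical helper: first resolve with AI_ADDRCONFIG, then retry
// with AI_NUMERICHOST if that fails. Returns an empty pointer when both
// attempts fail.
AddrInfoPtr resolve_with_fallback(const std::string &host, const std::string &port) {
  assert(!port.empty());

  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_ADDRCONFIG;

  addrinfo *result = nullptr;  // shared by both attempts on purpose
  int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
  if (rc != 0) {
    // Retry, treating the host strictly as a numeric address literal.
    hints.ai_flags = AI_NUMERICHOST;
    result = nullptr;
    rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
  }
  return AddrInfoPtr(rc == 0 ? result : nullptr);
}
```

With this shape, resolve_with_fallback("::1", "8080") yields an AF_INET6 entry whether or not the first attempt succeeds, while a host name that fails both attempts yields an empty pointer.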

troglobit commented

Nice catch! Do you think you could try and get this patch in upstream so we can use a backport of that in Infix? A bit unsure of the state of that upstream though, do you know more @mattiaswal?

sgsx3 added a commit to sgsx3/infix that referenced this issue Oct 28, 2024
The boost library refuses to resolve a numeric IPv6 host (::1) because its resolver flags are set to 'address_configured' by default.
This patch simply runs an additional query in such a case with flags set to 'numeric_host'.

See https://www.boost.org/doc/libs/1_83_0/doc/html/boost_asio/reference/ip__resolver_base.html for more info.

Fixes kernelkit#751

Signed-off-by: Stefan Schlosser <[email protected]>
sgsx3 linked a pull request Oct 28, 2024 that will close this issue