Segfault once maxRecords is reached #10892

Open
mat1010 opened this issue Nov 30, 2023 · 5 comments
@mat1010

mat1010 commented Nov 30, 2023

We are running Trafficserver 9.2.3 and ran into an issue where trafficserver reached the maximum number of stats and records, which is set by maxRecords.

The reason for this is that we are also running podman containers on the same server. Every new container and every container restart changes the virtual network interfaces: a new container gets a new interface, and a restarted container removes its current interface and gets a new one with a new name. Every interface creates a new set of records, for example:

plugin.system_stats.net.vethfb0aaa00.speed 10000
plugin.system_stats.net.vethfb0aaa00.collisions 0
plugin.system_stats.net.vethfb0aaa00.multicast 0
plugin.system_stats.net.vethfb0aaa00.rx_bytes 71171126
plugin.system_stats.net.vethfb0aaa00.rx_compressed 0
plugin.system_stats.net.vethfb0aaa00.rx_crc_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_dropped 0
plugin.system_stats.net.vethfb0aaa00.rx_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_fifo_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_frame_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_length_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_missed_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_nohandler 0
plugin.system_stats.net.vethfb0aaa00.rx_over_errors 0
plugin.system_stats.net.vethfb0aaa00.rx_packets 983190
plugin.system_stats.net.vethfb0aaa00.tx_aborted_errors 0
plugin.system_stats.net.vethfb0aaa00.tx_bytes 133071338
plugin.system_stats.net.vethfb0aaa00.tx_carrier_errors 0
plugin.system_stats.net.vethfb0aaa00.tx_compressed 0
plugin.system_stats.net.vethfb0aaa00.tx_dropped 0
plugin.system_stats.net.vethfb0aaa00.tx_errors 0
plugin.system_stats.net.vethfb0aaa00.tx_fifo_errors 0
plugin.system_stats.net.vethfb0aaa00.tx_heartbeat_errors 0
plugin.system_stats.net.vethfb0aaa00.tx_packets 1912343
plugin.system_stats.net.vethfb0aaa00.tx_window_errors 0
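
To see how quickly interface churn eats into the record budget, the listing above can be grouped by interface name. The following is a minimal sketch, not part of Traffic Server: it assumes the stats are available as plain text in the "name value" form shown above (for example from `traffic_ctl metric match`, if that is how the listing was produced), and the function name `records_per_interface` is made up for illustration.

```python
from collections import Counter

# Prefix used by the system_stats plugin for per-interface records,
# taken from the listing in this issue.
PREFIX = "plugin.system_stats.net."


def records_per_interface(stats_text: str) -> Counter:
    """Count how many stat records each network interface contributes."""
    counts = Counter()
    for line in stats_text.splitlines():
        parts = line.split()
        # Expect exactly "<record name> <value>"; skip anything else.
        if len(parts) != 2 or not parts[0].startswith(PREFIX):
            continue
        # The remainder looks like "<iface>.<metric>". Split on the last dot,
        # since interface names may themselves contain dots.
        iface, sep, _metric = parts[0][len(PREFIX):].rpartition(".")
        if sep and iface:
            counts[iface] += 1
    return counts


sample = """\
plugin.system_stats.net.vethfb0aaa00.speed 10000
plugin.system_stats.net.vethfb0aaa00.rx_bytes 71171126
plugin.system_stats.net.vethfb0aaa00.tx_bytes 133071338
"""

for iface, count in records_per_interface(sample).items():
    print(iface, count)
```

Run against the full stats output, a long list of veth names that no longer exist on the host would indicate records left behind by removed containers.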

This would not be an issue if we could either purge the records themselves (not only their values) from time to time without restarting trafficserver, or if creating new stats simply failed with a corresponding log message once the limit is reached. Unfortunately, once the value of maxRecords is reached, trafficserver segfaults and does not recover by itself: the traffic_manager process is not killed, so systemd cannot handle the failure with the Restart=on-failure directive.
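
To make the two requested behaviours concrete, here is a small illustrative model of a record table with a fixed capacity. This is only a sketch of the idea and not how Traffic Server implements its records; the class `BoundedRegistry` and its methods are hypothetical names. Registration past the limit is refused with a log message instead of crashing, and stale records can be purged by prefix to free slots.

```python
import logging

log = logging.getLogger("records")


class BoundedRegistry:
    """Toy record table with a hard capacity (hypothetical, not ATS code)."""

    def __init__(self, max_records: int):
        self.max_records = max_records
        self._values = {}

    def register(self, name: str, value: int = 0) -> bool:
        """Create a record; return False (and log) if the table is full."""
        if name in self._values:
            return True  # already registered, nothing to do
        if len(self._values) >= self.max_records:
            log.warning("record limit %d reached, not creating %s",
                        self.max_records, name)
            return False
        self._values[name] = value
        return True

    def purge(self, prefix: str) -> int:
        """Remove all records whose name starts with prefix; return the count."""
        stale = [name for name in self._values if name.startswith(prefix)]
        for name in stale:
            del self._values[name]
        return len(stale)

    def __len__(self) -> int:
        return len(self._values)


registry = BoundedRegistry(max_records=2)
registry.register("plugin.system_stats.net.vethfb0aaa00.rx_bytes")
registry.register("plugin.system_stats.net.vethfb0aaa00.tx_bytes")
# The table is full, so this call is refused and logged rather than crashing.
accepted = registry.register("plugin.system_stats.net.veth0new.rx_bytes")
print("accepted:", accepted)
# Purging the records of the removed interface frees both slots again.
print("purged:", registry.purge("plugin.system_stats.net.vethfb0aaa00."))
```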

Is this a known issue, or is this the expected behaviour? Is it safe to increase the maxRecords limit to a huge number? What might be the drawbacks?

I attached the crash logs from systemd and trafficserver:
systemd.log
crash-2023-11-28-164715.log

Thanks in advance

@mat1010
Author

mat1010 commented Dec 4, 2023

Setting --maxRecords leads to other issues and causes the system_stats and remap_stats plugins to stop reporting at all.

@ezelkow1
Member

ezelkow1 commented Dec 4, 2023

Have you tried -m instead of --maxRecords, i.e. ExecStart=/opt/trafficserver/bin/traffic_manager -m 4096? We hit the max a couple of months back, and yeah, you start to see weird things happen and things break, but I used -m in our systemd script to increase the limit and it's been happy with that. Just throwing it out there, maybe some others will have better input :)
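
For sizing the limit, a rough back-of-the-envelope calculation may help. The sketch below assumes 25 records per interface (counted from the listing in the first post) and uses the 4096 from the example above; the number of records already in use is a made-up placeholder and would have to be measured on the actual server. The helper name `interfaces_until_limit` is hypothetical.

```python
# Records created per network interface, counted from the listing
# in the first post of this issue.
RECORDS_PER_INTERFACE = 25


def interfaces_until_limit(max_records: int, records_in_use: int,
                           per_interface: int = RECORDS_PER_INTERFACE) -> int:
    """How many more interfaces can appear before the record limit is hit."""
    free = max(max_records - records_in_use, 0)
    return free // per_interface


# 4096 is the -m value from the comment above; 1500 is a placeholder
# for the records already registered and must be replaced with a real count.
remaining = interfaces_until_limit(max_records=4096, records_in_use=1500)
print("new interfaces until the limit:", remaining)
```

Because records of removed interfaces are not purged, every container start or restart counts against this number, so a larger limit only delays the problem unless the process is restarted in between.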

@mat1010
Author

mat1010 commented Dec 4, 2023

Thank you @ezelkow1. Your comment pointed me in the right direction. Both -m and --maxRecords seem to work.
The issue was that my value was too high and my test was invalid. After each modification I checked for the existence of remap_stats in the stats output, but those are only populated once a request hits a remap rule. Since the server I was testing with is not active in the load balancing right now, remap_stats were never populated. Testing with system_stats worked.

@bryancall
Contributor

@mat1010 Did you get this issue resolved? If so, can you please close it.

@mat1010
Author

mat1010 commented Dec 5, 2023

@bryancall The main issue, mentioned in my first post, still exists. I'm not sure whether a segfault should be the expected result when the maxRecords limit is reached.
