Expose statistics about packet loss #112

Open
carver opened this issue Sep 6, 2023 · 0 comments
Comments

@carver
Contributor

carver commented Sep 6, 2023

When packet loss rates get high, the client may choose to handle things in different ways (like skipping a transfer altogether, or delaying it). But it's hard for the client to tell from the outside that this is the situation. It might be worth exposing some kind of statistics to the client about packet loss rates.

Packet loss would be tracked as the percentage of sent packets that time out, maybe as an exponential moving average (EMA), both to smooth the number and to minimize tracking costs.
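A minimal sketch of the EMA idea, in Python to match the test pseudocode in this issue. The class name `PacketLossTracker`, the `record` method, and the `alpha` smoothing factor are all assumptions made for illustration; only `packet_loss_ratio()` is a name taken from the issue. Each sent packet contributes a sample of 1.0 if it timed out and 0.0 otherwise, so the tracker stores a single float regardless of how many packets have been sent.

```python
class PacketLossTracker:
    """Hypothetical tracker: packet loss as an exponential moving average."""

    def __init__(self, alpha: float = 0.05) -> None:
        # alpha is an assumed smoothing factor; larger values react faster.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._ratio = 0.0  # no loss observed yet

    def record(self, timed_out: bool) -> None:
        """Fold the outcome of one sent packet into the average."""
        sample = 1.0 if timed_out else 0.0
        self._ratio += self.alpha * (sample - self._ratio)

    def packet_loss_ratio(self) -> float:
        """Current smoothed loss ratio, between 0.0 and 1.0."""
        return self._ratio


tracker = PacketLossTracker(alpha=0.5)
for timed_out in (True, False, False):
    tracker.record(timed_out)
print(tracker.packet_loss_ratio())  # 0.125
```

The update is constant time and constant memory per packet, which is the "minimize tracking costs" part. The trade-off is that the ratio depends on the choice of `alpha` and starts from an arbitrary initial value.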

Originally posted by @carver in #87 (comment)

One interesting way to test this would be to add a test that launches, say, 20k connections as quickly as possible. When actually routed over UDP, this would demolish the current implementation with packet loss. (It can just barely handle 2k.) The test would watch the new API for some packet loss threshold, and pause launching the next connection until the stats improve.

Test pseudocode:

for _ in range(20_000):
  while socket.packet_loss_ratio() > 0.15:
    sleep(0.05)
  spawn(connect_and_send(socket, data, ...))
  sleep(0.0001) # give a tiny bit of time to start collecting data, but still keep the pressure high

Make sure the test fails now (where packet_loss_ratio() is hard-coded to 0). Then start tracking and reporting, to confirm that the test starts passing.
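To make the shape of that test concrete, here is a runnable simulation of the pseudocode using `asyncio`. It is only a sketch: `FakeSocket`, its `capacity` parameter, the load-based loss model, and `launch_with_backpressure` are all invented for illustration and say nothing about how the real socket behaves. Only `packet_loss_ratio()` and `connect_and_send` are names taken from the pseudocode above.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for the real socket; loss grows with load."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed number of connections handled cleanly
        self.in_flight = 0
        self.peak_in_flight = 0

    def packet_loss_ratio(self) -> float:
        # Simulated model: no loss up to capacity, then the overloaded share.
        overload = max(0, self.in_flight - self.capacity)
        return overload / max(1, self.in_flight)

    async def connect_and_send(self, data: bytes) -> None:
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # pretend the transfer takes a while
        finally:
            self.in_flight -= 1


async def launch_with_backpressure(
    sock: FakeSocket, total: int, threshold: float = 0.15
) -> int:
    """Launch `total` transfers, pausing while reported loss is too high."""
    tasks = []
    for _ in range(total):
        while sock.packet_loss_ratio() > threshold:
            await asyncio.sleep(0.001)
        tasks.append(asyncio.create_task(sock.connect_and_send(b"payload")))
        await asyncio.sleep(0)  # yield so the new task starts and is counted
    await asyncio.gather(*tasks)
    return len(tasks)


if __name__ == "__main__":
    sock = FakeSocket(capacity=50)
    launched = asyncio.run(launch_with_backpressure(sock, total=200))
    print(launched, sock.peak_in_flight)
```

In this toy model the throttle keeps the number of concurrent transfers close to `capacity`. A socket whose `packet_loss_ratio()` always returned 0 would never pause the loop, which is the situation the real test is expected to fail in.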

The 15% packet loss threshold comes from targeting roughly one failed connection in 20,000, i.e. a per-connection failure probability of 0.00005. A connection fails only if the loss happens 6 times in a row (6 is the default number of connection attempts). Treating those losses as independent, the loss rate p must satisfy p^6 = 0.00005, which gives p of about 19%. So the test tries to stay a bit under that number.
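The arithmetic behind that threshold can be checked in a few lines. This assumes, as the paragraph above does, that the 6 losses are independent; the variable names are made up for the example.

```python
# Target: about one failed connection out of 20,000.
target_failure_probability = 1 / 20_000  # 0.00005
connection_attempts = 6  # default number of attempts, per the issue text

# A connection fails only if every attempt is lost: p ** 6 == target.
max_loss_ratio = target_failure_probability ** (1 / connection_attempts)
print(f"{max_loss_ratio:.3f}")  # 0.192

# Per-connection failure probability at the 15% threshold used in the test.
failure_at_threshold = 0.15 ** connection_attempts
print(failure_at_threshold < target_failure_probability)  # True
```

At a 15% loss rate the per-connection failure probability is comfortably below the 1-in-20,000 target, which is the margin the test relies on.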


Separately, this statistic is something that we would love to collect & report in logs for trin.
