Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue339 try4 part1 fail connections in main loop #1247

Conversation

petersilva
Copy link
Contributor

replaced #1244 by cherry picking just the relevant commits and adding them to an uptodate development tree... #1244 was getting very confusing with > 20 commits of mixes of the two branches. With this restart, it's only a handful of commits, much easier to review.

goal: If we have multiple broker connections, then we can't loop waiting to connect to each one... because every time
one broker connection goes down, it will just loop there.

  • old behaviour: getSetup and putSetup loop waiting to get a connection.
  • new behaviour: getSetup and putSetup fail and return.
    main loop then calls them again.

This is also consistent with the pattern of trying to have a single place where the code sleeps.

The only change in this code is to remove the loop (which outdented the entire routine by 4... sigh...) changing "break" to "return" in a few places, and removing the sleeps.

So that's the first few patches... later it dawned that even if you loop from mainline... when you do a connection attempt and it hangs... you will spend all your time looping and hanging. To get some free time (so other connections can be used.) added exponential back off to connection attempts. That's the rest of the commits. Look at how long it took for the connection to fail, and then back off for at least twice that long.

note that when failures are quick, they can take variable amounts of time, so the ebo might not continuously increase because the backoff period is a multiple of how long it took to fail, so if it fails faster, it might want to retry sooner.

you get an odd pattern of retries... that generally does the right thing, but looks wrong...

@petersilva petersilva marked this pull request as ready for review October 3, 2024 20:33
Copy link

github-actions bot commented Oct 3, 2024

Test Results

249 tests   247 ✅  1m 34s ⏱️
  1 suites    1 💤
  1 files      1 ❌

For more details on these failures, see this check.

Results for commit b4deee3.

@petersilva
Copy link
Contributor Author

people are worried about this one. idea is to defer until .57 to stablize the changes we have in .56 and get that out the door first.

@petersilva
Copy link
Contributor Author

I'm going to do this over again later... with cleaner patches.

@petersilva petersilva closed this Oct 25, 2024
@petersilva petersilva deleted the issue339_try4_part1_fail_connections_in_main_loop branch November 28, 2024 20:36
@petersilva petersilva restored the issue339_try4_part1_fail_connections_in_main_loop branch November 28, 2024 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant