Add retries to apple push notifications #128

larkox · 2024-09-17T15:22:05Z

Summary

Add retries to apple notifications. This may solve some RequestError errors we see in the metrics.

Ticket Link

https://mattermost.atlassian.net/browse/MM-60572

marianunez · 2024-09-17T19:08:06Z

server/apple_notification_server.go

+	// Set the retry context timeout to a value that allow us to retry the notification
+	// the MAX RETRIES with exponential backoff. With default values, this timeout
+	// will be 5 seconds.
+	retryContextTimeout := me.sendTimeout / (MAX_RETRIES * 2)


The retry value would be 60 secs? So the timeout for the context here is 10 secs? Just wondering whether that is what you intended.

larkox · Sep 18, 2024

me.sendTimeout is 30s by default. MAX_RETRIES is 3.

30 / (3 * 2) = 5 seconds

So, the worst-case scenario with the defaults is:

The first attempt runs for 5 seconds and fails because the context exceeded the deadline

Wait 1 second

The second attempt runs for 5 seconds and fails because the context exceeded the deadline

Wait 2 seconds

The third attempt runs for 5 seconds and fails because the context exceeded the deadline

Return the error

That is a total of 18s, which fits well within the 30s timeout.

I can do something more hardcoded, or tweak the numbers if we prefer. I don't have a strong opinion on these.
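The arithmetic in this comment can be checked with a small Go sketch. It assumes the values quoted in the thread (30s send timeout, 3 attempts, waits of 1s and 2s); the function names are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// Values quoted in the discussion: a 30s send timeout and 3 attempts.
const (
	defaultSendTimeout = 30 * time.Second
	maxRetries         = 3
)

// perAttemptTimeout mirrors the expression in the diff:
// sendTimeout / (MAX_RETRIES * 2).
func perAttemptTimeout(sendTimeout time.Duration, retries int) time.Duration {
	return sendTimeout / time.Duration(retries*2)
}

// worstCase assumes every attempt runs until its deadline and that the
// waits between attempts double, starting at one second (1s, 2s, ...).
// No wait follows the final attempt.
func worstCase(attemptTimeout time.Duration, retries int) time.Duration {
	total := time.Duration(0)
	backoff := time.Second
	for attempt := 1; attempt <= retries; attempt++ {
		total += attemptTimeout
		if attempt < retries {
			total += backoff
			backoff *= 2
		}
	}
	return total
}

func main() {
	attempt := perAttemptTimeout(defaultSendTimeout, maxRetries)
	fmt.Println("per attempt:", attempt)
	fmt.Println("worst case:", worstCase(attempt, maxRetries))
}
```

With the defaults this gives 5 seconds per attempt and 18 seconds in total, matching the numbers in the comment.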

marianunez · Sep 18, 2024

Are we confident that 5 seconds is enough as the timeout for the Apple client call? It seems we are going from 30 secs per call to only 5 secs per call. In the dashboard we saw that a call could sometimes take up to 10 secs.

Just trying to understand why we are trying to fit all the retry logic within the originally configured per-send timeout.

agnivade · 2024-09-19T04:36:09Z

Agree. The sendTimeout is the timeout for a single send. It should not be made to limit the time including retries.

larkox · Sep 19, 2024

My worry is that 3 attempts of 30 seconds each add up to 90+ seconds of the server waiting for the push proxy response. The server probably also has a context deadline, so I am not sure we want to wait that long.

But if you feel strongly that we should just use the configured timeout for each retry, I am happy to do it.

agnivade · Sep 19, 2024

It's about semantics. The sendTimeout should be the timeout for a single request. If you want to have a separate timeout including all the retries, then that should be a separate config setting.

The server timeout is hardcoded to 30s currently.

larkox · Sep 19, 2024

I added a new config setting and now use both to handle this properly. I set it to 8 seconds, since it is the highest integer we can use that will fit comfortably in the 30 seconds (8*3 + 1 + 2 = 27s).
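A rough Go sketch of how the 8-second figure falls out of the 30-second budget. It assumes 3 attempts with waits of 1s and 2s between them, and reads "fit comfortably" as "strictly below the budget"; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	serverTimeout = 30 * time.Second // hardcoded server timeout mentioned in the thread
	maxRetries    = 3
)

// totalBackoff sums the waits between attempts, assuming they double
// from one second: 1s + 2s for three attempts.
func totalBackoff(retries int) time.Duration {
	total := time.Duration(0)
	wait := time.Second
	for i := 1; i < retries; i++ {
		total += wait
		wait *= 2
	}
	return total
}

// largestAttemptTimeout returns the largest whole-second per-attempt
// timeout whose worst case stays strictly below the budget.
func largestAttemptTimeout(budget time.Duration, retries int) time.Duration {
	if retries <= 0 {
		return 0
	}
	backoff := totalBackoff(retries)
	if budget <= backoff {
		return 0
	}
	perAttempt := ((budget - backoff) / time.Duration(retries)).Truncate(time.Second)
	// Step down one second when the total would land exactly on the budget.
	if perAttempt*time.Duration(retries)+backoff >= budget {
		perAttempt -= time.Second
	}
	if perAttempt < 0 {
		return 0
	}
	return perAttempt
}

func main() {
	perAttempt := largestAttemptTimeout(serverTimeout, maxRetries)
	total := perAttempt*maxRetries + totalBackoff(maxRetries)
	fmt.Println("per attempt:", perAttempt, "worst case:", total)
}
```

A 9-second timeout would give 9*3 + 1 + 2 = 30s, which lands exactly on the server deadline, so 8 seconds is the largest value that leaves some margin.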

agnivade

I think retries are better implemented with a roundTripper as done here: https://github.com/hashicorp/go-retryablehttp/blob/main/roundtripper.go. Or perhaps, we can simply use that library to do the retry.
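For context, a minimal sketch of what a retrying round-tripper can look like in Go. This is not the go-retryablehttp implementation; `retryTransport` and `flakyTransport` are hypothetical types, and the sketch only retries transport-level errors on requests without a body.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// retryTransport is a hypothetical http.RoundTripper that retries
// transport-level errors with a doubling wait between attempts.
type retryTransport struct {
	base     http.RoundTripper
	attempts int
	backoff  time.Duration
}

func (t *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	attempts := t.attempts
	if attempts < 1 {
		attempts = 1
	}
	wait := t.backoff
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			// Stop waiting early if the caller's context is done.
			select {
			case <-req.Context().Done():
				return nil, req.Context().Err()
			case <-time.After(wait):
			}
			wait *= 2
		}
		// Only body-less requests are safe to replay as-is; a request
		// with a body would need req.GetBody to rewind it first.
		resp, err := t.base.RoundTrip(req)
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// flakyTransport simulates a network that fails a fixed number of times.
type flakyTransport struct {
	failures int
	calls    int
}

func (f *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	f.calls++
	if f.calls <= f.failures {
		return nil, errors.New("simulated network hiccup")
	}
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
}

// callsUntilDone sends one GET through the retrying transport and
// reports how many times the underlying transport was invoked.
func callsUntilDone(failures, attempts int) (int, error) {
	flaky := &flakyTransport{failures: failures}
	client := &http.Client{Transport: &retryTransport{base: flaky, attempts: attempts, backoff: time.Millisecond}}
	resp, err := client.Get("http://example.invalid/")
	if err != nil {
		return flaky.calls, err
	}
	resp.Body.Close()
	return flaky.calls, nil
}

func main() {
	calls, err := callsUntilDone(2, 3)
	fmt.Println("calls:", calls, "err:", err)
}
```

Because the retry sits inside the HTTP client, the calling code stays unchanged, which is the appeal of this approach; the trade-off is that it works below whatever the APNs client library already does with the request.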

agnivade · 2024-09-19T04:36:09Z

server/apple_notification_server.go

+	// Set the retry context timeout to a value that allow us to retry the notification
+	// the MAX RETRIES with exponential backoff. With default values, this timeout
+	// will be 5 seconds.
+	retryContextTimeout := me.sendTimeout / (MAX_RETRIES * 2)


Agree. The sendTimeout is the timeout for a single send. It should not be made to limit the time including retries.

agnivade · 2024-09-19T04:38:46Z

Also, note that the server might return a 429: https://developer.apple.com/documentation/usernotifications/handling-notification-responses-from-apns#Interpret-header-responses, in which case we should slow down, and retrying further might worsen the condition. This will require usage of a rate limiter. Probably something to do as a future improvement.

larkox · 2024-09-19T07:38:57Z

@agnivade A 429 is handled by the library, and doesn't become an error, but a "not sent" response with a ReasonTooManyRequests reason. That would not get retried.
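The distinction described here, retrying on an error but not on a "not sent" response, can be sketched as follows. `pushResponse`, `sendFunc`, and the reason string are hypothetical stand-ins for the APNs client library's types, and the backoff between attempts is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// pushResponse is a hypothetical stand-in for the APNs client's response
// type; the real library's fields may differ.
type pushResponse struct {
	Sent   bool
	Reason string
}

// reasonTooManyRequests mirrors the ReasonTooManyRequests reason mentioned
// above; the string value is an assumption.
const reasonTooManyRequests = "TooManyRequests"

type sendFunc func() (*pushResponse, error)

var errNoAttempts = errors.New("no attempts were made")

// sendWithRetry retries only when send returns an error (for example a
// context deadline). Any response, including a "not sent" one such as a
// 429, is returned to the caller without another attempt.
func sendWithRetry(send sendFunc, maxRetries int) (*pushResponse, int, error) {
	lastErr := errNoAttempts
	attempts := 0
	for attempts < maxRetries {
		attempts++
		resp, err := send()
		if err == nil {
			return resp, attempts, nil
		}
		lastErr = err
	}
	return nil, attempts, lastErr
}

// flakySender fails the first `failures` calls, then returns final.
func flakySender(failures int, final *pushResponse) sendFunc {
	calls := 0
	return func() (*pushResponse, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("context deadline exceeded")
		}
		return final, nil
	}
}

func main() {
	throttled := &pushResponse{Sent: false, Reason: reasonTooManyRequests}
	resp, attempts, err := sendWithRetry(flakySender(0, throttled), 3)
	fmt.Println(resp.Reason, attempts, err)
}
```

Under these assumptions a throttled response ends the loop after a single attempt, so the retry logic does not add load when APNs asks the sender to slow down.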

larkox · 2024-09-19T07:52:25Z

> I think retries are better implemented with a roundTripper as done here: https://github.com/hashicorp/go-retryablehttp/blob/main/roundtripper.go. Or perhaps, we can simply use that library to do the retry.

My problem with this approach is that the library already does plenty of stuff that I don't want to mess with. So I don't need the retry at the HTTP client level, but at a higher level. My idea is mainly to retry when there is a network hiccup. Most of the errors we have seen like this are context deadline errors, which I don't think are automatically retried in these circumstances either.

agnivade · 2024-09-19T07:56:51Z

That's okay. Not a big deal.

marianunez

LGTM, thanks @larkox!

agnivade

One small thing to verify in config.

agnivade · 2024-09-20T04:16:28Z

server/config_push_proxy.go

+	if cfg.RetryTimeoutSec == 0 {
+		cfg.RetryTimeoutSec = 8
+	}
+
 	if cfg.EnableFileLog {


We should add a check here to verify that the retryTimeout isn't greater than the sendTimeout.
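One possible shape for such a check, as a sketch only. `proxyConfig` is a trimmed, hypothetical stand-in for the real config struct: `RetryTimeoutSec` and its default of 8 come from the diff above, while `SendTimeoutSec` and returning an error (rather than, say, clamping the value) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// proxyConfig is a hypothetical stand-in for the push proxy config.
// Only RetryTimeoutSec appears in the diff; SendTimeoutSec is assumed.
type proxyConfig struct {
	SendTimeoutSec  int
	RetryTimeoutSec int
}

var errRetryExceedsSend = errors.New("RetryTimeoutSec must not be greater than SendTimeoutSec")

// applyDefaultsAndValidate fills in zero values and rejects a per-attempt
// retry timeout that is longer than the overall send timeout.
func applyDefaultsAndValidate(cfg *proxyConfig) error {
	if cfg.SendTimeoutSec == 0 {
		cfg.SendTimeoutSec = 30 // default quoted in the discussion
	}
	if cfg.RetryTimeoutSec == 0 {
		cfg.RetryTimeoutSec = 8
	}
	if cfg.RetryTimeoutSec > cfg.SendTimeoutSec {
		return errRetryExceedsSend
	}
	return nil
}

func main() {
	cfg := &proxyConfig{SendTimeoutSec: 5, RetryTimeoutSec: 10}
	fmt.Println(applyDefaultsAndValidate(cfg))
}
```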

agnivade · 2024-09-20T04:22:38Z

server/apple_notification_server.go

+		start := time.Now()
+
+		retryContext, cancelRetryContext := context.WithTimeout(generalContext, me.retryTimeout)
+		defer cancelRetryContext()


Just a minor note: deferring inside a for loop has a slight problem, in that the deferred calls keep accumulating until the whole function exits. But since this loop has a hardcoded upper bound, this is okay. Otherwise, it would be recommended to cancel as soon as each iteration finishes.

larkox · Sep 24, 2024

For readability's sake, I feel it is better to keep it as is. As you said, the loop is bounded, so it should not be a big issue.
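For reference, the alternative mentioned in the note (releasing each context as soon as its iteration ends) can be sketched by wrapping the loop body in a function. This is an illustration, not the code in the PR; `runAttempts`, `attemptFunc`, and `earlierContextsReleased` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type attemptFunc func(ctx context.Context) error

// runAttempts calls try up to maxRetries times, each with its own timeout
// derived from parent, and stops at the first success.
func runAttempts(parent context.Context, retryTimeout time.Duration, maxRetries int, try attemptFunc) error {
	var lastErr error
	for i := 0; i < maxRetries; i++ {
		// The closure gives each iteration its own function scope, so the
		// deferred cancel runs when the attempt ends, not when
		// runAttempts returns.
		lastErr = func() error {
			ctx, cancel := context.WithTimeout(parent, retryTimeout)
			defer cancel()
			return try(ctx)
		}()
		if lastErr == nil {
			return nil
		}
	}
	return lastErr
}

// earlierContextsReleased reports whether every earlier attempt's context
// was already canceled by the time the next attempt started.
func earlierContextsReleased(attempts int) bool {
	var seen []context.Context
	released := true
	_ = runAttempts(context.Background(), time.Second, attempts, func(ctx context.Context) error {
		for _, old := range seen {
			if old.Err() == nil {
				released = false
			}
		}
		seen = append(seen, ctx)
		return errors.New("simulated failure")
	})
	return released
}

func main() {
	fmt.Println("earlier contexts released:", earlierContextsReleased(3))
}
```

With a fixed upper bound of three attempts, the difference from a plain `defer` in the loop is small, which is consistent with the decision to keep the simpler form.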

Add retries to apple push notifications

33ea5ac

larkox added the 1: Dev Review Requires review by a core commiter label Sep 17, 2024

larkox requested review from marianunez and enahum September 17, 2024 15:22

marianunez reviewed Sep 17, 2024

enahum requested a review from agnivade September 18, 2024 02:23

enahum approved these changes Sep 18, 2024

agnivade reviewed Sep 19, 2024

Add retry timeout config

6ec7013

marianunez approved these changes Sep 19, 2024

agnivade reviewed Sep 20, 2024

Add check for retry timeout being greater than send timeout

def0093

larkox requested a review from agnivade September 24, 2024 10:45

agnivade approved these changes Sep 24, 2024

larkox added 2: Reviews Complete All reviewers have approved the pull request and removed 1: Dev Review Requires review by a core commiter labels Sep 24, 2024

larkox merged commit 681c1cb into master Sep 24, 2024
5 checks passed

larkox deleted the appleRetry branch September 24, 2024 15:03
