Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differentiate status codes from functions vs OpenFaaS components #1792

Open
alexellis opened this issue May 10, 2023 · 2 comments
Open

Differentiate status codes from functions vs OpenFaaS components #1792

alexellis opened this issue May 10, 2023 · 2 comments

Comments

@alexellis
Copy link
Member

Description

Differentiate status codes from functions vs OpenFaaS components

Why do you need this?

When invoking functions or attempting to retry them, it's currently hard to differentiate between an error caused by Kubernetes (i.e. node eviction during an invocation) or by a function (i.e. a 429 because the Twitter API that was being used is overloaded)

Who is this for?

@kevin-lindsay-1 for Surge has requested this - but Waylay also wanted this for their integration cc @OcamsRazor

Expected Behaviour

A way to determine whether a 500 error was from the gateway / watchdog or from the function

Current Behaviour

There are some hints depending on the message body and headers, however no consistency right now.

List All Possible Solutions and Workarounds

  1. Add a header when handling an error condition in any of the OpenFaaS components
  2. Add a header when the watchdog receives a HTTP response from a function

Which Solution Do You Recommend?

I recommend 1 - because 2 depends on the use of the watchdog, which is not used by all users.

For 1 - the gateway needs a change since it can invoke functions directly. The provider also needs a change for when direct_functions is set to false and invocations flow through it instead. The watchdog should also have a change so that if it handles an error that can be passed back up the stream.

Headers do support multiple values for a key, i.e. X-OpenFaaS-Source: [watchdog, gateway]

For 1 - the watchdog does not need to be updated, and even when it's not in use, this header would still propagate and flow.

Then there's a wider conversation about how the queue-worker should retry errors when the "X-OpenFaaS-Source" header is present - assuming that these need to be retried due to an error during scaling - or node eviction.

@kevin-lindsay-1
Copy link
Sponsor

kevin-lindsay-1 commented May 10, 2023

I think that options 1 and 2 should both be included, and that option 2 should be opt-in for the sake of backwards compatibility.

Here's a truth table for describing the cases using only option 1:

source x-openfaas-component
infrastructure/function
gateway gateway
queue worker queue-worker
watchdog watchdog
multiple queue-worker; gateway

As you can see, in this approach you don't know if the response is coming from infrastructure (non-openfaas, non-function. k8s, istio, etc) or from the function.

Here's another table using only option 2:

source x-openfaas-component
infrastructure/gateway/queue worker/watchdog
function function

This option only lets you know if the response came from a function. All cases other than the function are effectively unknown as to what the source was.

Here's a final table using both:

source x-openfaas-component
infrastructure
gateway gateway
queue worker queue-worker
watchdog watchdog
function function
multiple queue-worker; gateway

This approach allows you know the most; it allows you to know if the response came from inside openfaas, the user's function, or some other unknown/infrastructural source.


As for whether or not option 1 should be "only when an OF component has an error", I would like to clarify that I don't think that's specifically necessary to call out, as the source of a response should effectively be invisible to the user unless there's an error. In other words, there's no specific reason why the x-openfaas-component cannot be set when the response is not an error, but in normal operating procedure, the other components are not responding, they are proxying, and therefore the response itself isn't coming from them.

If it matters, you could have a header stating every OF component that "touched" this request along the way, but that's more of a debug feature in my opinion.

@alexellis
Copy link
Member Author

Summarised from Kevin on weekly call:

Responses coming from OpenFaaS components should be retried according to a default retry policy - i.e. defined in Helm chart.

Responses not coming from OpenFaaS components would be retried according to the configured policy on the component i.e. queue-worker / connector.

Responses coming from function would use the retry configuration set through annotations on the function. If no annotations are set the function would not be retried. This functionality should be opt-in and can be turned on via a parameter in the chart. This would need the watchdog update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants