Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor #8120

JackOfMostTrades · 2021-04-27T16:18:54Z

What version of gRPC-Java are you using?

1.37.0

What is your environment?

Linux, OpenJDK 1.8
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

What did you expect to see?

In a bidirectional streaming service, I'm observing different a status passed to ServerCall.close(Status, Metadata) versus the status passed to ServerStreamTracer.streamClosed(Status) when a misbehaving client is abruptly terminating the connection (without sending any sort of cancel or reset).

What did you see instead?

The server has a logging interceptor which intercepts the ServerCall.close(Status, Metadata) method and logs the status. The status seen here is Status{code=CANCELLED, description=Cancelling request because of error from client., cause=null}. We also have a ServerStreamTracer installed on the server, but the Status passed to ServerStreamTracer.streamClosed(Status) is Status{code=UNAVAILABLE, description=connection terminated for unknown reason, cause=null}.

Steps to reproduce the bug

I have a minimal reproducer with some instructions here.

Comments

If the client is terminating the connection then I suppose there really is no true status (either from the server or from the client). So I could be convinced that either of these Status codes would be valid. What really threw off our metrics and sent a couple of engineers down a debugging rabbit hole is that we couldn't correlate any request logs on the server (being logged from the ServerInterceptor) with the status metrics being emitted from the ServerStreamTracer. So I think regardless of what Status is ultimately reported for a abruptly closed connection, it should at least be consistent between these two sources.

This may also be related to #7558 .

The text was updated successfully, but these errors were encountered:

JackOfMostTrades · 2021-04-27T16:49:32Z

Edit, updated ticket to reflect inconsistency between ServerStreamTracer and ServerInterceptor (rather than StreamObserver as previously mentioned).

voidzcy · 2021-04-27T18:38:02Z

On the server call layer, it does not preserve the error/reason of the call being terminated. The call is either finished normally (client side half-close) or cancelled (anything other than normal half-close is considered as cancellation). Server call canceling does not propagate the error/reason (ServerCall.Listener.onCancel()).

gRPC is based on HTTP/2, which is designed for client-server communication model (instead of peer-to-peer). It is inherently asymmetric on the two sides. From a server application's perspective, anything happens in the middle of a HTTP request that causes the request to be terminated is a cancellation, no matter if it is caused by timeout, explicit cancellation by the client, network errors, etc. Like HTTP servers, server applications usually do not care the reason of requests being cancelled.

ServerStreamTracer works in a lower level than the server call. It directly interacts with the stream and HTTP traffic. So it knows what's happening on the network and the status seen by ServerStreamTracer tells the real state of the network.

What really threw off our metrics and sent a couple of engineers down a debugging rabbit hole is that we couldn't correlate any request logs on the server (being logged from the ServerInterceptor) with the status metrics being emitted from the ServerStreamTracer.

Seen by consumers from different perspective (server application vs. traffic monitoring), the two layers present different levels of state information. Server calls and streams are of a one-on-one mapping, but the status they deliver is minimally correlated (as server calls throw away the status information and anything other than normal termination is a cancellation).

JackOfMostTrades · 2021-04-27T21:33:21Z

Thanks for the insights @voidzcy . I think I'm understanding the different sources of the different Status values now. If I'm understanding it correctly, I think I have a follow up question/request, which is that it would be nice inside ServerStreamTracer.streamClosed() to be able to discern if the Status actually from the server itself (i.e. a result of the server calling streamObserver.onError(Status.fromCode(Status.Code.UNAVAILABLE).asException())) in which case the status is actually assed verbatim through to ServerStreamTracer.streamClosed() versus a Status that's derived from a underlying transport issue (in the case of the example I posted, it's coming from NettyServerHandler.channelInactive()). We can obviously inspect the description on the status, but that feels brittle.

voidzcy · 2021-04-28T02:04:34Z

The Status that pops through ServerStreamTracer.streamClosed() intends to keep the original cause that leads to the stream being terminated. So it should be reasonable to carry the Status from server's outbound StreamObserver.onError().

Status is being used general-purposely for carrying end-of-life information. The Status code itself doesn't mean to indicate anything about whether something is happening locally or from the network. The description/cause of Status does play a significant role and should not be ignore. There is a convention that we follow and also suggested to users: never use a Status without description/cause.

voidzcy · 2021-05-16T23:26:27Z

Closing, no action is needed for the current work-as-intended behaviors. We would be improving some doc according to the discussion after #7558.

JackOfMostTrades changed the title ~~Inconsistent server-side error in ServerStreamTracer vs StreamObserver~~ Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor Apr 27, 2021

voidzcy closed this as completed May 16, 2021

github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor #8120

Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor #8120

JackOfMostTrades commented Apr 27, 2021 •

edited

Loading

JackOfMostTrades commented Apr 27, 2021

voidzcy commented Apr 27, 2021 •

edited

Loading

JackOfMostTrades commented Apr 27, 2021 •

edited

Loading

voidzcy commented Apr 28, 2021

voidzcy commented May 16, 2021

Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor #8120

Inconsistent server-side error in ServerStreamTracer vs ServerInterceptor #8120

Comments

JackOfMostTrades commented Apr 27, 2021 • edited Loading

What version of gRPC-Java are you using?

What is your environment?

What did you expect to see?

What did you see instead?

Steps to reproduce the bug

Comments

JackOfMostTrades commented Apr 27, 2021

voidzcy commented Apr 27, 2021 • edited Loading

JackOfMostTrades commented Apr 27, 2021 • edited Loading

voidzcy commented Apr 28, 2021

voidzcy commented May 16, 2021

JackOfMostTrades commented Apr 27, 2021 •

edited

Loading

voidzcy commented Apr 27, 2021 •

edited

Loading

JackOfMostTrades commented Apr 27, 2021 •

edited

Loading