@jorgecuesta
Contributor

Summary

  • Preserves original HTTP status codes from backend endpoints instead of transforming them based on JSON-RPC error codes
  • Allows clients to receive accurate HTTP status information (e.g., 429 Too Many Requests, 503 Service Unavailable) from backend services
  • Updates RequestQoSContext interface to include httpStatusCode parameter across all QoS implementations (EVM, Solana, Cosmos, NoOp)

Test plan

  • Unit tests pass (make test_unit)
  • Linting passes (make go_lint)
  • E2E tests pass for eth service (make e2e_test eth) - 85% success rate

Addresses memory exhaustion issues causing 12GB RAM OOM crashes:

1. Add 100MB request body size limits (supports Solana's ~75MB blocks)
2. Cap endpoint observations per request (uses MaxConcurrentRelaysPerRequest)
3. Reduce WebSocket observation channel buffer from 1000 to 50
4. Add hydrator graceful shutdown with context cancellation
5. Add 30s timeouts to hydrator operations
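Item 1 above can be sketched with the standard library's `http.MaxBytesReader`. This is a minimal illustration, not the code from this PR: `limitBody`, `readAll`, and `statusFor` are hypothetical names, and the 413 response is an assumed policy. `*http.MaxBytesError` requires Go 1.19 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxRequestBodyBytes mirrors the 100MB limit named in the list above.
const maxRequestBodyBytes = 100 << 20

// limitBody (hypothetical) wraps next so that reading more than limit
// bytes from the request body fails instead of buffering without bound.
func limitBody(limit int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, limit)
		next.ServeHTTP(w, r)
	})
}

// readAll (hypothetical) stands in for a relay handler that reads the
// whole body. Answering 413 on an oversized body is an assumption.
func readAll(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "read %d bytes", len(body))
}

// statusFor sends a body of bodySize bytes through a handler capped at
// limit bytes and returns the resulting HTTP status code.
func statusFor(limit int64, bodySize int) int {
	payload := strings.NewReader(strings.Repeat("x", bodySize))
	req := httptest.NewRequest(http.MethodPost, "/", payload)
	rec := httptest.NewRecorder()
	limitBody(limit, http.HandlerFunc(readAll)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(16, 8))                  // body under the cap: 200
	fmt.Println(statusFor(16, 64))                 // body over the cap: 413
	fmt.Println(statusFor(maxRequestBodyBytes, 8)) // production-sized cap: 200
}
```

Applying the cap in a wrapper keeps the limit in one place, so individual handlers cannot forget it.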

Additional fixes completing the OOM prevention release:

- 2.3 Session rollover: Add context for graceful shutdown of block height monitor
- 2.4 Observation goroutines: Add 30s timeout to prevent indefinite hanging
- 2.5 time.After leak: Replace with time.NewTimer + defer Stop()
- 2.6 WebSocket cleanup: Close client connection if endpoint connection fails
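Items 2.4 and 2.5 describe a common Go pattern, sketched below under assumptions. `waitForResult` and `errTimeout` are hypothetical names, not identifiers from this PR. On Go versions before 1.23, a timer created by `time.After` is not released until it fires, so a long timeout in a hot path can keep many timers alive. An explicit timer with a deferred `Stop` avoids that.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout (hypothetical) is returned when no result arrives in time.
var errTimeout = errors.New("timed out waiting for result")

// waitForResult (hypothetical) waits for one value on results, giving up
// after timeout. The timer is stopped on every return path, so it does
// not linger when the result arrives early.
func waitForResult(results <-chan string, timeout time.Duration) (string, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case r := <-results:
		return r, nil
	case <-timer.C:
		return "", errTimeout
	}
}

func main() {
	// A result is already available, so the 30s timer is stopped at once.
	ready := make(chan string, 1)
	ready <- "observation"
	got, err := waitForResult(ready, 30*time.Second)
	fmt.Println(got, err)

	// Nothing is ever sent, so the short timeout fires.
	_, err = waitForResult(make(chan string), 10*time.Millisecond)
	fmt.Println(err)
}
```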

Per JSON-RPC 2.0 spec (https://www.jsonrpc.org/specification), responses
with null IDs are valid for error cases when the server couldn't parse
the request ID. This is documented in Section 5 - Response object:
"If there was an error in detecting the id in the Request object
(e.g. Parse error/Invalid Request), it MUST be Null."

Changes:
- Update validateResponseIDs to treat null ID responses as "wildcards"
  that can match unmatched request IDs
- Update createResponseObservations to skip null ID responses gracefully
  with debug logging instead of error logging
- Downgrade "could not find request for response ID" from error to warn
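The wildcard matching described above could look roughly like the following sketch. It is not the PR's `validateResponseIDs`: the `response` type, `parseBatch`, `isNullID`, and `validateIDs` are hypothetical, request IDs are assumed to be compared as raw JSON text, and tolerating surplus null-ID responses is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response (hypothetical) keeps the ID as raw JSON so that null can be
// told apart from a real value.
type response struct {
	ID json.RawMessage `json:"id"`
}

// parseBatch decodes a JSON-RPC batch response into response values.
func parseBatch(raw string) ([]response, error) {
	var out []response
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, err
	}
	return out, nil
}

// isNullID reports whether the ID is missing or JSON null.
func isNullID(id json.RawMessage) bool {
	return len(id) == 0 || string(id) == "null"
}

// validateIDs checks that every request ID is answered. Each null-ID
// response acts as a wildcard covering one otherwise unmatched request.
func validateIDs(requestIDs []string, responses []response) error {
	pending := make(map[string]bool, len(requestIDs))
	for _, id := range requestIDs {
		pending[id] = true
	}

	wildcards := 0
	for _, resp := range responses {
		if isNullID(resp.ID) {
			wildcards++
			continue
		}
		key := string(resp.ID)
		if !pending[key] {
			return fmt.Errorf("response ID %s matches no pending request", key)
		}
		delete(pending, key)
	}

	if len(pending) > wildcards {
		return fmt.Errorf("%d request(s) left without a response", len(pending)-wildcards)
	}
	return nil
}

func main() {
	resps, err := parseBatch(`[{"id":1},{"id":null}]`)
	if err != nil {
		panic(err)
	}
	// Request 2 is covered by the null-ID response.
	fmt.Println(validateIDs([]string{"1", "2"}, resps))
	// One wildcard cannot cover both request 2 and request 3.
	fmt.Println(validateIDs([]string{"1", "2", "3"}, resps))
}
```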

Preserves original HTTP status codes from backend endpoints instead of
transforming them based on JSON-RPC error codes. This allows clients to
receive accurate HTTP status information (e.g., 429 Too Many Requests,
503 Service Unavailable) from backend services.

Changes:
- Update RequestQoSContext interface to include httpStatusCode parameter
- Modify all QoS implementations (EVM, Solana, Cosmos, NoOp) to capture
  and propagate HTTP status codes
- Change protocol/shannon to pass through non-2xx responses instead of
  returning errors
- Make qos.HTTPResponse fields public for cross-package access
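As a rough illustration of the passthrough behaviour, the sketch below writes the backend's status code to the client unchanged. The `backendResponse` type, `writeRelayResponse`, and `relayStatus` are hypothetical and do not reflect the actual `RequestQoSContext` or `qos.HTTPResponse` definitions. Falling back to 200 when no status was captured is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// backendResponse (hypothetical) carries what the backend endpoint returned.
type backendResponse struct {
	StatusCode int
	Body       []byte
}

// writeRelayResponse forwards the backend status code as-is rather than
// deriving one from the JSON-RPC error code in the body.
func writeRelayResponse(w http.ResponseWriter, resp backendResponse) {
	status := resp.StatusCode
	if status == 0 {
		// Assumption: no status captured means the relay succeeded.
		status = http.StatusOK
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(resp.Body)
}

// relayStatus writes a response with the given backend status and body,
// then returns the status code the client would observe.
func relayStatus(backendStatus int, body string) int {
	rec := httptest.NewRecorder()
	writeRelayResponse(rec, backendResponse{StatusCode: backendStatus, Body: []byte(body)})
	return rec.Code
}

func main() {
	rateLimited := `{"jsonrpc":"2.0","id":1,"error":{"code":-32005,"message":"rate limited"}}`
	fmt.Println(relayStatus(http.StatusTooManyRequests, rateLimited)) // client sees 429
	fmt.Println(relayStatus(http.StatusServiceUnavailable, ""))       // client sees 503
}
```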
@jorgecuesta jorgecuesta force-pushed the fix/http-status-passthrough branch from c961b5d to e4bad1d Compare November 28, 2025 14:57
@jorgecuesta jorgecuesta requested a review from oten91 November 28, 2025 15:31
@jorgecuesta jorgecuesta self-assigned this Nov 28, 2025
@jorgecuesta jorgecuesta added the bug Something isn't working label Nov 28, 2025
@oten91
Contributor

oten91 commented Nov 29, 2025

The changes are already contained in fix/memory-optimization.

@oten91 oten91 closed this Nov 29, 2025