feat: Add consistent batch request metrics tracking across all QoS services #485

jorgecuesta · 2025-11-28T14:40:08Z

Summary

- Adds `is_batch_request` label to Cosmos and Solana metrics (EVM already had it)
- Adds batch size histogram (`*_batch_request_size`) to all QoS services (EVM, Cosmos, Solana)
- Adds `GetRequestMethods()` method to Solana interpreter for batch support
- Updates Solana metrics to iterate through methods like EVM/Cosmos
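For illustration, here is a minimal sketch of how a JSON-RPC payload could be classified as batch or single and its methods listed for per-method counting. `requestMethods` and `rpcRequest` are hypothetical names; this is not the PR's `GetRequestMethods()` implementation, only the general idea under the assumption that a batch is a top-level JSON array.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// rpcRequest holds only the field needed for per-method counting.
type rpcRequest struct {
	Method string `json:"method"`
}

// requestMethods is a hypothetical helper (not the PR's GetRequestMethods).
// It reports whether the payload is a JSON-RPC batch (a top-level array)
// and returns the method of every request it contains.
func requestMethods(payload []byte) (methods []string, isBatch bool, err error) {
	trimmed := bytes.TrimSpace(payload)
	if len(trimmed) > 0 && trimmed[0] == '[' {
		var batch []rpcRequest
		if err := json.Unmarshal(trimmed, &batch); err != nil {
			return nil, true, err
		}
		for _, req := range batch {
			methods = append(methods, req.Method)
		}
		return methods, true, nil
	}
	var single rpcRequest
	if err := json.Unmarshal(trimmed, &single); err != nil {
		return nil, false, err
	}
	return []string{single.Method}, false, nil
}
```

With something like this, the `is_batch_request` label would come from `isBatch` and the batch size from `len(methods)`.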

Changes by Service

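| Service | `is_batch_request` Label | Batch Size Histogram | Per-Method Counting |
|---------|--------------------------|----------------------|---------------------|
| EVM     | Already had              | Added                | Already had         |
| Cosmos  | Added                    | Added                | Already had         |
| Solana  | Added                    | Added                | Added               |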

New Metrics

- `path_evm_batch_request_size` - Distribution of EVM batch request sizes
- `path_cosmos_batch_request_size` - Distribution of Cosmos batch request sizes
- `path_solana_batch_request_size` - Distribution of Solana batch request sizes
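To make "distribution of batch request sizes" concrete, here is a stdlib-only sketch of what a histogram with cumulative buckets records per observation. It is a stand-in for a Prometheus histogram, not the PR's metric definition; the type name and the bucket bounds used in the example are assumptions, since the PR does not list the real ones.

```go
package main

// batchSizeHistogram is a stdlib-only stand-in for a Prometheus histogram.
// Bucket bounds are chosen by the caller; the real bounds are not given in the PR.
type batchSizeHistogram struct {
	bounds []float64 // inclusive upper bounds, ascending
	counts []uint64  // cumulative counts, one per bound
	total  uint64    // all observations (the implicit +Inf bucket)
	sum    float64   // sum of all observed sizes
}

func newBatchSizeHistogram(bounds ...float64) *batchSizeHistogram {
	return &batchSizeHistogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe records one batch size. Buckets are cumulative, so every bucket
// whose upper bound is >= size is incremented.
func (h *batchSizeHistogram) observe(size int) {
	v := float64(size)
	for i, bound := range h.bounds {
		if v <= bound {
			h.counts[i]++
		}
	}
	h.total++
	h.sum += v
}
```

For example, with assumed bounds 1, 5, 10, 50, observing sizes 1, 3, 20 and 100 leaves the buckets at 1, 2, 2 and 3, with a total of 4 observations.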

Test plan

- Unit tests pass (`make test_unit`)
- Linting passes (`make go_lint`)

Addresses memory exhaustion issues causing 12GB RAM OOM crashes:

1. Add 100MB request body size limits (supports Solana's ~75MB blocks)
2. Cap endpoint observations per request (uses MaxConcurrentRelaysPerRequest)
3. Reduce WebSocket observation channel buffer from 1000 to 50
4. Add hydrator graceful shutdown with context cancellation
5. Add 30s timeouts to hydrator operations
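As a sketch of the first item, a request body cap can be enforced with the standard library's `http.MaxBytesReader`. The helper names `readLimited` and `statusFor` are hypothetical, the 413 response is an assumption about how an oversized body might be reported, and `100 << 20` treats the stated 100MB as 100 MiB; the PR's actual implementation may differ.

```go
package main

import (
	"errors"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes treats the 100MB limit from the commit message as 100 MiB (assumption).
const maxBodyBytes = 100 << 20

// readLimited is a hypothetical helper: it reads at most limit bytes of the
// request body and writes an error response if the body is larger or unreadable.
func readLimited(w http.ResponseWriter, r *http.Request, limit int64) ([]byte, bool) {
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, limit))
	if err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		} else {
			http.Error(w, "could not read request body", http.StatusBadRequest)
		}
		return nil, false
	}
	return body, true
}

// statusFor is a usage helper: it sends a body of bodySize bytes through a
// handler capped at limit and returns the resulting HTTP status code.
func statusFor(limit int64, bodySize int) int {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, ok := readLimited(w, r, limit); ok {
			w.WriteHeader(http.StatusOK)
		}
	})
	body := strings.NewReader(strings.Repeat("x", bodySize))
	req := httptest.NewRequest(http.MethodPost, "/", body)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code
}
```

The point of the cap is that an oversized body fails fast at the limit instead of being fully buffered in memory.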

Additional fixes completing the OOM prevention release:

- 2.3 Session rollover: Add context for graceful shutdown of block height monitor
- 2.4 Observation goroutines: Add 30s timeout to prevent indefinite hanging
- 2.5 time.After leak: Replace with time.NewTimer + defer Stop()
- 2.6 WebSocket cleanup: Close client connection if endpoint connection fails
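The `time.After` replacement in 2.5 follows a common pattern, sketched below. Before Go 1.23, a timer created by `time.After` was not garbage-collected until it fired, so a select that usually returns early could pile up pending timers. `waitWithTimeout`, `errTimeout` and `observationTimeout` are hypothetical names, and the 30s value is borrowed from item 2.4 purely for illustration.

```go
package main

import (
	"errors"
	"time"
)

// observationTimeout mirrors the 30s timeout mentioned above (illustrative only).
const observationTimeout = 30 * time.Second

var errTimeout = errors.New("timed out waiting for result")

// waitWithTimeout is a hypothetical helper. It waits for one value from
// results, or gives up after timeout. Unlike time.After, the timer is
// stopped explicitly, so it is released as soon as the function returns.
func waitWithTimeout[T any](results <-chan T, timeout time.Duration) (T, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case v := <-results:
		return v, nil
	case <-timer.C:
		var zero T
		return zero, errTimeout
	}
}
```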

Per JSON-RPC 2.0 spec (https://www.jsonrpc.org/specification), responses with null IDs are valid for error cases when the server couldn't parse the request ID. This is documented in Section 5 - Response object: "If there was an error in detecting the id in the Request object (e.g. Parse error/Invalid Request), it MUST be Null."

Changes:

- Update validateResponseIDs to treat null ID responses as "wildcards" that can match unmatched request IDs
- Update createResponseObservations to skip null ID responses gracefully with debug logging instead of error logging
- Downgrade "could not find request for response ID" from error to warn
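The wildcard idea can be sketched as follows. `matchResponseIDs` is a hypothetical function, not the PR's `validateResponseIDs`; it assumes IDs are compared as raw JSON text (so `1` and `"1"` are different IDs) and that each null-ID response may stand in for exactly one otherwise unmatched request.

```go
package main

// matchResponseIDs is a hypothetical sketch (not the PR's validateResponseIDs).
// IDs are given as raw JSON text, e.g. `1`, `"abc"` or `null`.
// Every non-null response ID must match a distinct request ID, and each
// null response ID acts as a wildcard for one still-unmatched request.
func matchResponseIDs(requestIDs, responseIDs []string) bool {
	unmatched := make(map[string]int, len(requestIDs))
	for _, id := range requestIDs {
		unmatched[id]++
	}

	nullCount := 0
	for _, id := range responseIDs {
		if id == "null" {
			nullCount++
			continue
		}
		if unmatched[id] == 0 {
			return false // response ID with no corresponding request
		}
		unmatched[id]--
	}

	remaining := 0
	for _, n := range unmatched {
		remaining += n
	}
	// Null responses may not outnumber the requests left without a response.
	return nullCount <= remaining
}
```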

Preserves original HTTP status codes from backend endpoints instead of transforming them based on JSON-RPC error codes. This allows clients to receive accurate HTTP status information (e.g., 429 Too Many Requests, 503 Service Unavailable) from backend services.

Changes:

- Update RequestQoSContext interface to include httpStatusCode parameter
- Modify all QoS implementations (EVM, Solana, Cosmos, NoOp) to capture and propagate HTTP status codes
- Change protocol/shannon to pass through non-2xx responses instead of returning errors
- Make qos.HTTPResponse fields public for cross-package access
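A rough sketch of status passthrough, assuming a simplified response type: `backendResponse`, `relay` and `relayedStatus` are hypothetical names and do not reflect the actual `RequestQoSContext` or `qos.HTTPResponse` definitions. The fallback to 200 when no status was captured is also an assumption.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// backendResponse is a hypothetical stand-in for what an endpoint returned.
type backendResponse struct {
	StatusCode int
	Payload    []byte
}

// relay writes the backend's payload to the client and keeps the backend's
// HTTP status code instead of deriving one from the JSON-RPC error code.
func relay(w http.ResponseWriter, resp backendResponse) {
	status := resp.StatusCode
	if status == 0 {
		status = http.StatusOK // assumption: default when no status was captured
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(resp.Payload)
}

// relayedStatus is a usage helper that returns the status a client would see.
func relayedStatus(resp backendResponse) int {
	rec := httptest.NewRecorder()
	relay(rec, resp)
	return rec.Code
}
```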

…rvices

Adds `is_batch_request` label and batch size histogram to all QoS services:

Cosmos:
- Add `is_batch_request` label to requestsTotal metric
- Add `cosmos_batch_request_size` histogram

Solana:
- Add `GetRequestMethods()` method to interpreter for batch support
- Add `is_batch_request` label to requestsTotal metric
- Add `solana_batch_request_size` histogram
- Update PublishMetrics to iterate through methods like EVM/Cosmos

EVM:
- Add `evm_batch_request_size` histogram (already had is_batch_request)

This enables consistent batch request visibility across all services:
- Filter batch vs single requests in Prometheus
- Analyze batch size distribution patterns
- Capacity planning based on batch request patterns

oten91 · 2025-11-29T01:13:24Z

The changes were included in fix/memory-optimization.

jorgecuesta added 4 commits November 28, 2025 11:10

jorgecuesta force-pushed the fix/batch-metrics-tracking branch from f8c7739 to 6c4664c Compare November 28, 2025 14:58

jorgecuesta added 2 commits November 28, 2025 11:04

chore: use 'gateway' instead of 'gtw' in comments and log messages

5b1ee23

jorgecuesta force-pushed the fix/batch-metrics-tracking branch from 6c4664c to 77df29e Compare November 28, 2025 15:05

jorgecuesta requested a review from oten91 November 28, 2025 15:31

jorgecuesta self-assigned this Nov 28, 2025

jorgecuesta added the bug Something isn't working label Nov 28, 2025

oten91 closed this Nov 29, 2025

3 participants
