🚀 High Availability RelayMiner

A complete rewrite of the RelayMiner architecture enabling horizontal scalability for enterprise-grade deployments. Run multiple relayer and miner instances behind a load balancer with Redis-based coordination.

🎯 Why This Matters

The current RelayMiner is a single-process monolith that limits scalability. This HA architecture separates concerns and enables:

  • Horizontal scaling - Add more relayers/miners as traffic grows
  • High availability - No single point of failure
  • Zero-downtime deployments - Rolling updates without service interruption
  • Cost optimization - Scale relayers and miners independently

📊 Architecture Overview

flowchart TB
    subgraph Gateways["Gateway Layer"]
        G1[Gateway 1]
        G2[Gateway 2]
        G3[Gateway N]
    end

    subgraph LB["Load Balancer"]
        HAProxy[HAProxy / Nginx / K8s Ingress]
    end

    subgraph Relayers["Relayer Instances (Stateless)"]
        R1[HA Relayer 1]
        R2[HA Relayer 2]
        R3[HA Relayer N]
    end

    subgraph Redis["Redis Cluster"]
        RS[(Redis Streams)]
        RL[(Leader Election)]
        RD[(Deduplication)]
        RC[(Session Cache)]
    end

    subgraph Miners["Miner Instances (Stateful)"]
        M1[HA Miner 1<br/>Leader]
        M2[HA Miner 2<br/>Standby]
        M3[HA Miner N<br/>Standby]
    end

    subgraph Chain["Pocket Network"]
        PN[Full Node / RPC]
    end

    G1 & G2 & G3 --> HAProxy
    HAProxy --> R1 & R2 & R3
    R1 & R2 & R3 --> RS
    R1 & R2 & R3 -.-> PN
    RS --> M1 & M2 & M3
    M1 & M2 & M3 --> RL
    M1 & M2 & M3 --> RD
    M1 & M2 & M3 --> RC
    M1 -.->|Claims & Proofs| PN

🔄 Request Flow

sequenceDiagram
    participant G as Gateway
    participant R as HA Relayer
    participant B as Backend Service
    participant Redis as Redis Streams
    participant M as HA Miner (Leader)
    participant Chain as Pocket Network

    G->>R: RelayRequest (signed)
    R->>R: Validate session & signature
    R->>B: Forward request
    B-->>R: Response
    R->>R: Sign response + check mining difficulty
    R-->>G: RelayResponse (signed)
    
    alt Relay meets difficulty
        R->>Redis: Publish mined relay
        Redis->>M: Consume from stream
        M->>M: Deduplicate + Add to SMST
        M->>M: Update WAL
    end
    
    Note over M,Chain: At session end...
    M->>Chain: Submit Claim (leader only)
    Note over M,Chain: After claim window...
    M->>Chain: Submit Proof (leader only)
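
The "check mining difficulty" step above is what decides whether a served relay gets published to the miners at all. A minimal sketch of the idea, assuming a SHA-256 relay hash compared against a difficulty target; the function and variable names are illustrative, not the actual poktroll API:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// relayMeetsDifficulty reports whether the hash of the serialized relay is
// at or below the difficulty target. Lower targets mean fewer relays qualify
// for on-chain claiming.
func relayMeetsDifficulty(relayBytes, targetHash []byte) bool {
	relayHash := sha256.Sum256(relayBytes)
	return bytes.Compare(relayHash[:], targetHash) <= 0
}

func main() {
	// An all-0xff target accepts every relay; tighter targets accept fewer.
	target := bytes.Repeat([]byte{0xff}, sha256.Size)
	fmt.Println(relayMeetsDifficulty([]byte("serialized relay"), target))
}
```

Relays that hash above the target are still served back to the gateway; they just never reach Redis or the miner's SMST.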

🏗️ Component Details

HA Relayer (Stateless)

  • HTTP/WebSocket/gRPC relay request handling
  • Optimistic (forward first) or eager (validate first) validation modes
  • Ring signature verification
  • Mining difficulty checking
  • Response signing with supplier keys
  • Publishes mined relays to Redis Streams (see the sketch below)
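
A minimal sketch of that publish step, assuming github.com/redis/go-redis/v9; the stream name, field layout, and values shown here are illustrative, not the PR's actual wire format:

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "redis-cluster:6379"})

	// Append the mined relay to a per-service stream. Miners read it through a
	// consumer group, so each relay is processed by exactly one miner instance.
	err := rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "ha:relays:ethereum-mainnet", // hypothetical name built from stream_prefix
		MaxLen: 100000,                       // mirrors max_stream_len in relayer-config.yaml
		Approx: true,
		Values: map[string]interface{}{
			"session_id":    "example-session",
			"supplier_addr": "pokt1supplier...",
			"relay_bz":      []byte("serialized mined relay"),
		},
	}).Err()
	if err != nil {
		log.Fatalf("failed to publish mined relay: %v", err)
	}
}
```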

HA Miner (Stateful with HA)

  • Consumes relays from Redis Streams
  • SMST (Sparse Merkle Sum Tree) per session
  • Write-Ahead Log (WAL) for crash recovery
  • Leader election for claim/proof submission (sketched after this list)
  • Relay deduplication across instances
  • Automatic claim and proof pipelines
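
One common way to implement the leader-election bullet above is a Redis key lease: whichever miner instance sets the key first becomes the leader, and a standby takes over if the leader stops renewing. A sketch under those assumptions (key name, lease duration, and instance ID are illustrative, the renewal is simplified, and this is not necessarily how the PR implements it):

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

const leaderKey = "ha:miner:leader" // hypothetical key name
const leaseTTL = 15 * time.Second

// tryAcquire atomically claims leadership if no other miner holds the lease.
func tryAcquire(ctx context.Context, rdb *redis.Client, id string) (bool, error) {
	return rdb.SetNX(ctx, leaderKey, id, leaseTTL).Result()
}

// renew extends the lease only if we are still the recorded leader.
// (A production implementation would do this atomically, e.g. with a Lua script.)
func renew(ctx context.Context, rdb *redis.Client, id string) bool {
	holder, err := rdb.Get(ctx, leaderKey).Result()
	if err != nil || holder != id {
		return false
	}
	ok, _ := rdb.Expire(ctx, leaderKey, leaseTTL).Result()
	return ok
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "redis-cluster:6379"})
	id := "ha-miner-1"

	for range time.Tick(5 * time.Second) { // re-check well before the lease expires
		if renew(ctx, rdb, id) {
			log.Println("still leader: eligible to submit claims/proofs")
			continue
		}
		if ok, err := tryAcquire(ctx, rdb, id); err == nil && ok {
			log.Println("acquired leadership")
		}
	}
}
```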

⚙️ Configuration Examples

Relayer Configuration (relayer-config.yaml)

listen_addr: "0.0.0.0:8080"

redis:
  url: "redis://redis-cluster:6379"
  stream_prefix: "ha:relays"
  max_stream_len: 100000

pocket_node:
  query_node_rpc_url: "https://pocket-rpc.example.com"
  query_node_grpc_url: "pocket-grpc.example.com:443"

keys:
  keys_file: "/etc/pocket/supplier-keys.yaml"
  # Or use keyring:
  # keyring:
  #   backend: "file"
  #   dir: "/root/.pocket"

# Validation modes: "optimistic" (fast) or "eager" (safe for expensive backends)
default_validation_mode: "optimistic"
default_request_timeout_seconds: 30
default_max_body_size_bytes: 10485760  # 10MB

services:
  ethereum-mainnet:
    validation_mode: "optimistic"
    backends:
      json-rpc:
        url: "http://geth:8545"
        health_check:
          enabled: true
          endpoint: "/"
          interval_seconds: 10
      websocket:
        url: "ws://geth:8546"

  llm-inference:
    validation_mode: "eager"  # Validate first - LLM calls are expensive!
    request_timeout_seconds: 120
    backends:
      rest:
        url: "http://ollama:11434"
        headers:
          Authorization: "Bearer ${LLM_API_KEY}"

metrics:
  enabled: true
  addr: "0.0.0.0:9090"

health_check:
  enabled: true
  addr: "0.0.0.0:8081"

Miner Configuration (miner-config.yaml)

redis:
  url: "redis://redis-cluster:6379"
  stream_prefix: "ha:relays"
  consumer_group: "ha-miners"
  consumer_name: "${HOSTNAME}"  # Auto-generated from pod name
  block_timeout_ms: 5000
  claim_idle_timeout_ms: 60000

pocket_node:
  query_node_rpc_url: "https://pocket-rpc.example.com"
  query_node_grpc_url: "pocket-grpc.example.com:443"
  tx_node_rpc_url: "https://pocket-tx.example.com"  # For claim/proof submission

keys:
  keys_file: "/etc/pocket/supplier-keys.yaml"

suppliers:
  - operator_address: "pokt1supplier..."
    signing_key_name: "supplier1"
    services:
      - "ethereum-mainnet"
      - "llm-inference"

session_tree:
  storage_type: "badger"  # or "memory", "pebble"
  storage_path: "/data/session-trees"
  wal_enabled: true
  wal_path: "/data/wal"

# Relay deduplication across miner instances
deduplication_ttl_blocks: 10
batch_size: 100

metrics:
  enabled: true
  addr: "0.0.0.0:9091"

logging:
  level: "info"
  format: "json"

Supplier Keys File (supplier-keys.yaml)

suppliers:
  - address: "pokt1supplier1abc..."
    private_key_hex: "deadbeef..."
  - address: "pokt1supplier2def..."
    private_key_hex: "cafebabe..."

📈 Metrics

Relayer Metrics (:9090/metrics)

| Metric | Type | Description |
| --- | --- | --- |
| ha_relayer_relays_received_total | Counter | Total relay requests received |
| ha_relayer_relays_served_total | Counter | Successfully served relays |
| ha_relayer_relays_rejected_total | Counter | Rejected relays (by reason) |
| ha_relayer_relays_published_total | Counter | Mined relays published to Redis |
| ha_relayer_relay_latency_seconds | Histogram | End-to-end relay latency |
| ha_relayer_backend_latency_seconds | Histogram | Backend request latency |
| ha_relayer_validation_latency_seconds | Histogram | Request validation time |
| ha_relayer_websocket_connections_active | Gauge | Active WebSocket connections |
| ha_relayer_grpc_streams_active | Gauge | Active gRPC streams |
| ha_relayer_current_block_height | Gauge | Current block height |
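
As an example of how these are exposed, a minimal prometheus/client_golang sketch that registers the first counter from the table and serves it on the configured metrics address; only the metric name and help text come from the table, the wiring is illustrative:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Counter matching the first row of the table above.
var relaysReceived = promauto.NewCounter(prometheus.CounterOpts{
	Name: "ha_relayer_relays_received_total",
	Help: "Total relay requests received",
})

func main() {
	// Each incoming relay would call relaysReceived.Inc() before being handled.
	relaysReceived.Inc()

	// Serve the scrape endpoint on the configured metrics address.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe("0.0.0.0:9090", nil))
}
```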

Miner Metrics (:9091/metrics)

| Metric | Type | Description |
| --- | --- | --- |
| ha_miner_relays_consumed_total | Counter | Relays consumed from Redis |
| ha_miner_relays_processed_total | Counter | Successfully processed relays |
| ha_miner_relays_deduplicated_total | Counter | Duplicate relays filtered |
| ha_miner_sessions_by_state | Gauge | Sessions by state (active/claiming/proving) |
| ha_miner_session_relay_count | Gauge | Relays per session |
| ha_miner_session_compute_units | Gauge | Compute units per session |
| ha_miner_claims_submitted_total | Counter | Claims submitted on-chain |
| ha_miner_proofs_submitted_total | Counter | Proofs submitted on-chain |
| ha_miner_leader_status | Gauge | Leader election status (1=leader) |
| ha_miner_wal_size_entries | Gauge | WAL entries per session |

🧪 Testing

Tested with 150 parallel relay requests:

| Transport | Result | Success Rate |
| --- | --- | --- |
| HTTP | 50/50 | 100% |
| WebSocket | 50/50 | 100% |
| gRPC | 46/50 | 92%* |

*gRPC failures were test script timeouts, not relay failures

Running Tests

# HTTP load test
go run ./tools/scripts/ha_relayer_miner/http_relay_test/main.go \
  --n=100 --concurrency=10 \
  --app-key=<app_private_key_hex>

# WebSocket test
go run ./tools/scripts/ha_relayer_miner/ws_relay_test/main.go \
  --num-messages=50 \
  --app=<app_address> \
  --app-key-hex=<app_private_key_hex>

# gRPC test
go run ./tools/scripts/ha_relayer_miner/grpc_relay_test/main.go \
  --num-requests=50 \
  --app=<app_address> \
  --app-key-hex=<app_private_key_hex>

🚀 Deployment

Kubernetes (Recommended)

# Deploy Redis
kubectl apply -f localnet/kubernetes/redis.yaml

# Deploy HA Relayers (scale as needed)
kubectl apply -f localnet/kubernetes/ha-relayminer.yaml
kubectl scale deployment ha-relayer --replicas=3

# Deploy HA Miners (typically 2-3 for HA)
kubectl scale deployment ha-miner --replicas=2

Docker Compose

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  ha-relayer:
    image: poktroll:latest
    command: ["poktrolld", "ha-relayer", "--config", "/etc/config/relayer.yaml"]
    deploy:
      replicas: 3
    ports:
      - "8080:8080"

  ha-miner:
    image: poktroll:latest
    command: ["poktrolld", "ha-miner", "--config", "/etc/config/miner.yaml"]
    deploy:
      replicas: 2
    volumes:
      - miner-data:/data

📋 What's Included

  • pkg/ha/relayer/ - Stateless relay proxy (HTTP, WebSocket, gRPC)
  • pkg/ha/miner/ - Stateful miner with leader election
  • pkg/ha/transport/redis/ - Redis Streams transport layer
  • pkg/ha/cache/ - Session, supplier, and params caching
  • pkg/ha/keys/ - Multi-source key management
  • pkg/ha/observability/ - Prometheus metrics
  • pkg/ha/cmd/ - CLI commands
  • tools/scripts/ha_relayer_miner/ - Test scripts
  • localnet/kubernetes/ - K8s deployment manifests

🤝 Help Us Test!

We're looking for community members to help test this in various environments:

  1. Load testing - How does it perform under heavy load?
  2. Chaos testing - Kill relayers/miners randomly, verify recovery
  3. Multi-service - Test with multiple services configured
  4. Different backends - EVM, Cosmos, REST APIs, WebSocket services
  5. Long-running - Run for extended periods, check for memory leaks

How to Contribute

  1. Deploy in your test environment
  2. Run the test scripts
  3. Report issues or performance observations
  4. Share your Grafana dashboards!

Note: This is a DRAFT PR for community feedback and testing. Not yet ready for production merge.

Implement a horizontally scalable RelayMiner architecture that supports
multiple relayer and miner instances with Redis-based coordination.

## Relayer Features
- HTTP, WebSocket, and gRPC relay request handling
- Session validation with on-chain verification
- Request signing with application ring signatures
- Relay metering and metrics collection
- Health check endpoints for load balancers
- Configurable upstream proxy routing

## Miner Features
- Redis Streams-based relay consumption from relayers
- SMST (Sparse Merkle Sum Tree) management per session/supplier
- Write-Ahead Log (WAL) for crash recovery
- Leader election for claim/proof submission
- Relay deduplication across miner instances
- Session lifecycle management with grace periods
- Automatic claim and proof pipeline

## Infrastructure
- Redis transport layer with pub/sub and streams
- Block height polling and subscription
- Session and supplier caching
- Shared parameters caching
- Comprehensive Prometheus metrics
- Structured logging with polylog

## Testing
- HTTP, WebSocket, and gRPC relay test scripts
- WebSocket echo server for testing
- Unit tests for core components
- Kubernetes deployment manifests for LocalNet

Tested with 150 parallel relay requests:
- HTTP: 50/50 SUCCESS (100%)
- WebSocket: 50/50 SUCCESS (100%)
- gRPC: 46/50 SUCCESS (92%)