🚀 High Availability RelayMiner

A complete rewrite of the RelayMiner architecture enabling horizontal scalability for enterprise-grade deployments. Run multiple relayer and miner instances behind a load balancer with Redis-based coordination.

🎯 Why This Matters

The current RelayMiner is a single-process monolith that limits scalability. This HA architecture separates concerns and enables:

  • Horizontal scaling - Add more relayers/miners as traffic grows
  • High availability - No single point of failure
  • Zero-downtime deployments - Rolling updates without service interruption
  • Cost optimization - Scale relayers and miners independently

📊 Architecture Overview

flowchart TB
    subgraph Gateways["Gateway Layer"]
        G1[Gateway 1]
        G2[Gateway 2]
        G3[Gateway N]
    end

    subgraph LB["Load Balancer"]
        HAProxy[HAProxy / Nginx / K8s Ingress]
    end

    subgraph Relayers["Relayer Instances (Stateless)"]
        R1[HA Relayer 1]
        R2[HA Relayer 2]
        R3[HA Relayer N]
    end

    subgraph Redis["Redis Cluster"]
        RS[(Redis Streams)]
        RL[(Leader Election)]
        RD[(Deduplication)]
        RC[(Session Cache)]
    end

    subgraph Miners["Miner Instances (Stateful)"]
        M1[HA Miner 1<br/>Leader]
        M2[HA Miner 2<br/>Standby]
        M3[HA Miner N<br/>Standby]
    end

    subgraph Chain["Pocket Network"]
        PN[Full Node / RPC]
    end

    G1 & G2 & G3 --> HAProxy
    HAProxy --> R1 & R2 & R3
    R1 & R2 & R3 --> RS
    R1 & R2 & R3 -.-> PN
    RS --> M1 & M2 & M3
    M1 & M2 & M3 --> RL
    M1 & M2 & M3 --> RD
    M1 & M2 & M3 --> RC
    M1 -.->|Claims & Proofs| PN

🔄 Request Flow

sequenceDiagram
    participant G as Gateway
    participant R as HA Relayer
    participant B as Backend Service
    participant Redis as Redis Streams
    participant M as HA Miner (Leader)
    participant Chain as Pocket Network

    G->>R: RelayRequest (signed)
    R->>R: Validate session & signature
    R->>B: Forward request
    B-->>R: Response
    R->>R: Sign response + check mining difficulty
    R-->>G: RelayResponse (signed)
    
    alt Relay meets difficulty
        R->>Redis: Publish mined relay
        Redis->>M: Consume from stream
        M->>M: Deduplicate + Add to SMST
        M->>M: Update WAL
    end
    
    Note over M,Chain: At session end...
    M->>Chain: Submit Claim (leader only)
    Note over M,Chain: After claim window...
    M->>Chain: Submit Proof (leader only)
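
The "check mining difficulty" step above is what decides whether a served relay gets published to the miners at all. A minimal sketch of the idea, assuming a SHA-256 relay hash compared against a difficulty target; the function and variable names are illustrative, not the actual poktroll API:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// relayMeetsDifficulty reports whether the hash of the serialized relay is
// at or below the difficulty target. Lower targets mean fewer relays qualify
// for on-chain claiming.
func relayMeetsDifficulty(relayBytes, targetHash []byte) bool {
	relayHash := sha256.Sum256(relayBytes)
	return bytes.Compare(relayHash[:], targetHash) <= 0
}

func main() {
	// An all-0xff target accepts every relay; tighter targets accept fewer.
	target := bytes.Repeat([]byte{0xff}, sha256.Size)
	fmt.Println(relayMeetsDifficulty([]byte("serialized relay"), target))
}
```

Relays that hash above the target are still served back to the gateway; they just never reach Redis or the miner's SMST.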

🏗️ Component Details

HA Relayer (Stateless)

  • HTTP/WebSocket/gRPC relay request handling
  • Optimistic (forward first) or eager (validate first) validation modes
  • Ring signature verification
  • Mining difficulty checking
  • Response signing with supplier keys
  • Publishes mined relays to Redis Streams (see the sketch below)
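
A minimal sketch of that publish step, assuming github.com/redis/go-redis/v9; the stream name, field layout, and values shown here are illustrative, not the PR's actual wire format:

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "redis-cluster:6379"})

	// Append the mined relay to a per-service stream. Miners read it through a
	// consumer group, so each relay is processed by exactly one miner instance.
	err := rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "ha:relays:ethereum-mainnet", // hypothetical name built from stream_prefix
		MaxLen: 100000,                       // mirrors max_stream_len in relayer-config.yaml
		Approx: true,
		Values: map[string]interface{}{
			"session_id":    "example-session",
			"supplier_addr": "pokt1supplier...",
			"relay_bz":      []byte("serialized mined relay"),
		},
	}).Err()
	if err != nil {
		log.Fatalf("failed to publish mined relay: %v", err)
	}
}
```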

HA Miner (Stateful with HA)

  • Consumes relays from Redis Streams
  • SMST (Sparse Merkle Sum Tree) per session
  • Write-Ahead Log (WAL) for crash recovery
  • Leader election for claim/proof submission (sketched after this list)
  • Relay deduplication across instances
  • Automatic claim and proof pipelines
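
One common way to implement the leader-election bullet above is a Redis key lease: whichever miner instance sets the key first becomes the leader, and a standby takes over if the leader stops renewing. A sketch under those assumptions (key name, lease duration, and instance ID are illustrative, the renewal is simplified, and this is not necessarily how the PR implements it):

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

const leaderKey = "ha:miner:leader" // hypothetical key name
const leaseTTL = 15 * time.Second

// tryAcquire atomically claims leadership if no other miner holds the lease.
func tryAcquire(ctx context.Context, rdb *redis.Client, id string) (bool, error) {
	return rdb.SetNX(ctx, leaderKey, id, leaseTTL).Result()
}

// renew extends the lease only if we are still the recorded leader.
// (A production implementation would do this atomically, e.g. with a Lua script.)
func renew(ctx context.Context, rdb *redis.Client, id string) bool {
	holder, err := rdb.Get(ctx, leaderKey).Result()
	if err != nil || holder != id {
		return false
	}
	ok, _ := rdb.Expire(ctx, leaderKey, leaseTTL).Result()
	return ok
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "redis-cluster:6379"})
	id := "ha-miner-1"

	for range time.Tick(5 * time.Second) { // re-check well before the lease expires
		if renew(ctx, rdb, id) {
			log.Println("still leader: eligible to submit claims/proofs")
			continue
		}
		if ok, err := tryAcquire(ctx, rdb, id); err == nil && ok {
			log.Println("acquired leadership")
		}
	}
}
```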

⚙️ Configuration Examples

Relayer Configuration (relayer-config.yaml)

listen_addr: "0.0.0.0:8080"

redis:
  url: "redis://redis-cluster:6379"
  stream_prefix: "ha:relays"
  max_stream_len: 100000

pocket_node:
  query_node_rpc_url: "https://pocket-rpc.example.com"
  query_node_grpc_url: "pocket-grpc.example.com:443"

keys:
  keys_file: "/etc/pocket/supplier-keys.yaml"
  # Or use keyring:
  # keyring:
  #   backend: "file"
  #   dir: "/root/.pocket"

# Validation modes: "optimistic" (fast) or "eager" (safe for expensive backends)
default_validation_mode: "optimistic"
default_request_timeout_seconds: 30
default_max_body_size_bytes: 10485760  # 10MB

services:
  ethereum-mainnet:
    validation_mode: "optimistic"
    backends:
      json-rpc:
        url: "http://geth:8545"
        health_check:
          enabled: true
          endpoint: "/"
          interval_seconds: 10
      websocket:
        url: "ws://geth:8546"

  llm-inference:
    validation_mode: "eager"  # Validate first - LLM calls are expensive!
    request_timeout_seconds: 120
    backends:
      rest:
        url: "http://ollama:11434"
        headers:
          Authorization: "Bearer ${LLM_API_KEY}"

metrics:
  enabled: true
  addr: "0.0.0.0:9090"

health_check:
  enabled: true
  addr: "0.0.0.0:8081"

Miner Configuration (miner-config.yaml)

redis:
  url: "redis://redis-cluster:6379"
  stream_prefix: "ha:relays"
  consumer_group: "ha-miners"
  consumer_name: "${HOSTNAME}"  # Auto-generated from pod name
  block_timeout_ms: 5000
  claim_idle_timeout_ms: 60000

pocket_node:
  query_node_rpc_url: "https://pocket-rpc.example.com"
  query_node_grpc_url: "pocket-grpc.example.com:443"
  tx_node_rpc_url: "https://pocket-tx.example.com"  # For claim/proof submission

keys:
  keys_file: "/etc/pocket/supplier-keys.yaml"

suppliers:
  - operator_address: "pokt1supplier..."
    signing_key_name: "supplier1"
    services:
      - "ethereum-mainnet"
      - "llm-inference"

session_tree:
  storage_type: "badger"  # or "memory", "pebble"
  storage_path: "/data/session-trees"
  wal_enabled: true
  wal_path: "/data/wal"

# Relay deduplication across miner instances
deduplication_ttl_blocks: 10
batch_size: 100

metrics:
  enabled: true
  addr: "0.0.0.0:9091"

logging:
  level: "info"
  format: "json"

Supplier Keys File (supplier-keys.yaml)

suppliers:
  - address: "pokt1supplier1abc..."
    private_key_hex: "deadbeef..."
  - address: "pokt1supplier2def..."
    private_key_hex: "cafebabe..."

📈 Metrics

Relayer Metrics (:9090/metrics)

| Metric | Type | Description |
| --- | --- | --- |
| ha_relayer_relays_received_total | Counter | Total relay requests received |
| ha_relayer_relays_served_total | Counter | Successfully served relays |
| ha_relayer_relays_rejected_total | Counter | Rejected relays (by reason) |
| ha_relayer_relays_published_total | Counter | Mined relays published to Redis |
| ha_relayer_relay_latency_seconds | Histogram | End-to-end relay latency |
| ha_relayer_backend_latency_seconds | Histogram | Backend request latency |
| ha_relayer_validation_latency_seconds | Histogram | Request validation time |
| ha_relayer_websocket_connections_active | Gauge | Active WebSocket connections |
| ha_relayer_grpc_streams_active | Gauge | Active gRPC streams |
| ha_relayer_current_block_height | Gauge | Current block height |
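
As an example of how these are exposed, a minimal prometheus/client_golang sketch that registers the first counter from the table and serves it on the configured metrics address; only the metric name and help text come from the table, the wiring is illustrative:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Counter matching the first row of the table above.
var relaysReceived = promauto.NewCounter(prometheus.CounterOpts{
	Name: "ha_relayer_relays_received_total",
	Help: "Total relay requests received",
})

func main() {
	// Each incoming relay would call relaysReceived.Inc() before being handled.
	relaysReceived.Inc()

	// Serve the scrape endpoint on the configured metrics address.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe("0.0.0.0:9090", nil))
}
```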

Miner Metrics (:9091/metrics)

| Metric | Type | Description |
| --- | --- | --- |
| ha_miner_relays_consumed_total | Counter | Relays consumed from Redis |
| ha_miner_relays_processed_total | Counter | Successfully processed relays |
| ha_miner_relays_deduplicated_total | Counter | Duplicate relays filtered |
| ha_miner_sessions_by_state | Gauge | Sessions by state (active/claiming/proving) |
| ha_miner_session_relay_count | Gauge | Relays per session |
| ha_miner_session_compute_units | Gauge | Compute units per session |
| ha_miner_claims_submitted_total | Counter | Claims submitted on-chain |
| ha_miner_proofs_submitted_total | Counter | Proofs submitted on-chain |
| ha_miner_leader_status | Gauge | Leader election status (1=leader) |
| ha_miner_wal_size_entries | Gauge | WAL entries per session |

🧪 Testing

Tested with 150 parallel relay requests:

| Transport | Result | Success Rate |
| --- | --- | --- |
| HTTP | 50/50 | 100% |
| WebSocket | 50/50 | 100% |
| gRPC | 46/50 | 92%* |

*gRPC failures were test script timeouts, not relay failures

Running Tests

# HTTP load test
go run ./tools/scripts/ha_relayer_miner/http_relay_test/main.go \
  --n=100 --concurrency=10 \
  --app-key=<app_private_key_hex>

# WebSocket test
go run ./tools/scripts/ha_relayer_miner/ws_relay_test/main.go \
  --num-messages=50 \
  --app=<app_address> \
  --app-key-hex=<app_private_key_hex>

# gRPC test
go run ./tools/scripts/ha_relayer_miner/grpc_relay_test/main.go \
  --num-requests=50 \
  --app=<app_address> \
  --app-key-hex=<app_private_key_hex>

🚀 Deployment

Kubernetes (Recommended)

# Deploy Redis
kubectl apply -f localnet/kubernetes/redis.yaml

# Deploy HA Relayers (scale as needed)
kubectl apply -f localnet/kubernetes/ha-relayminer.yaml
kubectl scale deployment ha-relayer --replicas=3

# Deploy HA Miners (typically 2-3 for HA)
kubectl scale deployment ha-miner --replicas=2

Docker Compose

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  ha-relayer:
    image: poktroll:latest
    command: ["poktrolld", "ha-relayer", "--config", "/etc/config/relayer.yaml"]
    deploy:
      replicas: 3
    ports:
      - "8080:8080"

  ha-miner:
    image: poktroll:latest
    command: ["poktrolld", "ha-miner", "--config", "/etc/config/miner.yaml"]
    deploy:
      replicas: 2
    volumes:
      - miner-data:/data

📋 What's Included

  • pkg/ha/relayer/ - Stateless relay proxy (HTTP, WebSocket, gRPC)
  • pkg/ha/miner/ - Stateful miner with leader election
  • pkg/ha/transport/redis/ - Redis Streams transport layer
  • pkg/ha/cache/ - Session, supplier, and params caching
  • pkg/ha/keys/ - Multi-source key management
  • pkg/ha/observability/ - Prometheus metrics
  • pkg/ha/cmd/ - CLI commands
  • tools/scripts/ha_relayer_miner/ - Test scripts
  • localnet/kubernetes/ - K8s deployment manifests

🤝 Help Us Test!

We're looking for community members to help test this in various environments:

  1. Load testing - How does it perform under heavy load?
  2. Chaos testing - Kill relayers/miners randomly, verify recovery
  3. Multi-service - Test with multiple services configured
  4. Different backends - EVM, Cosmos, REST APIs, WebSocket services
  5. Long-running - Run for extended periods, check for memory leaks

How to Contribute

  1. Deploy in your test environment
  2. Run the test scripts
  3. Report issues or performance observations
  4. Share your Grafana dashboards!

Note: This is a DRAFT PR for community feedback and testing. Not yet ready for production merge.

Implement a horizontally scalable RelayMiner architecture that supports
multiple relayer and miner instances with Redis-based coordination.

## Relayer Features
- HTTP, WebSocket, and gRPC relay request handling
- Session validation with on-chain verification
- Request signing with application ring signatures
- Relay metering and metrics collection
- Health check endpoints for load balancers
- Configurable upstream proxy routing

## Miner Features
- Redis Streams-based relay consumption from relayers
- SMST (Sparse Merkle Sum Tree) management per session/supplier
- Write-Ahead Log (WAL) for crash recovery
- Leader election for claim/proof submission
- Relay deduplication across miner instances
- Session lifecycle management with grace periods
- Automatic claim and proof pipeline

## Infrastructure
- Redis transport layer with pub/sub and streams
- Block height polling and subscription
- Session and supplier caching
- Shared parameters caching
- Comprehensive Prometheus metrics
- Structured logging with polylog

## Testing
- HTTP, WebSocket, and gRPC relay test scripts
- WebSocket echo server for testing
- Unit tests for core components
- Kubernetes deployment manifests for LocalNet

Tested with 150 parallel relay requests:
- HTTP: 50/50 SUCCESS (100%)
- WebSocket: 50/50 SUCCESS (100%)
- gRPC: 46/50 SUCCESS (92%)