
@okdas commented Aug 27, 2025

Summary

This PR implements a signature caching system to address the CPU bottleneck identified in pprof analysis where ring signature operations were consuming 50% of CPU time.

Problem

  • Ring signature cryptographic operations consuming 50% of CPU time
  • Each signature computation takes 10-50ms
  • Many requests within a session are identical (same payload, supplier, app)
  • No caching mechanism existed, causing redundant expensive computations

Solution

Implemented a signature caching system with:

  • LRU cache with 100k entry capacity
  • 15-minute TTL matching session duration
  • Thread-safe with in-flight computation tracking
  • Prevents duplicate work when multiple goroutines request the same signature concurrently

Key Components

1. Signature Cache Implementation

  • protocol/shannon/signature_cache.go: Core caching logic
  • Cache key: SessionID + SupplierAddr + AppAddr + PayloadHash
  • Handles concurrent access with sync.Map for in-flight tracking
  • Automatic TTL-based expiration and cleanup
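A possible way to derive the cache key from the four fields listed above. The PR does not say which hash produces PayloadHash or how the fields are joined, so SHA-256 and a NUL separator are assumptions here, and signatureCacheKey is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signatureCacheKey (hypothetical) combines SessionID, SupplierAddr, AppAddr
// and a hash of the payload. Hashing keeps large request bodies out of the key;
// the NUL separator stops adjacent fields from running together
// (e.g. "ab"+"c" vs "a"+"bc").
func signatureCacheKey(sessionID, supplierAddr, appAddr string, payload []byte) string {
	payloadHash := sha256.Sum256(payload)
	return sessionID + "\x00" + supplierAddr + "\x00" + appAddr + "\x00" +
		hex.EncodeToString(payloadHash[:])
}

func main() {
	body := []byte(`{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}`)
	k1 := signatureCacheKey("session-1", "supplier-addr", "app-addr", body)
	k2 := signatureCacheKey("session-1", "supplier-addr", "app-addr", body)
	k3 := signatureCacheKey("session-2", "supplier-addr", "app-addr", body)
	fmt.Println(k1 == k2, k1 == k3) // true false
}
```

Because SessionID is part of the key, entries from an old session can never be served in a new one, which is consistent with the TTL matching session duration.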

2. Integration with Signer

  • Modified protocol/shannon/signer.go to use cache
  • Transparent integration: falls back to computing the signature directly if caching fails
  • Cache statistics available for monitoring

3. Prometheus Metrics

  • shannon_signature_cache_hits_total: Cache hit counter
  • shannon_signature_cache_misses_total: Cache miss counter, labeled by miss reason
  • shannon_signature_cache_size: Current cache size gauge
  • shannon_signature_cache_evictions_total: Eviction counter, labeled by eviction reason
  • shannon_signature_cache_compute_time_seconds: Computation time histogram
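Cache effectiveness is read off the hit and miss counters as hits / (hits + misses). The sketch below computes that ratio with standard-library atomics; cacheStats is a hypothetical stand-in, since the PR exports these values as Prometheus counters (not used here). In Prometheus the same ratio would come from the rates of the two counters, with misses summed across reasons.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheStats (hypothetical) mirrors the hit/miss counters with plain atomics.
type cacheStats struct {
	hits, misses atomic.Uint64
}

// HitRate returns hits / (hits + misses), or 0 before any lookups.
func (s *cacheStats) HitRate() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var s cacheStats
	s.hits.Add(3)
	s.misses.Add(1)
	fmt.Println(s.HitRate()) // 0.75
}
```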

4. Comprehensive Testing

  • Unit tests for all cache behaviors
  • Concurrent access stress tests
  • TTL expiration tests
  • Performance benchmarks

Memory Usage

  • ~50-55MB at full capacity (100k entries)
  • Each entry: ~500 bytes (key + signature + metadata)
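The capacity estimate follows directly from the per-entry figure above:

```latex
100{,}000\ \text{entries} \times 500\ \tfrac{\text{B}}{\text{entry}}
  = 5 \times 10^{7}\ \text{B} \approx 50\ \text{MB}\ (\approx 47.7\ \text{MiB})
```

The ~55 MB upper end corresponds to roughly 550 B per entry (55 MB / 100,000 entries).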

Testing

  • ✅ All unit tests pass
  • ✅ No regressions in existing tests
  • ✅ Linting passes

Next Steps (Future PRs)

  • Add YAML configuration for cache parameters
  • Allow per-service cache enable/disable
  • Monitor production metrics to tune cache size and TTL

Review Notes

This is a draft PR for initial review. The implementation is complete and tested. We should monitor cache hit rates in production to determine actual effectiveness.

okdas added 2 commits August 27, 2025 15:42
Implement a signature caching system that reduces CPU utilization by caching
expensive ring signature operations.

Key features:
- LRU cache with 100k entry capacity and 15-minute TTL matching session duration
- Thread-safe implementation with in-flight computation tracking
- Prevents duplicate work when multiple goroutines request same signature
- Comprehensive Prometheus metrics for monitoring cache effectiveness
- Expected 70-80% reduction in CPU usage for cryptographic operations

The cache is particularly effective for:
- Repeated requests within a session (eth_blockNumber, eth_gasPrice, etc.)
- High-frequency polling patterns
- Retry scenarios

Memory usage: ~50-55MB at full capacity (100k entries)
@okdas self-assigned this Aug 28, 2025
@okdas added the caching label (anything related to caching) Aug 28, 2025