@jserv jserv commented Oct 31, 2025

This implements an 8-set × 2-way set-associative cache for both load and store operations, replacing the previous direct-mapped design. The new design raises hit rates while keeping the code simple.

  • Load cache: 65% → 98% hit rate (2-entry → 8×2 set-associative)
  • Store cache: 83% → 99% hit rate (1-entry → 8×2 set-associative)
  • 3-bit parity hash for even distribution across 8 sets
  • Simple 1-bit LRU for replacement policy
  • 94% reduction in store cache misses

Memory cost: +512 bytes per hart (256B for load + 256B for store)
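
For readers unfamiliar with the layout, here is a minimal sketch of an 8-set × 2-way cache with a 3-bit parity hash and 1-bit LRU. It is not the code from this PR: the type and function names (`mmu_cache_t`, `set_index`, `cache_lookup`, `cache_insert`, `cache_invalidate`) and the entry layout are assumptions, and the hash shown (XOR-folding the virtual page number in 3-bit chunks, so each index bit is the parity of every third VPN bit) is only one plausible reading of "3-bit parity hash".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SETS 8 /* selected by a 3-bit index */
#define CACHE_WAYS 2

/* Hypothetical entry layout; the real structure may differ. */
typedef struct {
    uint32_t vpn;  /* virtual page number, used as the tag */
    bool valid;
    uint8_t *page; /* host pointer for the cached translation */
} cache_entry_t;

typedef struct {
    cache_entry_t ways[CACHE_WAYS];
    uint8_t lru; /* 1-bit LRU: index of the way to evict next */
} cache_set_t;

typedef struct {
    cache_set_t sets[CACHE_SETS];
    uint64_t hits, misses;
} mmu_cache_t;

/* Assumed hash: XOR-fold the VPN in 3-bit chunks. Bit i of the result is
 * the parity of VPN bits i, i+3, i+6, ... */
static inline uint32_t set_index(uint32_t vpn)
{
    uint32_t h = 0;
    for (; vpn; vpn >>= 3)
        h ^= vpn & 0x7;
    return h;
}

/* Return the cached page pointer, or NULL on a miss. */
uint8_t *cache_lookup(mmu_cache_t *c, uint32_t vpn)
{
    cache_set_t *s = &c->sets[set_index(vpn)];
    for (int w = 0; w < CACHE_WAYS; w++) {
        if (s->ways[w].valid && s->ways[w].vpn == vpn) {
            s->lru = (uint8_t) (w ^ 1); /* the other way is now older */
            c->hits++;
            return s->ways[w].page;
        }
    }
    c->misses++;
    return NULL;
}

/* Fill the least recently used way of the selected set. */
void cache_insert(mmu_cache_t *c, uint32_t vpn, uint8_t *page)
{
    cache_set_t *s = &c->sets[set_index(vpn)];
    int w = s->lru;
    s->ways[w] = (cache_entry_t){.vpn = vpn, .valid = true, .page = page};
    s->lru = (uint8_t) (w ^ 1);
}

/* Invalidation resets every set and way (statistics are kept). */
void cache_invalidate(mmu_cache_t *c)
{
    memset(c->sets, 0, sizeof(c->sets));
}
```

With two ways per set, a single bit is enough to track which way was used least recently, which is why the replacement policy stays this small.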


Summary by cubic

Upgraded MMU load/store caches to 8-set, 2-way set-associative. This raises hit rates (load 65→98%, store 83→99%) and cuts store misses by 94%.

  • New Features
    • 3-bit parity hash indexing and 1-bit LRU replacement in the MMU caches.
    • Invalidation resets all sets/ways; stats aggregate per set/way. SIGINT/SIGTERM (when MMU_CACHE_STATS is enabled) defers printing via an async-signal-safe flag and exits.
    • Recalibrated timer coefficient to 1.744e8; optional SEMU_TIMER_STATS prints calibration data.
    • Memory cost: +512 B per hart (256B load + 256B store).
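
The 94% figure follows from the hit rates quoted above: an 83% hit rate means 17% of store accesses miss, and a 99% hit rate means 1% miss. A small helper makes the arithmetic explicit; the function name `miss_reduction` is hypothetical and exists only for this illustration.

```c
/* Fraction of misses eliminated when the hit rate rises from old_hit to
 * new_hit. Both are fractions in [0, 1), and old_hit must be below 1. */
static double miss_reduction(double old_hit, double new_hit)
{
    double old_miss = 1.0 - old_hit;
    double new_miss = 1.0 - new_hit;
    return (old_miss - new_miss) / old_miss;
}
```

For the store cache, `miss_reduction(0.83, 0.99)` gives (0.17 − 0.01) / 0.17 ≈ 0.94. The same formula applied to the load cache numbers (65% → 98%) also gives roughly 0.94.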

Written for commit c7728b8.


jserv commented Oct 31, 2025

Cc. @yy214123

The signal handler for SIGINT/SIGTERM was calling fprintf(), which
is not async-signal-safe and can lead to deadlocks or data corruption.
- Use volatile sig_atomic_t flag instead of calling fprintf directly
- Signal handler now only sets the flag (async-signal-safe)
- Main loops check the flag and print statistics when safe
- Applies to both SMP (coroutine) and single-hart execution paths
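
The pattern described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: the names `stats_requested`, `on_signal`, `install_stats_handlers`, and `poll_stats_request` are hypothetical, and the sketch uses ISO C `signal()` for brevity where a POSIX program might prefer `sigaction()`.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

/* Written only by the handler, read only by normal code. */
static volatile sig_atomic_t stats_requested = 0;

/* Async-signal-safe: the handler does nothing except set the flag. */
static void on_signal(int signo)
{
    (void) signo;
    stats_requested = 1;
}

/* Register the handler for SIGINT and SIGTERM; returns 0 on success. */
int install_stats_handlers(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, on_signal) == SIG_ERR)
        return -1;
    return 0;
}

/* Called from the main loop between emulation steps, where stdio is safe.
 * Returns true once statistics were printed, so the caller can exit. */
bool poll_stats_request(FILE *out, unsigned long hits, unsigned long misses)
{
    if (!stats_requested)
        return false;
    fprintf(out, "cache hits: %lu, misses: %lu\n", hits, misses);
    return true;
}
```

A main loop would call `poll_stats_request()` once per iteration and leave the loop when it returns true. Because `fprintf()` now runs in normal context, it cannot deadlock against a stdio lock that the interrupted code was holding.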

After the MMU cache optimization (8×2 set-associative, 98% load and 99%
store hit rates), CPU execution became faster, reducing timer calls by 18.9%.

Update coefficient from 2.15e8 to 1.744e8 based on measurements.

Add calibration statistics (enabled via SEMU_TIMER_STATS) to help
future recalibration if needed.
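
The two coefficients are consistent with scaling the old value by the measured drop in timer calls: 2.15e8 × (1 − 0.189) ≈ 1.744e8. The sketch below only checks that arithmetic. The function names are hypothetical, and proportional scaling is an inference from the numbers in this commit message, not a description of the actual calibration procedure.

```c
/* Scale a coefficient by a fractional reduction in timer calls.
 * Assumes the coefficient is proportional to the call count. */
static double recalibrate(double old_coeff, double call_reduction)
{
    return old_coeff * (1.0 - call_reduction);
}

/* Inverse view: the fractional reduction implied by two coefficients. */
static double implied_reduction(double old_coeff, double new_coeff)
{
    return 1.0 - new_coeff / old_coeff;
}
```

Here `recalibrate(2.15e8, 0.189)` lands within rounding of 1.744e8, and `implied_reduction(2.15e8, 1.744e8)` is about 0.189.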
@jserv jserv merged commit ec83f4f into master Nov 1, 2025
10 checks passed
@jserv jserv deleted the mmu-caching branch November 1, 2025 02:44