Refactor WAST Test Framework - Consolidate Three Implementations #127

@avrabe

Description

Problem Statement

The WRT project currently has three separate and divergent implementations of WAST test suite support:

  1. wrt-build-core/src/wast.rs (1,768 lines)

    • Comprehensive: directive categorization, statistics, and a module registry
    • Heavy debug output (eprintln! used throughout instead of the tracing framework)
    • Hardcoded test data (violates CLAUDE.md safety guidelines)
    • Complex state management
  2. wrt-build-core/src/wast_test_runner.rs (512 lines)

    • Simpler builder-pattern approach
    • Placeholder implementations and hardcoded function indices
    • Incomplete multi-module support
  3. wrt-component/tests/extended_wasm_test_suite.rs (682 lines)

    • Unit-test focused; bypasses the WAST framework entirely
    • Uses StacklessEngine directly (inconsistent with the other two)
    • Good test case categorization

This creates:

  • 3x maintenance burden for essentially the same problem
  • Inconsistent module loading paths
  • Hardcoded fallbacks that mask real issues (violates CLAUDE.md: "NO FALLBACK LOGIC")
  • Mixed logging (eprintln! vs. the tracing framework)
  • No unified error handling across implementations

Root Cause Analysis

  • No clear architectural design before implementation
  • Attempted incremental improvements without consolidation
  • Unclear which implementation is the "real" one (all three are integrated into cargo-wrt)
  • Separate test suite never unified with CLI test runner

Proposed Solution

Create a single, clean WAST test framework with clear separation of concerns:

WastTestFramework (single entry point)
├── File Discovery & Filtering
├── Directive Parsing (uses wast crate)
├── Test Execution & Categorization
└── JSON/HTML Reporting
       │
       ▼
WastExecutor (wraps StacklessEngine)
├── Module Loading (unified path)
├── Function Invocation
├── Value Conversion
└── Trap Detection
       │
       ▼
StacklessEngine (real WASM execution)
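
As a first sketch of this layering in Rust: all names below except StacklessEngine are illustrative placeholders to be pinned down in Phase 3, with stub types standing in for the real wrt crates.

```rust
use std::path::Path;

// Placeholder types standing in for the real WRT types; only
// StacklessEngine names an existing component.
pub struct StacklessEngine;
pub struct Error(pub String); // stands in for wrt_error::Error
pub enum Value { I32(i32), I64(i64), F32(f32), F64(f64) }
pub struct TestResult { pub name: String, pub passed: bool }

/// Single entry point: file discovery, directive parsing,
/// execution, and reporting all hang off this type.
pub struct WastTestFramework {
    executor: WastExecutor,
    results: Vec<TestResult>,
}

impl WastTestFramework {
    /// Run every .wast file under `dir`, recording per-directive results.
    pub fn run_directory(&mut self, dir: &Path) -> Result<(), Error> {
        todo!("discover files, parse directives, dispatch to executor")
    }
}

/// Thin wrapper around the engine: the only module-loading and
/// invocation path in the framework.
pub struct WastExecutor {
    engine: StacklessEngine,
}

impl WastExecutor {
    /// A module that fails to decode or instantiate is a hard error,
    /// never a fallback.
    pub fn load_module(&mut self, bytes: &[u8]) -> Result<(), Error> {
        todo!("decode and instantiate via StacklessEngine")
    }

    /// Invoke an exported function; a missing export fails loudly.
    pub fn invoke(&mut self, name: &str, args: &[Value]) -> Result<Vec<Value>, Error> {
        todo!("look up export, execute, convert return values and traps")
    }
}
```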

Cleanup & Implementation Strategy

Phase 1: Analysis & Planning (Current)

  • Identify all three implementations
  • Document their differences and integration points
  • Create GitHub issue with analysis

Phase 2: Cleanup & Consolidation

  • Delete wrt-build-core/src/wast.rs
  • Delete wrt-build-core/src/wast_test_runner.rs
  • Consolidate wrt-build-core/src/wast_execution.rs into core framework
  • Remove extended_wasm_test_suite.rs (move test cases to new framework)

Phase 3: New Implementation

  • Create wrt-build-core/src/wast_framework.rs (unified implementation)
  • Implement directive handling for all directive types (see the parsing sketch after this list)
  • Implement module loader with proper error handling
  • Add comprehensive test case categories
  • Integrate with tracing framework (not eprintln)
  • Integrate with diagnostics system
  • Add JSON/HTML reporting
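
For the directive-handling item above, a hedged sketch of the parse-and-dispatch loop using the wast crate (the parser already named in the architecture diagram). Variant names shift between wast crate versions, so the match arms are indicative rather than exact; the point is that every directive is dispatched or logged, never silently dropped.

```rust
use wast::parser::{self, ParseBuffer};
use wast::{Wast, WastDirective};

/// Parse one .wast source and dispatch its directives.
/// Variant and field names track the pinned `wast` version, so the
/// arms below are indicative; unhandled directives are logged.
fn run_wast_source(source: &str) -> Result<(), wast::Error> {
    let buf = ParseBuffer::new(source)?;
    let wast: Wast = parser::parse(&buf)?;
    for directive in wast.directives {
        match directive {
            WastDirective::Module(..) => {
                // encode the module and load it through WastExecutor
            }
            WastDirective::AssertReturn { .. } => {
                // invoke and compare returned values
            }
            WastDirective::AssertTrap { .. } => {
                // invoke and expect a trap with the given message
            }
            WastDirective::AssertInvalid { .. } => {
                // expect validation to reject the module
            }
            WastDirective::Register { .. } => {
                // record the instance in the module registry
            }
            WastDirective::Invoke(..) => {
                // bare invocation, executed for its side effects
            }
            _ => tracing::warn!("WAST directive not yet handled"),
        }
    }
    Ok(())
}
```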

Phase 4: Integration & Testing

  • Update cargo-wrt testsuite command to use new framework
  • Add CLI integration tests (see the sketch after this list)
  • Run against external/testsuite
  • Verify all test cases still work

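The CLI integration tests could start as a plain std::process::Command smoke test; this assumes a built cargo-wrt binary on PATH and a checked-out external/testsuite, and reuses the flag names from the Success Criteria below.

```rust
use std::process::Command;

/// Smoke test: the testsuite subcommand runs against the standard
/// suite and exits cleanly. Assumes `cargo-wrt` is built and on PATH
/// and that external/testsuite is checked out.
#[test]
#[ignore = "requires cargo-wrt binary and external/testsuite"]
fn testsuite_runs_standard_suite() {
    let status = Command::new("cargo-wrt")
        .args(["testsuite", "--run-wast", "--wast-dir", "external/testsuite"])
        .status()
        .expect("failed to spawn cargo-wrt");
    assert!(status.success(), "WAST test suite reported failures");
}
```
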
Design Principles for New Implementation

  1. No Fallback Logic: If something should exist (exports, functions) but is missing, fail loudly (see the sketch after this list)
  2. No Hardcoded Data: All test data must come from actual WAST files
  3. Single Execution Path: One way to load and execute modules
  4. Proper Logging: Use tracing framework throughout
  5. Clean Error Handling: wrt_error::Error for all errors
  6. Test Case Preservation: Keep comprehensive test categories from extended suite
  7. Safety Focused: ASIL-B compliance validation maintained
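
Principles 1 and 5 in miniature, with placeholder types and a hypothetical Error::runtime constructor standing in for whatever wrt_error::Error actually provides:

```rust
use std::collections::HashMap;

// Placeholder types for illustration; the real ones live in the wrt crates.
type FuncIndex = u32;
struct Module { exports: HashMap<String, FuncIndex> }
struct Error(String);
impl Error {
    // Hypothetical constructor standing in for wrt_error::Error's API.
    fn runtime(msg: String) -> Self { Error(msg) }
}

/// Principle 1 in practice: a missing export is a hard error,
/// never a silently substituted hardcoded function index.
fn resolve_export(module: &Module, name: &str) -> Result<FuncIndex, Error> {
    module
        .exports
        .get(name)
        .copied()
        .ok_or_else(|| Error::runtime(format!("export `{name}` not found")))
}
```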

Success Criteria

  • Single codebase for WAST testing (no duplication)
  • All original test cases pass
  • No hardcoded data or fallbacks
  • Uses tracing framework (no eprintln)
  • Proper error handling throughout
  • Can run: cargo-wrt testsuite --run-wast --wast-dir external/testsuite
  • Generates proper diagnostic output (see the report sketch below)
  • Can run CI checks against standard test suite
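
For the JSON half of the reporting, one plausible shape using serde (with the derive feature) and serde_json, both assumed as dependencies; the field names here are illustrative, not an existing wrt-build-core API.

```rust
use serde::Serialize;

// Illustrative report shape for the JSON output; field names are
// assumptions, not an existing wrt-build-core API.
#[derive(Serialize)]
struct WastReport {
    files_run: usize,
    directives_passed: usize,
    directives_failed: usize,
    failures: Vec<FailureDetail>,
}

#[derive(Serialize)]
struct FailureDetail {
    file: String,
    /// Line of the failing directive within the .wast file.
    line: usize,
    directive: String, // e.g. "assert_return"
    message: String,   // error text from wrt_error::Error
}

fn emit_json(report: &WastReport) -> serde_json::Result<String> {
    serde_json::to_string_pretty(report)
}
```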

Open Questions

  1. Scope: Support full WAST spec or just core WebAssembly?
  2. Multi-module: Are register/import tests critical or can they be deferred?
  3. Error reporting: Per-assertion results or just pass/fail per file?
  4. Performance: Should there be per-test and per-file timeouts?
  5. CI integration: Should failures block builds or be warnings?

Related Files

  • CLAUDE.md (safety guidelines)
  • cargo-wrt/src/main.rs (CLI integration point)
  • external/testsuite/ (test data)
