Refactor WAST Test Framework - Consolidate Three Implementations #127

@avrabe

Description

Problem Statement

The WRT project currently has three separate and divergent implementations of WAST test suite support:

  1. wrt-build-core/src/wast.rs (1,768 lines)

    • Comprehensive: directive categorization, statistics, and a module registry
    • Heavy debug output (eprintln! used throughout instead of the tracing framework)
    • Hardcoded test data (violates CLAUDE.md safety guidelines)
    • Complex state management
  2. wrt-build-core/src/wast_test_runner.rs (512 lines)

    • Simpler builder-pattern approach
    • Placeholder implementations and hardcoded function indices
    • Incomplete multi-module support
  3. wrt-component/tests/extended_wasm_test_suite.rs (682 lines)

    • Unit-test focused; bypasses the WAST framework entirely
    • Uses StacklessEngine directly (inconsistent with the other two)
    • Good test case categorization

This creates:

  • 3x maintenance burden for essentially the same problem
  • Inconsistent module loading paths
  • Hardcoded fallbacks that mask real issues (violates CLAUDE.md: "NO FALLBACK LOGIC")
  • Mixed logging (eprintln! vs. the tracing framework)
  • No unified error handling across implementations

Root Cause Analysis

  • No clear architectural design before implementation
  • Attempted incremental improvements without consolidation
  • Unclear which implementation is the "real" one (all three are integrated into cargo-wrt)
  • Separate test suite never unified with CLI test runner

Proposed Solution

Create a single, clean WAST test framework with clear separation of concerns:

WastTestFramework (single entry point)
├── File Discovery & Filtering
├── Directive Parsing (uses wast crate)
├── Test Execution & Categorization
└── JSON/HTML Reporting
       │
       ▼
WastExecutor (wraps StacklessEngine)
├── Module Loading (unified path)
├── Function Invocation
├── Value Conversion
└── Trap Detection
       │
       ▼
StacklessEngine (real WASM execution)
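
As a first sketch of this layering in Rust: all names below except StacklessEngine are illustrative placeholders to be pinned down in Phase 3, with stub types standing in for the real wrt crates.

```rust
use std::path::Path;

// Placeholder types standing in for the real WRT types; only
// StacklessEngine names an existing component.
pub struct StacklessEngine;
pub struct Error(pub String); // stands in for wrt_error::Error
pub enum Value { I32(i32), I64(i64), F32(f32), F64(f64) }
pub struct TestResult { pub name: String, pub passed: bool }

/// Single entry point: file discovery, directive parsing,
/// execution, and reporting all hang off this type.
pub struct WastTestFramework {
    executor: WastExecutor,
    results: Vec<TestResult>,
}

impl WastTestFramework {
    /// Run every .wast file under `dir`, recording per-directive results.
    pub fn run_directory(&mut self, dir: &Path) -> Result<(), Error> {
        todo!("discover files, parse directives, dispatch to executor")
    }
}

/// Thin wrapper around the engine: the only module-loading and
/// invocation path in the framework.
pub struct WastExecutor {
    engine: StacklessEngine,
}

impl WastExecutor {
    /// A module that fails to decode or instantiate is a hard error,
    /// never a fallback.
    pub fn load_module(&mut self, bytes: &[u8]) -> Result<(), Error> {
        todo!("decode and instantiate via StacklessEngine")
    }

    /// Invoke an exported function; a missing export fails loudly.
    pub fn invoke(&mut self, name: &str, args: &[Value]) -> Result<Vec<Value>, Error> {
        todo!("look up export, execute, convert return values and traps")
    }
}
```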

Cleanup & Implementation Strategy

Phase 1: Analysis & Planning (Current)

  • Identify all three implementations
  • Document their differences and integration points
  • Create GitHub issue with analysis

Phase 2: Cleanup & Consolidation

  • Delete wrt-build-core/src/wast.rs
  • Delete wrt-build-core/src/wast_test_runner.rs
  • Consolidate wrt-build-core/src/wast_execution.rs into core framework
  • Remove extended_wasm_test_suite.rs (move test cases to new framework)

Phase 3: New Implementation

  • Create wrt-build-core/src/wast_framework.rs (unified implementation)
  • Implement directive handling for all directive types (see the parsing sketch after this list)
  • Implement module loader with proper error handling
  • Add comprehensive test case categories
  • Integrate with tracing framework (not eprintln)
  • Integrate with diagnostics system
  • Add JSON/HTML reporting
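
For the directive-handling item above, a hedged sketch of the parse-and-dispatch loop using the wast crate (the parser already named in the architecture diagram). Variant names shift between wast crate versions, so the match arms are indicative rather than exact; the point is that every directive is dispatched or logged, never silently dropped.

```rust
use wast::parser::{self, ParseBuffer};
use wast::{Wast, WastDirective};

/// Parse one .wast source and dispatch its directives.
/// Variant and field names track the pinned `wast` version, so the
/// arms below are indicative; unhandled directives are logged.
fn run_wast_source(source: &str) -> Result<(), wast::Error> {
    let buf = ParseBuffer::new(source)?;
    let wast: Wast = parser::parse(&buf)?;
    for directive in wast.directives {
        match directive {
            WastDirective::Module(..) => {
                // encode the module and load it through WastExecutor
            }
            WastDirective::AssertReturn { .. } => {
                // invoke and compare returned values
            }
            WastDirective::AssertTrap { .. } => {
                // invoke and expect a trap with the given message
            }
            WastDirective::AssertInvalid { .. } => {
                // expect validation to reject the module
            }
            WastDirective::Register { .. } => {
                // record the instance in the module registry
            }
            WastDirective::Invoke(..) => {
                // bare invocation, executed for its side effects
            }
            _ => tracing::warn!("WAST directive not yet handled"),
        }
    }
    Ok(())
}
```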

Phase 4: Integration & Testing

  • Update cargo-wrt testsuite command to use new framework
  • Add CLI integration tests (see the sketch after this list)
  • Run against external/testsuite
  • Verify all test cases still work

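The CLI integration tests could start as a plain std::process::Command smoke test; this assumes a built cargo-wrt binary on PATH and a checked-out external/testsuite, and reuses the flag names from the Success Criteria below.

```rust
use std::process::Command;

/// Smoke test: the testsuite subcommand runs against the standard
/// suite and exits cleanly. Assumes `cargo-wrt` is built and on PATH
/// and that external/testsuite is checked out.
#[test]
#[ignore = "requires cargo-wrt binary and external/testsuite"]
fn testsuite_runs_standard_suite() {
    let status = Command::new("cargo-wrt")
        .args(["testsuite", "--run-wast", "--wast-dir", "external/testsuite"])
        .status()
        .expect("failed to spawn cargo-wrt");
    assert!(status.success(), "WAST test suite reported failures");
}
```
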
Design Principles for New Implementation

  1. No Fallback Logic: If something should exist (exports, functions) but is missing, fail loudly (see the sketch after this list)
  2. No Hardcoded Data: All test data must come from actual WAST files
  3. Single Execution Path: One way to load and execute modules
  4. Proper Logging: Use tracing framework throughout
  5. Clean Error Handling: wrt_error::Error for all errors
  6. Test Case Preservation: Keep comprehensive test categories from extended suite
  7. Safety Focused: ASIL-B compliance validation maintained
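
Principles 1 and 5 in miniature, with placeholder types and a hypothetical Error::runtime constructor standing in for whatever wrt_error::Error actually provides:

```rust
use std::collections::HashMap;

// Placeholder types for illustration; the real ones live in the wrt crates.
type FuncIndex = u32;
struct Module { exports: HashMap<String, FuncIndex> }
struct Error(String);
impl Error {
    // Hypothetical constructor standing in for wrt_error::Error's API.
    fn runtime(msg: String) -> Self { Error(msg) }
}

/// Principle 1 in practice: a missing export is a hard error,
/// never a silently substituted hardcoded function index.
fn resolve_export(module: &Module, name: &str) -> Result<FuncIndex, Error> {
    module
        .exports
        .get(name)
        .copied()
        .ok_or_else(|| Error::runtime(format!("export `{name}` not found")))
}
```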

Success Criteria

  • Single codebase for WAST testing (no duplication)
  • All original test cases pass
  • No hardcoded data or fallbacks
  • Uses tracing framework (no eprintln)
  • Proper error handling throughout
  • Can run: cargo-wrt testsuite --run-wast --wast-dir external/testsuite
  • Generates proper diagnostic output (see the report sketch below)
  • Can run CI checks against standard test suite
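
For the JSON half of the reporting, one plausible shape using serde (with the derive feature) and serde_json, both assumed as dependencies; the field names here are illustrative, not an existing wrt-build-core API.

```rust
use serde::Serialize;

// Illustrative report shape for the JSON output; field names are
// assumptions, not an existing wrt-build-core API.
#[derive(Serialize)]
struct WastReport {
    files_run: usize,
    directives_passed: usize,
    directives_failed: usize,
    failures: Vec<FailureDetail>,
}

#[derive(Serialize)]
struct FailureDetail {
    file: String,
    /// Line of the failing directive within the .wast file.
    line: usize,
    directive: String, // e.g. "assert_return"
    message: String,   // error text from wrt_error::Error
}

fn emit_json(report: &WastReport) -> serde_json::Result<String> {
    serde_json::to_string_pretty(report)
}
```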

Open Questions

  1. Scope: Support full WAST spec or just core WebAssembly?
  2. Multi-module: Are register/import tests critical or can they be deferred?
  3. Error reporting: Per-assertion results or just pass/fail per file?
  4. Performance: Should there be per-test and per-file timeouts?
  5. CI integration: Should failures block builds or be warnings?

Related Files

  • CLAUDE.md (safety guidelines)
  • cargo-wrt/src/main.rs (CLI integration point)
  • external/testsuite/ (test data)
