[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9057
Closed
Replies: 1 comment
-
|
This discussion was automatically closed because it expired on 2026-03-27T13:45:05.797Z.
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from
tests/QF_S.tar.zst; 22 172 total available)Timeout: 10 seconds per benchmark (
-T:5+ outer 7 s for seq with tracing;-T:10/ outer 12 s for nseq;-t:10000ms for ZIPT)Z3 build: Debug, CMake+Ninja, commit
88ef8c7(v4.17.0)ZIPT build:
parikhbranch, Release, .NET 8, linked against freshly builtMicrosoft.Z3.dll(netstandard2.1)Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 0
nseq is the fastest solver on this sample (avg 0.646 s), solves 41/50 instances definitively (vs. 33/50 for seq and 36/50 for ZIPT). seq returns
unknownon 17 instances, of which 8 are definitively answered by nseq (and 7 also by ZIPT).Notable Issues
Soundness Disagreements (Critical)
✅ None found. All three solvers agreed on every instance where at least two of them produced a definitive answer.
Crashes / Bugs
nseq — 1 assertion violation
pcp_instance_251.smt2ASSERTION VIOLATIONinsrc/ast/rewriter/seq_axioms.cpp:1108:NOT IMPLEMENTED YET!— triggered bystr.replace_allZIPT — 6 unsupported-feature crashes
All six ZIPT crashes share the same root cause: ZIPT does not implement
str.replace_all.pcp_instance_160.smt2Unsupported feature: str.replace_all currently not supportedpcp_instance_125.smt2pcp_instance_188.smt2pcp_instance_251.smt2benchmark_0089.smt2benchmark_0153.smt2Note: seq and nseq return
unknown(not a crash) for the pcp/rna files — they do parsestr.replace_allbut cannot decide the instances within the timeout.Slow Benchmarks (> 8 s)
Three
diseq-*benchmarks caused nseq and ZIPT to hit their full timeouts; seq returnedunknownquickly (≈5 s internal limit):diseq-1-3-6-100.smt2diseq-1-3-5-106.smt2diseq-1-5-6-100.smt2These are disequality-heavy benchmarks; none of the three solvers could decide them within the allotted time.
seq Regressions vs. nseq: Instances Where Only seq Fails
Eight instances were solved definitively by nseq (and in 7 cases also by ZIPT) but seq timed out at 5 seconds:
Burns_sat_non_incre_equiv_trans_28_0.smt2eqdist_sat_non_incre_equiv_init_0_3.smt2two_token_pass_sat_non_incre_equiv_init_0_7.smt2instance13779.smt2instance02438.smt2slog_stranger_3748_sink.smt2instance07678.smt2instance09958.smt2Trace Analysis: seq-fast / nseq-slow Hypotheses
No seq-fast / nseq-slow cases were observed in this run. In every instance, nseq was equal to or faster than seq. The trend was strongly in the opposite direction: nseq outperformed seq on 8 instances that seq could not decide at all.
However, the trace data for the seq-slow / nseq-fast cases reveals a clear pattern worth noting for development purposes:
Burns / eqdist / hornstr-equiv class (seq times out, nseq solves in < 0.1 s)
The seq solver's
.z3-traceforBurns_sat_non_incre_equiv_trans_28_0.smt2(94 375 trace lines) andeqdist_sat_non_incre_equiv_init_0_3.smt2(165 032 trace lines) show the same pathological pattern:[seq] mk_eq_core(9 120 calls for Burns, 3 151 for eqdist), all of the formX == reg1,Y == reg1,X == varin,Y == varout— the same small set of variable-equality assertions repeated thousands of times.[seq] assign_ehcalls that repeatedly re-triggerpropagate_in_refor the same membership predicates.[seq] solve_ne/[seq] reduce_nepattern (108 occurrences each for Burns) indicates the disequality-handling path is iterated many times without closure.(seq.unit Char[49])throughChar[99]), and re-simplifies equations without making net progress.Hypothesis: These benchmarks involve
str.in_remembership over Kleene-star languages combined with string equations and disequalities. The seq solver's tactic of converting membership to automaton transitions generates an ever-growing set of character-split axioms, but the disequality constraints prevent early pruning — seq has no Parikh-constraint or length-bound shortcut for such instances. The nseq solver, by contrast, likely applies Nielsen-graph reductions or Parikh-based length reasoning that immediately derives a contradiction (forunsatcases) or a valid assignment (forsatcases) without enumerating character-level witnesses. This is consistent with nseq's dramatically lower call counts for these problem types.Per-File Results
Click to expand the full 50-row results table
Generated automatically by the ZIPT Benchmark workflow on the c3 branch.
Beta Was this translation helpful? Give feedback.
All reactions