- **Date:** 2026-03-29
- **Branch:** c3
- **Benchmark set:** QF_S (200 randomly selected files from `tests/QF_S.tar.zst`, drawn from 22,172 total)
- **Timeouts:** seq `-T:5` (5 s wall-clock cap inside Z3, 7 s outer); nseq `-T:5`; ZIPT `-t:5000` (5 s)
- **Build type:** Debug (assertions active, tracing enabled)
Summary
| Metric | seq solver | nseq solver | ZIPT solver |
|---|---|---|---|
| sat | 79 | 94 | 105 |
| unsat | 61 | 65 | 64 |
| unknown | 57 | 32 | 5 |
| timeout | 0 | 0 | 0 |
| bug/crash | 3 \* | 9 \*\* | 26 \*\*\* |
| Total time (s) | 261.899 | 54.335 | 102.742 |
| Avg time/benchmark (s) | 1.309 | 0.272 | 0.514 |
* seq "bug" entries (instance05732, query5169, instance08172) were likely spurious — re-running them manually returns correct sat/unsat. Probably caused by trace output interleaving during the sequential benchmark run.
** nseq "bug" entries include genuine assertion violations (ASSERTION VIOLATION at src/smt/seq/seq_nielsen.cpp:1505, ext) on not-contains-* and diseq-* families. instance05732, query5169, instance08172 appear spurious (same issue as seq above).
*** ZIPT crashes are primarily "Unsupported feature" errors (e.g. str.replace_all) on the benchmark_XXXX, pcp_instance_*, and wildcard-matching-* families.
Soundness disagreements (any two solvers return conflicting sat/unsat): 1
Notable Issues
🚨 Soundness Disagreement (Critical)
| File | seq | nseq | ZIPT | Declared `:status` |
|---|---|---|---|---|
| noodles-unsat-4.smt2 | unsat | unsat | sat | unsat |
Both seq and nseq correctly return unsat for the noodles word-equation instance (derived from "Word Equations in Synergy with Regular Constraints"). ZIPT incorrectly returns sat. This is a soundness bug in ZIPT on the parikh branch.
The instance:

```smt2
(assert (= (str.++ z y x) (str.++ x x z)))
(assert (str.in_re x (re.+ (str.to_re "1111"))))
(assert (str.in_re y (str.to_re "11")))
(assert (str.in_re z (re.+ (str.to_re "111"))))
```
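The unsat verdict can be sanity-checked by hand: the instance's alphabet is just `{'1'}`, so concatenation equality reduces to length equality, and the equation forces |z| + |y| + |x| = |x| + |x| + |z|, i.e. |y| = |x|; but y must be "11" (length 2) while x ∈ (1111)+ has length at least 4. A tiny standalone script (hypothetical, not part of the benchmark harness) confirms there is no bounded model:

```python
# Length-abstraction check of the noodles instance over its unary alphabet
# {'1'}: x in (1111)+ means |x| is a positive multiple of 4, y = "11" means
# |y| = 2, z in (111)+ means |z| is a positive multiple of 3. Since every
# string is all 1s, (str.++ z y x) = (str.++ x x z) holds iff the lengths
# match: |z| + |y| + |x| = |x| + |x| + |z|, i.e. |y| = |x|.
def find_solution(max_len=40):
    for nx in range(4, max_len + 1, 4):      # |x| in {4, 8, 12, ...}
        for nz in range(3, max_len + 1, 3):  # |z| in {3, 6, 9, ...}
            x, y, z = "1" * nx, "11", "1" * nz
            if z + y + x == x + x + z:
                return (x, y, z)
    return None                              # no model up to the bound

print(find_solution())  # None: consistent with unsat
```

The bound is irrelevant here: |y| = |x| would require |x| = 2, which no member of (1111)+ satisfies, so no bound admits a model.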
⚠️ nseq Assertion Violations
The following files trigger ASSERTION VIOLATION at src/smt/seq/seq_nielsen.cpp:1505 (ext) in nseq. ZIPT solves them correctly as sat:
- `not-contains-1-3-5-135.smt2`
- `not-contains-1-3-6-113.smt2`
- `not-contains-1-4-6-118.smt2`
- `not-contains-1-5-5-100.smt2`
- `not-contains-1-5-5-102.smt2`
- `diseq-1-5-5-110.smt2`
These expose a bug in nseq's Nielsen graph extension logic (ext constraint handling at seq_nielsen.cpp:1505).
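For background, the Nielsen transformation underlying `seq_nielsen` case-splits on the first symbol of each side of a word equation, producing the "Nielsen graph" that nseq searches. A toy sketch of the classic algorithm (uppercase letters are variables, lowercase are constants; this is an illustration, not Z3's actual code):

```python
from collections import deque

def nielsen_sat(lhs, rhs, max_nodes=10_000):
    """Bounded breadth-first search of the Nielsen graph for the word
    equation lhs == rhs. Returns True if a solution is found, False if
    the explored portion of the graph contains none."""
    seen, queue = set(), deque([(lhs, rhs)])
    while queue and len(seen) < max_nodes:
        l, r = queue.popleft()
        if (l, r) in seen:
            continue
        seen.add((l, r))
        if l == r:
            return True                    # both sides identical: solved
        if not l or not r:
            # One side empty: solvable iff the other is variables only,
            # all of which can be assigned the empty string.
            if all(c.isupper() for c in (l or r)):
                return True
            continue
        if l[0] == r[0]:
            queue.append((l[1:], r[1:]))   # strip the common first symbol
            continue
        for s, t in ((l, r), (r, l)):
            X, b = s[0], t[0]
            if X.isupper():
                # Nielsen rule 1: X -> epsilon (erase X everywhere)
                queue.append((l.replace(X, ""), r.replace(X, "")))
                # Nielsen rule 2: X -> b.X (X starts with the other head)
                queue.append((l.replace(X, b + X), r.replace(X, b + X)))
    return False

print(nielsen_sat("Xab", "abX"))  # True, e.g. X = ""
print(nielsen_sat("aX", "Xb"))    # False: aX is all a's, Xb ends in b
```

The `ext` logic that fails at `seq_nielsen.cpp:1505` extends this graph construction to constraints beyond plain equations (e.g. not-contains and disequalities), which is exactly where the affected families live.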
⚠️ ZIPT Unsupported Features / Crashes (26 files)
- `benchmark_XXXX.smt2` (RNA family): use `str.replace_all` — not yet implemented in ZIPT.
- `pcp_instance_*.smt2` / `unsolved_pcp_instance_*.smt2`: Post Correspondence Problem instances expose unsupported constructs.
- `wildcard-matching-regex-*.smt2`: complex regex queries crash ZIPT.
- `slog_stranger_2884_sink.smt2`: ZIPT crashes on an instance both seq and nseq solve as sat.
🐢 Slow Benchmarks (> 8 s for any solver)
| File | seq (s) | nseq (s) | ZIPT (s) |
|---|---|---|---|
| sub-matching-sat-1.smt2 | 5.014 (timeout) | 5.008 (timeout) | 12.011 (timeout) |
| wildcard-matching-regex-62.smt2 | 5.008 (timeout) | 0.417 | 12.010 (crash) |
| diseq-1-5-5-110.smt2 | 5.008 (timeout) | 0.097 (crash) | 12.016 (timeout) |
Trace Analysis: seq-fast / nseq-slow Hypotheses
No seq-fast / nseq-slow cases were observed; the pattern is uniformly the reverse: nseq is consistently faster than seq, often by 10–100×. seq hit its internal timeout (returning unknown) on 57 benchmarks, most of which nseq solved quickly. The 4.8× reduction in total time (261.9 s vs. 54.3 s) and 25 fewer unknown results (57 vs. 32) confirm that nseq is the more effective engine here.
Why is seq slower than nseq?
Analysis of seq traces for representative slow files:
instance00662.smt2 (seq=timeout, nseq=sat in 52 ms) — The trace records 3,462 assign_eh calls, 1,840 simplify_eq invocations, and 1,149 add_axiom steps before timing out. The benchmark contains str.in_re with re.allchar (wildcard), which forces seq's automata-based regex propagation engine to enumerate a large product automaton over character sequences. Each assign_eh step generates new seq.unit Char[N] axioms for every character class boundary, causing combinatorial explosion. nseq uses the Nielsen graph with length arithmetic, assigning |X| = k and picking a satisfying string without character enumeration.
instance00825.smt2 (seq=3.266 s, nseq=52 ms) — The trace records 21,444 assign_eh calls and 303 reduce_ne/solve_ne steps. The benchmark uses re.comp (complement) and re.opt operators. seq inlines regex complement via automata complement, which exponentially blows up the NFA state space, then drives SAT search over NFA transitions. The 5.9 MB trace confirms exponential character-level unrolling. nseq avoids this via the Parikh image: regex membership becomes a linear arithmetic formula over character counts and length, which the arithmetic solver handles efficiently without automaton blowup.
General pattern: seq's regex engine is automaton-based and character-exact, leading to exponential blowup on complex patterns (wildcards, complements, loops). nseq's Nielsen-graph / Parikh-constraint approach reduces string problems to word-equation + integer-arithmetic problems that the SMT solver handles far more efficiently.
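To make the Parikh-image idea concrete, here is a simplified sketch (not nseq's actual encoding): membership in a regex such as (ab)+ can be relaxed to linear constraints over character counts and length, which an arithmetic solver can discharge without ever constructing an automaton.

```python
from collections import Counter

def in_parikh_image_ab_plus(s):
    """Necessary (not sufficient) condition for s in (ab)+, expressed
    purely through character counts and length -- the Parikh-image
    relaxation. Membership becomes the linear arithmetic constraints
    #a = #b, #a >= 1, |s| = #a + #b."""
    c = Counter(s)
    return c["a"] == c["b"] and c["a"] >= 1 and len(s) == c["a"] + c["b"]

# Every true member passes the relaxation:
print(in_parikh_image_ab_plus("abab"))  # True
# It is an over-approximation, so some non-members pass too:
print(in_parikh_image_ab_plus("ba"))    # True
# Strings with the wrong counts are refuted without any automaton:
print(in_parikh_image_ab_plus("aab"))   # False
```

Because the relaxation is an over-approximation, it can only prove non-membership outright; a complete solver pairs it with other reasoning (as nseq does with the Nielsen graph) to recover exact answers.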
Per-File Results
Click to expand full per-file results (200 rows)

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.