[ZIPT Benchmark] Z3 c3 branch — 2026-03-20 #9057

2026-03-20T13:45:06Z

github-actions[bot]
bot Mar 20, 2026

Date: 2026-03-20
Branch: c3
Benchmark set: QF_S (50 randomly selected files from tests/QF_S.tar.zst; 22 172 total available)
Timeout: nominally 10 seconds per benchmark. seq runs with tracing under -T:5 plus a 7 s outer limit; nseq runs under -T:10 plus a 12 s outer limit; ZIPT runs with -t:10000 (milliseconds).
Z3 build: Debug, CMake+Ninja, commit 88ef8c7 (v4.17.0)
ZIPT build: parikh branch, Release, .NET 8, linked against freshly built Microsoft.Z3.dll (netstandard2.1)
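
The inner/outer timeout scheme listed above can be sketched as follows. This is a minimal sketch and not the workflow's actual script: the function name, the verdict classification, and the `OUTER_LIMIT_S` table are illustrative assumptions; only the 7 s and 12 s outer limits come from the report.

```python
import subprocess
import time

# Outer (wall-clock) limits in seconds, as stated in the report header.
OUTER_LIMIT_S = {"seq": 7, "nseq": 12}

def run_with_outer_timeout(cmd, outer_timeout_s):
    """Run one solver command and return (verdict, elapsed_seconds).

    `cmd` is the full argument list, including the solver's own inner
    timeout flag. The outer limit only guards against a solver that
    ignores its inner limit.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=outer_timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", time.monotonic() - start
    elapsed = time.monotonic() - start
    lines = proc.stdout.strip().splitlines()
    first = lines[0].strip() if lines else ""
    # Assumed classification: anything that is not a recognised verdict
    # (assertion violations, unsupported-feature messages, empty output)
    # is counted as a bug/crash.
    if first in ("sat", "unsat", "unknown", "timeout"):
        return first, elapsed
    return "bug", elapsed
```

A call would look like `run_with_outer_timeout(["z3", "-T:5", path], OUTER_LIMIT_S["seq"])`; how the workflow selects the seq or nseq solver on the command line is not shown in the report, so that part is left out here.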

Summary

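| Metric | seq solver | nseq solver | ZIPT solver |
| --- | --- | --- | --- |
| sat | 21 | 25 | 21 |
| unsat | 12 | 16 | 15 |
| unknown | 17 | 8 | 5 |
| timeout | 0 | 0 | 3 |
| bug/crash | 0 | 1 | 6 |
| Total time (s) | 80.264 | 32.297 | 47.536 |
| Avg time/benchmark (s) | 1.605 | 0.646 | 0.951 |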

Soundness disagreements (any two solvers return conflicting sat/unsat): 0

nseq is the fastest solver on this sample (avg 0.646 s) and decides 41/50 instances (vs. 33/50 for seq and 36/50 for ZIPT). seq returns unknown on 17 instances, 8 of which nseq decides (ZIPT also decides 7 of those 8).
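
The derived figures in this summary follow directly from the table. As a quick cross-check, here is a small sketch; the dictionary layout and function names are illustrative, the numbers are the ones reported above.

```python
# Counts and total times copied from the Summary table.
SUMMARY = {
    "seq":  {"sat": 21, "unsat": 12, "unknown": 17, "timeout": 0, "bug": 0, "total_s": 80.264},
    "nseq": {"sat": 25, "unsat": 16, "unknown": 8,  "timeout": 0, "bug": 1, "total_s": 32.297},
    "ZIPT": {"sat": 21, "unsat": 15, "unknown": 5,  "timeout": 3, "bug": 6, "total_s": 47.536},
}

def decided(row):
    """Instances with a definitive answer (sat or unsat)."""
    return row["sat"] + row["unsat"]

def instances(row):
    """All instances attempted, whatever the outcome."""
    return sum(row[k] for k in ("sat", "unsat", "unknown", "timeout", "bug"))

def avg_time(row):
    """Average wall-clock time per benchmark, rounded like the table."""
    return round(row["total_s"] / instances(row), 3)

for name, row in SUMMARY.items():
    print(f"{name}: decided {decided(row)}/{instances(row)}, avg {avg_time(row)} s")
```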

Notable Issues

Soundness Disagreements (Critical)

✅ None found. All three solvers agreed on every instance where at least two of them produced a definitive answer.
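
For reference, a check of this kind can be sketched as below. The function is an illustration of the stated criterion (two solvers returning conflicting sat/unsat), not the workflow's code; the last sample row is hypothetical and exists only to show what a disagreement would look like.

```python
def soundness_disagreements(results):
    """Return files on which some solver says sat and another says unsat.

    `results` maps file name -> {solver name: verdict}. Verdicts other
    than sat/unsat (unknown, timeout, bug) never count as a conflict.
    """
    conflicting = []
    for file, verdicts in results.items():
        definitive = {v for v in verdicts.values() if v in ("sat", "unsat")}
        if len(definitive) == 2:
            conflicting.append(file)
    return conflicting

sample = {
    # Rows taken from the per-file results of this run.
    "instance09958.smt2": {"seq": "unknown", "nseq": "sat", "ZIPT": "unknown"},
    "pcp_instance_251.smt2": {"seq": "unknown", "nseq": "bug", "ZIPT": "bug"},
    "instance13779.smt2": {"seq": "unknown", "nseq": "unsat", "ZIPT": "unsat"},
    # Hypothetical row, not from the run.
    "hypothetical.smt2": {"seq": "sat", "nseq": "unsat", "ZIPT": "sat"},
}
print(soundness_disagreements(sample))
```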

Crashes / Bugs

nseq — 1 assertion violation

| File | Error |
| --- | --- |
| `pcp_instance_251.smt2` | `ASSERTION VIOLATION` in `src/ast/rewriter/seq_axioms.cpp:1108`: `NOT IMPLEMENTED YET!`, triggered by `str.replace_all` |

ZIPT — 6 unsupported-feature crashes

All six ZIPT crashes share the same root cause: ZIPT does not implement str.replace_all.

| File | ZIPT output |
| --- | --- |
| `pcp_instance_160.smt2` | `Unsupported feature: str.replace_all currently not supported` |
| `pcp_instance_125.smt2` | same |
| `pcp_instance_188.smt2` | same |
| `pcp_instance_251.smt2` | same |
| `benchmark_0089.smt2` | same |
| `benchmark_0153.smt2` | same |

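Note: on these six files seq returns unknown (not a crash) every time, and nseq does so on five of them; the exception is pcp_instance_251.smt2, where nseq hits the assertion violation listed above. Both solvers parse str.replace_all but give up in under 1 s rather than running into the timeout.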
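
Since every crash in this section traces back to `str.replace_all`, affected files could be flagged before a run. The following is a rough sketch of such a pre-filter; the function name is hypothetical and the scan is purely textual (it skips `;` comments and string literals but does not handle `|quoted symbols|`).

```python
import re

# Matches an SMT-LIB string literal ("" escapes a quote) or a ';' comment.
_LITERAL_OR_COMMENT = re.compile(r'"(?:[^"]|"")*"|;[^\n]*')

def uses_symbol(smt2_text, symbol="str.replace_all"):
    """True if `symbol` occurs as a token outside comments and literals."""
    stripped = _LITERAL_OR_COMMENT.sub(" ", smt2_text)
    # A token is delimited by whitespace, parentheses, or the text boundary.
    token = r"(?<![^\s()])" + re.escape(symbol) + r"(?![^\s()])"
    return re.search(token, stripped) is not None

print(uses_symbol('(assert (= y (str.replace_all x "a" "b")))'))
print(uses_symbol('; str.replace_all only in a comment\n(assert (= x "a"))'))
```

Files for which this returns True could then be reported as "unsupported" for ZIPT up front instead of being counted as crashes.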

Slow Benchmarks (> 8 s)

Three diseq-* benchmarks caused nseq and ZIPT to run to their full time limits; seq gave up with unknown at its ≈5 s internal limit:

| File | seq | nseq | ZIPT |
| --- | --- | --- | --- |
| `diseq-1-3-6-100.smt2` | unknown 5.010 s | unknown 10.009 s | timeout 12.018 s |
| `diseq-1-3-5-106.smt2` | unknown 5.009 s | unknown 10.009 s | timeout 12.018 s |
| `diseq-1-5-6-100.smt2` | unknown 5.009 s | unknown 10.009 s | timeout 12.018 s |

These are disequality-heavy benchmarks; none of the three solvers could decide them within the allotted time.

seq Regressions vs. nseq: Instances Where Only seq Fails

Eight instances were decided by nseq (and, in 7 of the 8, also by ZIPT), while seq hit its 5-second limit and returned unknown. The exception is instance09958.smt2, where ZIPT also returned unknown:

| File | seq | nseq | ZIPT |
| --- | --- | --- | --- |
| `Burns_sat_non_incre_equiv_trans_28_0.smt2` | unknown 5.010 s | unsat 0.054 s | unsat 0.293 s |
| `eqdist_sat_non_incre_equiv_init_0_3.smt2` | unknown 5.011 s | unsat 0.037 s | unsat 0.272 s |
| `two_token_pass_sat_non_incre_equiv_init_0_7.smt2` | unknown 5.010 s | unsat 0.037 s | unsat 0.263 s |
| `instance13779.smt2` | unknown 5.011 s | unsat 0.115 s | unsat 0.349 s |
| `instance02438.smt2` | unknown 5.018 s | sat 0.058 s | sat 0.322 s |
| `slog_stranger_3748_sink.smt2` | unknown 5.010 s | sat 0.210 s | sat 0.353 s |
| `instance07678.smt2` | unknown 5.017 s | sat 0.204 s | sat 0.326 s |
| `instance09958.smt2` | unknown 5.012 s | sat 0.069 s | unknown 0.052 s |
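
The selection behind this table can be reproduced with a simple filter. The sketch below uses a handful of rows from this run (the eight above plus two contrast rows from the per-file results); the tuple layout and function name are illustrative.

```python
# (file, seq verdict, nseq verdict, ZIPT verdict), copied from this report.
ROWS = [
    ("Burns_sat_non_incre_equiv_trans_28_0.smt2", "unknown", "unsat", "unsat"),
    ("eqdist_sat_non_incre_equiv_init_0_3.smt2", "unknown", "unsat", "unsat"),
    ("two_token_pass_sat_non_incre_equiv_init_0_7.smt2", "unknown", "unsat", "unsat"),
    ("instance13779.smt2", "unknown", "unsat", "unsat"),
    ("instance02438.smt2", "unknown", "sat", "sat"),
    ("slog_stranger_3748_sink.smt2", "unknown", "sat", "sat"),
    ("instance07678.smt2", "unknown", "sat", "sat"),
    ("instance09958.smt2", "unknown", "sat", "unknown"),
    # Contrast rows: one undecided by everyone, one decided by everyone.
    ("diseq-1-3-6-100.smt2", "unknown", "unknown", "timeout"),
    ("instance14129.smt2", "unsat", "unsat", "unsat"),
]
DECIDED = {"sat", "unsat"}
SEQ, NSEQ, ZIPT = 1, 2, 3  # column indices into each row

def fails_where_reference_decides(rows, solver, reference):
    """Rows the reference solver decides but `solver` does not."""
    return [r for r in rows if r[solver] not in DECIDED and r[reference] in DECIDED]

seq_only = fails_where_reference_decides(ROWS, SEQ, NSEQ)
also_zipt = [r for r in seq_only if r[ZIPT] in DECIDED]
print(len(seq_only), len(also_zipt))
```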

Trace Analysis: seq-fast / nseq-slow Hypotheses

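No seq-fast / nseq-slow cases were observed in this run. On every instance that at least one of the two solvers decided, nseq was faster than seq. The only files where nseq ran longer are the three diseq-* benchmarks, which neither solver decided (seq stopped at its 5 s limit, nseq at 10 s). The trend was strongly in the opposite direction: nseq decided 8 instances that seq could not decide at all.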

However, the trace data for the seq-slow / nseq-fast cases reveals a clear pattern worth noting for development purposes:

Burns / eqdist / hornstr-equiv class (seq times out, nseq solves in < 0.1 s)

The seq solver's .z3-trace for Burns_sat_non_incre_equiv_trans_28_0.smt2 (94 375 trace lines) and eqdist_sat_non_incre_equiv_init_0_3.smt2 (165 032 trace lines) show the same pathological pattern:

- The dominant trace event is [seq] mk_eq_core (9 120 calls for Burns, 3 151 for eqdist), all of the form X == reg1, Y == reg1, X == varin, Y == varout, i.e. the same small set of variable-equality assertions repeated thousands of times.
- This is interleaved with 6 276 / 8 656 [seq] assign_eh calls that repeatedly re-trigger propagate_in_re for the same membership predicates.
- The [seq] solve_ne / [seq] reduce_ne pattern (108 occurrences each for Burns) indicates the disequality-handling path is iterated many times without closure.
- The seq solver appears to be caught in a fixpoint loop: it expands membership predicates into automaton-accept form, generates new character-level axioms (e.g., (seq.unit Char[49]) through Char[99]), and re-simplifies equations without making net progress.
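
Counts like the ones above can be obtained by tallying trace entry headers. The sketch below assumes each entry in `.z3-trace` starts with a header line of the form `-------- [tag] function ... ---------`; that layout and the sample lines are assumptions for illustration, not excerpts from the actual traces.

```python
import re
from collections import Counter

# Assumed header layout: leading dashes, "[tag]", then the function name.
HEADER = re.compile(r"^-+\s+\[(?P<tag>[\w-]+)\]\s+(?P<event>\w+)")

def count_trace_events(lines):
    """Tally (tag, event) pairs over an iterable of trace lines."""
    counts = Counter()
    for line in lines:
        m = HEADER.match(line)
        if m:
            counts[(m["tag"], m["event"])] += 1
    return counts

# Synthetic sample, made up for this example.
sample_trace = """\
-------- [seq] mk_eq_core file.cpp:1 ---------
X == reg1
------------------------------------------------
-------- [seq] assign_eh file.cpp:2 ---------
(str.in_re X R)
------------------------------------------------
-------- [seq] mk_eq_core file.cpp:1 ---------
Y == reg1
------------------------------------------------
""".splitlines()

for (tag, event), n in count_trace_events(sample_trace).most_common():
    print(f"[{tag}] {event}: {n}")
```

On a real trace, passing the open file object directly (`count_trace_events(open(path))`) avoids loading all lines into memory.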

Hypothesis: These benchmarks involve str.in_re membership over Kleene-star languages combined with string equations and disequalities. The seq solver's tactic of converting membership to automaton transitions generates an ever-growing set of character-split axioms, but the disequality constraints prevent early pruning — seq has no Parikh-constraint or length-bound shortcut for such instances. The nseq solver, by contrast, likely applies Nielsen-graph reductions or Parikh-based length reasoning that immediately derives a contradiction (for unsat cases) or a valid assignment (for sat cases) without enumerating character-level witnesses. This is consistent with nseq's dramatically lower call counts for these problem types.
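
To make the "length-bound shortcut" in this hypothesis concrete, here is a toy illustration of length abstraction for regular expressions. It is not nseq's or ZIPT's algorithm, which the report does not describe; the tuple-based regex encoding and function names are invented for this example. The idea is that the set of possible word lengths of a regex can be computed without enumerating any characters, and a length constraint outside that set is refuted immediately.

```python
def lengths(regex, bound):
    """Set of word lengths <= bound accepted by `regex`.

    Regexes are nested tuples: ("lit", "ab"), ("cat", r1, r2),
    ("alt", r1, r2), ("star", r).
    """
    kind = regex[0]
    if kind == "lit":
        n = len(regex[1])
        return {n} if n <= bound else set()
    if kind == "alt":
        return lengths(regex[1], bound) | lengths(regex[2], bound)
    if kind == "cat":
        left, right = lengths(regex[1], bound), lengths(regex[2], bound)
        return {a + b for a in left for b in right if a + b <= bound}
    if kind == "star":
        steps = lengths(regex[1], bound) - {0}
        reached, frontier = {0}, {0}
        while frontier:  # saturate sums of step lengths up to the bound
            frontier = {a + s for a in frontier for s in steps
                        if a + s <= bound} - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown regex node: {kind!r}")

def length_feasible(regex, n):
    """Can a word of exactly length n match `regex`?"""
    return n in lengths(regex, n)

ab_star = ("star", ("lit", "ab"))
# x in (ab)* together with |x| = 7 is refuted by lengths alone.
print(sorted(lengths(ab_star, 7)), length_feasible(ab_star, 7))
```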

Per-File Results

Full 50-row results table:

| # | File | seq verdict | seq time (s) | nseq verdict | nseq time (s) | ZIPT verdict | ZIPT time (s) | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | instance14129.smt2 | unsat | 0.313 | unsat | 0.085 | unsat | 0.393 | |
| 2 | Burns_sat_non_incre_equiv_trans_28_0.smt2 | unknown | 5.010 | unsat | 0.054 | unsat | 0.293 | |
| 3 | slog_stranger_2638_sink.smt2 | sat | 4.712 | sat | 0.046 | sat | 0.261 | |
| 4 | instance06235.smt2 | unsat | 0.173 | unsat | 0.034 | unsat | 0.393 | |
| 5 | instance13328.smt2 | sat | 1.264 | sat | 0.052 | sat | 0.238 | |
| 6 | instance12167.smt2 | sat | 0.135 | sat | 0.034 | sat | 0.278 | |
| 7 | instance13802.smt2 | unsat | 0.115 | unsat | 0.035 | unsat | 0.287 | |
| 8 | instance00184.smt2 | sat | 0.074 | sat | 0.030 | sat | 0.222 | |
| 9 | instance02552.smt2 | sat | 0.152 | sat | 0.035 | sat | 0.245 | |
| 10 | diseq-1-3-6-100.smt2 | unknown | 5.010 | unknown | 10.009 | timeout | 12.018 | |
| 11 | slog_stranger_3522_sink.smt2 | sat | 2.353 | sat | 0.047 | sat | 0.301 | |
| 12 | instance11423.smt2 | sat | 0.421 | sat | 0.040 | sat | 0.329 | |
| 13 | eqdist_sat_non_incre_equiv_init_0_3.smt2 | unknown | 5.011 | unsat | 0.037 | unsat | 0.272 | |
| 14 | instance07121.smt2 | unsat | 0.168 | unsat | 0.035 | unsat | 0.323 | |
| 15 | instance02438.smt2 | unknown | 5.018 | sat | 0.058 | sat | 0.322 | |
| 16 | instance06342.smt2 | unsat | 0.099 | unsat | 0.031 | unsat | 0.246 | |
| 17 | pcp_instance_160.smt2 | unknown | 0.228 | unknown | 0.029 | bug | 0.137 | ZIPT: str.replace_all unsupported |
| 18 | query7116.smt2 | sat | 0.804 | sat | 0.044 | unknown | 0.053 | |
| 19 | instance13125.smt2 | unsat | 0.192 | unsat | 0.063 | unsat | 0.340 | |
| 20 | diseq-1-3-5-106.smt2 | unknown | 5.009 | unknown | 10.009 | timeout | 12.018 | |
| 21 | instance06776.smt2 | unsat | 0.290 | unsat | 0.067 | unsat | 0.336 | |
| 22 | slog_stranger_1259_sink.smt2 | unsat | 0.026 | unsat | 0.023 | unsat | 0.227 | |
| 23 | instance04680.smt2 | sat | 0.069 | sat | 0.029 | sat | 0.204 | |
| 24 | two_token_pass_sat_non_incre_equiv_init_0_7.smt2 | unknown | 5.010 | unsat | 0.037 | unsat | 0.263 | |
| 25 | instance02769.smt2 | sat | 0.062 | sat | 0.028 | sat | 0.202 | |
| 26 | instance01358.smt2 | sat | 0.085 | sat | 0.028 | sat | 0.241 | |
| 27 | instance04839.smt2 | sat | 0.103 | sat | 0.033 | sat | 0.264 | |
| 28 | instance03281.smt2 | sat | 0.471 | sat | 0.033 | sat | 0.199 | |
| 29 | instance00017.smt2 | sat | 0.150 | sat | 0.039 | sat | 0.272 | |
| 30 | pcp_instance_125.smt2 | unknown | 0.209 | unknown | 0.029 | bug | 0.141 | ZIPT: str.replace_all unsupported |
| 31 | instance09958.smt2 | unknown | 5.012 | sat | 0.069 | unknown | 0.052 | |
| 32 | slog_stranger_3748_sink.smt2 | unknown | 5.010 | sat | 0.210 | sat | 0.353 | |
| 33 | instance01197.smt2 | sat | 1.197 | sat | 0.029 | sat | 0.340 | |
| 34 | instance07571.smt2 | sat | 0.091 | sat | 0.035 | sat | 0.252 | |
| 35 | instance02815.smt2 | sat | 0.311 | sat | 0.030 | sat | 0.190 | |
| 36 | pcp_instance_188.smt2 | unknown | 0.208 | unknown | 0.029 | bug | 0.153 | ZIPT: str.replace_all unsupported |
| 37 | slog_stranger_2674_sink.smt2 | unsat | 0.032 | unsat | 0.023 | unknown | 0.051 | |
| 38 | benchmark_0089.smt2 | unknown | 0.859 | unknown | 0.033 | bug | 0.118 | ZIPT: str.replace_all unsupported |
| 39 | instance08757.smt2 | sat | 3.518 | sat | 0.068 | unknown | 0.054 | |
| 40 | instance01740.smt2 | sat | 0.052 | sat | 0.028 | sat | 0.214 | |
| 41 | instance08045.smt2 | unsat | 0.042 | unsat | 0.028 | unsat | 0.374 | |
| 42 | instance07678.smt2 | unknown | 5.017 | sat | 0.204 | sat | 0.326 | |
| 43 | diseq-1-5-6-100.smt2 | unknown | 5.009 | unknown | 10.009 | timeout | 12.018 | |
| 44 | instance13779.smt2 | unknown | 5.011 | unsat | 0.115 | unsat | 0.349 | |
| 45 | benchmark_0153.smt2 | unknown | 0.865 | unknown | 0.033 | bug | 0.125 | ZIPT: str.replace_all unsupported |
| 46 | instance05927.smt2 | sat | 2.262 | sat | 0.047 | unknown | 0.066 | |
| 47 | instance14399.smt2 | unsat | 0.146 | unsat | 0.034 | unsat | 0.347 | |
| 48 | slog_stranger_2769_sink.smt2 | unsat | 0.040 | unsat | 0.024 | unsat | 0.387 | |
| 49 | instance15509.smt2 | sat | 2.639 | sat | 0.068 | sat | 0.309 | |
| 50 | pcp_instance_251.smt2 | unknown | 0.207 | bug | 0.028 | bug | 0.140 | nseq: assertion violation (str.replace_all); ZIPT: unsupported |

Generated automatically by the ZIPT Benchmark workflow on the c3 branch.

AI generated by Qf S Benchmark · history

expires on Mar 27, 2026, 1:45 PM UTC

2026-03-28T01:01:17Z

github-actions[bot]
bot Mar 28, 2026
Author

This discussion was automatically closed because it expired on 2026-03-27T13:45:05.797Z.

Closed by Workflow
