[Ostrich Benchmark] Z3 c3 branch — 2026-03-20 #9071
Closed
Replies: 1 comment
-
|
This discussion was automatically closed because it expired on 2026-03-28T00:18:41.114Z.
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-03-20
Branch: c3 (Z3 version 4.17.0, commit
8ef491e)Benchmark set: Ostrich (all 349 files from
tests/ostrich.zip)Timeout: 5 seconds per benchmark (
-T:5for Z3;-t:5000for ZIPT)Summary
Soundness disagreements (any two solvers return conflicting sat/unsat): 22
Per-File Results
Click to expand full per-file table (349 benchmarks)
Notable Issues
Soundness Disagreements — Critical (22 benchmarks)
These benchmarks have at least two solvers producing conflicting definitive answers (sat vs unsat):
seq vs nseq disagreements (nseq likely unsound):
contains-4,failedProp,failedProp2,indexof_const_index_unsat,indexof_var_unsat,norn-benchmark-9f,prefix-suffix,replace-special,replace-special-4,replace-special-5,simple-replace-4b— nseq returnssatwhere seq returnsunsat.ZIPT unsound (likely): 8 substr/prefix/contains benchmarks where ZIPT returns
satbut both Z3 solvers returnunsat.Crashes / Bugs
seq bugs (4 files) — all parse-ecma files crash in both seq and nseq, suggesting a parser issue:
parse-ecma-cases.smt2,parse-ecma-cases-2.smt2,parse-ecma-groups.smt2,parse-ecma-replace.smt2nseq bugs (27 files) — nseq crashes on string operations it doesn't yet support:
concat-001,contains-1,contains-7,is-digit-2,non-greedy-quantifiers,nonlinear-2,noodles-unsat8,noodles-unsat10,norn-benchmark-9i, plus all parse-ecma,pcp-1,str-leq11/12/13,str-lt,str-lt2,str.at,str.at-2,str.from_int_6,str.to_int_5/6,substring,substring2,substring2bZIPT bugs (229 files) — the
parikhbranch of ZIPT does not implement many string operations (transducers, replace-regex, reverse, ADTs, BV-to-str, etc.).Slow Benchmarks (> 4s for any solver)
31 benchmarks took more than 4 seconds for at least one solver (most are at the 5s timeout boundary):
artur-unsat-we,nikolai-unsat,nonlinear,noodles-unsat9,concat-regex4,suffix-31/32/33,fixed22/23artur-unsat,bigSubstrIdx,concat-regex2,indexof-2,loop2,noodles-unsat,noodles-unsat2/3/5/6/7/8,regex_counting_unsat,regexdeep,str.to_int_4,substring-bug2,str-prefix-20,fixed22/23,suffix-30all-quantifiers,concat-regex,concat-regex3,loop,simple-concat-4artur-unsat-common-prefix,artur-unsat-we,concat-empty,norn-benchmark-9g,prefix-1,suffix-1(all hit 5-7s)Generated automatically by the Ostrich Benchmark workflow on the c3 branch.
Build: Z3 v4.17.0 (Release), ZIPT parikh branch (net10.0), 349 benchmarks, 5s timeout.
Beta Was this translation helpful? Give feedback.
All reactions