1
1
import Duper.TPTP
2
2
import Duper.Tactic
3
3
4
- -- Deterministic timeout
5
- -- More saturate loops than commit `c1afee181ad15b58edba6972d3acef4e85eba7ed`
6
- -- "support for recursors"
7
- set_option maxHeartbeats 200000
8
- set_option trace.Meta.debug true in
9
- tptp GEO192 "../TPTP-v8.0.0/Problems/GEO/GEO192+2.p" by duper
10
-
11
- tptp COM025_5 "../TPTP-v8.0.0/Problems/COM/COM025_5.p"
12
- by duper
13
-
14
- tptp HWV042 "../TPTP-v8.0.0/Problems/HWV/HWV042_1.p"
15
- by duper
16
-
17
- set_option maxTPTPProblemLines 10000 in
18
- tptp ITP222 "../TPTP-v8.0.0/Problems/ITP/ITP222_2.p"
19
- by sorry
20
-
21
- set_option maxTPTPProblemLines 10000 in
22
- tptp ITP010_7 "../TPTP-v8.0.0/Problems/ITP/ITP010_7.p"
23
- by sorry
24
-
25
-
26
-
4
+ -- Bug 1
27
5
namespace Color2
28
6
29
7
inductive Color :=
@@ -41,8 +19,204 @@ set_option pp.match false
41
19
#print test.match_1
42
20
#print Color.casesOn
43
21
44
- -- Not sure why this does not work:
22
+ set_option trace.Saturate.debug true in
23
+ set_option trace.Timeout.debug true in
45
24
example : test .red = .green := by
46
25
duper [test, test.match_1, Color.rec, Color.casesOn]
47
26
48
- end Color2
27
+ -- Diagnosis of the above test:
28
+ /-
29
+ When inhabitationReasoning is enabled, test.match_1, Color.rec, and Color.casesOn are all potentially
30
+ vacuous. This is caused by several issues in the current inhabitationReasoning. One issue is that Color2.Color
31
+ itself is not recognized as Inhabited, though there appear to be other issues as well that I haven't fully
32
+ figured out. In any case, the end result is premature saturation.
33
+
34
+ When inhabitationReasoning is disabled, duper spends plenty of time reasoning, but the reasoning
35
+ generates a ton of clauses, most of which aren't even unit clauses. I'm not 100% clear on whether
36
+ duper is currently capable of solving the goal (and just needs to wade through a ton of unneeded
37
+ clauses before it finds the one it actually needs), or whether the fact that duper has stuff to spin
38
+ its wheels on is obscuring the fact that duper can't actually solve this goal. In this latter case,
39
+ it might be due to issues actually deconstructing the arguments provided to duper (e.g. Color2.Color.rec)
40
+
41
+ It's notable that when inhabitationReasoning is disabled, both of the following are in the active set:
42
+ - `∀ (a : Color2.Color), Color2.test a = Color2.Color.rec Color2.Color.green Color2.Color.red a`
43
+ - The full rhs is `@Color2.Color.rec (fun x => Color2.Color) Color2.Color.green Color2.Color.red a`
44
+ which reduces to `Color.green` when a is red (we can see that by calling #reduce)
45
+ - `Color2.test Color2.Color.red ≠ Color2.Color.green`
46
+
47
+ In the expanded form, it's unsurprising that the first premise's lhs would be smaller than the first premise's
48
+ rhs. But if we reduced the first premise's rhs, then the rhs would just be Color.green, which would be smaller.
49
+ Then, by setting a to red, we should be able to unify both premise's lhs to obtain .green ≠ .green (and from that,
50
+ we would quickly obtain our contradiction)
51
+
52
+ One question maybe worth asking is, in the current calculus, are we even instructed to compare the lhs of both
53
+ premises. Because in their current (irreducible) state, the first premise's lhs is smaller than the first premise's
54
+ rhs. This stop being true after we instantiate a and reduce the rhs, but it is true until that point.
55
+
56
+ I do think that for superposition, the lhs of the second premise should be eligible for superposition based on
57
+ (E2) of Defition 22. However, the issue for superposition is that for the first premise, by side condition 5,
58
+ we can't have the unifying side be less than or equal to the side we are substituting in. After we instantiate a,
59
+ both the lhs and rhs can be reduced to .red, but that's irrelevant because we definitely don't do substitution
60
+ midway through the superposition rule. After applying σ (which instantiates a as red), we have that the lhs of
61
+ the first premise is less than the rhs of the first premise, so we can't replace the first premise's lhs with its
62
+ rhs in the second premise.
63
+
64
+ Additional note: The above discouse concerns why I can't combine the two noted premises. And my conclusion is that
65
+ duper is being faithful to the calculus in refusing to do this. However, it occurs to me that maybe the thing
66
+ we should want of duper here isn't to use test.match_1, Color.rec, and Color.casesOn to obtain the result. Maybe
67
+ we should instead just say that `test .red` is a reducible that ought to be reduced (and if we did reduce it, our
68
+ goal would be immediately proven trivially). Having things like test.match_1, Color.rec, and Color.casesOn would
69
+ perhaps be useful if the goal were more abstract (e.g. if we were trying to prove `∃ x : Color, test x = .green`),
70
+ but as is, maybe we should want duper to just reduce `test .red` to `.green`
71
+
72
+ Final note: If `preprocessFact` in `Util.Reduction.lean` is modified at line 17 to use transparency .all rather than
73
+ .instances, the above goal is solved both with and without inhabitation reasoning. I can't recall the specific reason
74
+ we were using .instances transparency though, so I'll refrain from actually making the modification until I can check
75
+ in with Yicheng. However, even when this change is made, the example written below fails both with and without
76
+ inhabitationReasoning. An alternative way to handle this would be to improve Duper's awareness of definitions so that,
77
+ having given `test` as an explicit definition, it will know to unfold `test`. Rather than modifying the preprocessing,
78
+ this approach would instead have definitional rewriting/unfolding be an explicit simplification rule.
79
+ -/
80
+
81
+ set_option inhabitationReasoning false in
82
+ set_option trace.Saturate.debug true in
83
+ set_option trace.Timeout.debug true in
84
+ example : ∃ c : Color, test c = .green := by
85
+ duper [test, test.match_1, Color.rec, Color.casesOn]
86
+
87
+ end Color2
88
+
89
+ -- Bug 2
90
+ instance : Neg Nat := sorry
91
+
92
+ def even_int_fun (f : Int → Int) := ∀ x, f (-x) = f x
93
+ def even_nat_fun (f : Nat → Nat) := ∀ x, f (-x) = f x
94
+
95
+ instance : Add (Int → Int) := (⟨fun f g x => f x + g x⟩ : Add (Int → Int))
96
+ instance : Add (Nat → Nat) := (⟨fun f g x => f x + g x⟩ : Add (Nat → Nat))
97
+
98
+ -- Although it takes longer than I would prefer (~2s), duper is able to solve this
99
+ set_option trace.Print_Proof true in
100
+ example (f : Int → Int) (hf : ∀ x, f (-x) = f x) : even_int_fun f := by -- The goal is the same as hf
101
+ duper [even_int_fun]
102
+
103
+ -- Although duper can solve the above example, it appears to time out on this example. I haven't yet pinpointed
104
+ -- the reason why these two examples behave differently
105
+ set_option trace.Timeout.debug true in
106
+ set_option trace.Timeout.debug.fullActiveSet true in
107
+ example (f : Nat → Nat) (hf : ∀ x, f (-x) = f x) : even_nat_fun f := by -- The goal is the same as hf
108
+ duper [even_nat_fun]
109
+
110
+ -- Bug 3
111
+ -- This example is the same as the below example except it was manually skolemized
112
+ set_option trace.Print_Proof true in
113
+ example (p : α → β → Prop ) (g : _ → _) (h : ∀ (x : α), p x (g x)) : ∃ (f : α → β), ∀ x, p x (f x) :=
114
+ by duper
115
+
116
+ -- This should work as evidenced by the example above
117
+ set_option inhabitationReasoning false in
118
+ set_option trace.Saturate.debug true in
119
+ set_option trace.Timeout.debug true in
120
+ example (p : α → β → Prop ) (h : ∀ (x : α), ∃ y, p x y) : ∃ (f : α → β), ∀ x, p x (f x) :=
121
+ by duper
122
+
123
+ -- Diagnosis of the above test
124
+ /-
125
+ - ∀ (a : α) (a_1 : β), p a (skS.0 0 a a_1) = True
126
+ - ∀ (a : α → β) (a_1 : α), p (skS.0 1 a a_1) (a (skS.0 1 a a_1)) = False
127
+
128
+ Both of the above clauses are in the active set at timeout. I believe it should be possible to unify the two left hand
129
+ sides via the following substitution:
130
+ - The first clause's `a_1` maps to some `b : β`
131
+ - The second clause's `a` and `a_1` maps to `f : α → β` and `y : α` such that `f (skS.0 1 f y) = skS.0 0 (skS.0 1 f y) b`.
132
+ Note that this is only a constraint on `f` and makes no assumptions about `skS.0 1` or `skS.0 0`.
133
+ - The first clause's `a` maps to `(skS.0 1 f y)`
134
+
135
+ With the above unification, I believe both left hand sides should become `p (skS.0 1 f y) (skS.0 0 (skS.0 1 f y) b)`. This
136
+ would enable superposition to infer that `False = True`, which would directly yield a contradiction.
137
+
138
+ The fact that both of the listed clauses are in the active set at timeout most likely indicates that Duper was unable
139
+ to achieve this particular unification (if Duper were able to achieve this unification, the clause `False = True` would
140
+ be produced and I would expect this to immediately be the smallest clause in the passive set. The fact that this clause is
141
+ not immediately pulled from the passive set almost certainly indicates that it was not produced).
142
+
143
+ To further diagnose this bug, the thing to do would be to attempt to trace what duper is doing when superposition is called
144
+ with the above two clauses.
145
+ -/
146
+
147
+ -- Bug 4
148
+ set_option inhabitationReasoning true in
149
+ set_option trace.Saturate.debug true in
150
+ example (p : α → β → Prop ) (h : ∀ (x : α), ∃ y, p x y) : ∃ (f : α → β), ∀ x, p x (f x) :=
151
+ by duper
152
+
153
+ -- Diagnosis of the above test
154
+ /-
155
+ This is the same example used in bug 3, but with inhabitation reasoning enabled, duper results in premature saturation
156
+ rather than a deterministic timeout. The reason for this is duper is not able to determine that `α` or `α → β` are Inhabited.
157
+ Because of this, duper is unable to move forward, even though in principle, it could.
158
+
159
+ If Duper recognized `α` as nonempty, then Duper would be able to make more forward progress. When provided with `[Inhabited α]`,
160
+ the duper is able to derive that `β` is also nonempty in the above example (from `∃ y : β, ...` in `h`), though Duper will
161
+ still get stuck, not recognizing that `α → β` must be Nonempty if `α` and `β` are both Nonempty. Still, it would be more progress
162
+ than is currently achieved, and could be transformed into more progress yet if the Nonempty status of `α → β` were derived
163
+ from the Nonempty status of `α` and `β`.
164
+
165
+ Of course, Duper cannot know that `α` is nonempty, as this isn't actually entailed by the example. However, what Duper could
166
+ do in principle, rather than give up on any reasoning whatsoever, is recognize that if `α` is Empty, then regardless of what
167
+ `β` is, `α → β` must be Nonempty. This would allow Duper to proceed reasoning about the goal `∃ (f : α → β), ∀ x, p x (f x)`
168
+ which is negated to `(∃ (f : α → β), ∀ x, p x (f x)) = False` and clausified into
169
+ `∀ (a : α → β), ((∀ (x : α), p x (a x)) = False)`. With `α` recognized as Empty, `(∀ (x : α), p x (a x))` could be recognized
170
+ as vacuously true, which would transform the clause into `∀ (a : α → β), (True = False)`, and from there a contradiction could
171
+ easily be derived.
172
+
173
+ The point of this is that currently, inhabitation reasoning prevents any forward progress in the presence of potentially
174
+ uninhabited types even though there may be instances when casing on whether a type is inhabited could yield forward progress.
175
+ -/
176
+
177
+ -- Bug 5
178
+ set_option inhabitationReasoning true in
179
+ set_option trace.typeInhabitationReasoning.debug true in
180
+ tptp ITP023_3 "../TPTP-v8.0.0/Problems/ITP/ITP023_3.p"
181
+ by duper -- Fails due to error: unknown free variable '_uniq.6173142'
182
+
183
+ set_option inhabitationReasoning false in
184
+ tptp ITP023_3' "../TPTP-v8.0.0/Problems/ITP/ITP023_3.p"
185
+ by duper -- Det timeout
186
+
187
+ -- Diagnosis of the above test
188
+ /-
189
+ The error appears to occur during removeVanishedVarsHelper (which is only called when inhabitationReasoning is enabled).
190
+ Specifically, when the clause `∀ (a : Type) (a_1 a_2 : a), _uniq.6179254 → p c_2Ebool_2ET_2E0 = True ∨ a_1 = a_2` is
191
+ processed, an mvar `?m.6184382` is given the type `_uniq.6179254`. When `Meta.findInstance` is called on `?m.6184382`'s
192
+ type (`_uniq.6179254`), the error "unknown free variable '_uniq.6173142'" is produced.
193
+
194
+ To further diagnose this bug, the thing to do would be to attempt to trace how the above clause was produced (and therefore,
195
+ where the `_uniq.6179254` constraint was imposed).
196
+ -/
197
+
198
+ -- Bug 6
199
+ set_option inhabitationReasoning true in
200
+ example : ((∃ (A B : Type ) (f : B → A) (x : B), f x = f x) = True) :=
201
+ by duper -- Fails because we currently do not infer that A is nonempty from the fact that B and B → A are nonempty
202
+
203
+ set_option inhabitationReasoning true in
204
+ example : ∃ (A : Type ) (B : A → Type ) (f : ∀ (a : A), B a) (x : A), (f x = f x) = True :=
205
+ by duper
206
+
207
+ -- Diagnosis of the above two examples
208
+ /-
209
+ This isn't so much a bug as it is a limitation in the current approach to inhabitation reasoning. The extent of reasoning Duper
210
+ currently performs in attempting to determine whether a type is inhabited is limited. The two above examples provide somewhat
211
+ more complicated cases in which Duper is not able to infer that a particular type is inhabited
212
+ -/
213
+
214
+ -- Bug 7
215
+ example (a : α) (as : List α) : [] ≠ a :: as :=
216
+ by duper [List.rec] -- Error: AppBuilder for 'mkAppOptM', result contains metavariables @List.nil
217
+
218
+ -- Diagnosis of the above example
219
+ /-
220
+ This error arises during `addRecAsFact` in Tactic.lean on the line that calls `let ctor ← mkAppOptM ctorName #[]`.
221
+ Fundamentally, the issue is that `List` is polymorphic and the current `addRecAsFact` doesn't support that.
222
+ -/
0 commit comments