-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612
Conversation
Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple |
Well it can simply happen when you do not use To be honest, I am not 100% sure whether this is really the reason for the mismatch. The problem I ran into was two different |
I think so. I might be missing something, but I do not see a point of having multiple instances of the same semantic type. Maybe I am wrong :) But if not, then it would be better to fix the underlying issue rather than patch the IR nodes. |
So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for |
Yes. For "underlying issue" I mean allowing different |
I agree.
While I agree that it does not make much sense to have multiple instances of |
Well, it depends on what is the underlying rule of comparing On the other hand, booleans, integere types, etc. definitely should be uniqued. Pretty much same approach is used in e.g. LLVM type system and it does make sense to me. |
Is there a case for keeping types defined using the type keyword unique? In particular if you're using pointer comparison to determine equality. Something like:
Shouldn't use the same object for |
That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is |
The language spec is quite clear about the semantics here:
(https://staging.p4.org/p4-spec/docs/P4-16-v1.2.4.html#sec-newtype) So, anything that would allow mixing of these type would violate the spec. Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated. |
Thanks for the clarification! Then yes, there might be multiple "instances" of However, it seems that |
So, what is to be done here?
Internally, the compiler already uses |
This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean. The original design of the IR was to have all IR::Type objects be pointer-comparable -- there should only be one instance of any give type. That's why there are |
True, but it can easily happen and lead to errors like this. This PR just makes Probably the best way to enforce this is to make the new operator private for any type that supports |
I'm a little hesitant to endorse changes that just serve to obscure bugs elsewhere in the compiler code. Relation needs an override to set the type to bool regardless of the type of the operands, since that is different from either operand. I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.
Yes, this seems the way to go to avoid future bugs in the code. |
+1. All unique nodes should have closed |
Yes, the reason I am advocating for this is that relying on pointer comparison to ensure the right type propagates is fragile. In practice, to work around the possibility that you may get a different pointer you will end up having to specifiy IR::Type_Boolean::get() for all instances of these operators anyway. There is also the I think both things should be done, this PR and the privatization of the new operator for any IR class that uses |
Well if the types were really singletons (there was no reason to have two instances) that Another big problem in my option is the source info. Since that is saved in the node, either these singleton-aspiring types can't have source info (it would be a big breaking change to remove So in the end, while the aspiration is for these to be singletons, I don't really see how they can be without rewriting large parts of the compiler (and downstream projects). I suggest we compare them semantically and make all Also note that by relying more on |
The problem is that
On the other hand since we manage the allocation through |
Yes, that is what I meant by "it does not work type-wise". I should be explicit if talking about P4 types or C++ types in implementation of the compiler :-).
True. This one would be reasonably easy to disable. All the more reason to not count on values being pointer comparable though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ I think this change makes sense. However, the default in Operation_Binary
/Operation_Unary
only makes sense in some cases, so I suggest we extend this to more nodes:
LNot
is probably OK as it is homogeneous and so are the remaining unary operations I can see (Neg
,UPlus
,Cmpl
).- All the relations (
Equ
,Neq
,Lss
, ...) should default tobool
, same asLOr
/LAnd
. - The remaining P4C-defined binary operations look OK to me.
I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on. Personally I found it quite tedious that some things rely on types in the IR and some rely on |
The original design was that the first run of the TypeChecker would insert casts as needed to ensure that everything was correctly typed. Passes that run after should be type correct, which is why subsequent runs of the typechecker run with The type map should not be needed, except to temporarily hold types in the middle of typechecking (which is simplest as a two-step process, first inferring the types of everything in an Inspector, followed by a Transform that rewrites things to have the correct type in the nodes -- this second pass should get rid of Type_Name nodes.) The problem is that this is diffuclt to dump as valid P4 after this transform (due to things being used before they're defined in the resulting dump).
The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the |
Oh right, it is in |
I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that. If we want to fix these nodes, we could just remove the constructor at this point. |
It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor. In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like |
I think there are a couple workarounds, either writing a pass that performs type substitution or just modifying the type after, but they seem cumbersome. We can have a restricted interface first and then extend it when required?
When I was looking at the IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
validate();
} So the type is always overridden. |
In this case the parameter This is more of a "happy accident" than anything deliberate in the design of the ir-generator, so might be considered and obfuscation by some, but to the extent that it just "does the right thing" does not require detailed understanding of the details of the ir-generator. |
I see, that is something I didn't consider. |
OK, so in the end the conditional (as written now by Fabian) and unconditional setting work basically the same, unless someone wants to explicitly set type explicitly to unknown (which would be very weird). We should probably at least add a comment to the ctor that it sets the value only for the constructor that does not take it. |
166e753
to
bd0c740
Compare
I ran into issues where I created a
IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false))
and the resulting type of theIR::LAnd
wasIR::Type_Unknown
. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.Correct me if I am wrong but I believe the default type for Logical AND and OR should be
Type_Boolean
. Do this only when the type is unknown.