Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612

fruffy · 2024-04-12T16:54:55Z

I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false)) and the resulting type of the IR::LAnd was IR::Type_Unknown. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.

Correct me if I am wrong but I believe the default type for Logical AND and OR should be Type_Boolean. Do this only when the type is unknown.

asl · 2024-04-13T21:49:29Z

Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean or Type_Unknown?

fruffy · 2024-04-13T21:59:13Z

Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean or Type_Unknown?

Well it can simply happen when you do not use IR::Type_Boolean::get() but new IR::Type_Boolean(). We could make the allocator operators for these classes private.

To be honest, I am not 100% sure whether this is really the reason for the mismatch. The problem I ran into was two different IR::Type_Boolean pointers which cause the inferred type to be IR::Type_Unknown. It could have also been a clone.

asl · 2024-04-13T22:01:56Z

new IR::Type_Boolean(). We could make the allocator operators for these classes private.

I think so. I might be missing something, but I do not see a point of having multiple instances of the same semantic type. Maybe I am wrong :) But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

fruffy · 2024-04-13T22:09:43Z

But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation (https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.

asl · 2024-04-13T22:11:36Z

LAnd,LOr may have simply be an oversight.

Yes. For "underlying issue" I mean allowing different Type_Boolean instances, etc.

vlstill · 2024-04-15T07:05:06Z

But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation (https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.

I agree.

LAnd,LOr may have simply be an oversight.

Yes. For "underlying issue" I mean allowing different Type_Boolean instances, etc.

While I agree that it does not make much sense to have multiple instances of Type_Boolean (or Type_Unknown) I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

asl · 2024-04-15T07:31:03Z

I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Well, it depends on what is the underlying rule of comparing Type's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.

On the other hand, booleans, integere types, etc. definitely should be uniqued.

Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.

ajwalton · 2024-04-15T08:38:55Z

I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Well, it depends on what is the underlying rule of comparing Type's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.

On the other hand, booleans, integere types, etc. definitely should be uniqued.

Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.

Is there a case for keeping types defined using the type keyword unique? In particular if you're using pointer comparison to determine equality.

Something like:

type bool special_bool_t;

Shouldn't use the same object for special_bool_t as bool. (Although all instances of special_bool_t could use a unique type instance, which is distinct from Type_Boolean - the same class but a different object.)

asl · 2024-04-15T17:45:29Z

In particular if you're using pointer comparison to determine equality.

That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t could be converted to bool implicitly, etc.

ajwalton · 2024-04-15T19:44:16Z

In particular if you're using pointer comparison to determine equality.

That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t could be converted to bool implicitly, etc.

The language spec is quite clear about the semantics here:

While similar to typedef, the type keyword introduces a new type which is not a synonym with the
original type: values of the original type and the newly introduced type cannot be mixed in expressions.

Currently the types that can be created by the type keyword are restricted to one of: bit<>, int<>,
bool, or types defined using type from such types.

(https://staging.p4.org/p4-spec/docs/P4-16-v1.2.4.html#sec-newtype)

So, anything that would allow mixing of these type would violate the spec.

Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.

asl · 2024-04-15T19:52:49Z

The language spec is quite clear about the semantics here:

Thanks for the clarification! Then yes, there might be multiple "instances" of Type_Boolean, etc. and it is expected that they are different. I think PR is correct then, it's just the difference between new IR::Type_Boolean() and IR::Type_Boolean()::get is confusing and error-prone.

However, it seems that type bool special_bool_t is represented by IR::Type_NewType wrapper node (https://github.com/p4lang/p4c/blob/9ac6969195b41c04ddf1d0d0b123620eab996692/ir/type.def#L717C7-L717C19), so then the original question remains: is there a case for second instance of Type_Boolean, etc.? It looks like not to me. But I might be missing something.

fruffy · 2024-04-18T16:32:19Z

So, what is to be done here?

So, anything that would allow mixing of these type would violate the spec.

Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.

Internally, the compiler already uses Type_Boolean::get() for all instances of Type_Boolean iirc. So this condition is currently violated if the implementation were based on pointer comparison. But I believe, as @asl points out, that these declarations are wrapped in a NewType.

fruffy · 2024-05-15T16:34:16Z

@asl @vlstill Mind giving this one a review? It's been stuck for a month.

ChrisDodd · 2024-05-15T23:31:35Z

This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.

The original design of the IR was to have all IR::Type objects be pointer-comparable -- there should only be one instance of any give type. That's why there are get functions to cache the objects. However, that probably was not followed for more complex types like tuple and list.

fruffy · 2024-05-16T02:09:22Z

This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.

True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how Operation_Relation` is initialized.

Probably the best way to enforce this is to make the new operator private for any type that supports ::get().

ChrisDodd · 2024-05-16T02:17:38Z

True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how Operation_Relation is initialized.

I'm a little hesitant to endorse changes that just serve to obscure bugs elsewhere in the compiler code. Relation needs an override to set the type to bool regardless of the type of the operands, since that is different from either operand. I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.

Probably the best way to enforce this is to make the new operator private for any type that supports ::get().

Yes, this seems the way to go to avoid future bugs in the code.

asl · 2024-05-16T04:24:24Z

Yes, this seems the way to go to avoid future bugs in the code.

+1. All unique nodes should have closed operator new.

fruffy · 2024-05-17T21:26:26Z

I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.

Yes, the reason I am advocating for this is that relying on pointer comparison to ensure the right type propagates is fragile. In practice, to work around the possibility that you may get a different pointer you will end up having to specifiy IR::Type_Boolean::get() for all instances of these operators anyway.

There is also the clone operator, which throws a wrench into all of this.

I think both things should be done, this PR and the privatization of the new operator for any IR class that uses get(). But the second part might take a little longer because new is so pervasively used in the compiler right now.

vlstill · 2024-05-20T14:35:15Z

There is also the clone operator, which throws a wrench into all of this.

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

Another big problem in my option is the source info. Since that is saved in the node, either these singleton-aspiring types can't have source info (it would be a big breaking change to remove IHasSourceInfo from IR::Type), or they can't be singletons.

So in the end, while the aspiration is for these to be singletons, I don't really see how they can be without rewriting large parts of the compiler (and downstream projects). I suggest we compare them semantically and make all IR::Type_Boolean equal (I believe we already ignore src info in comparison).

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

fruffy · 2024-05-20T15:15:31Z

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

The problem is that clone returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

On the other hand since we manage the allocation through get we can easily disable or enable caching now.

vlstill · 2024-05-21T06:08:24Z

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

The problem is that clone returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.

Yes, that is what I meant by "it does not work type-wise". I should be explicit if talking about P4 types or C++ types in implementation of the compiler :-).

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

On the other hand since we manage the allocation through get we can easily disable or enable caching now.

True. This one would be reasonably easy to disable. All the more reason to not count on values being pointer comparable though.

vlstill

✅ I think this change makes sense. However, the default in Operation_Binary/Operation_Unary only makes sense in some cases, so I suggest we extend this to more nodes:

LNot is probably OK as it is homogeneous and so are the remaining unary operations I can see (Neg, UPlus, Cmpl).
All the relations (Equ, Neq, Lss, ...) should default to bool, same as LOr/LAnd.
The remaining P4C-defined binary operations look OK to me.

vlstill · 2024-05-21T06:28:13Z

I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false)) and the resulting type of the IR::LAnd was IR::Type_Unknown. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.

I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.

Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.

ChrisDodd · 2024-05-21T06:44:56Z

I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.

Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.

The original design was that the first run of the TypeChecker would insert casts as needed to ensure that everything was correctly typed. Passes that run after should be type correct, which is why subsequent runs of the typechecker run with readOnly = true -- any place then found without correct types should be a BUG.

The type map should not be needed, except to temporarily hold types in the middle of typechecking (which is simplest as a two-step process, first inferring the types of everything in an Inspector, followed by a Transform that rewrites things to have the correct type in the nodes -- this second pass should get rid of Type_Name nodes.) The problem is that this is diffuclt to dump as valid P4 after this transform (due to things being used before they're defined in the resulting dump).

All the relations (Equ, Neq, Lss, ...) should default to bool, same as LOr/LAnd.

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

vlstill · 2024-05-21T07:12:27Z

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.

fruffy · 2024-05-21T13:43:18Z

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

If we want to fix these nodes, we could just remove the constructor at this point.

ChrisDodd · 2024-05-21T21:16:35Z

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor. In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right) will create a node with type = explicit_type, not bool.)

fruffy · 2024-05-21T21:59:58Z

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor.

I think there are a couple workarounds, either writing a pass that performs type substitution or just modifying the type after, but they seem cumbersome. We can have a restricted interface first and then extend it when required?

In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right) will create a node with type = explicit_type, not bool.)

When I was looking at the Operation_Relation constructor after it was generated it looked like this:

IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
    validate(); 
}

So the type is always overridden.

ChrisDodd · 2024-05-21T22:10:44Z

When I was looking at the Operation_Relation constructor after it was generated it looked like this:
IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
    validate(); 
}
So the type is always overridden.

In this case the parameter type shadows the field type, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type parameter to shadow it.

This is more of a "happy accident" than anything deliberate in the design of the ir-generator, so might be considered and obfuscation by some, but to the extent that it just "does the right thing" does not require detailed understanding of the details of the ir-generator.

fruffy · 2024-05-21T22:36:50Z

In this case the parameter type shadows the field type, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type parameter to shadow it.

I see, that is something I didn't consider.

vlstill · 2024-05-22T06:27:23Z

OK, so in the end the conditional (as written now by Fabian) and unconditional setting work basically the same, unless someone wants to explicitly set type explicitly to unknown (which would be very weird). We should probably at least add a comment to the ctor that it sets the value only for the constructor that does not take it.

fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Apr 12, 2024

fruffy requested review from asl, jafingerhut and vlstill April 12, 2024 18:24

fruffy mentioned this pull request May 20, 2024

Make the new operator protected for some IR types. #4670

Merged

vlstill reviewed May 21, 2024

View reviewed changes

vlstill mentioned this pull request May 21, 2024

idea: IR validator, programatic pass pre/post conditions #4672

Open

fruffy added 2 commits May 22, 2024 08:21

Set type for LAnd and LOr to be Type_Boolean if unknown.

a6c32da

Add comments on constructor shadowing and include LNot.

bd0c740

fruffy force-pushed the fruffy/land_lor_booltype branch from 166e753 to bd0c740 Compare May 22, 2024 12:21

vlstill approved these changes May 22, 2024

View reviewed changes

fruffy changed the title ~~Set type for LAnd and LOr to be Type_Boolean if unknown.~~ Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. May 22, 2024

fruffy added this pull request to the merge queue May 27, 2024

Merged via the queue into main with commit 6507986 May 27, 2024
17 checks passed

fruffy deleted the fruffy/land_lor_booltype branch May 27, 2024 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612

Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612

fruffy commented Apr 12, 2024

asl commented Apr 13, 2024

fruffy commented Apr 13, 2024

asl commented Apr 13, 2024

fruffy commented Apr 13, 2024

asl commented Apr 13, 2024

vlstill commented Apr 15, 2024

asl commented Apr 15, 2024

ajwalton commented Apr 15, 2024 •

edited

Loading

asl commented Apr 15, 2024

ajwalton commented Apr 15, 2024

asl commented Apr 15, 2024

fruffy commented Apr 18, 2024

fruffy commented May 15, 2024

ChrisDodd commented May 15, 2024

fruffy commented May 16, 2024

ChrisDodd commented May 16, 2024 •

edited

Loading

asl commented May 16, 2024

fruffy commented May 17, 2024

vlstill commented May 20, 2024

fruffy commented May 20, 2024

vlstill commented May 21, 2024

vlstill left a comment •

edited

Loading

vlstill commented May 21, 2024

ChrisDodd commented May 21, 2024

vlstill commented May 21, 2024

fruffy commented May 21, 2024

ChrisDodd commented May 21, 2024 •

edited

Loading

fruffy commented May 21, 2024

ChrisDodd commented May 21, 2024 •

edited

Loading

fruffy commented May 21, 2024

vlstill commented May 22, 2024

Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612

Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612

Conversation

fruffy commented Apr 12, 2024

asl commented Apr 13, 2024

fruffy commented Apr 13, 2024

asl commented Apr 13, 2024

fruffy commented Apr 13, 2024

asl commented Apr 13, 2024

vlstill commented Apr 15, 2024

asl commented Apr 15, 2024

ajwalton commented Apr 15, 2024 • edited Loading

asl commented Apr 15, 2024

ajwalton commented Apr 15, 2024

asl commented Apr 15, 2024

fruffy commented Apr 18, 2024

fruffy commented May 15, 2024

ChrisDodd commented May 15, 2024

fruffy commented May 16, 2024

ChrisDodd commented May 16, 2024 • edited Loading

asl commented May 16, 2024

fruffy commented May 17, 2024

vlstill commented May 20, 2024

fruffy commented May 20, 2024

vlstill commented May 21, 2024

vlstill left a comment • edited Loading

Choose a reason for hiding this comment

vlstill commented May 21, 2024

ChrisDodd commented May 21, 2024

vlstill commented May 21, 2024

fruffy commented May 21, 2024

ChrisDodd commented May 21, 2024 • edited Loading

fruffy commented May 21, 2024

ChrisDodd commented May 21, 2024 • edited Loading

fruffy commented May 21, 2024

vlstill commented May 22, 2024

ajwalton commented Apr 15, 2024 •

edited

Loading

ChrisDodd commented May 16, 2024 •

edited

Loading

vlstill left a comment •

edited

Loading

ChrisDodd commented May 21, 2024 •

edited

Loading

ChrisDodd commented May 21, 2024 •

edited

Loading