Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. #4612

Merged
merged 2 commits into from
May 27, 2024

Conversation

fruffy
Copy link
Collaborator

@fruffy fruffy commented Apr 12, 2024

I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false)) and the resulting type of the IR::LAnd was IR::Type_Unknown. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.

Correct me if I am wrong but I believe the default type for Logical AND and OR should be Type_Boolean. Do this only when the type is unknown.

@fruffy fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Apr 12, 2024
@asl
Copy link
Contributor

asl commented Apr 13, 2024

Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean or Type_Unknown?

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 13, 2024

Hrm, interesting. I'd expect types to be unique / singletons, what is the point having multiple Type_Boolean or Type_Unknown?

Well it can simply happen when you do not use IR::Type_Boolean::get() but new IR::Type_Boolean(). We could make the allocator operators for these classes private.

To be honest, I am not 100% sure whether this is really the reason for the mismatch. The problem I ran into was two different IR::Type_Boolean pointers which cause the inferred type to be IR::Type_Unknown. It could have also been a clone.

@asl
Copy link
Contributor

asl commented Apr 13, 2024

new IR::Type_Boolean(). We could make the allocator operators for these classes private.

I think so. I might be missing something, but I do not see a point of having multiple instances of the same semantic type. Maybe I am wrong :) But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 13, 2024

But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation (https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.

@asl
Copy link
Contributor

asl commented Apr 13, 2024

LAnd,LOr may have simply be an oversight.

Yes. For "underlying issue" I mean allowing different Type_Boolean instances, etc.

@vlstill
Copy link
Contributor

vlstill commented Apr 15, 2024

But if not, then it would be better to fix the underlying issue rather than patch the IR nodes.

So even without the pointer comparison issue we should probably fix up the type here. We already automatically set the type for Operation_Relation (https://github.com/p4lang/p4c/blob/main/ir/expression.def#L56), LAnd,LOr may have simply be an oversight.

I agree.

LAnd,LOr may have simply be an oversight.

Yes. For "underlying issue" I mean allowing different Type_Boolean instances, etc.

While I agree that it does not make much sense to have multiple instances of Type_Boolean (or Type_Unknown) I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

@asl
Copy link
Contributor

asl commented Apr 15, 2024

I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Well, it depends on what is the underlying rule of comparing Type's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.

On the other hand, booleans, integere types, etc. definitely should be uniqued.

Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.

@ajwalton
Copy link

ajwalton commented Apr 15, 2024

I would argue that the program should not assume there will be only one. It would be quite weird if some type representations were fully uniqued in this way, while other were not. And I don't think making e.g. struct types fully unique would be worth the extra effort (both in programming it and in runtime).

Well, it depends on what is the underlying rule of comparing Type's. Having unique types allows comparing them by pointers instead of doing deep comparisons and I think it is something quite desired. Whether struct types should be unique or not is an open question, but I'd assume that the code that deals with them already does not assume them being unique, so this would be fine.

On the other hand, booleans, integere types, etc. definitely should be uniqued.

Pretty much same approach is used in e.g. LLVM type system and it does make sense to me.

Is there a case for keeping types defined using the type keyword unique? In particular if you're using pointer comparison to determine equality.

Something like:

type bool special_bool_t;

Shouldn't use the same object for special_bool_t as bool. (Although all instances of special_bool_t could use a unique type instance, which is distinct from Type_Boolean - the same class but a different object.)

@asl
Copy link
Contributor

asl commented Apr 15, 2024

In particular if you're using pointer comparison to determine equality.

That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t could be converted to bool implicitly, etc.

@ajwalton
Copy link

In particular if you're using pointer comparison to determine equality.

That's the key question. What it going from the language perspective? And what is going from the IR perspective? Is special_bool_t could be converted to bool implicitly, etc.

The language spec is quite clear about the semantics here:

While similar to typedef, the type keyword introduces a new type which is not a synonym with the
original type: values of the original type and the newly introduced type cannot be mixed in expressions.

Currently the types that can be created by the type keyword are restricted to one of: bit<>, int<>,
bool, or types defined using type from such types.

(https://staging.p4.org/p4-spec/docs/P4-16-v1.2.4.html#sec-newtype)

So, anything that would allow mixing of these type would violate the spec.

Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.

@asl
Copy link
Contributor

asl commented Apr 15, 2024

The language spec is quite clear about the semantics here:

Thanks for the clarification! Then yes, there might be multiple "instances" of Type_Boolean, etc. and it is expected that they are different. I think PR is correct then, it's just the difference between new IR::Type_Boolean() and IR::Type_Boolean()::get is confusing and error-prone.

However, it seems that type bool special_bool_t is represented by IR::Type_NewType wrapper node (https://github.com/p4lang/p4c/blob/9ac6969195b41c04ddf1d0d0b123620eab996692/ir/type.def#L717C7-L717C19), so then the original question remains: is there a case for second instance of Type_Boolean, etc.? It looks like not to me. But I might be missing something.

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 18, 2024

So, what is to be done here?

So, anything that would allow mixing of these type would violate the spec.

Of course, the semantic check could be done first and then re-use the base type in the IR - I'm not sure whether the compiler currently works this way. However, modifying the IR this way would make checking later in the compiler (eg. backend passes) more complicated.

Internally, the compiler already uses Type_Boolean::get() for all instances of Type_Boolean iirc. So this condition is currently violated if the implementation were based on pointer comparison. But I believe, as @asl points out, that these declarations are wrapped in a NewType.

@fruffy
Copy link
Collaborator Author

fruffy commented May 15, 2024

@asl @vlstill Mind giving this one a review? It's been stuck for a month.

@ChrisDodd
Copy link
Contributor

This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.

The original design of the IR was to have all IR::Type objects be pointer-comparable -- there should only be one instance of any give type. That's why there are get functions to cache the objects. However, that probably was not followed for more complex types like tuple and list.

@fruffy
Copy link
Collaborator Author

fruffy commented May 16, 2024

This should be unnecessary -- code should never create distinct instances of IR::Type::Boolean.

True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how Operation_Relation` is initialized.

Probably the best way to enforce this is to make the new operator private for any type that supports ::get().

@ChrisDodd
Copy link
Contributor

ChrisDodd commented May 16, 2024

True, but it can easily happen and lead to errors like this. This PR just makes LOr and LAnd consistent to how Operation_Relation is initialized.

I'm a little hesitant to endorse changes that just serve to obscure bugs elsewhere in the compiler code. Relation needs an override to set the type to bool regardless of the type of the operands, since that is different from either operand. I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.

Probably the best way to enforce this is to make the new operator private for any type that supports ::get().

Yes, this seems the way to go to avoid future bugs in the code.

@asl
Copy link
Contributor

asl commented May 16, 2024

Yes, this seems the way to go to avoid future bugs in the code.

+1. All unique nodes should have closed operator new.

@fruffy
Copy link
Collaborator Author

fruffy commented May 17, 2024

I could see similarly unconditionally setting the type of LAnd and LOr to bool (and maybe LNot as well?), as they require their operands to be bool. So if we had any types that could be implicitly converted to bool, those conversions should go on the operands of LAnd and LOr, not on the result.

Yes, the reason I am advocating for this is that relying on pointer comparison to ensure the right type propagates is fragile. In practice, to work around the possibility that you may get a different pointer you will end up having to specifiy IR::Type_Boolean::get() for all instances of these operators anyway.

There is also the clone operator, which throws a wrench into all of this.

I think both things should be done, this PR and the privatization of the new operator for any IR class that uses get(). But the second part might take a little longer because new is so pervasively used in the compiler right now.

@vlstill
Copy link
Contributor

vlstill commented May 20, 2024

There is also the clone operator, which throws a wrench into all of this.

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

Another big problem in my option is the source info. Since that is saved in the node, either these singleton-aspiring types can't have source info (it would be a big breaking change to remove IHasSourceInfo from IR::Type), or they can't be singletons.

So in the end, while the aspiration is for these to be singletons, I don't really see how they can be without rewriting large parts of the compiler (and downstream projects). I suggest we compare them semantically and make all IR::Type_Boolean equal (I believe we already ignore src info in comparison).

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

@fruffy
Copy link
Collaborator Author

fruffy commented May 20, 2024

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

The problem is that clone returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

On the other hand since we manage the allocation through get we can easily disable or enable caching now.

@vlstill
Copy link
Contributor

vlstill commented May 21, 2024

Well if the types were really singletons (there was no reason to have two instances) that clone should just return this I think, but that does not work type-wise.

The problem is that clone returns a mutable IR::Node so you would have to const cast it, which might not be a good idea for a global static unique value.

Yes, that is what I meant by "it does not work type-wise". I should be explicit if talking about P4 types or C++ types in implementation of the compiler :-).

Also note that by relying more on IR::...::get() deduplication, we are relying more on static variables, which is not nice :-(.

On the other hand since we manage the allocation through get we can easily disable or enable caching now.

True. This one would be reasonably easy to disable. All the more reason to not count on values being pointer comparable though.

Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ I think this change makes sense. However, the default in Operation_Binary/Operation_Unary only makes sense in some cases, so I suggest we extend this to more nodes:

  • LNot is probably OK as it is homogeneous and so are the remaining unary operations I can see (Neg, UPlus, Cmpl).
  • All the relations (Equ, Neq, Lss, ...) should default to bool, same as LOr/LAnd.
  • The remaining P4C-defined binary operations look OK to me.

@vlstill
Copy link
Contributor

vlstill commented May 21, 2024

I ran into issues where I created a IR::Land(new IR::BoolLiteral(new IR::Type_Boolean(), false), new IR::BoolLiteral(new IR::Type_Boolean(), false)) and the resulting type of the IR::LAnd was IR::Type_Unknown. This happens because https://github.com/p4lang/p4c/blob/main/ir/expression.def#L39 is comparing pointers, not the actual type.

I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.

Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.

@ChrisDodd
Copy link
Contributor

I guess (@ChrisDodd would maybe know as he was probably there when this was designed) this is not problem by itself, as the IR is not typed correctly until a type inference/type checker runs. But when one does this when the IR has already types written in it, it is very easy to break an invariant following passes rely on.

Personally I found it quite tedious that some things rely on types in the IR and some rely on TypeMap so to be on the safe side, one kind of has to have both. It would almost make more sense to me to for passes after type checker to be type preserving (i.e. the IR goes in typed and goes out typed) and we would only validate that the types are still correct. But we would still need type maps to resolve type names, etc so maybe it does not help much.

The original design was that the first run of the TypeChecker would insert casts as needed to ensure that everything was correctly typed. Passes that run after should be type correct, which is why subsequent runs of the typechecker run with readOnly = true -- any place then found without correct types should be a BUG.

The type map should not be needed, except to temporarily hold types in the middle of typechecking (which is simplest as a two-step process, first inferring the types of everything in an Inspector, followed by a Transform that rewrites things to have the correct type in the nodes -- this second pass should get rid of Type_Name nodes.) The problem is that this is diffuclt to dump as valid P4 after this transform (due to things being used before they're defined in the resulting dump).

All the relations (Equ, Neq, Lss, ...) should default to bool, same as LOr/LAnd.

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

@vlstill
Copy link
Contributor

vlstill commented May 21, 2024

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.

@fruffy
Copy link
Collaborator Author

fruffy commented May 21, 2024

The relations all currently set the type to bool unconditionally, and I think LOr/LAnd should do the same (eg, remove the if from this change making the ctor just { type = IR::Boolean::get(); }

Oh right, it is in Operation_Relation, sorry I did not notice it there. Then I agree this should be unconditional.

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

If we want to fix these nodes, we could just remove the constructor at this point.

@ChrisDodd
Copy link
Contributor

ChrisDodd commented May 21, 2024

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor. In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right) will create a node with type = explicit_type, not bool.)

@fruffy
Copy link
Collaborator Author

fruffy commented May 21, 2024

I'd like to push back on the unconditional part a bit. I can see cases where downstream projects may want to supply their own types to a particular node. This will be much harder if those types are overridden during construction. The extracted varbit type P4Tools uses is an example of that.

It seems like this would be done by having a pass that modifies the type in the node after typechecking (since otherwise typechecking would override it)? That would not be affected by what is in the ctor.

I think there are a couple workarounds, either writing a pass that performs type substitution or just modifying the type after, but they seem cumbersome. We can have a restricted interface first and then extend it when required?

In addition, the "explicit set in the ctor" (as is done in the Relation case) only affects ctors that don't already have an explicit type as an argument to the ctor (eg, a call like new IR::Equ(explicit_type, left, right) will create a node with type = explicit_type, not bool.)

When I was looking at the Operation_Relation constructor after it was generated it looked like this:

IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
    validate(); 
}

So the type is always overridden.

@ChrisDodd
Copy link
Contributor

ChrisDodd commented May 21, 2024

When I was looking at the Operation_Relation constructor after it was generated it looked like this:

IR::Operation_Relation::Operation_Relation(const IR::Type* type, const IR::Expression* left, const IR::Expression* right) : Operation_Binary(type, left, right)
{{ type = Type::Boolean::get(); }
    validate(); 
}

So the type is always overridden.

In this case the parameter type shadows the field type, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type parameter to shadow it.

This is more of a "happy accident" than anything deliberate in the design of the ir-generator, so might be considered and obfuscation by some, but to the extent that it just "does the right thing" does not require detailed understanding of the details of the ir-generator.

@fruffy
Copy link
Collaborator Author

fruffy commented May 21, 2024

In this case the parameter type shadows the field type, so the field get initialized in the base constructor, and then the assignment of bool here modifies the parameter (not the field). This assignment will only modify the field when there is no type parameter to shadow it.

I see, that is something I didn't consider.

@vlstill
Copy link
Contributor

vlstill commented May 22, 2024

OK, so in the end the conditional (as written now by Fabian) and unconditional setting work basically the same, unless someone wants to explicitly set type explicitly to unknown (which would be very weird). We should probably at least add a comment to the ctor that it sets the value only for the constructor that does not take it.

@fruffy fruffy changed the title Set type for LAnd and LOr to be Type_Boolean if unknown. Set type for LAnd, LOr, LNot to be Type_Boolean if unknown. May 22, 2024
@fruffy fruffy added this pull request to the merge queue May 27, 2024
Merged via the queue into main with commit 6507986 May 27, 2024
17 checks passed
@fruffy fruffy deleted the fruffy/land_lor_booltype branch May 27, 2024 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants