Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent typeclass instance self-reference (#3429). #3638

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

matthew-hilty
Copy link
Contributor

@matthew-hilty matthew-hilty commented May 13, 2019

Resolves #3429.

Previously, non-function members of constrained typeclass instances could reference each other. Because members also implicitly reference the overarching typeclass dictionary, the resulting cycle in a typeclass declaration between a dictionary and its members can cause runtime misbehavior.

For example, @LiamGoodacre, in #3429, documents the following problematic typeclass declarations:

class Trivial
instance trivial :: Trivial

instance what :: Trivial => C Int where
  foo = 0
  bar = foo

Because what and foo reference each other in the following generated javascript, any invocation of what causes a stack-overflow error.

var foo = function (dict) {
    return dict.foo;
};
var what = function (dictTrivial) {
    return new C(foo(what(dictTrivial)), 0);
};

The previously established procedure for preventing cycles in typeclass declarations fails in @LiamGoodacre's scenario because, in order to allow implementations of superclass methods to depend on subclass implementations (cf. functorEffect and applicativeEffect), the desugaring process in "Sugar/BindingGroups.hs" deliberately allows cyclical references if they occur inside lambda abstractions. And since, during type elaboration, constrained typeclass dictionaries are represented as abstractions, dictionary cycles fail to trigger a "CycleInDeclaration" error.

Code changes in this pull request fix issue #3429 by adding a new cycle check that is specific to typeclass value declarations. Unlike the primary check, it disallows self-reference even if it occurs within abstractions outside literal objects. Therefore, the implicit cycle in the what instance above (and the explicit cycle in the desugared "pseudo"-purescript representation below) induces an error.

what :: Trivial -> C Int
what dictTrivial = { foo: 0, bar: (what dictTrivial).foo }

Like the primary cycle check, the new check permits self-reference if it occurs within a bare abstraction of an object member. For example, the following class and instance successfully compile.

class C a where
  c0 :: Int
  c1 :: Unit -> Int

instance cInt :: C Int where
  c0 = 0
  c1 _ = c0

Or in pseudo-purescript:

cInt :: C Int
cInt = { c0: 0, c1: \_ -> cInt.c0 }

However, unlike the primary cycle check, self-reference in purescript typeclass "IFFE"s is now considered invalid. This is designed to prevent the following scenario.

class B
instance b :: B

class C a where
  c0 :: a
  c1 :: a

instance cInt :: C Int where
  c0 = 0
  c1 :: B => Int
  c1 = c0

The corresponding pseudo-purescript may make the reason for proscription clearer.

b :: B
b = {}

cInt :: C Int
cInt = { c0: 0, c1: (\dictB -> cInt.c0) b }

Constrained typeclass members, like constrained typeclasses, are represented as abstractions. However, they are also immediately invoked, and so the same kind of deviant runtime behavior as documented by @LiamGoodacre can occur unless screened.


Typeclass self-reference, even with the introduced cyclicality check, is still possible. The earlier case

instance cInt :: C Int where
  c0 = 0
  c1 _ = c0

is one example. Other possibilities arise when the members of class declarations are constrained.

For instance, the following purescript

class B
instance b :: B

class C a where
  c :: B => a

instance cInt :: C Int where
  c = c

or pseudo-purescript,

type C a = { c :: B -> a }
cInt :: C Int
cInt = { c: \b -> cInt.c b }

despite its pronounced and unhelpful self-reference, compiles successfully. Unlike constrained instance members, constrained class members are not immediately invoked, and therefore, the abstraction exception for self-reference takes precedence.

This might be considered a distasteful loophole by the cyclicality screening procedure. Similar cases unrelated to constrained typeclasses already exist, however. Consider the following unconstrained instance.

class C a where
  f :: a -> a

instance cInt :: C Int where
  f x = f x

Or even functions independent of typeclasses altogether like g below.

g :: Int -> Int
g x = g x

In addition to more robust cycle prevention, this pull request also improves error messages for end users.

Error messages for typeclasses now specify the individual members of an instance declaration that contribute to the error -- as well as the members' locations in source code.

For example,

class C a where
  c0 :: a
  c1 :: a
  c2 :: a
  c3 :: a
  c4 :: a

instance cInt :: C Int where
  c0 = 0
  c1 = c0
  c2 = c0
  c3 = c0
  c4 = c0

induces an error message like the following.

Error 1 of 2:

  at src1/Module1.purs:20:1 - 20:52 (line 20, column 1 - line 20, column 52)

    The definition of instance cInt is invalid because of cyclical dependencies.

    In particular, its following members implicitly reference the instance itself.
      value "c1" at src1/Module1.purs:22:3 - 22:10 (line 22, column 3 - line 22, column 10)
      value "c2" at src1/Module1.purs:23:3 - 23:10 (line 23, column 3 - line 23, column 10)
      value "c3" at src1/Module1.purs:24:3 - 24:10 (line 24, column 3 - line 24, column 10)
      value "c4" at src1/Module1.purs:25:3 - 30:28 (line 25, column 3 - line 30, column 28)

Additionally, error messages distinguish between instance members that are functions and those that aren't in order to provide additional guidance to end users.

For example,

class C a where
  c0 :: a -> a
  c1 :: a -> a

instance cInt :: C Int where
  c0 _ = 0
  c1 = c0

induces an error message like the following.

Error 2 of 2:

  at src1/Module2.purs:12:1 - 12:25 (line 12, column 1 - line 12, column 25)

    The definition of instance cInt is invalid because of cyclical dependencies.

    In particular, its member
      function "c1" at src1/Module2.purs:13:3 - 13:10 (line 13, column 3 - line 13, column 10)
    implicitly references the instance itself.

    Note that cycles in the member functions of cInt may lead to non-terminating runtime behavior.

    Consider replacing the functions' circular dependencies with independent terms.

    If their definitions cannot be rewritten, eta-expansion is necessary to accommodate purescript's non-strict style of evaluation.

@garyb
Copy link
Member

garyb commented May 20, 2019

Sorry about the lack of feedback on this so far. It looks interesting, but also tricky, so I've not had a chance to check out what's going on yet. We're mainly focusing on planned breaking changes to get stuff in shape for the 0.13 release, so it might not get a look until after that.

@@ -332,6 +332,7 @@ everythingWithContextOnValues
-> (r -> r -> r)
-> (s -> Declaration -> (s, r))
-> (s -> Expr -> (s, r))
-> (s -> Expr -> (s, r))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document the change to this traversal in a comment? It wouldn't be clear to me just seeing this signature why we have a seemingly duplicated parameter like this that isn't present in other traversals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. Documentation would definitely be helpful here.

Any thoughts on the following documentation attempt? Suggestions for making it clearer (more accurate) (more relevant) for example?

-- |
-- A fold for paramorphisms associated with (1) an initial object represented
-- by the coproduct 'Node'
-- @
--   data Node
--     = A Declaration
--     | NonLiteral Expr
--     | Literal Expr
--     | B Binder
--     | C CaseAlternative
--     | D DoNotationElement
-- @
-- and (2) the functorial context
-- > type Context s r = (s, r, r -> r -> r, s -> Node -> (s, r))
--
-- Given an initial state, a default output value, a binary action on the
-- output type, and six independent state-transition transformers (one each for
-- 'Declaration's, 'Binder's, 'CaseAlternative's, and 'DoNotationElement's; and
-- two for 'Expr's), determine five corresponding mutually recursive data-
-- gathering functions that generate "measurements" of type 'r' for values of
-- any of the constituent types of the coproduct 'Node'.
--
-- Two input functions for 'Expr' are required in order to allow distinguishing
-- of values inside 'Literal' 'Expr's from values independent of literal objects
-- and arrays. (The function 'immediateLitIdentsAndAllOtherIdents' in module
-- 'Language.PureScript.Sugar.BindingGroups' is an example of a client for
-- this feature.)
--
everythingWithContextOnValues
  :: forall s r
   . s
  -> r
  -> (r -> r -> r)
  -> (s -> Declaration       -> (s, r))
  -> (s -> Expr              -> (s, r))
  -- ^ Transformer of 'Expr' nodes without 'Literal' ancestors
  -> (s -> Expr              -> (s, r))
  -- ^ Transformer of 'Expr' nodes strictly dominated by a 'Literal' 'Expr'
  -> (s -> Binder            -> (s, r))
  -> (s -> CaseAlternative   -> (s, r))
  -> (s -> DoNotationElement -> (s, r))
  -> ( Declaration       -> r
     , Expr              -> r
     , Binder            -> r
     , CaseAlternative   -> r
     , DoNotationElement -> r)

(Thanks for your feedback by the way.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it's probably a little obnoxious to necro-bump a PR without actually reviewing the important part of it, so sorry; whether this approach is a good idea or not is above my pay grade. But I wanted to register my confusion with respect to this change to everythingWithContextOnValues. Why is this necessary? Couldn't the condition of being inside a Literal be included as part of the context? I realize that would make the implementation of immediateLitIdentsAndAllOtherIdents more complicated but I don't (yet?) buy that this belongs in a generic traversal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Uncaught instance self-reference
4 participants