Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support unnest inside aggregate #9048

Open
1 task done
amoeba opened this issue Apr 24, 2024 · 1 comment
Open
1 task done

feat: support unnest inside aggregate #9048

amoeba opened this issue Apr 24, 2024 · 1 comment
Labels
feature Features or general enhancements

Comments

@amoeba
Copy link
Contributor

amoeba commented Apr 24, 2024

Is your feature request related to a problem?

No response

What is the motivation behind your request?

In older versions of Ibis (I think before the sqlglot refactor?), I was able to use unnest inside my aggregate calls. Something this once worked:

t = ibis.memtable({"id": [1, 1,2], "name":["a", "a", "b"]})
t
┏━━━━━━━┳━━━━━━━━┓
┃ idname   ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     1a      │
│     2b      │
└───────┴────────┘
t.group_by(t.id).aggregate(name=t.name.collect().unique().unnest())

At least with Ibis 8.0.0, it gives an error which includes text like "is not a tuple of coercibles to a Value[+T, Scalar]" presumably because it's getting the wrong shape. The direct solution today is to move the unnest operation into a mutate call,

t.group_by(t.id).aggregate(name=t.name.collect().unique()).mutate(name=_.name.unnest())

However, this type of operation composes pretty naturally with DuckDB and I don't have to think about breaking the unnest call out from the rest of my aggregation:

CREATE TABLE t (id INTEGER, name TEXT);
INSERT INTO t (id, name) VALUES (1, 'a'), (1, 'a'), (2, 'b');
SELECT id, unnest(list_distinct(list(name))) as name FROM t GROUP BY id;
┌───────┬─────────┐
│  id   │  name   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ a       │
│     2 │ b       │
└───────┴─────────┘

Describe the solution you'd like

It would be nice if a more actionable error were produced or if this was just possible without any changes.

What version of ibis are you running?

8.0.0

What backend(s) are you using, if any?

DuckDB, memtable

Code of Conduct

  • I agree to follow this project's Code of Conduct
@amoeba amoeba added the feature Features or general enhancements label Apr 24, 2024
@ncclementi
Copy link
Contributor

@amoeba thanks for the report, I can reproduce this on main too. Not sure what's happening yet, but leaving the traceback here for future reference

SignatureValidationError                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 t.group_by(t.id).aggregate(name=t.name.collect().unique().unnest())

File ~/Documents/git/my_forks/ibis/ibis/expr/types/groupby.py:68, in GroupedTable.aggregate(self, *metrics, **kwds)
     66 def aggregate(self, *metrics, **kwds) -> ir.Table:
     67     """Compute aggregates over a group by."""
---> 68     return self.table.to_expr().aggregate(
     69         metrics, by=self.groupings, having=self.havings, **kwds
     70     )

File ~/Documents/git/my_forks/ibis/ibis/expr/types/relations.py:1159, in Table.aggregate(self, metrics, by, having, **kwargs)
   1156             metrics[metric.name] = metric
   1158 # construct the aggregate node
-> 1159 agg = ops.Aggregate(node, groups, metrics).to_expr()
   1161 if having:
   1162     # apply the having clause
   1163     agg = agg.filter(*having)

File ~/Documents/git/my_forks/ibis/ibis/common/bases.py:72, in AbstractMeta.__call__(cls, *args, **kwargs)
     52 def __call__(cls, *args, **kwargs):
     53     """Create a new instance of the class.
     54 
     55     The subclass may override the `__create__` classmethod to change the
   (...)
     70 
     71     """
---> 72     return cls.__create__(*args, **kwargs)

File ~/Documents/git/my_forks/ibis/ibis/common/grounds.py:119, in Annotable.__create__(cls, *args, **kwargs)
    116 @classmethod
    117 def __create__(cls, *args: Any, **kwargs: Any) -> Self:
    118     # construct the instance by passing only validated keyword arguments
--> 119     kwargs = cls.__signature__.validate(cls, args, kwargs)
    120     return super().__create__(**kwargs)

File ~/Documents/git/my_forks/ibis/ibis/common/annotations.py:497, in Signature.validate(self, func, args, kwargs)
    494         this[name] = result
    496 if errors:
--> 497     raise SignatureValidationError(
    498         "{call} has failed due to the following errors:{errors}\n\nExpected signature: {sig}",
    499         sig=self,
    500         func=func,
    501         args=args,
    502         kwargs=kwargs,
    503         errors=errors,
    504     )
    506 return this

SignatureValidationError: Aggregate(<ibis.expr.operations.relations.InMemoryTable object at 0x11fff1d50>, {'id': <ibis.expr.operations.relations.Field object at 0x1058a3880>}, {'name': <ibis.expr.operations.arrays.Unnest object at 0x12c1c32e0>}) has failed due to the following errors:
  `metrics`: {'name': <ibis.expr.operations.arrays.Unnest object at 0x12c1c32e0>} is not matching GenericMappingOf(key=InstanceOf(type=<class 'str'>), value=AllOf(patterns=(GenericCoercedTo(origin=<class 'ibis.expr.operations.core.Value'>, params={'T': +T, 'S': <class 'ibis.expr.datashape.Scalar'>}, checker=GenericInstanceOf(type=ibis.expr.operations.core.Value[+T, ibis.expr.datashape.Scalar], origin=<class 'ibis.expr.operations.core.Value'>, fields={'dtype': CoercedTo(type=<class 'ibis.expr.datatypes.core.DataType'>, func=<bound method DataType.__coerce__ of <class 'ibis.expr.datatypes.core.DataType'>>), 'shape': InstanceOf(type=<class 'ibis.expr.datashape.Scalar'>)})), Not(pattern=InstanceOf(type=<class 'ibis.expr.operations.core.Alias'>)))), type=CoercedTo(type=<class 'ibis.common.collections.FrozenDict'>, func=<class 'ibis.common.collections.FrozenDict'>))

Expected signature: Aggregate(parent: Relation, groups: FrozenDict[str, Annotated[Value[+T, Columnar], Not(pattern=InstanceOf(type=<class 'Alias'>))]], metrics: FrozenDict[str, Annotated[Value[+T, Scalar], Not(pattern=InstanceOf(type=<class 'Alias'>))]])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Status: backlog
Development

No branches or pull requests

2 participants