feat(pyspark): support udaf #9173

Open
1 task done
ted0928 opened this issue May 11, 2024 · 0 comments · May be fixed by #9206
Labels
feature Features or general enhancements

ted0928 commented May 11, 2024

Is your feature request related to a problem?

No response

What is the motivation behind your request?

I was trying to apply a user-defined aggregate function (UDAF) to a grouped table,
but @ibis.udf.agg currently supports only builtin functions.

When I used the deprecated reduction decorator instead, it magically worked!

import ibis
from pyspark.sql import SparkSession
from ibis.legacy.udf.vectorized import reduction

# Deprecated vectorized UDF API; x.mean() assumes a pandas-Series-like input.
@reduction(output_type=ibis.dtype("float"), input_type=[ibis.dtype("int32")])
def avg(x) -> float:
    return x.mean()

ibis.options.interactive = True
ibis.options.verbose = True  # print the generated SQL
spark = SparkSession.builder.getOrCreate()
connection = ibis.pyspark.connect(spark)

# Register a small in-memory table as a view, then aggregate id2 per id1.
df = connection.create_view('source', ibis.memtable(dict(id1=[1, 2, 3, 1, 2, 1], id2=[4, 5, 6, 2, 3, 4])))
df = df.group_by(df.id1).aggregate(avg_id2=avg(df.id2))
print(df)
SELECT `t0`.`id1`, IBIS_UDF_AVG_12861BCE(`t0`.`id2`) AS `avg_id2` FROM `source` AS `t0` GROUP BY 1 LIMIT 11
┏━━━━━━━┳━━━━━━━━━━┓
┃ id1   ┃ avg_id2  ┃
┡━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64  │
├───────┼──────────┤
│     1 │ 3.333333 │
│     2 │ 4.000000 │
│     3 │ 6.000000 │
└───────┴──────────┘
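
As a sanity check on the numbers above, the per-group averages can be reproduced without ibis or Spark. This is only a minimal standard-library sketch of the arithmetic the reduction is expected to perform; the helper name group_mean is hypothetical and not part of ibis.

```python
from collections import defaultdict
from statistics import fmean


def group_mean(keys, values):
    """Return {key: mean of the values sharing that key}, sorted by key.

    Hypothetical helper for illustration only; not an ibis API.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: fmean(vals) for key, vals in sorted(groups.items())}


# Same data as the memtable in the snippet above.
id1 = [1, 2, 3, 1, 2, 1]
id2 = [4, 5, 6, 2, 3, 4]

result = group_mean(id1, id2)
for key, mean in result.items():
    print(f"{key} {mean:.6f}")
```

Group 1 averages 4, 2 and 4; group 2 averages 5 and 3; group 3 holds the single value 6, which matches the avg_id2 column returned by the PySpark backend.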

Describe the solution you'd like

Is there any plan to migrate this decorator to the new @ibis.udf.agg API?

What version of ibis are you running?

main

What backend(s) are you using, if any?

pyspark

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ted0928 ted0928 added the feature Features or general enhancements label May 11, 2024
@ted0928 ted0928 linked a pull request May 17, 2024 that will close this issue
Projects
Status: backlog