AutoML is not working properly with large volume of data(30 Million rows with 150 features) #5747

Open
aniketaitawade opened this issue Nov 7, 2024 · 0 comments

Comments

@aniketaitawade

Sparkling Water Version

3.5

Issue description

Expected behavior:
Sparkling Water can train individual models such as XGBoost on this data, so the AutoML API should also run on it.
Observed behavior:
Sparkling Water can train individual models such as XGBoost, but fails to run with the AutoML API.

Programming language used

Python

Programming language version

3.11

What environment are you running Sparkling Water on?

Cloud Managed Spark (like Databricks, AWS Glue)

Environment version info

15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

Brief cluster specification

Runtime 15.4.x-scala2.12; 1 driver with 64 GB memory and 8 cores; 7 workers, each with 64 GB memory and 8 cores

Relevant log output

I don't have any error logs, as the process keeps running for a long time.

Code to reproduce the issue

No response

@aniketaitawade changed the title from "AutoML is not working the data of 30 Million rows with 150 features" to "AutoML is not working properly with large volume of data(30 Million rows with 150 features)" on Nov 20, 2024