AutoML is not working properly with large volume of data(30 Million rows with 150 features) #5747

aniketaitawade · 2024-11-07T09:54:00Z

Sparkling Water Version

3.5

Issue description

Expected behavior:
Since Sparkling Water can train individual models such as XGBoost on this data, the AutoML API should run on it as well.
Observed behavior:
Sparkling Water trains individual models such as XGBoost, but the AutoML API fails to run.
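
For readers trying to reproduce the comparison described above, it might look roughly like the sketch below. No code was supplied with the report, so everything here is an assumption: the DataFrame `df`, the label column name `label`, the helper names `automl_settings` and `train_both`, and all parameter values are hypothetical. The sketch bounds AutoML with `maxRuntimeSecs` and `maxModels` so that a run ends in a known time instead of continuing indefinitely.

```python
def automl_settings(label_col="label", max_runtime_secs=3600, max_models=10):
    """Build keyword arguments for H2OAutoML (all values are illustrative)."""
    return {
        "labelCol": label_col,
        # Upper bound on the total AutoML run time, in seconds.
        "maxRuntimeSecs": float(max_runtime_secs),
        # Upper bound on the number of base models AutoML trains.
        "maxModels": int(max_models),
        "seed": 1,
    }


def train_both(df, label_col="label"):
    """Train a single XGBoost model, then a bounded AutoML run, on the same data.

    `df` is assumed to be a Spark DataFrame that already contains `label_col`.
    """
    # Imported lazily so the helper above can be used without a cluster.
    from pysparkling import H2OContext
    from pysparkling.ml import H2OAutoML, H2OXGBoost

    H2OContext.getOrCreate()  # start or attach to the H2O cluster

    xgb_model = H2OXGBoost(labelCol=label_col).fit(df)
    automl_model = H2OAutoML(**automl_settings(label_col)).fit(df)
    return xgb_model, automl_model
```

If the bounded run finishes while an unbounded one does not, that would suggest the job is slow rather than stuck, which is useful detail to attach to a report like this one.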

Programming language used

Python

Programming language version

3.11

What environment are you running Sparkling Water on?

Cloud Managed Spark (like Databricks, AWS Glue)

Environment version info

15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

Brief cluster specification

Runtime 15.4.x-scala2.12; 1 driver with 64 GB memory and 8 cores; 7 workers with 64 GB memory and 8 cores
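
As a rough sanity check on this specification, the raw size of the dataset can be compared with the total worker memory. The sketch below assumes every value is stored as a dense 8-byte double and that each of the 7 workers has 64 GB; both are assumptions, not facts from the report. H2O compresses columns in memory, so the real footprint may be smaller, and only part of each worker's memory is available to H2O.

```python
# Back-of-envelope sizing for the dataset and cluster described in this issue.
ROWS = 30_000_000
FEATURES = 150
BYTES_PER_VALUE = 8    # assumption: dense 8-byte doubles, no compression
WORKERS = 7
WORKER_MEMORY_GB = 64  # assumption: 64 GB on each worker


def raw_size_gb(rows, features, bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed size of a dense numeric table, in decimal gigabytes."""
    return rows * features * bytes_per_value / 1e9


raw_gb = raw_size_gb(ROWS, FEATURES)
total_worker_gb = WORKERS * WORKER_MEMORY_GB
headroom = total_worker_gb / raw_gb

print(f"raw data: {raw_gb:.1f} GB")
print(f"total worker memory: {total_worker_gb} GB")
print(f"memory / data ratio: {headroom:.1f}x")
```

Under these assumptions the data is about 36 GB against 448 GB of worker memory. An AutoML run typically trains many models, often with cross-validation, so it can need considerably more time and memory than a single XGBoost model on the same data.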

Relevant log output

I don't have any error logs, because the process keeps running for a long time.

Code to reproduce the issue

No response

aniketaitawade changed the title ~~AutoML is not working the data of 30 Million rows with 150 features~~ AutoML is not working properly with large volume of data(30 Million rows with 150 features) Nov 20, 2024
