
Problem in verifying the RocksDB StateStore #2

Open
Nayaamar opened this issue Mar 29, 2023 · 0 comments

Hi,
I wanted to test the RocksDB StateStore implementation and check whether it really avoids the OOM exception.
Before I explain my issue, some context: I compiled Spark 3.2.0, I'm running it on Manjaro Linux, I wrote the test application in Java, and I run it with spark-submit. My machine has 2 physical cores (4 logical cores) and 16 GB of RAM.

My test application is the simple word count sample from the Structured Streaming Programming Guide. I also wrote a Python server that sends many random words to the client connected to its listening socket.
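The Python server itself isn't included in the report; as a rough stand-in, an equivalent word generator can be sketched in plain Java (the port, word list, and message count below are placeholders of mine, not from the original setup):

```java
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Random;

public class RandomWordServer {
    // Placeholder vocabulary; the original Python server's words are unknown.
    static final String[] WORDS = {"apple", "banana", "cherry", "delta"};

    // Accepts one client on `port`, sends `count` random words (one per
    // line, matching what a socket-source word count expects), then closes.
    public static void serve(int port, int count) throws Exception {
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept();
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            Random rnd = new Random();
            for (int i = 0; i < count; i++) {
                out.println(WORDS[rnd.nextInt(WORDS.length)]);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        serve(9999, 1000); // port 9999 matches the guide's socket example
    }
}
```

A long-running stress test would loop forever instead of sending a fixed count, which is closer to what the reporter describes.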

My problem is that when I send the words to the Spark application (my word count app), it throws an OOM exception with both RocksDBStateStore and HDFSStateStore. What is the problem? Am I making a mistake in how I run the application?

Config of SparkSession

SparkSession spark = SparkSession
                .builder()
                .appName("JavaStructuredNetworkWordCount")
                .config("spark.sql.streaming.stateStore.providerClass",
                        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
                .config("spark.local.dir", "/home/username/sparkTemp")
                .config("spark.executor.memory", "15g")
                .config("spark.driver.memory", "15g")
                .config("spark.memory.offHeap.enabled", true)
                .config("spark.memory.offHeap.use", true)
                .config("spark.memory.offHeap.size", "50g")
                .config("spark.executor.memoryOverhead", "50g")
                .config("spark.sql.shuffle.partitions", 8)
                .config("spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows", false)
                .getOrCreate();

Execution command
/path/to/spark-submit --master local[*] --deploy-mode client --class org.example.Test4 --name Run /path/to/Test4-1.0-SNAPSHOT.jar --driver-memory 15g --executor-memory 15g
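One thing worth checking: spark-submit only parses options that appear before the application JAR; anything after the JAR is passed to the application's main() as program arguments. So the --driver-memory and --executor-memory flags above are most likely being ignored. A sketch of the same command with the flags moved before the JAR (same placeholder paths):

```shell
/path/to/spark-submit \
  --master local[*] \
  --deploy-mode client \
  --driver-memory 15g \
  --executor-memory 15g \
  --class org.example.Test4 \
  --name Run \
  /path/to/Test4-1.0-SNAPSHOT.jar
```

In local mode the executor runs inside the driver JVM, so --driver-memory is the setting that actually matters here.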

Thanks for the help.
