NER pipeline Not scaling up to use the full cluster nodes #14121
Comments
Any news here?
I recommend watching this webinar; scaling Apache Spark is independent of Spark NLP. You should follow the general "tuning and sizing your cluster" advice in order to utilize all your executors. Since Spark NLP is a native extension of Apache Spark, any Spark tuning recommendation applies to this library as well.
Great! Thank you very much, Maziyar. I have watched the webinar and came away with great insights. The issue here is not speed optimization but why the Spark NLP NER pipeline is not fully utilizing the cluster and uses only one worker. Once this issue is solved, I will work on optimizing the Spark application.
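Since the pipeline code isn't posted in the issue, one common cause (an assumption here, not a confirmed diagnosis) is that the input DataFrame ends up with a single partition, e.g. after reading one small file or calling `collect`-style operations upstream. Spark schedules one task per partition, so a single partition keeps every executor but one idle, no matter how many nodes the cluster has. A pure-Python illustration of that task-to-worker mapping:

```python
def assign_tasks(num_partitions, num_workers):
    """Spark runs one task per partition; tasks are spread across workers.
    Returns how many tasks each worker gets (idealized round-robin)."""
    counts = [0] * num_workers
    for task in range(num_partitions):
        counts[task % num_workers] += 1
    return counts

# One input partition: only one of the 5 workers is busy.
print(assign_tasks(1, 5))   # [1, 0, 0, 0, 0]
# Repartitioned to 20: every worker gets 4 tasks.
print(assign_tasks(20, 5))  # [4, 4, 4, 4, 4]
```

If `df.rdd.getNumPartitions()` returns 1 (or fewer than the cluster's total executor cores), calling `df.repartition(n)` before `pipeline.fit(df).transform(df)` should spread the NER work across all workers.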
Is there an existing issue for this?
Who can help?
No response
What are you working on?
I am using spark-nlp for NER detection on an Azure Databricks cluster. The cluster has 5 worker nodes, but when I run the job it does not scale out and uses only a single node. The NER pipeline does not seem to parallelize and runs on only one node.
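As a rough sizing sketch for a cluster like this (the core count below is an assumption, since the node type isn't given): a common rule of thumb is to aim for a few partitions per available executor core, so every one of the 5 workers has queued tasks. A hypothetical helper, not a Spark API:

```python
def target_partitions(num_workers, cores_per_worker, tasks_per_core=3):
    """Rule-of-thumb partition count: a few tasks per available core
    so each worker stays busy (hypothetical helper, not part of Spark)."""
    return num_workers * cores_per_worker * tasks_per_core

# Assuming 5 workers with 4 cores each (the core count is a guess):
print(target_partitions(5, 4))  # 60
```

Such a value would then be passed to `df.repartition(...)` (or configured via `spark.sql.shuffle.partitions`) before running the pipeline.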
Current Behavior
The NER pipeline uses only one of the 5 available nodes.
Expected Behavior
The pipeline should run on all the worker nodes.
Steps To Reproduce
Spark NLP version and Apache Spark
spark-nlp==5.1.4
spark==3.4.1
com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.4
Working on Databricks
Type of Spark Application
Python Application
Java Version
8
Java Home Directory
/usr/lib/jvm/zulu8-ca-amd64/jre/
Setup and installation
Pypi
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response