
Commit dfae801

[SPARKNLP-1095] Add installation instructions for ONNX GPU on Databricks (#14451)
1 parent 38dfa46 commit dfae801

File tree

1 file changed: +32 −38 lines changed


docs/en/install.md

Lines changed: 32 additions & 38 deletions
@@ -620,6 +620,8 @@ pointed [here](#python-without-explicit-pyspark-installation)
 
 ## Databricks Cluster
 
+### Install Spark NLP on Databricks
+
 1. Create a cluster if you don't have one already
 
 2. On a new cluster or existing one you need to add the following to the `Advanced Options -> Spark` tab:
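For reference, the configuration block that step 2 points to is unchanged by this commit and therefore elided from the hunk above; it appears in full in the removed duplicate section near the end of this diff:

```bash
spark.kryoserializer.buffer.max 2000M
spark.serializer org.apache.spark.serializer.KryoSerializer
```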
@@ -631,15 +633,37 @@ pointed [here](#python-without-explicit-pyspark-installation)
 
 3. In `Libraries` tab inside your cluster you need to follow these steps:
 
-3.1. Install New -> PyPI -> `spark-nlp==5.5.1` -> Install
+3.1. Install New -> PyPI -> `spark-nlp==5.5.1` -> Install
 
-3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.1` -> Install
+3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.1` -> Install
 
 4. Now you can attach your notebook to the cluster and use Spark NLP!
 
-NOTE: Databricks' runtimes support different Apache Spark major releases. Please make sure you choose the correct Spark
-NLP Maven package name (Maven Coordinate) for your runtime from
-our [Packages Cheatsheet](https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet)
+NOTE: Databricks' runtimes support different Apache Spark major releases. Please make sure you choose the correct Spark NLP Maven package name (Maven Coordinate) for your runtime from our [Packages Cheatsheet](https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet)
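Once step 4 is done, a quick way to confirm the installation from an attached notebook is a smoke test along the lines of the sketch below; it assumes the cluster can reach Spark NLP's public model repository to download the `explain_document_dl` pipeline.

```python
# Minimal smoke test to run in a Databricks notebook attached to the cluster.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

print(sparknlp.version())  # expect 5.5.1

# Download a small pretrained pipeline and annotate one sample sentence.
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
result = pipeline.annotate("Spark NLP is installed on this Databricks cluster.")
print(result["token"])
```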
+
+#### ONNX GPU Inference on Databricks
+
+To run inference with ONNX models on GPU on Databricks clusters, we need to perform some additional setup steps. ONNX requires CUDA 12 and cuDNN 9 to be installed.
+
+Therefore, we need to use Databricks runtimes starting from version 15, as these come with CUDA 12. However, they ship with cuDNN 8, which we need to upgrade manually.
+To do so, we have to add the following script as an [init script](https://docs.databricks.com/en/init-scripts/index.html):
+
+```bash
+#!/bin/bash
+sudo apt-get update && sudo apt-get -y install cudnn9-cuda-12
+```
+
+You need to save this script to a shell script file (e.g. `upgrade-cudnn9.sh`) in your workspace. Afterwards, you need to specify it on your compute resource under the *Advanced options* section. cuDNN will be upgraded to version 9 on all nodes before Spark is started.
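One way to verify that the upgrade took effect is to query the package status from a notebook cell once the cluster is up. A minimal sketch (it only inspects the node the cell runs on, i.e. the driver):

```python
# Confirm cuDNN 9 was installed by the init script (driver node only).
import subprocess

status = subprocess.run(
    ["dpkg", "-s", "cudnn9-cuda-12"],
    capture_output=True, text=True,
)
# On success, the package details include "Status: install ok installed".
print(status.stdout or status.stderr)
```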
+
+</div><div class="h3-box" markdown="1">
+
+### Databricks Notebooks
+
+You can view all the Databricks notebooks from this address:
+
+[https://johnsnowlabs.github.io/spark-nlp-workshop/databricks/index.html](https://johnsnowlabs.github.io/spark-nlp-workshop/databricks/index.html)
+
+Note: You can import these notebooks by using their URLs.
 
 </div><div class="h3-box" markdown="1">
 
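Putting the pieces together, GPU inference with an ONNX-backed annotator might look like the sketch below. It assumes the cluster runs a 15.x ML GPU runtime with the cuDNN init script above, and that the GPU Maven coordinate (`com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.5.1`) was installed in place of the CPU one from step 3.2; `small_bert_L2_768` stands in for any pretrained model with an ONNX export.

```python
# Sketch: GPU inference with an ONNX-backed annotator on a Databricks cluster.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("small_bert_L2_768", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# `spark` is the session Databricks provides in every notebook.
data = spark.createDataFrame([["ONNX inference on GPU."]]).toDF("text")
model = Pipeline(stages=[document_assembler, tokenizer, embeddings]).fit(data)
model.transform(data).selectExpr("explode(embeddings.embeddings)").show(3)
```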
@@ -849,6 +873,8 @@ Spark NLP 5.5.1 has been tested and is compatible with the following runtimes:
 - 14.0 ML
 - 14.1
 - 14.1 ML
+- 15.x
+- 15.x ML
 
 **GPU:**
 
@@ -871,39 +897,7 @@ Spark NLP 5.5.1 has been tested and is compatible with the following runtimes:
 - 13.3 ML & GPU
 - 14.0 ML & GPU
 - 14.1 ML & GPU
-
-</div><div class="h3-box" markdown="1">
-
-#### Install Spark NLP on Databricks
-
-1. Create a cluster if you don't have one already
-
-2. On a new cluster or existing one you need to add the following to the `Advanced Options -> Spark` tab:
-
-```bash
-spark.kryoserializer.buffer.max 2000M
-spark.serializer org.apache.spark.serializer.KryoSerializer
-```
-
-3. In `Libraries` tab inside your cluster you need to follow these steps:
-
-3.1. Install New -> PyPI -> `spark-nlp==5.5.1` -> Install
-
-3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.1` -> Install
-
-4. Now you can attach your notebook to the cluster and use Spark NLP!
-
-NOTE: Databrick's runtimes support different Apache Spark major releases. Please make sure you choose the correct Spark NLP Maven pacakge name (Maven Coordinate) for your runtime from our [Packages Cheatsheet](https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet)
-
-</div><div class="h3-box" markdown="1">
-
-#### Databricks Notebooks
-
-You can view all the Databricks notebooks from this address:
-
-[https://johnsnowlabs.github.io/spark-nlp-workshop/databricks/index.html](https://johnsnowlabs.github.io/spark-nlp-workshop/databricks/index.html)
-
-Note: You can import these notebooks by using their URLs.
+- 15.x ML & GPU
 
 </div><div class="h3-box" markdown="1">
 
