SparkPipelineSparkNLP

Build & Convert a Spark NLP Pipeline to PMML

Spark NLP Pipeline on Tweets

Language: Scala

Requirements:

Author: Ian Brooks
Follow: [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/)
HCC Article: [Link](https://community.hortonworks.com/articles/208569/build-and-convert-a-spark-nlp-pipeline-into-pmml-i.html)

Instructions:

1. Follow this tutorial to build the Solr collection 'tweets'.
2. Upload the notebook (JSON file) to Apache Zeppelin.
3. Match the version of Spark with the Solr Spark connector. The version list is included here.
4. Review the Spark CoreNLP API, which provides a Spark wrapper around the Stanford CoreNLP library.
5. In the Stanford CoreNLP download found at http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip, find the stanford-corenlp-*-models.jar and copy it to the /tmp directory. In Zeppelin's interpreter configuration for Spark, include the following artifact: /tmp/stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1-models.jar
6. Review the JPMML-SparkML and JPMML-Model libraries, found at https://github.com/jpmml/jpmml-sparkml and https://github.com/jpmml/jpmml-model
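To give a rough idea of what the last step leads to, here is a minimal sketch of fitting a small Spark ML text pipeline and exporting it with JPMML-SparkML's `PMMLBuilder`. It is not the notebook's actual pipeline: the inline rows, the column names `text` and `label`, the chosen stages, and the output path `/tmp/tweet_pipeline.pmml` are all assumptions made for illustration, and the Solr and CoreNLP steps are left out. It also assumes a JPMML-SparkML release that matches your Spark version and can convert these stages.

```scala
import java.io.File

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
import org.apache.spark.sql.SparkSession
import org.jpmml.sparkml.PMMLBuilder

object TweetPipelineToPmml {
  def main(args: Array[String]): Unit = {
    // In Zeppelin a `spark` session already exists; it is created here
    // only so the sketch can run outside a notebook.
    val spark = SparkSession.builder()
      .appName("TweetPipelineToPmml")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in rows; the tutorial reads tweets from Solr instead.
    val tweets = Seq(
      ("great flight and friendly crew", 1.0),
      ("lost my luggage again", 0.0),
      ("love the new update", 1.0),
      ("worst customer service ever", 0.0)
    ).toDF("text", "label")

    // Split text into words, count them, then fit a binary classifier.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val vectorizer = new CountVectorizer().setInputCol("words").setOutputCol("features")
    val classifier = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxIter(10)

    val model: PipelineModel = new Pipeline()
      .setStages(Array(tokenizer, vectorizer, classifier))
      .fit(tweets)

    // PMMLBuilder takes the schema of the training DataFrame and the fitted pipeline.
    // The output path is an assumption; adjust it to your environment.
    val pmmlFile = new PMMLBuilder(tweets.schema, model)
      .buildFile(new File("/tmp/tweet_pipeline.pmml"))
    println(s"PMML written to ${pmmlFile.getAbsolutePath}")

    spark.stop()
  }
}
```

In a Zeppelin paragraph you would drop the `object` wrapper and the `SparkSession` setup and reuse the interpreter's `spark`. If the export fails on a particular stage, check the list of supported transformers in the JPMML-SparkML README for the version you are using.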
