diff --git a/docs/_posts/ahmedlone127/2024-11-26-mini_cpm_2b_8bit_xx.md b/docs/_posts/ahmedlone127/2024-11-26-mini_cpm_2b_8bit_xx.md
new file mode 100644
index 00000000000000..60390336578d3a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-11-26-mini_cpm_2b_8bit_xx.md
@@ -0,0 +1,86 @@
---
layout: model
title: mini_cpm_2b_8bit model from openbmb
author: John Snow Labs
name: mini_cpm_2b_8bit
date: 2024-11-26
tags: [en, open_source, pipeline, openvino, xx]
task: Text Generation
language: xx
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: CPMTransformer
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained CPMTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mini_cpm_2b_8bit` is a multilingual model originally trained by openbmb.

{:.btn-box}


[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

seq2seq = CPMTransformer.pretrained("mini_cpm_2b_8bit","xx") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val seq2seq = CPMTransformer.pretrained("mini_cpm_2b_8bit","xx")
    .setInputCols(Array("document"))
    .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
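The snippets above stop at `transform`. To actually see the generated text, you can unpack the `generation` annotation column with plain Spark SQL functions; a minimal sketch, assuming the column names used in the pipeline above:

```python
# The pipeline output keeps the generated text inside an array of annotations;
# `generation.result` exposes just the result strings.
pipelineDF.selectExpr("explode(generation.result) as generated_text") \
    .show(truncate=False)
```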

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|mini_cpm_2b_8bit|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|xx|
|Size:|3.0 GB|

## References

https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-11-27-nllb_distilled_600M_8int_xx.md b/docs/_posts/ahmedlone127/2024-11-27-nllb_distilled_600M_8int_xx.md
new file mode 100644
index 00000000000000..fde6f004b253c4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-11-27-nllb_distilled_600M_8int_xx.md
@@ -0,0 +1,86 @@
---
layout: model
title: nllb_distilled_600M_8int model from Facebook
author: John Snow Labs
name: nllb_distilled_600M_8int
date: 2024-11-27
tags: [en, open_source, pipeline, openvino, xx]
task: Text Generation
language: xx
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: NLLBTransformer
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained NLLBTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nllb_distilled_600M_8int` is a multilingual model originally trained by facebook.

{:.btn-box}


[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nllb_distilled_600M_8int_xx_5.5.1_3.0_1732741416718.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nllb_distilled_600M_8int_xx_5.5.1_3.0_1732741416718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

seq2seq = NLLBTransformer.pretrained("nllb_distilled_600M_8int","xx") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val seq2seq = NLLBTransformer.pretrained("nllb_distilled_600M_8int","xx")
    .setInputCols(Array("document"))
    .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
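NLLB is a many-to-many translation model, so in practice you will want to tell the annotator which language pair to translate between. The sketch below assumes the annotator exposes `setSrcLang`/`setTgtLang` setters that take FLORES-200 codes such as `eng_Latn` and `fra_Latn`; verify the exact parameter names in the NLLBTransformer API reference for your Spark NLP version before relying on it.

```python
# Hypothetical configuration sketch: translate English to French.
# setSrcLang / setTgtLang and the FLORES-200 codes are assumptions; check the
# NLLBTransformer documentation for the parameters actually available.
seq2seq = NLLBTransformer.pretrained("nllb_distilled_600M_8int", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("generation") \
    .setSrcLang("eng_Latn") \
    .setTgtLang("fra_Latn")
```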

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|nllb_distilled_600M_8int|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|xx|
|Size:|842.9 MB|

## References

https://huggingface.co/facebook/nllb-200-distilled-600M
diff --git a/docs/_posts/ahmedlone127/2024-11-27-nomic_embed_v1_en.md b/docs/_posts/ahmedlone127/2024-11-27-nomic_embed_v1_en.md
new file mode 100644
index 00000000000000..b260d1bcdb5b58
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-11-27-nomic_embed_v1_en.md
@@ -0,0 +1,86 @@
---
layout: model
title: nomic_embed_v1 model from nomic-ai
author: John Snow Labs
name: nomic_embed_v1
date: 2024-11-27
tags: [en, open_source, openvino]
task: Embeddings
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: NomicEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained NomicEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nomic_embed_v1` is an English model originally trained by nomic-ai.

{:.btn-box}


[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nomic_embed_v1_en_5.5.1_3.0_1732743647389.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nomic_embed_v1_en_5.5.1_3.0_1732743647389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = NomicEmbeddings.pretrained("nomic_embed_v1","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val embeddings = NomicEmbeddings.pretrained("nomic_embed_v1","en")
    .setInputCols(Array("document"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
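After `transform`, each row's embedding lives inside the `embeddings` annotation column. A minimal sketch for pulling the raw float vectors out with standard Spark operations, using only the column names from the pipeline above:

```python
# Each annotation stores its vector in the `embeddings` field of the struct;
# explode the annotation array and keep only the float vector.
vectors = pipelineDF.selectExpr("explode(embeddings) as ann") \
    .selectExpr("ann.embeddings as vector")
vectors.show(1, truncate=80)
```

If you need Spark ML `Vector` columns for downstream estimators, Spark NLP's `EmbeddingsFinisher` annotator is the more convenient route.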

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|nomic_embed_v1|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|255.0 MB|

## References

https://huggingface.co/nomic-ai/nomic-embed-text-v1
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-11-29-phi_3_mini_128k_instruct_en.md b/docs/_posts/ahmedlone127/2024-11-29-phi_3_mini_128k_instruct_en.md
new file mode 100644
index 00000000000000..fae7ad5900a6be
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-11-29-phi_3_mini_128k_instruct_en.md
@@ -0,0 +1,86 @@
---
layout: model
title: phi_3_mini_128k_instruct model from microsoft
author: John Snow Labs
name: phi_3_mini_128k_instruct
date: 2024-11-29
tags: [en, open_source, openvino]
task: Text Generation
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: Phi3Transformer
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained Phi3Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `phi_3_mini_128k_instruct` is an English model originally trained by microsoft.

{:.btn-box}


[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi_3_mini_128k_instruct_en_5.5.1_3.0_1732897700551.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi_3_mini_128k_instruct_en_5.5.1_3.0_1732897700551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

seq2seq = Phi3Transformer.pretrained("phi_3_mini_128k_instruct","en") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val seq2seq = Phi3Transformer.pretrained("phi_3_mini_128k_instruct","en")
    .setInputCols(Array("document"))
    .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
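Because this is an instruction-tuned checkpoint, results are usually better when the input string follows Phi-3's chat template rather than raw text. A minimal sketch under that assumption: the `<|user|>`/`<|assistant|>` markers come from the upstream Hugging Face model card, while the annotator itself simply consumes whatever string ends up in the `document` column.

```python
# Wrap the request in Phi-3's chat markers before it reaches the DocumentAssembler.
# The template is taken from the upstream model card and is an assumption here;
# adjust it if the tokenizer configuration you exported differs.
prompt = "<|user|>\nSummarize Spark NLP in one sentence.<|end|>\n<|assistant|>\n"
data = spark.createDataFrame([[prompt]]).toDF("text")
pipelineModel.transform(data) \
    .selectExpr("explode(generation.result) as answer") \
    .show(truncate=False)
```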

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|phi_3_mini_128k_instruct|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|en|
|Size:|3.5 GB|

## References

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
diff --git a/docs/_posts/ahmedlone127/2024-11-29-qwen_7.5b_chat_en.md b/docs/_posts/ahmedlone127/2024-11-29-qwen_7.5b_chat_en.md
new file mode 100644
index 00000000000000..e91989cf2cb983
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-11-29-qwen_7.5b_chat_en.md
@@ -0,0 +1,86 @@
---
layout: model
title: qwen_7.5b_chat model from Qwen
author: John Snow Labs
name: qwen_7.5b_chat
date: 2024-11-29
tags: [en, open_source, openvino]
task: Text Generation
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: QwenTransformer
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained QwenTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `qwen_7.5b_chat` is an English model originally trained by Qwen.

{:.btn-box}


[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qwen_7.5b_chat_en_5.5.1_3.0_1732900154873.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qwen_7.5b_chat_en_5.5.1_3.0_1732900154873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

seq2seq = QwenTransformer.pretrained("qwen_7.5b_chat","en") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val seq2seq = QwenTransformer.pretrained("qwen_7.5b_chat","en")
    .setInputCols(Array("document"))
    .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
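At roughly 7 GB, pulling this checkpoint down for every job is wasteful. One option, sketched below with the standard Spark ML persistence API, is to fit the pipeline once and persist the resulting `PipelineModel`; the output path is only a placeholder.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline so later jobs can reuse the already-downloaded weights.
# "/models/qwen_7_5b_chat_pipeline" is a placeholder path; point it at HDFS, DBFS,
# or any storage your cluster can reach.
pipelineModel.write().overwrite().save("/models/qwen_7_5b_chat_pipeline")

# Reload it elsewhere without fitting (or downloading) again.
restored = PipelineModel.load("/models/qwen_7_5b_chat_pipeline")
restored.transform(data).selectExpr("explode(generation.result)").show(truncate=False)
```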

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|qwen_7.5b_chat|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|en|
|Size:|7.0 GB|

## References

https://huggingface.co/Qwen/Qwen1.5-7B-Chat
\ No newline at end of file
diff --git a/docs/_posts/gadde5300/2024-11-20-bert_embeddings_sec_bert_base_en.md b/docs/_posts/gadde5300/2024-11-20-bert_embeddings_sec_bert_base_en.md
new file mode 100644
index 00000000000000..da7734ed9dcdbd
--- /dev/null
+++ b/docs/_posts/gadde5300/2024-11-20-bert_embeddings_sec_bert_base_en.md
@@ -0,0 +1,105 @@
---
layout: model
title: Financial English BERT Embeddings (Base)
author: John Snow Labs
name: bert_embeddings_sec_bert_base
date: 2024-11-20
tags: [financial, bert, en, embeddings, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: tensorflow
annotator: BertEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Financial pretrained BERT embeddings model, uploaded to Hugging Face, adapted and imported into Spark NLP. `sec-bert-base` is an English model originally trained by `nlpaueb`. It is the reference base model, which means it uses the same architecture as BERT-BASE but is trained on financial documents.

## Predicted Entities



{:.btn-box}


[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_embeddings_sec_bert_base_en_5.5.1_3.0_1732064992710.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_embeddings_sec_bert_base_en_5.5.1_3.0_1732064992710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("I love Spark NLP").toDF("text")

val result = pipeline.fit(data).transform(data)
```

{:.nlu-block}
```python
import nlu
nlu.load("en.embed.sec_bert_base").predict("""I love Spark NLP""")
```
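`BertEmbeddings` produces one vector per token. If you need a single vector per document, for example to feed a classifier, Spark NLP's `SentenceEmbeddings` annotator can average-pool the token vectors; a minimal sketch extending the Python pipeline above:

```python
from sparknlp.annotator import SentenceEmbeddings

# Average-pool the token-level BERT vectors into one vector per document.
sentenceEmbeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings, sentenceEmbeddings])
result = pipeline.fit(data).transform(data)
result.selectExpr("explode(sentence_embeddings.embeddings) as doc_vector").show(1)
```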

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_embeddings_sec_bert_base|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[bert]|
|Language:|en|
|Size:|409.4 MB|
|Case sensitive:|true|

## References

- https://huggingface.co/nlpaueb/sec-bert-base
- https://arxiv.org/abs/2203.06482
- http://nlp.cs.aueb.gr/
\ No newline at end of file