Default name updates #14469

Open
wants to merge 4 commits into
base: master
6 changes: 3 additions & 3 deletions python/sparknlp/annotator/embeddings/nomic_embeddings.py
@@ -31,7 +31,7 @@ class NomicEmbeddings(AnnotatorModel, HasEmbeddingsProperties, HasCaseSensitiveP
... .setOutputCol("nomic_embeddings")


The default model is ``"nomic_small"``, if no name is provided.
The default model is ``"nomic_embed_v1"``, if no name is provided.

For available pretrained models please see the
`Models Hub <https://sparknlp.org/models?q=Nomic>`__.
@@ -159,13 +159,13 @@ def loadSavedModel(folder, spark_session, use_openvino=False):
return NomicEmbeddings(java_model=jModel)

@staticmethod
def pretrained(name="nomic_small", lang="en", remote_loc=None):
def pretrained(name="nomic_embed_v1", lang="en", remote_loc=None):
"""Downloads and loads a pretrained model.

Parameters
----------
name : str, optional
Name of the pretrained model, by default "nomic_small"
Name of the pretrained model, by default "nomic_embed_v1"
lang : str, optional
Language of the pretrained model, by default "en"
remote_loc : str, optional
10 changes: 5 additions & 5 deletions python/sparknlp/annotator/seq2seq/cpm_transformer.py
@@ -44,7 +44,7 @@ class CPMTransformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
... .setOutputCol("generation")


The default model is ``"llam2-7b"``, if no name is provided. For available
The default model is ``"mini_cpm_2b_8bit"``, if no name is provided. For available
pretrained models please see the `Models Hub
<https://sparknlp.org/models?q=cpm>`__.

@@ -104,7 +104,7 @@ class CPMTransformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
>>> documentAssembler = DocumentAssembler() \\
... .setInputCol("text") \\
... .setOutputCol("documents")
>>> cpm = CPMTransformer.pretrained("llama_2_7b_chat_hf_int4") \\
>>> cpm = CPMTransformer.pretrained("mini_cpm_2b_8bit", "xx") \\
... .setInputCols(["documents"]) \\
... .setMaxOutputLength(50) \\
... .setOutputCol("generation")
@@ -299,15 +299,15 @@ def loadSavedModel(folder, spark_session, use_openvino = False):
return CPMTransformer(java_model=jModel)

@staticmethod
def pretrained(name="llama_2_7b_chat_hf_int4", lang="en", remote_loc=None):
def pretrained(name="mini_cpm_2b_8bit", lang="xx", remote_loc=None):
"""Downloads and loads a pretrained model.

Parameters
----------
name : str, optional
Name of the pretrained model, by default "llama_2_7b_chat_hf_int4"
Name of the pretrained model, by default "mini_cpm_2b_8bit"
lang : str, optional
Language of the pretrained model, by default "en"
Language of the pretrained model, by default "xx"
remote_loc : str, optional
Optional remote address of the resource, by default None. Will use
Spark NLP's repositories otherwise.
8 changes: 4 additions & 4 deletions python/sparknlp/annotator/seq2seq/nllb_transformer.py
@@ -32,7 +32,7 @@ class NLLBTransformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
... .setOutputCol("generation")


The default model is ``"nllb_418M"``, if no name is provided. For available
The default model is ``"nllb_distilled_600M_8int"``, if no name is provided. For available
pretrained models please see the `Models Hub
<https://sparknlp.org/models?q=nllb>`__.

@@ -164,7 +164,7 @@ class NLLBTransformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
>>> documentAssembler = DocumentAssembler() \\
... .setInputCol("text") \\
... .setOutputCol("documents")
>>> nllb = NLLBTransformer.pretrained("nllb_418M") \\
>>> nllb = NLLBTransformer.pretrained("nllb_distilled_600M_8int") \\
... .setInputCols(["documents"]) \\
... .setMaxOutputLength(50) \\
... .setOutputCol("generation") \\
@@ -398,13 +398,13 @@ def loadSavedModel(folder, spark_session, use_openvino=False):
return NLLBTransformer(java_model=jModel)

@staticmethod
def pretrained(name="nllb_418M", lang="xx", remote_loc=None):
def pretrained(name="nllb_distilled_600M_8int", lang="xx", remote_loc=None):
"""Downloads and loads a pretrained model.

Parameters
----------
name : str, optional
Name of the pretrained model, by default "nllb_418M"
Name of the pretrained model, by default "nllb_distilled_600M_8int"
lang : str, optional
Language of the pretrained model, by default "xx"
remote_loc : str, optional
8 changes: 4 additions & 4 deletions python/sparknlp/annotator/seq2seq/phi3_transformer.py
@@ -37,7 +37,7 @@ class Phi3Transformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
... .setOutputCol("generation")


The default model is ``"phi3"``, if no name is provided. For available
The default model is ``"phi_3_mini_128k_instruct"``, if no name is provided. For available
pretrained models please see the `Models Hub
<https://sparknlp.org/models?q=phi3>`__.

@@ -112,7 +112,7 @@ class Phi3Transformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
>>> documentAssembler = DocumentAssembler() \\
... .setInputCol("text") \\
... .setOutputCol("documents")
>>> phi3 = Phi3Transformer.pretrained("phi3") \\
>>> phi3 = Phi3Transformer.pretrained("phi_3_mini_128k_instruct") \\
... .setInputCols(["documents"]) \\
... .setMaxOutputLength(50) \\
... .setOutputCol("generation")
@@ -308,13 +308,13 @@ def loadSavedModel(folder, spark_session, use_openvino=False):
return Phi3Transformer(java_model=jModel)

@staticmethod
def pretrained(name="phi3", lang="en", remote_loc=None):
def pretrained(name="phi_3_mini_128k_instruct", lang="en", remote_loc=None):
"""Downloads and loads a pretrained model.

Parameters
----------
name : str, optional
Name of the pretrained model, by default "phi3"
Name of the pretrained model, by default "phi_3_mini_128k_instruct"
lang : str, optional
Language of the pretrained model, by default "en"
remote_loc : str, optional
6 changes: 3 additions & 3 deletions python/sparknlp/annotator/seq2seq/qwen_transformer.py
@@ -121,7 +121,7 @@ class QwenTransformer(AnnotatorModel, HasBatchedAnnotate, HasEngine):
>>> documentAssembler = DocumentAssembler() \\
... .setInputCol("text") \\
... .setOutputCol("documents")
>>> qwen = QwenTransformer.pretrained("qwen-7b") \\
>>> qwen = QwenTransformer.pretrained("qwen_7.5b_chat") \\
... .setInputCols(["documents"]) \\
... .setMaxOutputLength(50) \\
... .setOutputCol("generation")
@@ -317,13 +317,13 @@ def loadSavedModel(folder, spark_session, use_openvino=False):
return QwenTransformer(java_model=jModel)

@staticmethod
def pretrained(name="qwen-7b", lang="en", remote_loc=None):
def pretrained(name="qwen_7.5b_chat", lang="en", remote_loc=None):
"""Downloads and loads a pretrained model.

Parameters
----------
name : str, optional
Name of the pretrained model, by default "qwen-7b"
Name of the pretrained model, by default "qwen_7.5b_chat"
lang : str, optional
Language of the pretrained model, by default "en"
remote_loc : str, optional
Original file line number Diff line number Diff line change
@@ -68,7 +68,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCols("document")
* .setOutputCol("generation")
* }}}
* The default model is `"llama_2_7b_chat_hf_int4"`, if no name is provided. For available
* The default model is `"mini_cpm_2b_8bit"`, if no name is provided. For available
* pretrained models please see the [[https://sparknlp.org/models?q=cpm Models Hub]].
*
* For extended examples of usage, see
@@ -94,7 +94,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCol("text")
* .setOutputCol("documents")
*
* val cpm = CPMTransformer.pretrained("llama_2_7b_chat_hf_int4")
* val cpm = CPMTransformer.pretrained("mini_cpm_2b_8bit")
* .setInputCols(Array("documents"))
* .setMinOutputLength(10)
* .setMaxOutputLength(50)
@@ -311,7 +311,8 @@ class CPMTransformer(override val uid: String)
trait ReadablePretrainedCPMTransformerModel
extends ParamsAndFeaturesReadable[CPMTransformer]
with HasPretrained[CPMTransformer] {
override val defaultModelName: Some[String] = Some("llama_2_7b_chat_hf_int4")
override val defaultModelName: Some[String] = Some("mini_cpm_2b_8bit")
override val defaultLang: String = "xx"

/** Java compliant-overrides */
override def pretrained(): CPMTransformer = super.pretrained()
Original file line number Diff line number Diff line change
@@ -59,7 +59,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCols("document")
* .setOutputCol("generation")
* }}}
* The default model is `"nllb_418M"`, if no name is provided. For available pretrained models
* The default model is `"nllb_distilled_600M_8int"`, if no name is provided. For available pretrained models
* please see the [[https://sparknlp.org/models?q=nllb Models Hub]].
*
* For extended examples of usage, see
@@ -156,7 +156,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCol("text")
* .setOutputCol("documents")
*
* val nllb = NLLBTransformer.pretrained("nllb_418M")
* val nllb = NLLBTransformer.pretrained("nllb_distilled_600M_8int")
* .setInputCols(Array("documents"))
* .setSrcLang("zho_Hans")
* .setTgtLang("eng_Latn")
@@ -635,7 +635,7 @@ class NLLBTransformer(override val uid: String)
trait ReadablePretrainedNLLBTransformerModel
extends ParamsAndFeaturesReadable[NLLBTransformer]
with HasPretrained[NLLBTransformer] {
override val defaultModelName: Some[String] = Some("nllb_418M")
override val defaultModelName: Some[String] = Some("nllb_distilled_600M_8int")
override val defaultLang: String = "xx"

/** Java compliant-overrides */
Original file line number Diff line number Diff line change
@@ -65,7 +65,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCols("document")
* .setOutputCol("generation")
* }}}
* The default model is `"phi_3_mini_128k_instruct_int8"`, if no name is provided. For available
* The default model is `"phi_3_mini_128k_instruct"`, if no name is provided. For available
* pretrained models please see the [[https://sparknlp.org/models?q=phi3 Models Hub]].
*
* For extended examples of usage, see
@@ -106,7 +106,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCol("text")
* .setOutputCol("documents")
*
* val phi3 = Phi3Transformer.pretrained("phi_3_mini_128k_instruct_int8")
* val phi3 = Phi3Transformer.pretrained("phi_3_mini_128k_instruct")
* .setInputCols(Array("documents"))
* .setMinOutputLength(10)
* .setMaxOutputLength(50)
@@ -323,7 +323,7 @@ class Phi3Transformer(override val uid: String)
trait ReadablePretrainedPhi3TransformerModel
extends ParamsAndFeaturesReadable[Phi3Transformer]
with HasPretrained[Phi3Transformer] {
override val defaultModelName: Some[String] = Some("phi_3_mini_128k_instruct_int8")
override val defaultModelName: Some[String] = Some("phi_3_mini_128k_instruct")

/** Java compliant-overrides */
override def pretrained(): Phi3Transformer = super.pretrained()
Original file line number Diff line number Diff line change
@@ -68,7 +68,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCols("document")
* .setOutputCol("generation")
* }}}
* The default model is `"Qwen-13b"`, if no name is provided. For available pretrained models
* The default model is `"qwen_7.5b_chat"`, if no name is provided. For available pretrained models
* please see the [[https://sparknlp.org/models?q=Qwen Models Hub]].
*
* For extended examples of usage, see
@@ -113,7 +113,7 @@ import org.json4s.jackson.JsonMethods._
* .setInputCol("text")
* .setOutputCol("documents")
*
* val Qwen = QwenTransformer.pretrained("Qwen-7b")
* val Qwen = QwenTransformer.pretrained("qwen_7.5b_chat")
* .setInputCols(Array("documents"))
* .setMinOutputLength(10)
* .setMaxOutputLength(50)
@@ -334,7 +334,7 @@ class QwenTransformer(override val uid: String)
trait ReadablePretrainedQwenTransformerModel
extends ParamsAndFeaturesReadable[QwenTransformer]
with HasPretrained[QwenTransformer] {
override val defaultModelName: Some[String] = Some("Qwen-7b")
override val defaultModelName: Some[String] = Some("qwen_7.5b_chat")

/** Java compliant-overrides */
override def pretrained(): QwenTransformer = super.pretrained()
Original file line number Diff line number Diff line change
@@ -49,7 +49,7 @@ import com.johnsnowlabs.ml.openvino.{OpenvinoWrapper, ReadOpenvinoModel, WriteOp
* .setInputCols("document")
* .setOutputCol("nomic_embeddings")
* }}}
* The default model is `"nomic_small"`, if no name is provided.
* The default model is `"nomic_embed_v1"`, if no name is provided.
*
* For available pretrained models please see the
* [[https://sparknlp.org/models?q=NomicEmbeddings Models Hub]].
@@ -86,7 +86,7 @@ import com.johnsnowlabs.ml.openvino.{OpenvinoWrapper, ReadOpenvinoModel, WriteOp
* .setInputCol("text")
* .setOutputCol("document")
*
* val embeddings = NomicEmbeddings.pretrained("nomic_small", "en")
* val embeddings = NomicEmbeddings.pretrained("nomic_embed_v1", "en")
* .setInputCols("document")
* .setOutputCol("nomic_embeddings")
*
@@ -357,7 +357,7 @@ class NomicEmbeddings(override val uid: String)
trait ReadablePretrainedNomicEmbeddingsModel
extends ParamsAndFeaturesReadable[NomicEmbeddings]
with HasPretrained[NomicEmbeddings] {
override val defaultModelName: Some[String] = Some("nomic_small")
override val defaultModelName: Some[String] = Some("nomic_embed_v1")

/** Java compliant-overrides */
override def pretrained(): NomicEmbeddings = super.pretrained()
Original file line number Diff line number Diff line change
@@ -690,7 +690,12 @@ object PythonResourceDownloader {
"SnowFlakeEmbeddings" -> SnowFlakeEmbeddings,
"CamemBertForZeroShotClassification" -> CamemBertForZeroShotClassification,
"BertForMultipleChoice" -> BertForMultipleChoice,
"PromptAssembler" -> PromptAssembler)
"PromptAssembler" -> PromptAssembler,
"CPMTransformer" -> CPMTransformer,
"NomicEmbeddings" -> NomicEmbeddings,
"NLLBTransformer" -> NLLBTransformer,
"Phi3Transformer" -> Phi3Transformer,
"QwenTransformer" -> QwenTransformer)

// Pairs of types where the key type can load a pretrained model from the value type
val typeMapper: Map[String, String] = Map("ZeroShotNerModel" -> "RoBertaForQuestionAnswering")
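
For reviewers who want to see the effect of these defaults in one place, here is a minimal sketch of how a no-argument `pretrained()` call resolves to a model name and language after this change. The names and languages are copied from the Python `pretrained()` signatures in this diff; `DEFAULT_PRETRAINED` and `resolve_pretrained` are hypothetical helpers written for illustration and are not part of Spark NLP.

```python
# Hypothetical lookup table (not Spark NLP code). Each entry mirrors the
# name/lang defaults in the updated Python pretrained() signatures above.
DEFAULT_PRETRAINED = {
    "NomicEmbeddings": ("nomic_embed_v1", "en"),
    "CPMTransformer": ("mini_cpm_2b_8bit", "xx"),
    "NLLBTransformer": ("nllb_distilled_600M_8int", "xx"),
    "Phi3Transformer": ("phi_3_mini_128k_instruct", "en"),
    "QwenTransformer": ("qwen_7.5b_chat", "en"),
}


def resolve_pretrained(annotator, name=None, lang=None):
    """Return the (name, lang) pair a pretrained() call would request.

    Arguments left as None fall back to the annotator's defaults,
    which is the behaviour of Python keyword defaults in the signatures.
    """
    default_name, default_lang = DEFAULT_PRETRAINED[annotator]
    resolved_name = name if name is not None else default_name
    resolved_lang = lang if lang is not None else default_lang
    return resolved_name, resolved_lang


if __name__ == "__main__":
    # Print the pair each annotator would request with no arguments.
    for annotator in DEFAULT_PRETRAINED:
        print(annotator, resolve_pretrained(annotator))
```

A table like this could also back a small consistency check between the Python signatures and the Scala `defaultModelName` / `defaultLang` overrides, which is where mismatches such as the old `"phi3"` versus `"phi_3_mini_128k_instruct_int8"` pair came from.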