XlmRoBertaSentenceEmbeddings returns huge amount of embeddings instead of set dimensions #14180
Unanswered
kkwasnioch asked this question in Q&A
Replies: 1 comment 9 replies
-
Hi @kkwasnioch, we will try to fix this in the next release.
-
I am trying to produce embeddings for whole documents in three languages: English, Polish, and Finnish. Previously I tried sentence-transformers/paraphrase-multilingual-mpnet-base-v2 from Hugging Face, and it works fine, returning 768 dimensions. But when I load the model and run it with Spark NLP's XlmRoBertaSentenceEmbeddings, it produces, for example, 26k dimensions. Am I loading the model the wrong way? Or are there other issues? Thanks!
https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/onnx/HuggingFace_ONNX_in_Spark_NLP_XlmRoBertaSentenceEmbeddings.ipynb -> this is the sample code I based my approach on.
Code:
Output:
+--------------------+--------------------+-----+
|                text|                 emb| size|
+--------------------+--------------------+-----+
|Do kościoła jak "...|[0.028680567, 0.2...|29952|
|Audi Q7 właśnie p...|[-0.01756316, -0....|28416|
|Białoruś. KGB wpr...|[0.07118901, -0.0...|28416|
|"Są prawdziwym za...|[0.0972352, -0.04...|25344|
|Obsesja, za którą...|[0.07850968, 0.15...|32256|
|Ogromny sukces Po...|[-0.034644652, 0....|22272|
|  Rolnicy "zajęli...|[-0.06938014, 0.0...|29952|
|Szokujące wyznani...|[0.08084734, 0.18...|30720|
|Pogoda zaskoczy w...|[-0.086600736, 0....|34560|
|Kiedyś kary fizyc...|[0.059363756, 0.0...|28416|
+--------------------+--------------------+-----+
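One thing that can be checked directly from the output above: every value in the `size` column is an exact multiple of 768. That suggests, though the thread does not confirm it, that the `emb` column holds several 768-dimensional vectors concatenated together (for example one per sentence or chunk) rather than a single oversized vector. The sketch below verifies the arithmetic and shows one possible workaround, averaging the 768-sized chunks into a single document vector. The helper names `split_count` and `mean_pool` are hypothetical and are not part of Spark NLP; the chunk interpretation is an assumption.

```python
# Sizes copied from the `size` column of the output above.
sizes = [29952, 28416, 28416, 25344, 32256, 22272, 29952, 30720, 34560, 28416]

# Dimension reported in the question for paraphrase-multilingual-mpnet-base-v2.
DIM = 768


def split_count(size, dim=DIM):
    """Return how many dim-sized vectors fit exactly in `size`, or None if it is not a multiple."""
    quotient, remainder = divmod(size, dim)
    return quotient if remainder == 0 else None


counts = [split_count(s) for s in sizes]
print(counts)  # [39, 37, 37, 33, 42, 29, 39, 40, 45, 37]


def mean_pool(flat, dim=DIM):
    """Average consecutive dim-sized chunks of a flat list into one dim-sized vector.

    Assumption: `flat` is a concatenation of equally sized vectors.
    """
    if not flat or len(flat) % dim != 0:
        raise ValueError("length must be a non-zero multiple of dim")
    n = len(flat) // dim
    return [sum(flat[i * dim + j] for i in range(n)) / n for j in range(dim)]
```

If the assumption holds, applying something like `mean_pool` to each row's flattened array (for instance inside a UDF) would give one 768-dimensional vector per document. Whether plain averaging matches what the original Hugging Face model does for long documents is a separate question.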