On Sun, Jan 12, 2020, 9:05 PM Chandrak1907 wrote:
I used this model on the Quora duplicate-questions dataset (http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv). The model's performance is shown below:
                 | model_output = 0 | model_output = 1
is_duplicate = 0 | 218,328          | 36,696
is_duplicate = 1 | 72,739           | 76,524
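For reference, the counts in the matrix above can be turned into the usual summary metrics. This is a small sketch that assumes is_duplicate = 1 is the positive class; the variable names are only illustrative.

```python
# Counts copied from the confusion matrix above
# (rows: is_duplicate, columns: model_output; positive class assumed to be 1).
tn, fp = 218_328, 36_696
fn, tp = 72_739, 76_524

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # share of predicted duplicates that are real duplicates
recall = tp / (tp + fn)     # share of real duplicates that the model found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

On these counts, accuracy is about 0.73, precision about 0.68, and recall about 0.51, so the main weakness is missed duplicates.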
Do you have any suggestions for improving the performance of the model?
Code is here:
import pandas as pd
from semantic_text_similarity.models import WebBertSimilarity
from semantic_text_similarity.models import ClinicalBertSimilarity

web_model = WebBertSimilarity(device='cuda', batch_size=10)  # defaults to GPU prediction
# Sanity check on a single sentence pair.
web_model.predict([("She won an olympic gold medal", "The women is an olympic champion")])

# Quora
def check_score(row):
    # predict() takes a list of pairs, so pass one pair and take the first result.
    return web_model.predict([(row['question1'], row['question2'])])[0]

t2 = pd.read_csv("./quora_duplicate_questions.tsv", sep='\t')
# Drop rows with missing values; copy() so the new column is not written to a view of t2.
t3 = t2.dropna().copy()
t3['model_score'] = t3.apply(check_score, axis=1)
t3.to_csv("./t3_Jan10.csv", index=False)

t3 = pd.read_csv("./t3_Jan10.csv")
# Mean score for non-duplicate and for duplicate pairs.
t3[t3.is_duplicate == 0]['model_score'].mean()
t3[t3.is_duplicate == 1]['model_score'].mean()

# Label a pair as a duplicate when its score exceeds 3.71.
t3['model_output'] = 0
t3.loc[t3.model_score > 3.71, 'model_output'] = 1
pd.crosstab(t3.is_duplicate, t3.model_output)
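One thing that might be worth trying before changing the model is to search for the cut-off instead of fixing it at 3.71. The sketch below is only an illustration: `best_f1_threshold` is a hypothetical helper (not part of semantic_text_similarity), the toy scores and labels are made up, and F1 is just one possible criterion.

```python
def best_f1_threshold(scores, labels, candidates):
    """Return (threshold, f1) for the candidate cut-off with the highest F1.

    A pair is predicted as a duplicate when score > threshold, the same rule
    as the fixed cut-off in the code above. Ties keep the lowest threshold.
    """
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical toy data, not taken from the Quora results.
toy_scores = [0.5, 1.2, 2.0, 2.8, 3.1, 3.9, 4.4, 4.8]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_candidates = [i * 0.5 for i in range(10)]  # 0.0, 0.5, ..., 4.5

toy_threshold, toy_f1 = best_f1_threshold(toy_scores, toy_labels, toy_candidates)
print(toy_threshold, round(toy_f1, 3))
```

With the real data, the inputs would be `t3.model_score` and `t3.is_duplicate`. The threshold should probably be picked on one split of the data and checked on another, so that the reported numbers are not tuned to the same rows they are measured on.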