Performance on quora qa data set #7

Chandrak1907 · 2020-01-13T02:05:41Z

I ran this model on the Quora duplicate-questions dataset (http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv). The resulting confusion matrix (rows: true label, columns: model output) is below:
                 | model_output = 0 | model_output = 1
is_duplicate = 0 | 218,328          | 36,696
is_duplicate = 1 | 72,739           | 76,524
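
For reference, here is a minimal sketch of the summary metrics these counts imply, treating is_duplicate = 1 as the positive class. The variable names are only illustrative. The numbers come out to roughly 0.73 accuracy, 0.68 precision, 0.51 recall and 0.58 F1.

```python
# Counts copied from the table above (rows: is_duplicate, columns: model_output).
tn, fp = 218_328, 36_696   # is_duplicate == 0
fn, tp = 72_739, 76_524    # is_duplicate == 1

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted duplicates that are real duplicates
recall = tp / (tp + fn)      # share of real duplicates that the model caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

The low recall means about half of the true duplicates score at or below the cutoff.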

Do you have any suggestions for improving the model's performance on this task?

Code is here:

import pandas as pd
from semantic_text_similarity.models import WebBertSimilarity

web_model = WebBertSimilarity(device='cuda', batch_size=10)  # defaults to GPU prediction

# Sanity check on a single sentence pair
web_model.predict([("She won an olympic gold medal", "The women is an olympic champion")])

# Quora

def check_score(row):
    # predict() takes a list of (sentence, sentence) tuples; score one pair per row
    return web_model.predict([(row['question1'], row['question2'])])[0]

t2 = pd.read_csv("./quora_duplicate_questions.tsv", sep='\t')
t3 = t2.dropna().copy()  # copy so the column assignment below does not target a view
t3['model_score'] = t3.apply(check_score, axis=1)
t3.to_csv("./t3_Jan10.csv", index=False)

t3 = pd.read_csv("./t3_Jan10.csv")
# Mean score per class
t3[t3.is_duplicate == 0]['model_score'].mean()
t3[t3.is_duplicate == 1]['model_score'].mean()

# Label a pair as duplicate when its score exceeds 3.71
t3['model_output'] = 0
t3.loc[t3.model_score > 3.71, 'model_output'] = 1
pd.crosstab(t3.is_duplicate, t3.model_output)
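
The 3.71 cutoff above is a single hand-picked value. As a rough sketch, one could instead sweep candidate thresholds and keep the one with the best F1. The helper names `f1_at` and `best_threshold` are hypothetical. The data below is synthetic, standing in for `t3.model_score` and `t3.is_duplicate`, and it assumes scores fall between 0 and 5. It does not reproduce the Quora results.

```python
import numpy as np

def f1_at(scores, labels, threshold):
    """F1 for the positive class when predicting 1 for scores above threshold."""
    pred = scores > threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(scores, labels, candidates):
    """Return (threshold, f1) with the highest F1 among the candidates."""
    f1s = [f1_at(scores, labels, t) for t in candidates]
    i = int(np.argmax(f1s))
    return float(candidates[i]), float(f1s[i])

# Synthetic stand-in data (assumption): duplicates score about 1 point higher.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)
scores = np.clip(rng.normal(2.5 + 1.0 * labels, 1.0), 0.0, 5.0)

threshold, f1 = best_threshold(scores, labels, np.linspace(0.0, 5.0, 101))
print(f"best threshold={threshold:.2f}, F1={f1:.3f}")
```

With the real data, the threshold would be chosen on one split of the rows and the crosstab reported on a separate held-out split, so the reported numbers are not tuned on the rows they describe.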

AndriyMulyar · 2020-01-13T02:12:12Z

Fine-tune on your task specific data. Best of luck!

