In this paper, we improve the fine-tuning based approaches by proposing BERT: Bidirectional Encoder Representations from Transformers. BERT alleviates the previously mentioned unidirectionality constraint by using a "masked language model" (MLM) pre-training objective, inspired by the Cloze task (Taylor, 1953).
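A minimal sketch of the token-masking step behind the MLM objective is given below, for illustration only: the 15% masking rate, the [MASK] symbol, and all function and variable names are assumptions, not code from the paper.

# Illustrative sketch of the MLM objective: randomly replace a fraction of
# input tokens with a [MASK] symbol and train the model to recover the
# original tokens at exactly those positions.
import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # fraction of tokens selected for prediction (assumed rate)

def mask_tokens(tokens, mask_prob=MASK_PROB, seed=None):
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # the model must predict the original token here
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)        # positions that are not predicted
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split(), seed=0)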
In this work, we denote the number of layers (i.e., Transformer blocks) as L, the hidden size as H, and the number of self-attention heads as A. We primarily report results on two model sizes: BERT-Base (L = 12, H = 768, A = 12, Total Parameters = 110M) and BERT-Large (L = 24, H = 1024, A = 16, Total Parameters = 340M).
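As a concrete illustration, the sketch below expresses these two configurations with the Hugging Face transformers BertConfig class; using that library is an assumption for illustration, not something the paper prescribes.

# Sketch: the two reported model sizes written as configurations
# (Hugging Face transformers is an assumed dependency, not from the paper).
from transformers import BertConfig

bert_base = BertConfig(num_hidden_layers=12, hidden_size=768, num_attention_heads=12)
bert_large = BertConfig(num_hidden_layers=24, hidden_size=1024, num_attention_heads=16)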
F1 scores are reported for QQP and MRPC, Spearman correlations are reported for STS-B, and accuracy scores are reported for the other tasks. We exclude entries that use BERT as one of their components.
On the official GLUE leaderboard, BERT-Large obtains a score of 80.5, compared to OpenAI GPT, which obtains 72.8 as of the date of writing.
MNLI: Multi-Genre Natural Language Inference is a large-scale, crowdsourced entailment classification task (Williams et al., 2018). Given a pair of sentences, the goal is to predict whether the second sentence is an entailment, contradiction, or neutral with respect to the first one.
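The short sketch below shows the sentence-pair input and the three-way label set this task uses; the example pair and variable names are invented for illustration and are not drawn from the dataset.

# Sketch of the MNLI task format: a premise/hypothesis pair mapped to one of
# three labels (illustrative example, not an actual dataset entry).
MNLI_LABELS = ["entailment", "contradiction", "neutral"]

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
gold_label = "entailment"  # the second sentence follows from the first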