In this project, methods for visualizing the loss landscape of a Transformer network fine-tuned on a QA task were demonstrated. Visualizing the optimum reached by training can inform how the fine-tuning process is set up.
After comparing the available corpora and the premises of their creation, two datasets were selected for the QA task: SQuAD and AdversarialQA. The DistilBERT model, chosen for its significant advantage in training speed and model size over other Transformer models without compromising accuracy, was fine-tuned on the QA task.
1-dimensional loss and F1-score plot for interpolated weights. x = 0 corresponds to the untuned model, x = 1 to the model fine-tuned on the SQuAD 1.1 dataset
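The 1-dimensional plot evaluates the loss along the straight line between the untuned and fine-tuned weights. A minimal NumPy sketch of the idea (the quadratic loss and parameter vectors below are illustrative stand-ins, not the actual QA model):

```python
import numpy as np

def interpolate_loss(theta_0, theta_1, loss_fn, alphas):
    """Evaluate loss along the line (1 - a) * theta_0 + a * theta_1.

    a = 0 gives the untuned weights theta_0,
    a = 1 gives the fine-tuned weights theta_1.
    """
    return np.array([loss_fn((1.0 - a) * theta_0 + a * theta_1) for a in alphas])

# Toy quadratic loss standing in for the QA loss (illustrative only).
theta_0 = np.array([2.0, -1.0])   # "untuned" weights
theta_1 = np.array([0.0, 0.0])    # "fine-tuned" weights (the loss minimum here)
loss = lambda w: float(np.sum(w ** 2))

alphas = np.linspace(0.0, 1.0, 5)
losses = interpolate_loss(theta_0, theta_1, loss, alphas)
```

In the real setup, `loss_fn` would run a forward pass of DistilBERT with the interpolated weights over the evaluation set; the F1 curve is obtained the same way with the metric in place of the loss.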
2-dimensional loss surface contour plot. The point (0, 0) corresponds to the untuned model, (0, 1) to the model fine-tuned on the SQuAD 1.1 dataset
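The 2-dimensional surface follows Li et al. (2018): perturb the weights along two random directions, normalized so their scale matches the weights. A simplified NumPy sketch (per-vector normalization instead of the paper's per-filter normalization, and a toy quadratic loss in place of the QA loss):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_direction(d, theta):
    # Scale the random direction to the norm of the weights, a simplified
    # version of the filter-wise normalization in Li et al. (2018).
    return d * (np.linalg.norm(theta) / (np.linalg.norm(d) + 1e-10))

def loss_surface(theta_star, loss_fn, grid):
    """Evaluate loss_fn on a grid spanned by two normalized random directions."""
    d1 = normalize_direction(rng.standard_normal(theta_star.shape), theta_star)
    d2 = normalize_direction(rng.standard_normal(theta_star.shape), theta_star)
    Z = np.empty((len(grid), len(grid)))
    for i, a in enumerate(grid):
        for j, b in enumerate(grid):
            Z[i, j] = loss_fn(theta_star + a * d1 + b * d2)
    return Z

# Toy example: a quadratic loss minimized exactly at theta_star.
theta_star = np.array([1.0, 1.0, 1.0, 1.0])
toy_loss = lambda w: float(np.sum((w - theta_star) ** 2))
grid = np.linspace(-1.0, 1.0, 5)
Z = loss_surface(theta_star, toy_loss, grid)  # contour-plot this with matplotlib
```

In the real pipeline each grid point requires a full evaluation pass of the model, which is why the grid resolution is the main cost driver.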
Visualization of the optimization trajectory projected onto the 2-dimensional loss surface contour plot. The point (0, 0) corresponds to the untuned model, (0, 1) to the model fine-tuned on the SQuAD 1.1 dataset
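To overlay the trajectory, Li et al. (2018) project the training checkpoints onto the top two PCA directions of the checkpoint-minus-final-weights matrix. A minimal sketch, assuming flattened checkpoint vectors (the toy checkpoints below are illustrative):

```python
import numpy as np

def project_trajectory(checkpoints, theta_final):
    """Project checkpoints onto the top-2 PCA directions of (theta_t - theta_final)."""
    M = np.stack([c - theta_final for c in checkpoints])   # shape (T, P)
    # PCA of the difference matrix via SVD; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    d1, d2 = Vt[0], Vt[1]
    coords = np.stack([M @ d1, M @ d2], axis=1)            # shape (T, 2)
    return coords, d1, d2

# Toy trajectory of 4 checkpoints converging to theta_final (illustrative only).
theta_final = np.zeros(3)
checkpoints = [
    np.array([3.0, 0.0, 0.0]),
    np.array([2.0, 1.0, 0.0]),
    np.array([1.0, 0.5, 0.0]),
    theta_final,
]
coords, d1, d2 = project_trajectory(checkpoints, theta_final)
```

The directions `d1`, `d2` returned here would then also span the contour plot, so the projected path and the surface share the same coordinate system; the final checkpoint always lands at the origin of that plane.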
Work based on:
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer and Tom Goldstein. Visualizing the Loss Landscape of Neural Nets. NeurIPS, 2018.
Made with Python as part of a master's thesis, FDT ITMO