SemEval-2021 Task 5: Toxic Spans Detection competition page: https://competitions.codalab.org/competitions/25623
- https://huggingface.co/docs/transformers/training - To understand how to train a model.
- https://huggingface.co/docs/transformers/model_doc/roberta - To understand the RoBERTa model and its corresponding tokenizer.
- https://huggingface.co/docs/transformers/model_doc/distilbert - To understand the DistilBERT model and its corresponding tokenizer.
- huggingface/transformers#14305 - To understand how to postprocess predicted labels into spans.
- https://github.com/huggingface/notebooks/blob/master/examples/token_classification-tf.ipynb - Source of the tokenize_and_align_labels() function, copied from this Hugging Face tutorial notebook. We also followed its steps to fine-tune the model on a custom dataset.
- https://github.com/ipavlopoulos/toxic_spans/blob/master/evaluation/metrics.py - The F1 score function provided by the competition, modified to accommodate our model output.
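
The span postprocessing and F1 scoring mentioned in the last entries can be sketched roughly as follows. This is a minimal illustration, not the code from the linked issue or from metrics.py. The function names labels_to_char_offsets and span_f1, the 0/1 label scheme, and the example offsets are all assumptions made for this sketch. It assumes every token comes with (start, end) character offsets, such as a fast tokenizer's offset mapping, and it scores a prediction as F1 over sets of character offsets.

```python
from typing import List, Sequence, Tuple


def labels_to_char_offsets(
    offsets: Sequence[Tuple[int, int]], labels: Sequence[int]
) -> List[int]:
    """Expand token-level labels into a sorted list of toxic character offsets.

    Hypothetical helper: label 1 means "toxic", anything else is ignored.
    Special tokens with (0, 0) offsets contribute nothing because
    range(0, 0) is empty.
    """
    chars = set()
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            chars.update(range(start, end))
    return sorted(chars)


def span_f1(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """F1 between two collections of character offsets.

    Assumed convention: when there are no gold offsets, the score is 1.0
    if nothing was predicted and 0.0 otherwise.
    """
    if len(gold) == 0:
        return 1.0 if len(predictions) == 0 else 0.0
    if len(predictions) == 0:
        return 0.0
    pred_set, gold_set = set(predictions), set(gold)
    overlap = len(pred_set & gold_set)
    return 2.0 * overlap / (len(pred_set) + len(gold_set))


# Made-up example: "what a stupid idea", with hand-written offsets for
# [special, "what", "a", "stupid", "idea", special].
example_offsets = [(0, 0), (0, 4), (5, 6), (7, 13), (14, 18), (0, 0)]
example_labels = [0, 0, 0, 1, 0, 0]

predicted = labels_to_char_offsets(example_offsets, example_labels)
gold = list(range(7, 18))  # pretend the annotators marked "stupid idea"
score = span_f1(predicted, gold)
print(predicted, round(score, 3))
```

Working at the character level keeps the score independent of the tokenizer, so RoBERTa and DistilBERT predictions can be compared against the same gold offsets.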