CatIss

Good news! CatIss won the NLBSE'22 tool competition! 🥇 🏆 🎉

This repository contains the source code, notebooks, model, and datasets used to train CatIss, an intelligent tool for automatically categorizing issue reports, built on the RoBERTa model. I first deduplicated, cleaned, and truncated the datasets provided for the NLBSE 2022 Tool Competition (The First International Workshop on Natural Language-based Software Engineering) [1], and then fine-tuned RoBERTa for four epochs on the cleaned training set. CatIss achieves an 87.2% F1-score (micro-average) on the provided test set.
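The preprocessing steps above (deduplicate, clean, truncate) can be sketched as follows. This is a minimal illustration, not the notebook code. The cleaning rules (dropping fenced code and URLs), the exact-match deduplication key, and the `MAX_WORDS` budget are all assumptions. `clean`, `truncate`, and `preprocess` are hypothetical helper names.

```python
import re

# Hypothetical cleaning rules, for illustration only; the rules actually used
# for CatIss are in the repository's notebooks and may differ.
CODE_FENCE_RE = re.compile(r"```.*?```", re.DOTALL)
URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")

MAX_WORDS = 256  # assumed truncation budget, not CatIss's actual setting


def clean(text: str) -> str:
    """Drop fenced code blocks and URLs, then collapse whitespace."""
    text = CODE_FENCE_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def truncate(text: str, max_words: int = MAX_WORDS) -> str:
    """Keep only the first max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def preprocess(issues):
    """Deduplicate, clean, and truncate (title, body, label) triples.

    Duplicates are detected by exact match on the cleaned, untruncated text;
    the first occurrence is kept.
    """
    seen = set()
    result = []
    for title, body, label in issues:
        text = clean(f"{title} {body}")
        if text in seen:
            continue
        seen.add(text)
        result.append((truncate(text), label))
    return result
```

Deduplicating on the cleaned but untruncated text avoids merging two different issues that merely share their first `MAX_WORDS` words. A real RoBERTa pipeline would more likely truncate by subword tokens than by whitespace-separated words.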

Shared Model and Data

The saved model and cleaned datasets are shared publicly on Google Drive:

https://drive.google.com/drive/folders/1jgV4U41-2acctpc6jH5DWL3fF5V6bKF8?usp=sharing

System Information

Experiments were conducted on a machine running 64-bit Ubuntu 16.04, with two GeForce RTX 2080 GPUs, an AMD Ryzen Threadripper 1920X CPU, and 64 GB of RAM. Training took four hours and 20 minutes. Note that preprocessing the datasets significantly reduces the training cost while maintaining prediction accuracy.

Tool Paper Abstract

This paper describes CatIss, an automatic Categorizer of Issue reports built upon the Transformer-based pre-trained RoBERTa model. CatIss classifies issue reports into three main categories: bug report, enhancement/feature request, and question. First, the datasets provided for the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset. An evaluation on more than 80 thousand issue reports from GitHub indicates that CatIss performs very well, surpassing the competition baseline, TicketTagger [2, 3], and achieving an 87.2% F1-score (micro-average). Additionally, since CatIss is trained on a wide set of repositories, it is a generic prediction model and is hence applicable to unseen software projects or projects with little historical data. Scripts for cleaning the datasets, training CatIss, and evaluating the model are publicly available. CatIss is based on our recent work published in the Empirical Software Engineering journal (EMSE), which I will also present in the Journal First track of the 44th International Conference on Software Engineering (ICSE'22) [4].
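The reported score is micro-averaged: true positives, false positives, and false negatives are pooled across the three categories before computing F1. For single-label classification where every prediction is one of the known labels, this makes micro-F1 equal to plain accuracy. A minimal sketch, assuming the label strings `bug`, `enhancement`, and `question` (these may not match the dataset's exact names):

```python
# Assumed label names for the three categories; the dataset's may differ.
LABELS = ("bug", "enhancement", "question")


def micro_f1(y_true, y_pred, labels=LABELS):
    """Micro-averaged F1: pool TP, FP, and FN over all labels, then compute F1."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correct prediction of this label
            elif p == label:
                fp += 1  # predicted this label, but the truth differs
            elif t == label:
                fn += 1  # missed this label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Each misclassification counts once as a false positive (for the predicted label) and once as a false negative (for the true label), so precision, recall, and F1 all reduce to the fraction of correct predictions.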

References

[1] Kallis, R., Chaparro, O., Di Sorbo, A., & Panichella, S. (2022). NLBSE'22 tool competition. In Proceedings of The 1st International Workshop on Natural Language-based Software Engineering (NLBSE'22).

[2] Kallis, R., Di Sorbo, A., Canfora, G., & Panichella, S. (2019, September). Ticket tagger: Machine learning driven issue classification. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) (pp. 406-409). IEEE.

[3] Kallis, R., Di Sorbo, A., Canfora, G., & Panichella, S. (2021). Predicting issue types on GitHub. Science of Computer Programming, 205, 102598.

[4] Izadi, M., Akbari, K., & Heydarnoori, A. (2022). Predicting the objective and priority of issue reports in software repositories. Empirical Software Engineering, 27(2), 1-37.

How to Cite

If you use CatIss in your work, please cite it as follows:

Izadi, M., CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers, In Proceedings of The 1st International Workshop on Natural Language-based Software Engineering (NLBSE’22), page (to appear), 2022.
