The linguistic curriculum learning algorithm has three features: (a) estimating the importance of linguistic indices using a data-driven approach, (b) applying a "linguistic curriculum" to improve the model's performance from a linguistic perspective, and (c) identifying the core set of linguistic indices needed to learn a task. This tool also evaluates the model's ability to handle different linguistic indices.
To estimate the importance of linguistic indices with either the correlation or the optimization approach, use the following options.
python train.py --diff_score lng_w --lng_method [opt OR corr]
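This README does not spell out how the corr option computes importance. As a rough illustration only, one plausible reading is that each linguistic index is scored by how strongly it correlates with per-sample loss. The sketch below assumes that reading; the function names, the use of Pearson correlation, and the normalization are hypothetical and are not taken from train.py.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def index_importance(index_table, losses):
    """Hypothetical importance estimate: |correlation with loss|, normalized to sum to 1.

    index_table maps an index name to its per-sample values; losses holds the
    per-sample loss in the same order. Both are assumptions made for this sketch.
    """
    raw = {name: abs(pearson(values, losses)) for name, values in index_table.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: score / total for name, score in raw.items()}


# Toy data: one index that tracks the loss and one that is constant.
weights = index_importance(
    {"word_count": [1, 2, 3, 4], "constant": [5, 5, 5, 5]},
    [0.1, 0.2, 0.3, 0.4],
)
```

In this toy example the constant index receives zero weight because it carries no information about the loss.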
To apply the sigmoid, negative-sigmoid, or gaussian curricula, use the following options.
python train.py --curr [sigmoid OR neg-sigmoid OR gauss]
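To give an intuition for the three curriculum shapes named above, here is a minimal sketch of how such weighting functions could look. It assumes that sample difficulty and training progress are both scaled to [0, 1]; the function names and the steepness and width parameters are hypothetical, and the actual definitions in train.py may differ.

```python
import math


def sigmoid_weight(difficulty, progress, steepness=10.0):
    """Weight grows as training progress passes the sample's difficulty."""
    return 1.0 / (1.0 + math.exp(-steepness * (progress - difficulty)))


def neg_sigmoid_weight(difficulty, progress, steepness=10.0):
    """Mirror image of the sigmoid: weight shrinks as progress passes difficulty."""
    return 1.0 - sigmoid_weight(difficulty, progress, steepness)


def gauss_weight(difficulty, progress, width=0.2):
    """Weight peaks when the sample's difficulty matches the current progress."""
    return math.exp(-((difficulty - progress) ** 2) / (2.0 * width ** 2))


# Weights of an easy (0.2) and a hard (0.8) sample early in training (progress 0.2).
early = {
    "sigmoid": (sigmoid_weight(0.2, 0.2), sigmoid_weight(0.8, 0.2)),
    "neg_sigmoid": (neg_sigmoid_weight(0.2, 0.2), neg_sigmoid_weight(0.8, 0.2)),
    "gauss": (gauss_weight(0.2, 0.2), gauss_weight(0.8, 0.2)),
}
```

Under these assumptions, the sigmoid shape favors easy samples early on, the negative-sigmoid shape favors hard samples, and the Gaussian shape concentrates on samples whose difficulty is close to the current progress.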
To compute the binned balanced accuracy according to a linguistic index, you may use the function calc_bal_acc in utils.py.
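The signature of calc_bal_acc is not documented here, so the following is only a sketch of the general idea under stated assumptions: samples are grouped into equal-width bins by the value of one linguistic index, and balanced accuracy (the mean of per-class recall) is computed inside each bin. The helper names and the n_bins parameter are hypothetical and do not describe the function in utils.py.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def binned_balanced_accuracy(index_values, y_true, y_pred, n_bins=3):
    """Balanced accuracy per equal-width bin of a linguistic index (empty bins omitted)."""
    low, high = min(index_values), max(index_values)
    width = (high - low) / n_bins or 1.0  # guard against a constant index
    bins = defaultdict(lambda: ([], []))
    for value, truth, pred in zip(index_values, y_true, y_pred):
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        b = min(int((value - low) / width), n_bins - 1)
        bins[b][0].append(truth)
        bins[b][1].append(pred)
    return {b: balanced_accuracy(ts, ps) for b, (ts, ps) in sorted(bins.items())}


# Toy example: predictions get worse as the index value grows.
result = binned_balanced_accuracy(
    [0.0, 0.1, 0.5, 0.6, 0.9, 1.0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
)
```

Reporting the metric per bin rather than overall makes it visible when a model does well on low values of an index but degrades on high values.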
All datasets used are publicly available on Hugging Face Datasets. The preprocessing scripts we use are available in scripts/data.
To compute the linguistic indices for a dataset, scripts are provided in scripts/tools.
Python 3.6.10