Implementation for the paper "Label Hierarchy Alignment for Improved Hierarchical Text Classification", accepted at the 2023 IEEE International Conference on Big Data (BigData). paper-link
- Python >= 3.7
- torch >= 1.6.0
- transformers >= 4.30.2
- The libraries below are required only if you want to use GAT/GCN as the graph encoder:
- torch-geometric == 2.4.0
- torch-sparse == 0.6.17
- torch-scatter == 2.1.1
- All datasets are publicly available and can be accessed at WOS and RCV1-V2.
- We followed the steps described in the contrastive-htc repository to obtain and preprocess the original datasets (WOS and RCV1-V2).
- After obtaining a dataset, run the scripts in the `preprocess` folder for that dataset to generate the tokenized version of the dataset and the related files. These will be added to the `data/x` folder, where `x` is the dataset name, with possible choices `wos` and `rcv` (see the workflow sketch after this list).
- Detailed steps for obtaining and preprocessing each dataset are given in the README of the `preprocess` folder.
- For reference, we have added the tokenized version of the WOS dataset along with its related files in the `data/wos` folder. Do the same for the RCV1-V2 dataset.
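A minimal sketch of the preprocessing workflow for WOS, assuming the dataset has already been obtained as described in the `preprocess` README (the script name below is a placeholder; use the actual scripts listed there):
```
cd preprocess
# Placeholder script name; see the preprocess README for the exact commands per dataset
python preprocess_wos.py
# The tokenized dataset and related files end up under ../data/wos
```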
The script `train_lha.py` can be used to train all the models by setting different arguments.
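To list the available arguments (assuming the script exposes them via a standard argparse interface), you can run:
```
python train_lha.py --help
```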
HGCLR is a popular hierarchical text classification model, introduced in this paper at ACL 2022. Although we propose LHA as a model-agnostic approach, we demonstrate its effectiveness by integrating it with the HGCLR model. We also thank the authors for sharing their code. To run HGCLR, use the following command in a terminal from the repository's root directory:
```
python train_lha.py --name='ckp_hgclr' --batch 10 --data='wos' --graph 1 --lamb 0.05 --thre 0.02
```
Some important arguments:
- `--name`: the name of the directory in which your model will be saved. For example, the above model will be saved in `./LHA-HTC/data/wos/ckp_hgclr`.
- `--data`: the name of the directory which contains your data and related files.
- `--graph`: whether to use the structure encoder.
- `--batch`: the batch size.
- `--lamb` and `--thre`: hyperparameters specific to HGCLR; their values for WOS (`lamb` 0.05, `thre` 0.02) and RCV1-V2 (`lamb` 0.3, `thre` 0.001) are provided in contrastive-htc.
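For example, using the dataset-specific values listed above, the HGCLR runs for the two datasets would look as follows (the checkpoint names here are only examples):
```
# WOS
python train_lha.py --name='ckp_hgclr_wos' --batch 10 --data='wos' --graph 1 --lamb 0.05 --thre 0.02
# RCV1-V2 (lamb/thre values from contrastive-htc)
python train_lha.py --name='ckp_hgclr_rcv' --batch 10 --data='rcv' --graph 1 --lamb 0.3 --thre 0.001
```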
The code for contrastive label alignment (LHA-CON) is in the `LHA_CON` class in `contrast_lha.py`. To train with the LHA-CON module, run:
```
python train_lha.py --name='ckpt_con' --batch 10 --data='wos' --graph 1 --lamb 0.05 --thre 0.02 --hsampling 1 --hcont_wt 0.4
```
Some important arguments:
- `--hsampling`: whether to use the LHA-CON module.
- `--hcont_wt`: weight term of the LHA-CON module. We use 0.4 as the weight for both WOS and RCV1-V2.
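For example, a corresponding LHA-CON run on RCV1-V2 would look as follows (the checkpoint name is only an example; `lamb`/`thre` take the RCV1-V2 values from the HGCLR section, and `hcont_wt` stays at 0.4 as noted above):
```
python train_lha.py --name='ckpt_con_rcv' --batch 10 --data='rcv' --graph 1 --lamb 0.3 --thre 0.001 --hsampling 1 --hcont_wt 0.4
```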
The code for adversarial label alignment (LHA-ADV) is in the `LHA_ADV` class in `contrast_lha.py`. To train with the LHA-ADV module, run:
```
python train_lha.py --name='ckpt_adv' --batch 10 --data='wos' --graph 1 --lamb 0.05 --thre 0.02 --label_reg 1 --prior_wt 0.5 --hlayer 900
```
Some important arguments:
- `--label_reg`: whether to use the LHA-ADV module.
- `--prior_wt`: weight term of the LHA-ADV module. We use 0.4 for WOS and 0.2 for RCV1-V2.
- `--hlayer`: the size of the first hidden layer of the neural network. We use 900 for WOS and 1000 for RCV1-V2.
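For example, a corresponding LHA-ADV run on RCV1-V2 would look as follows (the checkpoint name is only an example; `prior_wt` 0.2 and `hlayer` 1000 as noted above, with the RCV1-V2 `lamb`/`thre` values from the HGCLR section):
```
python train_lha.py --name='ckpt_adv_rcv' --batch 10 --data='rcv' --graph 1 --lamb 0.3 --thre 0.001 --label_reg 1 --prior_wt 0.2 --hlayer 1000
```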
To train the base model without the structure encoder (plain BERT encoder), run:
```
python train_lha.py --name='ckp_bert' --batch 10 --data='wos' --graph 0
```
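The same baseline on RCV1-V2 would be, for example (the checkpoint name is only an example):
```
python train_lha.py --name='ckp_bert_rcv' --batch 10 --data='rcv' --graph 0
```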
To evaluate a trained model on the test set, run the script `test_lha.py`:
```
python test_lha.py --name ckpt1 --data wos --extra _macro
```
Some important arguments:
- `--name`: the name of the directory which contains the saved checkpoint. The checkpoint is saved in `../LHA-HTC/data/wos/`.
- `--data`: the name of the directory which contains your data and related files.
- `--extra`: two checkpoints are kept, based on macro-F1 and micro-F1 respectively. The possible choices are `_macro` and `_micro`, which select between the two checkpoints.
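For example, to evaluate the LHA-CON model trained above (saved under `ckpt_con`) with each of the two saved checkpoints:
```
python test_lha.py --name ckpt_con --data wos --extra _macro
python test_lha.py --name ckpt_con --data wos --extra _micro
```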
If you find our work helpful, please cite it using the following BibTeX entry:
```
@INPROCEEDINGS{10386495,
  author={Kumar, Ashish and Toshniwal, Durga},
  booktitle={2023 IEEE International Conference on Big Data (BigData)},
  title={Label Hierarchy Alignment for Improved Hierarchical Text Classification},
  year={2023},
  volume={},
  number={},
  pages={1174-1179},
  doi={10.1109/BigData59044.2023.10386495}
}
```