A Sequence-Labeling-Model-for-Catchphrase-Identification-from-Legal-Case-Documents

Introduction

This repository provides the implementation of D2V-BiGRU-CRF, the supervised catchphrase extraction model described in the paper "A Sequence Labeling Model for Catchphrase Identification from Legal Case Documents". It contains Python code for training a new model from scratch and for using that model to extract catchphrases from a new, unseen document. In addition, we provide a pre-trained model, trained on our data, that can be used directly to extract catchphrases from unseen documents. The usage of the Python scripts is described below.

Whether you train a new model from scratch or extract catchphrases with the pre-trained model, you need a pre-trained Doc2Vec model. We provide our Doc2Vec model (trained with gensim on a set of 33.5K case documents from the Supreme Court of India), which can be downloaded from: https://app.box.com/s/sd3v6kp1i2qtsz8r2i2dfuwvx43ri1hb. We hope this makes our model easier to use. The two scripts of primary interest are:

train_on_gold_standard_catches.py, for training the model, and
annotate_docs.py, for extracting catchphrases from new, unseen documents.
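Since both scripts depend on the downloaded Doc2Vec model, it can help to confirm that the file loads before running them. The following is a minimal sketch, not the repository's own loading code: the helper names `load_doc2vec` and `embed_text`, the example path, and the lower-casing and whitespace tokenisation are all assumptions made for illustration, and the repository's preprocessing may differ. Only `Doc2Vec.load` and `infer_vector` are gensim calls.

```python
from pathlib import Path


def load_doc2vec(model_path):
    """Load a gensim Doc2Vec model from disk, failing early if the file is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"Doc2Vec model not found: {path}")
    # Imported lazily so the path check runs even where gensim is not installed.
    from gensim.models.doc2vec import Doc2Vec
    return Doc2Vec.load(str(path))


def embed_text(model, text):
    """Infer a vector for a piece of text.

    Assumption: lower-cased, whitespace-split tokens. The tokenisation used
    when the provided model was trained may be different.
    """
    tokens = text.lower().split()
    return model.infer_vector(tokens)


# Hypothetical usage; replace the path with wherever the download was saved.
# model = load_doc2vec("path/to/downloaded_doc2vec_model")
# vector = embed_text(model, "the appellant filed a writ petition")
```

If the load succeeds and `embed_text` returns a vector, the model file is intact and the installed gensim version can read it.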

All other options can be set inside the scripts themselves. The meaning of each variable is explained in the code comments. Thank you.
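To illustrate what "sequence labeling" means for catchphrase extraction, the sketch below turns a tokenised sentence and a list of gold-standard catchphrases into per-token labels. It assumes a B/I/O tagging scheme, which is common for this kind of task but is not confirmed by this README; the function name `bio_tags` and the matching rules (case-insensitive, longest phrase first) are likewise hypothetical and may not match what the training script does.

```python
def bio_tags(tokens, catchphrases):
    """Label each token B (begins a catchphrase), I (inside one) or O (outside)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Try longer phrases first so a short phrase cannot claim part of a longer one.
    phrases = sorted((p.lower().split() for p in catchphrases), key=len, reverse=True)
    for phrase in phrases:
        n = len(phrase)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            # Only tag a span if none of its tokens is already labelled.
            window_free = all(tag == "O" for tag in tags[start:start + n])
            if window_free and lowered[start:start + n] == phrase:
                tags[start] = "B"
                tags[start + 1:start + n] = ["I"] * (n - 1)
    return tags


tokens = "The court issued a writ of habeas corpus".split()
print(bio_tags(tokens, ["habeas corpus", "writ"]))
# ['O', 'O', 'O', 'O', 'B', 'O', 'B', 'I']
```

A sequence labeling model is then trained to predict one such label per token, and extracted catchphrases are read off as the B/I runs in its output.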

Reference

If you use this implementation in your work, please cite our original paper: "A Sequence Labeling Model for Catchphrase Identification from Legal Case Documents", A. Mandal, K. Ghosh, S. Ghosh, S. Mandal, 2021, Journal of Artificial Intelligence and Law.

