SciBERT Reranker and Local BM25 baseline

This repository implements the LocalBM25 baseline and the SciBERT Reranker as presented by

@inproceedings{10.1007/978-3-030-99736-6_19,
    title     = {Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-Based Reranking},
    author    = {Nianlong Gu and Yingqiang Gao and Richard H. R. Hahnloser},
    year      = {2022},
    booktitle = {Advances in Information Retrieval - 44th European Conference on IR Research, ECIR 2022},
    editor    = {Matthias Hagen and Suzan Verberne and Craig Macdonald and Christin Seifert and Krisztian Balog and Kjetil Nørvåg and Vinay Setty},
    series    = {Lecture Notes in Computer Science},
    volume    = {13185},
    pages     = {274--288},
    publisher = {Springer},
    url       = {https://doi.org/10.1007/978-3-030-99736-6_19},
    doi       = {10.1007/978-3-030-99736-6_19}
}

We experiment with different information in the query text, i.e., the additional information utilized for representing the citation context: abstract of the citing paper, title of the citing paper, paragraph around the cite-worthy sentence, section of the cite-worthy sentence Ideally, we do not use the abstract and the title in the representation as we are recommending citations for a work-in-progress scientific writing. In this case,

the abstract and title might not exist yet and
it might be confusing when the recommended citations in the text change only due to some changes in the abstract or the title.

The candidate paper is represented by its title and abstract as in the original work. When the section of the cite-worthy sentence is utilized, we additionally add the year of publication of the candidate paper to its representation.

For more details on the representations, we refer to Section 3.4 of our scientific report.

Installation

Create a python3.8 environment with all required libraries. Using Anaconda: conda env create -f environment.yml python=3.8

Install the simpletransformers library via our github organisation (it only contains slight changes for loading large classification datasets in a lazy manner and allowing a custom loss function and a custom batch sampler in the ClassificationModel, which is not possible in the official library implementation).

git clone https://github.com/Data-Science-2Like/simpletransformers.git
cd simpletransformers (just to be on the save side that we install custom and not official version)
pip install .

Structure of the Repository

baseline: implementation of the Local BM25 baseline
(see README in the directory for more details)

dataset_creation: further processing of the S2ORC_Reranker and ACL-200_Reranker datasets
(see README in the directory for more details)

reranker: implementation of the SciBERT Reranker
(see README in the directory for more details)

test: initial tests for verifying our implementation of the triplet loss (→ test_loss.py) and the evaluation metrics (→ test_metrics.py) for the SciBERT Reranker

Getting Datasets for Running the Experiments

In order to run the experiments with the Local BM25 baseline or the SciBERT Reranker proper datasets are required.

ACL dataset

When you want to perform the experiments with the ACL dataset as provided in Improved Local Citation Recommendation Based on Context Enhanced with Global Information, you need to download the dataset and execute the dataset_creation/run.py file with the respective method call and parameters.

Modified S2ORC dataset

In order to perform the experiments with the Modified S2ORC dataset, a sequence of preprocessing steps is required:

Download the S2ORC dataset and create the Modified S2ORC dataset as described in our dataset-creation repository.
Create an initial version of the S2ORC_Reranker dataset as described in dataset-creation/reranker/ using the Modified S2ORC dataset as a basis.
With this version of the S2ORC_Reranker dataset, execute the dataset_creation/run.py file with the respective method call and parameters.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SciBERT Reranker and Local BM25 baseline

Installation

Structure of the Repository

Getting Datasets for Running the Experiments

ACL dataset

Modified S2ORC dataset

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
baseline		baseline
dataset		dataset
dataset_creation		dataset_creation
reranker		reranker
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml

License

Data-Science-2Like/SciBERT-Reranker

Folders and files

Latest commit

History

Repository files navigation

SciBERT Reranker and Local BM25 baseline

Installation

Structure of the Repository

Getting Datasets for Running the Experiments

ACL dataset

Modified S2ORC dataset

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages