
Advanced Paper Analyzer



Introduction

Advanced Paper Analyzer takes a set of research papers and extracts their metadata. It queries Wikidata and ROR to enrich that information, computes the similarity between the papers' abstracts, and analyzes the possible topics each paper is about.

Installation

You can run the application in a Docker container (note that you need a VNC client; see the DOCKER section below) or directly on your computer as follows:

  1. Clone the repository:
git clone https://gitlab.utc.fr/royhucheradorni/ia04.git
  2. Python

The code runs on Python 3.10, which must be installed on your system before you can use Advanced Paper Analyzer.

  3. Dependencies

Dependencies can be installed with Poetry. Go to the root directory of the repository and run:

poetry install

Alternatively, install all dependencies with pip using the requirements.txt file in the root directory of the repository:

pip install -r requirements.txt
  4. Grobid

Grobid is used to extract metadata from the papers, which is then used for further analysis. You must therefore install either the full or the light version of the Grobid 0.8.0 Docker image. Pull one of these images, depending on the version you choose:

Full image: https://hub.docker.com/r/grobid/grobid

docker pull grobid/grobid

Light image: https://hub.docker.com/r/lfoppiano/grobid/

docker pull lfoppiano/grobid
  5. Apache Jena Fuseki

Jena Fuseki is used to create the triple store and the SPARQL endpoint. Pull the image now and run it as described in the "How to use" section below, which also creates the dataset:

docker pull stain/jena-fuseki

How to use

  1. Run Jena Fuseki and Grobid with:
docker run -p 8070:8070 lfoppiano/grobid:latest-develop
docker run -p 3035:3030 -e ADMIN_PASSWORD=pw123 -e FUSEKI_DATASET_1=KG_dataset stain/jena-fuseki

The Fuseki container creates the KG_dataset dataset at the same time.

  2. Run the script interface.py with the parameter 0:
poetry run python interface.py 0

You can now:

  • PROCESS PDF WITH GROBID: process all the PDFs in the Corpus_pdf directory and reformat their data/metadata as XML.

  • EXTRACT DATA: extract the data (title, date, authors) from the processed PDFs, run topic modeling, and compute the similarity between the abstracts of the PDFs.

  • ENRICH DATA: add information coming from ROR and Wikidata (name, authors, organizations_founder of referenced papers).

  • INSERT DATA FROM RDF: add all this data to the Jena Fuseki knowledge-graph server.

  • SUBMIT QUERY: write your SPARQL queries in the input box and submit them.
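The metadata-extraction step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes Grobid's standard TEI XML output and pulls the title, date, and authors out of a small sample header with the standard library.

```python
import xml.etree.ElementTree as ET

# Grobid emits TEI XML; this namespace is part of the TEI standard.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny fragment shaped like Grobid's teiHeader, for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>An Example Paper</title></titleStmt>
    <sourceDesc><biblStruct><analytic>
      <author><persName><forename>Ada</forename><surname>Lovelace</surname></persName></author>
      <author><persName><forename>Alan</forename><surname>Turing</surname></persName></author>
    </analytic><monogr><imprint><date when="2023-05-01"/></imprint></monogr></biblStruct></sourceDesc>
  </fileDesc></teiHeader>
</TEI>"""

def extract_metadata(tei_xml: str) -> dict:
    """Pull title, date, and authors out of a Grobid-style TEI document."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI)
    date_el = root.find(".//tei:imprint/tei:date", TEI)
    date = date_el.get("when") if date_el is not None else None
    authors = [
        " ".join(p.findtext(f"tei:{part}", default="", namespaces=TEI)
                 for part in ("forename", "surname"))
        for p in root.findall(".//tei:analytic/tei:author/tei:persName", TEI)
    ]
    return {"title": title, "date": date, "authors": authors}

print(extract_metadata(SAMPLE_TEI))
```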
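The abstract-similarity step can likewise be sketched as a cosine similarity over bag-of-words counts. The project may use a different vectorization; this stands in for the general idea:

```python
import math
import re
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using bag-of-words counts."""
    vec_a = Counter(re.findall(r"\w+", text_a.lower()))
    vec_b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm = (math.sqrt(sum(c * c for c in vec_a.values()))
            * math.sqrt(sum(c * c for c in vec_b.values())))
    return dot / norm if norm else 0.0

a = "We study topic modeling on research papers."
b = "Topic modeling of research papers is studied here."
print(round(cosine_similarity(a, b), 2))
```

A pair of abstracts scoring above a chosen threshold (the README's examples use 70%) would be recorded as similar.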

Example queries:

  1. Select each topic to which a paper belongs with a probability above 0.90:

(query shown as an image in the original repository)

  2. Request all pairs of articles with more than 70% similarity:

(query shown as an image in the original repository)

Our RDF diagram: (image in the original repository)
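The two example queries above appear only as images in the original README. The sketch below reconstructs them as hypothetical SPARQL: the predicate names (:hasTopic, :probability, :similarTo, :similarity) are assumptions and may not match the project's actual RDF schema. It only builds the query strings; posting them to the Fuseki endpoint started above is shown as a comment.

```python
# Hypothetical SPARQL for the two example queries. All predicate names
# here are assumptions, not the project's real schema.
TOPIC_QUERY = """
SELECT ?paper ?topic WHERE {
  ?paper :hasTopic ?assignment .
  ?assignment :topic ?topic ;
              :probability ?p .
  FILTER(?p > 0.90)
}"""

SIMILARITY_QUERY = """
SELECT ?paperA ?paperB WHERE {
  ?paperA :similarTo ?link .
  ?link :otherPaper ?paperB ;
        :similarity ?s .
  FILTER(?s > 0.70)
}"""

# To actually run a query against the Fuseki container started above
# (port 3035, dataset KG_dataset), something like:
# import urllib.request, urllib.parse
# data = urllib.parse.urlencode({"query": TOPIC_QUERY}).encode()
# req = urllib.request.Request(
#     "http://localhost:3035/KG_dataset/query", data=data,
#     headers={"Accept": "application/sparql-results+json"})
# print(urllib.request.urlopen(req).read())
```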

DOCKER

To display the graphical user interface running in a Docker container, we create a VNC server. You will therefore need VNC client software (such as RealVNC Viewer).

How to install and run

  1. Go to the location of the docker-compose file and run:
docker-compose build 
docker-compose up -d
  2. Connect to the container with your VNC client at the address localhost:5901. The password is pw123.

  3. Open a terminal and execute:

poetry run python interface.py 1