NEST

This repository contains the Python code to replicate the experiments for NEST. We propose using neural models for type prediction and type representation to improve the type enrichment strategies used in existing matching pipelines, in a modular fashion. In particular:

Type enrichment for type-based filtering: neural type prediction algorithms enrich the types of candidate entities with types predicted by a neural network.
Type enrichment for entity similarity with distributed representations: distributed type representations enrich entity embeddings and make their similarity more aware of their types.

Reference

This work is under review (ESWC 2021):

Cutrona, V., Puleri, G., Bianchi, F., and Palmonari, M. (2020). NEST: Neural Soft Type Constraints to Improve Entity Linking in Tables. ESWC 2021 (under review).

How to use

Installing dependencies

The code is developed for Python 3.8. Install all the required packages listed in the requirements.txt file.

virtualenv -p python3.8 venv # we suggest creating a virtual environment
source venv/bin/activate
pip install -r requirements.txt

Prepare utils data

Neural networks and type embeddings are available in the utils_data.zip file. The following files must be extracted under the utils/data directory:

abs2vec_pred.keras and abs2vec_pred_classes.pkl: the neural network based on BERT embeddings, and the list of classes it can predict
rdf2vec_pred.keras and rdf2vec_pred_classes.pkl: the neural network based on RDF2Vec embeddings, and the list of classes it can predict
dbpedia_owl2vec: typed embedding for DBpedia 2016-10 generated using OWL2Vec
tee.wv: typed embedding for DBpedia 2016-10 generated using TEE
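
The extraction step can be scripted. The following is a minimal sketch, assuming that utils_data.zip stores the six files listed above at the top level of the archive (the internal layout is not documented here, so this is an assumption). The function name extract_utils_data is hypothetical and not part of the repository. It extracts the archive under utils/data and reports which of the expected files are still missing.

```python
import zipfile
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "abs2vec_pred.keras",
    "abs2vec_pred_classes.pkl",
    "rdf2vec_pred.keras",
    "rdf2vec_pred_classes.pkl",
    "dbpedia_owl2vec",
    "tee.wv",
]


def extract_utils_data(archive="utils_data.zip", target="utils/data"):
    """Extract the archive into `target` and return the expected files not found there.

    Assumption: the archive keeps the files at its top level. If it wraps them
    in a sub-folder, they will be reported as missing and must be moved by hand.
    """
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    # exists() accepts both plain files and directories.
    return [name for name in EXPECTED_FILES if not (target_dir / name).exists()]
```

An empty return value means every expected file is in place. A non-empty list names the files to locate manually.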

We release a set of Docker images that run the above predictors as a service; some other embedding models (e.g., RDF2Vec) are exposed as services as well. Download the abs2vec embeddings from GDrive and set their path in the docker-compose.yml file. Finally, start the containers:

docker-compose up -d

Benchmark datasets

Benchmark datasets can be downloaded from GDrive. Unzip the file under the datasets folder.

Create the index

Replicating our experiments requires initializing an index that contains DBpedia 2016-10. We created it using ElasticPedia, then manually added the Wikipedia anchor texts, the labels from the Lexicalization dataset, and the in- and out-degree from the Page Link dataset. Lastly, we re-indexed the data with the following mappings:

{
  "dbpedia": {
    "mappings": {
      "properties": {
        "category": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "description": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "direct_type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "in_degree": {
          "type": "integer"
        },
        "nested_surface_form": {
          "type": "nested",
          "properties": {
            "surface_form_keyword": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword"
                },
                "ngram": {
                  "type": "text",
                  "analyzer": "my_analyzer"
                }
              }
            }
          }
        },
        "out_degree": {
          "type": "integer"
        },
        "surface_form_keyword": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            },
            "ngram": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "uri": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "uri_count": {
          "type": "integer"
        },
        "uri_prob": {
          "type": "float"
        }
      }
    },
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "3",
              "type": "ngram",
              "max_gram": "3"
            }
          }
        }
      }
    }
  }
}
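
The JSON above has the shape Elasticsearch returns when an index definition is read back, with the index name (dbpedia) as the outer key. The index-creation endpoint (PUT /<index>) expects the settings and mappings at the top level instead. The sketch below shows one way to convert between the two shapes using only the standard library. The function names, the file name dbpedia_mappings.json, and the endpoint http://localhost:9200 are assumptions for illustration; the repository does not ship such a helper.

```python
import json
import urllib.request


def to_create_body(dump, index_name="dbpedia"):
    """Unwrap a read-back index definition into a create-index request body."""
    index_def = dump[index_name]
    return {
        "settings": index_def["settings"],
        "mappings": index_def["mappings"],
    }


def build_create_request(endpoint, index_name, body):
    """Build (but do not send) the PUT request that would create the index."""
    url = f"{endpoint.rstrip('/')}/{index_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical usage, assuming the JSON above was saved as dbpedia_mappings.json
# and Elasticsearch listens on localhost:9200:
#
#   with open("dbpedia_mappings.json") as fh:
#       body = to_create_body(json.load(fh))
#   request = build_create_request("http://localhost:9200", "dbpedia", body)
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Creating the index this way only sets up the empty index with the analyzer and field types; the DBpedia data still has to be loaded as described above.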

We are planning to release a dump of our index. Replace the host name titan with the endpoint of your Elasticsearch index in the following files:

utils/nn.py
utils/embeddings.py
data_model/kgs.py
run_experiments.py
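
The replacement can be done by hand or scripted. The sketch below is one possible way to do it, assuming that the string titan appears in these files only as the host name. This is a plain text substitution, so any other occurrence of titan would be rewritten too; review the resulting diff before running the experiments. The function name replace_host is hypothetical.

```python
from pathlib import Path

# Files listed above as containing the host name.
FILES_WITH_HOST = [
    "utils/nn.py",
    "utils/embeddings.py",
    "data_model/kgs.py",
    "run_experiments.py",
]


def replace_host(repo_root, new_host, old_host="titan"):
    """Replace `old_host` with `new_host` in the listed files.

    Returns the relative paths of the files that were modified.
    Files that do not exist under `repo_root` are skipped.
    """
    changed = []
    for rel in FILES_WITH_HOST:
        path = Path(repo_root) / rel
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old_host in text:
            path.write_text(text.replace(old_host, new_host), encoding="utf-8")
            changed.append(rel)
    return changed
```

For example, replace_host(".", "localhost") run from the repository root would point the code at a local Elasticsearch instance.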

Run the experiments

Run the script as follows to initialize and run the models described in our paper:

python run_experiments.py

Results are written to the eswc_experiments.json file.

People

Vincenzo Cutrona, University of Milano - Bicocca
Gianluca Puleri, University of Milano - Bicocca
Federico Bianchi, Bocconi University
Matteo Palmonari, University of Milano - Bicocca
