1 parent fb873e8, commit 6986e3e. Showing 40 changed files with 7,843 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
# Created by .ignore support plugin (hsz.mobi) | ||
### JetBrains template | ||
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm | ||
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839 | ||
|
||
# User-specific stuff | ||
.idea/**/workspace.xml | ||
.idea/**/tasks.xml | ||
.idea/**/dictionaries | ||
.idea/**/shelf | ||
|
||
# Sensitive or high-churn files | ||
.idea/**/dataSources/ | ||
.idea/**/dataSources.ids | ||
.idea/**/dataSources.local.xml | ||
.idea/**/sqlDataSources.xml | ||
.idea/**/dynamic.xml | ||
.idea/**/uiDesigner.xml | ||
|
||
# Gradle | ||
.idea/**/gradle.xml | ||
.idea/**/libraries | ||
|
||
# CMake | ||
cmake-build-debug/ | ||
cmake-build-release/ | ||
|
||
# Mongo Explorer plugin | ||
.idea/**/mongoSettings.xml | ||
|
||
# File-based project format | ||
*.iws | ||
|
||
# IntelliJ | ||
out/ | ||
|
||
# mpeltonen/sbt-idea plugin | ||
.idea_modules/ | ||
|
||
# JIRA plugin | ||
atlassian-ide-plugin.xml | ||
|
||
# Cursive Clojure plugin | ||
.idea/replstate.xml | ||
|
||
# Crashlytics plugin (for Android Studio and IntelliJ) | ||
com_crashlytics_export_strings.xml | ||
crashlytics.properties | ||
crashlytics-build.properties | ||
fabric.properties | ||
|
||
# Editor-based Rest Client | ||
.idea/httpRequests | ||
### Python template | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# celery beat schedule file | ||
celerybeat-schedule | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
|
||
.idea/ | ||
*.avro | ||
|
||
ignore |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
# Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction | ||
|
||
This repository contains the code for the ACL 2020 paper [**"Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction"**](https://www.aclweb.org/anthology/2020.acl-main.209/). The code is provided as documentation for the paper and as a starting point for follow-up research. | ||
|
||
# <p align="center"> <img src="docs/lp_vs_olp.png" alt="link prediction vs open link prediction" width="70%"> </p> | ||
|
||
The content of this page covers the following topics: | ||
|
||
1. [Preparation and Installation](#preparation-and-installation) | ||
2. [Training an Open Knowledge Graph Embedding Model on OLPBENCH](#training) | ||
3. [Issues and possible improvements](#issues-and-possible-improvements) | ||
|
||
## Preparation and Installation | ||
|
||
- The project is installed as follows: | ||
|
||
``` | ||
git clone https://github.com/samuelbroscheit/open_link_prediction_benchmark.git | ||
cd open_link_prediction_benchmark | ||
pip install -r requirements.txt | ||
``` | ||
- Add paths to environment | ||
``` | ||
source setup_paths | ||
``` | ||
- Download OLPBENCH | ||
Download the full dataset (compressed: ~ 2.4 GB, uncompressed: ~ 7.9 GB) | ||
``` | ||
cd data | ||
wget http://data.dws.informatik.uni-mannheim.de/olpbench/olpbench.tar.gz | ||
tar xzf olpbench.tar.gz | ||
cd .. | ||
``` | ||
- Download OPIEC | ||
**Only** if you want to recreate OLPBENCH from scratch! | ||
Download the OPIEC clean dataset (compressed: ~ 35 GB, uncompressed: ~ 292.4 GB) | ||
``` | ||
cd data | ||
wget http://data.dws.informatik.uni-mannheim.de/opiec/OPIEC-Clean.zip | ||
unzip OPIEC-Clean.zip | ||
cd .. | ||
``` | ||
Then download and start an Elasticsearch server that listens on localhost:9200. This is usually as easy as downloading the most recent version, unzipping it, and changing the default configuration to | ||
``` | ||
cluster node.local: true # disable network | ||
``` | ||
Then start the server with ./bin/elasticsearch and run the preprocessing with | ||
``` | ||
python scripts/create_data.py -c config/preprocessing/prototype.yaml | ||
``` | ||
Two configurations are prepared: | ||
- [config/preprocessing/prototype.yaml](config/preprocessing/prototype.yaml) a configuration for prototyping | ||
- [config/preprocessing/acl2020.yaml](config/preprocessing/acl2020.yaml) the configuration with the settings from the ACL 2020 study | ||
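Because the preprocessing step assumes a server on localhost:9200, it can help to confirm that something is accepting connections there before starting a long run. The following is a minimal sketch, not part of this repository: `is_listening` is a hypothetical helper, and it only checks that the TCP port is open, not that the process behind it is actually Elasticsearch.

```python
import socket


def is_listening(host: str = "localhost", port: int = 9200, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper for illustration; it does not verify that the
    service behind the port is Elasticsearch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example: report the state of the default port used in this README.
status = "open" if is_listening() else "closed"
print(f"localhost:9200 is {status}")
```

If the port is reported as closed, start the server first and only then run `scripts/create_data.py`.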
## Training | ||
Once preparation and installation are finished you can train a model on OLPBENCH. | ||
1. [Run training](#run-training) | ||
2. [Prepared configurations](#prepared-configurations) | ||
3. [Available options](#available-options) | ||
### Run training | ||
Run the training with: | ||
``` | ||
python scripts/train.py [TRAIN_CONFIG_YAML] [OPTIONS] | ||
``` | ||
TRAIN_CONFIG_YAML is a yaml config file. The possible options are documented in: | ||
[openkge/default.yaml](openkge/default.yaml) | ||
All top-level options can also be set on the command line. | ||
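To illustrate how command-line options can take precedence over values from a config file, here is a small sketch. It is an assumption about the general pattern, not the logic in `scripts/train.py`: `apply_overrides` and `parse_value` are hypothetical helpers, the `defaults` dict stands in for a loaded YAML file, and the checkpoint name is made up.

```python
import ast
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Read 'True', '3' or '0.1' as Python literals; otherwise keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Return a copy of config with '--key value' pairs from argv applied on top."""
    if len(argv) % 2 != 0:
        raise ValueError("expected '--key value' pairs")
    merged = dict(config)
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected an option starting with '--', got {flag!r}")
        merged[flag[2:]] = parse_value(value)
    return merged


# Stand-in for values read from a YAML config (hypothetical defaults).
defaults = {"evaluate": False, "evaluate_on_validation": True, "resume": None}
overrides = ["--evaluate", "True",
             "--evaluate_on_validation", "False",
             "--resume", "my_checkpoint.pth.tar"]  # hypothetical file name
config = apply_overrides(defaults, overrides)
print(config)
```

Copying the dictionary before applying overrides keeps the file-based defaults intact, which makes it easy to log both the defaults and the effective settings of a run.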
### Run evaluation | ||
Run evaluation on test data with: | ||
``` | ||
python scripts/train.py --resume data/experiments/.../checkpoint.pth.tar --evaluate True --evaluate_on_validation False | ||
``` | ||
_--resume_ expects the path to a checkpoint file | ||
_--evaluate_on_validation False_ sets the evaluation to run on test data | ||
### Prepared configurations | ||
In the config folder you will find the following configurations: | ||
- [config/acl2020-openlink/wikiopenlink-thorough-complex-lstm.yaml](config/acl2020-openlink/wikiopenlink-thorough-complex-lstm.yaml) is a configuration to train an OpenKGE model on the open link benchmark data. | ||
### Models | ||
###### Lookup based models (standard KGE) | ||
- LookupTucker3RelationModel | ||
- LookupDistmultRelationModel | ||
- LookupComplexRelationModel | ||
###### Token based models | ||
*Compute the entity and relation embeddings by pooling token embeddings* | ||
- UnigramPoolingComplexRelationModel | ||
*Compute the entity and relation embeddings with a sliding window CNN* | ||
- BigramPoolingComplexRelationModel | ||
*Compute the entity and relation embeddings with an LSTM* | ||
- LSTMDistmultRelationModel | ||
- LSTMComplexRelationModel | ||
- LSTMTucker3RelationModel | ||
###### Diagnostic models | ||
- DataBiasOnlyEntityModel | ||
- DataBiasOnlyRelationModel | ||
For model options, see the `__init__` method of the respective class. Additional combinations of score and embedding functions can be created easily by inheriting from one scorer class and one embedder class: | ||
``` | ||
class BigramPoolingDistmultRelationModel(DistmultRelationScorer, BigramPoolingRelationEmbedder): | ||
|
||
    def __init__(self, **kwargs): | ||
        super().__init__(**kwargs) | ||
``` | ||
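The snippet above relies on cooperative multiple inheritance: the scorer and the embedder each forward `**kwargs` to `super().__init__`, so one constructor call configures both. The following self-contained sketch shows that pattern with toy stand-ins. The classes `ToyPoolingEmbedder`, `ToyDistmultScorer`, and `ToyPoolingDistmultModel` are hypothetical and are not the classes in this repository. Averaging token vectors is one assumed form of pooling, and the score is the standard DistMult product sum.

```python
from typing import Dict, List


class ToyPoolingEmbedder:
    """Hypothetical embedder: averages the vectors of the given tokens."""

    def __init__(self, token_vectors: Dict[str, List[float]], **kwargs):
        super().__init__(**kwargs)
        self.token_vectors = token_vectors

    def embed(self, tokens: List[str]) -> List[float]:
        vectors = [self.token_vectors[t] for t in tokens]
        return [sum(dim) / len(vectors) for dim in zip(*vectors)]


class ToyDistmultScorer:
    """Hypothetical scorer: DistMult score, sum_i s_i * r_i * o_i."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def score(self, subj: List[str], rel: List[str], obj: List[str]) -> float:
        # self.embed is supplied by whichever embedder class is mixed in.
        s, r, o = self.embed(subj), self.embed(rel), self.embed(obj)
        return sum(a * b * c for a, b, c in zip(s, r, o))


class ToyPoolingDistmultModel(ToyDistmultScorer, ToyPoolingEmbedder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


vectors = {
    "barack": [1.0, 0.0], "obama": [0.0, 1.0],
    "born": [1.0, 1.0], "in": [1.0, 1.0],
    "hawaii": [2.0, 2.0],
}
model = ToyPoolingDistmultModel(token_vectors=vectors)
score = model.score(["barack", "obama"], ["born", "in"], ["hawaii"])
print(score)  # 2.0
```

Listing the scorer first in the base classes means its `__init__` runs first and passes the remaining keyword arguments on to the embedder, which is why each class must accept and forward `**kwargs`.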
## Citation | ||
If you find this code useful for your research, please cite: | ||
``` | ||
@inproceedings{broscheit-etal-2020-predict, | ||
title = "Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction", | ||
author = "Broscheit, Samuel and | ||
Gashteovski, Kiril and | ||
Wang, Yanjie and | ||
Gemulla, Rainer", | ||
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics", | ||
month = jul, | ||
year = "2020", | ||
address = "Online", | ||
publisher = "Association for Computational Linguistics", | ||
url = "https://www.aclweb.org/anthology/2020.acl-main.209", | ||
doi = "10.18653/v1/2020.acl-main.209", | ||
pages = "2296--2308", | ||
} | ||
``` |