Templar

Build

We use the Gradle shadowJar plugin to build the project:

gradle shadowJar

Setup

Before running Templar, start the word2vec server with python word2vec_server.py (it uses port 10000 by default).
Ensure your database properties are set correctly in config.properties in the main project folder. If this file doesn't exist, create it by copying config.properties.example.
You also need a functioning MySQL instance with the necessary data from each dataset pre-loaded. For example, have MySQL up and running, then create a database named mas and follow the instructions in the MAS dataset README.
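
As a quick pre-flight check, the sketch below tests whether anything is accepting connections on the word2vec port. It assumes the server listens over TCP on localhost; the function name port_is_open is hypothetical and not part of Templar.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 10000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumption: word2vec_server.py listens over TCP on localhost. Only the
    default port (10000) comes from this README.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Report whether the default word2vec port is reachable right now.
print("word2vec server reachable:", port_is_open())
```

If this prints False, start word2vec_server.py before running any Templar tests.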

Running Templar tests

TemplarCV - Runs a cross-validation test on a specific dataset given some parameters.

After building, we can run:

java -cp build/libs/templar-all.jar edu.umich.templar.TemplarCV <dataset> <log_level> <log_join_on>

Choices for each argument:

<dataset>: mas, yelp, imdb
<log_level>: full, no_const, no_const_op
<log_join_on>: true, false
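
To avoid typos in the arguments, a small wrapper can validate them before launching Java. This is a sketch only: build_command is a hypothetical helper, and the allowed values are copied from the list above.

```python
# Allowed values, copied from the argument list in this README.
DATASETS = ("mas", "yelp", "imdb")
LOG_LEVELS = ("full", "no_const", "no_const_op")
LOG_JOIN_ON = ("true", "false")

JAR = "build/libs/templar-all.jar"
MAIN_CLASS = "edu.umich.templar.TemplarCV"


def build_command(dataset: str, log_level: str, log_join_on: str) -> list:
    """Validate the three arguments and return the java command as a list."""
    checks = (
        ("dataset", dataset, DATASETS),
        ("log_level", log_level, LOG_LEVELS),
        ("log_join_on", log_join_on, LOG_JOIN_ON),
    )
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return ["java", "-cp", JAR, MAIN_CLASS, dataset, log_level, log_join_on]


# Print the command line for one valid combination.
print(" ".join(build_command("mas", "full", "true")))
```

The returned list can be passed to subprocess.run(..., check=True) from the project root once the jar has been built.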

Disabling the candidates cache

Since many keywords are reused frequently within each dataset, we implemented a cache to speed up testing. Enable or disable it by changing the ENABLE_CACHE setting in edu.umich.templar.main.settings.Params.

Each cache is saved to data/<dataset>/<dataset>.cands.cache, so to clear a cache, just delete that file.
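
Clearing a cache can be scripted. The sketch below assumes it is run from the project root so that the relative data/ folder resolves; cache_path and clear_cache are hypothetical helper names.

```python
from pathlib import Path


def cache_path(dataset: str, data_root: str = "data") -> Path:
    """Return data/<dataset>/<dataset>.cands.cache for the given dataset."""
    return Path(data_root) / dataset / f"{dataset}.cands.cache"


def clear_cache(dataset: str, data_root: str = "data") -> bool:
    """Delete the candidates cache if present; return True if a file was removed."""
    path = cache_path(dataset, data_root)
    if path.is_file():
        path.unlink()
        return True
    return False


# Show where the cache for one dataset would live (nothing is deleted here).
print(cache_path("mas").as_posix())
```

Calling clear_cache("mas") removes only that dataset's cache file and leaves the rest of data/mas untouched.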

Adding new datasets

To add a new dataset, you need to:

Load the dataset with name <dataset> into MySQL.
Create the folder data/<dataset>. Each dataset is required to have the following files (see existing datasets for examples):
- <dataset>_keywords.csv: pre-parsed keywords, metadata, and answers. Note specifically that we allow multiple correct answers, separated by semicolons, and that pairs are given in comma-separated form. This formatting matters because our accuracy evaluation is done via string comparison.
- <dataset>_joins.csv: correct join paths for each query. These are in a nested, parenthetical format, where the alphabetically first table always comes first, each table's children are given in parentheses after it, and multiple children of a table are separated by commas. For example, author(organization,writes(publication)) is a join path where author is the alphabetically first table name, its children are organization and writes, and writes has publication as a child. This formatting matters because our accuracy evaluation is done via string comparison.
- <dataset>_all.sqls: the correct SQL labels for each NLQ, one query per line. This is fed in as our query log.
- <dataset>.fkpk.json: a JSON file listing all the foreign key-primary key relationships in the schema
- <dataset>.main_attrs.json: defining the main/display/default attributes for each relation
- <dataset>.proj_attrs.json: defining the paired attributes for each relation
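
Because join paths are compared as strings, it can help to sanity-check entries before adding them to <dataset>_joins.csv. The sketch below parses the nested format described above into a tree and serializes it back. It is not Templar's own parser: JoinNode, parse_join_path, serialize, and root_is_first_alphabetically are hypothetical names, and stripping whitespace around table names is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class JoinNode:
    table: str
    children: list = field(default_factory=list)


def parse_join_path(text: str) -> JoinNode:
    """Parse e.g. 'author(organization,writes(publication))' into a tree."""
    pos = 0

    def parse_node() -> JoinNode:
        nonlocal pos
        start = pos
        # Read a table name up to the next structural character.
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        name = text[start:pos].strip()
        if not name:
            raise ValueError(f"expected a table name at position {start}")
        node = JoinNode(name)
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this table's children
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return root


def serialize(node: JoinNode) -> str:
    """Write the tree back out with no whitespace, keeping child order."""
    if not node.children:
        return node.table
    return f"{node.table}({','.join(serialize(c) for c in node.children)})"


def all_tables(node: JoinNode):
    yield node.table
    for child in node.children:
        yield from all_tables(child)


def root_is_first_alphabetically(node: JoinNode) -> bool:
    """Check the README rule that the root is the alphabetically first table."""
    return node.table == min(all_tables(node))


tree = parse_join_path("author(organization,writes(publication))")
print(serialize(tree), root_is_first_alphabetically(tree))
```

An entry whose serialized form differs from the original string (for example, because of stray spaces) is worth fixing by hand, since the evaluation compares the strings exactly.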
