forked from tuzhucheng/Castor

Deep learning for information retrieval with PyTorch

mengf821/Castor

Castor

This is the common repo for PyTorch deep learning models by the Data Systems Group at the University of Waterloo.

Models

Predictions Over One Input Text Sequence

For sentiment analysis, topic classification, etc.

Predictions Over Two Input Text Sequences

For paraphrase detection, question answering, etc.

Each model directory has a README.md with further details.

Setting up PyTorch

If you are an internal Castor contributor using GPU machines in the lab, follow the instructions here.

Castor is designed for Python 3.6 and PyTorch 0.4. PyTorch recommends Anaconda for managing your environment. We recommend creating a dedicated environment as follows:

$ conda create --name castor python=3.6
$ source activate castor

Then install PyTorch as follows:

$ conda install pytorch torchvision -c pytorch

Other Python packages we use can be installed via pip:

$ pip install -r requirements.txt
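
Once the packages are installed, a quick sanity check can confirm that the interpreter and the main dependencies are visible from the active environment. The following is a minimal sketch, not part of Castor: the function name `check_environment` is hypothetical, and treating Python 3.6 as a minimum (rather than an exact requirement) is an assumption.

```python
import importlib.util
import sys


def check_environment(required=("torch", "nltk"), min_python=(3, 6)):
    """Return a list of problems found; an empty list means all checks passed.

    `required` and `min_python` are illustrative defaults, not values taken
    from Castor's requirements.txt.
    """
    problems = []
    found = tuple(sys.version_info[:2])
    if found < tuple(min_python):
        problems.append(
            "Python %d.%d or newer expected, found %d.%d"
            % (tuple(min_python) + found)
        )
    for name in required:
        # find_spec locates a top-level package without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append("package %r is not importable" % name)
    return problems
```

Calling `check_environment()` inside the `castor` environment should return an empty list if both PyTorch and NLTK were installed successfully.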

The code depends on NLTK data (e.g., stopwords), which must be downloaded separately. Start the Python interpreter and run:

>>> import nltk
>>> nltk.download()
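
Calling `nltk.download()` without arguments opens NLTK's interactive downloader. On a headless machine it may be more convenient to fetch a single package by name. The sketch below downloads a package only when it is missing. The helper name `ensure_nltk_data` and its `find`/`download` hooks are hypothetical (the hooks exist so the logic can be exercised without network access), and `stopwords` is simply the example named above; Castor may need other NLTK resources as well.

```python
def ensure_nltk_data(resource="corpora/stopwords", package="stopwords",
                     find=None, download=None):
    """Download an NLTK package only if its resource is not already present.

    Returns True if a download was triggered, False if the data was found.
    `find` and `download` default to nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package)
        return True
```

With NLTK installed, `ensure_nltk_data()` is roughly equivalent to running `nltk.download('stopwords')` the first time and doing nothing afterwards.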

Finally, run the following inside the utils directory to build the trec_eval tool for evaluating certain datasets.

$ ./get_trec_eval.sh

Data and Pre-Trained Models

If you are an internal Castor contributor using GPU machines in the lab, follow the instructions here.

To take full advantage of the code here, also clone the Castor-data and Castor-models repos.

Organize your directory structure as follows:

.
├── Castor
├── Castor-data
└── Castor-models

For example (using HTTPS):

$ git clone https://github.com/castorini/Castor.git
$ git clone https://git.uwaterloo.ca/jimmylin/Castor-data.git
$ git clone https://git.uwaterloo.ca/jimmylin/Castor-models.git
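
The sibling layout described above can be verified with a few lines of Python. This is a minimal sketch rather than a script shipped with Castor: the function name `missing_repos` is hypothetical, and it only checks that the three directories exist side by side, not that their contents are complete.

```python
from pathlib import Path

# Directory names taken from the layout shown above.
EXPECTED_REPOS = ("Castor", "Castor-data", "Castor-models")


def missing_repos(root="."):
    """Return the expected repo directories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_REPOS if not (root / name).is_dir()]
```

Run from the parent directory, `missing_repos()` returns an empty list when all three clones are in place.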

After cloning the Castor-data repo, you need to unzip the embeddings and run the data pre-processing scripts. You can either follow the instructions in each dataset and embedding directory separately, or run the following script in Castor-data to perform all of the steps:

$ ./setup.sh
