distil-ingest

Dependencies

Requires the Go programming language binaries, with the GOPATH environment variable set and $GOPATH/bin in your PATH.
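The requirements above can be checked before installing. The following is a minimal sketch, not part of the repository; the function names `path_contains` and `check_go_env` are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash

# Return 0 if directory $1 appears as an entry in the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the three requirements: go is installed, GOPATH is set,
# and $GOPATH/bin is an entry in PATH.
check_go_env() {
  if ! command -v go >/dev/null 2>&1; then
    echo "go not found on PATH"
    return 1
  fi
  if [ -z "${GOPATH:-}" ]; then
    echo "GOPATH is not set"
    return 1
  fi
  if ! path_contains "$GOPATH/bin" "$PATH"; then
    echo "$GOPATH/bin is not in PATH"
    return 1
  fi
  echo "Go environment looks ok"
}
```

After sourcing the snippet, running `check_go_env` prints the first requirement that is not met, or a confirmation if all three hold.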

Installation

go get github.com/uncharted-distil/distil-ingest

Development

Clone the repository:

mkdir -p $GOPATH/src/github.com/uncharted-distil
cd $GOPATH/src/github.com/uncharted-distil
git clone git@github.com:uncharted-distil/distil-ingest.git

Install dependencies:

cd distil-ingest
make install

Build executable:

make build

Usage

The repository contains CLIs used to parse and ingest 3M OpenML datasets (those with a name beginning with o_) into Elasticsearch.

Merging training and target datasets:

Classifying merged datasets:

  • Update and ensure the arguments in ./classify_all.sh are correct
  • Run ./classify_all.sh

Ingesting merged and classified datasets:

  • Update and ensure the arguments in ./ingest_all.sh are correct
  • Run ./ingest_all.sh
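Because ingestion depends on the output of classification, it can help to run the scripts in sequence and stop as soon as one fails. The following is a minimal sketch of that idea; `run_steps` is a hypothetical helper and is not provided by the repository.

```shell
#!/usr/bin/env bash

# Run each script given as an argument, in order.
# Stop at the first one that exits with a non-zero status.
run_steps() {
  local step
  for step in "$@"; do
    echo "running $step"
    if ! "$step"; then
      echo "$step failed; skipping remaining steps" >&2
      return 1
    fi
  done
}
```

Assuming the arguments in both scripts have already been updated, a call such as `run_steps ./classify_all.sh ./ingest_all.sh` would run ingestion only if classification succeeded.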

Common Issues:

"EOF"

  • The Elasticsearch instance does not have http.compression enabled.
  • The mappings JSON argument is invalid, most likely because of a missing closing bracket.
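A missing closing bracket can often be spotted by counting opening and closing brackets in the mappings file. The sketch below is a crude heuristic, not a JSON validator: it also counts brackets that appear inside string values, so a pass does not prove the file is valid. The function names are hypothetical and the mappings file path is whatever you pass to the ingest CLI.

```shell
#!/usr/bin/env bash

# Count how many times the literal string $1 occurs in file $2.
count_occurrences() {
  { grep -oF -- "$1" "$2" || true; } | wc -l | tr -d '[:space:]'
}

# Compare the number of opening and closing braces and square brackets.
# Prints one line per unbalanced pair and returns 1 if any pair is unbalanced.
check_brackets() {
  local file="$1"
  local rc=0
  local pair open close n_open n_close
  for pair in '{}' '[]'; do
    open="${pair:0:1}"
    close="${pair:1:1}"
    n_open="$(count_occurrences "$open" "$file")"
    n_close="$(count_occurrences "$close" "$file")"
    if [ "$n_open" -ne "$n_close" ]; then
      echo "unbalanced $pair: $n_open opening vs $n_close closing"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_brackets` on the mappings file before ingesting gives a quick first check; a full JSON parser is still the more reliable option when one is available.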

"No Elasticsearch node available"

  • You are accessing an Elasticsearch instance that requires a VPN and it is not on.
  • The Elasticsearch instance is temporarily down.
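Both causes above come down to the instance not being reachable from your machine, which can be confirmed independently of the ingest CLIs. The following is a minimal sketch using curl; `es_reachable` is a hypothetical helper, and the URL in the usage note is an assumed local endpoint rather than one taken from this repository.

```shell
#!/usr/bin/env bash

# Return 0 if an HTTP request to the Elasticsearch URL in $1 succeeds
# within the timeout in seconds given as $2 (default 5).
es_reachable() {
  curl --silent --fail --max-time "${2:-5}" --output /dev/null "$1"
}
```

For example, `es_reachable http://localhost:9200 || echo "cannot reach Elasticsearch"` reports a failure when the VPN is off or the instance is down. An instance that requires authentication may also fail this check, since `--fail` treats HTTP error responses as failures.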

"dep: command not found"

  • Cause: $GOPATH/bin has not been added to your $PATH.
  • Solution: Add export PATH=$PATH:$GOPATH/bin to your .bash_profile or .bashrc.