This directory contains the TigerGraph/GSQL implementation of the Interactive workload of the LDBC SNB benchmark.
The recommended environment is that the benchmark scripts (Bash) and the LDBC driver (Java 8) run on the host machine, while the TigerGraph database runs in a Docker container. Therefore, the requirements are as follows:
- Bash
- Java 8
- Docker 19+
To build the TigerGraph implementation, use the Maven build tool from the root directory of the project:
```bash
mvn package -DskipTests -Ptigergraph -U
```
Please note the `-U` flag: the project uses a snapshot version of the TigerGraph client, and this switch helps ensure that the artifact is resolved properly.
To run the benchmark, the following configuration steps need to be performed:
- Preparing a dataset
- Setting up the TigerGraph database (starting the cluster and all required services)
- Loading the data set into the database (defining the data loading jobs and running them)
- Defining the queries to be executed (defining the queries and installing them in the database)
- Creating indices (TBD)
Scripts are provided in the `setup` directory to perform steps 2-5, and there are helper scripts in the `scripts` directory to manage TigerGraph in a Docker container.
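For orientation, the overall flow with these scripts is roughly the following; each step is detailed in the remainder of this document, and the data directory below is just the bundled test data set.

```bash
# Rough sketch of the end-to-end setup flow (details in the sections below).
export TIGERGRAPH_DATA_DIR=`pwd`/test-data/social_network   # point to your data set
./scripts/start.sh       # start TigerGraph in a Docker container
./scripts/setup.sh       # define loading jobs and queries, load the data, install the queries
# or, as noted later in this document, run the steps above in one go:
scripts/load-in-one-step.sh
```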
In this section, we will describe how to obtain the data set and load it into the database.
The data sets can be generated using the Hadoop-based Datagen, and some pre-generated data sets are available for download from the SURF/CWI data repository.
The data sets need to be generated and preprocessed before loading them into the database.
To generate such data sets, use the Hadoop-based Datagen's CsvComposite serializer classes with the LongDateFormatter date formatter:
```
ldbc.snb.datagen.serializer.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter
ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeStaticSerializer
```
Please note that the loading procedure assumes that the person data on languages and emails is combined into a single person record.
An example configuration for scale factor 1 is given in the [`params-csv-composite-longdateformatter.ini`](https://github.com/ldbc/ldbc_snb_datagen_hadoop/blob/main/params-csv-composite-longdateformatter.ini) file of the Datagen repository.
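For reference, one way to invoke the Hadoop-based Datagen with this configuration is sketched below; the `run.sh` wrapper, the `params.ini` convention, and the Hadoop setup are assumptions here, so follow the Datagen repository's own instructions for the authoritative steps.

```bash
# Hypothetical Datagen invocation (verify against the Datagen repository's README).
cd ldbc_snb_datagen_hadoop
cp params-csv-composite-longdateformatter.ini params.ini   # CsvComposite serializers + LongDateFormatter
./run.sh                                                   # assumes a working Hadoop installation
```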
TigerGraph uses a mechanism called "loading jobs" for data import; three loading jobs are defined for this purpose.
`setup.sh` contains the commands to run the loading jobs. It takes two arguments (see the example below):
- the path to the data
- the path to the queries
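A hypothetical invocation could look like this; the paths are placeholders for wherever the data files and query definitions reside on the TigerGraph machine.

```bash
# Placeholder paths; adjust to the actual locations on the TigerGraph machine.
./setup.sh /home/tigergraph/data/social_network /home/tigergraph/queries
```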
Please note that the loading procedure assumes that the data files are present on the TigerGraph machine; they need to be uploaded there in advance (or mounted as a volume).
This section explains how to set up the database for benchmarking using Docker containers.
The instructions assume that you are starting in the `tigergraph` subdirectory of the project root directory (`ldbc_snb_interactive_impls/tigergraph`).
Set the following environment variables based on your data source (the example below uses the test data set SF-0.003):
```bash
export TIGERGRAPH_DATA_DIR=`pwd`/test-data/social_network
```
To start the database, run the following script:
```bash
./scripts/stop.sh # if you have an existing TG database
# wait several seconds for Docker to reset
./scripts/start.sh
```
This starts a single-node TigerGraph database and all required services. Note that the license in the container is a trial license supporting at most 100 GB of data. For benchmarks on SF-100 and larger, you need to obtain and apply a license after running `start.sh`; an example command is given at the end of `start.sh` (see also the sketch below).
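The authoritative command is the one at the end of `start.sh`; as a rough sketch, applying a license from the host could look like the following, assuming a TigerGraph 3.x `gadmin` CLI and a license string stored in `TIGERGRAPH_LICENSE`.

```bash
# Sketch only: apply a non-trial license inside the running container.
# Assumes the container name used by the scripts and a valid license in $TIGERGRAPH_LICENSE.
docker exec --user tigergraph -it snb-interactive-tigergraph bash -c \
  "gadmin license set $TIGERGRAPH_LICENSE && gadmin config apply -y && gadmin restart all -y"
```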
To set up the database, run the following script:
```bash
./scripts/setup.sh
```
This step may take a while (several minutes), as it is responsible for defining the queries and loading jobs, loading the data, and installing the queries. After the data is ready, you can explore the graph using TigerGraph GraphStudio in the browser at http://localhost:14240/. By default, the container can be accessed via `ssh tigergraph@localhost -p 14022` with the password `tigergraph`, or with the Docker command `docker exec --user tigergraph -it snb-interactive-tigergraph bash`.
The above scripts can be executed with a single command:
```bash
scripts/load-in-one-step.sh
```
To run the scripts of the benchmark framework, edit the `driver/{create-validation-parameters,validate,benchmark}.properties` files, then run the corresponding script:
```bash
driver/create-validation-parameters.sh
driver/validate.sh
driver/benchmark.sh
```
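As an illustration, the edits typically point the driver at the substitution parameters and set run parameters; the property names below are common LDBC driver settings and the paths are placeholders, so verify both against the provided `.properties` files.

```bash
# Hypothetical edits to driver/benchmark.properties; property names and paths
# should be checked against the file shipped in this repository.
sed -i 's|^ldbc.snb.interactive.parameters_dir=.*|ldbc.snb.interactive.parameters_dir=/path/to/substitution_parameters|' driver/benchmark.properties
sed -i 's|^thread_count=.*|thread_count=8|' driver/benchmark.properties
driver/benchmark.sh
```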
The `scripts/backup-database.sh` and `scripts/restore-database.sh` scripts can be used to back up the freshly loaded database and restore it between runs.
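For example, a possible workflow, assuming the scripts take no arguments (check the scripts themselves), is to snapshot the database once after loading and restore it before each run:

```bash
# Back up the freshly loaded database once...
scripts/backup-database.sh
# ...and restore it before each validation or benchmark run, discarding the
# updates performed by the previous run.
scripts/restore-database.sh
driver/benchmark.sh
```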