More updates for getting cluster mode working with 1.5.0
rf972 committed Feb 11, 2022
1 parent a8c7b39 commit e7bf58a
Showing 9 changed files with 53 additions and 100 deletions.
19 changes: 19 additions & 0 deletions cluster_setup.md
@@ -0,0 +1,19 @@
Our cluster configuration uses Docker host networking. A series of scripts brings up the Docker containers that make up the cluster; you will likely need to tailor these scripts to the needs of your configuration.

We have several scripts:

- spark/docker/start_master_host.sh: brings up the Spark master container using host networking.
- spark/docker/start_worker_host.sh: brings up the Spark worker container using host networking.
- spark/docker/start_launcher_host.sh: brings up the Spark launcher container using host networking. This is the container from which our run_tpch.sh launches the benchmark.
- dikeHDFS/start_server_host.sh: brings up the container with HDFS and NDP.

The config file spark/spark.config holds the addresses and hostnames needed by the above scripts; you need to modify it for your configuration. There is an example in our repo.
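A sketch of what spark.config might look like, using the keys the start scripts grep for; the hostnames and addresses below are hypothetical placeholders, not values from this repo:

```
# Hypothetical example; substitute your own hosts and addresses.
# Each DOCKER_HOSTS entry becomes a docker --add-host=<name>:<ip> flag.
DOCKER_HOSTS=sparkmaster:192.168.1.10,sparkworker:192.168.1.11,dikehdfs:192.168.1.12
LAUNCHER_IP=192.168.1.10
WORKER_IP=192.168.1.11
```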

You also need to configure dikeHDFS/start_server_host.sh with your IP address: change the line containing --add-host=dikehdfs to include your storage server's IP address.
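With a hypothetical storage server at 192.168.1.12, the edited line would look like this:

```bash
# Hypothetical address; substitute your storage server's IP.
--add-host=dikehdfs:192.168.1.12 \
```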

As an example, our configuration typically follows this sequence (see the sketch after this list):
1) From the master node, run start_master_host.sh and start_launcher_host.sh.
2) On each worker node, run start_worker_host.sh 1 8. The arguments are the number of workers followed by the number of cores to use.
3) Launch the NDP server via dikeHDFS/start_server_host.sh.
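Put together, the startup might look like this from a shell on each machine. Paths are relative to the repo root, and the 1 8 worker arguments are just an example:

```bash
# On the master node:
./spark/docker/start_master_host.sh
./spark/docker/start_launcher_host.sh

# On each worker node: 1 worker instance, 8 cores per worker
./spark/docker/start_worker_host.sh 1 8

# On the storage node:
./dikeHDFS/start_server_host.sh
```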


20 changes: 10 additions & 10 deletions demo.sh
@@ -4,22 +4,22 @@
printf "\nNext Test: Spark TPC-H query with HDFS storage and with no pushdown\n"
read -n 1 -s -r -p "Press any key to continue with test."
cd benchmark/tpch
./run_tpch.sh -t 6 -ds ndp --protocol ndphdfs
./run_tpch.sh --local -t 6 -ds ndp --protocol ndphdfs
printf "\nTest Complete: Spark TPC-H query with HDFS storage and with no pushdown\n"

printf "\nNext Test: Spark TPC-H query with HDFS storage and with pushdown enabled.\n"
read -n 1 -s -r -p "Press any key to continue with test."
./run_tpch.sh -t 6 -ds ndp --protocol ndphdfs --pushdown
./run_tpch.sh --local -t 6 -ds ndp --protocol ndphdfs --pushdown
printf "\nTest Complete: Spark TPC-H query with HDFS storage and with pushdown enabled.\n"



printf "\nNext Test: Spark TPC-H query with S3 storage and with no pushdown\n"
read -n 1 -s -r -p "Press any key to continue with test."
./run_tpch.sh -t 6 -ds ndp --protocol s3
printf "Test Complete: Spark TPC-H query with S3 storage and with no pushdown\n"
#printf "\nNext Test: Spark TPC-H query with S3 storage and with no pushdown\n"
#read -n 1 -s -r -p "Press any key to continue with test."
#./run_tpch.sh --local -t 6 -ds ndp --protocol s3
#printf "Test Complete: Spark TPC-H query with S3 storage and with no pushdown\n"

printf "\nNext Test: Spark TPC-H query with S3 and with pushdown enabled.\n"
read -n 1 -s -r -p "Press any key to continue with test."
./run_tpch.sh -t 6 -ds ndp --protocol s3 --pushdown
printf "\nTest Complete: Spark TPC-H query with S3 and with pushdown enabled.\n"
#printf "\nNext Test: Spark TPC-H query with S3 and with pushdown enabled.\n"
#read -n 1 -s -r -p "Press any key to continue with test."
#./run_tpch.sh --local -t 6 -ds ndp --protocol s3 --pushdown
#printf "\nTest Complete: Spark TPC-H query with S3 and with pushdown enabled.\n"
2 changes: 1 addition & 1 deletion dikeHDFS
Submodule dikeHDFS updated 1 file
+93 −0 start_server_host.sh
64 changes: 5 additions & 59 deletions spark/docker/start-launcher.sh
@@ -11,78 +11,25 @@ rm -f "${ROOT_DIR}/volume/status/MASTER*"

CMD="sleep 365d"
RUNNING_MODE="daemon"
START_LOCAL="NO"
if [ ! -d spark.config ]; then
START_LOCAL="YES"
else
DOCKER_HOSTS="$(cat spark.config | grep DOCKER_HOSTS)"
IFS='=' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
DOCKER_HOSTS=${IP_ARRAY[1]}
HOSTS=""
IFS=',' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
for i in "${IP_ARRAY[@]}"
do
HOSTS="$HOSTS --add-host=$i"
done
DOCKER_HOSTS=$HOSTS
echo "Docker Hosts: $DOCKER_HOSTS"

LAUNCHER_IP="$(cat spark.config | grep LAUNCHER_IP)"
IFS='=' read -a IP_ARRAY <<< "$LAUNCHER_IP"
LAUNCHER_IP=${IP_ARRAY[1]}
echo "LAUNCHER_IP: $LAUNCHER_IP"
fi
DOCKER_ID=""
if [ $RUNNING_MODE = "interactive" ]; then
DOCKER_IT="-i -t"
fi
# --cpuset-cpus="9-12" \
if [ ${START_LOCAL} == "YES" ]; then
DOCKER_RUN="docker run ${DOCKER_IT} --rm \
DOCKER_RUN="docker run ${DOCKER_IT} --rm \
-p 5006:5006 \
--name sparklauncher \
--network dike-net \
-e MASTER=spark://sparkmaster:7077 \
-e SPARK_CONF_DIR=/conf \
-e SPARK_PUBLIC_DNS=localhost \
--mount type=bind,source=$(pwd)/spark,target=/spark \
--mount type=bind,source=$(pwd)/build,target=/build \
--mount type=bind,source=$(pwd)/examples,target=/examples \
--mount type=bind,source=$(pwd)/../data,target=/tpch-data \
--mount type=bind,source=$(pwd)/../dikeHDFS,target=/dikeHDFS \
--mount type=bind,source=$(pwd)/../benchmark/tpch,target=/tpch \
--mount type=bind,source=$(pwd)/../pyNdp,target=/pyNdp \
--mount type=bind,source=$(pwd)/../pushdown-datasource/pushdown-datasource,target=/pushdown-datasource \
-v $(pwd)/conf/master:/conf \
-v ${ROOT_DIR}/build/.m2:${DOCKER_HOME_DIR}/.m2 \
-v ${ROOT_DIR}/build/.gnupg:${DOCKER_HOME_DIR}/.gnupg \
-v ${ROOT_DIR}/build/.sbt:${DOCKER_HOME_DIR}/.sbt \
-v ${ROOT_DIR}/build/.cache:${DOCKER_HOME_DIR}/.cache \
-v ${ROOT_DIR}/build/.ivy2:${DOCKER_HOME_DIR}/.ivy2 \
-v ${ROOT_DIR}/volume/status:/opt/volume/status \
-v ${ROOT_DIR}/volume/logs:/opt/volume/logs \
-v ${ROOT_DIR}/bin/:${DOCKER_HOME_DIR}/bin \
-e "AWS_ACCESS_KEY_ID=${USER_NAME}" \
-e "AWS_SECRET_ACCESS_KEY=admin123" \
-e "AWS_EC2_METADATA_DISABLED=true" \
-e RUNNING_MODE=${RUNNING_MODE} \
-u ${USER_ID} \
spark-run-${USER_NAME} ${CMD}"
else
DOCKER_RUN="docker run ${DOCKER_IT} --rm \
-p 5006:5006 \
--name sparklauncher \
--network dike-net --ip ${LAUNCHER_IP} ${DOCKER_HOSTS} \
-e MASTER=spark://sparkmaster:7077 \
-e SPARK_CONF_DIR=/conf \
-e SPARK_PUBLIC_DNS=localhost \
-e SPARK_MASTER="spark://sparkmaster:7077" \
-e SPARK_DRIVER_HOST=${LAUNCHER_IP} \
--mount type=bind,source=$(pwd)/spark,target=/spark \
--mount type=bind,source=$(pwd)/build,target=/build \
--mount type=bind,source=$(pwd)/examples,target=/examples \
--mount type=bind,source=$(pwd)/../dikeHDFS,target=/dikeHDFS \
--mount type=bind,source=$(pwd)/../benchmark/tpch,target=/tpch \
--mount type=bind,source=$(pwd)/../data,target=/tpch-data \
--mount type=bind,source=$(pwd)/../pushdown-datasource/pushdown-datasource,target=/pushdown-datasource \
-v $(pwd)/conf/master:/conf \
-v ${ROOT_DIR}/build/.m2:${DOCKER_HOME_DIR}/.m2 \
@@ -98,11 +45,10 @@ else
-e "AWS_EC2_METADATA_DISABLED=true" \
-e RUNNING_MODE=${RUNNING_MODE} \
-u ${USER_ID} \
spark-run-${USER_NAME} ${CMD}"
fi
echo "mode: $RUNNING_MODE"
v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"

if [ $RUNNING_MODE = "interactive" ]; then
eval "${DOCKER_RUN}"
else
eval "${DOCKER_RUN}" &
fi
fi
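For reference, the spark.config handling above reduces to the following standalone pattern. This is a sketch that can be run in a directory containing a spark.config like the example earlier; it is not part of this commit:

```bash
#!/bin/bash
# Sketch of the parsing used above: turn the DOCKER_HOSTS line of
# spark.config into a series of docker --add-host flags.
DOCKER_HOSTS="$(grep DOCKER_HOSTS spark.config)"
IFS='=' read -r -a KV <<< "$DOCKER_HOSTS"    # KV[1] holds the comma-separated host list
HOSTS=""
IFS=',' read -r -a IP_ARRAY <<< "${KV[1]}"
for i in "${IP_ARRAY[@]}"; do
    HOSTS="$HOSTS --add-host=$i"             # e.g. --add-host=sparkmaster:192.168.1.10
done
echo "Docker Hosts:$HOSTS"
```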
9 changes: 5 additions & 4 deletions spark/docker/start-master.sh
@@ -1,6 +1,7 @@
#!/bin/bash

# Include the setup for our cached local directories. (.m2, .ivy2, etc)
source docker/spark_version
source docker/setup.sh

mkdir -p "${ROOT_DIR}/volume/logs"
@@ -37,8 +38,8 @@ else
fi
fi
echo "removing work and logs"
rm -rf build/spark-3.1.2/work/
rm -rf build/spark-3.1.2/logs/
rm -rf build/spark-$SPARK_VERSION/work/
rm -rf build/spark-$SPARK_VERSION/logs/

# --cpuset-cpus="9-12" \
if [ ${START_LOCAL} == "YES" ]; then
@@ -67,7 +68,7 @@ if [ ${START_LOCAL} == "YES" ]; then
-v ${ROOT_DIR}/bin/:${DOCKER_HOME_DIR}/bin \
-e RUNNING_MODE=${RUNNING_MODE} \
-u ${USER_ID} \
spark-run-${USER_NAME} ${CMD}"
v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"
else
DOCKER_RUN="docker run ${DOCKER_IT} --rm \
-p 4040:4040 -p 6066:6066 -p 7077:7077 -p 8080:8080 -p 5005:5005 -p 18080:18080 \
@@ -98,7 +99,7 @@ else
-e "AWS_EC2_METADATA_DISABLED=true" \
-e RUNNING_MODE=${RUNNING_MODE} \
-u ${USER_ID} \
spark-run-${USER_NAME} ${CMD}"
v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"
fi
if [ $RUNNING_MODE = "interactive" ]; then
eval "${DOCKER_RUN}"
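The scripts now source docker/spark_version for SPARK_VERSION and DIKE_VERSION. That file is not shown in this commit; based on the spark-3.1.2 paths being replaced and the 1.5.0 release named in the commit title, it plausibly looks like:

```bash
# Hypothetical contents of docker/spark_version (not part of this diff).
SPARK_VERSION=3.1.2
DIKE_VERSION=1.5.0
```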
6 changes: 3 additions & 3 deletions spark/docker/start-worker-host.sh
@@ -1,5 +1,5 @@
#!/bin/bash

source docker/spark_version
source docker/setup.sh

mkdir -p "${ROOT_DIR}/volume/logs"
@@ -20,8 +20,8 @@ if [ "$#" -ge 2 ] ; then
CORES=$2
fi
echo "removing work and logs"
rm -rf build/spark-3.1.2/work/
rm -rf build/spark-3.1.2/logs/
rm -rf build/spark-$SPARK_VERSION/work/
rm -rf build/spark-$SPARK_VERSION/logs/

echo "Workers: $WORKERS"
echo "Cores: $CORES"
26 changes: 5 additions & 21 deletions spark/docker/start-worker.sh
@@ -1,5 +1,5 @@
#!/bin/bash

source docker/spark_version
source docker/setup.sh

mkdir -p "${ROOT_DIR}/volume/logs"
@@ -20,27 +20,11 @@ if [ "$#" -ge 2 ] ; then
CORES=$2
fi
echo "removing work and logs"
rm -rf build/spark-3.1.2/work/
rm -rf build/spark-3.1.2/logs/
rm -rf build/spark-$SPARK_VERSION/work/
rm -rf build/spark-$SPARK_VERSION/logs/

echo "Workers: $WORKERS"
echo "Cores: $CORES"
DOCKER_HOSTS="$(cat spark.config | grep DOCKER_HOSTS)"
IFS='=' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
DOCKER_HOSTS=${IP_ARRAY[1]}
HOSTS=""
IFS=',' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
for i in "${IP_ARRAY[@]}"
do
HOSTS="$HOSTS --add-host=$i"
done
DOCKER_HOSTS=$HOSTS
echo "Docker Hosts: $DOCKER_HOSTS"

WORKER_IP="$(cat spark.config | grep WORKER_IP)"
IFS='=' read -a IP_ARRAY <<< "$WORKER_IP"
WORKER_IP=${IP_ARRAY[1]}
echo "WORKER_IP: $WORKER_IP"

if [ $RUNNING_MODE = "interactive" ]; then
DOCKER_IT="-i -t"
@@ -50,7 +34,7 @@ fi
DOCKER_RUN="docker run ${DOCKER_IT} --rm -p 8081:8081 \
--expose 7012 --expose 7013 --expose 7014 --expose 7015 --expose 8881 \
--name sparkworker \
--network dike-net --ip ${WORKER_IP} ${DOCKER_HOSTS} \
--network dike-net \
-e SPARK_CONF_DIR=/conf \
-e SPARK_WORKER_INSTANCES=$WORKERS \
-e SPARK_WORKER_CORES=$CORES \
@@ -72,7 +56,7 @@ DOCKER_RUN="docker run ${DOCKER_IT} --rm -p 8081:8081 \
-v ${ROOT_DIR}/bin/:${DOCKER_HOME_DIR}/bin \
-e RUNNING_MODE=${RUNNING_MODE} \
-u ${USER_ID} \
spark-run-${USER_NAME} ${CMD}"
v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"


if [ $RUNNING_MODE = "interactive" ]; then
5 changes: 4 additions & 1 deletion spark/start.sh
@@ -2,4 +2,7 @@

./docker/start-master.sh && sleep 5 && ./docker/start-worker.sh

sleep 5
sleep 5
./docker/start-launcher.sh

sleep 5
2 changes: 1 addition & 1 deletion start_hdfs.sh
@@ -19,7 +19,7 @@ echo $CMDSTATUS
if [ $CMDSTATUS -ne 0 ]; then
pushd benchmark/tpch
echo "Initialize tpch CSV database in hdfs"
./run_tpch.sh --mode initCsv --protocol hdfs || (echo "*** failed tpch init of CSV for hdfs $?" ; exit 1)
./run_tpch.sh --local --mode initCsv --protocol hdfs || (echo "*** failed tpch init of CSV for hdfs $?" ; exit 1)
popd
fi
