Latest commit: jvivian, Merge pull request #292 from akmorrow13/spark_2.4.3 (bf75eae, Jul 3, 2019; 388 commits)
Name	Last commit message	Last commit date
16gt	Adding 16GT and SOAP-DP3 dependency.	Aug 24, 2017
SOAP3-dp	Adding 16GT and SOAP-DP3 dependency.	Aug 24, 2017
adam	Move to ADAM 0.22.0 using Spark 2, Scala 2.11.	Aug 24, 2017
apache-hadoop-common	Tag Apache Hadoop and Spark with git revision and latest.	Sep 9, 2017
apache-hadoop-master	Tag Apache Hadoop and Spark with git revision and latest.	Sep 9, 2017
apache-hadoop-worker	Tag Apache Hadoop and Spark with git revision and latest.	Sep 9, 2017
apache-spark-master	Tag Apache Hadoop and Spark with git revision and latest.	Sep 9, 2017
apache-spark-worker	Tag Apache Hadoop and Spark with git revision and latest.	Sep 9, 2017
avocado	updated avocado by removing spark scripts	Nov 30, 2018
bamQC	updated Gencode v23 download location for bamQC	Nov 29, 2018
bcftools	Adding bcftools.	Aug 24, 2017
bowtie2	Adding bowtie2.	Aug 24, 2017
bwa	Upgrade to bwa 0.7.15, Apache 2 licensed branch.	Aug 24, 2017
bwakit	232: fix test	Mar 20, 2017
cannoli	bump cannoli to recent branch	Nov 29, 2018
checkbias	Added CheckBias	Mar 16, 2016
cmake	Add cmake (resolves #254) (#255)	Aug 29, 2017
conductor	Move Conductor to Spark 2.	Aug 24, 2017
crossmap	adding new image for CrossMap version 0.2.1, fixes #101	Mar 21, 2016
cutadapt	Update version in test	Oct 17, 2018
deca	Bump to Deca 0.1.0 (resolves #265)	Sep 15, 2017
fastq-dump	Update fastq-dump to 2.8.1 (resolves #220)	Jan 5, 2017
fastqc	Containerize FastQC (resolves #111)	Apr 14, 2016
freebayes	Add FreeBayes.	Aug 24, 2017
gatk	upgrade gatk version (resolves #229) (#231)	Mar 6, 2017
gatk4	Adding GATK4 container.	Aug 24, 2017
gdc-client	Add GDC client (resolves #274)	Apr 11, 2018
gencode_hugo_mapping	Fix ZeroDivisionError when no genes map (resolves #279)	Aug 30, 2018
genetorrent	Use "set -e" in all wrappers	Feb 18, 2016
hera	Fix hera_build import bug (resolves #267)	Oct 19, 2017
kallisto	Fix kallisto tagging in makefile (resolves #238)	Mar 22, 2017
kallisto_sc	Fix Kallisto build	Apr 27, 2017
mango	mango docker now runs with nodejs 8.X and Spark 2.4.3	Jul 3, 2019
manta	Add Manta.	Aug 24, 2017
mapsplice	Use "set -e" in all wrappers	Feb 18, 2016
muse	Fixed spacing on Dockerfile	Dec 14, 2015
mutect	Use "set -e" in all wrappers	Feb 18, 2016
picardtools	Update Picard.	Aug 24, 2017
pindel	Use "set -e" in all wrappers	Feb 18, 2016
pizzly	Remove wrapper copy	Aug 22, 2017
platypus	Add Platypus.	Aug 24, 2017
rnaseqc	Use "set -e" in all wrappers	Feb 18, 2016
rsem	Use env in shebang for all wrappers	Feb 18, 2016
rsem_postprocess	Adds normalized versions of RSEM's output (resolves #218)	Jan 5, 2017
rtg_tools	Issues/263 create rtgtools image (#264)	Sep 5, 2017
s3am	Minor improvements and standardization to match codebase	Sep 23, 2016
samtools	issues/277 updated samtools version	Apr 30, 2018
snap	Add SNAP.	Aug 24, 2017
snpeff	Use "set -e" in all wrappers	Feb 18, 2016
spark-and-maven	update Spark version to 2.4.3	Jul 1, 2019
spladder	Spladder	Mar 3, 2016
spooky-test	Create a test image for spooky tests (resolves #177)	Sep 12, 2016
star	Update STAR (resolves #245)	May 9, 2017
strelka	Add Strelka.	Aug 24, 2017
ubu	Use "set -e" in all wrappers	Feb 18, 2016
vg	Fix vg build and test	Apr 27, 2017
.gitignore	Issues/263 create rtgtools image (#264)	Sep 5, 2017
README.md	Update example wrapper in README to use trap method	Feb 18, 2016
jenkins.py	Fix to jenkins.py to skip tools that have been removed.	Aug 24, 2017

Containerization Standards for Tools in Docker

Basic Philosophy

The goal of encapsulating a genomics tool in a Docker container is to create a modular, portable tool that is software-agnostic and can run on almost any hardware. The tool should be set up so that calling it requires only the standard Docker boilerplate followed by the tool's arguments:

docker run quay.io/ucsc_cgl/<Tool> [Parameters]

The Docker image should contain only the tool and the minimum dependencies needed to run that tool.
The tool should launch when the user runs the image, without the user needing to know where the tool is located or how it is called. If no parameters are passed, the user should be presented with the tool's help menu.
All images should have a folder /data that acts as the standard mount point. The final working directory in the container should be set to /data (WORKDIR /data).
Any scripts, jars, wrappers, or other software should go in /opt/<tool name>.
More complex tools with many build dependencies should follow the guidelines in Complex Tools. The general idea is to separate build dependencies from runtime dependencies, minimizing the final size of the deployed image.
Building a tool from source should require only changing to the tool’s directory and typing make. All built images should conform to the tag standards set in the Tag Conventions section.
Every image should have an ENTRYPOINT set to a wrapper script (see Wrapper Script).
All tools should be lowercase in the GitHub repo and follow the directory structure outlined in the figure below. In this figure, samtools is a basic tool, while bwa is a complex tool.
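
The invocation convention in this section (the docker run boilerplate, the tool's parameters appended, and /data as the mount point) can be sketched as a small Python helper that assembles the command line. This is a minimal sketch and not part of the repository: the function name docker_run_command, the --rm flag, and the example file name are assumptions made for illustration.

```python
import os


def docker_run_command(tool, parameters=(), host_dir=None, registry="quay.io/ucsc_cgl"):
    """Build the argument list for running a containerized tool.

    Hypothetical helper: it follows the convention
    `docker run quay.io/ucsc_cgl/<Tool> [Parameters]` and, when a host
    directory is given, mounts it at the standard /data mount point.
    """
    # --rm is an assumption (not in the original): remove the container on exit.
    command = ["docker", "run", "--rm"]
    if host_dir is not None:
        # Bind mounts need an absolute host path.
        command += ["-v", "{}:/data".format(os.path.abspath(host_dir))]
    command.append("{}/{}".format(registry, tool))
    # Parameters are passed through untouched; with none, the image is
    # expected to print the tool's help menu.
    command.extend(parameters)
    return command


if __name__ == "__main__":
    # The file name below is made up for illustration.
    example = docker_run_command("samtools", ["view", "-H", "/data/sample.bam"], host_dir=".")
    print(" ".join(example))
```

The returned list can be handed to subprocess.call to actually launch the container; building it as a list rather than a string avoids shell quoting problems with the tool's parameters.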

Dockerfile Structure

The de-facto guide to follow is available on Docker's website.

Useful highlights:

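Don't put RUN apt-get update on a line by itself. Pair it with apt-get install using && in the same RUN instruction; otherwise Docker's layer cache can reuse a stale package index.
cd does not work intuitively: a directory change made in one RUN instruction does not carry over to the next. Use WORKDIR with an absolute path instead.
Always attempt to launch the tool via ENTRYPOINT. Always use the "exec" form, e.g. ["foo", "bar"].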
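
The three highlights above can be checked mechanically. The following is a minimal sketch of such a check in Python, not a tool that ships with this repository: the function names, the messages, and the sample Dockerfile text (including the base image) are all assumptions made for illustration, and the checks are deliberately simple string tests rather than a full Dockerfile parser.

```python
import json


def logical_lines(text):
    """Yield (line_number, instruction) pairs, joining backslash continuations."""
    parts, start = [], None
    for number, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not parts and (not stripped or stripped.startswith("#")):
            continue  # skip blank lines and comments between instructions
        if start is None:
            start = number
        if stripped.endswith("\\"):
            parts.append(stripped[:-1].strip())
            continue
        parts.append(stripped)
        yield start, " ".join(parts)
        parts, start = [], None
    if parts:
        yield start, " ".join(parts)


def is_exec_form(value):
    """True if value is a non-empty JSON array of strings, e.g. ["foo", "bar"]."""
    try:
        parsed = json.loads(value)
    except ValueError:
        return False
    return isinstance(parsed, list) and bool(parsed) and all(
        isinstance(item, str) for item in parsed
    )


def lint_dockerfile(text):
    """Return (line_number, message) pairs for violations of the highlights."""
    problems = []
    for number, line in logical_lines(text):
        instruction, _, rest = line.partition(" ")
        instruction, rest = instruction.upper(), rest.strip()
        if instruction == "RUN":
            if "apt-get update" in rest and "apt-get install" not in rest:
                problems.append((number, "pair apt-get update with apt-get install using &&"))
            if rest.startswith("cd ") and "&&" not in rest and ";" not in rest:
                problems.append((number, "a bare cd does not persist; use WORKDIR"))
        elif instruction == "WORKDIR" and not rest.startswith("/"):
            problems.append((number, "WORKDIR should be an absolute path"))
        elif instruction == "ENTRYPOINT" and not is_exec_form(rest):
            problems.append((number, "ENTRYPOINT should use the exec form"))
    return problems


if __name__ == "__main__":
    # Made-up Dockerfile text that breaks all three highlights.
    sample = "FROM ubuntu:14.04\nRUN apt-get update\nRUN cd /opt\nENTRYPOINT sh wrapper.sh\n"
    for number, message in lint_dockerfile(sample):
        print("line {}: {}".format(number, message))
```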

Complex Tools

A complex tool is a tool that requires several build dependencies and fewer (or different) runtime dependencies. In the end, it is up to the developer to decide whether a tool should conform to the standards we set for a complex tool, but if the final size of the image can be reduced or unneeded build dependencies can be eliminated, doing so is preferred. An example Makefile that orchestrates this separation of build and runtime containers is below:

# Definitions
build_output = runtime/gatk.jar
runtime_fullpath = $(realpath runtime)
build_tool = runtime-container.DONE
git_commit ?= $(shell git log --pretty=oneline -n 1 -- ../gatk | cut -f1 -d " ")
name = quay.io/ucsc_cgl/gatk
tag = 3.4--${git_commit}

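# Steps
# These targets are commands, not files (a directory named build also exists)
.PHONY: build push test clean

build: ${build_output} ${build_tool}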

${build_output}: build/Dockerfile
	cd build && docker build -t gatkbuild .
	docker run -v ${runtime_fullpath}:/data gatkbuild cp gatk.jar /data

${build_tool}: ${build_output} runtime/Dockerfile
	cd runtime && docker build -t ${name}:${tag} .
	docker tag ${name}:${tag} ${name}:latest
	docker rmi -f gatkbuild
	touch ${build_tool}

push: build
	# Requires ~/.dockercfg
	docker push ${name}:${tag}
	docker push ${name}:latest

test: build
	python test.py

clean:
	-rm ${build_tool}
	-rm ${build_output}

Tag Conventions

Tags are used in two ways: to record information about a particular build of the image, and to make deployment easy. Our group uses Jenkins for continuous integration of the project and conforms to the following tag standard:

${ToolVersion}--${MostRecentCommitHashForTool}
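
As an illustration of the tag standard, the sketch below composes and splits a tag in Python. It is a minimal sketch under stated assumptions: the function names are hypothetical, the commit hash in the demo is made up, and latest_commit_for mirrors the git log lookup used in the example Makefile rather than anything Jenkins is known to run.

```python
import subprocess


def latest_commit_for(tool_dir):
    """Return the hash of the most recent commit that touched tool_dir.

    Hypothetical helper; it must be run from inside the git checkout.
    """
    output = subprocess.check_output(
        ["git", "log", "--pretty=format:%H", "-n", "1", "--", tool_dir]
    )
    return output.decode("utf-8").strip()


def build_tag(tool_version, commit_hash):
    """Compose a tag of the form ${ToolVersion}--${MostRecentCommitHashForTool}."""
    if not tool_version or not commit_hash:
        raise ValueError("both a tool version and a commit hash are required")
    return "{}--{}".format(tool_version, commit_hash)


def split_tag(tag):
    """Split a tag back into (tool_version, commit_hash)."""
    # Split on the last "--": a commit hash is hexadecimal and contains no dashes.
    tool_version, separator, commit_hash = tag.rpartition("--")
    if not separator or not tool_version or not commit_hash:
        raise ValueError("not a version--commit tag: {!r}".format(tag))
    return tool_version, commit_hash


if __name__ == "__main__":
    # The hash below is made up for illustration.
    print(build_tag("3.4", "0123abc"))
```

Keeping the version and the commit hash in one tag means that an image pulled by tag can be traced back to both the tool release and the exact state of its Dockerfile.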

Latest Tag and Version Tag

In an effort to make the software as accessible as possible, every tool should have a latest tag associated with at least one image of that tool. Since our group now uses the Docker hosting site Quay.io, tags are visually linked by hash so one can always determine which commit is associated with the latest tag.

Branches

All tools should be on their own branch while under development. Once a tool is ready, that branch should be rebased onto master and a pull request submitted.

Wrapper Script

Every image should have a wrapper script set as the ENTRYPOINT. The wrapper launches the tool with the parameters it receives and, importantly, changes the ownership of all output files to the owner of the mounted /data directory, so that output written by root inside the container ends up owned by the host user. The wrapper also allows other kinds of flexibility: in the example below, it uses an environment variable so that any number of Java options can be passed when the jar is executed. An example of a wrapper script for gatk is shown below:

#!/usr/bin/env bash

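# Restore ownership of output files when the script exits, even if the tool fails
finish() {
    # Give everything under /data the same user:group as the mounted /data directory
    user_id=$(stat -c '%u:%g' /data)
    chown -R "${user_id}" /data
}
trap finish EXIT

# Call tool with parameters; JAVA_OPTS is left unquoted so that multiple options split into separate arguments
java $JAVA_OPTS -jar /opt/cgl-docker-lib/gatk.jar "$@"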

Standards Within the Genomics Community

GA4GH members have agreed to begin work on creating standards for dockerizing genomics tools. Once that has happened, this document and repository will be updated to comply.


BD2KGenomics/cgl-docker-lib
