Merge pull request #77 from hubmapconsortium/devel
Devel - before this point, master had almost no functionality.
jswelling authored Jun 5, 2020
2 parents d05862f + 2b8cba6 commit 6e70323
Showing 100 changed files with 5,449 additions and 310 deletions.
15 changes: 15 additions & 0 deletions .gitmodules
@@ -0,0 +1,15 @@
[submodule "src/ingest-pipeline/airflow/cwltool"]
path = src/ingest-pipeline/airflow/cwltool
url = git@github.com:hubmapconsortium/cwltool.git
[submodule "src/ingest-pipeline/airflow/dags/cwl/codex-pipeline"]
path = src/ingest-pipeline/airflow/dags/cwl/codex-pipeline
url = git@github.com:hubmapconsortium/codex-pipeline.git
[submodule "src/ingest-pipeline/submodules/ingest-validation-tools"]
path = src/ingest-pipeline/submodules/ingest-validation-tools
url = git@github.com:hubmapconsortium/ingest-validation-tools.git
[submodule "src/ingest-pipeline/airflow/dags/cwl/portal-containers"]
path = src/ingest-pipeline/airflow/dags/cwl/portal-containers
url = git@github.com:hubmapconsortium/portal-containers.git
[submodule "src/ingest-pipeline/airflow/dags/cwl/salmon-rnaseq"]
path = src/ingest-pipeline/airflow/dags/cwl/salmon-rnaseq
url = git@github.com:hubmapconsortium/salmon-rnaseq.git
37 changes: 37 additions & 0 deletions README.md
@@ -6,6 +6,43 @@ This repository implements the internals of the HuBMAP data repository
processing pipeline. This code is independent of the UI but works in
response to requests from the data-ingest UI backend.

## Using the devtest assay type

*devtest* is a mock assay for use by developers. It provides a testing tool, controlled by a simple YAML file, that allows a developer to simulate execution of a full ingest pipeline without the need for real data. To perform a devtest run, follow this procedure:

1) Create an input dataset, for example using the ingest UI.
   - It must have a valid Source ID.
   - Its data type must be *Other -> devtest*.
2) Insert a control file named *test.yml* into the top-level directory of the dataset. The file format is described below. You may include any other files in the directory, as long as *test.yml* exists.
3) Submit the dataset.

Ingest operations will proceed normally from that point:
1) The state of the original dataset will change from New through Processing to QA.
2) A secondary dataset will be created and will move through Processing to QA after an adjustable delay (see below).
3) Files specified in *test.yml* may be copied into the dataset directory of the secondary dataset.
4) All normal metadata will be returned, including extra metadata specified in *test.yml* (see below).

The format for *test.yml* is:
```yaml
{
# The following line is required for the submission to be properly identified as the 'devtest' assay.
collectiontype: devtest,
# The pipeline_exec stage will delay for this many seconds before returning (default 30 seconds)
delay_sec: 120,
# If this list is present, the listed files will be copied from the submission directory to the derived dataset.
files_to_copy: ["file_068.bov", "file_068.doubles"],
# If present, the given metadata will be returned as dataset metadata for the derived dataset.
metadata_to_return: {
mymessage: 'hello world',
othermessage: 'and also this'
}
}
```
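
The example above uses YAML flow style (braces and commas). The same control file can be written equivalently in ordinary block-style YAML; here is a minimal sketch reusing the placeholder file names and values from the example above:

```yaml
# Block-style equivalent of the test.yml example above (file names are placeholders)
collectiontype: devtest   # required so the submission is identified as the devtest assay
delay_sec: 120            # optional; pipeline_exec delays this many seconds (default 30)
files_to_copy:            # optional; listed files are copied to the derived dataset
  - file_068.bov
  - file_068.doubles
metadata_to_return:       # optional; returned as metadata of the derived dataset
  mymessage: hello world
  othermessage: and also this
```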

## API

| <strong>API Test</strong> | |
2 changes: 1 addition & 1 deletion build_number
@@ -1 +1 @@
2
66
6 changes: 6 additions & 0 deletions docker/docker-compose.dev.yml
@@ -6,5 +6,11 @@ services:
volumes:
# Mount the source code to container
- "../src/ingest-pipeline:/usr/src/app/src"
# Map the actual root of the staging area to container
- "${LZ_PATH-/hive/hubmap/lz}:/usr/local/airflow/lz"
environment:
- AIRFLOW_CONN_INGEST_API_CONNECTION=http://hubmap-auth:7777/
- AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8789
- FLASK_RUN_PORT=8789
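
The staging-area mount above uses Compose variable substitution: `${LZ_PATH-/hive/hubmap/lz}` takes the value of `LZ_PATH` from the environment and falls back to `/hive/hubmap/lz` when it is unset. A minimal sketch (illustrative only) of what this override's volume list resolves to when `LZ_PATH` is unset:

```yaml
# Illustrative only: the dev override's volumes as resolved with LZ_PATH unset
services:
  ingest-pipeline:
    volumes:
      # Mount the source code to container
      - "../src/ingest-pipeline:/usr/src/app/src"
      # Staging-area default applied because LZ_PATH is unset
      - "/hive/hubmap/lz:/usr/local/airflow/lz"
```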


22 changes: 22 additions & 0 deletions docker/docker-compose.localhost.yml
@@ -0,0 +1,22 @@
version: "3.7"

services:

ingest-pipeline:
volumes:
# Mount the source code to container. This includes the instance/app.cfg file
# one directory below.
- "../src/ingest-pipeline:/usr/src/app/src"
# Logic for finding schemata from dags requires a specific route to schemata
- "../src/ingest-pipeline/schemata:/usr/local/schemata"
# Map the actual root of the staging area to container
- "/hubmap-data:/usr/local/airflow/lz"
# Map the actual root of the scratch area to container
- "/hubmap-scratch:/hubmap-scratch"
environment:
- AIRFLOW_CONN_INGEST_API_CONNECTION=http://hubmap-auth:4444/
- AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8789
- FLASK_RUN_PORT=8789



16 changes: 11 additions & 5 deletions docker/docker-compose.prod.yml
@@ -2,12 +2,18 @@ version: "3.7"

services:

ingest-api:
init: true
restart: always

ingest-ui:
ingest-pipeline:
init: true
restart: always
volumes:
# Mount the source code to container
- "../src/ingest-pipeline:/usr/src/app/src"
# Map the actual root of the staging area to container
- "${LZ_PATH-/hive/hubmap/lz}:/usr/local/airflow/lz"
environment:
- AIRFLOW_CONN_INGEST_API_CONNECTION=http://hubmap-auth:7777/
- AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8789
- FLASK_RUN_PORT=8789



19 changes: 19 additions & 0 deletions docker/docker-compose.test.yml
@@ -0,0 +1,19 @@
version: "3.7"

services:

ingest-pipeline:
init: true
restart: always
volumes:
# Mount the source code to container
- "../src/ingest-pipeline:/usr/src/app/src"
# Map the actual root of the staging area to container
- "${LZ_PATH-/hive/hubmap/lz}:/usr/local/airflow/lz"
environment:
- AIRFLOW_CONN_INGEST_API_CONNECTION=http://hubmap-auth:7777/
- AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8789
- FLASK_RUN_PORT=8789



53 changes: 33 additions & 20 deletions docker/docker-compose.yml
@@ -1,46 +1,59 @@
version: "3.7"

# Will use the hostname when talking between services on the same network
services:


postgres:
image: postgres:9.6
environment:
- POSTGRES_USER=airflow
- POSTGRES_PASSWORD=airflow
- POSTGRES_DB=airflow
logging:
options:
max-size: 10m
max-file: "3"
networks:
- gateway_hubmap


ingest-pipeline:
build: ./ingest-pipeline
# Build the image with name and tag
image: ingest-pipeline:0.1
restart: always
depends_on:
- postgres
hostname: ingest-pipeline
container_name: ingest-pipeline
volumes:
# Mount the app config to container in order to keep it outside of the image
- "../src/ingest-pipeline/instance:/usr/src/app/src/instance"
# Mount the logging to container
- "../logs:/usr/local/airflow/logs"
# Mount dags, plugins, and data
- "../src/ingest-pipeline/airflow/dags:/usr/local/airflow/dags:rw"
- "../src/ingest-pipeline/airflow/lib:/usr/local/airflow/lib:rw"
- "../src/ingest-pipeline/airflow/plugins:/usr/local/airflow/plugins:rw"
- "../src/ingest-pipeline/airflow/data:/usr/local/airflow/data:rw"
- "../src/ingest-pipeline/instance:/usr/local/airflow/instance"
- "${LZ_PATH-/hive/hubmap/lz}:/usr/local/airflow/lz"
# Mount requirements.txt
- ${PWD}/../src/ingest-pipeline/requirements.txt:/requirements.txt
- "../src/ingest-pipeline/requirements.txt:/requirements.txt"
environment:
- AIRFLOW_CONN_INGEST_API_CONNECTION=http://hubmap-auth:7777/
- AIRFLOW__HUBMAP_API_PLUGIN__BUILD_NUMBER=${BUILD_NUM}
- TZ=${TZ}
- AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8789
- FLASK_RUN_PORT=8789
- LOAD_EX=n
- EXECUTOR=Local
- FLASK_ENV=development
- FLASK_DEBUG=1
# - AIRFLOW_CONN_INGEST_API_CONNECTION=http://hubmap-auth:84/
# - AIRFLOW_CONN_INGEST_API_CONNECTION=http://ingest-api:8484/
# - AIRFLOW_CONN_INGEST_API_CONNECTION=http://172.21.0.1:5000/
# - AIRFLOW__CLI__ENDPOINT_URL="http://localhost:8787"
# - AIRFLOW__WEBSERVER__BASE_URL="http://localhost:8787"
# - AIRFLOW__CORE__LOGGING_LEVEL=DEBUG

- AIRFLOW__HUBMAP_API_PLUGIN__BUILD_NUMBER=${INGEST_PIPELINE_BUILD_NUM:-0}
logging:
options:
max-size: 10m
max-file: "3"
networks:
- gateway_hubmap
ports:
- "8789:8789"
command: webserver
healthcheck:
test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
interval: 30s
timeout: 30s
retries: 3


networks:
@@ -2,6 +2,7 @@

# There's actually nothing to be done here, since the code is mounted from
# the main src directory
echo "$0 : nothing to do"

#mkdir -p ingest-pipeline/src

10 changes: 4 additions & 6 deletions docker/ingest-pipeline/Dockerfile
@@ -8,11 +8,9 @@ LABEL description="HuBMAP Ingest Pipeline" \
USER root
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y git
apt-get install -y git && \
apt-get install -y sudo
RUN echo 'airflow ALL=(ALL) NOPASSWD: /bin/chown airflow ., /bin/chgrp airflow .' > /etc/sudoers.d/90-airflow

USER airflow

# Airflow API, including hubmap plugin
EXPOSE 8789

# Flower API
EXPOSE 5555
1 change: 1 addition & 0 deletions src/ingest-pipeline/airflow/cwltool
Submodule cwltool added at 650b44
4 changes: 4 additions & 0 deletions src/ingest-pipeline/airflow/dags/.airflowignore
@@ -1 +1,5 @@
# This file contains regular expressions identifying contents that should be ignored by Airflow
mock_data
cwl
utils.py
workflow_map.yml