From a5e6eb748a17232056efa8f422d7007aba33ed10 Mon Sep 17 00:00:00 2001
From: shalberd <21118431+shalberd@users.noreply.github.com>
Date: Fri, 23 Aug 2024 23:29:43 +0200
Subject: [PATCH] added documentation on file run output to S3 storage and logging. Mentioned new runtime env variable ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3

Signed-off-by: shalberd <21118431+shalberd@users.noreply.github.com>
---
 .../README.md                                 | 19 ++++++++++++++++++-
 .../README.md                                 | 19 ++++++++++++++++++-
 .../run-pipelines-on-apache-airflow/README.md | 17 ++++++++++++++++-
 .../README.md                                 | 18 +++++++++++++++++-
 4 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/pipelines/run-generic-pipelines-on-apache-airflow/README.md b/pipelines/run-generic-pipelines-on-apache-airflow/README.md
index b6762d2..4213876 100644
--- a/pipelines/run-generic-pipelines-on-apache-airflow/README.md
+++ b/pipelines/run-generic-pipelines-on-apache-airflow/README.md
@@ -53,7 +53,24 @@ Elyra currently supports Apache Airflow deployments that utilize GitHub or GitHu
 - Branch in named repository, e.g. `test-dags`. This branch must exist.
 - [Personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) that Elyra can use to push DAGs to the repository, e.g. `4d79206e616d6520697320426f6e642e204a616d657320426f6e64`
 
-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and Python scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and the Apache Airflow cluster. Collect the following information:
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and from the Apache Airflow cluster.
+
+Elyra also writes the STDOUT (including STDERR) run output to a file in that object storage when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to the regular logging and writing to STDOUT and STDERR at runtime.
+
+For `.ipynb` files, the execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `[notebook name]-output.ipynb`
+- `[notebook name].html`
+
+For `.r` and `.py` files, the execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `[script name].log`
+
+Note: If you prefer to use S3-compatible storage only for transferring files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**,
+either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or set it to `false` explicitly in the env section of the pipeline editor,
+under Pipeline Properties - Generic Node Defaults - Environment Variables or under
+Node Properties - Additional Properties - Environment Variables.
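+
+As an illustration (not part of the Elyra runtime itself), the run output files listed above can be fetched with any S3 client. The following minimal sketch uses `boto3` with the example endpoint and credentials shown below; the bucket name and object prefix are placeholders you would replace with the values from your own runtime configuration:
+
+```python
+import boto3
+
+# Example values only: the endpoint and credentials match the examples below;
+# the bucket name and object prefix are placeholders for your own runtime configuration.
+s3 = boto3.client(
+    "s3",
+    endpoint_url="http://minio-service.kubernetes:9000",
+    aws_access_key_id="minio",
+    aws_secret_access_key="minio123",
+)
+
+bucket = "elyra-storage-bucket"  # placeholder: the bucket configured in your runtime configuration
+prefix = "my-pipeline-run/"      # placeholder: the object prefix of a particular pipeline run
+
+# List the run output files, e.g. the notebook "-output.ipynb"/".html" files and the script ".log" files
+for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
+    print(obj["Key"])
+
+# Download a single run output file, e.g. the log written for a script node
+s3.download_file(bucket, prefix + "my-script.log", "my-script.log")
+```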
+
+Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
 - S3 object storage username, e.g. `minio`
 - S3 object storage password, e.g. `minio123`
diff --git a/pipelines/run-generic-pipelines-on-kubeflow-pipelines/README.md b/pipelines/run-generic-pipelines-on-kubeflow-pipelines/README.md
index d3856d0..0527a4c 100644
--- a/pipelines/run-generic-pipelines-on-kubeflow-pipelines/README.md
+++ b/pipelines/run-generic-pipelines-on-kubeflow-pipelines/README.md
@@ -47,7 +47,24 @@ Collect the following information for your Kubeflow Pipelines installation:
 - Password, for a multi-user, auth-enabled Kubeflow installation, e.g. `passw0rd`
 - Workflow engine type, which should be `Argo` or `Tekton`. Contact your administrator if you are unsure which engine your deployment utilizes.
 
-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and from the Kubeflow Pipelines cluster. Collect the following information:
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab is running and from the Kubeflow Pipelines cluster.
+
+Elyra also writes the STDOUT (including STDERR) run output to a file in that object storage when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to the regular logging and writing to STDOUT and STDERR at runtime.
+
+For `.ipynb` files, the execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `[notebook name]-output.ipynb`
+- `[notebook name].html`
+
+For `.r` and `.py` files, the execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `[script name].log`
+
+Note: If you prefer to use S3-compatible storage only for transferring files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**,
+either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or set it to `false` explicitly in the env section of the pipeline editor,
+under Pipeline Properties - Generic Node Defaults - Environment Variables or under
+Node Properties - Additional Properties - Environment Variables.
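+
+The behavior described in this note can be pictured roughly as the following check inside the generic node container (a simplified sketch of the documented default, not the actual Elyra implementation):
+
+```python
+import os
+
+# Simplified sketch of the documented default, not the actual Elyra implementation:
+# run output is uploaded when the variable is set to "true" or not set at all.
+upload_run_output = os.environ.get(
+    "ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3", "true"
+) == "true"
+
+if upload_run_output:
+    print("STDOUT/STDERR run output will also be written to S3-compatible object storage")
+```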
+
+Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
 - S3 object storage username, e.g. `minio`
 - S3 object storage password, e.g. `minio123`
diff --git a/pipelines/run-pipelines-on-apache-airflow/README.md b/pipelines/run-pipelines-on-apache-airflow/README.md
index 2bb5e00..b67fa6a 100644
--- a/pipelines/run-pipelines-on-apache-airflow/README.md
+++ b/pipelines/run-pipelines-on-apache-airflow/README.md
@@ -52,7 +52,22 @@ Collect the following information for your Apache Airflow installation:
 
 Detailed instructions for setting up a DAG repository and generating an access token can be found in [the User Guide](https://elyra.readthedocs.io/en/latest/recipes/configure-airflow-as-a-runtime.html#setting-up-a-dag-repository-on-github).
 
-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Apache Airflow cluster.
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Apache Airflow cluster.
+
+Elyra also writes the STDOUT (including STDERR) run output to a file in that object storage when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to the regular logging and writing to STDOUT and STDERR at runtime.
+
+For `.ipynb` files, the execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `[notebook name]-output.ipynb`
+- `[notebook name].html`
+
+For `.r` and `.py` files, the execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `[script name].log`
+
+Note: If you prefer to use S3-compatible storage only for transferring files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**,
+either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or set it to `false` explicitly in the env section of the pipeline editor,
+under Pipeline Properties - Generic Node Defaults - Environment Variables or under
+Node Properties - Additional Properties - Environment Variables.
 
 Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`
diff --git a/pipelines/run-pipelines-on-kubeflow-pipelines/README.md b/pipelines/run-pipelines-on-kubeflow-pipelines/README.md
index d795c14..ed7b373 100644
--- a/pipelines/run-pipelines-on-kubeflow-pipelines/README.md
+++ b/pipelines/run-pipelines-on-kubeflow-pipelines/README.md
@@ -52,7 +52,23 @@ Collect the following information for your Kubeflow Pipelines installation:
 - Password, for a multi-user, auth-enabled Kubeflow installation, e.g. `passw0rd`
 - Workflow engine type, which should be `Argo` or `Tekton`. Contact your administrator if you are unsure which engine your deployment utilizes.
 
-Elyra utilizes S3-compatible cloud storage to make data available to notebooks and scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Kubeflow Pipelines cluster.
+Elyra utilizes S3-compatible cloud storage to make data available to Jupyter notebooks and R or Python scripts while they are executed. Any kind of S3-based cloud storage should work (e.g. IBM Cloud Object Storage or Minio) as long as it can be accessed from the machine where JupyterLab/Elyra is running and from the Kubeflow Pipelines cluster.
+
+Elyra also writes the STDOUT (including STDERR) run output to a file in that object storage when the environment variable `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` is set to `true` or is not present in the runtime container, which is the default.
+This happens in addition to the regular logging and writing to STDOUT and STDERR at runtime.
+
+For `.ipynb` files, the execution run/STDOUT output is written to S3-compatible object storage in the following files:
+- `[notebook name]-output.ipynb`
+- `[notebook name].html`
+
+For `.r` and `.py` files, the execution run/STDOUT output is written to S3-compatible object storage in the following file:
+- `[script name].log`
+
+Note: If you prefer to use S3-compatible storage only for transferring files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**,
+either set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`** in your runtime container builds or set it to `false` explicitly in the env section of the pipeline editor,
+under Pipeline Properties - Generic Node Defaults - Environment Variables or under
+Node Properties - Additional Properties - Environment Variables.
+
 
 Collect the following information:
 - S3 compatible object storage endpoint, e.g. `http://minio-service.kubernetes:9000`