* Fix MLflow deployment for Canonical k8s (#195)
* Fix MLflow deployment for Canonical k8s
* Fix init container name
* Fix shared folder on Canonical k8s (#196)
* feat: run integration tests on canonical k8s (#203)
* feat: run integration tests on canonical k8s
* fix: gpu tests to run on Canonical k8s (#204)
* fix: run gpu ci on canonical k8s
* feat: update docs for canonical k8s (#215)
* feat: update docs for canonical k8s
* Angel's review
---------
Co-authored-by: afgambin <angel.fernandez@canonical.com>
* fix: use security context for the mounted volumes (#214)
* fix: store dss logs in snap common folder (#218)
* fix: store logs in snap common folder
---------
Co-authored-by: deusebio <edeusebio85@gmail.com>
---------
Co-authored-by: afgambin <angel.fernandez@canonical.com>
Co-authored-by: deusebio <edeusebio85@gmail.com>
docs/explanation/dss-arch.rst
19 additions & 25 deletions
@@ -1,10 +1,10 @@
1
1
DSS architecture
2
2
================
3
3
4
-
This guide provides an overview of the Data science stack (DSS) architecture, its main components and their interactions.
4
+
This guide provides an overview of the Data Science Stack (DSS) architecture, its main components, and their interactions.
5
5
6
6
DSS is a ready-to-run environment for Machine Learning (ML) and Data Science (DS).
7
-
It's built on open-source tooling, including `MicroK8s`_, JupyterLab and `MLflow <https://ubuntu.com/blog/what-is-mlflow>`_.
7
+
It's built on open-source tooling, including `Canonical K8s`_, JupyterLab, and `MLflow <https://ubuntu.com/blog/what-is-mlflow>`_.
8
8
9
9
DSS is distributed as a `snap`_ and usable on any Ubuntu workstation.
10
10
This provides robust security management and user-friendly version control, enabling seamless updates and auto-rollback in case of failure.
@@ -52,22 +52,22 @@ ML tools
52
52
53
53
DSS includes:
54
54
55
-
* Jupyter Notebooks: Opensource environment that provides a flexible interface to organise DS projects and ML workloads.
56
-
* MLflow: Opensource platform for managing the ML life cycle, including experiment tracking and model registry.
57
-
* ML frameworks: DSS comes by default with PyTorch and Tensorflow. Users can manually add other frameworks, depending on their needs and use cases.
55
+
* Jupyter Notebooks: Open-source environment that provides a flexible interface to organise DS projects and ML workloads.
56
+
* MLflow: Open-source platform for managing the ML life cycle, including experiment tracking and model registry.
57
+
* ML frameworks: DSS comes by default with PyTorch and TensorFlow. Users can manually add other frameworks, depending on their needs and use cases.
58
58
59
59
Jupyter Notebooks
60
60
^^^^^^^^^^^^^^^^^
61
61
62
-
A `Jupyter Notebook <Jupyter Notebooks_>`_ is essentially a `Kubernetes deployment <Pod_>`_, also known as `Pod`, running a Docker image with Jupyter Lab and a dedicated ML framework, such as Pytorch or Tensorflow.
62
+
A `Jupyter Notebook <Jupyter Notebooks_>`_ is essentially a `Kubernetes deployment <Pod_>`_, also known as `Pod`, running a Docker image with Jupyter Lab and a dedicated ML framework, such as PyTorch or TensorFlow.
63
63
For each Jupyter Notebook, DSS mounts a `Hostpath <Microk8s hostpath docs_>`_ directory-backed persistent volume to the data directory.
64
64
All Jupyter Notebooks share the same persistent volume, allowing them to exchange data seamlessly.
65
65
The full path to that persistent volume is `/home/jovyan/shared`.
66
66
67
67
MLflow
68
68
^^^^^^
69
69
70
-
`MLflow <https://ubuntu.com/blog/what-is-mlflow>`_ operates in `local mode <https://mlflow.org/docs/latest/tracking.html#other-configuration-with-mlflow-tracking-server>`_,
70
+
`MLflow <https://ubuntu.com/blog/what-is-mlflow>`_ operates in `local mode <https://mlflow.org/docs/latest/tracking/#other-tracking-setup>`_,
71
71
meaning that metadata and artefacts are, by default, stored in a local directory.
72
72
73
73
This local directory is backed by a persistent volume, mounted to a Hostpath directory of the MLflow Pod.
@@ -79,27 +79,22 @@ Orchestration
79
79
~~~~~~~~~~~~~
80
80
81
81
DSS requires a container orchestration solution.
82
-
DSS relies on `MicroK8s`_, a lightweight Kubernetes distribution.
82
+
DSS relies on `Canonical K8s`_, a lightweight Kubernetes distribution.
83
83
84
-
Therefore, MicroK8s needs to be deployed before installing DSS on the host machine.
85
-
It must be configured with the storage add-on.
86
-
This is required to use Hostpath storage in the cluster.
87
-
See :ref:`set_microk8s` to learn how to install MicroK8s.
84
+
Therefore, Canonical K8s needs to be deployed before installing DSS on the host machine.
85
+
It must be configured with local storage support to handle persistent volumes used by DSS.
88
86
89
87
.. _gpu_support:
90
88
91
89
GPU support
92
90
^^^^^^^^^^^
93
91
94
92
DSS can run with or without the use of GPUs.
95
-
If needed, MicroK8s can be configured with the desired `GPU add-on <https://microk8s.io/docs/addon-gpu>`_.
96
-
97
-
DSS is designed to support the deployment of containerised GPU workloads on NVIDIA GPUs.
98
-
MicroK8s simplifies the GPU access and usage through the `NVIDIA GPU Operator <NVIDIA Operator_>`_.
93
+
If needed, follow `NVIDIA GPU Operator <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html>`_ for deployment details.
99
94
100
95
DSS does not automatically install the tools and libraries required for running GPU workloads.
101
-
To do so, it relies on MicroK8s for the required operating-system drivers.
102
-
It also relies on the chosen image, for example, CUDA when working with NVIDIA GPUs.
96
+
It relies on Canonical K8s for the required operating-system drivers.
97
+
It also depends on the chosen image, for example, CUDA when working with NVIDIA GPUs.
103
98
104
99
.. caution::
105
100
GPUs from silicon vendors other than NVIDIA can be configured. However, their functionality is not guaranteed.
@@ -108,16 +103,16 @@ Storage
108
103
^^^^^^^
109
104
110
105
DSS expects a default `storage class <https://kubernetes.io/docs/concepts/storage/storage-classes/>`_ in the Kubernetes deployment, which is used to persist Jupyter Notebooks and MLflow artefacts.
111
-
In MicroK8s, the Hostpath storage add-on is chosen, used to provision Kubernetes' *PersistentVolumeClaims* (`PVCs <https://kubernetes.io/docs/concepts/storage/persistent-volumes/>`_).
106
+
In Canonical K8s, a local storage class should be configured to provision Kubernetes' *PersistentVolumeClaims* (`PVCs <https://kubernetes.io/docs/concepts/storage/persistent-volumes/>`_).
112
107
113
108
A shared PVC is used across all Jupyter Notebooks to share and persist data.
114
109
MLflow also uses its dedicated PVC to store the logged artefacts.
115
110
This is the DSS default storage configuration and cannot be altered.
116
111
117
-
This choice ensures that all storage is backed up on the host machine in the event of MicroK8s restarts.
112
+
This choice ensures that all storage is backed up on the host machine in the event of cluster restarts.
118
113
119
114
.. note::
120
-
By default, you can access the DSS storage anytime under your local directory `/var/snap/microk8s/common/default-storage`.
115
+
By default, you can access the DSS storage anytime under your local directory `/var/snap/k8s/common/default-storage`.
121
116
122
117
The following diagram summarises the DSS storage:
123
118
@@ -132,7 +127,7 @@ The following diagram summarises the DSS storage:
132
127
Operating system
133
128
~~~~~~~~~~~~~~~~
134
129
135
-
DSS is native on Ubuntu, being developed, tested and validated on it.
130
+
DSS is native on Ubuntu, being developed, tested, and validated on it.
136
131
Moreover, the solution can be used on any Linux distribution.
137
132
138
133
Namespace configuration
@@ -147,8 +142,7 @@ This includes the GPU Operator for managing access and usage.
147
142
Accessibility
148
143
-------------
149
144
150
-
Jupyter Notebooks and MLflow can be accessed from a web browser through the Pod IP that is given access through MicroK8s.
145
+
Jupyter Notebooks and MLflow can be accessed from a web browser through the Pod IP that is given access through Canonical K8s.
151
146
See :ref:`access_notebook` and :ref:`access_mlflow` for more details.
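The storage note in this diff moves the default storage directory from `/var/snap/microk8s/common/default-storage` to `/var/snap/k8s/common/default-storage`. For anyone with scripts that still point at the old location, the following is a minimal sketch of how such paths could be rewritten. The function name `migrate_storage_path` and the sub-path used in the usage note are hypothetical and not part of DSS; only the two root directories come from the diff.

```python
from pathlib import PurePosixPath

# Default storage roots quoted in the old and new versions of the note.
OLD_STORAGE_ROOT = PurePosixPath("/var/snap/microk8s/common/default-storage")
NEW_STORAGE_ROOT = PurePosixPath("/var/snap/k8s/common/default-storage")


def migrate_storage_path(path: str) -> str:
    """Rewrite a path under the old storage root so it sits under the new root.

    Hypothetical helper for illustration only. Paths that are not under the
    old root are returned unchanged.
    """
    candidate = PurePosixPath(path)
    try:
        relative = candidate.relative_to(OLD_STORAGE_ROOT)
    except ValueError:
        # The path is not inside the old storage root, so leave it alone.
        return path
    return str(NEW_STORAGE_ROOT / relative)
```

For example, `migrate_storage_path("/var/snap/microk8s/common/default-storage/example-pvc/data.csv")` yields the same file name under the new root, where `example-pvc` is an invented directory name. The sketch only rewrites strings; it does not touch the file system or move any data.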
You can remove DSS from your MicroK8s cluster through ``dss purge``.
66
+
You can remove DSS from your Canonical K8s cluster through ``dss purge``.
67
67
This command purges all the DSS components, including:
68
68
69
69
* All Jupyter Notebooks.
@@ -72,8 +72,8 @@ This command purges all the DSS components, including:
72
72
73
73
.. note::
74
74
75
-
This action removes the components of the DSS environment, but it does not remove the DSS CLI or your MicroK8s cluster.
76
-
To remove those, `delete their snaps <https://snapcraft.io/docs/quickstart-tour>`_.
75
+
This action removes the components of the DSS environment, but it does not remove the DSS CLI or your Canonical K8s cluster.
76
+
To remove those, `delete their snaps <https://snapcraft.io/docs/get-started>`_.
77
77
78
78
.. code-block:: bash
79
79
@@ -91,7 +91,7 @@ You should expect an output like this:
91
91
Success: All DSS components and notebooks purged successfully from the Kubernetes cluster.
92
92
93
93
Get status
94
-
-----------
94
+
----------
95
95
96
96
You can check the DSS status through ``dss status``.
97
97
This command provides a quick way to check the status of your DSS environment, including the MLflow status and whether a GPU is detected in your environment.
@@ -109,7 +109,7 @@ If you already have a DSS environment running and no GPU available, the expected
109
109
GPU acceleration: Disabled
110
110
111
111
List commands
112
-
--------------
112
+
-------------
113
113
114
114
You can get the list of available commands for DSS through the ``dss`` command with the ``--help`` option:
115
115
@@ -134,12 +134,11 @@ You should expect an output like this:
134
134
list Lists all created notebooks in the DSS environment.
135
135
logs Prints the logs for the specified notebook or DSS component.
136
136
purge Removes all notebooks and DSS components.
137
-
remove Remove a Jupter Notebook in DSS with the name NAME.
137
+
remove Remove a Jupyter Notebook in DSS with the name NAME.
138
138
start Starts a stopped notebook in the DSS environment.
139
139
status Checks the status of key components within the DSS...
140
140
stop Stops a running notebook in the DSS environment.
141
141
142
-
143
142
**Get details about a specific command**:
144
143
145
144
To see the usage and options of a DSS command, run ``dss <command>`` with the ``--help`` option.
@@ -174,4 +173,4 @@ See also
174
173
--------
175
174
176
175
* To learn how to manage your Jupyter Notebooks, check :ref:`manage_notebooks`.
177
-
* If you are interested in managing MLflow within your DSS environment, see :ref:`manage_MLflow`.
176
+
* If you are interested in managing MLflow within your DSS environment, see :ref:`manage_MLflow`.
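The `Get status` section touched by this diff shows `GPU acceleration: Disabled` as one line of the expected `dss status` output. As a rough sketch, a script could read such output as follows. This assumes that the relevant lines have a plain `key: value` shape with no prefix, and that the enabled case is reported as `Enabled`; neither is confirmed by the diff, and both function names are hypothetical.

```python
def parse_status(output: str) -> dict[str, str]:
    """Collect 'key: value' lines from status text into a dictionary.

    Assumption: each relevant line is 'key: value'. Lines without a colon
    are skipped.
    """
    status: dict[str, str] = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            status[key.strip()] = value.strip()
    return status


def gpu_enabled(output: str) -> bool:
    """Return True only if the GPU line reports 'Enabled' (assumed wording)."""
    return parse_status(output).get("GPU acceleration", "").lower() == "enabled"
```

Passing the line quoted in the docs, `gpu_enabled("GPU acceleration: Disabled")` evaluates to `False`. If the real output carries log prefixes or timestamps, the parsing would need to be adapted.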