[ENH] - Add local support for Mac #1405

Closed
iameskild opened this issue Aug 18, 2022 · 26 comments
Labels
area:developer-experience 👩🏻‍💻 area: docker 🐋 area: user experience 👩🏻‍💻 impact: medium 🟨 This item affects some users, not critical needs: investigation 🔍 Someone in the team needs to find the root cause and replicate this bug type: maintenance 🛠 Day-to-day maintenance tasks

Comments

@iameskild
Member

iameskild commented Aug 18, 2022

Feature description

It would be nice, as developers, to develop on Mac (and Windows) using a local (kind) deployment.

Value and/or benefit

Currently, local deployments are only available for Linux. Adding this would allow those with Macs to develop and prototype more quickly than with the current workarounds (see dev docs).

Anything else?

Affects all Macs (Apple Silicon + Intel).

@iameskild
Member Author

I am currently able to partially deploy on Mac (see branch kind_on_mac). Unfortunately, this solution is only partial and quite "hacky". More work will be needed before this is fully usable.

@iameskild
Member Author

So far I have tried (or researched) the following with no success:

  • UTM virtual machine
  • VirtualBox virtual machine
    • last time I read about this, it wasn't possible to run on Apple Silicon (M1/M2) Macs, but this might have changed and I will need to give it another try.

@pavithraes pavithraes changed the title [ENH] - Add local support for Mac [ENH] - Add local support for Mac & Windows Aug 28, 2023
@pavithraes pavithraes added the project: JATIC Work item needed for the JATIC project label Aug 28, 2023
@kcpevey
Contributor

kcpevey commented Aug 28, 2023

Blocked by #1950

@kcpevey kcpevey added the status: blocked ⛔️ This item is on hold due to another task label Aug 28, 2023
@kcpevey kcpevey added this to the 2024.2.1 milestone Jan 30, 2024
@pavithraes pavithraes added the type: maintenance 🛠 Day-to-day maintenance tasks label Jan 30, 2024
@kcpevey kcpevey removed impact: high 🟥 This issue affects most of the nebari users or is a critical issue project: JATIC Work item needed for the JATIC project labels Feb 6, 2024
@pavithraes pavithraes added the impact: medium 🟨 This item affects some users, not critical label Feb 8, 2024
@pavithraes pavithraes removed the status: blocked ⛔️ This item is on hold due to another task label Feb 8, 2024
@pavithraes pavithraes modified the milestones: 2024.2.1, Release Q2 2024 Feb 16, 2024
@marcelovilla
Member

The issue around Nebari not working locally on a kind cluster on macOS/Windows stems from the fact that Docker containers are not executed natively but rather in a VM. Thus, the container networks are not exposed and cannot be reached from the host.
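One way to confirm the symptom described above is to probe a cluster address from the host. The following is a minimal sketch, not part of Nebari: `can_reach` is a hypothetical helper, and the address and port in the usage note are placeholders for whatever your local cluster exposes.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("172.18.1.100", 443)` would be expected to return `False` on a Mac where the container network is hidden inside the Docker VM, and `True` once the network is exposed to the host.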

There are a couple of options we can try to be able to deploy Nebari locally on macOS/Windows:

@viniciusdc
Contributor

This seems like an interesting option using Vagrant and Kubernetes: https://ugurakgul.medium.com/creating-a-local-kubernetes-cluster-with-vagrant-ba591ab70ee2

@aktech
Member

aktech commented May 8, 2024

I have got it working on my Mac M1. Here are the docs: nebari-dev/nebari-docs#454


This is what my Nebari config looks like:

provider: local
namespace: dev
nebari_version: 2024.5.1rc1
project_name: local-dev
domain: nebari.iakte.ch
ci_cd:
  type: none
terraform_state:
  type: remote
security:
  keycloak:
    initial_root_password: <SANITIZED>
    overrides:
      image:
        repository: quay.io/aktech/keycloak
        tag: 15.0.2
  authentication:
    type: password

default_images:
  jupyterhub: quay.io/nebari/nebari-jupyterhub:m1-image
  jupyterlab: quay.io/nebari/nebari-jupyterlab:m1-image
  dask_worker: quay.io/nebari/nebari-dask-worker:m1-image

jhub_apps:
  enabled: true

argo_workflows:
  enabled: false

conda_store:
  image: quay.io/aktech/conda-store-server
  image_tag: sha-558beb8
theme:
  jupyterhub:
    hub_title: Nebari - local-dev
    welcome: Welcome! Learn about Nebari's features and configurations in <a href="https://www.nebari.dev/docs/welcome">the
      documentation</a>. If you have any questions or feedback, reach the team on
      <a href="https://www.nebari.dev/docs/community#getting-support">Nebari's support
      forums</a>.
    hub_subtitle: Your open source data science platform, hosted
certificate:
  type: lets-encrypt
  acme_email: <SANITIZED>
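The `default_images` section of the config above pins three images to the `m1-image` tag. As a rough illustration of that mapping, the sketch below rebuilds it programmatically. The helper name `arm_image_overrides` is hypothetical and not part of Nebari; the image names and registry are taken from the config above.

```python
def arm_image_overrides(tag: str = "m1-image", registry: str = "quay.io/nebari") -> dict:
    """Build a `default_images` mapping that pins each image to a single tag."""
    # Config key -> image name, as they appear in the config shared in this thread.
    images = {
        "jupyterhub": "nebari-jupyterhub",
        "jupyterlab": "nebari-jupyterlab",
        "dask_worker": "nebari-dask-worker",
    }
    return {
        "default_images": {
            key: f"{registry}/{name}:{tag}" for key, name in images.items()
        }
    }
```

The returned dictionary has the same shape as the YAML section, so it could be merged into a loaded config before writing it back out.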

Relevant issues:

Relevant PRs:

@nkaretnikov

@aktech Is your M1 a laptop or a mini/server? How stable is it when you're running it? Is it useable at all? I have a 10-core M1 laptop with 32 gigs of RAM. Using the new docs/config above, I managed to deploy it by putting the domain in the hosts file (172.18.1.100 nebari.example.com) and installing/enabling docker-mac-net-connect. Then the laptop fans started spinning very fast and the cluster crashed and never recovered, which was after starting JupyterLab and attempting to build a single empty conda-store env. So it looks like it has limited usefulness on a laptop unless there's a way to reduce resource usage of the K8s cluster. Thoughts?
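The hosts-file step mentioned above can be checked with a few lines of Python. This is a minimal sketch; `has_hosts_entry` is a hypothetical helper, and the IP and domain are the example values from the comment.

```python
def has_hosts_entry(hosts_text: str, ip: str, hostname: str) -> bool:
    """Return True if `hosts_text` maps `hostname` to `ip` on a non-comment line."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments, then split into the address and its hostnames.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return True
    return False
```

A typical call would read the real file first, for example `has_hosts_entry(open("/etc/hosts").read(), "172.18.1.100", "nebari.example.com")`.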

@marcelovilla
Member

@nkaretnikov I just tried Amit's config on an 8-core M1 with 32GB and was able to spin up a small Jupyter server and create a simple conda environment with numpy in it. I think the local cluster already has some memory restrictions, because I was not able to spin up a medium-size server.


@nkaretnikov

OK, turns out there are a bunch of issues on Mac with Docker. So I think the stability/crashes/disconnects were solved by me updating everything to latest, but I also didn't test extensively:

  • I had to update Docker Desktop
  • I had to install K8s via Docker Desktop instead of using minikube with brew (it can be done via the settings in the UI)
  • I had to update macOS to latest because there were crashes in Virtual Machine Service, which are related to Linux kernel support. Before that, I also saw high CPU usage by this service when running Docker
  • I had to create this symlink because Docker no longer creates it and terraform cannot find the socket otherwise:
sudo ln -s "$HOME/.docker/run/docker.sock" /var/run/docker.sock
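Before creating the symlink, it can help to see which socket path actually exists. The sketch below is an illustration only: `find_docker_socket` is a hypothetical helper, and the two candidate paths are the ones from the command above.

```python
from pathlib import Path
from typing import Optional


def find_docker_socket(
    home: Path, system_path: Path = Path("/var/run/docker.sock")
) -> Optional[Path]:
    """Return the first Docker socket path that exists, or None if neither does."""
    candidates = [system_path, home / ".docker" / "run" / "docker.sock"]
    for candidate in candidates:
        # Path.exists() follows symlinks, so a dangling symlink counts as missing.
        if candidate.exists():
            return candidate
    return None
```

If `find_docker_socket(Path.home())` returns the path under `~/.docker/run`, the system path is missing and the symlink from the list above is likely needed.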

However, I still cannot get argo workflows working. These are disabled in the config provided by Amit. I needed them for my work, so I set them to true, but then I saw that the nebari-workflow-controller was in the failed state trying to pull the image, so I attempted to do (this image is available on quay):

default_images:
...
  nebari_workflow_controller: quay.io/nebari/nebari-workflow-controller:m1-image

But it's not possible to set this image via the config because the key is not supported. So I patched this Python file instead:

./lib/python3.10/site-packages/_nebari/constants.py:

DEFAULT_NEBARI_WORKFLOW_CONTROLLER_IMAGE_TAG = "m1-image"

This seemed to solve the problem with the image pull, but the workflows still immediately fail. I did check a bunch of logs in various pods, but I didn't find anything. Not sure where it's supposed to be logged if at all. It's still possible that the change above is not sufficient to use this image correctly. And yes, I did add relevant permissions for my user to be able to use argo in Keycloak.

If someone can get argo workflows working (scheduling a notebook via the JLab UI), let me know. But for now, I'm going to use the cloud again because it's not clear how much more work is needed here.

@aktech
Member

aktech commented May 29, 2024

> @aktech Is your M1 a laptop or a mini/server? How stable is it when you're running it? Is it useable at all? I have a 10-core M1 laptop with 32 gigs of RAM. Using the new docs/config above, I managed to deploy it by putting the domain in the hosts file (172.18.1.100 nebari.example.com) and installing/enabling docker-mac-net-connect. Then the laptop fans started spinning very fast and the cluster crashed and never recovered, which was after starting JupyterLab and attempting to build a single empty conda-store env. So it looks like it has limited usefulness on a laptop unless there's a way to reduce resource usage of the K8s cluster. Thoughts?

I have an M1 Pro laptop (32GB, 10 cores). It is very stable for me. I have a conda environment built with the packages required to create apps. I have never had a cluster crash. Do you have anything else running other than the kind cluster?

I am using Docker Desktop 4.30.0 (149282).

@nkaretnikov

@aktech Yeah, ignore the crash issue. I think it was resolved by updating everything to latest. I'm now more interested in how to get argo-workflows working locally. Have you tried that? They are disabled in your original config.

@aktech
Member

aktech commented May 29, 2024

> However, I still cannot get argo workflows working. These are disabled in the config provided by Amit. I needed them for my work, so I set them to true, but then I saw that the nebari-workflow-controller was in the failed state trying to pull the image, so I attempted to do (this image is available on quay):

I should have mentioned that I didn't need Argo, so I didn't bother adding it to the config.

@aktech
Member

aktech commented May 29, 2024

> @aktech Yeah, ignore the crash issue. I think it was resolved by updating everything to latest. I'm now more interested in how to get argo-workflows working locally. Have you tried that? They are disabled in your original config.

I can give it a try, let me try the m1-image you shared above.

Also, apologies for the late reply.

@aktech
Member

aktech commented May 29, 2024

I deployed Argo with the latest image, 2024.6.1rc3, and it deployed successfully. The notebook scheduling didn't work. It looks like TLS verification failed while connecting to the API, which is expected since I am running it locally without proper certs. We would probably need a way to toggle TLS verification to avoid this. Is this what you saw as well?


@nkaretnikov

I cannot recall at this point, but I agree that fixing the cert issue would be a good first step to resolving this.

@krassowski
Member

I see the exact same certificate error, cf. nebari-dev/argo-jupyter-scheduler#10 (comment). I tried to disable SSL verification, but that only got me halfway there (the error message changed but did not disappear).

If anyone wants to pick this up: nebari-dev/argo-jupyter-scheduler@main...krassowski:argo-jupyter-scheduler:km-papermill-error-2

@aktech
Member

aktech commented May 29, 2024

That's a good starting point, thanks @krassowski. Do you recall what the new error message was?

@krassowski
Member

See nebari-dev/argo-jupyter-scheduler#10 (comment). Maybe it is my issue and I have something misconfigured, so the changes linked above may be sufficient. Either way, we need to make sure they are only applied when running a local deployment, possibly controlled by an environment variable.
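The idea of gating TLS verification behind an environment variable could look roughly like the sketch below. It is not the change from the linked branch: the variable name `LOCAL_DEPLOYMENT_SKIP_TLS_VERIFY` and the helper `build_ssl_context` are assumptions made up for illustration, and only the standard-library `ssl` module is used.

```python
import os
import ssl
from typing import Mapping


def build_ssl_context(environ: Mapping[str, str] = os.environ) -> ssl.SSLContext:
    """Return a verifying SSL context unless the (hypothetical) opt-out variable is set."""
    context = ssl.create_default_context()
    flag = environ.get("LOCAL_DEPLOYMENT_SKIP_TLS_VERIFY", "").lower()
    if flag in {"1", "true", "yes"}:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Verification stays on by default, so a production deployment that never sets the variable is unaffected.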

@aktech
Member

aktech commented May 30, 2024

I have got the notebook scheduler working after setting up real, valid certs in my local deployment. I can confirm it's certainly an SSL issue.

[Screenshot: completed pods]

[Screenshot: completed job]

@aktech
Member

aktech commented May 30, 2024

I shared this in Slack; I think it might be useful for others as well:

Local deployment with Let's Encrypt certs (without certs, some things do not work out of the box, such as the Argo notebook scheduler):

For anyone interested: you can set up Let's Encrypt certs with a DNS challenge locally (if you have a domain or can use a subdomain in Quansight domains). This verifies that you control the domain by using an API token from your domain provider, which you supply. I think it's definitely worth setting up to get a full production environment locally. Here is the branch for setting it up, if anyone wants to try: https://github.com/nebari-dev/nebari/compare/dns-challenge?expand=1 (You need to create an API token from Cloudflare for the domain you're using.)

We should consider adding this support natively. It's especially useful for Nebari deployments with non-public IP addresses (such as inside a private network).
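For context on what the DNS challenge mentioned above involves: in the ACME DNS-01 challenge (RFC 8555), the client publishes a TXT record under `_acme-challenge.<domain>` whose value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The sketch below only computes that record; `dns01_txt_record` is a hypothetical helper and is not taken from the linked branch, where an ACME client would normally handle this step using the provider's API token.

```python
import base64
import hashlib
from typing import Tuple


def dns01_txt_record(domain: str, key_authorization: str) -> Tuple[str, str]:
    """Return the (record name, TXT value) pair for an ACME DNS-01 challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without '=' padding, as the ACME specification requires.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # Wildcard certificates are validated on the base domain.
    base_domain = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base_domain}", value
```

Because validation happens through DNS rather than an inbound HTTP request, this challenge type works for hosts that are not reachable from the public internet.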


@marcelovilla
Member

Thanks to @aktech we can now deploy on Mac! Should we keep this issue open, or close it for now and open a specific one for Windows?

@kcpevey
Contributor

kcpevey commented Aug 1, 2024

@marcelovilla at this point, is there any special setup required to deploy locally on Mac?

@marcelovilla
Member

@kcpevey there are two things that need to be configured:

  1. We need to use a specific ARM image for Keycloak
  2. We need to set up https://github.com/chipmk/docker-mac-net-connect so the container IPs are exposed.

There's a PR outlining these two steps.
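Since the ARM image is only relevant on Apple Silicon, a setup script could detect the platform first. This is a minimal sketch using the standard-library `platform` module; `is_apple_silicon` is a hypothetical helper, and note that a Python interpreter running under Rosetta may report `x86_64` instead of `arm64`.

```python
import platform
from typing import Optional


def is_apple_silicon(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True when running on macOS with an arm64 CPU."""
    # Allow overrides so the check can be exercised on any machine.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"
```

Calling `is_apple_silicon()` with no arguments inspects the current machine; passing explicit values is only meant for testing.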

I'm happy to help you deploy it locally if you ever need to.

@marcelovilla marcelovilla changed the title [ENH] - Add local support for Mac & Windows [ENH] - Add local support for Mac Aug 2, 2024
@marcelovilla
Member

I edited the title of this issue so it's only about local support for Mac. We can open another specific issue for Windows when/if we think it's relevant.

@kcpevey
Contributor

kcpevey commented Aug 2, 2024

Thanks @marcelovilla I was just wanting to make sure we had documentation on the "how", so it looks like that's all taken care of :)

The docs are now published here.
