Turing joining the Binder Federation: Part 2! #1154
Comments
For the subdomain: create a new issue in the team-compass repo (like jupyterhub/team-compass#203, we don't have a template/procedure for this yet :-/). To actually execute the change we need Chris or Min. With the issue we can create a paper trail and officially decide to add the subdomain. You will also need a domain for the jupyterhub, do you want that as hub.turing.mybinder.org (GKE style) or will the Turing hub have its own domain (OVH style)? Something to discuss in the subdomain issue. Deployment: mirroring the OVH setup would be the way I'd go. So a new |
Cool, will open the issue. I think hub.turing.mybinder.org will be fine. Done in jupyterhub/team-compass#205
Sure, I'll try and do that at some point today. |
If you need a domain to test the setup with before we have the "final" details for the cluster, let me know and I can assign a throwaway subdomain from wtte.ch. It could be convenient to have a domain that can be updated more quickly than mybinder.org, which requires someone in a different timezone. Or you register your own domain to host throwaway stuff :D |
Keybase account created and verified with GitHub 👍 |
Update:
|
Service Principal received! Will deploy the cluster soon. |
What is a resource group? Is it an Azure name for a Kubernetes concept (namespaces)? Or an Azure cloud thing? Completely selfish suggestion: do you have time for a tour of (very!) basic Azure stuff during the team meeting? I'd reciprocate with a tour of the Google Cloud UI, buttons and CLI commands. |
It's an Azure cloud thing: a way of grouping resources (compute, storage, network, etc). |
Yes, a Resource Group is just a label. Computationally it means nothing, but it allows you to group together resources that are related. (Here, "related" means that I, as a human, know that these things are being used for the same conceptual project.) Yes, I'm happy to give a tour during the team meeting, we could maybe do a specific zoom call for this too so there's more time for questions? |
(Just a small note to say 😻 😻 😻) Should we add notes about comms etc to this issue? Or keep this technical and make a new issue to drum up lots of excitement 😉 ? |
Thank you! 💖 I think keep this one technical and a second one for comms 😄 |
A new meeting would be nice but also tricky because we'd have to find a timeslot for it. Depending on how much is on the agenda for the next meeting, I'd be happy to spend 20-30min of the meeting to listen and ask a few questions. When I wrote my earlier comment I was thinking of watching you set up a Kubernetes cluster, install something on it, look at the logs, do something else, done. Something to take away the feeling of "oh wow, so many buttons and it all has different names to Google Cloud. Ok maybe I need to block off a few hours to just figure out where I am." New issue for comms sounds good. |
It's a lot of work and requires self-confidence, but if you're up for it you could record a screencast on your own and upload it to e.g. youtube? Could also be linked from the docs. |
We could have a 1-to-1 zoom call if you wanted, I may also be able to do a screencast at some point. But tbh, my usual Azure workflow is having the Azure CLI installed locally and running stuff from my terminal. Deploying the k8s cluster will be very similar to the JupyterHub docs, but I'll probably do it with autoscaling. There's also the docs I (try to) keep updated in the hub23-deploy repo. I spend more time looking at |
Ok, that is already super useful ("using CLI most of the time, hardly ever click"). Let's see how we are doing for time at the meeting and if there is interest but I'd be happy to show people around https://console.cloud.google.com/. Back to discussing "Turing joins the federation" :D |
Last comment on this topic is that I added it to the agenda for the meeting 🎉 Back to the proper topic! Turing is switching its subscription backend (which matters more if you're interested in billing than in interacting with resources), so I think I will migrate the subscription before deploying the cluster. It's quite a lengthy process (it took 6 hours to migrate a single VM 😱), so doing that before we get a load of resources set up will probably be easier. |
I finally managed to deploy a cluster! I'm going to do some tests with a basic BinderHub set-up before I properly integrate it. Lots of stuff has been migrating on Azure recently so I want to check it all still works. |
I'm experimenting with multiple nodepools: https://docs.microsoft.com/en-gb/azure/aks/use-multiple-node-pools Other useful docs: https://docs.microsoft.com/en-us/cli/azure/ext/aks-preview/aks/nodepool?view=azure-cli-latest |
Where I'm currently at with the Turing federation cluster. Running
Which means it's looking for:
in What is that and how do I get one? |
For that matter, how come we have matomo as a top level key but it's not listed in the chart requirements? Where does this dependency come from? |
Hmmm, I believe that Matomo was planned to be used instead of Google Analytics (maybe @yuvipanda set it up?) but I don't believe we are actively deploying it...somebody correct me if I'm wrong! |
It comes from https://github.com/jupyterhub/mybinder.org-deploy/tree/master/mybinder/templates/matomo. Along with all the custom stuff in https://github.com/jupyterhub/mybinder.org-deploy/tree/master/mybinder/templates. We do have it deployed (https://mybinder.org/matomo/index.php) and collecting data. I was hoping to remove Google Analytics to give our users more privacy (See #725 for more info). I'm not super involved anymore, so I understand if folks wanna remove it and keep a hard dependency on Google Analytics instead. |
Thanks everyone! I don't mind if we keep it or scrap it, but I need to know how to set it up for the Turing cluster so I can remove it as a blocker. I'm going to try generating an auth_token here and see if that's enough. |
So one thing that seems to work was just leaving the I'm now very close to having BinderHub installed on the Turing cluster, except |
Actually, all the pods are running except for the binder pod itself. Binder pod
|
I think if we have matomo, we can just run it on the main cluster instead of doing that per cluster. Similar to our analytics stuff. How does that feel? |
@yuvipanda This sounds perfect! I do think we need to have a refactor of the configs (as per the discussion here) so that GKE-specific stuff doesn't present a blocker to other new federation members. I'd like someone who's a bit more familiar with what's what in all the various yaml files to help me on that though. So I don't break anything! 😂 |
The plan looks good. Agree that we want to keep the domains separate. I'd get the PR merged and cluster running, then slowly step up the The thing I'd look out for is errors related to the container registry as the traffic increases. |
What version of BinderHub is running on https://turing.mybinder.org/? It doesn't look like the latest. |
Looks like outbound egress isn't restricted to these ports: mybinder.org-deploy/mybinder/values.yaml Lines 30 to 40 in 105c474
|
#1203 is the PR with config from which turing is deployed (manually). |
@manics I think this deserves its own issue as that wasn't part of the config that I edited and is, therefore, perhaps a problem across all clusters? |
I'm just attempting
|
The NetworkPolicy was added last year #699 so it should be included. |
Exactly, if it was added last year and is not effective then that's a separate issue to me incorporating the Turing into the federation? Or are you saying that they're not working on the Turing cluster but are on others? (I've just managed to set the Turing cluster on fire so can't test this right now.) |
@betatim no I've never set up grafana before, how do I go about retrieving it? |
Current bugs:
|
I solved this by re-deploying with new A records and new secrets.
I'm not sure if this is happening because the WiFi at the Turing is terrible this week (we're running a data study group and have a lot of people here using interwebs), it seems pretty variable as to whether the grafana pods switching over causes
@manics The hub is now at newhub.hub23.turing.ac.uk and the certificates should now be real. Can we check again if this is still an issue? If so, what do we need to do to solve this? |
I can still ssh out of https://newbinder.hub23.turing.ac.uk/
|
Looks like the policies are created, next thing is to verify that the cluster implements them. |
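One way to run that check from inside a user pod is a plain TCP probe. This is a minimal sketch and not part of the mybinder.org-deploy config: `can_connect` is a hypothetical helper, and the host and port in the usage note are placeholders (port 22 is picked only because ssh was the symptom reported above).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Run from a terminal in a launched Binder session, `can_connect("example.org", 22)` returning `True` would suggest egress on that port is not being blocked. A `False` on its own is ambiguous, because a closed remote port looks the same as a dropped packet, so it helps to probe a host known to accept connections on that port. |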
I will try and get hold of @trallard today |
I think those are security rules which are independent from the K8s rules. It's the equivalent of a "physical" firewall operating at the network level. Then the K8s network policies are in addition to these, and they're implemented at the software level inside each Kubernetes VM. Either can be used to restrict network traffic, but obviously only the K8s network policies will be managed through the helm chart deployment. |
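To make the layering concrete, here is a toy model, assuming each layer can be reduced to a set of allowed outbound ports. The function name and the port numbers are made up for illustration and are not taken from the Turing cluster or the helm chart.

```python
def egress_allowed(port: int, network_rules: set, k8s_policy: set) -> bool:
    """Traffic leaves only if both layers permit the destination port."""
    return port in network_rules and port in k8s_policy


# Hypothetical rule sets: the network-level security rules allow ssh,
# while the Kubernetes NetworkPolicy is stricter and does not.
NETWORK_RULES = {22, 80, 443}
K8S_POLICY = {80, 443}
```

With these made-up sets, `egress_allowed(22, NETWORK_RULES, K8S_POLICY)` is `False`: the stricter layer wins, whichever one it is. |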
I think this IP address is now blocked as expected: 51.124.8.42 I may have to tear the Turing cluster down and redeploy with a vnet. |
Redeployed cluster with a virtual network to solve the unrestricted pod issue. Currently re-installing BinderHub. |
Meant to post this comment here: #1203 (comment) Currently having certmanager issues. I get a certificate for the hub but not the binder page. |
Using debug commands found here, I learned that the cluster has no challenge resources. The cert-manager issue could be related to cert-manager/cert-manager#1745. The cert-manager logs show the following message:
I can't find anything called |
Fixed the naming issue. Still no luck with Let's Encrypt though. Currently have a solver issue. |
Got Let's Encrypt working!!!! 🎉 🎉 🎉 |
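For anyone repeating this, a quick way to confirm from outside the cluster that a host serves a publicly trusted certificate is to let Python's `ssl` module do the verification. This is a sketch under assumptions: `fetch_cert` and `issuer_field` are hypothetical helper names, and the hostname you pass in is up to you.

```python
import socket
import ssl


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a verified TLS connection and return the peer certificate as a dict."""
    ctx = ssl.create_default_context()  # verifies against the system CA store
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_field(cert: dict, field: str = "organizationName"):
    """Pull one field out of the nested 'issuer' tuples, or None if absent."""
    for rdn in cert.get("issuer", ()):
        for name, value in rdn:
            if name == field:
                return value
    return None
```

`fetch_cert` raises `ssl.SSLCertVerificationError` if the certificate is not trusted by the system CA store (for example a self-signed one). Otherwise `issuer_field(fetch_cert(host))` returns the issuer's organisation name, which can be compared by eye against what you expect. |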
Clarifying what I need to do for grafana:
|
MERGED! |
The proposal we wrote in #1124 was accepted! We now have an Azure subscription with $10k to deploy a cluster on to 🎉 So this issue is documenting the next steps we'll be taking.
TODOs
I'm going to try and keep the naming conventions similar between the Azure and GKE clusters where possible.
Open Questions
I'll keep this updated as more things occur to me 😄
cc: @KirstieJane