Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RStudio not inheriting gh-scoped-creds env vars #5551

Closed
ryanlovett opened this issue Feb 20, 2024 · 11 comments · Fixed by #5566
Closed

RStudio not inheriting gh-scoped-creds env vars #5551

ryanlovett opened this issue Feb 20, 2024 · 11 comments · Fixed by #5566

Comments

@ryanlovett
Copy link
Collaborator

ryanlovett commented Feb 20, 2024

Originally posted by @cboettig in #5515 (comment)

@balajialg ah, ok we did discover one wrinkle here. When students are using the RStudio interface, the GH_SCOPED_CREDS_CLIENT_ID environmental variable and GH_SCOPED_CREDS_APP_URL aren't visible on the terminal or the R session because RStudio insists on ignoring global env vars unless they are echoed into Renviron (e.g. into $R_HOME/etc/Renviron.site). Is there a way to do this?

@ryanlovett
Copy link
Collaborator Author

Here is one general approach without making it specific to gh-scoped-creds:

  1. Create a configmap from the gh-scoped-creds vars
  2. Mount the configmap into the singleuser pod at somewhere like /etc/Renviron.site.d/gh-scoped-creds.env
  3. In Renviron.site, run something like:
# Get a list of files in the directory
env_dir <- "/etc/Renviron.site.d/"
env_files <- list.files(env_dir, full.names = TRUE)

# Iterate over each file
for (env_file in env_files) {
    cat("Loading environment variables from:", env_file, "\n")
    
    # Read and parse the environment file
    readRenviron(env_file)
}

This is code in Renviron.site, but it doesn't violate the spirit of the file I think. This way we could add any number of different env vars without having to alter the setup code each time.

I'll think about it a little more though. The configmap would be redundant with the extra env vars, so maybe there's a way to consolidate.

@ryanlovett
Copy link
Collaborator Author

More spitballing...

In hub/templates/gh-scoped-creds.yaml we'd have:

{{ if .Values.ghScopedCreds.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: gh-scoped-creds
data:
  GH_SCOPED_CREDS_CLIENT_ID: {{ .Values.ghScopedCreds.clientID | quote }}
  GH_SCOPED_CREDS_APP_URL: {{ .Values.ghScopedCreds.appURL  | quote }}
{{ end }}

This is similar to how we define NFS server config per hub.

We'd set the default in hub/values.yaml:

ghScopedCreds:
  enabled: false

And we'd prepare hub-specific config at deployments/deployment_name/config/common.yaml:

ghScopedCreds:
  enabled: true
  clientID: ...client ID...
  appURL: ...app url...

Then in singleuser.extraEnv we'd export the configmap as env vars:

    - name: GH_SCOPED_CREDS_CLIENT_ID
      valueFrom:
        configMapKeyRef:
          name: gh-scoped-creds
          key: GH_SCOPED_CREDS_CLIENT_ID
    - name: GH_SCOPED_CREDS_APP_URL
      valueFrom:
        configMapKeyRef:
          name: gh-scoped-creds
          key: GH_SCOPED_CREDS_APP_URL

And in singleuser we make the configmap available as a file:

  extraVolumes:
    - name: gh-scoped-creds
      configMap:
        name: gh-scoped-creds
  extraVolumeMounts:
    - name: gh-scoped-creds
      mountPath: /etc/R/Renviron.d/gh-scoped-creds.env
      subPath: gh-scoped-creds.env
      readOnly: true

This lets us set the gh-scoped-creds key/values once in per-deployment config, and it will both export them in the environment, and make the vars available via file for RStudio to pick up. All at the expense of more yaml up front. The alternative is to add the env vars within files to the singleuser image, which involves rebuilds for small changes.

I wonder if we can sweep up the latter two sections above into the template or in some other conditional way, since they'd be boilerplate whenever ghScopedCreds is enabled.

@shaneknapp Thoughts? I'll test this with manual hubploys before I make a PR since I've never created a template before. I'll need to modify /etc/R/Renviron.site with a separate PR as described above.

@balajialg Once I'm confident that the env vars are being passed to RStudio correctly, I'll make a PR that defines the variables for dlab.

@ryanlovett
Copy link
Collaborator Author

The alternative is to add the env vars within files to the singleuser image, which involves rebuilds for small changes.

Another alternative is to use singleuser.extraFiles. This is simpler, but we'd have to set the variables twice: once in extraEnv and once in extraFiles. (along with the code addition to Renviron.site)

singleuser:
  extraFiles:
    gh-scoped-creds:
      mountPath: /etc/R/Renviron.d/gh-scoped-creds.r
      stringData: |
        GH_SCOPED_CREDS_APP_URL=...
        GH_SCOPED_CREDS_CLIENT_ID=...
      mode: 0644

I have a feeling this is the right choice, although the configmap method would be interesting if it could be streamlined.

ryanlovett added a commit to ryanlovett/datahub that referenced this issue Feb 20, 2024
This should enable R and RStudio to read dynamically created files. Part of berkeley-dsep-infra#5551.
@cboettig
Copy link

@ryanlovett my post-doc suggested to me -- would it be possible to install gh command line utility (e.g. via apt-get)? It seems to handle device authentication in much the same way, but would sidestep the need to propagate the GH_SCOPED_CREDS_CLIENT_ID I think? Maybe it is not as preferable, though either option seems better to me than having students manually generate and manage tokens.

@ryanlovett
Copy link
Collaborator Author

@cboettig The GH_SCOPED_CREDS_ vars are now available on dlab hub from within RStudio. I need to clean up some setup variables, but let me know how gh-scoped-creds works for you.

@balajialg
Copy link
Contributor

balajialg commented Feb 22, 2024

@ryanlovett I tried testing gh-scoped-creds from R Studio's terminal in D Lab hub, I entered gh-scoped-creds in R Studio's terminal (as per instructions in https://github.com/jupyterhub/gh-scoped-creds) and got the following error message

(base) jovyan@jupyter-balajialwar:~$ gh-scoped-creds
Traceback (most recent call last):
  File "/srv/conda/bin/gh-scoped-creds", line 8, in <module>
    sys.exit(main())
  File "/srv/conda/lib/python3.9/site-packages/gh_scoped_creds/__init__.py", line 90, in main
    access_token, expires_in = do_authenticate_device_flow(args.client_id, in_jupyter)
  File "/srv/conda/lib/python3.9/site-packages/gh_scoped_creds/__init__.py", line 31, in do_authenticate_device_flow
    raise ValueError(f"Got error response from GitHub: {verification_resp}")
ValueError: Got error response from GitHub: {'error': 'Not Found'}

I invoked gh-scoped-creds in lab and it seems to work fine (ref: snapshot). What am I doing wrong while invoking gh-scoped-creds in R Studio?

import gh_scoped_creds
%ghscopedcreds

image

@ryanlovett
Copy link
Collaborator Author

The client ID and app URL are reversed. Preparing a fix...

@ryanlovett
Copy link
Collaborator Author

@balajialg and I discovered over slack that any open RStudio Terminals will need to be restarted. This is true even if RStudio was launched in a previous jupyterhub session.

@cboettig
Copy link

looks good for me too on dlab-staging 🎉

Will this make it to the main r.datahub / datahub?

btw, Yuvi and I dug into the official gh client a bit more, the authentication model looks a lot looser (credentials are long-lived and grant access to all user and org repos, even though user can set scopes). Some discussion in cli/cli#5924. Out of my expertise here but gh-scoped-creds seems to have a substantially better security profile for use on JupyterHubs.

@ryanlovett
Copy link
Collaborator Author

@cboettig Okay, great. Yes, we can make the vars available on the other hubs in the same way. I need to add this config setup to our docs too.

@ryanlovett
Copy link
Collaborator Author

@cboettig When it comes to the other hubs which serve larger communities, there may need to be more than one pair of variables set. We could set the env vars based on course enrollments, although even then there could be conflicts. It isn't a problem now because most courses/communities don't take advantage of the tool, but we'll need to sort out how to support it before there is a conflict.

ryanlovett added a commit to ryanlovett/datahub that referenced this issue Feb 23, 2024
This should enable R and RStudio to read dynamically created files. Part of berkeley-dsep-infra#5551.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants