# RStudio in Alliance Canada
*Note 1*: This describes an experimental use of RStudio for HPC. This approach
has no official support from the Alliance, and responsibility rests solely with
the user. An official, supported R environment is provided by the Alliance at
[this website](https://docs.alliancecan.ca/wiki/Technical_documentation).

*Note 2*: This approach is aimed at single-node processes. If you want to
explore (or need) distributed processing, we recommend using the supported
version and speaking with user groups and Alliance support to help design your routines.
## Introduction
In this approach, we use a pre-established [Rocker](https://rocker-project.org/)
Docker container, which we extend to our needs. Binary versions of the
container are available on [Docker Hub](https://hub.docker.com/repository/docker/ricardobarroslourenco/rstudio_container/general), with the releases and their code available on [GitHub](https://github.com/MacRemoteSensing/rstudio_container/releases).
## Running an Apptainer HPC Container as an interactive session
Before you can run RStudio, you need a suitable container image, either one
already built as an Apptainer (Singularity) container or one you build yourself.
_From the cluster login node_ you can build it with the following steps (it is assumed that you are using [bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) as a regular user - no sudo needed):
**1. Load Apptainer**:
```{bash, eval=FALSE}
module load apptainer
```
Pick the most recent version available (please note that Docker Hub may have
different tags, one for each release), and accept.
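If you are unsure which Apptainer versions are installed on the cluster, you can list them first. This is a minimal sketch using the standard Lmod commands available on Alliance clusters; the version number in the second command is only illustrative:

```{bash, eval=FALSE}
# List the Apptainer modules available on the cluster
module spider apptainer

# Optionally load a specific version instead of the default
# (replace with a version reported by the command above)
module load apptainer/1.2
```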
**2. Then run the build** (run this command in your _home_, _project_, or _scratch_ folder):
```{bash, eval=FALSE}
apptainer build rstudio_container_v1_0.sif docker://ricardobarroslourenco/rstudio_container:v1_0
```
At the end, you should have a file named _rstudio_container_v1_0.sif_ in the
directory where you ran the command. This is your Apptainer container.
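Optionally, you can verify the image before moving on. A minimal sketch; `apptainer inspect` is part of the standard Apptainer CLI and prints the container's metadata:

```{bash, eval=FALSE}
# Confirm the image file exists and check its size
ls -lh rstudio_container_v1_0.sif

# Print the container's metadata (labels, build date, etc.)
apptainer inspect rstudio_container_v1_0.sif
```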
**3. Create the transient folders and files needed to run your container** (these should be hosted in the same location as your container). Create the directories:
```{bash, eval=FALSE}
mkdir -p run var-lib-rstudio-server .rstudio
```
Then create the SQLite configuration file:
```{bash, eval=FALSE}
printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > database.conf
```
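As an optional sanity check before requesting a job, you can confirm that everything is in place (paths are relative to the folder holding your container):

```{bash, eval=FALSE}
# The three transient directories and the image should all be listed
ls -ld run var-lib-rstudio-server .rstudio rstudio_container_v1_0.sif

# The configuration file should contain exactly the two lines written above
cat database.conf
```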
**4. Start an Interactive Job on the Supercomputer**
```{bash, eval=FALSE}
salloc --time=0:30:0 --account=rrg-ggalex --cpus-per-task=8 --mem=0 [email protected] --mail-type=ALL
```
The flags are defined as follows (more details are available in the Alliance documentation, as well as in the Slurm manual):
- *time*: duration of the interactive session in HH:MM:SS;
- *account*: the PI's account for the Alliance Canada project;
- *cpus-per-task*: number of CPUs for this interactive session, capped at the node's limit;
- *mem*: amount of RAM to allocate (zero means all of the node's memory);
- *mail-user*: e-mail address that receives updates on the job allocation;
- *mail-type*: verbosity of the e-mails.
After running this, you will probably see your bash prompt change to something like this (on the Graham cluster):
```{bash, eval=FALSE}
(base) [lourenco@gra-login1 scratch]$ salloc --time=0:30:0 --account=rrg-ggalex --cpus-per-task=8 --mem=0 [email protected] --mail-type=ALL
salloc: Pending job allocation 6806097
salloc: job 6806097 queued and waiting for resources
salloc: job 6806097 has been allocated resources
salloc: Granted job allocation 6806097
salloc: Waiting for resource configuration
salloc: Nodes gra189 are ready for job
(base) [lourenco@gra189 scratch]$
```
Please note that **gra189** is, in this example, the name of the compute node assigned to you. Remember it, as you will need it later for your SSH tunnel.
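If you lose track of the node name, you can look it up from any shell on the cluster with standard Slurm commands; the sketch below simply lists your current jobs, and the NODELIST column shows the assigned node (e.g. gra189):

```{bash, eval=FALSE}
# List your running and pending jobs; NODELIST(REASON) shows the allocated node
squeue -u $USER
```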
**5. Start the container on the newly allocated compute node**
```{bash, eval=FALSE}
apptainer exec \
  --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
  rstudio_container_v1_0.sif \
  /usr/lib/rstudio-server/bin/rserver \
    --auth-none=1 \
    --auth-pam-helper-path=pam-helper \
    --server-user=$(whoami) \
    --www-port=8787
```
This will launch an RStudio server as an Apptainer (Singularity) service. If you need it, bash is
available as a terminal tab inside that RStudio session. To access the instance, you need
to create an SSH tunnel that forwards the port from the compute node to your local machine. For that, **you must create a new, separate SSH connection and keep it open** (without
running any job on it). You can do it like this (similar to what we described in the SSH section of CCPrimer, with a twist):
```{bash, eval=FALSE}
ssh -i ~/computecanada/computecanada-key -L 9102:graXXX:8787 [email protected]
```
Some details:
- *-i ~/computecanada/computecanada-key*: the *-i* flag points to your SSH key file;
- *-L 9102:graXXX:8787*: the *SSH tunnel* itself. It forwards port 8787 on the node to port 9102 on your localhost. Note that **you must replace graXXX with the name of the node assigned to you** after running *salloc*;
- *[email protected]*: your username, followed by the cluster URL. Note that our lab's research allocation is currently available on Graham.
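As a variant, if you only want the port forwarding and do not need an interactive shell on the login node, the standard `-N` flag of `ssh` keeps the tunnel open without running a remote command. A minimal sketch, using the same key path and node placeholder as above:

```{bash, eval=FALSE}
# -N: do not execute a remote command; just keep the port forwarding alive
ssh -N -i ~/computecanada/computecanada-key -L 9102:graXXX:8787 [email protected]
```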
After this, you should be able to access the RStudio server by opening this URL
in your browser:
```{bash, eval=FALSE}
localhost:9102
```
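Before opening the browser, you can optionally check from your local machine that the tunnel is up. A minimal sketch using `curl`; any HTTP status code in the response means the RStudio server is reachable through the tunnel:

```{bash, eval=FALSE}
# Fetch the page and print only the HTTP status code (e.g. 200 or a redirect)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9102
```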
## Finishing your running session
Do not forget to save your work to locations that persist (e.g. `/home`, `/project`, `/scratch`), because **anything saved in a folder that exists only inside the container will be lost** after your session ends. Remember that `/scratch` is meant for temporary storage and is automatically wiped by the cluster systems after 30 days.

Once your work is saved, end your running sessions: press `Ctrl + C` in the terminal running the Apptainer command, then call `exit` in bash until the `salloc` session ends, and once more to relinquish the SSH session (remember to do this both in the shell that runs the job and in the SSH tunnel). At the end you should only see the bash prompt of your local machine. This procedure is important, because otherwise the node may keep running and consuming the group's Alliance Canada allocation. Once the Slurm `salloc` session finishes you will receive an e-mail (as you did when the session started) telling you that the session has finished (or was relinquished, if your running time expired).
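Finally, it is worth confirming from a login node that no allocation was left behind. A minimal sketch using standard Slurm commands (the job ID is only illustrative, taken from the example above):

```{bash, eval=FALSE}
# Should show no remaining jobs once everything has been relinquished
squeue -u $USER

# If a job is still listed, cancel it explicitly by its job ID
scancel 6806097
```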