---
title: "HPC User Guide"
author: "Patrick Schratz, Jannes Muenchow"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
documentclass: book
bibliography: [book.bib]
biblio-style: apalike
link-citations: yes
# github-repo: rstudio/bookdown-demo
# description: "This is a minimal example of using the bookdown package to write a book. The output format for this example is bookdown::gitbook."
---
# Introduction {#intro}
Welcome to the user manual of the High-Performance Cluster (HPC) of the GIScience group (University of Jena).
It is tailored towards R processing but can also be used with any other programming language.
**A short introduction to HPCs**
The big advantage of an HPC is that users can submit jobs to ONE machine which then distributes the work across multiple machines in the background.
Incoming processing requests (jobs) are handled by the scheduler (SLURM), taking away the work of queuing jobs manually and the potential issue of clashing with jobs from other users.
Administration is simplified by provisioning all computing nodes with the same virtual image.
This way, maintenance tasks are reduced and differences between the machines are avoided.
Library management across nodes is done via the [Spack](https://spack.io) package manager, as this application allows for version-agnostic environment module installations.
The `$HOME` directory of every user is shared across all nodes, avoiding the need to keep data and scripts in sync across multiple machines.
**Before you start**:
- Working on a Linux server naturally requires a certain amount of familiarity with UNIX command-line shells and text editors.
There are dozens of Linux online tutorials which should help to get you started.^[For example, https://ryanstutorials.net/linuxtutorial/.]
Of course, there are also great books on how to use Linux, such as @shotts_linux_2012, @sobell_practical_2010 and @ward_how_2015, all of which are freely available.
If you still get stuck, Google will certainly help you.
- Please add an SSH key pair to your account to be able to log in to the server without having to type your password every time.
This is especially useful since your password will consist of many letters and numbers (> 10) which you do not want to memorize.
See [this](https://help.github.com/articles/connecting-to-github-with-ssh/) guide if you have never worked with SSH keys before; a short sketch of the setup follows below this list.
If you already use an SSH key pair on your machine, you can use `ssh-copy-id <username>@10.232.16.28` to copy your key to the server.
Afterwards you should be able to log in via `ssh <username>@10.232.16.28` without being prompted for your password.
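A minimal sketch of that key setup, in case you have never created an SSH key pair before (the RSA key type and the default file locations are just one common choice):

```sh
# generate a key pair (stored as ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub by default)
ssh-keygen -t rsa -b 4096

# copy the public key to the HPC frontend (asks for your password one last time)
ssh-copy-id <username>@10.232.16.28

# afterwards this should work without a password prompt
ssh <username>@10.232.16.28
```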
## Web Address
https://edi.geogr.uni-jena.de
IP: 10.232.16.28
## Hardware
The cluster consists of the following machines:
Group "**threadripper**":
- CPU: AMD Threadripper 2950X, 16-core, Hyperthreading support, 3.5 GHz - 4.4 GHz
- RAM: 126 GB DDR4
- Number of nodes: 4 (c[0-2] + frontend)
- The "frontend" only operates on 12 cores with 100 GB RAM
Group "**opteron**":
- CPU: AMD Opteron 6172, 48 cores, no Hyperthreading, 2.1 GHz
- RAM: 252 GB DDR3 (c5 only comes with 130 GB RAM)
- Number of nodes:
  - 2 (c[3-4])
  - 1 (c5)
The groups are reflected in the scheduler via the "partition" setting.
Group "threadripper" is about 3.5x faster than group "opteron".
## Software
The HPC was built following the installation guide provided by the [Open HPC](https://openhpc.community/) community (using the "Warewulf + Slurm" edition) and operates on a CentOS 7 base.
Processing job requests are queued by the [SLURM](https://slurm.schedmd.com/) scheduler.
Load monitoring is performed via [Ganglia](http://ganglia.sourceforge.net/).
A live view is accessible [here](http://edi.geogr.uni-jena.de/ganglia/?r=hour&cs=&ce=&m=load_one&s=by+name&c=&tab=m&vn=&hide-hf=false).
[Spack](https://spack.io/) is used as the package manager.
More detailed instructions on the scheduler and the package manager can be found in their respective chapters.
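As a first orientation, a submission script for an R job might look roughly like the sketch below; the module name `r` and the script `my_analysis.R` are assumptions, so check `module avail` on the cluster for the actual module names:

```sh
#!/bin/bash
#SBATCH --job-name=my-r-job        # name shown in the queue
#SBATCH --partition=threadripper   # partition, see the "Hardware" section
#SBATCH --cpus-per-task=4          # CPU cores reserved for the job
#SBATCH --mem=8G                   # memory reserved for the job
#SBATCH --output=slurm-%j.out      # log file (%j is replaced by the job ID)

# load R from the Spack-provided environment modules (module name is an assumption)
module load r

Rscript my_analysis.R
```

Saved as e.g. `my_job.sh`, it is submitted with `sbatch my_job.sh`; `squeue -u <username>` shows its status in the queue.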
## Data storage
The _mars_ data server is mounted at `/home` and stores all the data.
Currently we have a capacity of 20 TB for all users combined.
Data can be stored directly under your `/home` directory.
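To check how much of the shared capacity is still free and how much space you are using yourself, the usual tools work (a quick sketch):

```sh
# free space on the shared /home mount
df -h /home

# size of your own home directory
du -sh ~
```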
## Accessing files from your local computer
It is recommended to mount the server on your local machine via `sshfs`.
Transfer speed ranges between 50 and 100 Mbit/s when you are in the office, so you should be able to access files without noticeable delay.
Accessing files from outside will be slower.
If you really run into trouble with transfer speed, you could connect directly to the _mars_ server.
Otherwise, the route is as follows: `<local> (sshfs) -> edi (nfs) -> mars`
### Unix
For Unix systems, the following command can be used:
```sh
sudo sshfs -o reconnect,idmap=user,transform_symlinks,identityFile=~/.ssh/id_rsa,allow_other,cache=yes,kernel_cache,compression=no,default_permissions,uid=1000,gid=100,umask=0 <username>@10.232.16.28:/ <local-directory>
```
The mount process is passwordless if you authenticate via your SSH key (i.e. your `~/.ssh/id_rsa` key).
Note that the mount is actually performed by the root user, so you need to copy your SSH key to the root user: `cp ~/.ssh/id_rsa /root/.ssh/id_rsa`.
For convenience you can create an executable script that performs this action every time you need it.
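A minimal sketch of such a script; the script name and the local mount point `~/hpc` are arbitrary choices, so adjust them (and `<username>`) to your setup:

```sh
#!/bin/bash
# mount-hpc.sh -- mount the HPC home directory locally via sshfs

MOUNTPOINT="$HOME/hpc"
mkdir -p "$MOUNTPOINT"

sudo sshfs -o reconnect,idmap=user,transform_symlinks,identityFile=~/.ssh/id_rsa,allow_other,cache=yes,kernel_cache,compression=no,default_permissions,uid=1000,gid=100,umask=0 <username>@10.232.16.28:/ "$MOUNTPOINT"
```

Make it executable once with `chmod +x mount-hpc.sh`; `fusermount -u ~/hpc` (Linux) or `umount ~/hpc` (macOS) unmounts the share again.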
```{block, type='rmdcaution'}
Auto-mount during boot via `fstab` is not recommended since sometimes the network is not yet up when the mount is executed.
This applies especially if you are not in the office but accessing the server from outside.
```
### Windows
Please install [sshfs-win](https://github.com/billziss-gh/sshfs-win) and follow the instructions.
## Accessing the HPC remotely (VPN)
A VPN connection is required to access the HPC remotely.
FSU uses the Cisco Anyconnect protocol for VPN purposes and recommends using the ["Cisco Anyconnect Client"](https://www.uni-jena.de/VPN_Windows_Apple_Mobile).
This is a GUI which requires manual connect/disconnect actions.
In addition, the Cisco Anyconnect client is known to be slow and laggy (sorry, no references here, just personal experience).
The open-source alternative [openconnect](https://gitlab.com/openconnect/openconnect) does a much better job - and comes with a powerful CLI implementation.
If a GUI is preferred, have a look [here](https://openconnect.github.io/openconnect-gui/) (on Windows there seems to be no way around the GUI).
**Installation**
- macOS: `brew install openconnect`
- Windows: `choco install openconnect-gui` or download the [GUI](https://openconnect.github.io/openconnect-gui/) directly
**Usage**
On a UNIX-based system:
Start: `echo <PASSWORD> | sudo openconnect --user=<USER>@uni-jena.de --passwd-on-stdin --background vpn.uni-jena.de`
Stop: `sudo killall -SIGINT openconnect`
On Windows: Sorry, not using Windows - feel free to add this info here.
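For UNIX-based systems, the two commands above can be wrapped in a small helper script (a sketch; reading the password from a private file such as `~/.vpn-password` is just one option, and that file name is an assumption):

```sh
#!/bin/bash
# vpn.sh -- connect to / disconnect from the FSU VPN via openconnect
# usage: ./vpn.sh start | ./vpn.sh stop

case "$1" in
  start)
    # --passwd-on-stdin reads the password from stdin instead of the command line
    sudo openconnect --user=<USER>@uni-jena.de --passwd-on-stdin --background \
      vpn.uni-jena.de < ~/.vpn-password
    ;;
  stop)
    sudo killall -SIGINT openconnect
    ;;
  *)
    echo "usage: $0 {start|stop}"
    ;;
esac
```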