-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscipy16.html
484 lines (422 loc) · 16.8 KB
/
scipy16.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>JupyterHub as an Interactive Supercomputing Gateway</title>
<meta name="description" content="JupyterHub as an Interactive Supercomputing Gateway">
<meta name="author" content="Michael Milligan">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<link rel="stylesheet" href="css/reveal.css">
<link rel="stylesheet" href="css/theme/league.css" id="theme">
<!-- Theme used for syntax highlighting of code -->
<link rel="stylesheet" href="lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'css/print/pdf.css' : 'css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
<h2>JupyterHub as an Interactive Supercomputing Gateway</h2>
<h4>Interactive HPC with batchspawner, profiles, SSO and more!</h4>
<p>
<small>
Michael Milligan<br/>
Head of Application Development<br/>
Minnesota Supercomputing Institute
</small>
</p>
</section>
<!--
(while (re-search-forward "^* " nil t)
(replace-match "<section>\n" nil nil)
(re-search-forward "^$" nil t)
(replace-match "</section>\n" nil nil)
)
-->
<section>
<h3>A couple of years ago we decided to work towards supporting<br/>
<span style="color: red">Interactive HPC</span><br/> as a first-class service.</h3>
</section>
<section>
Partly because I kept putting on these cool demos using IPython
notebooks, IPython Parallel, and similar tools.
</section>
<section>
<p>But the user experience wasn't great</p>
<p>We had to teach users to ...</p>
<p align="left"><span class="fragment fade-up" style="padding-left: 4em">set up remote desktops</span>
<br/><span class="fragment fade-right" style="padding-left: 9em">craft and submit job scripts</span>
<br/><span class="fragment fade-right" style="padding-left: 14em">create ssh tunnels</span>
<br/><span class="fragment fade-right" style="padding-left: 20em">etc :(</span>
</p>
<p class="fragment">If only there was a way to run them as a service...</p>
</section>
<section>
<h1>JupyterHub</h1>
<img data-src="mymedia/arch.png" height=300>
<p>
<small style="align: left">
<a href="https://jhamrick.github.io/2015-07-17-jupyterday/">jhamrick.github.io/2015-07-17-jupyterday/</a>
</small>
<aside class="notes"> Not sure when
I discovered the fact that
Jupyterhub had a configurable proxy
attached (maybe mentioned at scipy
2015?), but that was probably the
element that got the ball rolling
for me
</aside>
</section>
<section>
<div style="float: left"><img height=500 data-src="mymedia/snail.jpg"></div>
<small style="padding-top: 200px; width: 12em">I considered putting a picture of a
lightbulb here, <br/>but that's
cliche, I couldn't bring myself to
do it.</small>
</section>
<section>
<h1>batchspawner</h1>
<p>started out as a hack to generalize Michael Gilbert's
<a href="https://github.com/mkgilbert/slurmspawner">slurmspawner</a>[1],
with liberal inspiration from Andrea Zonca's blogging
about <a href="http://zonca.github.io/2015/04/jupyterhub-hpc.html">Jupyterhub
on supercomputers</a>[2]</p>
<p><a href="https://github.com/mbmilligan/batchspawner">github.com/mbmilligan/batchspawner</a></p>
<p><small><ul>
<li>[1] https://github.com/mkgilbert/slurmspawner</li>
<li>[2] http://zonca.github.io/2015/04/jupyterhub-hpc.html</li>
</ul></small></p>
</section>
<section>
Result turned out to be extremely general
<pre style="font-size: 0.46em"><code class="hljs" data-trim contenteditable>
class TorqueSpawner(BatchSpawnerRegexStates):
batch_script = Unicode("""#!/bin/sh
#PBS -q {queue}@{host}
...etc...
{cmd}
""", config=True)
batch_submit_cmd = Unicode("sudo -E -u {username} qsub", config=True)
batch_query_cmd = Unicode("sudo -E -u {username} qstat -x {job_id}", config=True)
batch_cancel_cmd = Unicode("sudo -E -u {username} qdel {job_id}", config=True)
state_pending_re = Unicode(r"<job_state>[QH]</job_state>", config=True)
state_running_re = Unicode(r"<job_state>R</job_state>", config=True)
state_exechost_re = Unicode(r"<exec_host>((?:[\w_-]+\.?)+)/\d+", config=True)
</code></pre>
<aside class="notes">the actual
TorqueSpawner class ended up being nothing but a job script template
plus six default trait values
* also have SlurmSpawner and GridengineSpawner classes that are only
slightly longer
</aside>
</section>
<section>
But wait, there's more!
</section>
<section>
At MSI we have more than one cluster, multiple queues and node types
-- is there an easier way to expose this than running multiple
Jupyterhub instances?
<aside class="notes">
Jupyterhub doesn't have any configurable point to modify spawner
creation itself (to inject trait values)
<br/>just dumping values in the spawner_options variable is ugly,
any class that wants to use configuration
has to be modified to look for it, they all have to agree on
how to get it
</aside>
</section>
<section>
<h1>WrapSpawner</h1>
Realized that we'd abstracted all the characteristics of batch systems and job
types into configurable traits, and:
<ul>
<li>trait values can be set arbitrarily at instance creation time
without cooperation from class being configured</li>
<li>the Spawner Options Form feature accepts user input during
spawning</li>
<li>the Spawner public interface is pretty simple</li>
</ul>
</section>
<section>
<h1>WrapSpawner</h1>
emulates the public interface of a Spawner, but:
<ul>
<li>doesn't instantiate a real spawner right away</li>
<li>when no spawner exists, just pretends to be an idle spawner</li>
<li>when told to start or recreated from saved state, instantiates a
child class of configurable type and configures it with trait
values from a dict</li>
<li>when a child spawner exists, proxies calls to the public interface
through to the child</li>
</ul>
</section>
<section>
Cute trick, but...
</section>
<section>
<section>
<h1>Profiles</h1>
<p>Builds on WrapSpawner to intervene in starting the notebook server</p>
<p><img src="mymedia/hub-start.png" /></p>
</section>
<section>
<p>Auto-generates a spawner options form drop-down listing profiles</p>
<p><img src="mymedia/hub-spawn.png" /></p>
</section>
<section>
<p>User gets to select a profile and run their notebook server with
different options or even totally different spawner classes</p>
<p><img src="mymedia/hub-options.png" /></p>
</section>
<section>
<p>Site admin simply configures a list of "profiles":</p>
<pre style="font-size: 0.45em"><code class="hljs" data-trim>
c.ProfilesSpawner.profiles = [
( "Local server", 'local', 'jupyterhub.spawner.LocalProcessSpawner',
{'ip':'0.0.0.0'} ),
('Mesabi - 2 cores, 4 GB', 'mesabi2c4g12h', 'batchspawner.TorqueSpawner',
dict(req_nprocs='2', req_queue='mesabi', req_memory='4gb')),
('Mesabi - 12 cores, 128 GB', 'mesabi128gb', 'batchspawner.TorqueSpawner',
dict(req_nprocs='12', req_queue='ram256g', req_memory='125gb')),
('Another Cluster - 8 hours', 'small', 'batchspawner.TorqueSpawner',
dict(req_nprocs='2', req_host='labhost.xyz.edu', req_queue='small',
req_runtime='8:00:00', req_memory='4gb', state_exechost_exp='')),
]
</code></pre>
</section>
</section>
<section>
<p>Selection is saved as per-user state, so different users can run
their notebooks with totally different spawner classes concurrently</p>
<p style="padding-top:5ex;"><small>
WrapSpawner and ProfilesSpawner are included in the
<a href="https://github.com/mbmilligan/batchspawner">batchspawner</a>
repo
</small>
</p>
</section>
<section>
Next up -- authentication
</section>
<section>
<section>
<h1>Authentication</h1>
<ul>
<li>at an site like ours, it's a bit
deprecated to have a public web app that just takes usernames / passwords</li>
<li>for security, we conceivably might not even
<b>have</b> normal users on the web-facing server <br />(i.e. in a minimal
VM or container)</li>
</ul>
</section>
<section>
<h1>Authentication</h1>
<p>At U of M we have a site-wide Shibboleth provider, and MSI has a web
SSO system that wraps that</p>
<p align="left">Result:</p>
<ul>
<li>easy to throw up an Apache server that understands our
auth system</li>
<li>annoying and error-prone to write an endpoint to
do the same thing</li>
</ul>
</section>
<section>
Final configuration looks like:
<ul>
<li>Apache module talks to our auth setup, puts data in request headers<br/>
<pre>
Remote_User: username
Remote_User_Data: stuff;morestuff</pre>
</li>
<li>Apache terminates SSL (with official UofM certs)</li>
<li><a href="https://github.com/cwaldbieser/jhub_remote_user_authenticator/">remoteuser
Authenticator class</a>[1] simply reads those headers</li>
<li>Apache proxies resulting requests to configurable-http-proxy</li>
</ul>
<p><small>[1] github.com/cwaldbieser/jhub_remote_user_authenticator/</small></p>
</section>
<section>
Exact proxy rules to make everything work are a tad arcane:
<pre style="font-size: 0.4em"><code class="hljs" data-trim>
<LocationMatch "/jupyter/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)(.*)">
ProxyPassMatch ws://localhost:8999/jupyter/$1/$2$3
ProxyPassReverse ws://localhost:8999
</LocationMatch>
<Location "/jupyter">
ProxyPass http://localhost:8999/jupyter
ProxyPassReverse http://localhost:8999/jupyter
Header edit Origin <%= @fqdn %> localhost:8999
RequestHeader edit Origin <%= @fqdn %> localhost:8999
Header edit Referer <%= @fqdn %> localhost:8999
RequestHeader edit Referer <%= @fqdn %> localhost:8999
</Location>
</code></pre>
</section>
</section>
<section>
<h1>Deployment and testing</h1>
</section>
<section>
<section>
<p>MSI is currently a CentOS 6 shop under Puppet management</p>
<ul>
<li>puppet python classes needed some hacking to properly get python 3 installed</li>
<li>install paths and hostnames are handled via puppet templates</li>
<li>created separate classes for Apache config, Jupyterhub installation, and service startup scripts.</li>
</ul>
</section>
<section>
<p>This is a production service, so reproducible and testable deployment is mandatory!</p>
<ul>
<li>Jupyterhub install is modelled as a directory containing a Python virtualenv and a npm module tree</li>
<li>code is pulled from github or PyPI/npm repos via pip/npm</li>
<li>Init scripts and Apache config modelled as Puppet classes (unique per node)</li>
</ul>
<p>Result: easy to create test server or parallel test install, guaranteed consistent system state</p>
<p><small>would love to hook into jupyterhub's test framework to get CI testing
during development - not there yet</small></p>
</section>
</section>
<section>
<h1>Monitoring</h1>
<p>We log everything to Splunk: just dump log lines to syslog</p>
<ul>
<li>Hub and proxy can generate very thorough logging output</li>
<li>Splunk digests it all and runs various saved searches (nightly, weekly)</li>
</ul>
</section>
<section>
<p>Example: I get detailed nightly reports of users, runtimes, failures, job types</p>
<aside class="notes">* failures are mostly timeouts from queue being busy (1-2% of attempts won't start
in under 5 minutes)
</aside>
<table>
<tbody style="font-size: 0.45em">
<tr>
<th >username</th>
<th >count</th>
<th >Failures</th>
<th >avg(starttime)</th>
<th >max(starttime)</th>
<th >latest(queue)</th>
<th >latest(mem)</th>
<th >latest(walltime)</th>
<th >latest(nodes)</th>
</tr>
<tr valign="top">
<td >balc0022</td>
<td >26</td>
<td >1</td>
<td >46.565667</td>
<td >117.146</td>
<td >ram256g@mesabim3</td>
<td >125gb</td>
<td >4:00:00</td>
<td >1:ppn=12</td>
</tr>
<tr valign="top">
<td >jj</td>
<td >4</td>
<td >2</td>
<td ></td>
<td ></td>
<td >mesabi@mesabim3</td>
<td >4gb</td>
<td >8:00:00</td>
<td >1:ppn=2</td>
</tr>
<tr valign="top">
<td >yminato</td>
<td >10</td>
<td >0</td>
<td >34.327000</td>
<td >86.152</td>
<td >mesabi@mesabim3</td>
<td >4gb</td>
<td >8:00:00</td>
<td >1:ppn=2</td>
</tr>
</tbody>
</table>
</section>
<section>
<p>Hub admin dashboard is also a handy way to see at a glance who's using the site</p>
<img data-src="mymedia/hub-users.png" />
<p>5-10 regular users, ~50 have used at some point since public announcement
in May</p>
</section>
<section>
<h1>Next steps</h1>
<p>Building an all-purpose interactive supercomputing hub...</p>
<ul>
<li>software integration: support more kernels, automatically do kernel setup
when we upgrade packages (looking at you, R)</li>
<li>user handholding: autopopulate user configuration as appropriate to system</li>
<li>environment management: environment modules system assumes a shell</li>
</ul>
</section>
<section>
<h1>Next Steps</h1>
<ul>
<li>ipyparallel integration: Jupyter extension enabling, plus config issues</li>
<li>multiple spawners-per-user</li>
<li>easier integration of site local web template customization</li>
</ul>
</section>
<section>
<h1>Next Steps</h1>
<p>What other interesting things can we tunnel through the proxy? Services
architecture might enable new applications, esp in the visualization space</p>
<ul>
<li>great interest in GPU-accelerated viz</li>
<li>can we integrate a noVNC-style remote desktop similar to terminal app?</li>
</ul>
</section>
<section style="text-align: left;">
<h1>THANKS!</h1>
<p>
- <a href="https://github.com/jupyterhub/batchspawner">github.com/jupyterhub/batchspawner</a> <br />
- <a href="https://www.msi.umn.edu">www.msi.umn.edu</a> <br />
- <a href="mailto:[email protected]">[email protected]</a>
</p>
</section>
</div>
</div>
<script src="lib/js/head.min.js"></script>
<script src="js/reveal.js"></script>
<script>
// More info https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// More info https://github.com/hakimel/reveal.js#dependencies
dependencies: [
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
//{ src: 'plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
//{ src: 'plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: 'plugin/zoom-js/zoom.js', async: true },
{ src: 'plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>