-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
480 lines (460 loc) · 19.8 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="author" content="Aidan Heerdegen">
<meta name="dcterms.date" content="2018-10-17">
<title>Payu Features and Configs</title>
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="./reveal.js/css/reveal.css">
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<link rel="stylesheet" href="./reveal.js/css/theme/smallsky.css" id="theme">
<!-- Explicitly add zenburn for highlight support -->
<link rel="stylesheet" href="./reveal.js/lib/css/zenburn.css" id="theme">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? './reveal.js/css/print/pdf.css' : './reveal.js/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="./reveal.js/lib/js/html5shiv.js"></script>
<![endif]-->
<base href="https://coecms-training.github.io/payucourse2/index.html">
</head>
<body>
<div class="reveal">
<!-- style="background: url(img/abstract-1780333_crop.png);
background-size: cover;"> -->
<header style="width: 10%; position: absolute; top: 2%; left: 2%;">
<img src="img/CE-logo-COL-clex-horiz-screen.png">
</header>
<footer style="font-size: 1pc; position: absolute; bottom: 2%; left: 2%;">
<code>https://.githu-training.io/payucourse2/index.html</code>
</footer>
<div class="slides">
<section id="title-slide">
<div class="reveal" style="text-align: right;">
<img src="img/CE-ARC-logo-lockup-1000.png"
style="background: none; border: none; box-shadow: none;
width: 30%"
alt="NCI">
</div>
<h1 class="title">Payu Features and Configs</h1>
<p class="subtitle" style="text-align: left;"><p>New features and running ACCESS-OM2 models</p></p>
<p class="author" style="text-align: right;">Aidan Heerdegen</p>
<p class="date" style="text-align: right;">17 October 2018</p>
</section>
<section><section id="outline" class="title-slide slide level1"><h1>Outline</h1></section><section class="slide level2">
<ul>
<li>payu recap</li>
<li>New payu features</li>
<li>Upcoming payu features</li>
<li>Running ACCESS-OM2 models</li>
</ul>
</section></section>
<section><section id="what-is-payu" class="title-slide slide level1"><h1>What is Payu?</h1></section><section id="payu-is" class="slide level2">
<h2>Payu is</h2>
<blockquote>
<p>... a python based "scientific workflow manager"</p>
</blockquote>
</section><section id="huh" class="slide level2">
<h2>Huh?</h2>
<p>That means it runs your model for you. In short:</p>
<ul>
<li>Setup model run directory (<code>work</code>)</li>
<li>Run the model</li>
<li>Move outputs/restarts to <code>archive</code> directory</li>
<li>Clean up the run directory</li>
<li>Run again (if instructed to do so)</li>
</ul>
</section></section>
<section id="new-features" class="title-slide slide level1"><h1>New features</h1></section>
<section><section id="fast-mom-collation" class="title-slide slide level1"><h1>Fast MOM collation</h1></section><section class="slide level2">
<p>There is a new mppnccombine in town ... and it's fast.</p>
<p>How fast?</p>
</section><section class="slide level2">
<p>Probably not faster than the Waco Kid</p>
<p><img data-src="img/waco_kid.gif" alt="image" /></p>
</section><section class="slide level2">
<aside class="notes">
<p>That is it collates tiled outputs to multiple files, which makes model input and output faster</p>
<p>Directly copies the compressed chunks from one file to another, skipping the decompress/recompress step</p>
<p>Example of the raison d'etre of CMS, to improve researcher productivity</p>
</aside>
<p>But seriously fast, hence the name <strong>mppnccombine-fast</strong></p>
<p><a href="https://github.com/coecms/mppnccombine-fast">https://github.com/coecms/mppnccombine-fast</a></p>
<ul>
<li>Written by Scott Wales</li>
<li>Collates any tiled FMS model output (MOM5/MOM6/GOLD)</li>
<li>Particularly fast with netCDF4 compressed data</li>
</ul>
</section><section id="requirements" class="slide level2">
<h2>Requirements</h2>
<ul>
<li>Copy <code>/short/public/aph502/mppnccombine-fast</code> to <code>/short/$PROJECT/$model/bin</code></li>
</ul>
<p>or</p>
<ul>
<li>Specify full path in <code>config.yaml</code></li>
<li>A version of <code>payu</code> of <code>0.10</code> or greater (<code>module load payu/0.10</code> on <code>raijin</code>)</li>
<li>Updated <code>config.yaml</code> syntax</li>
</ul>
</section><section id="old-syntax" class="slide level2">
<h2>Old Syntax</h2>
<pre class="yaml"><code>collate: true
collate_mem: 16GB
collate_queue: express
collate_ncpus: 4
collate_flags: -n4 -r</code></pre>
</section><section id="new-syntax" class="slide level2">
<h2>New syntax</h2>
<aside class="notes">
<p>Must specify mpi to use mppnccombine-fast. Minimum of 2 cpus, so can't use copyq ncpus per thread is ncpus / nthreads nthreads defaults to 1 ncpus defaults to 2 enable defaults to true Don't need to specify flags, enable or exe Fewer flags, as mppnccombine-fast has fewer options Don't get your hopes up Ryan, I haven't written restart collation, but when it is done, adding restart:true will collate restarts when the restart cleaning is done</p>
</aside>
<p>Replaces <code>collate_</code> options with dictionary</p>
<pre class="yaml"><code>collate:
enable: true
queue: express
memory: 4GB
walltime: 00:30:00
mpi: true
ncpus: 4
threads: 2
# flags: -v
# exe: /full/path/to/mppnccombine-fast
# restart: true</code></pre>
</section><section id="resource-requirements" class="slide level2">
<h2>Resource requirements</h2>
<aside class="notes">
<p>Memory use should only depend on chunksize in the compressed file, not on the overall size of the file being written, so resolution independent.</p>
<p>Unfortunately a memory leak bug in the underlying <code>HDF5</code> library means memory use will go up with the number of times data is written to a collated file. It is difficult to predict, but 2-4GB per thread has been the upper limit observed so far.</p>
<p>No speed-up for low resolution outputs (MPI overhead swamps fast run times). Quarter degree 10-50x faster. Tenth 100x faster.</p>
</aside>
<ul>
<li>Memory independent of resolution (<4GB per thread)</li>
<li>Walltime in minutes</li>
<li>No speed-up for low resolution (1 deg global model)</li>
<li>Minimum of 2 cpus</li>
</ul>
</section><section id="layout-affects-efficiency" class="slide level2">
<h2>Layout affects efficiency</h2>
<ul>
<li>Chunk sizes chosen automatically by netCDF4 and depend on tile size</li>
<li>Inconsistent tile sizes => inconsistent chunk sizes</li>
<li>Inconsistent chunk sizes makes program slow (has to uncompress/compress)</li>
<li>Make processor layout an integer divisor of grid</li>
<li>Make io_layout an integer divisor of layout</li>
</ul>
</section><section id="example" class="slide level2">
<h2>Example</h2>
<aside class="notes">
<p>Might think with io_layout would make consistent tile sizes, but the decomposition algorithm has already chosen some distribution of different tile sizes that cannot be evenly combined with io_layout Surprise to me to!</p>
</aside>
<p>Quarter degree MOM-SIS model: 1440 x 1080.</p>
<pre class="fortran"><code>layout = 64, 30
io_layout = 8, 6</code></pre>
<ul>
<li>1920 CPUs</li>
<li>Tiles are 22 x 36 and 23 x 36</li>
<li>IO tiles are 184 x 180, and 176 x 180</li>
<li>Slow for collating normal data and slow for untiled data (restarts and regional output)</li>
</ul>
</section><section id="improved-layout" class="slide level2">
<h2>Improved Layout</h2>
<p>Quarter degree MOM-SIS model: 1440 x 1080.</p>
<pre class="fortran"><code>layout = 60, 36
io_layout = 10, 6</code></pre>
<ul>
<li>2160 CPUs</li>
<li>Tiles are 24 x 10</li>
<li>IO tile is 144 x 180</li>
<li>Fast for collating tiled and untiled output</li>
</ul>
</section></section>
<section><section id="runs-per-submit" class="title-slide slide level1"><h1>Runs per submit</h1></section><section class="slide level2">
<aside class="notes">
<p>Don't agree with Marshall from first payu training session nf_limits -P project -q queue -n ncpus 48 hrs < 256 CPUs 255 < 24 hrs < 512 512 < 10 hrs < 1024</p>
</aside>
<ul>
<li>For low CPU count model: walltime up to 48 hours</li>
<li>Maximise walltime to reduce effect of queue time</li>
<li>A single 48 hour model run: What if crashes? Output non optimal?</li>
</ul>
</section><section id="runspersub" class="slide level2">
<h2>runspersub</h2>
<aside class="notes">
<p>Runspersub to the rescue! Being conservative with walltime in case some runs take > 2hr When last run crashes, only time of last run is lost</p>
</aside>
<pre class="yaml"><code>runspersub: 23
walltime: 48:00:00</code></pre>
<ul>
<li>Say model takes 2hr per run</li>
<li>Above config would run the model 23 times per PBS submit</li>
<li><code>walltime</code> must allow for <code>runspersub</code> runs of the model</li>
<li>If <code>walltime</code> exceeded last run will crash. <code>payu</code> will not resubmit</li>
</ul>
</section><section id="resubmission" class="slide level2">
<h2>Resubmission</h2>
<ul>
<li><code>payu</code> can resubmit itself with <code>-n</code> command line option</li>
<li>Using same model example if I wanted 50 runs of the model:</li>
</ul>
<pre class="bash"><code>payu run -n 50</code></pre>
<ul>
<li><code>runspersub: 1</code> => 50 PBS submissions, single run in each</li>
<li><code>runspersub: 23</code> => 3 PBS submissions, 23/23/4 model runs respectively</li>
</ul>
</section></section>
<section><section id="upcoming-features" class="title-slide slide level1"><h1>Upcoming features</h1></section><section id="file-tracking" class="slide level2">
<h2>File Tracking</h2>
<p>Wanted to do this for a long long time</p>
</section><section id="key-advantages" class="slide level2">
<h2>Key Advantages</h2>
<aside class="notes">
<p>Very early in this job, there was a "dodgy aerosol file" that had been used in some simulations, but hard/impossible to say which runs/files were affected</p>
</aside>
<ul>
<li>Track input files used for each model run</li>
<li>Reproducibly re-run previous experiment</li>
<li>Share experiments more easily as input files all specified</li>
<li>Flexibility with specifying path to input files</li>
<li>Identify all runs using specified file (possible future feature)</li>
</ul>
</section><section id="what-is-tracked" class="slide level2">
<h2>What is tracked?</h2>
<aside class="notes">
<p>Executables and inputs are not expected to change. Can specify a flag to either warn if they do and stop, or update manifest and continue</p>
<p>Restarts are the opposite, and by default are always expected to be different for each run, unless a flag is specified to reproduce a run, in which case any difference will flag an error and stop</p>
</aside>
<table>
<tbody>
<tr class="odd">
<td>Executables</td>
<td><code>mf_exe.yaml</code></td>
</tr>
<tr class="even">
<td>Inputs</td>
<td><code>mf_inputs.yaml</code></td>
</tr>
<tr class="odd">
<td>Restarts</td>
<td><code>mf_restarts.yaml</code></td>
</tr>
</tbody>
</table>
</section><section id="how-is-it-tracked" class="slide level2">
<h2>How is it tracked?</h2>
<ul>
<li>Uses yamanifest</li>
<li>Creates a <code>YaML</code> file</li>
<li>Each file (symlink) in <code>work</code> is dictionary key</li>
</ul>
</section><section id="example-1" class="slide level2">
<h2>Example</h2>
<aside class="notes">
<p>Note there is a header and a version string, can ignore All files in work are either config files (which are tracked by git) or symbolic links to files elsewhere on filesystem Issues with getting this working has to do with enforcing this for all models - can be difficult with hardwired paths etc</p>
</aside>
<ul>
<li><code>fullpath</code> is the actual location of the file</li>
<li>The hashes uniquely identify file</li>
</ul>
<pre class="yaml"><code>format: yamanifest
version: 1.0
---
work/fms_MOM_SIS.intel14:
fullpath: /short/v45/aph502/mom/bin/fms_MOM_SIS.intel14
hashes:
binhash: 74b079574d3160fd2024ca928f3097a0
md5: e10bf223ae2564701ae310d341bbe63b</code></pre>
</section><section id="hierachy-of-hashes" class="slide level2">
<h2>Hierachy of hashes</h2>
<aside class="notes">
<p>binhash uses datestamp and size combined with first 100MB of a file. Not guaranteed unique, but likely to detect if the file has changed</p>
</aside>
<ul>
<li>yamanifest supports multiple hashes => hierachy of hashes</li>
<li>Unique hashes (md5, sha128) take too long on large files</li>
<li>Fast hashing to check for file changes</li>
<li>Use unique hash check when necessary (or periodically?)</li>
</ul>
</section></section>
<section><section id="access-om2" class="title-slide slide level1"><h1>ACCESS-OM2</h1><p>ACCESS-OM2 model suite from 1 degree global to 0.1 degree global, Ocean/Ice model forced with atmospheric data and almost identical model parameters.</p>
<p>Single <code>access-om2</code> repository with all code and configs</p>
<p><a href="https://github.com/OceansAus/access-om2">https://github.com/OceansAus/access-om2</a></p></section><section id="components" class="slide level2">
<h2>Components</h2>
<table>
<tbody>
<tr class="odd">
<td>Ocean</td>
<td><code>MOM5</code></td>
</tr>
<tr class="even">
<td>Ice</td>
<td><code>CICE5</code></td>
</tr>
<tr class="odd">
<td>Atmosphere</td>
<td><code>libaccessom2</code></td>
</tr>
<tr class="even">
<td>Coupler</td>
<td><code>OASIS3-MCT</code></td>
</tr>
</tbody>
</table>
</section><section id="code" class="slide level2">
<h2>Code</h2>
<table>
<tbody>
<tr class="odd">
<td><code>MOM5</code></td>
<td><a href="https://github.com/mom-ocean/MOM5">https://github.com/mom-ocean/MOM5</a></td>
</tr>
<tr class="even">
<td><code>CICE5</code></td>
<td><a href="https://github.com/OceansAus/cice5">https://github.com/OceansAus/cice5</a></td>
</tr>
<tr class="odd">
<td><code>libaccessom2</code></td>
<td><a href="https://github.com/OceansAus/libaccessom2">https://github.com/OceansAus/libaccessom2</a></td>
</tr>
<tr class="even">
<td><code>OASIS3-MCT</code></td>
<td><a href="https://github.com/OceansAus/oasis3-mct">https://github.com/OceansAus/oasis3-mct</a></td>
</tr>
</tbody>
</table>
</section><section id="forcing-data" class="slide level2">
<h2>Forcing Data</h2>
<ul>
<li>Uses JRA55 reanalysis derivative product JRA55-do</li>
</ul>
<p><a href="http://jra.kishou.go.jp/JRA-55/index_en.html">http://jra.kishou.go.jp/JRA-55/index_en.html</a> <a href="https://www.sciencedirect.com/science/article/pii/S146350031830235X">https://www.sciencedirect.com/science/article/pii/S146350031830235X</a></p>
<ul>
<li>IAF (Interannual Forcing) : JRA55-do (1955-present)</li>
<li>RYF (Repeat Year Forcing) : RYF8485, RYF9091, RYF0304</li>
</ul>
</section><section id="access-om2-1" class="slide level2">
<h2>ACCESS-OM2</h2>
<ul>
<li>Nominal 1 degree global resolution</li>
<li>JRA55 RYF and IAF, and CORE-II configurations</li>
</ul>
<p><a href="https://github.com/OceansAus/1deg_jra55_iaf">https://github.com/OceansAus/1deg_jra55_iaf</a> <a href="https://github.com/OceansAus/1deg_jra55_ryf">https://github.com/OceansAus/1deg_jra55_ryf</a> <a href="https://github.com/OceansAus/1deg_core_nyf">https://github.com/OceansAus/1deg_core_nyf</a></p>
</section><section id="access-om2-025" class="slide level2">
<h2>ACCESS-OM2-025</h2>
<ul>
<li>Nominal 0.25 degree global resolution</li>
<li>JRA55 RYF and IAF configurations</li>
</ul>
<p><a href="https://github.com/OceansAus/025deg_jra55_ryf">https://github.com/OceansAus/025deg_jra55_ryf</a> <a href="https://github.com/OceansAus/025deg_jra55_iaf">https://github.com/OceansAus/025deg_jra55_iaf</a></p>
</section><section id="access-om2-01" class="slide level2">
<h2>ACCESS-OM2-01</h2>
<aside class="notes">
<p>Don't suggest anyone runs this without contacting COSIMA as runs are expensive and a bit tricky to get running on raijin.</p>
</aside>
<ul>
<li>Nominal 0.1 degree global resolution</li>
<li>JRA55 RYF and IAF configurations</li>
<li>Minimal JRA55 IAF configuration (fewer cores)</li>
</ul>
<p><a href="https://github.com/OceansAus/01deg_jra55_iaf">https://github.com/OceansAus/01deg_jra55_iaf</a> <a href="https://github.com/OceansAus/01deg_jra55_ryf">https://github.com/OceansAus/01deg_jra55_ryf</a> <a href="https://github.com/OceansAus/minimal_01deg_jra55_iaf">https://github.com/OceansAus/minimal_01deg_jra55_iaf</a></p>
</section><section id="running-an-access-om2-model" class="slide level2">
<h2>Running an ACCESS-OM2 model</h2>
<aside class="notes">
<p>Can run in a branch to keep config clean Can fork</p>
</aside>
<ul>
<li>Follow the Quick Start instructions in the ACCESS-OM2 Wiki on github</li>
</ul>
<p><a href="https://github.com/OceansAus/access-om2/wiki/Getting-started#quick-start">https://github.com/OceansAus/access-om2/wiki/Getting-started#quick-start</a></p>
<aside class="notes">
<p>All executables and Can fork</p>
</aside>
<p>Use the 1 deg JRA55 IAF configuration:</p>
<pre class="bash"><code>module load payu/0.10
git clone https://github.com/OceansAus/1deg_jra55_iaf
cd 1deg_jra55_iaf
payu run</code></pre>
</section><section class="slide level2">
<p>The PBS and platform specific options for <code>normalbw</code> queue</p>
<pre class="yaml"><code># PBS configuration
queue: normalbw
walltime: 1:00:00
jobname: 1deg_jra55_iaf
ncpus: 252
platform:
nodesize: 28
nodemem: 128</code></pre>
</section><section class="slide level2">
<p>The model options</p>
<pre class="yaml"><code># Model configuration
name: common
model: access-om2
input: /short/public/access-om2/input_2407a7bc/common_1deg_jra55
submodels:
- name: atmosphere
model: yatm
exe: /short/public/access-om2/bin/yatm_037e4b61.exe
input: /short/public/access-om2/input_2407a7bc/yatm_1deg
ncpus: 1
- name: ocean
model: mom
exe: /short/public/access-om2/bin/fms_ACCESS-OM_304fe837.x
input: /short/public/access-om2/input_2407a7bc/mom_1deg
ncpus: 216
- name: ice
model: cice5
exe: /short/public/access-om2/bin/cice_auscom_360x300_24p_5a56b59a.exe
input: /short/public/access-om2/input_2407a7bc/cice_1deg
ncpus: 24</code></pre>
</section><section class="slide level2">
<p>Miscellaneous options (including collation)</p>
<pre class="yaml"><code># Misc
collate: true
stacksize: unlimited
collate_walltime: 1:00:00
collate_exe: /short/public/access-om2/bin/mppnccombine
qsub_flags: -lother=hyperthread -W umask=027
# postscript: sync_output_to_gdata.sh</code></pre>
</section></section>
</div>
</div>
<script src="./reveal.js/lib/js/head.min.js"></script>
<script src="./reveal.js/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
// Display the page number of the current slide
slideNumber: true,
// Push each slide change to the browser history
history: true,
// Optional reveal.js plugins
dependencies: [
{ src: './reveal.js/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: './reveal.js/plugin/zoom-js/zoom.js', async: true },
{ src: './reveal.js/plugin/highlight/highlight.js',
async: true,
condition: function() {
return !!document.querySelector( 'pre code' ); },
callback: function() {
hljs.initHighlightingOnLoad(); }
},
{ src: './reveal.js/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>