From cf2c869d95918691f100ae1cf5580324437b8def Mon Sep 17 00:00:00 2001
From: Maxime Rio
Date: Mon, 4 Dec 2023 16:17:18 +1300
Subject: [PATCH] reviewed and fixed page (#41)

* reviewed and fixed page
* change lexers
* small changes
* Move proselint conf into separate file, add more rule exceptions.

---------

Co-authored-by: cal
---
 .markdownlint.json                        |  3 +-
 .proselint.json                           |  6 ++
 checks/run_proselint.py                   | 11 ++-
 .../Next_Steps/Finding_Job_Efficiency.md  | 78 +++++++------------
 4 files changed, 44 insertions(+), 54 deletions(-)
 create mode 100644 .proselint.json

diff --git a/.markdownlint.json b/.markdownlint.json
index acaf6e18a..5a2216341 100644
--- a/.markdownlint.json
+++ b/.markdownlint.json
@@ -1,5 +1,6 @@
{
    "MD013": false,
    "MD033": false,
-    "MD038": false
+    "MD038": false,
+    "MD041": false
}
\ No newline at end of file

diff --git a/.proselint.json b/.proselint.json
new file mode 100644
index 000000000..9bbdcc5b6
--- /dev/null
+++ b/.proselint.json
@@ -0,0 +1,6 @@
+{
+    "checks": {
+        "hyperbole.misc": false,
+        "typography.exclamation": false,
+        "typography.symbols": false
+}}
\ No newline at end of file

diff --git a/checks/run_proselint.py b/checks/run_proselint.py
index 8d69efb18..b89114eab 100755
--- a/checks/run_proselint.py
+++ b/checks/run_proselint.py
@@ -6,18 +6,21 @@
import sys

import proselint
-from proselint import config
-
+from proselint import config, tools

files = sys.argv[1:]
ret_code = 0
-proselint.config.default["checks"]["hyperbole.misc"] = False
+
+# Load check settings from .proselint.json, falling back to proselint's
+# built-in defaults for anything not overridden there.
+config_custom = tools.load_options(config_file_path=".proselint.json", conf_default=config.default)
+
+print(config_custom)

for file in files:
    with open(file, "r", encoding="utf8") as f:
-        for notice in proselint.tools.lint(f.read(), config=config.default):
+        for notice in proselint.tools.lint(f.read(), config=config_custom):
            if (notice[7] == "error"):
                ret_code = 1
            print(f"::{notice[7]} file={file},line={notice[2]},col={notice[3]},endLine={notice[2]+notice[6]},title={notice[0]}::'{notice[1]}'")

diff --git a/docs/Getting_Started/Next_Steps/Finding_Job_Efficiency.md b/docs/Getting_Started/Next_Steps/Finding_Job_Efficiency.md
index 6952b6ed1..d9321ca7d 100644
--- a/docs/Getting_Started/Next_Steps/Finding_Job_Efficiency.md
+++ b/docs/Getting_Started/Next_Steps/Finding_Job_Efficiency.md
@@ -4,22 +4,12 @@ hidden: false
position: 5
tags:
- slurm
-title: Finding Job Efficiency
vote_count: 8
vote_sum: 8
zendesk_article_id: 360000903776
zendesk_section_id: 360000189716
---
-
-
-[//]: <> (REMOVE ME IF PAGE VALIDATED)
-[//]: <> (vvvvvvvvvvvvvvvvvvvv)
-!!! warning
-    This page has been automatically migrated and may contain formatting errors.
-[//]: <> (^^^^^^^^^^^^^^^^^^^^)
-[//]: <> (REMOVE ME IF PAGE VALIDATED)
-
## On Job Completion

It is good practice to have a look at the resources your job used on
completion, so that you can make more accurate requests in the future.
Once your job has finished, check the relevant details using the
`nn_seff` or `sacct` tools. For example:

-**nn\_seff**
+### Using `nn_seff`

-``` sl
+```bash
nn_seff 30479534
```

-``` sl
+```txt
Job ID: 1936245
Cluster: mahuika
User/Group: user/group
State: COMPLETED (exit code 0)
Cores: 1
Tasks: 1
Nodes: 1
Job Wall-time:   1.15%  00:01:09 of 01:40:00 time limit
CPU Efficiency: 98.55%  00:01:08 of 00:01:09 core-walltime
Mem Efficiency: 10.84%  111.00 MB of 1.00 GB
```

-Notice that the CPU efficiency was high but the memory efficiency was
-very low and consideration should be given to reducing memory requests
-for similar jobs. If in doubt, please contact for
-guidance.
+Notice that the CPU efficiency was high but the memory efficiency was
+low; consider reducing the memory requested for similar jobs. If in doubt,
+please contact [support@nesi.org.nz](mailto:support@nesi.org.nz) for guidance.
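For instance, a resubmission informed by the output above might look like
the following sketch. The job name and program are placeholders, and the
requests are illustrative only: roughly three times the 111 MB actually
used, and a few minutes against the 00:01:09 actually elapsed:

```bash
#!/bin/bash -e
#SBATCH --job-name=my_job   # hypothetical job name
#SBATCH --mem=300M          # ~3x the 111 MB the example job actually used
#SBATCH --time=00:05:00     # generous headroom over the 69 s actually run

srun my_program             # placeholder for the real command
```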
-**sacct**
+### Using `sacct`

-``` sl
+```bash
sacct --format="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc,NTask,MaxRSS,State" -j 
```

-!!! prerequisite Tip
+
+!!! tip
    *If you want to make this your default* `sacct` *setting, run:*

-    ``` sl
+    ```bash
    echo 'export SACCT_FORMAT="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc%2,NTask%2,MaxRSS,State"' >> ~/.bash_profile
    source ~/.bash_profile
    ```

Below is an output for reference:

-``` sl
+```txt
JobID        JobName       Elapsed     AveCPU     MinCPU   TotalCPU  AllocCPUS NTasks     MaxRSS      State
------------ ---------- ---------- ---------- ---------- ---------- ---------- ------ ---------- ----------
3007056      rfm_ANSYS+   00:27:07                        03:35:55         16                      COMPLETED
```

*All of the adjustments below still allow for a degree of variation.
There may be factors you have not accounted for.*

-### **Walltime**
+#### Walltime

From the `Elapsed` field we may want to give our next run a more
appropriate walltime:

```bash
#SBATCH --time=00:40:00
```

-### **Memory**
+#### Memory

The `MaxRSS` field shows the maximum memory used by each of the job
steps, in this case 13 GB. For our next run we may want to set:

```bash
#SBATCH --mem=15G
```

-### **CPU's**
+#### CPUs

`TotalCPU` is the amount of computation done; in the best-case scenario
it would be equal to `Elapsed` x `AllocCPUS`. In the example above,
03:35:55 of computation was spread over 00:27:07 of elapsed time on 16
CPUs, i.e. roughly 50% CPU efficiency, so we may want to request fewer
CPUs; however, bear in mind there are other factors that affect CPU
efficiency.

```bash
#SBATCH --cpus-per-task=10
```

Note: when using `sacct` to determine the amount of memory your job used
(in order to reduce memory wastage), please keep in mind that Slurm
reports the figure as RSS (Resident Set Size) when in fact the metric

If 'nodelist' is not one of the fields in the output of your `sacct` or
`squeue` commands, you can find the node a job is running on using the
command `squeue -h -o %N -j `. The node will look something like
`wbn123` on Mahuika or `nid00123` on Māui.

-!!! prerequisite Note
+
+!!! note
    If your job is using MPI, it may be running on multiple nodes.

-### htop
+### Using `htop`

-``` sl
+```bash
ssh -t wbn175 htop -u $USER
```

If it is your first time connecting to that particular node, you may be
prompted:

-``` sl
+```txt
The authenticity of host can't be established
Are you sure you want to continue connecting (yes/no)?
```

Processes in green can be ignored.

**S** - State, what the thread is currently doing.

- R - Running.
- S - Sleeping, waiting on another thread to finish.
- D - Uninterruptible sleep, usually waiting on input/output.
- Any other letter - Something has gone wrong!

**CPU%** - Percentage CPU utilisation.

-**MEM% **Percentage Memory utilisation.
-!!! prerequisite Warning
+**MEM%** - Percentage Memory utilisation.
+
+!!! warning
    If the job finishes or is killed, you will be kicked off the node.
    If `htop` freezes, type `reset` to clear your terminal.
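If you just want a quick look rather than an interactive session, a
one-off snapshot can be taken with `top` in batch mode. A minimal sketch,
reusing the example node `wbn175` from above (substitute the node your
job is actually running on):

```bash
# Print a single batch-mode snapshot of your own processes on the node,
# then exit; no interactive session is left open.
ssh -t wbn175 top -b -n 1 -u "$USER"
```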
time* the CPUs are in use. This is not enough to get a picture of
overall job efficiency, as required CPU time *may vary by number of
CPUs*.

-The only way to get the full context, is to compare walltime performance
-between jobs at different scale. See [Job
-Scaling](../../Getting_Started/Next_Steps/Job_Scaling_Ascertaining_job_dimensions.md)
-for more details.
+The only way to get the full context is to compare walltime performance between jobs at different scales. See [Job Scaling](../../Getting_Started/Next_Steps/Job_Scaling_Ascertaining_job_dimensions.md) for more details.

### Example

![qdyn_eff.png](../../assets/images/Finding_Job_Efficiency_0.png)

From the above plot of CPU efficiency, you might decide a 5% reduction
in CPU efficiency is acceptable and scale your job up to 18 CPU cores.

![qdyn_walltime.png](../../assets/images/Finding_Job_Efficiency_1.png)

However, when looking at a plot of walltime, it becomes apparent that
performance gains per CPU added drop significantly after 4 CPUs; in
fact, absolute performance losses (negative returns) are seen after 8
-CPUs.
\ No newline at end of file
+CPUs.
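To collect the data behind plots like these, one approach is a small
scaling sweep: run the same job at several core counts, then compare
`Elapsed` and `TotalCPU` across the runs with `sacct` once they finish.
A rough sketch, assuming a hypothetical `my_job.sl` batch script whose
program makes use of all the CPUs it is given:

```bash
# Submit the same job at a range of core counts to measure scaling.
# my_job.sl is a hypothetical Slurm script; each run is tagged with its size.
for n in 1 2 4 8 16; do
    sbatch --cpus-per-task="$n" --job-name="scaling_${n}cpu" my_job.sl
done
```

Plotting walltime against core count for the finished runs then shows
directly where the gains flatten out, as in the example above.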