reviewed and fixed page (#41)
* reviewed and fixed page

* change lexers

* small changes

* Move proselint conf into seperate file, add more rule exceptions.

---------

Co-authored-by: cal <[email protected]>
jennan and CallumWalley authored Dec 4, 2023
1 parent ec7c7ef commit cf2c869
Showing 4 changed files with 44 additions and 54 deletions.
3 changes: 2 additions & 1 deletion .markdownlint.json
@@ -1,5 +1,6 @@
{
"MD013": false,
"MD033": false,
"MD038": false
"MD038": false,
"MD041": false
}
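
(For reference, from markdownlint's rule list: MD013 is line length, MD033 is inline HTML, MD038 is spaces inside code span elements, and the newly disabled MD041 requires the first line of a file to be a top-level heading.)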
6 changes: 6 additions & 0 deletions .proselint.json
@@ -0,0 +1,6 @@
{
"checks":{
"hyperbole.misc": false,
"typography.exclamation": false,
"typography.symbols": false
}}
11 changes: 7 additions & 4 deletions checks/run_proselint.py
@@ -6,18 +6,21 @@

import sys
import proselint
from proselint import config

from proselint import config, tools


files = sys.argv[1:]

ret_code = 0
proselint.config.default["checks"]["hyperbole.misc"] = False

# Load defaults from config.
config_custom = tools.load_options(config_file_path=".proselint.json", conf_default=config.default)

print(config_custom)

for file in files:
    with open(file, "r", encoding="utf8") as f:
        for notice in proselint.tools.lint(f.read(), config=config.default):
        for notice in proselint.tools.lint(f.read(), config=config_custom):
            if (notice[7] == "error"):
                ret_code = 1
            print(f"::{notice[7]} file={file},line={notice[2]},col={notice[3]},endLine={notice[2]+notice[6]},title={notice[0]}::'{notice[1]}'")
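
A minimal local invocation of the updated script might look like the following (the file argument handling is taken from the `sys.argv` line above; the interpreter name is an assumption and may differ on your system):

```bash
python checks/run_proselint.py docs/Getting_Started/Next_Steps/Finding_Job_Efficiency.md
```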
78 changes: 29 additions & 49 deletions docs/Getting_Started/Next_Steps/Finding_Job_Efficiency.md
@@ -4,22 +4,12 @@ hidden: false
position: 5
tags:
- slurm
title: Finding Job Efficiency
vote_count: 8
vote_sum: 8
zendesk_article_id: 360000903776
zendesk_section_id: 360000189716
---



[//]: <> (REMOVE ME IF PAGE VALIDATED)
[//]: <> (vvvvvvvvvvvvvvvvvvvv)
!!! warning
This page has been automatically migrated and may contain formatting errors.
[//]: <> (^^^^^^^^^^^^^^^^^^^^)
[//]: <> (REMOVE ME IF PAGE VALIDATED)

## On Job Completion

It is good practice to have a look at the resources your job used on
@@ -29,13 +19,13 @@ future.
Once your job has finished, check the relevant details using the tools
`nn_seff` or `sacct`. For example:

**nn\_seff**
### Using `nn_seff`

``` sl
```bash
nn_seff 30479534
```

``` sl
```txt
Job ID: 1936245
Cluster: mahuika
User/Group: user/group
@@ -48,30 +38,25 @@ CPU Efficiency: 98.55% 00:01:08 of 00:01:09 core-walltime
Mem Efficiency: 10.84% 111.00 MB of 1.00 GB
```

Notice that the CPU efficiency was high but the memory efficiency was
very low and consideration should be given to reducing memory requests
for similar jobs.  If in doubt, please contact <[email protected]> for
guidance.


Notice that the CPU efficiency was high but the memory efficiency was low, so consideration should be given to reducing memory requests
for similar jobs. If in doubt, please contact [[email protected]](mailto:[email protected]) for guidance.
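
For instance (a hypothetical adjustment, not a value from the original page), the next submission of this job could request considerably less memory while still leaving headroom over the 111 MB actually used:

```bash
#SBATCH --mem=300M    # assumed value: roughly 3x the peak usage reported by nn_seff
```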

**sacct**
### Using `sacct`

``` sl
```bash
sacct --format="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc,NTask,MaxRSS,State" -j <jobid>
```
!!! prerequisite Tip

!!! tip
    *If you want to make this your default* `sacct` *setting, run:*
``` sl
```bash
echo 'export SACCT_FORMAT="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc%2,NTask%2,MaxRSS,State"' >> ~/.bash_profile
source ~/.bash_profile
```
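
(In `sacct` format strings such as `SACCT_FORMAT`, a `%NUMBER` suffix like `Alloc%2` sets the printed column width, here two characters.)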

------------------------------------------------------------------------

Below is an output for reference:

``` sl
```txt
JobID JobName Elapsed AveCPU MinCPU TotalCPU AllocCPUS NTasks MaxRSS State
------------ ---------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ----------
3007056 rfm_ANSYS+ 00:27:07 03:35:55 16 COMPLETED
@@ -82,9 +67,7 @@ Below is an output for reference:
*All of the adjustments below still allow for a degree of variation.
There may be factors you have not accounted for.*

------------------------------------------------------------------------

### **Walltime**
#### Walltime

From the `Elapsed` field we may want to update our next run to have a
more appropriate walltime.
@@ -93,7 +76,7 @@ more appropriate walltime.
#SBATCH --time=00:40:00
```
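
For context (a rough calculation from the example output above, not a figure stated on the original page): the example job's `Elapsed` time was about 27 minutes, so a 40-minute request leaves roughly 50% headroom for run-to-run variation.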

### **Memory**
#### Memory

The `MaxRSS` field shows the maximum memory used by each of the job
steps, so in this case 13 GB. For our next run we may want to set:
@@ -102,7 +85,7 @@ steps, so in this case 13 GB. For our next run we may want to set:
#SBATCH --mem=15G
```
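
If it helps to read memory figures directly in gigabytes, recent Slurm versions accept a units flag (a sketch only; check `man sacct` on your cluster):

```bash
sacct --units=G --format="JobID,MaxRSS,State" -j <jobid>
```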

### **CPUs**
#### CPUs

`TotalCPU` is the number of computation hours; in the best-case scenario
the computation hours would be equal to `Elapsed` x `AllocCPUS`.
@@ -116,8 +99,6 @@ however bear in mind there are other factors that affect CPU efficiency.
#SBATCH --cpus-per-task=10
```
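
As a rough worked example using the `sacct` output above: `TotalCPU` of 03:35:55 is about 216 CPU-minutes, while `Elapsed` x `AllocCPUS` is about 27 minutes x 16 = 434 CPU-minutes, giving roughly 50% CPU efficiency, or about 8 cores' worth of useful work, which is consistent with the suggestion to request closer to 10 CPUs rather than 16.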



Note: When using `sacct` to determine the amount of memory your job used -
in order to reduce memory wastage - please keep in mind that Slurm
reports the figure as RSS (Resident Set Size) when in fact the metric
@@ -153,19 +134,20 @@ If 'nodelist' is not one of the fields in the output of your `sacct` or
`squeue` commands you can find the node a job is running on using the
command `squeue -h -o %N -j <jobid>`. The node will look something like
`wbn123` on Mahuika or `nid00123` on Māui.
!!! prerequisite Note

!!! note
If your job is using MPI it may be running on multiple nodes
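
As a minimal illustration (the job ID and node name below are taken from examples elsewhere on this page, not real output):

```bash
squeue -h -o %N -j 30479534
# wbn123
```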

### htop 
### Using `htop`

``` sl
```bash
ssh -t wbn175 htop -u $USER
```
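
Here `wbn175` is just an example node name; substitute the node your own job is running on, as found with `squeue` above.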

If it is your first time connecting to that particular node, you may be
prompted:

``` sl
```txt
The authenticity of host can't be established 
Are you sure you want to continue connecting (yes/no)?
```
@@ -185,15 +167,16 @@ Processes in green can be ignored

**S** - State, what the thread is currently doing.

- R - Running
- S - Sleeping, waiting on another thread to finish.
- D - Sleeping
- Any other letter - Something has gone wrong!
- R - Running
- S - Sleeping, waiting on another thread to finish.
- D - Uninterruptible sleep (usually waiting on I/O)
- Any other letter - Something has gone wrong!

**CPU%** - Percentage CPU utilisation.

**MEM% **Percentage Memory utilisation.
!!! prerequisite Warning
**MEM%** - Percentage Memory utilisation.

!!! warning
If the job finishes or is killed, you will be kicked off the node. If
htop freezes, type `reset` to clear your terminal.

@@ -204,21 +187,18 @@ time* the CPUs are in use. This is not enough to get a picture of
overall job efficiency, as required CPU time *may vary by number of
CPUs*.

The only way to get the full context, is to compare walltime performance
between jobs at different scale. See [Job
Scaling](../../Getting_Started/Next_Steps/Job_Scaling_Ascertaining_job_dimensions.md)
for more details.
The only way to get the full context is to compare walltime performance between jobs at different scales. See [Job Scaling](../../Getting_Started/Next_Steps/Job_Scaling_Ascertaining_job_dimensions.md) for more details.

### Example

![qdyn\_eff.png](../../assets/images/Finding_Job_Efficiency_0.png)

From the above plot of CPU efficiency, you might decide a 5% reduction
of CPU efficiency is acceptable and scale your job up to 18 CPU cores . 
of CPU efficiency is acceptable and scale your job up to 18 CPU cores.

![qdyn\_walltime.png](../../assets/images/Finding_Job_Efficiency_1.png)

However, when looking at a plot of walltime it becomes apparent that
performance gains per CPU added drop significantly after 4 CPUs, and in
fact absolute performance losses (negative returns) are seen after 8
CPUs.
CPUs.
