title | author |
---|---|
Interactive computing on a sci server | Fatima Chami |
I need to randomly sample my dataset to estimate the distribution of the sample mean. To do this I will use the random number generator available in the NumPy library in Python, so I first need to test that function. I also need to check whether NumPy array arithmetic, e.g. a dot product, is implicitly multithreaded.
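As a rough illustration of the scenario above, the following minimal sketch estimates the distribution of the sample mean by repeated resampling. It is not the workshop script: the dataset is a hypothetical array of synthetic values, the sample counts are arbitrary, and it assumes NumPy's `default_rng` generator rather than the legacy `np.random` functions.

```python
import numpy as np

# Seeded generator so that repeated runs give the same numbers
rng = np.random.default_rng(seed=42)

# Hypothetical dataset: 10,000 synthetic values (stand-in for real data)
data = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Assumed settings: 2,000 random samples of 100 values each
n_samples, sample_size = 2_000, 100

# Draw all samples at once (with replacement) and take the mean of each row
samples = rng.choice(data, size=(n_samples, sample_size), replace=True)
sample_means = samples.mean(axis=1)

print("mean of the sample means:", sample_means.mean())
print("spread of the sample means:", sample_means.std(ddof=1))
```

The array `sample_means` is the estimated distribution; a histogram of it would show how the sample mean varies from one sample to the next.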
After completing this exercise I will be able to:
- execute a computing task on the sci machine from a command line
- monitor CPU and memory resources usage of my computing task
- understand the module environment, e.g. JASPY, and the software available on JASMIN
- become aware of implicit multithreading and how to disable it
- learn about the capabilities and limitations of the scientific analysis servers
- Scientific analysis servers:
sci[1-8].jasmin.ac.uk
- Group workspace:
/gws/pw/j07/workshop
- Python example scripts are provided:
/gws/pw/j07/workshop/exercises/ex02/code/random-number-gen.py
/gws/pw/j07/workshop/exercises/ex02/code/dot-product-2arrays.py
- help documentation at https://help.jasmin.ac.uk
- SSH key & passphrase
- Terminal application or NX client
- A valid jasmin-login grant associated with your JASMIN account
You can follow this exercise by watching the videos below, or by following the text of this article, or a combination of both.
Task | |
Solutions & Discussion | |
Demonstration |
This is the outline of what you need to do. The recommended way of doing each step is covered in the "Cheat Sheet" but you may wish to try solving it for yourself first.
- Your starting point is on a JASMIN login server (see exercise 01)
- Login to a JASMIN scientific analysis server from the login server
- Launch two terminal sessions
- Access a JASMIN login server on each terminal (see exercise 01)
- Choose a Sci server with the lowest load
- Login to the chosen sci server on each terminal
NOTE: The purpose of having two SSH terminal sessions on the same sci server is to separate compute from monitoring. One terminal is for executing commands on the sci server, while the second is for monitoring user processes (or editing a script).
- Execute the first Python example script on the Sci server
- Copy the first Python example script
random-number-gen.py
(shown in the JASMIN resources section) to your current working directory
- Find out the software available on JASMIN via the module environment by executing the command
module avail
- Enable a Python environment via the module
jaspy
by executing the command
module add jaspy
- Execute the command
python random-number-gen.py
- Check the process ID (pid), state, memory and CPU usage on the monitoring terminal
- How much CPU and memory does the process use in the Run and Sleep states?
- Monitor your processes on the sci machine
- Execute the Linux command
top -u <username>
- How many processes do you have?
- Sort all processes per CPU usage by executing
top
- To exit the monitoring tool
top
press the keyboard letter q
- Try another utility to list all your processes on the sci server
ps -aux | grep <username>
- Make changes to the Python example and re-execute it
- Open the Python script file in a text editor, e.g. vim or emacs (see note below)
- Decrease the number of random numbers
nran
from 1024 to 500
- Save the file and exit the text editor
- Execute
python random-number-gen.py
- Monitor and note the memory and CPU usage
- Compare the CPU and memory resources used to generate 1024 and 500 random numbers. What can you conclude?
NOTE: If you are not familiar with a text editor, execute the following command to change the number of random numbers:
$ sed -i 's/nran = 1024/nran = 500/' random-number-gen.py
- Test for potential multithreading
- Copy the second Python example script (shown in the JASMIN resources section) to your current working directory
- Remove the default JASPY environment (jaspy/3.10/r20220721) then enable the JASPY version (jaspy/3.7/r20210320) for this task
- Execute the command
python dot-product-2arrays.py
- On the monitoring terminal execute the command
top -H -u <username>
or
ps -T -p <pid>
- How many threads did the process spawn?
- Set the environment variable
OMP_NUM_THREADS
to one thread (or two if you wish) by executing the command
export OMP_NUM_THREADS=1
- Re-execute
python dot-product-2arrays.py
- Did the setting
OMP_NUM_THREADS=1
disable multithreading?
- Edit the script in a text editor and uncomment the line of code
#os.environ["OMP_NUM_THREADS"] = "2"
and save the script
- Rerun the Python script
- What can you conclude?
- Logout from the sci machine to end your SSH session on JASMIN sci
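The monitoring steps above rely on top and ps. As a complementary sketch, a Python process can also report its own state, resident memory and thread count by reading /proc/<pid>/status. This assumes a Linux system such as the sci servers; the helper name `proc_status` is hypothetical and not part of the workshop scripts.

```python
import os


def proc_status(pid="self"):
    """Return selected fields from /proc/<pid>/status (Linux only).

    An empty dictionary is returned on systems without /proc.
    """
    wanted = {"Name", "State", "VmRSS", "Threads"}
    fields = {}
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return fields
    with open(path) as handle:
        for line in handle:
            key, _, value = line.partition(":")
            if key in wanted:
                fields[key] = value.strip()
    return fields


# Report on the current Python process
status = proc_status()
print("PID", os.getpid())
for key, value in status.items():
    print(f"{key:8s} {value}")
```

Passing another PID, e.g. `proc_status(10108)`, reads the same fields for a different process that you own.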
All too easy? Here are some questions to test your knowledge and understanding. You might find the answers by exploring the JASMIN Documentation.
- Is there a limit on the number of processes running on the sci server at any given time by a user?
- What tasks are not suitable to run on the sci servers?
- What software is available via the module environment?
- How do you switch between different versions of a software modulefile?
- What text editors are available on JASMIN?
- How do you control and limit the number of threads?
- Can I install software on JASMIN?
You will be able to run and test a script on the scientific analysis servers and monitor the resources it uses. You can scale up by using the high-memory scientific analysis servers sci[3,6,8].jasmin.ac.uk for testing and then move your workflow to the batch cluster LOTUS.
What tasks can I not run on the sci servers?
- Do not run processes with execution time over two hours
- Do not run parallel applications, e.g. MPI or OpenMP, or highly threaded codes on the sci servers
- Do not run data transfer processes on the sci servers. Please use a transfer server e.g.
xfer3.jasmin.ac.uk
(except when moving data from /work/scratch-pw[2,3] to a GWS, because /work/scratch-pw[2,3] are not mounted on the xfer servers)
- Use the high memory scientific analysis servers
sci[3,6,8].jasmin.ac.uk
for testing high memory or multithreaded code
- Only test multi-threaded code on the high memory servers and limit the number of threads
- It is necessary to consider moving a processing task to the batch system LOTUS when the resource demand is high, e.g. CPU, memory and processing time
Manage your processes on the Sci server
- If a process hangs, do not simply close the terminal window. Please contact the helpdesk and alert the team so that the process can be shut down. Otherwise hung processes build up and contribute to machine overloading.
- Running many instances of an application, e.g. IPython, can impact the performance of the scientific analysis servers.
- Monitor the CPU and memory resources of your processes
- You might use STOP and CONT to delay execution of a process until a less-busy time, like this:
kill -STOP <pid>
kill -CONT <pid>
or kill the process:
kill -TERM <pid>
- Do not “hog” IDL development licenses on the Sci servers. A limited number of these are available for development and compilation of IDL code which should then be run on LOTUS using IDL runtime licenses, of which there are many more
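The STOP, CONT and TERM signals mentioned above can also be sent from Python. The sketch below is only an illustration: it starts a harmless child process (a hypothetical stand-in for a real job), pauses it, resumes it and then terminates it, using `os.kill` as the counterpart of the shell's kill command. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import time

# Harmless child process standing in for a long-running job
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

os.kill(child.pid, signal.SIGSTOP)   # equivalent to: kill -STOP <pid>
time.sleep(0.5)                      # the child is paused during this time

os.kill(child.pid, signal.SIGCONT)   # equivalent to: kill -CONT <pid>
time.sleep(0.5)                      # the child is running again

os.kill(child.pid, signal.SIGTERM)   # equivalent to: kill -TERM <pid>
exit_code = child.wait(timeout=10)   # negative value: ended by that signal

print("child", child.pid, "finished with exit code", exit_code)
```

While the child is stopped, top shows it in state T; after the TERM signal it disappears from the process list.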
Usage of the storage:
- Do not use
/tmp
on the scientific analysis servers. Using /tmp can cause the scientific analysis server to crash, resulting in loss of work. Set the environment variable TMPDIR to a temporary directory under a GWS area:
export TMPDIR=/GWS-path/<your_project>/<your_username>/tmp
- Do not generate huge numbers of files (>1000) in a single directory
- The user home directory
/home/users/<username>
has a fixed quota of 100 GB
- Manage your disk usage regularly, e.g. delete unused files and archive files using the
tar
command
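To illustrate the TMPDIR advice above: Python's standard tempfile module consults the TMPDIR environment variable when choosing where to create temporary files. In this sketch the directory `scratch` is a hypothetical stand-in created only for the demonstration; on JASMIN it would be the directory under your GWS that you exported as TMPDIR.

```python
import os
import tempfile

# Hypothetical stand-in for a directory under your GWS
scratch = tempfile.mkdtemp(prefix="gws-tmp-demo-")

previous = os.environ.get("TMPDIR")
os.environ["TMPDIR"] = scratch
tempfile.tempdir = None  # drop the cached default so TMPDIR is read again

# Temporary files are now created under TMPDIR
with tempfile.NamedTemporaryFile() as tmp:
    tmp_location = os.path.dirname(tmp.name)
print("temporary file was created in:", tmp_location)

# Restore the previous state and remove the demo directory
if previous is None:
    del os.environ["TMPDIR"]
else:
    os.environ["TMPDIR"] = previous
tempfile.tempdir = None
os.rmdir(scratch)
```

When TMPDIR is exported in the shell before Python starts, resetting `tempfile.tempdir` is not needed; it is only required here because the variable is changed inside a running process.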
- Login to a JASMIN scientific analysis server
- Login to the chosen sci server from a JASMIN login server
$ ssh -A <username>@sci<number>.jasmin.ac.uk
For example the user
train049
connects to sci4:
$ ssh -A train049@sci4.jasmin.ac.uk
[train049@sci4 ~]$
- Execute the Python example script on the Sci server
- Copy the Python example script (shown in the JASMIN resources section) to your current working directory
$ cp /gws/pw/j07/workshop/exercises/ex02/code/random-number-gen.py .
- Enable a Python environment via the module
jaspy
by executing the command
module add jaspy
$ module add jaspy
$ module list
Currently Loaded Modulefiles:
1) jaspy/3.10/r20220721
- Execute the Python script
python random-number-gen.py
$ python random-number-gen.py
Get ready to monitor PID 10108
1024 ======>>> random numbers
Process 17203 is in sleep mode for 10 sec check its resources usage in this state
Finished in 3.316218542982824 seconds
- Check the process ID PID, state S, memory %MEM and CPU %CPU usage on the monitoring terminal from the interactive top command:
$ top -u <username>
- Monitor your processes on the sci server
- Execute the Linux command
top -u <username>
$ top -u <username>
- Which process is running? Give the process ID
The process ID and its state are shown in the 1st column PID and the 8th column S, respectively. The Python process with PID 10108 is in Sleep state but is still using 3.3 %MEM, which is 1 GB of the physical/resident memory RES (6th column)
- Sort all processes per CPU usage. Execute
top
$ top
No screenshot to avoid displaying usernames of logged on users
- To exit the monitoring tool
top
press the keyboard letter q
- Make changes to the Python example and re-execute it
- Open the Python script file in a text editor e.g. vim, emacs
$ vim random-number-gen.py
- Decrease the number of random numbers
nran
from 1024 to 500
# Import Python libraries numpy, time and os
import numpy as np
import time
import os
# Get the Process Identifier of the current process
PID=os.getpid()
print("Process ID %s (PID) started" %(os.getpid()))
print("Get ready to monitor PID", PID)
# Wait for tsleep
tsleep= 5
time.sleep(tsleep)
# Number of random numbers to be generated
nran = 500 # <- CHANGED
# Generate a random number from the normal distribution
t1 = time.perf_counter()
result = [np.random.bytes(nran*nran) for x in range(nran)]
- Save the file and exit the text editor
vim
. Press "Esc" then
:wq
- Execute
python random-number-gen.py
$ python random-number-gen.py
500 ======>>> random numbers
I am sleeping for 40 seconds so you can check the resources usage
- Monitor and note the memory and CPU usage
$ top -u <username>
- What can you conclude?
The Python process used less CPU and memory to generate and store the random data for nran=500 in memory than for nran=1024
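The difference in memory use can be estimated with simple arithmetic. The script excerpt above builds a list of nran blocks, each holding nran*nran random bytes, so the payload is roughly nran cubed bytes. The sketch below works this out for both values; the helper name `expected_bytes` is hypothetical, and the estimate ignores Python's per-object overhead.

```python
def expected_bytes(nran: int) -> int:
    """Payload size of nran blocks of nran*nran random bytes each."""
    return nran * (nran * nran)


for nran in (1024, 500):
    size = expected_bytes(nran)
    # 2**20 bytes = 1 MiB
    print(f"nran={nran}: {size} bytes = {size / 2**20:.1f} MiB")
```

For nran=1024 this gives 1024 MiB (1 GiB), consistent with the roughly 1 GB of resident memory noted for the sleeping process, while nran=500 needs about 119 MiB.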
- Test for potential multithreading
- Copy the second Python example script (shown in the JASMIN resources section) to your current working directory
$ cp /gws/pw/j07/workshop/exercises/ex02/code/dot-product-2arrays.py .
- Remove the default JASPY environment (jaspy/3.10/r20220721) then enable the JASPY version jaspy/3.7/r20210320 for this task
$ module rm jaspy
$ module add jaspy/3.7/r20210320
or switch module using the command
module switch jaspy/3.10/r20220721 jaspy/3.7/r20210320
- Execute the command
python dot-product-2arrays.py
$ module add jaspy
$ python dot-product-2arrays.py
Process 14545 started
Time with None threads: 15.869181 s
Finished in 15.869181300047785 seconds
Note: the variable OMP_NUM_THREADS is not set (check with echo $OMP_NUM_THREADS), hence 'None threads' in the message above.
- On the monitoring terminal execute the command
top -H -u <username>
or
ps -T -p <pid>
- How many threads did the process spawn? 7 threads with the master process, e.g. PID 16566, as shown in the example above.
- Set the environment variable
OMP_NUM_THREADS
to 1 by executing the command
export OMP_NUM_THREADS=1
$ export OMP_NUM_THREADS=1
$ echo $OMP_NUM_THREADS
1
- Re-execute
python dot-product-2arrays.py
$ python dot-product-2arrays.py
Process 1838 started
Time with 1 threads: 14.571710 s
Finished in 14.571709612966515 seconds
- Did the setting
OMP_NUM_THREADS=1
disable multithreading? Yes, setting OMP_NUM_THREADS=1 disabled multithreading. There is only a single process and no additional threads
- Edit the script in a text editor and uncomment the line of code
#os.environ["OMP_NUM_THREADS"] = "2"
and save the script, or use the command sed:
sed -i 's/#os.environ/os.environ/' dot-product-2arrays.py
Note: This setting must be made before the numpy import (see screenshot above).
- Rerun the Python script
$ python dot-product-2arrays.py
Process 10625 started
Time with 2 threads: 13.889903 s
Finished in 13.889903239090927 seconds
- What can you conclude? There is one process with PID 10625 and one thread with SPID 10871. The setting of
os.environ["OMP_NUM_THREADS"] = "2"
from the Python script overrides the setting from the shell.
- Logout from the sci machine to end your SSH session on JASMIN sci
$ logout
Connection to sci<number>.jasmin.ac.uk closed.
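Returning to the multithreading test above, the following sketch shows the ordering that matters: OMP_NUM_THREADS is set before NumPy is imported, and the thread count of the process is then read from /proc/self/task. It is not the workshop's dot-product-2arrays.py. It assumes a Linux system and a NumPy build whose BLAS library honours OMP_NUM_THREADS; the helper name `count_threads` and the array size are hypothetical.

```python
import os

# The variable has to be set before NumPy is imported, because the
# underlying BLAS library reads it when it is loaded (assumption:
# the library honours OMP_NUM_THREADS).
os.environ["OMP_NUM_THREADS"] = "1"

try:
    import numpy as np
except ImportError:  # the sketch still runs without NumPy
    np = None


def count_threads(pid="self"):
    """Number of threads of a process, from /proc/<pid>/task (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    if not os.path.isdir(task_dir):
        return None
    return len(os.listdir(task_dir))


if np is not None:
    a = np.ones((500, 500))
    product = a.dot(a)  # array arithmetic that may use several threads

n_threads = count_threads()
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
print("threads in this process:", n_threads)
```

Changing the value to "2" and rerunning allows the reported thread count to be compared with the output of ps -T -p <pid>.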
- Is there a limit on the number of processes running on the sci server at any given time by a user?
There is no limit on the number of processes launched by a user on the scientific analysis servers. However, the user should limit the number of processes to a maximum of 2 as the sci server is shared by other users. Distribute the processing tasks across other Sci servers and consider moving the tasks to the batch cluster LOTUS.
- What tasks are not suitable to run on the Sci servers?
Long-running tasks and heavy processing, MPI parallel codes and multithreaded applications
- What software is available via module environment?
JASPY, jasmin_sci, Intel/GNU compilers, NetCDF library, NAG library and IDL (restricted)
- How do you switch between different versions of a software module? Use the command "module switch".
Here is an example that enables a different version of Python:
$ module add jaspy
$ python --version
Python 3.7.6
Now, we'll switch to a different version of JASPY to enable Python 2.7
$ module switch jaspy/3.7 jaspy/2.7
$ python --version
Python 2.7.15
- What text editors or IDEs are available on JASMIN?
Emacs, vim, nedit, geany and nano
- How do you control and limit the number of threads?
Set the environment variable OMP_NUM_THREADS
by adding the following line to your .bashrc file:
export OMP_NUM_THREADS=1
- Can I install software on JASMIN?
- You can install software under your home directory for your own use.
- If you need to share a software environment with other JASMIN users and the software licence allows it, then enquire about a "small files" Group Workspace by contacting the JASMIN support helpdesk.