forked from stanfordnqp/maxwell-b
Maxwell is a multi-GPU implementation of a finite-difference frequency-domain (FDFD) solver.
This code is intended to be used in conjunction with SPINS (github.com/stanfordnqp/spins-b).
Overview
========
Maxwell is implemented as a server to which SPINS can send simulations to be run.
This allows the actual simulation server (i.e. where the GPUs are) to be located separately
from where the rest of the optimization code is running, though it is recommended to keep
SPINS and Maxwell on the same machine if possible.
At its core, running Maxwell involves running two separate services:
1. A webserver `maxwell-server/webserver.py` that manages sending and receiving simulation data over HTTP.
2. A simserver `maxwell-server/simserver.py` that manages running the simulations.
Both services must be up and running for Maxwell to function properly.
Maxwell can be run in either of the following ways:
1. Use the Dockerfile provided. This is the preferred mechanism, as it creates an isolated environment for Maxwell.
2. Manually launch the webserver and simserver. This allows for the most fine-grained control.
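Since both services must be running, a quick first check is whether anything is accepting connections on the webserver port. The following is a minimal sketch, assuming the webserver runs on `localhost` on the default port 9041; the helper name `port_is_open` is hypothetical and not part of Maxwell. It only confirms that the port accepts TCP connections, so it says nothing about whether the simserver is up.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 9041 is the default webserver port; "localhost" is an assumption.
    host, port = "localhost", 9041
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"Maxwell webserver at {host}:{port} is {state}")
```

If the webserver was started on a different port, pass that port instead.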
Docker
======
We have Dockerized the Maxwell solver to make solver maintenance easier.
Apart from the CUDA Toolkit, all Maxwell dependencies are listed in the Dockerfile.
Installation
------------
1. Install Docker (http://docker.com).
2. Install CUDA 10.0 Toolkit (https://developer.nvidia.com/cuda-10.0-download-archive).
3. Install NVIDIA-Docker (https://github.com/NVIDIA/nvidia-docker).
Usage
-----
The Dockerfile is located in the root Maxwell directory.
We have also provided a script to build and launch the Docker container:
$ ./run_docker
To change the number of GPUs used per simulation, edit the `run_docker` script and set `NGPUS` to the desired value.
Docker Quick Reference
----------------------
To list all running containers,
$ docker ps
To kill a container,
$ docker kill [container-name-or-id]
To clean up stopped containers and other unused Docker data, including unused volumes (dead containers still take up disk space),
$ docker system prune --volumes
To examine the container state, launch an interactive bash session:
$ docker exec -it [container-name-or-id] bash
Manual Installation
===================
Maxwell can also be installed manually by following the installation steps
listed in `Dockerfile`. See `./start_maxwell` for an
example of how to launch Maxwell manually.
Options
=======
1. Work directory: Maxwell requires a folder to store temporary data.
This location is specified by the environment variable `MAXWELL_SERVER_FILES`
that must be set for Maxwell to run. Maxwell must have permissions to read
and write to this directory.
2. Port number: By default, the webserver runs on port 9041. This can be changed
by changing the argument passed to `webserver.py`
(e.g. `python webserver.py 9042`). If using Docker, change the `PORT` variable
in `./run_docker`. Note that the new port number must also be configured in
`spins` for the two to communicate.
3. Number of GPUs per solve: By default, Maxwell will attempt to use 1 GPU
per simulation. For larger simulations, it may be beneficial to use multiple
GPUs per simulation. This can be configured by changing the `NGPUS` value.
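The work directory requirement above can be checked before launching the servers. The following is a minimal sketch, not part of Maxwell: the function name `check_work_dir` is hypothetical, and the only detail taken from this README is the environment variable `MAXWELL_SERVER_FILES`. It verifies that the variable is set, that it points to a directory, and that a file can actually be written and read there.

```python
import os
import tempfile


def check_work_dir(env_var: str = "MAXWELL_SERVER_FILES") -> str:
    """Return the work directory path, or raise RuntimeError if unusable."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isdir(path):
        raise RuntimeError(f"{path} is not a directory")
    # Create, write, and read back a temporary file; this exercises real
    # permissions rather than relying on os.access().
    try:
        with tempfile.TemporaryFile(dir=path) as f:
            f.write(b"ok")
            f.seek(0)
            f.read()
    except OSError as exc:
        raise RuntimeError(f"cannot read/write in {path}: {exc}") from exc
    return path


if __name__ == "__main__":
    print("Work directory OK:", check_work_dir())
```

Run this as the same user that will run Maxwell, since permissions are checked for the current user.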
Troubleshooting
===============
1. Check the Maxwell server log files.
Maxwell consists of two separate servers: `webserver.py` manages sending and receiving data
and `simserver.py` manages running the actual simulations. Both servers save logs named `webserver.log` and
`simserver.log` respectively.
2. Check the individual simulation log files.
Every simulation maintains its own log file. These can be found under the `$MAXWELL_SERVER_FILES` directory.
By default, the Docker setup sets this location to `/mnt/maxwell-server-files` inside the container.
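To locate the most recent simulation output quickly, it can help to list the newest files under the work directory and print the end of each. The sketch below is an illustration only: the helpers `recent_files` and `tail` are hypothetical, and because this README does not specify how individual simulation log files are named, the code makes no assumption about file names and simply sorts every file under `$MAXWELL_SERVER_FILES` by modification time.

```python
import os
from pathlib import Path
from typing import List


def recent_files(root: str, limit: int = 5) -> List[Path]:
    """Return up to `limit` files under `root`, most recently modified first."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


def tail(path: Path, lines: int = 20) -> str:
    """Return the last `lines` lines of a text file as one string."""
    text = path.read_text(errors="replace")
    return "\n".join(text.splitlines()[-lines:])


if __name__ == "__main__":
    # MAXWELL_SERVER_FILES must be set, as required by Maxwell itself.
    root = os.environ["MAXWELL_SERVER_FILES"]
    for path in recent_files(root):
        print(f"==> {path} <==")
        print(tail(path))
```

Inside the Docker container, the same script can be run from an interactive session, where the work directory defaults to `/mnt/maxwell-server-files`.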
Acknowledgements
================
This code is based primarily on https://github.com/JesseLu/maxwell-solver.