Merge branch 'MultiResLBMOpt' of https://github.com/Autodesk/Neon into develop
Ahdhn committed Apr 10, 2024
2 parents 82e4c75 + 6057df8 commit 0b7b258
Showing 4 changed files with 121 additions and 112 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
![Neon logo](docs/logo/neonDarkLogo.jpg "Neon")

# For information about the GPU LBM Grid Refinement work (IPDPS 2024), go to this [README](/apps/lbmMultiRes/README.md)

Neon is a research framework for programming multi-device systems maintained by [Autodesk Research](https://www.autodesk.com/research/overview). Neon's goal is to automatically transform user sequential code into, for example, a scalable multi-GPU execution.

To reach its goal, Neon takes a domain-specific approach based on the parallel skeleton philosophy (a.k.a parallel patterns). Neon provides a set of domain-specific and programmable patterns that users compose through a sequential programming model to author their applications. Then, thanks to the knowledge of the domain, the patterns and their composition, Neon automatically optimizes the sequential code into an execution optimized for multi-device systems.
107 changes: 107 additions & 0 deletions apps/lbmMultiRes/README.md
@@ -0,0 +1,107 @@
# Overview

This guide explains how to reproduce the results presented in the paper [Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method](http://escholarship.org/uc/item/0x86w4w1). It also describes how to use the code to run other user-defined examples and points out areas for customization. For more information about Neon, see the main [README](/../../README.md).

# Getting Started

## Prerequisites
Neon runs on all major systems that support NVIDIA GPUs. We have tested the code on Windows 10/11 (VS 2019/2022) and Ubuntu 20.04/22.04.

- C++ compiler with C++17 standard support
- CUDA version 11 or higher
- CMake version 3.19 or higher


## Build

To build and compile the grid refinement LBM application:

# @@@$$$$$$$$$$ TODO Specify the tag/release $$$$$$$$$$@@@
```
git clone -b v0.5 https://github.com/Autodesk/Neon
cd Neon
mkdir build
cd build
cmake ../
cmake --build . --target app-lbmMultiRes --config Release -j 99
```

# Grid Refinement LBM

`app-lbmMultiRes` comes with two main problem setups:

1. Virtual wind tunnel where we simulate a flow over an input geometry defined by a triangle mesh
2. Lid-driven cavity which is a classical test case for measuring the accuracy of the simulation

Both problem setups can run on either the GPU (for fast, high-performance simulation) or the CPU (for debugging). The executable comes with a set of user input parameters. To display them, run

```bash
./bin/app-lbmMultiRes -h
```
### Problem setup parameters:

| Parameter | Option | Comment |
|--------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------|
| `--deviceType` | `cpu`, `gpu` | Select between running on GPU or CPU |
| `--deviceId` | integer value | The GPU device ID |
| `--numIter` | integer value | Number of LBM iterations run on the coarsest level |
| `--problemType` | `lid`, `mesh` | Problem type where `lid` is the lid-driven cavity problem and `mesh` is the flow over an input mesh, i.e., virtual wind tunnel |
| `--dataType` | `float`, `double` | The precision of data type used in the simulation |
| `--re` | integer value | Reynolds number used in the simulation |
| `--thetaX` | real value | For the `mesh` problem, the angle of rotation of the input mesh along the X axis |
| `--thetaY` | real value | For the `mesh` problem, the angle of rotation of the input mesh along the Y axis |
| `--thetaZ` | real value | For the `mesh` problem, the angle of rotation of the input mesh along the Z axis |
| `--scale` | integer value | A value that allows scaling up the problem to a larger size to allow easy benchmarking |

### Visualization parameters:

| Parameter | Option | Comment |
|--------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------|
| `--benchmark` | | Run in benchmark mode, i.e., no visualization output |
| `--visual` | | Run in visualization mode where we output PNG images of the simulation at the specified frequency |
| `--freq` | integer value | Frequency of the output for visualization. This option is allowed only with `--visual` mode |
| `--vtk` | | Output VTK files of the simulation. This option is allowed only with `--visual` mode |
| `--binary` | | Output binary down-sampled files of the simulation. This option is allowed only with `--visual` mode |
| `--sliceX` | integer value | Slice along the X axis for output images/VTK |
| `--sliceY` | integer value | Slice along the Y axis for output images/VTK |
| `--sliceZ` | integer value | Slice along the Z axis for output images/VTK |

### Performance parameters:
By default, we run the best-performing configuration presented in our [paper](http://escholarship.org/uc/item/0x86w4w1). The parameters below help reproduce the ablation study; each comment cites the figure in the paper that schematically shows the operation.

| Parameter | Option | Comment |
|------------------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------|
| `--storeCoarse` | | Initiate the Accumulate operation from the coarse level as done in the baseline algorithm (Figure 4.a) |
| `--storeFine` | | Initiate the Accumulate operation from the fine level (Figure 4.b) |
| `--collisionFusedStore` | | Fuse Collision with Accumulate operation (Figure 4.c) |
| `--streamFusedExpl` | | Fuse Stream with Explosion (Figure 4.d) |
| `--streamFusedCoal` | | Fuse Stream with Coalescence (Figure 4.e) |
| `--streamFuseAll` | | Fuse Stream with Coalescence and Explosion (Figure 4.f) |
| `--fusedFinest` | | Fuse all operations on the finest level, i.e., Collision, Accumulate, Explosion, Stream (Figure 4.f) |

Finally, to switch between the `KBC` and `BGK` collision models, change the `#define` directive at the top of [`lbmMultiRes.cu`](/lbmMultiRes.cu).

## Lid-driven cavity
After running the lid-driven cavity problem, the simulation outputs two files (`NeonMultiResLBM_####_Y.dat`, `NeonMultiResLBM_####_X.dat`) which can be used to reproduce Figure 7 in the paper. To reproduce the figure, pass these two files to this [python script](/scripts/plot.py).

## Virtual Wind Tunnel

The `flowOverMesh` method in [`flowOverShape.h`](/flowOverShape.h) defines various geometric properties needed to run a fluid simulation over a shape. The method is fully documented to facilitate customization.

The airplane input mesh used in Figure 1 can be found [here](/practice_v28.obj).


# Citation

```
@inproceedings{Mahmoud:2024:OGI,
author = {Mahmoud, Ahmed H. and Salehipour, Hesam and Meneghin, Massimiliano},
booktitle = {Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium},
title = {Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method},
year = 2024,
month = {},
pages = {},
doi = {},
url = {http://escholarship.org/uc/item/0x86w4w1}
}
```
89 changes: 0 additions & 89 deletions apps/lbmMultiRes/flowOverShape.h
@@ -114,95 +114,6 @@ void initFlowOverShape(Neon::domain::mGrid& grid,
     initSumStore<T, Q>(grid, sumStore);
 }

-template <typename T, int Q>
-void flowOverJet(const Neon::Backend backend,
-                 const Params&       params)
-{
-    static_assert(std::is_same_v<T, float> || std::is_same_v<T, double>);
-
-    Neon::index_3d gridDim(19 * params.scale, 8 * params.scale, 8 * params.scale);
-
-    Neon::index_3d jetBoxDim(2 * params.scale, 2 * params.scale, 2 * params.scale);
-    Neon::index_3d jetBoxPosition(3 * params.scale, 3 * params.scale, 3 * params.scale);
-
-    int depth = 3;
-
-    const Neon::mGridDescriptor<1> descriptor(depth);
-
-    Neon::domain::mGrid grid(
-        backend, gridDim,
-        {[&](const Neon::index_3d idx) -> bool {
-             return idx.x >= 2 * params.scale && idx.x < 7 * params.scale &&
-                    idx.y >= 3 * params.scale && idx.y < 5 * params.scale &&
-                    idx.z >= 3 * params.scale && idx.z < 5 * params.scale;
-         },
-         [&](const Neon::index_3d idx) -> bool {
-             return idx.x >= params.scale && idx.x < 11 * params.scale &&
-                    idx.y >= 2 * params.scale && idx.y < 6 * params.scale &&
-                    idx.z >= 2 * params.scale && idx.z < 6 * params.scale;
-         },
-         [&](const Neon::index_3d idx) -> bool {
-             return true;
-         }},
-        Neon::domain::Stencil::s19_t(false), descriptor);
-
-
-    //LBM problem
-    const T uin = 0.04;
-    const T clength = T((jetBoxDim.x / 2) / (1 << (depth - 1)));
-    const T visclb = uin * clength / static_cast<T>(params.Re);
-    const T omega = 1.0 / (3. * visclb + 0.5);
-    const Neon::double_3d inletVelocity(uin, 0., 0.);
-
-    //auto test = grid.newField<T>("test", 1, 0);
-    //test.ioToVtk("Test", true, true, true, true, {-1, -1, 1});
-    //exit(0);
-
-    //allocate fields
-    auto fin = grid.newField<T>("fin", Q, 0);
-    auto fout = grid.newField<T>("fout", Q, 0);
-    auto storeSum = grid.newField<float>("storeSum", Q, 0);
-    auto cellType = grid.newField<CellType>("CellType", 1, CellType::bulk);
-
-
-    //init fields
-    initFlowOverShape<T, Q>(grid, storeSum, fin, fout, cellType, inletVelocity, [=] NEON_CUDA_HOST_DEVICE(Neon::index_3d idx) {
-        idx.x -= jetBoxPosition.x;
-        idx.y -= jetBoxPosition.y;
-        idx.z -= jetBoxPosition.z;
-        if (idx.x < 0 || idx.y < 0 || idx.z < 0) {
-            return false;
-        }
-
-        idx.x = (jetBoxDim.x / 2) - (idx.x - (jetBoxDim.x / 2));
-        return sdfJetfighter(glm::ivec3(idx.z, idx.y, idx.x), glm::ivec3(jetBoxDim.x, jetBoxDim.y, jetBoxDim.z)) <= 0;
-
-        //Neon::index_4d sphere(jetBoxPosition.x + jetBoxDim.x / 2, jetBoxPosition.y + jetBoxDim.y / 2, jetBoxPosition.z + jetBoxDim.z / 2, jetBoxDim.x / 4);
-        //const T dx = sphere.x - idx.x;
-        //const T dy = sphere.y - idx.y;
-        //const T dz = sphere.z - idx.z;
-        //if ((dx * dx + dy * dy + dz * dz) < sphere.w * sphere.w) {
-        //    return true;
-        //} else {
-        //    return false;
-        //}
-    });
-
-    //cellType.updateHostData();
-    //cellType.ioToVtk("cellType", true, true, true, true);
-
-    runNonUniformLBM<T, Q>(grid,
-                           params,
-                           clength,
-                           omega,
-                           visclb,
-                           inletVelocity,
-                           cellType,
-                           storeSum,
-                           fin,
-                           fout);
-}
-
 template <typename T, int Q>
 void flowOverSphere(const Neon::Backend backend,
                     const Params&       params)
35 changes: 12 additions & 23 deletions apps/lbmMultiRes/lbmMultiRes.cu
@@ -18,15 +18,15 @@ struct Params
     std::string meshFile = "";
     int freq = 100;
     int Re = 100;
-    int deviceId = 99;
-    int numIter = 2;
+    int deviceId = 0;
+    int numIter = 1000;
     bool benchmark = true;
-    bool fineInitStore = false;
+    bool fineInitStore = true;
     bool streamFusedExpl = false;
     bool streamFusedCoal = false;
-    bool streamFuseAll = false;
-    bool collisionFusedStore = false;
-    bool fusedFinest = false;
+    bool streamFuseAll = true;
+    bool collisionFusedStore = true;
+    bool fusedFinest = true;
     int sliceX = -1;
     int sliceY = -1;
     int sliceZ = 1;
@@ -53,13 +53,13 @@ int main(int argc, char** argv)

     auto cli =
         (clipp::option("--deviceType") & clipp::value("deviceType", params.deviceType) % "Type of device (gpu or cpu)",
-         clipp::required("--deviceId") & clipp::integers("deviceId", params.deviceId) % "Device id",
+         clipp::option("--deviceId") & clipp::integers("deviceId", params.deviceId) % "Device id",
          clipp::option("--numIter") & clipp::integer("numIter", params.numIter) % "LBM number of iterations",
-         clipp::option("--problemType") & clipp::value("problemType", params.problemType) % "Problem type ('lid' for lid-driven cavity, 'sphere' for flow over sphere, or 'jet' for flow over jet fighter, 'mesh' for flow over mesh)",
+         clipp::option("--problemType") & clipp::value("problemType", params.problemType) % "Problem type ('lid' for lid-driven cavity, 'sphere' for flow over sphere, 'mesh' for flow over mesh)",
          clipp::option("--meshFile") & clipp::value("meshFile", params.meshFile) % "Path to mesh file for 'mesh' type problem",
          clipp::option("--dataType") & clipp::value("dataType", params.dataType) % "Data type (float or double)",
-         clipp::option("--re") & clipp::integers("Re", params.Re) % "Reynolds number",
-         clipp::option("--scale") & clipp::integers("scale", params.scale) % "Scale of the problem for parametrized problems. 0-9 for lid. jet is up to 112. Sphere is 2 (or maybe more)",
+         clipp::option("--re") & clipp::integer("Re", params.Re) % "Reynolds number",
+         clipp::option("--scale") & clipp::integer("scale", params.scale) % "Scale of the problem for parametrized problems. 0-9 for lid. Sphere is 2 (or maybe more)",

          clipp::option("--sliceX") & clipp::integer("sliceX", params.sliceX) % "Slice along X for output images/VTK",
          clipp::option("--sliceY") & clipp::integer("sliceY", params.sliceY) % "Slice along Y for output images/VTK",
@@ -76,7 +76,7 @@ int main(int argc, char** argv)
          clipp::option("--binary").set(params.binary, true) % "Output binary (down-sampled) files. Active only with if 'visual' is true",
          clipp::option("--gui").set(params.gui, true) % "Show Polyscope gui. Active only with if 'visual' is true",

-         clipp::option("--freq") & clipp::integers("freq", params.freq) % "Output frequency (only works with visual mode)",
+         clipp::option("--freq") & clipp::integer("freq", params.freq) % "Output frequency (only works with visual mode)",

          ((clipp::option("--storeFine").set(params.fineInitStore, true) % "Initiate the Store operation from the fine level") |
           (clipp::option("--storeCoarse").set(params.fineInitStore, false) % "Initiate the Store operation from the coarse level") |
@@ -101,7 +101,7 @@ int main(int argc, char** argv)
         NEON_THROW(exp);
     }

-    if (params.problemType != "lid" && params.problemType != "sphere" && params.problemType != "jet" && params.problemType != "mesh") {
+    if (params.problemType != "lid" && params.problemType != "sphere" && params.problemType != "mesh") {
         Neon::NeonException exp("app-lbmMultiRes");
         exp << "Unknown input problem type " << params.problemType;
         NEON_THROW(exp);
@@ -152,17 +152,6 @@ int main(int argc, char** argv)
         }
     }

-    if (params.problemType == "jet") {
-        report = Neon::Report("Jet MultiRes LBM");
-        report.commandLine(argc, argv);
-        if (params.dataType == "float") {
-            flowOverJet<float, Q>(backend, params);
-        }
-        if (params.dataType == "double") {
-            flowOverJet<double, Q>(backend, params);
-        }
-    }
-
     if (params.problemType == "mesh") {
         report = Neon::Report("Mesh MultiRes LBM");
         report.commandLine(argc, argv);
