Merge branch 'MultiResLBMOpt' of https://github.com/Autodesk/Neon into develop
Autodesk · Apr 10, 2024 · 0b7b258
2 parents 82e4c75 + 6057df8
Showing 4 changed files with 121 additions and 112 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,7 @@
 ![Neon logo](docs/logo/neonDarkLogo.jpg "Neon")
 
+# For information about the GPU LBM Grid Refinement work (IPDPS 2024), go to this [README](/apps/lbmMultiRes/README.md)
+
 Neon is a research framework for programming multi-device systems maintained by [Autodesk Research](https://www.autodesk.com/research/overview). Neon's goal is to automatically transform user sequential code into, for example, a scalable multi-GPU execution.
 
 To reach its goal, Neon takes a domain-specific approach based on the parallel skeleton philosophy (a.k.a parallel patterns). Neon provides a set of domain-specific and programmable patterns that users compose through a sequential programming model to author their applications. Then, thanks to the knowledge of the domain, the patterns and their composition, Neon automatically optimizes the sequential code into an execution optimized for multi-device systems.

diff --git a/apps/lbmMultiRes/README.md b/apps/lbmMultiRes/README.md
@@ -0,0 +1,107 @@
+# Overview 
+
+This guide explains how to reproduce the results presented in the paper [Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method](http://escholarship.org/uc/item/0x86w4w1). It also describes how to use the code to run other user-defined examples and where the code can be customized. For more information about Neon, see the main [README](/../../README.md).
+
+# Getting Started 
+
+## Prerequisites
+Neon runs on all major systems that support running Nvidia GPUs. We have tested the code on Windows 10/11 (VS 2019/2022) and Ubuntu 20.04/22.04. 
+
+- C++ compiler with C++17 standard support
+- CUDA version 11 or higher
+- CMake version 3.19 or higher
+
+
+## Build
+
+To build and compile the grid refinement LBM application:
+
+# @@@$$$$$$$$$$ TODO Specify the tag/release $$$$$$$$$$@@@
+```
+git clone -b v0.5 https://github.com/Autodesk/Neon
+cd Neon
+mkdir build
+cd build
+cmake ../
+cmake --build . --target app-lbmMultiRes --config Release -j 99
+```
+
+# Grid Refinement LBM
+
+`app-lbmMultiRes` comes with two main problem setups:
+
+1. Virtual wind tunnel where we simulate a flow over an input geometry defined by a triangle mesh 
+2. Lid-driven cavity which is a classical test case for measuring the accuracy of the simulation 
+
+Both problem setups can be run on either GPU (for fast high-performance simulation) or CPU (for debugging). The executable accepts a set of command-line options. To display them, run:
+
+```bash
+./bin/app-lbmMultiRes -h
+```
+### Problem setup parameters:
+
+| Parameter          | Option             | Comment                                                                                                                    |
+|--------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------|
+| `--deviceType`     |  `cpu`, `gpu`      | to select between running on GPU or CPU                                                                                    |
+| `--deviceId`       |  integer value     | The GPU device ID                                                                                                          |
+| `--numIter`        |  integer value     | Number of LBM iterations run on the coarsest level                                                                         |
+| `--problemType`    |  `lid`, `mesh`     | Problem type where `lid` is the lid-driven cavity problem and `mesh` is the flow over an input mesh, i.e., virtual wind tunnel |
+| `--dataType`       |  `float`, `double` | The precision of data type used in the simulation                                                                          |
+| `--re`             |  real value        | Reynolds number used in the simulation                                                                                     |
+| `--thetaX`         |  real value        | For the `mesh` problem, the angle of rotation of the input mesh along the X axis                                                   |
+| `--thetaY`         |  real value        | For the `mesh` problem, the angle of rotation of the input mesh along the Y axis                                                   |
+| `--thetaZ`         |  real value        | For the `mesh` problem, the angle of rotation of the input mesh along the Z axis                                                   |
+| `--scale`          |  integer value     | A value that allows scaling up the problem to a larger size to allow easy benchmarking                                        |
+
+### Visualization parameters:
+
+| Parameter          | Option             | Comment                                                                                                                    |
+|--------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------|
+| `--benchmark`      |                    | Run in benchmark mode, i.e., no visualization output                                                                       |
+| `--visual`         |                    | Run in visualization mode where we output PNG images of the simulation at the specified frequency                         |
+| `--freq`           |  integer value     | Frequency of the output for visualization. This option is allowed only with `--visual` mode                                |
+| `--vtk`            |                    | Output VTK files of the simulation. This option is allowed only with `--visual` mode                                       |
+| `--binary`         |                    | Output binary down-sampled files of the simulation. This option is allowed only with `--visual` mode                       |
+| `--sliceX`         |  integer value     | Slice along the X axis for output images/VTK                                                                                   |
+| `--sliceY`         |  integer value     | Slice along the Y axis for output images/VTK                                                                                   |
+| `--sliceZ`         |  integer value     | Slice along the Z axis for output images/VTK                                                                                   |
+
+### Performance parameters:
+By default, the application runs the best-performing configuration presented in our [paper](http://escholarship.org/uc/item/0x86w4w1). The parameters below help reproduce the ablation study; each comment cites the figure in the paper that schematically shows the operation.
+
+| Parameter                    | Option             | Comment                                                                                                                    |
+|------------------------------|--------------------|----------------------------------------------------------------------------------------------------------------------------|
+| `--storeCoarse`              |                    |  Initiate the Accumulate operation from the coarse level as done in the baseline algorithm (Figure 4.a)                    |
+| `--storeFine`                |                    |  Initiate the Accumulate operation from the fine level (Figure 4.b)                                                        |
+| `--collisionFusedStore`      |                    |  Fuse Collision with Accumulate operation (Figure 4.c)                                                                     |
+| `--streamFusedExpl`          |                    |  Fuse Stream with Explosion (Figure 4.d)                                                                                   |
+| `--streamFusedCoal`          |                    |  Fuse Stream with Coalescence (Figure 4.e)                                                                                 |
+| `--streamFuseAll`            |                    |  Fuse Stream with Coalescence and Explosion (Figure 4.f)                                                                   |
+| `--fusedFinest`              |                    |  Fuse all operations on the finest level, i.e., Collision, Accumulate, Explosion, Stream  (Figure 4.f)                     |
+
+Finally, to switch between the `KBC` and `BGK` collision models, change the `#define` directive at the top of [`lbmMultiRes.cu`](/lbmMultiRes.cu).
+
+## Lid-driven cavity
+After running the lid-driven cavity problem, the simulation outputs two files (`NeonMultiResLBM_####_Y.dat` and `NeonMultiResLBM_####_X.dat`) that can be used to reproduce Figure 7 in the paper. To reproduce the figure, pass these two files to this [python script](/scripts/plot.py).
+
+## Virtual Wind Tunnel 
+
+The `flowOverMesh` method in [`flowOverShape.h`](/flowOverShape.h) defines the geometric properties needed to run a fluid simulation over a shape. The method is fully documented to facilitate customization.
+
+The airplane input mesh used in Figure 1 can be found [here](/practice_v28.obj).
+
+
+# Citation
+
+```
+@inproceedings{Mahmoud:2024:OGI,
+  author    = {Mahmoud, Ahmed H. and Salehipour, Hesam and Meneghin, Massimiliano},
+  booktitle = {Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium},
+  title     = {Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method},
+  year      = 2024,
+  month     = {},
+  pages     = {},
+  doi       = {},
+  url       = {http://escholarship.org/uc/item/0x86w4w1}
+}
+```
diff --git a/apps/lbmMultiRes/flowOverShape.h b/apps/lbmMultiRes/flowOverShape.h
@@ -114,95 +114,6 @@ void initFlowOverShape(Neon::domain::mGrid&                  grid,
     initSumStore<T, Q>(grid, sumStore);
 }
 
-template <typename T, int Q>
-void flowOverJet(const Neon::Backend backend,
-                 const Params&       params)
-{
-    static_assert(std::is_same_v<T, float> || std::is_same_v<T, double>);
-
-    Neon::index_3d gridDim(19 * params.scale, 8 * params.scale, 8 * params.scale);
-
-    Neon::index_3d jetBoxDim(2 * params.scale, 2 * params.scale, 2 * params.scale);
-    Neon::index_3d jetBoxPosition(3 * params.scale, 3 * params.scale, 3 * params.scale);
-
-    int depth = 3;
-
-    const Neon::mGridDescriptor<1> descriptor(depth);
-
-    Neon::domain::mGrid grid(
-        backend, gridDim,
-        {[&](const Neon::index_3d idx) -> bool {
-             return idx.x >= 2 * params.scale && idx.x < 7 * params.scale &&
-                    idx.y >= 3 * params.scale && idx.y < 5 * params.scale &&
-                    idx.z >= 3 * params.scale && idx.z < 5 * params.scale;
-         },
-         [&](const Neon::index_3d idx) -> bool {
-             return idx.x >= params.scale && idx.x < 11 * params.scale &&
-                    idx.y >= 2 * params.scale && idx.y < 6 * params.scale &&
-                    idx.z >= 2 * params.scale && idx.z < 6 * params.scale;
-         },
-         [&](const Neon::index_3d idx) -> bool {
-             return true;
-         }},
-        Neon::domain::Stencil::s19_t(false), descriptor);
-
-
-    //LBM problem
-    const T               uin = 0.04;
-    const T               clength = T((jetBoxDim.x / 2) / (1 << (depth - 1)));
-    const T               visclb = uin * clength / static_cast<T>(params.Re);
-    const T               omega = 1.0 / (3. * visclb + 0.5);
-    const Neon::double_3d inletVelocity(uin, 0., 0.);
-
-    //auto test = grid.newField<T>("test", 1, 0);
-    //test.ioToVtk("Test", true, true, true, true, {-1, -1, 1});
-    //exit(0);
-
-    //allocate fields
-    auto fin = grid.newField<T>("fin", Q, 0);
-    auto fout = grid.newField<T>("fout", Q, 0);
-    auto storeSum = grid.newField<float>("storeSum", Q, 0);
-    auto cellType = grid.newField<CellType>("CellType", 1, CellType::bulk);
-
-
-    //init fields
-    initFlowOverShape<T, Q>(grid, storeSum, fin, fout, cellType, inletVelocity, [=] NEON_CUDA_HOST_DEVICE(Neon::index_3d idx) {
-        idx.x -= jetBoxPosition.x;
-        idx.y -= jetBoxPosition.y;
-        idx.z -= jetBoxPosition.z;
-        if (idx.x < 0 || idx.y < 0 || idx.z < 0) {
-            return false;
-        }
-
-        idx.x = (jetBoxDim.x / 2) - (idx.x - (jetBoxDim.x / 2));
-        return sdfJetfighter(glm::ivec3(idx.z, idx.y, idx.x), glm::ivec3(jetBoxDim.x, jetBoxDim.y, jetBoxDim.z)) <= 0;
-
-        //Neon::index_4d sphere(jetBoxPosition.x + jetBoxDim.x / 2, jetBoxPosition.y + jetBoxDim.y / 2, jetBoxPosition.z + jetBoxDim.z / 2, jetBoxDim.x / 4);
-        //const T dx = sphere.x - idx.x;
-        //const T dy = sphere.y - idx.y;
-        //const T dz = sphere.z - idx.z;
-        //if ((dx * dx + dy * dy + dz * dz) < sphere.w * sphere.w) {
-        //    return true;
-        //} else {
-        //    return false;
-        //}
-    });
-
-    //cellType.updateHostData();
-    //cellType.ioToVtk("cellType", true, true, true, true);
-
-    runNonUniformLBM<T, Q>(grid,
-                           params,
-                           clength,
-                           omega,
-                           visclb,
-                           inletVelocity,
-                           cellType,
-                           storeSum,
-                           fin,
-                           fout);
-}
-
 template <typename T, int Q>
 void flowOverSphere(const Neon::Backend backend,
                     const Params&       params)

diff --git a/apps/lbmMultiRes/lbmMultiRes.cu b/apps/lbmMultiRes/lbmMultiRes.cu
@@ -18,15 +18,15 @@ struct Params
     std::string meshFile = "";
     int         freq = 100;
     int         Re = 100;
-    int         deviceId = 99;
-    int         numIter = 2;
+    int         deviceId = 0;
+    int         numIter = 1000;
     bool        benchmark = true;
-    bool        fineInitStore = false;
+    bool        fineInitStore = true;
     bool        streamFusedExpl = false;
     bool        streamFusedCoal = false;
-    bool        streamFuseAll = false;
-    bool        collisionFusedStore = false;
-    bool        fusedFinest = false;
+    bool        streamFuseAll = true;
+    bool        collisionFusedStore = true;
+    bool        fusedFinest = true;
     int         sliceX = -1;
     int         sliceY = -1;
     int         sliceZ = 1;
@@ -53,13 +53,13 @@ int main(int argc, char** argv)
 
         auto cli =
             (clipp::option("--deviceType") & clipp::value("deviceType", params.deviceType) % "Type of device (gpu or cpu)",
-             clipp::required("--deviceId") & clipp::integers("deviceId", params.deviceId) % "Device id",
+             clipp::option("--deviceId") & clipp::integers("deviceId", params.deviceId) % "Device id",
              clipp::option("--numIter") & clipp::integer("numIter", params.numIter) % "LBM number of iterations",
-             clipp::option("--problemType") & clipp::value("problemType", params.problemType) % "Problem type ('lid' for lid-driven cavity, 'sphere' for flow over sphere, or 'jet' for flow over jet fighter, 'mesh' for flow over mesh)",
+             clipp::option("--problemType") & clipp::value("problemType", params.problemType) % "Problem type ('lid' for lid-driven cavity, 'sphere' for flow over sphere, 'mesh' for flow over mesh)",
              clipp::option("--meshFile") & clipp::value("meshFile", params.meshFile) % "Path to mesh file for 'mesh' type problem",
              clipp::option("--dataType") & clipp::value("dataType", params.dataType) % "Data type (float or double)",
-             clipp::option("--re") & clipp::integers("Re", params.Re) % "Reynolds number",
-             clipp::option("--scale") & clipp::integers("scale", params.scale) % "Scale of the problem for parametrized problems. 0-9 for lid. jet is up to 112. Sphere is 2 (or maybe more)",
+             clipp::option("--re") & clipp::integer("Re", params.Re) % "Reynolds number",
+             clipp::option("--scale") & clipp::integer("scale", params.scale) % "Scale of the problem for parametrized problems. 0-9 for lid. Sphere is 2 (or maybe more)",
 
              clipp::option("--sliceX") & clipp::integer("sliceX", params.sliceX) % "Slice along X for output images/VTK",
              clipp::option("--sliceY") & clipp::integer("sliceY", params.sliceY) % "Slice along Y for output images/VTK",
@@ -76,7 +76,7 @@ int main(int argc, char** argv)
              clipp::option("--binary").set(params.binary, true) % "Output binary (down-sampled) files. Active only with if 'visual' is true",
              clipp::option("--gui").set(params.gui, true) % "Show Polyscope gui. Active only with if 'visual' is true",
 
-             clipp::option("--freq") & clipp::integers("freq", params.freq) % "Output frequency (only works with visual mode)",
+             clipp::option("--freq") & clipp::integer("freq", params.freq) % "Output frequency (only works with visual mode)",
 
              ((clipp::option("--storeFine").set(params.fineInitStore, true) % "Initiate the Store operation from the fine level") |
               (clipp::option("--storeCoarse").set(params.fineInitStore, false) % "Initiate the Store operation from the coarse level") |
@@ -101,7 +101,7 @@ int main(int argc, char** argv)
             NEON_THROW(exp);
         }
 
-        if (params.problemType != "lid" && params.problemType != "sphere" && params.problemType != "jet" && params.problemType != "mesh") {
+        if (params.problemType != "lid" && params.problemType != "sphere" && params.problemType != "mesh") {
             Neon::NeonException exp("app-lbmMultiRes");
             exp << "Unknown input problem type " << params.problemType;
             NEON_THROW(exp);
@@ -152,17 +152,6 @@ int main(int argc, char** argv)
             }
         }
 
-        if (params.problemType == "jet") {
-            report = Neon::Report("Jet MultiRes LBM");
-            report.commandLine(argc, argv);
-            if (params.dataType == "float") {
-                flowOverJet<float, Q>(backend, params);
-            }
-            if (params.dataType == "double") {
-                flowOverJet<double, Q>(backend, params);
-            }
-        }
-
         if (params.problemType == "mesh") {
             report = Neon::Report("Mesh MultiRes LBM");
             report.commandLine(argc, argv);