
## **Buffers**

MPI distinguishes different types of buffers:

- variables
- user-level buffers
- hardware/system buffers

MPI implementations are excellent at tuning communication, i.e. at avoiding copies, but we have to assume that a message runs through all buffers on the sender side, then through the network, and then bottom-up through all buffers again on the receiver side. This means that Send and Recv are expensive operations.

Even worse, two concurrent sends might deadlock (but only for massive message counts or extremely large messages).

⇒ One way to deal with this is to allow MPI to optimize the messaging by giving both Send and Recv commands simultaneously — this is `MPI_Sendrecv`.



```cpp
int MPI_Sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 int dest, int sendtag,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 int source, int recvtag,
                 MPI_Comm comm, MPI_Status *status);
```

- Shortcut for a send followed by a receive
- Allows MPI to optimise aggressively
- Anticipates that many applications have dedicated compute and data exchange phases

⇒ Does not really solve our efficiency concerns, just weakens them



### `MPI_Sendrecv` example

We have a program which **sends an `nentries`-length buffer** between two processes:
```cpp
if (rank == 0) {
MPI_Send(sendbuf, nentries, MPI_INT, 1, 0, ...);
  // ...
}
```

- Recall that `MPI_Send`
  - behaves like `MPI_Bsend` when buffer space is available,
  - behaves like `MPI_Ssend` when it is not.


```cpp
if (rank == 0) {
  // ...
}
```
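
For comparison, a minimal sketch of the same exchange written with `MPI_Sendrecv` (it assumes the two-rank setup and the `sendbuf`, `recvbuf`, `nentries` names from above; not the original lecture listing):

```cpp
// Each rank posts its send and its receive in a single call,
// so MPI can order the transfers and no deadlock can occur.
int partner = (rank == 0) ? 1 : 0;
MPI_Sendrecv(sendbuf, nentries, MPI_INT, partner, 0,
             recvbuf, nentries, MPI_INT, partner, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
```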

- Non-blocking commands start with I (immediate return), e.g. `MPI_Isend`, `MPI_Irecv`
- Non-blocking means that the operation returns immediately even though MPI might not have transferred the data yet (it might not even have started)
- The buffer is thus still in use and we may not overwrite it
- We explicitly have to check whether the message transfer has completed before we reuse or delete the buffer
```cpp
// Create helper variable (handle)
int a = 1;
// ...
a = 2;
```
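
A slightly fuller sketch of this pattern (destination rank and tag are illustrative; the point is that `a` may only be overwritten once the request has completed):

```cpp
int a = 1;
MPI_Request request;

// Post the non-blocking send; MPI may not even have started the transfer yet.
MPI_Isend(&a, 1, MPI_INT, /* dest */ 1, /* tag */ 0, MPI_COMM_WORLD, &request);

// ... do other work here, as long as it does not touch 'a' ...

// Complete the transfer before the buffer is reused.
MPI_Wait(&request, MPI_STATUS_IGNORE);
a = 2;   // safe now: the send has completed
```
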
## **Why non-blocking...?**

- Added flexibility of separating posting messages from receiving them.
⇒ MPI libraries often have optimisations to complete sends quickly if the matching receive already exists.

- Sending many messages to one process, which receives them all... (see the sketch below)



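A sketch of the "many messages to one process" pattern with non-blocking sends (the `data` array, the `receiver` rank and the message count are illustrative):

```cpp
const int N = 100;                       // illustrative message count
int data[N];                             // (fill data[] with the values to send)
MPI_Request requests[N];

// Post all sends without waiting in between ...
for (int k = 0; k < N; k++) {
  MPI_Isend(&data[k], 1, MPI_INT, receiver, /* tag */ k,
            MPI_COMM_WORLD, &requests[k]);
}
// ... and complete them in one go; only now may 'data' be reused.
MPI_Waitall(N, requests, MPI_STATUSES_IGNORE);
```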


## `Isend` **&** `Irecv`
- Non-blocking variants of `MPI_Send` and `MPI_Recv`
- Return immediately, but *the buffer is not safe to reuse*

```cpp
int MPI_Isend(const void *buffer, int count, MPI_Datatype dtype,
              int dest, int tag, MPI_Comm comm, MPI_Request *request);
int MPI_Irecv(void *buffer, int count, MPI_Datatype dtype,
              int source, int tag, MPI_Comm comm, MPI_Request *request);
```
- Note the `request` in the send, and the lack of status in recv
- We need to process that `request` before we can reuse the buffers
```cpp
int MPI_Wait(MPI_Request *request, MPI_Status *status);
```

- Pass an additional pointer to an object of type `MPI_Request`.
- Non-blocking, i.e. the operation returns immediately.
- Check for send completion with `MPI_Wait` or `MPI_Test`.
- `MPI_Irecv` is analogous.
- The status object is not required for the receive call, as we have to hand it over to **wait or test later**.



```cpp
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
```
- `flag` will be true (an `int` of value 1) if the provided request has been completed, and false otherwise.
  - true: the request has completed
  - false: not yet completed
- If we don’t want to (repeatedly) test for completion, we can instead call `MPI_Wait`...
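
A typical use of `MPI_Test` is to poll for completion while doing other work in between (a sketch; `request` is assumed to come from an earlier `MPI_Isend`/`MPI_Irecv`, and `do_other_work()` is a hypothetical helper):

```cpp
int flag = 0;
MPI_Status status;
while (!flag) {
  // Check whether the pending operation has finished ...
  MPI_Test(&request, &flag, &status);
  // ... and overlap communication with computation while it has not.
  if (!flag) {
    do_other_work();
  }
}
```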
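
For reference, a sketch of what Variants A–C of the ring exchange look like (a reconstruction, not the original listings; `rank`, `size`, the neighbour computation and the buffer names are illustrative):

```cpp
int left   = (rank - 1 + size) % size;   // illustrative neighbour ranks
int right  = (rank + 1) % size;
int buffer1[10], buffer2[10];
MPI_Status status;

// The three variants below are alternatives, not meant to run back-to-back.

// Variant A: blocking receive first, then blocking send.
MPI_Recv(buffer1, 10, MPI_INT, left,  0, MPI_COMM_WORLD, &status);
MPI_Send(buffer2, 10, MPI_INT, right, 0, MPI_COMM_WORLD);

// Variant B: blocking send first, then blocking receive.
MPI_Send(buffer2, 10, MPI_INT, right, 0, MPI_COMM_WORLD);
MPI_Recv(buffer1, 10, MPI_INT, left,  0, MPI_COMM_WORLD, &status);

// Variant C: non-blocking, with the waits (deliberately) left commented out.
MPI_Request request1, request2;
MPI_Irecv(buffer1, 10, MPI_INT, left,  0, MPI_COMM_WORLD, &request1);
MPI_Isend(buffer2, 10, MPI_INT, right, 0, MPI_COMM_WORLD, &request2);
//MPI_Wait(&request1, &status);
//MPI_Wait(&request2, &status);
```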

- Does Variant A deadlock? *Yes!* `MPI_Recv` is always blocking (every rank waits in the receive).
- Does Variant B deadlock? Not for only 10 integers (as long as not too many messages were sent before); for large message counts or sizes it would.
- Does Variant C deadlock? Is it correct? Is it fast? May we add additional operations before the first wait? It does not deadlock and it is fast, but it is not correct until the commented-out waits are added back.



## **Definition: collective**

> Collective operation: A collective (MPI) operation is an operation involving many/all nodes/ranks.

- In MPI, a collective operation involves all **ranks** of one communicator (introduced later)
- For `MPI_COMM_WORLD`, a collective operation involves all ranks
- Collectives are **blocking** (though the newer (>=3.1) MPI standard introduces non-blocking collectives)
- Blocking collectives always **synchronise all ranks**, i.e. all ranks have to enter the same collective instruction before any rank proceeds



```cpp
// ...
MPI_Send(&a,1,MPI_DOUBLE,0, ...);
```
What type of collective operation is realised here? (a reduction)
```cpp
double globalSum;
MPI_Reduce(&a, &globalSum, 1,
           MPI_DOUBLE, MPI_SUM, /* root */ 0, MPI_COMM_WORLD);
```

## **Flavours of collective operations in MPI**

| Type of collective | One-to-all | All-to-one | All-to-all |
| ------------------ | ------------------ | ---------- | ---------- |
| Synchronisation | Barrier | | |
| Communication | Broadcast, Scatter | Gather | Allgather |
| Computation | | Reduce | Allreduce |

Insert the following MPI operations into the table (MPI prefix and signature neglected):

- Barrier
- Broadcast
- Reduce
- Allgather
- Scatter
- Scatter
- Gather
- Allreduce


![image-20241127021620338](https://wichaiblog-1316355194.cos.ap-hongkong.myqcloud.com/image-20241127021620338.png)

- Simplicity of code
- Performance through specialised implementations
- Support through dedicated hardware (cf. BlueGene’s three network topologies: clique, fat tree, ring)





## `MPI_Barrier`
- Simplest form of collective operation — synchronisation of all ranks in the communicator.

- Rarely used:

⇒ `MPI_Barrier` doesn’t synchronise non-blocking calls

⇒ Really meant for telling MPI about calls *outside* MPI, like I/O

```cpp
int rank, size;
// ...
for ( int ii = 0; ii < size; ++ii ) {
  // ...
}
```
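
A minimal self-contained sketch of this idiom, assuming the loop above is used to sequentialise the (non-MPI) output calls:

```cpp
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  for (int ii = 0; ii < size; ++ii) {
    // Everyone meets at the barrier; only the rank whose turn it is prints.
    MPI_Barrier(MPI_COMM_WORLD);
    if (ii == rank) {
      printf("Hello from rank %d of %d\n", rank, size);
    }
  }
  MPI_Finalize();
  return 0;
}
```

Note that MPI itself gives no guarantee about how the ranks' stdout streams are merged, so the printed ordering is only best effort.
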
## `MPI_Bcast` **&** `MPI_Scatter`
- `MPI_Bcast` sends the contents of a buffer from root to all other processes.
- `MPI_Scatter` sends *parts* of a buffer from root to different processes.
- `MPI_Bcast` is the inverse of `MPI_Reduce`
- `MPI_Scatter` is the inverse of `MPI_Gather`
```cpp
// ...
MPI_Scatter(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);
```
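
For comparison, a broadcast of the full buffer would look roughly like this (a sketch reusing the `sendbuf`, `root` and `comm` names from the snippet above):

```cpp
// root fills sendbuf; after the call every rank holds the same 100 integers.
MPI_Bcast(sendbuf, 100, MPI_INT, root, comm);
```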



## `MPI_Reduce` **&** `MPI_Gather`
- `MPI_Reduce` reduces a value across ranks to a single value on root using a prescribed reduction operator.
- `MPI_Gather` concatenates the array pieces from all processes onto the root process.


```cpp
// ...
MPI_Gather(sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);
```
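
And a sketch of the corresponding `MPI_Reduce` call (array names are illustrative; `root` and `comm` as above):

```cpp
int values[100], sums[100];
// Element-wise sum over all ranks; the result ends up on root only.
MPI_Reduce(values, sums, 100, MPI_INT, MPI_SUM, root, comm);
```
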
## `MPI_Allgather` **&** `MPI_Allreduce`
- `MPI_Allgather` is an `MPI_Gather` which concatenates the array pieces on all processes.
- `MPI_Allreduce` is an `MPI_Reduce` which reduces on all processes.
```cpp
MPI_Comm comm;
// ...
```
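
A minimal `MPI_Allreduce` sketch (variable names are illustrative):

```cpp
double local = 1.0, global = 0.0;
MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
// 'global' now holds the sum of 'local' over all ranks, on every rank.
```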
