Implement the ping-pong buffer in Halide #5458

Vernlium · 2020-11-19T02:42:47Z

Hi all,

We are using Halide to implement some CV algorithms on a Huawei DaVinci architecture chip, and we have run into trouble implementing a ping-pong buffer.

The DaVinci architecture has several pipelines, such as MTE2 (moves data from global memory to on-chip memory), VECTOR (SIMD computation), and MTE3 (moves data from on-chip memory to global memory). These pipelines need to be synchronized with each other. One way to improve an algorithm's performance is to use a ping-pong buffer, which lets the different pipelines run in parallel.

We have implemented the single-buffer version in Halide. The IR looks like this:

for (i0, 0, i0.extent) {
     copy_data_in(addr1_onchip,  addr1_gm);
     vector_xxx(addr1_onchip);
     ...;
     copy_data_out(addr2_gm, addr2_onchip);
}

We want the ping-pong buffer IR to look like this:

for (i0, 0, i0.extent / 2) {
     // ping buffer
     copy_data_in(addr1_1_onchip,  addr1_gm);
     vector_xxx(addr1_1_onchip);
     ...;
     copy_data_out(addr2_gm, addr2_1_onchip);

     // pong buffer
     copy_data_in(addr1_2_onchip,  addr1_gm + offset);
     vector_xxx(addr1_2_onchip);
     ...;
     copy_data_out(addr2_gm + offset, addr2_2_onchip);
}

The pipeline changes like this:

I can't get this IR from the schedule. Do you have any ideas?
