This discussion was converted from issue #5458 on November 24, 2020 00:38.
Hi all,
We are using Halide to implement some computer vision algorithms on chips based on Huawei's DaVinci architecture, and we have run into trouble implementing a ping-pong buffer.
The DaVinci architecture has several pipelines, such as MTE2 (moves data from global memory to on-chip memory), VECTOR (SIMD compute), and MTE3 (moves data from on-chip memory to global memory). These pipelines need to be synchronized with each other. One way to improve performance is to use a ping-pong buffer, which lets the different pipelines run in parallel.
We have implemented the single-buffer version in Halide. The IR looks like this:
We want the ping-pong buffer IR to look like this:
The pipeline changes like this:
I can't get this IR from the schedule. Do you have any ideas?
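For readers unfamiliar with the pattern, here is a minimal sketch of the single-buffer and ping-pong loop structures in plain C++. It is not Halide IR and uses no DaVinci intrinsics: `load_tile`, `compute_tile`, and `store_tile` are hypothetical stand-ins for the MTE2, VECTOR, and MTE3 stages, `kTile` is an assumed tile size, and the input length is assumed to be a multiple of `kTile`. The code runs sequentially, so it only shows the loop shape and buffer indexing that would allow the stages to overlap on real hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 4;  // hypothetical tile size (elements per on-chip buffer)
using Tile = std::array<int, kTile>;

// Stand-in for MTE2: copy one tile from "global" memory into an on-chip buffer.
void load_tile(const std::vector<int>& global_in, std::size_t tile, Tile& buf) {
    for (std::size_t i = 0; i < kTile; ++i) buf[i] = global_in[tile * kTile + i];
}

// Stand-in for VECTOR: an arbitrary element-wise computation, done in place.
void compute_tile(Tile& buf) {
    for (int& v : buf) v = v * 2 + 1;
}

// Stand-in for MTE3: copy one tile from the on-chip buffer back to "global" memory.
void store_tile(const Tile& buf, std::size_t tile, std::vector<int>& global_out) {
    for (std::size_t i = 0; i < kTile; ++i) global_out[tile * kTile + i] = buf[i];
}

// Single buffer: load, compute, and store are strictly serialized per tile.
std::vector<int> run_single_buffer(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    Tile buf{};
    for (std::size_t t = 0; t < in.size() / kTile; ++t) {
        load_tile(in, t, buf);
        compute_tile(buf);
        store_tile(buf, t, out);
    }
    return out;
}

// Ping-pong: two buffers alternate, so the load of tile t+1 targets the buffer
// that the compute of tile t is NOT using.
std::vector<int> run_ping_pong(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    const std::size_t num_tiles = in.size() / kTile;
    if (num_tiles == 0) return out;

    Tile buf[2] = {};
    load_tile(in, 0, buf[0]);  // prologue: fill the first buffer
    for (std::size_t t = 0; t < num_tiles; ++t) {
        Tile& cur = buf[t % 2];
        if (t + 1 < num_tiles) {
            // On real hardware this load could run concurrently with the
            // compute below, once the earlier store from this buffer is done.
            load_tile(in, t + 1, buf[(t + 1) % 2]);
        }
        compute_tile(cur);
        store_tile(cur, t, out);
    }
    return out;
}
```

Calling `run_single_buffer(in)` and `run_ping_pong(in)` on the same input gives identical results; the difference is only in which buffer each stage touches in a given iteration, which is what makes the overlap possible.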