The NVDLA unit description (http://nvdla.org/hw/v1/ias/unit_description.html) states an upper limit of 32 on the length of a Stripe Operation:
"The upper limit is 32 due to buffer size in the accumulator"
However, this seems to contradict the buffer size as mentioned in the "Convolution Accumulator" chapter. Let me explain why:
Every Atomic Operation produces 16 partial sums (see chapter "Atomic Operation"), so a maximum-length Stripe Operation yields 32 x 16 = 512 elements in total.
Each of these elements is stored as an INT48 (when the preceding steps use INT16) in the assembly SRAM group (see table 49).
This amounts to 512 x 6 bytes = 3 KiB.
According to the chapter "Convolution Accumulator", the buffer size is 96 B x 128 = 12 KiB.
So, in theory, the length of a Stripe Operation could be 128 instead of 32, since each Atomic Operation needs only 16 x 6 bytes = 96 B of that buffer.
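For reference, here is the same arithmetic as a short script. It is only a sketch of my calculation: the constant and function names are made up for illustration, and it assumes the whole 96 B x 128 buffer is available to a single Stripe Operation, which may not match how the hardware actually uses it.

```python
# Figures quoted from the NVDLA unit description; names are hypothetical.
PARTIAL_SUMS_PER_ATOMIC_OP = 16   # partial sums produced by one Atomic Operation
BYTES_PER_PARTIAL_SUM = 48 // 8   # one INT48 entry (INT16 inputs) = 6 bytes
DOCUMENTED_STRIPE_LIMIT = 32      # upper limit stated in the documentation
BUFFER_BYTES = 96 * 128           # assembly SRAM group: 96 B x 128


def stripe_bytes(stripe_length):
    """Bytes of partial sums produced by a Stripe Operation of this length."""
    return stripe_length * PARTIAL_SUMS_PER_ATOMIC_OP * BYTES_PER_PARTIAL_SUM


def max_stripe_length(buffer_bytes):
    """Longest stripe whose partial sums fit, assuming the whole buffer is usable."""
    bytes_per_atomic_op = PARTIAL_SUMS_PER_ATOMIC_OP * BYTES_PER_PARTIAL_SUM
    return buffer_bytes // bytes_per_atomic_op


used = stripe_bytes(DOCUMENTED_STRIPE_LIMIT)
print(f"stripe of {DOCUMENTED_STRIPE_LIMIT}: {used} B of {BUFFER_BYTES} B")
print(f"theoretical maximum stripe length: {max_stripe_length(BUFFER_BYTES)}")
```

With these figures, a stripe of 32 fills a quarter of the buffer.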
Is there any reason why this is not the case or are the calculations wrong?