Fix invalid memory access on Nvidia GPUs in CSR-Adaptive SpMV kernel (#192)

* corrected condition to prevent buffer overflow (fixes bug on Nvidia K20m)

* removed unneeded line
Moritz Kreutzer authored and Kent Knox committed Aug 1, 2016
1 parent ffb1950 commit 8b03255
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions src/library/kernels/csrmv_adaptive.cl
@@ -422,8 +422,7 @@ csrmv_adaptive(__global const VALUE_TYPE * restrict const vals,
 // However, this may change in the future (e.g. with shared virtual memory.)
 // This causes a minor performance loss because this is the last workgroup
 // to be launched, and this loop can't be unrolled.
-const unsigned int max_to_load = rowPtrs[stop_row] - rowPtrs[row];
-for(int i = 0; i < ((int)max_to_load-(int)lid); i += WG_SIZE)
+for(int i = 0; col+i < rowPtrs[stop_row]; i += WG_SIZE)
     partialSums[lid + i] = alpha * vals[col + i] * vec[cols[col + i]];
 }
 barrier(CLK_LOCAL_MEM_FENCE);
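
To make the new loop bound easier to reason about, here is a minimal host-side sketch in plain C that replays the added loop condition for every work-item of one workgroup. It is not part of the commit. The diff does not show how `col` is initialised, so the sketch assumes `col = rowPtrs[row] + lid`. The function name `simulate_last_workgroup`, the `hits` array, and the tiny `WG_SIZE` of 4 are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4  /* hypothetical workgroup size, kept small for illustration */

/* Replays the bounded load loop for every work-item (lid) on the host.
 * hits[k] counts how often element k of vals[]/cols[] would be read.
 * Returns the total number of reads performed by the workgroup. */
static unsigned int simulate_last_workgroup(const unsigned int *rowPtrs,
                                            unsigned int row,
                                            unsigned int stop_row,
                                            unsigned int *hits,
                                            size_t hits_len)
{
    unsigned int reads = 0;
    for (unsigned int lid = 0; lid < WG_SIZE; ++lid) {
        /* Assumption: col is the block's first element plus the local id. */
        const unsigned int col = rowPtrs[row] + lid;
        /* Same loop condition as the added line in the diff. */
        for (int i = 0; col + i < rowPtrs[stop_row]; i += WG_SIZE) {
            assert(col + i < hits_len);  /* caller must size hits[] generously */
            hits[col + i] += 1;
            reads += 1;
        }
    }
    return reads;
}
```

Under these assumptions, calling the function with `rowPtrs = {0, 3, 7, 10}`, `row = 1` and `stop_row = 3` touches each of the elements 3 through 9 exactly once and never reads index 10 or beyond, which is the in-bounds behaviour the commit message describes.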