Skip to content

jiskattema/gpu-sig-transpose

Repository files navigation

A full run is 12 tabs, 1536 channels, 50 sequences, 500 samples per sequence

NOTE: on making the example, using constants instead of parameters helped the compilere ~3 seconds on 50 seconds...

Doing 10 iterations

No optimization: gcc -o naive naive.c

x ~/Code/gpu_sig $ time ./naive 12 1536 50 12 1536 50 1757.81MB

real 0m47.513s user 0m46.923s sys 0m0.533s

. ~/Code/gpu_sig $ time ./naive 12 1536 50 12 1536 50 1757.81MB

real 0m47.541s user 0m46.998s sys 0m0.487s

With optimization: gcc -march=native -O3 -Ofast -o naive_optimized naive.c

. ~/Code/gpu_sig $ time ./naive_optimized 12 1536 50 12 1536 50 1757.81MB

real 0m9.927s user 0m9.393s sys 0m0.516s

. ~/Code/gpu_sig $ time ./naive_optimized 12 1536 50 12 1536 50 1757.81MB

real 0m9.945s user 0m9.399s sys 0m0.529s

Scaling up the number of packets: for i in seq 1 50; do time ./naive 12 1536 $i ; done &> o

Scaling up the number of channels: for i in seq 4 4 400; do time ./naive 12 $i 50 ; done &> o2 for i in seq 436 100 1536; do time ./naive 12 $i 50 ; done &> o2

About

Example code for a memory transpose problem

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published