Accuracy of function perf_get_mcycle64() #745

Open
Siris-Li opened this issue Dec 1, 2022 · 3 comments

@Siris-Li

Siris-Li commented Dec 1, 2022

Hello! I have a question about the function perf_get_mcycle64().
The total cycle count differs depending on whether or not I use perf counters.
The following is the result of running the KWS model without perf counters.

Running kws
.............
"Event","Tag","Ticks"
0,CONV_2D,18426
1,DEPTHWISE_CONV_2D,4340
2,CONV_2D,12430
3,DEPTHWISE_CONV_2D,4290
4,CONV_2D,11224
5,DEPTHWISE_CONV_2D,4094
6,CONV_2D,12518
7,DEPTHWISE_CONV_2D,4397
8,CONV_2D,11369
9,AVERAGE_POOL_2D,90
10,RESHAPE,2
11,FULLY_CONNECTED,18
12,SOFTMAX,12
 Counter |  Total | Starts | Average |     Raw
---------+--------+--------+---------+--------------
    0    |     0  |     0  |   n/a   |            0
    1    |     0  |     0  |   n/a   |            0
    2    |     0  |     0  |   n/a   |            0
    3    |     0  |     0  |   n/a   |            0
    4    |     0  |     0  |   n/a   |            0
    5    |     0  |     0  |   n/a   |            0
    6    |     0  |     0  |   n/a   |            0
    7    |     0  |     0  |   n/a   |            0
    85M (     85231808 )  cycles total

However, if I add perf_enable_counter and perf_disable_counter calls in conv.h to use some perf counters, the total cycle count changes.

Running kws
.............
"Event","Tag","Ticks"
0,CONV_2D,21499
1,DEPTHWISE_CONV_2D,4239
2,CONV_2D,9217
3,DEPTHWISE_CONV_2D,4193
4,CONV_2D,8028
5,DEPTHWISE_CONV_2D,3992
6,CONV_2D,8771
7,DEPTHWISE_CONV_2D,4709
8,CONV_2D,8350
9,AVERAGE_POOL_2D,90
10,RESHAPE,2
11,FULLY_CONNECTED,18
12,SOFTMAX,14
 Counter |  Total | Starts | Average |     Raw
---------+--------+--------+---------+--------------
    0    |    57M |     5  |    11M  |     57177105
    1    |    33M | 302720  |   108   |     32978671
    2    |     0  |     0  |   n/a   |            0
    3    |     0  |     0  |   n/a   |            0
    4    |     0  |     0  |   n/a   |            0
    5    |     0  |     0  |   n/a   |            0
    6    |     0  |     0  |   n/a   |            0
    7    |     0  |     0  |   n/a   |            0
    75M (     74897487 )  cycles total
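
To make the comparison concrete, here is a minimal sketch that lines up the per-operator ticks from the two logs above. The numbers are copied from the logs; the helper names `compare` and `summarize` are only illustrative. From these figures the tick columns sum to 83210 and 73122, and dividing each "cycles total" value by its tick sum gives roughly 1024 in both runs. That ratio is an observation from the data, not a documented tick unit.

```python
# Per-operator ticks copied from the two "Running kws" logs above.
ROWS = [
    # (event, tag, ticks without counters, ticks with counters)
    (0, "CONV_2D", 18426, 21499),
    (1, "DEPTHWISE_CONV_2D", 4340, 4239),
    (2, "CONV_2D", 12430, 9217),
    (3, "DEPTHWISE_CONV_2D", 4290, 4193),
    (4, "CONV_2D", 11224, 8028),
    (5, "DEPTHWISE_CONV_2D", 4094, 3992),
    (6, "CONV_2D", 12518, 8771),
    (7, "DEPTHWISE_CONV_2D", 4397, 4709),
    (8, "CONV_2D", 11369, 8350),
    (9, "AVERAGE_POOL_2D", 90, 90),
    (10, "RESHAPE", 2, 2),
    (11, "FULLY_CONNECTED", 18, 18),
    (12, "SOFTMAX", 12, 14),
]
TOTAL_CYCLES = (85231808, 74897487)  # "cycles total" lines from the logs


def compare(rows):
    """Return (event, tag, tick delta, percent change) for each operator."""
    return [(e, t, b - a, 100.0 * (b - a) / a) for e, t, a, b in rows]


def summarize(rows, totals):
    """Sum both tick columns and relate them to the reported cycle totals."""
    ticks_without = sum(r[2] for r in rows)
    ticks_with = sum(r[3] for r in rows)
    return {
        "ticks_without": ticks_without,
        "ticks_with": ticks_with,
        "cycles_per_tick_without": totals[0] / ticks_without,
        "cycles_per_tick_with": totals[1] / ticks_with,
        "total_change_pct": 100.0 * (totals[1] - totals[0]) / totals[0],
    }


if __name__ == "__main__":
    for event, tag, delta, pct in compare(ROWS):
        print(f"{event:2d} {tag:<18} {delta:+6d} ticks ({pct:+6.1f}%)")
    print(summarize(ROWS, TOTAL_CYCLES))
```

The change is not uniform: the first CONV_2D gets slower by 3073 ticks, while the other four CONV_2D layers each get faster by about 3000 to 3700 ticks.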

It's strange that the total cycle count drops so much, from 85M to 75M. I don't think this is measurement error, because the difference is too large.
As far as I know, the total is measured with perf_get_mcycle64(), which I would not expect to be affected by whether perf counters are used.
Thanks in advance!

@alanvgreen
Collaborator

alanvgreen commented Dec 1, 2022 via email

@tcal-x
Collaborator

tcal-x commented Dec 1, 2022

@limingxuan-pku, can I assume that you used the CPU variant "perf" or "perf+cfu" in both cases?

We have observed small code changes cause rather large runtime changes. There seems to be some sensitivity to code placement: when calls to the perf routines are added, they should get inlined, which moves the location of other code and can add or remove L1 I-cache collisions.
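
As a rough illustration of why placement alone can matter, here is a toy model of a direct-mapped instruction cache. The cache geometry, function sizes, and addresses are all hypothetical and are not the actual configuration of the CPU in this project. The sketch only shows that shifting one function by a few hundred bytes can change the miss count of a hot loop by orders of magnitude.

```python
def count_misses(trace, num_lines, line_bytes):
    """Count misses for a direct-mapped cache given a list of byte addresses."""
    tags = [None] * num_lines
    misses = 0
    for addr in trace:
        block = addr // line_bytes      # which memory block holds this address
        index = block % num_lines       # the single cache line it can occupy
        if tags[index] != block:
            tags[index] = block         # evict whatever was there
            misses += 1
    return misses


def loop_trace(func_a, func_b, size, iterations, step=4):
    """Instruction fetches for a loop that runs func_a then func_b each pass."""
    trace = []
    for _ in range(iterations):
        trace.extend(range(func_a, func_a + size, step))
        trace.extend(range(func_b, func_b + size, step))
    return trace


# Hypothetical geometry: 4 KiB direct-mapped cache with 32-byte lines.
NUM_LINES, LINE_BYTES = 128, 32
CACHE_BYTES = NUM_LINES * LINE_BYTES
SIZE, ITERS = 256, 1000  # each function is 256 bytes; the loop runs 1000 times

# Layout 1: func_b starts exactly one cache size above func_a,
# so both functions map to the same cache lines and evict each other.
colliding = count_misses(
    loop_trace(0, CACHE_BYTES, SIZE, ITERS), NUM_LINES, LINE_BYTES)

# Layout 2: func_b is shifted by a further 256 bytes,
# so the two functions occupy disjoint cache lines.
disjoint = count_misses(
    loop_trace(0, CACHE_BYTES + SIZE, SIZE, ITERS), NUM_LINES, LINE_BYTES)

if __name__ == "__main__":
    print("colliding layout misses:", colliding)
    print("disjoint layout misses: ", disjoint)
```

In this toy setup the colliding layout misses on every cache line in every iteration, while the disjoint layout only pays the initial cold misses.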

You could look at the disassembly in build/software.elf.dis to see if there are any large changes in function code placement.
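
One way to do that comparison is to pull the function start addresses out of the two disassembly files and list the symbols that moved. The sketch below assumes the file uses the usual GNU objdump layout, where each function starts with a header line like `40000100 <name>:`. The sample listings, addresses, and helper names are hypothetical.

```python
import re

# Matches objdump-style function headers such as "40000a1c <main>:".
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <(.+)>:\s*$")


def function_addresses(disassembly_text):
    """Map each symbol name to its start address in an objdump-style listing."""
    addresses = {}
    for line in disassembly_text.splitlines():
        match = SYMBOL_RE.match(line)
        if match:
            addresses[match.group(2)] = int(match.group(1), 16)
    return addresses


def moved_functions(before_text, after_text):
    """Return {name: (old_addr, new_addr)} for symbols whose address changed."""
    before = function_addresses(before_text)
    after = function_addresses(after_text)
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }


# Hypothetical excerpts standing in for two builds of software.elf.dis.
BEFORE = """\
40000000 <_start>:
40000000:\t00000093\tli\tra,0
40000100 <conv_inner>:
40000100:\t00000013\tnop
40000400 <depthwise_inner>:
"""
AFTER = """\
40000000 <_start>:
40000000:\t00000093\tli\tra,0
40000180 <conv_inner>:
40000180:\t00000013\tnop
40000480 <depthwise_inner>:
"""

if __name__ == "__main__":
    for name, (old, new) in sorted(moved_functions(BEFORE, AFTER).items()):
        print(f"{name}: {old:#x} -> {new:#x} (shift {new - old:+d} bytes)")
```

With real files, the two strings would be replaced by the contents of the `.dis` file from each build.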

Ideally we'd have counters recording Icache and Dcache misses, but we don't have immediate plans for adding them.

@Siris-Li
Author

Siris-Li commented Dec 2, 2022

> Please publish your git repo and send a link to the branch used to run the above.

Hi! @alanvgreen @tcal-x
Here is my git repo.
I have written a detailed README to guide you through reproducing my issue.
Thanks for your effort!
