Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does:
qemu_ld_helpers_panda
based onpanda_use_memcb
helper_[endian]_[ld/st]_name
functions into the inline static functions of the formhelper_[endian]_[ld/st]_name_inner
(and changes internal calls to this form as well)helper_[endian]_[ld/st]_name
functions with PANDA equivalent that checkspanda_use_memcb
at runtimehelper_[endian]_[ld/st]_name_panda
functions for compatibility.The issue:
See #1119 for context.
A bug was reported that indicated that the x86 instruction
cmpxchg
(32 or 64-bit versions) did not generate memory callbacks. The examination in the PR is somewhat different from my analysis. First, the relevant functions are not:because when generating for single core we generate
_unlocked
versions.panda/target/i386/translate.c
Lines 5269 to 5273 in 0c15599
The unlocked versions use
cpu_ldst
functions like @AndrewFasano mentioned.panda/target/i386/mem_helper.c
Lines 28 to 52 in 0c15599
It seems to me that there are two problems:
helper_[endian]_st[size]_mmu_panda
are only replaced herepanda/tcg/i386/tcg-target.inc.c
Lines 1223 to 1242 in d2ee3d1
ld[size]_p
is called directly on translated memory. https://github.com/panda-re/panda/blob/dev/include/exec/cpu_ldst_template.h#L105-L106Advantages
cpu_ldst
functions, but also includes direct calls from other instruction such as:panda/target/i386/fpu_helper.c
Lines 72 to 80 in 6ed3f19
panda/target/i386/mpx_helper.c
Lines 138 to 148 in 6ed3f19
panda/target/i386/mpx_helper.c
Lines 105 to 120 in 6ed3f19
panda/target/ppc/mem_helper.c
Lines 56 to 66 in 6ed3f19
Potential issues
panda_use_memcb
for every memory access. The function makes the case for not using memory callbacks as fast as possible. It inlines the internal version, but it's still a change that could impact performance.panda/softmmu_template.h
Lines 536 to 543 in 6ed3f19
Design decisions
Instead of replacing
helper_[endian]_[ld/st]_name
with their PANDA equivalents and moving the old to the_internal
versions I could have instead changed thecpu_[ld/st][size]_data_ret
functions to use the_panda
versions based on the status ofpanda_use_memcb
. I chose not to because I think covering the default case ofhelper_[endian]_[ld/st]_name
with the PANDA version covers far more cases we may have not considered. I also wanted the ability to dynamically enable/disable memory callbacks. If it turns out that that is hugely detrimental to performance I'm happy to consider other options.Coverage decisions
This PR changes the PANDA model to cover virtual memory read/writes from all system mmu modes. This means every time data is written/read in a helper it will be recorded. Examples include #1179 and #1119. However, it also includes things not previously considered part of the model such as reads made by the FPU, register store/restore, segment reads, and SIMD.
Performance implications
Does this PR make a difference compare to two systems with
memcb
disabled?The performance difference here is almost indistinguishable.
Does this PR make a difference compare to two systems with
memcb
enabled?The relatively minimal performance impact here seems to result from additional reads/writes being hit.
In a relatively short recording on x86 there were about 1500 more calls out of around 7.3 million total calls.
In an ARM recording there were exactly the same number of calls. ARM doesn't seem to use these calls.
CLOSES: #1119