Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i3544 RV64: Enable memtrace-related tests #6506

Closed
wants to merge 5 commits into from

Conversation

ksco
Copy link
Contributor

@ksco ksco commented Dec 13, 2023

No description provided.

@ksco
Copy link
Contributor Author

ksco commented Dec 13, 2023

@abhinav92003 @derekbruening I'm trying to port memtrace-related tests, and with this PR, a simple reuse_distance command below is working as expected:

bin64/drrun -t drcachesim -simulator_type reuse_distance -- echo hello

But I have a few questions to ask:

  1. Besides normal load/stores, RISC-V has LR/SC pair (similar to exclusive monitors) and atomic (e.g. amoswap) instructions that also access memory. Currently, their memory address register is encoded as a normal register, should we encode them as base_disp opnd instead? I've implemented that for LR/SC in this PR, we should do the same for atomic instructions too, right? Atomic instructions like amoswap read and write memory from the same address, should we encode them as two separate opnds?
  2. I'm seeing this error: Basic block or trace instrumentation exceeded maximum size. Try lowering -max_bb_instrs and/or -max_trace_bbs. when trying to run gcc with reuse_distance. MAX_FRAGMENT_SIZE is very small (0x1000) on RISC-V, is there any way we can workaround this?

@derekbruening
Copy link
Contributor

  1. Besides normal load/stores, RISC-V has LR/SC pair (similar to exclusive monitors) and atomic (e.g. amoswap) instructions that also access memory. Currently, their memory address register is encoded as a normal register, should we encode them as base_disp opnd instead? I've implemented that for LR/SC in this PR, we should do the same for atomic instructions too, right? Atomic instructions like amoswap read and write memory from the same address, should we encode them as two separate opnds?

Yes, all memory accesses have to be memory operands to comply with the IR conventions which all the tools rely upon. Yes, a read-modify-write instruction has both a source and destination, even if identical.

  1. I'm seeing this error: Basic block or trace instrumentation exceeded maximum size. Try lowering -max_bb_instrs and/or -max_trace_bbs. when trying to run gcc with reuse_distance. MAX_FRAGMENT_SIZE is very small (0x1000) on RISC-V, is there any way we can workaround this?

Does RISC-V typically have longer basic blocks than other ISA's (conditional moves instead of branches maybe)?

All our internal uses set -max_bb_instrs to something like 256. Maybe that should be done on Github in the drcachesim launcher.

@ksco
Copy link
Contributor Author

ksco commented Dec 14, 2023

Does RISC-V typically have longer basic blocks than other ISA's (conditional moves instead of branches maybe)?

No, there are no conditional moves at least in standard RISC-V extensions. The size of the basic blocks is not much different from other arches:

<Stopping application /usr/bin/echo (2209)>
Number of basic blocks seen: 2833
               Maximum size: 52 instructions
               Average size:   5.4 instructions

I did some investigation, and the issue seems to come from the context switch of clean calls. For example, if I duplicate the dr_insert_clean_call() call in div.c a couple of times, this error can be reproduced due to the small size of MAX_FRAGMENT_SIZE.

@derekbruening
Copy link
Contributor

The max is hit on other arches with many clean calls in one large block. But I thought that was hundreds of clean calls, to reach the 64K max. Oh I see, it's just 4K on RISCV:

#ifdef AARCH64
/* On AArch64, TBNZ/TBZ has a range of +/- 32 KiB. */
enum { MAX_FRAGMENT_SIZE = 0x8000 };
#elif defined(RISCV64)
/* On RISCV64, direct branch has a range of +/- 4 KiB. */
enum { MAX_FRAGMENT_SIZE = 0x1000 };
#else
enum { MAX_FRAGMENT_SIZE = USHRT_MAX };
#endif

Hmm. What is the size of a (unoptimized) clean call on RISCV? How many fit in 4K?

@ksco
Copy link
Contributor Author

ksco commented Dec 15, 2023

What is the size of a (unoptimized) clean call on RISCV? How many fit in 4K?

Roughly speaking, there are 32 gprs and 32 fprs, so save + restore is 32 * 2 * 2 * 4 = 512, so 8 clean calls will surely exceed the limit. Ah, we could use c.ldsp/c.fldsp/c.sdsp/c.fsdsp to reduce the save/restore size by half, but is that enough for typical usage?

@derekbruening
Copy link
Contributor

Some users try to put in a clean call before every app instr, though we discourage that. So that use case would require -max_bb_instrs 15 or sthg. That's not that bad: there's no big downside to a limit like that for single blocks. For traces there is more of a cost; someone putting in that many clean calls should probably just turn off traces anyway as the clean call overhead means any trace speedup is overshadowed -- but a trace limit is harder to set by default.

Proposal:

  • Set -max_bb_instrs to 15 by default on RISC-V
  • Update that message about hitting the max to include "and/or set -disable_traces"
    ?

@ksco
Copy link
Contributor Author

ksco commented Dec 15, 2023

Thank you! The proposal look good to me. I’ll close this PR since it seems more suitable to separate the codec and clean call changes into two PRs.

@ksco ksco closed this Dec 15, 2023
ksco added a commit that referenced this pull request Jan 3, 2024
)

Atomic instructions from the A extension like LR/SC and AMO* access
memory so they should be treated as load/store. In addition, AMO*
instructions perform read-modify-write operations, they should have both
a source and a destination operands to comply with DR's IR convention.
We make the destination an implicit operand as they are identical to the
source.

Related PR: #6506
Issue: #3544
ksco added a commit that referenced this pull request Jan 6, 2024
tool.reuse_distance test is enabled by performing the following changes:

1. Set -max_bb_instrs to 15 by default on RISC-V
2. Use `c.ldsp/c.sdsp/c.fldsp/c.fsdsp` to reduce the size of context
switches for clean calls
3. Swap sc.w/d rd and mem operand positions so that mem becomes the
first operand to be consistent with AArch64
4. Implement `drutil_insert_get_mem_addr()`

Related PR: #6506
Issue: #3544
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants