Document branch relaxation #58

nick-knight · 2020-09-29T02:05:34Z

The toolchain currently implements various branch relaxations, but this behavior is undocumented. I think this is something the assembly programmer should be aware of.

nick-knight · 2021-06-18T23:50:32Z

riscv-asm.md

+
+Unconditional branches are implemented by the `j(al)?r?` pseudoinstructions.
+(The underlying instructions are `jalr?`.)
+The `j(al)?` targets can be any symbol or address.


Can it really be any symbol/address? I haven't tried them all.

aswaterman · 2021-06-18

“Any” is indeed an overstatement, since the `call`/`tail` macros have a maximum displacement of roughly 2 GB.

nick-knight · 2021-06-19

Thanks. I'll read up on this and improve it.

nick-knight · 2021-06-18T23:50:45Z

riscv-asm.md

+
+Conditional branches are implemented by the `b(l|g)(t|e)(z|u)?` and `b(eq|ne)z?` pseudoinstructions.
+(The underlying instructions are `b(lt|ge)u?` and `b(eq|ne)`.)
+Again, the targets can be any symbol or address.


(Same as above.)

aswaterman · 2021-06-18

Same here, except there’s a 1 MiB limit.
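To put numbers on these limits, here is a minimal sketch. It assumes the base RISC-V encodings, where conditional branches carry a 13-bit signed byte offset and `jal` a 21-bit one, both with an implied zero low bit. The helper names `signed_range` and `fits` are made up for illustration. The roughly 2 GB figure for `call`/`tail` comes from the `auipc` + `jalr` pair and is not modelled here.

```python
from typing import Tuple

def signed_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest byte offsets of a signed immediate that is
    `bits` wide and whose lowest bit is always zero."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 2
    return lo, hi

BRANCH_RANGE = signed_range(13)  # beq, bne, blt, bge, bltu, bgeu: about +/-4 KiB
JAL_RANGE = signed_range(21)     # jal (and therefore j): about +/-1 MiB

def fits(offset: int, rng: Tuple[int, int]) -> bool:
    """True if an even byte offset is encodable in the given range."""
    lo, hi = rng
    return lo <= offset <= hi and offset % 2 == 0

print(BRANCH_RANGE)  # (-4096, 4094)
print(JAL_RANGE)     # (-1048576, 1048574)
```

A conditional branch that the assembler expands through `jal` inherits the larger of the two ranges, which is where the 1 MiB limit comes from.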

nick-knight · 2021-09-07T18:42:51Z

I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961

It seems that I misunderstood that conditional branch relaxation is actually a feature of the RISC-V assembly language. I assumed it was --- that this was part of why beqz et al. are considered to be pseudo-instructions --- which motivated me to file this PR against riscv-asm-manual. If the community disagrees, then this PR is inappropriate and should be closed. (And my assembly codes are "broken", and should be rewritten.)

jrtc27

> I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961
>
> It seems that I misunderstood that conditional branch relaxation is actually a feature of the RISC-V assembly language. I assumed it was --- that this was part of why beqz et al. are considered to be pseudo-instructions --- which motivated me to file this PR against riscv-asm-manual. If the community disagrees, then this PR is inappropriate and should be closed. (And my assembly codes are "broken", and should be rewritten.)

Yes, the question is whether this is something that should be relied upon as a standard feature in RISC-V assembly or is an implementation-specific extension.

jrtc27 · 2021-09-07T18:51:25Z

riscv-asm.md

+1:
+```
+
+The `bnez` is further relaxed to `bne`, while `j` is relaxed to `jal` with a relocation.


These are not relaxations. These are pure aliases for specific forms (bne with a zero register, and jal with a zero register).
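The distinction can be made concrete with a small lookup table. This is a sketch of alias expansion only, using a hypothetical `expand_alias` helper and covering just the two pseudoinstructions named above; it does not model any real assembler's internals.

```python
# Fixed rewrites: each alias maps to exactly one base instruction,
# no matter how far away the target is.
ALIASES = {
    "bnez": lambda rs, target: f"bne {rs}, x0, {target}",
    "j": lambda target: f"jal x0, {target}",
}

def expand_alias(mnemonic, *operands):
    """Return the base instruction for an alias, or the input unchanged."""
    rewrite = ALIASES.get(mnemonic)
    if rewrite is None:
        return f"{mnemonic} {', '.join(operands)}"
    return rewrite(*operands)

print(expand_alias("bnez", "a0", "1f"))  # bne a0, x0, 1f
print(expand_alias("j", "loop"))         # jal x0, loop
```

An alias never changes the number of instructions or depends on the distance to the target, which is what separates it from a relaxation.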

jim-wilson · 2021-09-07T19:34:51Z

A compiler can and should be computing instruction sizes and branch offsets, and then emitting the correct branch instructions. But this is an unreasonable request for hand written assembly code. The only reasonable approach is if the assembler can do this for the programmer.

GNU Binutils has been doing assembler branch relaxation since the 1990s at least, and maybe earlier, with the Motorola 68000 port perhaps being the first one. And note that assembler branch relaxation is a different process from linker relaxation. Assembler branch relaxation may increase the size of code, and does not involve any relocations. Assembler branch relaxation is to let the assembly language programmer use the obvious simple branch instruction, and the assembler figures out how to translate that into actual valid target instructions. For CISC machines, this is usually choosing between short and long forms of the branch instruction. For RISC machines, this usually means emitting one or multiple instructions as necessary. There are dozens of GNU Binutils targets that have branch relaxation support. The RISC-V port works roughly the same as the MIPS port here, though the MIPS port has extra complications due to delay slots and the branch likely bit, and hence can emit longer sequences than the RISC-V port does.

The LLVM approach is that relaxation is OK as an optimization, but should not be required for correct operation. I think this rule should only apply to linker relaxation. It should not apply to assembler branch relaxation, as this is necessary for humans to write readable assembly language programs.

jrtc27 · 2021-09-07T19:48:11Z

> A compiler can and should be computing instruction sizes and branch offsets, and then emitting the correct branch instructions. But this is an unreasonable request for hand written assembly code. The only reasonable approach is if the assembler can do this for the programmer.
>
> GNU Binutils has been doing assembler branch relaxation since the 1990s at least, and maybe earlier, with the Motorola 68000 port perhaps being the first one. And note that assembler branch relaxation is a different process from linker relaxation. Assembler branch relaxation may increase the size of code, and does not involve any relocations. Assembler branch relaxation is to let the assembly language programmer use the obvious simple branch instruction, and the assembler figures out how to translate that into actual valid target instructions. For CISC machines, this is usually choosing between short and long forms of the branch instruction. For RISC machines, this usually means emitting one or multiple instructions as necessary. There are dozens of GNU Binutils targets that have branch relaxation support. The RISC-V port works roughly the same as the MIPS port here, though the MIPS port has extra complications due to delay slots and the branch likely bit, and hence can emit longer sequences than the RISC-V port does.
>
> The LLVM approach is that relaxation is OK as an optimization, but should not be required for correct operation. I think this rule should only apply to linker relaxation. It should not apply to assembler branch relaxation, as this is necessary for humans to write readable assembly language programs.

MIPS is a terrible example to use, its assembly language is full of pitfalls (.set (no)macro, .set (no)at) and is an example I would use of how not to do things, speaking from experience of working with an extension of MIPS for several years. But even MIPS (a) had it as an option (b) defaulted it to off.

Referring to ancient architectures like the 68000 isn't great either, the world is a very different place to then. The best comparison points are generally other contemporary architectures like AArch64, where, to the best of my knowledge, there is no such equivalent, even though it also has smaller immediates for its conditional branches than its unconditional ones.

Yes, that rule is about linker relaxation. It's unfortunate that the same term is used for opposites. I disagree that it's necessary for readable assembly language programs though; it's rare that it ever matters, and in the case that it does I would argue that it is more surprising that the branch gets relaxed, since the assembly as written does not correspond to the disassembly, and only in certain edge-cases.

jrtc27 · 2021-09-07T19:53:40Z

(and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)

jim-wilson · 2021-09-07T20:21:53Z

> (and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)

Then later you modify the code again and the branch is in range again, and now your code is unnecessarily larger and slower because there is no easy way to notice when a branch falls back in range again. Hence it is best if the assembler does this for you.

You don't like m68000 and MIPS as examples. How about x86? GNU as will automatically rewrite branch instructions for you, but since this is a CISC, it is a choice between various short and long forms of branches.

ARMv7 conditional branches have 24-bits of offset. RISC-V conditional branches have 12-bits of offset. 24-bits is enough. You would have to write contrived code to exceed that. But 12-bits is not enough, and it is easy to write code that breaks. Hence ARMv7 does not need branch relaxation, but RISC-V does.

nick-knight · 2021-09-07T20:31:00Z

I don't have the experience to feel comfortable taking a stand in this argument. I'd just like to share my team's use-case in case it helps color the discussion. Or perhaps someone can suggest a better approach.

We developed an assembly-code generator that emits tiled loop nests (from numerical methods) with various degrees of unrolling, and we want to use more compact branches where possible. The amount of unrolling is determined dynamically, based on how the generator is invoked. Currently the generator is designed to emit the simplest branches, which the GNU assembler handily expands as needed. Switching to LLVM, and observing different behavior, prompted an internal bug report (and this PR), which recently resulted in the aforementioned LLVM patch.

Conservatively using longer branches hurts performance unacceptably, so "fixing my code" means implementing code-size computation in my code generator. I acknowledge that writing code generators involves reinventing a lot of compiler wheels, but this was one I hoped to not have to, especially teaching my code generator how to detect compressible instructions and to expand things like li automatically.
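For a sense of what "code-size computation" would involve in a generator, here is a minimal sketch of the usual grow-until-stable layout loop. It rests on simplifying assumptions that are not from the thread: every instruction is 4 bytes (no compressed instructions), a long branch is an inverted branch plus `jal` (8 bytes), and the long form always reaches. The item format and the `layout` name are hypothetical.

```python
BRANCH_MIN, BRANCH_MAX = -4096, 4094  # 13-bit signed byte offset

def layout(items):
    """items: list of ("label", name), ("insn", text) or ("branch", target).
    Returns (indices of branches that need the long form, label addresses)."""
    long_form = set()
    while True:
        # Pass 1: assign addresses under the current short/long choices.
        addr, pc, labels = [], 0, {}
        for i, item in enumerate(items):
            addr.append(pc)
            if item[0] == "label":
                labels[item[1]] = pc   # labels occupy no space
            elif item[0] == "branch" and i in long_form:
                pc += 8                # inverted branch + jal
            else:
                pc += 4
        # Pass 2: grow any short branch whose target is out of range.
        grew = False
        for i, item in enumerate(items):
            if item[0] == "branch" and i not in long_form:
                offset = labels[item[1]] - addr[i]
                if not (BRANCH_MIN <= offset <= BRANCH_MAX):
                    long_form.add(i)
                    grew = True
        if not grew:                   # fixed point reached
            return long_form, labels

# A loop body of 1100 instructions (4400 bytes) with a backward branch.
prog = [("label", "loop")] + [("insn", "nop")] * 1100 + [("branch", "loop")]
long_form, _ = layout(prog)
print(sorted(long_form))  # [1101]
```

Branches only ever grow, so the loop terminates. Growing one branch can push another out of range, which is why a single pass is not enough. Handling compressed instructions and multi-instruction expansions such as `li` would complicate the size table considerably.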

jrtc27 · 2021-09-07T20:37:38Z

> > (and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)
>
> Then later you modify the code again and the branch is in range again, and now your code is unnecessarily larger and slower because there is no easy way to notice when a branch falls back in range again. Hence it is best if the assembler does this for you.

Yeah that can technically happen if you're on the edge. But that's so unlikely to happen, normally you're either way under or way over. And if you were happy to rewrite your code to use a long form the first time then clearly code size or performance wasn't a concern for that sequence as otherwise you should have rewritten it to avoid the issue. Plus I could make a similar argument that, with the binutils behaviour, you silently increase code size and instruction count rather than warn the developer that they might want to write their code in a more efficient manner, which makes things worse as you can't look at the assembly and know what the instruction count is going to be.

> You don't like m68000 and MIPS as examples. How about x86? GNU as will automatically rewrite branch instructions for you, but since this is a CISC, it is a choice between various short and long forms of branches.

That's the key difference. It doesn't invert the condition and add a new instruction, it just uses a different form of the same instruction, so the disassembly still matches what you wrote. Code size goes up a bit but instruction count does not, which, unless you're thrashing in your I-cache/ITLB or are trying to squeeze your code into the smallest ROM possible, is the more important thing.

> ARMv7 conditional branches have 24-bits of offset. RISC-V conditional branches have 12-bits of offset. 24-bits is enough. You would have to write contrived code to exceed that. But 12-bits is not enough, and it is easy to write code that breaks. Hence ARMv7 does not need branch relaxation, but RISC-V does.

I spoke of AArch64 not Armv7, where you only get a 19+2-bit immediate (19 encoded bits, plus 2 implied), and RISC-V has 13 bits since there's an implied 0, not 12 bits, though neither change things all that much. If you're writing code that exceeds a 13-bit offset within a function though I do start to question why on earth you're writing in assembly, that should be such a rare case.
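The behaviour under debate, for a single branch, can be sketched as follows. This is an illustration of the shape described in this thread (an out-of-range branch becoming the inverted branch over a `jal`), assuming the 13-bit and 21-bit signed offsets of the base encodings. The `relax_branch` function is hypothetical and is not a transcription of what binutils does.

```python
# Each conditional branch and its logical inverse.
INVERT = {"beq": "bne", "bne": "beq", "blt": "bge",
          "bge": "blt", "bltu": "bgeu", "bgeu": "bltu"}

def relax_branch(op, rs1, rs2, target, offset):
    """Return the instruction(s) for `op rs1, rs2, target`, where `offset`
    is the byte distance from the branch to the target."""
    if -4096 <= offset <= 4094:
        # In range: emit exactly what was written.
        return [f"{op} {rs1}, {rs2}, {target}"]
    # The jal sits 4 bytes after the inverted branch, so its own
    # offset to the target is 4 bytes smaller.
    if not (-(1 << 20) <= offset - 4 <= (1 << 20) - 2):
        raise ValueError("target beyond jal range")
    return [f"{INVERT[op]} {rs1}, {rs2}, 1f", f"jal x0, {target}", "1:"]

print(relax_branch("beq", "a0", "a1", "near", 100))
print(relax_branch("beq", "a0", "a1", "far", 8192))
```

The second call yields two instructions where one was written, which is the mismatch between source and disassembly that the discussion turns on.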

aswaterman · 2021-09-07T20:41:51Z

> If you're writing code that exceeds a 13-bit offset within a function though I do start to question why on earth you're writing in assembly

@nick-knight did offer a legitimate (and not exactly rare) counterexample.

jrtc27 · 2021-09-07T20:54:44Z

> > If you're writing code that exceeds a 13-bit offset within a function though I do start to question why on earth you're writing in assembly
>
> @nick-knight did offer a legitimate (and not exactly rare) counterexample.

The discussion had so far been about hand-written assembly, not machine-generated (which isn't writing assembly, it's generating it). I agree it's a useful datapoint, and perhaps suggests that a sensible approach would be to support the feature but have it be behind an off-by-default option. That way you avoid the surprise of writing one branch instruction and getting two, despite the instruction you wrote being a valid RISC-V instruction save for the immediate range, but still provide a way to support people who are hand-writing assembly that is up against the branch range limits and don't want to manually expand them, and minimalistic code generators that don't want to count instructions. I'd still rather it weren't supported at all, but I doubt I'd be able to get consensus on that...

topperc · 2021-09-07T21:17:37Z

> That's the key difference. It doesn't invert the condition and add a new instruction, it just uses a different form of the same instruction, so the disassembly still matches what you wrote. Code size goes up a bit but instruction count does not, which, unless you're thrashing in your I-cache/ITLB or are trying to squeeze your code into the smallest ROM possible, is the more important thing.

If I recall correctly, binutils for x86 does have support for emitting two instructions on processors prior to the 386. The Jcc opcodes with more than 1 byte of displacement were added in the 386. Of course that code isn't really relevant these days, but it is a historical example.

nick-knight · 2021-09-07T21:49:06Z

GAS for Xtensa also supports branch relaxation (on by default, with ways of disabling it), along with a few other nifty assembly-level relaxations:
https://sourceware.org/binutils/docs/as/Xtensa-Relaxation.html
Xtensa is less ancient than some of the other targets mentioned in this thread.

asb · 2021-09-08T13:21:41Z

> I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961

Just to clarify my position, I was mainly stating that I overall wish RISC-V ASM had adopted less "magic", though in general I go for matching binutils behaviour wherever possible. If GCC or other generators are producing code that needs this, I think it does make sense for LLVM to support it.

Given that the ship has already sailed on RISC-V assembly being somewhat magic, a .option strict or similar might be an interesting proposal for someone to pursue. I don't feel strongly enough about it to propose it, but I'd be a supporter. This option would enforce distances of branch targets, avoid auto-converting add to addi depending on its operands etc.

MaskRay · 2021-09-08T17:08:18Z

I think that m68k and x86 (one insn) cannot be used as an example for RISC-V (more than one insn) to follow.

For m68k:

```
  jeq .L0
  ...
.L0:
```

The assembler may pick either the 2-byte `beqs`, the 4-byte `beqw` (±32 KiB), or the 6-byte `beql`.
The number of instructions does not change.
If the user writes `beqs`, GNU as may report `Error: value 128 out of range`.

One can think of beqw as the one RISC-V currently has and jeq as a mnemonic recommended to an assembly-code generator which doesn't track branch distances.

I agree that we should avoid more magic and look ahead instead of looking behind. We don't necessarily use a magic directive like .option pic.
As a (not-so-good) reference: on x86, the user can force 32-bit displacement with either of the two forms (https://sourceware.org/binutils/docs/as/i386_002dMnemonics.html#i386_002dMnemonics):

```
je.d32 .L0       # deprecated
{disp32} je .L0  # pseudo prefix
.L0:
```

We can think of an assembler notation to make the user intention explicit.

> Yes, the question is whether this is something that should be relied upon as a standard feature in RISC-V assembly or is an implementation-specific extension.

I think it's still viable that the GNU as behavior remains an implementation-specific extension.

aswaterman · 2021-09-08T17:18:15Z

This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do.

I regard this “magic” terminology as both pejorative and aloof. First of all, there’s nothing magic about it… we all understand it. Second, we clearly have compelling use cases for it. Purity is a virtue but isn’t the only one.

This x86 debate is a distraction. The fact is, RISC-V’s GNU port isn’t novel in this regard, but it’s true that it is more aggressive than most, because RISC-V benefits from this scheme disproportionately because of the constants involved.

jrtc27 · 2021-09-08T17:31:15Z

> This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do.
>
> I regard this “magic” terminology as both pejorative and aloof.

There's nothing inherently pejorative about magic. The term is used in many situations, not always negative. In this case the main point of the term here is not to say it's inherently bad but that it carries with it some amount of surprise.

> First of all, there’s nothing magic about it… we all understand it.

We the ISA specifiers and toolchain developers do. I've never been concerned with whether you or I can understand it, because clearly everyone here does. But we understand all kinds of details and gotchas that your average user does not. My concern is with the average software developer who has been tasked with writing some amount of RISC-V assembly and doesn't have the deep understanding we do; that they will see a BEQ in the instruction set manual, write a BEQ in their source and not see a BEQ in the disassembly when they come to debug the thing but a BNE followed by a J(AL X0). That is certainly surprising to most people and may lead to confusion, but how much is unknown.

> Second, we clearly have compelling use cases for it. Purity is a virtue but isn’t the only one.

This is where we differ. It's clearly a use case, but the debate is over (a) how compelling that is (b) whether that is sufficient to warrant the issues caused by additional surprise to inexperienced developers.

You are correct that I and some others here take a more purist approach to these things, but the difference in viewpoints should be regarded as a good thing, since it ensures that, whatever the conclusion of the discussion is, both sides of the debate have been well represented and that the conclusion isn't just "LLVM must blindly follow binutils" (the conclusion may still be to standardise the current binutils behaviour, and maybe even because the pragmatic approach of providing compatibility outweighs other concerns, but at least it will have been done after properly assessing things).

asb · 2021-09-08T17:39:25Z

> This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do.

That's generally my position too.

> I regard this “magic” terminology as both pejorative and aloof. First of all, there’s nothing magic about it… we all understand it. Second, we clearly have compelling use cases for it. Purity is a virtue but isn’t the only one.

I see what you mean and I didn't mean to come across that way. I apologise.

nick-knight · 2021-09-08T18:03:47Z

I appreciate this healthy debate. I would expect to see a similar tension between programmer luxuries and behavioral simplicity in the design of any programming language.

I would like to tease apart two aspects of this debate: whether the language offers support for branch relaxation, and whether assembler mnemonics like beq, which closely resemble strings in the ISA manual, can only map to the associated instructions in a 1:1 fashion. I'll admit I was surprised when I first saw add with a(n illegal) literal operand "magically" expand into a (legal) addi.

My use-case, which I acknowledge may not be compelling to everyone, is for reliable support of branch relaxation. However, I'm perfectly willing to use special mnemonics/pseudoinstructions/directives/etc. to achieve this behavior. Of course, such a compromise will inevitably require changes to binutils, meaning practical issues of doing engineering work, deprecating features, or breaking compatibility.

If we do not compromise, then it seems we will move toward having GNU and LLVM dialects of RISC-V assembly. Users (like me) of the GNU features will simply vote with our feet.

jim-wilson · 2021-09-13T22:32:13Z

A possible compromise is to create new mnemonics for the relaxable branches, e.g. instead of relaxing beq we could support a new name jeq which is relaxable, and beq is just a beq. However, I would worry about coordination with the architecture review team. I don't think that there are any software people on it, and there is no guarantee that they won't steal jeq from us later on. There is also the problem that there is a large existing base of RISC-V assembly language that assumes that beq is relaxable, and finding and fixing all of the code is not practical. So GCC would probably have to continue to relax beq and handle jeq same as beq, but llvm could only relax jeq, and people that want to use llvm instead of gcc can fix their code to use jeq instead of beq as they find problems.

luismarques · 2021-09-16T16:25:33Z

In general, I feel that:

- Both the relaxing and non-relaxing behaviour are useful, so IMO the best solution would make this configurable at some level.
- I prefer an assembler directive to new mnemonics. There's no instruction churn, you can change the behaviour of a range of instructions as needed, etc.
- Arguably this is a lesser point but ideally we would default to strict (non-relaxing). For compiler-generated code, the compiler would generate the directive to match whatever it wants the assembler to do, which would ensure cross-toolchain compatibility, even if there was a difference in defaults.

Document branch relaxation

de240ee

jrtc27 mentioned this pull request Dec 7, 2022

[RISCV] BranchRelaxation calculates overly conservative size when a branch branches over compressible branch instructions llvm/llvm-project#56448

Open

nick-knight mentioned this pull request Jan 31, 2023

The GAS assembler implicitly converts near->far branches #83

Merged
