Document branch relaxation #58

Open
wants to merge 1 commit into
base: main

Contributor

@nick-knight nick-knight commented Sep 29, 2020

The toolchain currently implements various branch relaxations, but this behavior is undocumented. I think this is something the assembly programmer should be aware of.


Unconditional branches are implemented by the `j(al)?r?` pseudoinstructions.
(The underlying instructions are `jalr?`.)
The `j(al)?` targets can be any symbol or address.
Contributor Author

Can it really be any symbol/address? I haven't tried them all.

Contributor

“Any” is indeed an overstatement, since the call/tail macros have a maximum displacement of roughly 2 GB.
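
To make the "roughly 2 GB" figure concrete, here is a minimal sketch of how a PC-relative offset splits across the `auipc` + `jalr` pair that `call`/`tail` expand to. The function names `split_pcrel` and `reachable_rv64` are made up for this illustration and are not part of any toolchain.

```python
def split_pcrel(offset):
    """Split a PC-relative byte offset into (hi20, lo12) for auipc + jalr.

    hi20 is returned as the raw 20-bit field; lo12 is a signed 12-bit value.
    """
    # Sign-extend the low 12 bits, because jalr adds a signed immediate.
    lo12 = ((offset & 0xFFF) ^ 0x800) - 0x800
    # The upper part compensates for a negative lo12 by rounding up.
    hi20 = ((offset - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def reachable_rv64(offset):
    """True if auipc + jalr can reach `offset` on RV64.

    auipc contributes a sign-extended multiple of 4096 and jalr a signed
    12-bit value, so the window is slightly asymmetric around zero.
    """
    return -(2**31) - 2**11 <= offset < 2**31 - 2**11
```

The window is a little under ±2 GiB on each side, which matches the "roughly 2 GB" wording above.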

Contributor Author

Thanks. I'll read up on this and improve it.


Conditional branches are implemented by the `b(l|g)(t|e)(z|u)?` and `b(eq|ne)z?` pseudoinstructions.
(The underlying instructions are `b(lt|ge)u?` and `b(eq|ne)`.)
Again, the targets can be any symbol or address.
Contributor Author

(Same as above.)

Contributor

Same here, except there’s a 1 MiB limit.

@nick-knight
Contributor Author

I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961

It seems I was mistaken in believing that conditional branch relaxation is a feature of the RISC-V assembly language. I assumed it was, and that this was part of why beqz et al. are considered to be pseudo-instructions, which motivated me to file this PR against riscv-asm-manual. If the community disagrees, then this PR is inappropriate and should be closed. (And my assembly code is "broken" and should be rewritten.)

Contributor

@jrtc27 jrtc27 left a comment

I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961

It seems I was mistaken in believing that conditional branch relaxation is a feature of the RISC-V assembly language. I assumed it was, and that this was part of why beqz et al. are considered to be pseudo-instructions, which motivated me to file this PR against riscv-asm-manual. If the community disagrees, then this PR is inappropriate and should be closed. (And my assembly code is "broken" and should be rewritten.)

Yes, the question is whether this is something that should be relied upon as a standard feature in RISC-V assembly or is an implementation-specific extension.

```
1:
```

The `bnez` is further relaxed to `bne`, while `j` is relaxed to `jal` with a relocation.
Contributor

These are not relaxations. These are pure aliases for specific forms (bne with a zero register, and jal with a zero register).
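
As a rough illustration of the distinction, a pure alias can be expanded by a fixed table lookup that never looks at the branch distance. The helper `expand_alias` below is hypothetical and covers only the two aliases named above plus `beqz`.

```python
def expand_alias(mnemonic, operands):
    """Map a pseudoinstruction to its single base instruction.

    The result depends only on the mnemonic, never on how far away the
    target is, which is what separates an alias from a relaxation.
    """
    if mnemonic == "bnez":            # bnez rs, target -> bne rs, x0, target
        rs, target = operands
        return ("bne", [rs, "x0", target])
    if mnemonic == "beqz":            # beqz rs, target -> beq rs, x0, target
        rs, target = operands
        return ("beq", [rs, "x0", target])
    if mnemonic == "j":               # j target -> jal x0, target
        (target,) = operands
        return ("jal", ["x0", target])
    return (mnemonic, list(operands))  # anything else passes through unchanged
```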

@jim-wilson
Collaborator

A compiler can and should be computing instruction sizes and branch offsets, and then emitting the correct branch instructions. But this is an unreasonable request for hand written assembly code. The only reasonable approach is if the assembler can do this for the programmer.

GNU Binutils has been doing assembler branch relaxation since the 1990s at least, and maybe earlier, with the Motorola 68000 port perhaps being the first one. And note that assembler branch relaxation is a different process from linker relaxation. Assembler branch relaxation may increase the size of code, and does not involve any relocations. Assembler branch relaxation is to let the assembly language programmer use the obvious simple branch instruction, and the assembler figures out how to translate that into actual valid target instructions. For CISC machines, this is usually choosing between short and long forms of the branch instruction. For RISC machines, this usually means emitting one or multiple instructions as necessary. There are dozens of GNU Binutils targets that have branch relaxation support. The RISC-V port works roughly the same as the MIPS port here, though the MIPS port has extra complications due to delay slots and the branch likely bit, and hence can emit longer sequences than the RISC-V port does.

The LLVM approach is that relaxation is OK as an optimization, but should not be required for correct operation. I think this rule should only apply to linker relaxation. It should not apply to assembler branch relaxation, as this is necessary for humans to write readable assembly language programs.
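
For readers who have not seen assembler branch relaxation before, the following is a toy sketch of the general idea, not the Binutils implementation. It assumes every conditional branch starts as a 4-byte instruction and grows to 8 bytes (inverted branch plus jump) when its target falls outside the B-type reach; the item format and the name `relax` are invented for this example.

```python
B_MIN, B_MAX = -4096, 4094   # RISC-V B-type reach in bytes


def relax(items):
    """Decide the size of each branch by iterating to a fixed point.

    items: list of ("label", name), ("insn", size_in_bytes) or
           ("branch", target_label).
    Returns {item_index: size} for the branches, where size is 4 or 8.
    """
    size = {i: 4 for i, item in enumerate(items) if item[0] == "branch"}
    changed = True
    while changed:
        changed = False
        # Lay out the code using the current size guesses.
        addr, labels, pc = {}, {}, 0
        for i, (kind, arg) in enumerate(items):
            addr[i] = pc
            if kind == "label":
                labels[arg] = pc
            elif kind == "insn":
                pc += arg
            else:
                pc += size[i]
        # Grow every short branch whose target is now out of reach.
        # Branches only ever grow, so the loop must terminate.
        for i in size:
            offset = labels[items[i][1]] - addr[i]
            if size[i] == 4 and not (B_MIN <= offset <= B_MAX):
                size[i] = 8
                changed = True
    return size
```

Growing one branch can push another out of range, which is why the layout is recomputed until nothing changes. This is also the bookkeeping a code generator would have to reimplement if the assembler did not do it.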

@jrtc27
Contributor

jrtc27 commented Sep 7, 2021

A compiler can and should be computing instruction sizes and branch offsets, and then emitting the correct branch instructions. But this is an unreasonable request for hand written assembly code. The only reasonable approach is if the assembler can do this for the programmer.

GNU Binutils has been doing assembler branch relaxation since the 1990s at least, and maybe earlier, with the Motorola 68000 port perhaps being the first one. And note that assembler branch relaxation is a different process from linker relaxation. Assembler branch relaxation may increase the size of code, and does not involve any relocations. Assembler branch relaxation is to let the assembly language programmer use the obvious simple branch instruction, and the assembler figures out how to translate that into actual valid target instructions. For CISC machines, this is usually choosing between short and long forms of the branch instruction. For RISC machines, this usually means emitting one or multiple instructions as necessary. There are dozens of GNU Binutils targets that have branch relaxation support. The RISC-V port works roughly the same as the MIPS port here, though the MIPS port has extra complications due to delay slots and the branch likely bit, and hence can emit longer sequences than the RISC-V port does.

The LLVM approach is that relaxation is OK as an optimization, but should not be required for correct operation. I think this rule should only apply to linker relaxation. It should not apply to assembler branch relaxation, as this is necessary for humans to write readable assembly language programs.

MIPS is a terrible example to use, its assembly language is full of pitfalls (.set (no)macro, .set (no)at) and is an example I would use of how not to do things, speaking from experience of working with an extension of MIPS for several years. But even MIPS (a) had it as an option (b) defaulted it to off.

Referring to ancient architectures like the 68000 isn't great either, the world is a very different place to then. The best comparison points are generally other contemporary architectures like AArch64, where, to the best of my knowledge, there is no such equivalent, even though it also has smaller immediates for its conditional branches than its unconditional ones.

Yes, that rule is about linker relaxation. It's unfortunate that the same term is used for opposites. I disagree that it's necessary for readable assembly language programs though; it's rare that it ever matters, and in the case that it does I would argue that it is more surprising that the branch gets relaxed, since the assembly as written does not correspond to the disassembly, and only in certain edge-cases.

@jrtc27
Contributor

jrtc27 commented Sep 7, 2021

(and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)

@jim-wilson
Collaborator

(and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)

Then later you modify the code again and the branch is in range again, and now your code is unnecessarily larger and slower because there is no easy way to notice when a branch falls back in range again. Hence it is best if the assembler does this for you.

You don't like m68000 and MIPS as examples. How about x86? GNU as will automatically rewrite branch instructions for you, but since this is a CISC, it is a choice between various short and long forms of branches.

ARMv7 conditional branches have 24 bits of offset. RISC-V conditional branches have 12 bits of offset. 24 bits is enough. You would have to write contrived code to exceed that. But 12 bits is not enough, and it is easy to write code that breaks. Hence ARMv7 does not need branch relaxation, but RISC-V does.

@nick-knight
Contributor Author

I don't have the experience to feel comfortable taking a stand in this argument. I'd just like to share my team's use-case in case it helps color the discussion. Or perhaps someone can suggest a better approach.

We developed an assembly-code generator that emits tiled loop nests (from numerical methods) with various degrees of unrolling, and we want to use more compact branches where possible. The amount of unrolling is determined dynamically, based on how the generator is invoked. Currently the generator is designed to emit the simplest branches, which the GNU assembler handily expands as needed. Switching to LLVM, and observing different behavior, prompted an internal bug report (and this PR), which recently resulted in the aforementioned LLVM patch.

Conservatively using longer branches hurts performance unacceptably, so "fixing my code" means implementing code-size computation in my code generator. I acknowledge that writing code generators involves reinventing a lot of compiler wheels, but this was one I had hoped not to reinvent, especially since it means teaching my code generator how to detect compressible instructions and to expand things like li automatically.

@jrtc27
Contributor

jrtc27 commented Sep 7, 2021

(and for hand-written assembly it's really rather easy: you ignore the problem entirely, or it doesn't even occur to you, and on the off-chance you end up writing something where a branch target is out of range you see the error and fix your code)

Then later you modify the code again and the branch is in range again, and now your code is unnecessarily larger and slower because there is no easy way to notice when a branch falls back in range again. Hence it is best if the assembler does this for you.

Yeah that can technically happen if you're on the edge. But that's so unlikely to happen, normally you're either way under or way over. And if you were happy to rewrite your code to use a long form the first time then clearly code size or performance wasn't a concern for that sequence as otherwise you should have rewritten it to avoid the issue. Plus I could make a similar argument that, with the binutils behaviour, you silently increase code size and instruction count rather than warn the developer that they might want to write their code in a more efficient manner, which makes things worse as you can't look at the assembly and know what the instruction count is going to be.

You don't like m68000 and MIPS as examples. How about x86? GNU as will automatically rewrite branch instructions for you, but since this is a CISC, it is a choice between various short and long forms of branches.

That's the key difference. It doesn't invert the condition and add a new instruction, it just uses a different form of the same instruction, so the disassembly still matches what you wrote. Code size goes up a bit but instruction count does not, which, unless you're thrashing in your I-cache/ITLB or are trying to squeeze your code into the smallest ROM possible, is the more important thing.

ARMv7 conditional branches have 24 bits of offset. RISC-V conditional branches have 12 bits of offset. 24 bits is enough. You would have to write contrived code to exceed that. But 12 bits is not enough, and it is easy to write code that breaks. Hence ARMv7 does not need branch relaxation, but RISC-V does.

I spoke of AArch64 not Armv7, where you only get a 19+2-bit immediate (19 encoded bits, plus 2 implied), and RISC-V has 13 bits since there's an implied 0, not 12 bits, though neither change things all that much. If you're writing code that exceeds a 13-bit offset within a function though I do start to question why on earth you're writing in assembly, that should be such a rare case.
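
The bit counts above translate into byte ranges as follows. This is a small worked calculation; the helper name `signed_reach` is made up for the example.

```python
def signed_reach(encoded_bits, implied_zero_bits):
    """Byte range (lowest, highest) of a signed PC-relative immediate.

    encoded_bits: bits stored in the instruction.
    implied_zero_bits: low bits that are always zero and not stored.
    """
    total = encoded_bits + implied_zero_bits
    step = 1 << implied_zero_bits          # offsets are multiples of this
    return -(1 << (total - 1)), (1 << (total - 1)) - step


riscv_branch = signed_reach(12, 1)    # 13-bit offset: about +-4 KiB
aarch64_bcond = signed_reach(19, 2)   # 21-bit offset: about +-1 MiB
riscv_jal = signed_reach(20, 1)       # 21-bit offset: about +-1 MiB
```

So an AArch64 conditional branch reaches about as far as a RISC-V `jal`, while a RISC-V conditional branch covers 256 times less.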

@aswaterman
Contributor

If you're writing code that exceeds a 13-bit offset within a function though I do start to question why on earth you're writing in assembly

@nick-knight did offer a legitimate (and not exactly rare) counterexample.

@jrtc27
Contributor

jrtc27 commented Sep 7, 2021

If you're writing code that exceeds a 13-bit offset within a function though I do start to question why on earth you're writing in assembly

@nick-knight did offer a legitimate (and not exactly rare) counterexample.

The discussion had so far been about hand-written assembly, not machine-generated (which isn't writing assembly, it's generating it). I agree it's a useful datapoint, and perhaps suggests that a sensible approach would be to support the feature but have it be behind an off-by-default option. That way you avoid the surprise of writing one branch instruction and getting two, despite the instruction you wrote being a valid RISC-V instruction save for the immediate range, but still provide a way to support people who are hand-writing assembly that is up against the branch range limits and don't want to manually expand them, and minimalistic code generators that don't want to count instructions. I'd still rather it weren't supported at all, but I doubt I'd be able to get consensus on that...

@topperc
Contributor

topperc commented Sep 7, 2021

That's the key difference. It doesn't invert the condition and add a new instruction, it just uses a different form of the same instruction, so the disassembly still matches what you wrote. Code size goes up a bit but instruction count does not, which, unless you're thrashing in your I-cache/ITLB or are trying to squeeze your code into the smallest ROM possible, is the more important thing.

If I recall correctly, binutils for X86 does have support for emitting two instructions prior to the 386. The Jcc opcodes with more than 1 byte of displacement were added in the 386. Of course that code isn't really relevant these days, but it is a historical example.

@nick-knight
Contributor Author

GAS for Xtensa also supports branch relaxation (on by default, with ways of disabling it), along with a few other nifty assembly-level relaxations:
https://sourceware.org/binutils/docs/as/Xtensa-Relaxation.html
Xtensa is less ancient than some of the other targets mentioned in this thread.

@asb
Contributor

asb commented Sep 8, 2021

I just saw that LLVM folks (including @jrtc27 and @asb) are pushing back against LLVM supporting conditional branch relaxation in assembly programs: https://reviews.llvm.org/D108961

Just to clarify my position, I was mainly stating that I overall wish RISC-V ASM had adopted less "magic", though in general I go for matching binutils behaviour wherever possible. If GCC or other generators are producing code that needs this, I think it does make sense for LLVM to support it.

Given that the ship has already sailed on RISC-V assembly being somewhat magic, a .option strict or similar might be an interesting proposal for someone to pursue. I don't feel strongly enough about it to propose it, but I'd be a supporter. This option would enforce distances of branch targets, avoid auto-converting add to addi depending on its operands etc.

@MaskRay

MaskRay commented Sep 8, 2021

I think that m68k and x86 (one insn) cannot be used as an example for RISC-V (more than one insn) to follow.

For m68k

  jeq .L0
  ...
.L0:

The assembler may pick either the 2-byte beqs or the 4-byte beqw (+-32KiB) or the 6-byte beql.
The number of instructions does not change.
If the user writes beqs and the target is out of range, GNU as may report Error: value 128 out of range.

One can think of beqw as the one RISC-V currently has and jeq as a mnemonic recommended to an assembly-code generator which doesn't track branch distances.
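
A minimal sketch of the m68k-style choice described above, where relaxation picks among forms of the same instruction. The ranges are simplified to plain signed 8/16/32-bit displacements and ignore encoding corner cases; `pick_m68k_beq` is a hypothetical name.

```python
def pick_m68k_beq(displacement):
    """Return (mnemonic, size_in_bytes) of the shortest form that fits.

    Whatever is chosen, the result is still exactly one instruction.
    """
    if -128 <= displacement <= 127:
        return ("beqs", 2)
    if -32768 <= displacement <= 32767:
        return ("beqw", 4)
    return ("beql", 6)
```

A displacement of 128 is the first value that no longer fits the short form, consistent with the error message quoted above.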

I agree that we should avoid more magic and look ahead instead of looking behind. We don't necessarily need to use a magic directive like .option pic.
As a (not-so-good) reference: on x86, the user can force 32-bit displacement with either of the two forms (https://sourceware.org/binutils/docs/as/i386_002dMnemonics.html#i386_002dMnemonics):

  je.d32 .L0       # deprecated
  {disp32} je .L0  # pseudo prefix
.L0:

We can think of an assembler notation to make the user intention explicit.

Yes, the question is whether this is something that should be relied upon as a standard feature in RISC-V assembly or is an implementation-specific extension.

I think it's still viable that the GNU as behavior remains an implementation-specific extension.

@aswaterman
Contributor

aswaterman commented Sep 8, 2021

This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do.

I regard this “magic” terminology as both pejorative and aloof. First of all, there’s nothing magic about it… we all understand it. Second, we clearly have compelling use cases for it. Purity is a virtue but isn’t the only one.

This x86 debate is a distraction. The fact is, RISC-V’s GNU port isn’t novel in this regard, but it’s true that it is more aggressive than most, because RISC-V benefits from this scheme disproportionately because of the constants involved.

@jrtc27
Contributor

jrtc27 commented Sep 8, 2021

This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do.

I regard this “magic” terminology as both pejorative and aloof.

There's nothing inherently pejorative about magic. The term is used in many situations, not always negative. In this case the main point of the term here is not to say it's inherently bad but that it carries with it some amount of surprise.

First of all, there’s nothing magic about it… we all understand it.

We the ISA specifiers and toolchain developers do. I've never been concerned with whether you or I can understand it, because clearly everyone here does. But we understand all kinds of details and gotchas that your average user does not. My concern is with the average software developer who has been tasked with writing some amount of RISC-V assembly and doesn't have the deep understanding we do; that they will see a BEQ in the instruction set manual, write a BEQ in their source and not see a BEQ in the disassembly when they come to debug the thing but a BNE followed by a J(AL X0). That is certainly surprising to most people and may lead to confusion, but how much is unknown.
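
To show what that disassembly surprise looks like, here is a sketch of the rewrite for an out-of-range conditional branch: invert the condition, skip over an unconditional jump. It uses a local label for readability, whereas a real assembler would emit a numeric offset; `expand_far_branch` is a made-up helper.

```python
# Each base conditional branch paired with its logical inverse.
INVERSE = {
    "beq": "bne", "bne": "beq",
    "blt": "bge", "bge": "blt",
    "bltu": "bgeu", "bgeu": "bltu",
}


def expand_far_branch(op, rs1, rs2, target):
    """Return assembly lines equivalent to `op rs1, rs2, target`
    for a target beyond the conditional-branch range."""
    return [
        f"{INVERSE[op]} {rs1}, {rs2}, 1f",  # original condition false: skip the jump
        f"j {target}",                      # original condition true: take the long jump
        "1:",
    ]
```

For example, `expand_far_branch("beq", "a0", "a1", "far")` yields a `bne` over a `j far`, which is the BNE followed by J(AL X0) mentioned above.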

Second, we clearly have compelling use cases for it. Purity is a virtue but isn’t the only one.

This is where we differ. It's clearly a use case, but the debate is over (a) how compelling that is (b) whether that is sufficient to warrant the issues caused by additional surprise to inexperienced developers.

You are correct that I and some others here take a more purist approach to these things, but the difference in viewpoints should be regarded as a good thing, since it ensures that, whatever the conclusion of the discussion is, both sides of the debate have been well represented and that the conclusion isn't just "LLVM must blindly follow binutils" (the conclusion may still be to standardise the current binutils behaviour, and maybe even because the pragmatic approach of providing compatibility outweighs other concerns, but at least it will have been done after properly assessing things).

@asb
Contributor

asb commented Sep 8, 2021

This won’t come as a surprise, but I’m on the side of standardizing the GNU behavior, because it’s the pragmatic thing to do.

That's generally my position too.

I regard this “magic” terminology as both pejorative and aloof. First of all, there’s nothing magic about it… we all understand it. Second, we clearly have compelling use cases for it. Purity is a virtue but isn’t the only one.

I see what you mean and I didn't mean to come across that way. I apologise.

@nick-knight
Contributor Author

I appreciate this healthy debate. I would expect to see a similar tension between programmer luxuries and behavioral simplicity in the design of any programming language.

I would like to tease apart two aspects of this debate: whether the language offers support for branch relaxation, and whether assembler mnemonics like beq, which closely resemble strings in the ISA manual, can only map to the associated instructions in a 1:1 fashion. I'll admit I was surprised when I first saw add with a(n illegal) literal operand "magically" expand into a (legal) addi.

My use-case, which I acknowledge may not be compelling to everyone, is for reliable support of branch relaxation. However, I'm perfectly willing to use special mnemonics/pseudoinstructions/directives/etc. to achieve this behavior. Of course, such a compromise will inevitably require changes to binutils, meaning practical issues of doing engineering work, deprecating features, or breaking compatibility.

If we do not compromise, then it seems we will move toward having GNU and LLVM dialects of RISC-V assembly. Users (like me) of the GNU features will simply vote with our feet.

@jim-wilson
Collaborator

A possible compromise is to create new mnemonics for the relaxable branches, e.g. instead of relaxing beq we could support a new name jeq which is relaxable, and beq is just a beq. However, I would worry about coordination with the architecture review team. I don't think that there are any software people on it, and there is no guarantee that they won't steal jeq from us later on. There is also the problem that there is a large existing base of RISC-V assembly language that assumes that beq is relaxable, and finding and fixing all of the code is not practical. So GCC would probably have to continue to relax beq and handle jeq the same as beq, but LLVM could relax only jeq, and people who want to use LLVM instead of GCC can fix their code to use jeq instead of beq as they find problems.

@luismarques
Contributor

In general, I feel that:

  • Both the relaxing and non-relaxing behaviour are useful, so IMO the best solution would make this configurable at some level.
  • I prefer an assembler directive to new mnemonics. There's no instruction churn, you can change the behaviour of a range of instructions as needed, etc.
  • Arguably this is a lesser point but ideally we would default to strict (non-relaxing). For compiler-generated code, the compiler would generate the directive to match whatever it wants the assembler to do, which would ensure cross-toolchain compatibility, even if there was a difference in defaults.
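
To illustrate how a directive-controlled mode might behave, here is a toy sketch. The option names "strict" and "relax" are purely hypothetical (no such directive exists in any assembler as far as this thread establishes), and the stream format is invented for the example.

```python
B_MIN, B_MAX = -4096, 4094   # RISC-V B-type reach in bytes


def process(stream, default_strict=True):
    """Classify each branch under the mode currently in effect.

    stream: list of ("option", "strict" | "relax") or ("branch", byte_offset).
    Returns "short" or "relaxed" per branch. In strict mode an
    out-of-range branch is an error instead of being rewritten.
    """
    strict = default_strict
    result = []
    for kind, arg in stream:
        if kind == "option":
            strict = (arg == "strict")      # hypothetical directive
        elif B_MIN <= arg <= B_MAX:
            result.append("short")
        elif strict:
            raise ValueError(f"branch offset {arg} out of range")
        else:
            result.append("relaxed")
    return result
```

Because the mode is part of the source text, a compiler that emits the directive gets the same result from any assembler regardless of its default, which is the cross-toolchain point made in the last bullet.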

8 participants