Add %lpad_hash for Zicfilp #99

Draft

kito-cheng
Collaborator

NOTE: This PR will remain in draft state until the toolchain PoC and the psABI spec are ready.


Zicfilp provides two labeling schemes: simple and complex (also known as function signature-based). The simple scheme uses an lpad with a constant 0, which does not require any hashing mechanism. In contrast, the complex labeling scheme computes the label as an MD5 hash of the signature string.

Filling in an MD5 hash value is straightforward for compilers, but it is non-trivial for humans to compute and maintain by hand. Therefore, this PR adds a new assembler modifier to compute the value.
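As a rough illustration of what such a modifier would have to compute, here is a minimal Python sketch that derives a 20-bit label from the MD5 of a signature string. The function name `lpad_label` is hypothetical, and the bit-selection rule (low 20 bits of the digest read as a little-endian integer) is an assumption made for illustration only; the authoritative rule is whatever the psABI / riscv-cfi draft specifies, so the value printed here may not match a real toolchain's output.

```python
import hashlib

# The lpad instruction carries a 20-bit landing pad label.
LABEL_BITS = 20


def lpad_label(signature: str) -> int:
    """Return a 20-bit label derived from the MD5 of `signature`.

    ASSUMPTION: the label is taken as the low 20 bits of the digest
    interpreted as a little-endian integer. The real selection rule is
    defined by the psABI draft and may differ.
    """
    digest = hashlib.md5(signature.encode("utf-8")).digest()
    value = int.from_bytes(digest, "little")
    return value & ((1 << LABEL_BITS) - 1)


if __name__ == "__main__":
    # "FiiPPcE" is the mangled function type used as an example in this thread.
    print(f"lpad {lpad_label('FiiPPcE'):#x}")
```

The point of the sketch is only that the computation is mechanical once the signature string is known; the hard part for a human author is producing the signature string itself.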

See also: riscv/riscv-cfi#151

@@ -397,9 +397,10 @@ linker relaxation accidentally if user already disable linker relaxation.

Push/pop current options to/from the options stack.

-## Assembler Relocation Functions
+## Assembler Modifiers

@mylai-mtk

Actually, I wonder if this would turn out to be useful, since the function signature string format is (for now) drafted to be the mangled string of the function type, which IMHO is not human-friendly enough to be written or read correctly by a human programmer. Also, AFAIK, there's no convenient tool to mangle function signatures into strings, despite the widespread usage of c++filt for demangling. Under this impression, I doubt people would try to use this assembly modifier, given the difficulty of producing those function signature strings.

To get a feel: lpad %lpad_hash("FiiPPcE") vs lpad 0xe088e. While the %lpad_hash() form may be easier to read for someone familiar with the mangling rule (this advantage may be useful in the rare scenario of reviewing assemblies by compiler experts), I don't think it's comprehensible for average assembly developers, not to mention to write it out without the help of tools.

Though I doubt the usefulness of this %lpad_hash() modifier, I do agree that we need a method to ease the pain of obtaining correct label values. Here I propose a possible, but not really ideal, method: in my own toolchain prototyping process, I emit symbols containing the label values for all C/C++ functions, so I can easily extract the resulting compiler-generated labels by looking into symbol tables. This emission was originally intended to facilitate function-signature-based PLT generation in linkers, but it turns out that I use it a lot when patching musl libc assemblies with lpad insns. This "compile-then-inspect-binary" approach is far from straightforward and beautiful, but at least I can trust the values obtained to be correct, if I don't make a mistake when copying them 😜

@ved-rivos

ved-rivos commented May 17, 2024

> While the %lpad_hash() form may be easier to read for someone familiar with the mangling rule (this advantage may be useful in the rare scenario of reviewing assemblies by compiler experts), I don't think it's comprehensible for average assembly developers, not to mention to write it out without the help of tools.

Should we add a modifier that takes the function prototype string instead of the mangled string as input, like %lpad_hash_proto("void (*f)(int, char)")?

@mylai-mtk

> Should we add a modifier that takes the function prototype string instead of the mangled string as input, like %lpad_hash_proto("void (*f)(int, char)")?

I guess this would be a huge effort for assemblers to implement: they do not know anything about the C/C++ language, so the corresponding C parser and C++ mangler would need to be pulled in, which is a big cost for a minor convenience feature like this.

3 participants