make literal collection more precise #247

Open
josharian opened this issue May 19, 2019 · 7 comments


@josharian
Collaborator

I have work in progress improving literal collection. This issue is to discuss design decisions in advance of sending PRs.

  • The current design converts int literals to strings during go-fuzz-build. I'd like to change that, so that the metadata json contains strings and ints, and do the int-to-string conversion lazily on the go-fuzz side. This gives us flexibility about encodings (little-endian, big-endian, varint, ascii, hex) without having to decode and re-encode. Step one would be no behavioral changes but simply moving the conversion. Thoughts or concerns?

  • The current design encodes ints in the smallest number of bytes possible. Thus a uint64 with value 1 gets encoded as a uint8. Now that we use go/packages, we have type information available, so we could encode that 1 as a uint64. Is that preferable? It might mean having multiple 1s of various widths, but it might also increase the chance of matching the underlying structure of the program. It would also mean having to track more precise type information in the metadata.
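
To make the two bullets concrete, here is a minimal sketch of what lazy conversion could look like if the metadata stored raw integer values. The names `minWidth` and `intEncodings` are hypothetical and are not part of go-fuzz; only the `encoding/binary` and `strconv` calls are real standard-library APIs. It contrasts the smallest-width encoding with a declared-width encoding of the same value.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// minWidth returns the smallest width in bytes (1, 2, 4, or 8) that can hold v.
// This mirrors the "smallest number of bytes possible" behavior described above.
func minWidth(v uint64) int {
	switch {
	case v <= 0xff:
		return 1
	case v <= 0xffff:
		return 2
	case v <= 0xffffffff:
		return 4
	default:
		return 8
	}
}

// intEncodings produces several encodings of v on demand.
// width is in bytes (1, 2, 4, or 8) and only affects the fixed-width encodings;
// if v does not fit in width, only its low-order bytes are kept.
func intEncodings(v uint64, width int) map[string][]byte {
	le := make([]byte, 8)
	binary.LittleEndian.PutUint64(le, v)
	be := make([]byte, 8)
	binary.BigEndian.PutUint64(be, v)
	vi := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(vi, v)
	return map[string][]byte{
		"little-endian": le[:width],
		"big-endian":    be[8-width:],
		"varint":        vi[:n],
		"ascii":         []byte(strconv.FormatUint(v, 10)),
		"hex":           []byte(strconv.FormatUint(v, 16)),
	}
}

func main() {
	v := uint64(1)
	// Compare the minimal-width encoding with the declared uint64 width.
	for _, w := range []int{minWidth(v), 8} {
		fmt.Printf("width %d: % x\n", w, intEncodings(v, w)["little-endian"])
	}
}
```

Because the value stays an integer until it is needed, adding another encoding would not require decoding and re-encoding a stored string.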

That's a start. I may add questions as I work on the PRs.

cc @dvyukov

@dvyukov
Owner

dvyukov commented May 20, 2019

The current design converts int literals to strings during go-fuzz-build. I'd like to change that, so that the metadata json contains strings and ints, and do the int-to-string conversion lazily on the go-fuzz side.

No concerns.
But I think we should mostly ignore the declared literal type. This means that if it's a string, but is actually an integer/float, we should encode it as integer/float if we are going to rely on that type during fuzzing in any way.

@dvyukov
Owner

dvyukov commented May 20, 2019

The current design encodes ints in the smallest number of bytes possible.

Why would it increase the chances of matching the underlying structure of the program? I think we should ignore the exact type in the program. This means that if we have, say int16(42), we should consider we actually have all of int64(42), int32(42), int16(42) and int8(42). It means there is little point in storing more than 1 version of 42 in the file. What am I missing?

@josharian
Collaborator Author

This means that if it's a string, but is actually an integer/float, we should encode it as integer/float if we are going to rely on that type during fuzzing in any way.

I don't understand what this means. Can you expand or give an example?

@josharian
Collaborator Author

This means that if we have, say int16(42), we should consider we actually have all of int64(42), int32(42), int16(42) and int8(42).

Sounds good to me. This significantly increases the number of literals, but that's ok.

@dvyukov
Owner

dvyukov commented May 21, 2019

Sounds good to me. This significantly increases the number of literals, but that's ok.

I think we should not put them all into the file. There is no point. We should just apply transformations at runtime as if they all are there.
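
As an illustration of applying the width transformations at runtime, the sketch below stores a single value and derives every width it fits in when asked. `fitsSigned` and `expandWidths` are hypothetical names, not go-fuzz functions, and little-endian is chosen arbitrarily for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// fitsSigned reports whether v is representable as a signed integer
// of the given width in bytes (1, 2, 4, or 8).
func fitsSigned(v int64, width int) bool {
	switch width {
	case 1:
		return v >= math.MinInt8 && v <= math.MaxInt8
	case 2:
		return v >= math.MinInt16 && v <= math.MaxInt16
	case 4:
		return v >= math.MinInt32 && v <= math.MaxInt32
	case 8:
		return true
	}
	return false
}

// expandWidths returns the little-endian encoding of v at every width it fits in.
// Only v itself needs to be stored; the variants are generated on demand.
func expandWidths(v int64) [][]byte {
	var out [][]byte
	for _, w := range []int{1, 2, 4, 8} {
		if !fitsSigned(v, w) {
			continue
		}
		buf := make([]byte, 8)
		// Two's complement: truncating the 8-byte form to w bytes is valid
		// because v was checked to fit in w bytes.
		binary.LittleEndian.PutUint64(buf, uint64(v))
		out = append(out, buf[:w])
	}
	return out
}

func main() {
	for _, enc := range expandWidths(42) {
		fmt.Printf("% x\n", enc)
	}
}
```

With this approach the metadata file holds one entry for 42, while the fuzzer still behaves as if the int8, int16, int32 and int64 forms were all present.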

@dvyukov
Owner

dvyukov commented May 21, 2019

This means that if it's a string, but is actually an integer/float, we should encode it as integer/float if we are going to rely on that type during fuzzing in any way.

I don't understand what this means. Can you expand or give an example?

I mean that the set of transformations we apply to a literal should not depend on the spelled type of the literal. So int8(42), int64(42) and "42" should be transformed the same way.
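
One way to read this is as a normalization step that runs before any transformation is chosen. The sketch below is an assumption about how that could look, not existing go-fuzz code: `Literal`, `normalizeString` and `normalizeInt` are hypothetical, and floats are left out for brevity.

```go
package main

import (
	"fmt"
	"strconv"
)

// Literal is a hypothetical canonical form for a collected literal.
// The spelled type (int8, int64, string, ...) is deliberately not recorded.
type Literal struct {
	Str   string
	Int   int64
	IsInt bool
}

// normalizeInt canonicalizes an integer literal of any declared width.
func normalizeInt(n int64) Literal {
	return Literal{Int: n, IsInt: true}
}

// normalizeString canonicalizes a string literal. A string that parses as a
// base-10 integer is treated as that integer; anything else stays a string.
func normalizeString(s string) Literal {
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return Literal{Int: n, IsInt: true}
	}
	return Literal{Str: s}
}

func main() {
	a := normalizeInt(int64(int8(42)))
	b := normalizeInt(int64(42))
	c := normalizeString("42")
	// All three spellings collapse to the same canonical value.
	fmt.Println(a == b, b == c)
}
```

Once every spelling maps to the same canonical value, the set of transformations can be selected from that value alone.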

@dvyukov
Owner

dvyukov commented May 21, 2019

I am not sure if this literal collection is a good idea at all.
The alternative would be to extract constants from comparison operations at runtime. This way (1) we extract only the ones that are actually used (rather than thousands of uninteresting literals that just happen to be in some dependencies, or that are relevant but sit in a part of the program we have not reached yet); and (2) it may simplify integration with some build systems, in some contexts. This .zip artifact is a bit weird; if we had just a binary, it would be a much more normal output of a build system.
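
A rough sketch of the runtime alternative, assuming comparisons are rewritten by instrumentation to go through a recording hook. Everything here (`cmpRecorder`, `equal`, `check`) is hypothetical and only illustrates the idea; it does not describe how go-fuzz instruments code.

```go
package main

import (
	"fmt"
	"sort"
)

// cmpRecorder collects the operands seen at instrumented comparisons.
type cmpRecorder struct {
	seen map[uint64]struct{}
}

func newCmpRecorder() *cmpRecorder {
	return &cmpRecorder{seen: make(map[uint64]struct{})}
}

// equal stands in for an instrumented `a == b`: it records both operands
// and returns the result of the original comparison.
func (r *cmpRecorder) equal(a, b uint64) bool {
	r.seen[a] = struct{}{}
	r.seen[b] = struct{}{}
	return a == b
}

// values returns the recorded operands in ascending order.
func (r *cmpRecorder) values() []uint64 {
	out := make([]uint64, 0, len(r.seen))
	for v := range r.seen {
		out = append(out, v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// check is a toy function under test whose comparison has been instrumented.
func check(r *cmpRecorder, input uint64) bool {
	return r.equal(input, 0xCAFE)
}

func main() {
	r := newCmpRecorder()
	check(r, 7)
	// 0xCAFE is collected only because this comparison actually executed.
	fmt.Println(r.values())
}
```

Constants in code paths that never run are never recorded, which is the filtering effect described in point (1).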
