
Fix possible UB in reader (#186) #203

Open · wants to merge 1 commit into master
Conversation

gerritsangel

This fixes the possible undefined behaviour in Reader::from_reader() (see #186).

This causes a small runtime overhead due to the explicit initialization, but as far as I understand, this method is only called explicitly by the user, so it may not matter much.
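For illustration, here is a minimal sketch of the two patterns being discussed. The function name `read_all` and its signature are hypothetical stand-ins, not the actual `Reader::from_reader()` API; the commented-out lines show the UB-prone shape described in #186, and the function body shows the zero-initializing fix proposed here.

```rust
use std::io::Read;

// UB-prone pattern (roughly what #186 describes): uninitialized memory is
// exposed as initialized `u8` before the reader has written into it.
//
//   let mut buf = Vec::with_capacity(len);
//   unsafe { buf.set_len(len); }     // reading these bytes is UB
//   reader.read_exact(&mut buf)?;
//
// The fix zero-initializes the buffer first, at the cost of one memset:
fn read_all<R: Read>(mut reader: R, len: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len]; // explicit zero-initialization
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

The memset is the "small runtime overhead" mentioned above: it touches every byte of the buffer once before the actual read does.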

snproj (Collaborator) commented Nov 9, 2022

Performance hit

Hi! Could I ask if you've run any tests to see what the performance hit is like?

I've run some quick tests (code further below) and it seems like:

  • running in debug mode gives a 67% increase in time taken
  • bench gives about a 90% increase in time taken
  • release build gives around a 90% increase in time taken

With different opt-levels:

  • 0: 80%
  • 1: 90%
  • 2: 93% (consistently higher than 1)
  • 3: back to 90%
Test Code

Jammed rather haphazardly into perftest; alt_from_reader() is the proposed new version:

{
    const ITER_COUNT: u128 = 100000;

    for _ in 0..100 {
        let start = Instant::now();
        for _ in 0..ITER_COUNT {
            let _reader = Reader::from_reader(&mut &*data, data.len()).unwrap();
        }
        let time_taken = (Instant::now() - start).as_nanos() / ITER_COUNT;

        let start2 = Instant::now();
        for _ in 0..ITER_COUNT {
            let _reader2 = Reader::alt_from_reader(&mut &*data, data.len()).unwrap();
        }
        // Measure the second loop from start2, not start, so the first
        // loop's time is not counted twice.
        let time_taken2 = (Instant::now() - start2).as_nanos() / ITER_COUNT;

        println!(
            "{} vs {}: the new method takes {}% more time",
            time_taken,
            time_taken2,
            ((time_taken2 as f64 / time_taken as f64) - 1.0) * 100.0
        );
    }
}

Will the performance hit matter?

It seems that the purpose of the function is to create the buffer from which we wish to read. Hence, the time spent zero-initializing scales with the total amount of data being read, regardless of how many times the function is called (whether streaming with many calls on small chunks, or a single call on one large piece of data). Depending on the user's particular use case, I worry that such a performance hit might indeed be impactful. (Do correct me if I'm wrong, though.)

Possible resolutions

I've discussed within my org and we're thinking of a few possible ways to proceed:

What do you / anyone else think?
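One direction that sometimes comes up for this kind of trade-off is to avoid both the UB and the memset by letting the standard library grow the buffer as it reads. This is only a hedged sketch of that idea, not something proposed in this PR, and `read_all_no_zeroing` is a hypothetical name:

```rust
use std::io::Read;

// `take(len)` bounds the read and `read_to_end` appends into the Vec,
// so only bytes the reader actually wrote are ever exposed. No unsafe
// code and no up-front zeroing of the full buffer.
fn read_all_no_zeroing<R: Read>(reader: R, len: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len);
    reader.take(len as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Whether this actually beats the zero-initializing version would need benchmarking, since `read_to_end` has its own bookkeeping costs.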
