Auto-generate benchmarks with genthat, and run them #22

Open
wants to merge 82 commits into
base: master

@vogr vogr commented Aug 13, 2021

(Note: this PR requires a patched version of genthat, see PRL-PRG/genthat#162)

This PR makes it possible to extract benchmarks from CRAN packages using genthat, and to automatically generate the necessary fields in rebench.conf (including the number of inner iterations per benchmark).

All of this can be done in a reproducible way using MRAN (pinned to the day 2020-02-28, one day before the release of R 3.6.3 so that all the packages are compatible with R 3.6.2).

The steps necessary to generate and run the benchmarks are:

  1. Install the dependencies, then install the packages from which to extract calls (defined in packages.txt): see RBenchmarking/Setup/genthat/README.md (install_genthat.R, install_pkgs.R, extract_testcases.R). All these steps can be automated using the Docker image built from Setup/genthat/Dockerfile (details in the README). You should then copy the benchmarks to Benchmarks/genthat-CRAN/generated.
    • optionally: check that the results are stable over several iterations (check_against_recorded_retv.sh)
    • optionally: decrease the number of benchmarks by picking a single file per function in each package (pick_one_testcase.sh)
    • optionally (but recommended): compute the number of iterations necessary for each benchmark so that they all run for 200ms with R, as a baseline (with min_nb_iter.R, see Setup/genthat/inner_it/README.md)
  2. Run the benchmarks using rebench. Everything is automated in Setup/run.sh, as for the other benchmarks. The configuration file is generated by genthat_rebenchconf.py from the names of the benchmarks and the number of iterations determined in the previous step.
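
To illustrate the optional iteration-count step above, here is a minimal Python sketch of one way to find how many inner iterations a benchmark needs before a run lasts 200 ms. It is only a stand-in: min_nb_iter.R is an R script and may use a different search strategy. The function name, the doubling search, and the `clock` parameter are all assumptions made for this example.

```python
import time
from typing import Callable


def min_inner_iterations(
    run_once: Callable[[], None],
    target_seconds: float = 0.2,
    clock: Callable[[], float] = time.perf_counter,
    max_iterations: int = 1 << 20,
) -> int:
    """Return the first power-of-two iteration count whose run reaches the target.

    Hypothetical helper: doubles the iteration count until one timed run
    of `iterations` calls takes at least `target_seconds`.
    """
    iterations = 1
    while iterations < max_iterations:
        start = clock()
        for _ in range(iterations):
            run_once()
        if clock() - start >= target_seconds:
            return iterations
        iterations *= 2
    # Give up at the cap so very fast benchmarks cannot loop forever.
    return max_iterations
```

Doubling keeps the number of timed runs logarithmic in the final count, at the cost of overshooting the target by up to a factor of two.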

You do not have to actually run step 1: the generated files are already saved in the repo (as their generation takes a large amount of time, and because it makes sense to "freeze" the benchmarks for reproducibility).
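
Because the benchmark files and their iteration counts are already stored, the rebench.conf entries for step 2 can be derived mechanically. The sketch below shows the general idea only; it is not the code of genthat_rebenchconf.py. The function name is hypothetical, and passing the count through an `extra_args` key per benchmark is an assumption about the layout, so check the actual script and the ReBench documentation before relying on it.

```python
import json
from typing import Mapping


def rebench_benchmark_entries(
    iterations_by_benchmark: Mapping[str, int], indent: str = "      "
) -> str:
    """Render one YAML list item per benchmark (assumed layout, see lead-in)."""
    lines = []
    for name in sorted(iterations_by_benchmark):
        count = iterations_by_benchmark[name]
        # A JSON double-quoted string is also a valid YAML scalar, so names
        # containing characters such as "<" or ":" are kept intact.
        lines.append(f"{indent}- {json.dumps(name)}:")
        lines.append(f"{indent}    extra_args: {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up benchmark names and counts, for illustration only.
    print(rebench_benchmark_entries({"a<-b": 3, "mean": 10}, indent=""))
```

Quoting every name avoids special-casing benchmarks whose names are not plain YAML scalars.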

To run step 2, the Docker image used for rebench needs some changes (adapted from container/benchmark/Dockerfile in https://github.com/reactorlabs/rir):

ARG CI_COMMIT_SHA
FROM registry.gitlab.com/rirvm/rir_mirror:$CI_COMMIT_SHA
ENV R_LIBS="/opt/r_library"
ENV PATH="$PATH:/opt/rir/external/custom-r/bin"
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y -qq python3-pip sudo && \
    apt-get clean && rm -rf /var/cache/apt/lists && \
    git clone --depth 1 https://github.com/smarr/ReBench.git /opt/ReBench && cd /opt/ReBench && pip3 install . && \
    mv /usr/local/bin/rebench-denoise /usr/local/bin/rebench-denoise.bkp && cp /usr/bin/false /usr/local/bin/rebench-denoise
RUN git clone --depth 10 https://github.com/vogr/RBenchmarking.git /opt/rbenchmarking && cd /opt/rbenchmarking && git checkout 12573c102bac99b644ea89ec3d59acde129d7b37
RUN /opt/rbenchmarking/Setup/genthat/install_pkgs.R /opt/rbenchmarking/Setup/genthat/packages.txt /opt/r_library

The last two lines are the ones that changed: they use the modified RBenchmarking branch and install the R packages necessary to run the benchmarks. R_LIBS is also set accordingly (the default folders could be used instead, but I wanted to prevent collisions with other R packages that may be installed).

To actually run the benchmarks:

# update CI_COMMIT_SHA to match the rir version you want to use
$ docker build -t rir-rebench --build-arg CI_COMMIT_SHA=b3e7e854cc78fa42b6b1748effcd0586e00b9881 .

# run a transient container
$ docker run --rm -it rir-rebench bash

# run only the genthat benchmarks, with Rsh, and don't do reporting
$ /opt/rbenchmarking/Setup/run.sh /opt/rbenchmarking/rebench.conf /opt/rbenchmarking/Benchmarks /opt/rir/build/release "e:PIR-LLVM s:genthat-CRAN -R"

The next step would be to decide which packages to extract calls from (currently only 8 packages were chosen, at random), and to select a relevant subset of the generated files (currently, one file has been kept per function, for a total of 51 files).
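
As a rough illustration of the "one file per function" selection mentioned above, the following Python sketch keeps the first test case found for each function. It is not pick_one_testcase.sh. The function name is hypothetical, and the directory layout `<root>/<package>/<function>/<file>.R` is an assumption about how the generated test cases are stored.

```python
from pathlib import Path
from typing import Dict, Tuple


def pick_one_per_function(generated_root: Path) -> Dict[Tuple[str, str], Path]:
    """Map (package, function) to a single test file under generated_root.

    Assumes the layout <root>/<package>/<function>/<file>.R; files are
    visited in sorted order and the first one per function is kept.
    """
    chosen: Dict[Tuple[str, str], Path] = {}
    for test_file in sorted(generated_root.glob("*/*/*.R")):
        package, function = test_file.parts[-3], test_file.parts[-2]
        # setdefault keeps the entry already stored, i.e. the first file seen.
        chosen.setdefault((package, function), test_file)
    return chosen
```

Picking the first file in sorted order is arbitrary but reproducible; a more deliberate criterion (for example run time or argument size) could replace it when choosing the final subset.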

Note: if the PR PRL-PRG/genthat#162 gets merged into master, the script install_genthat.R should be updated to install genthat from the master branch instead of the only-calls branch.

vogr added 30 commits August 3, 2021 14:45
vogr added 29 commits August 10, 2021 10:18
This prevents problems with benchmarks named a<-b.R, for instance.
the genthat-CRAN directory (else they would be detected by the
configuration scripts).

o- commented Aug 16, 2021

very cool. thanks a lot @vogr. do you know how long it takes to run the full thing?
