
<p><a href="https://doi.org/10.5281/zenodo.7342082"><img
src="https://zenodo.org/badge/DOI/10.5281/zenodo.7342082.svg"
alt="DOI" /></a></p>
<h1 id="signatr-artifact">Signatr Artifact</h1>
<p><strong>We also provide PDF and HTML versions of this README. If you
are reading locally and not on <a
href="https://github.com/PRL-PRG/sle22-signatr-artifact">GitHub</a>, we
advise using the HTML version.</strong></p>
<p>The artifact contains the <code>signatr</code> tool, and the
pipelines to create an R value database and to fuzz R functions with the
database to find type signatures. The pipeline to create a value
database is in <code>pipeline-dbgen</code>. The fuzzing pipeline will
generate the inputs for the <code>sle.Rmd</code> R markdown notebook.
That notebook can then be rendered to get all the results (tables,
figures) we use in the paper.</p>
<p>To use the artifact:</p>
<ol type="1">
<li>Install the docker image (see <a
href="#install-the-docker-image">Install the docker image</a>).
Installing locally is possible but involved. Following the steps
described in the <code>docker-image/Dockerfile</code> should help if
this is the hard path you are choosing!</li>
<li>Experiment with the tool on a small example: see <a
href="#experimenting-with-the-tool">Experimenting with the tool</a></li>
<li>Reproduce the analysis pipeline: see <a
href="#the-analysis-pipeline">The analysis pipeline</a></li>
</ol>
<p>The tool is packaged as an R library. It is hosted at <a
href="https://github.com/PRL-PRG/signatr">https://github.com/PRL-PRG/signatr</a>.</p>
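<p>The artifact is also available directly on GitHub: <a
href="https://github.com/PRL-PRG/sle22-signatr-artifact">https://github.com/PRL-PRG/sle22-signatr-artifact</a></p>
<p>You can get it by entering the following command in a shell. It clones
over SSH, which requires an SSH key registered with GitHub; without one,
clone <code>https://github.com/PRL-PRG/sle22-signatr-artifact.git</code>
instead:</p>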
<div class="sourceCode" id="cb1"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> git clone git@github.com:PRL-PRG/sle22-signatr-artifact.git</span></code></pre></div>
<h2 id="install-the-docker-image">Install the docker image</h2>
<p>Go to the artifact’s folder:</p>
<div class="sourceCode" id="cb2"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> cd sle22-signatr-artifact</span></code></pre></div>
<p>To install the docker image, you can:</p>
<ul>
<li>pull the docker image with
<code>docker pull prlprg/sle22-signatr</code>, or</li>
<li>build the docker image (it takes time!):</li>
</ul>
<div class="sourceCode" id="cb3"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> cd docker-image</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> make</span></code></pre></div>
<p>After installing the docker image, <strong>make sure</strong> to run
all the following commands in a shell inside the docker image (for
Linux, macOS) from the artifact directory.</p>
<p>To start the docker image, go back to the root directory of the
artifact (<code>sle22-signatr-artifact/</code>) and enter in a
shell:</p>
<div class="sourceCode" id="cb4"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./enter.sh</span></span></code></pre></div>
<p>which should give you a bash shell prompt, like (modulo the
hostname):</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ex">r@eaf63037fd02:/work$</span></span></code></pre></div>
<p>It automatically mounts the content of the folder from which you run
the command into the <code>/work</code> directory in the container.</p>
<p>If you see an output like:</p>
<pre><code>Starting Xvfb...
There is something wrong with the Xvfb server.</code></pre>
<p>try running it with the <code>NO_X11</code> environment variable set:</p>
<pre><code>NO_X11=1 ./enter.sh</code></pre>
<p>If it still does not work, we also provide a shorter Docker invocation
script, <code>./enter2.sh</code>. That one does not set up permissions,
so you will have to use <code>sudo</code> for Step 6 in <a
href="#the-analysis-pipeline">The analysis pipeline</a> section.</p>
<h2 id="experimenting-with-the-tool">Experimenting with the tool</h2>
<p>Run the R interpreter <em>inside the docker image</em>. It will start
the patched R interpreter. The tool <em>does not run</em> in the
standard R interpreter.</p>
<p>The following screencast shows all the commands executed:</p>
<p><a
href="https://asciinema.org/a/YxDDCvg4SUeEzzUKhfKcDLrCO?idleTimeLimit=1"><img
src="https://asciinema.org/a/YxDDCvg4SUeEzzUKhfKcDLrCO.svg"
alt="asciicast" /></a></p>
<p>In the following listings, <code>$</code> indicates the shell and
<code>&gt;</code> denotes the R REPL.</p>
<div class="sourceCode" id="cb8"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> R</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="ex">R</span> version 4.0.2 <span class="er">(</span><span class="ex">2020-06-22</span><span class="kw">)</span> <span class="ex">--</span> <span class="st">&quot;Taking Off again&quot;</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="ex">...</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="op">&gt;</span> library<span class="kw">(</span><span class="ex">signatr</span><span class="kw">)</span></span></code></pre></div>
<p>All following commands and instructions should be run in the docker
container.</p>
<h3 id="database">Database</h3>
<p>To generate a database of values, we need some code to run. One way
is to extract it from an existing R package, for example
<code>stringr</code>, which provides string-manipulation functions,
including regular-expression matching:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> <span class="fu">extract_package_code</span>(<span class="st">&quot;stringr&quot;</span>, <span class="at">output_dir =</span> <span class="st">&quot;demo&quot;</span>)</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>...</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="dv">7</span> examples<span class="sc">/</span>str_detect.Rd.R examples</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>...</span></code></pre></div>
<p>This will extract all the runnable snippets from the package
documentation and tests into the given directory. For example:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> <span class="fu">cat</span>(<span class="fu">readLines</span>(<span class="st">&quot;demo/examples/str_detect.Rd.R&quot;</span>, <span class="at">n =</span> <span class="dv">15</span>), <span class="at">sep =</span> <span class="st">&quot;</span><span class="sc">\n</span><span class="st">&quot;</span>)</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>...</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>fruit <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="st">&quot;apple&quot;</span>, <span class="st">&quot;banana&quot;</span>, <span class="st">&quot;pear&quot;</span>, <span class="st">&quot;pinapple&quot;</span>)</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="fu">str_detect</span>(fruit, <span class="st">&quot;a&quot;</span>)</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="fu">str_detect</span>(fruit, <span class="st">&quot;^a&quot;</span>)</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a>...</span></code></pre></div>
<p>Next, we trace the file by running it (in the patched R interpreter)
and recording all the calls, using the <code>trace_file</code>
function:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> <span class="fu">trace_file</span>(<span class="st">&quot;demo/examples/str_detect.Rd.R&quot;</span>, <span class="at">db_path =</span> <span class="st">&quot;demo.sxpdb&quot;</span>)</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>        status time                          file    db_path db_size error</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>elapsed      <span class="dv">0</span> <span class="fl">0.04</span> demo<span class="sc">/</span>examples<span class="sc">/</span>str_detect.Rd.R demo.sxpdb      <span class="dv">20</span>    <span class="cn">NA</span></span></code></pre></div>
<p>Database generation is also automated in the
<code>pipeline-dbgen</code> directory of the artifact, where it handles
tracing multiple files and merging the results. See <a
href="#generate-the-database">Generate the database</a> for more
details.</p>
<h3 id="fuzzing">Fuzzing</h3>
<p>Once the database is ready, we can start fuzzing the
<code>str_detect</code> function of the <code>stringr</code>
package:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> fuzz_results <span class="ot">&lt;-</span> <span class="fu">quick_fuzz</span>(<span class="st">&quot;stringr&quot;</span>, <span class="st">&quot;str_detect&quot;</span>, <span class="st">&quot;demo.sxpdb&quot;</span>, <span class="at">budget =</span> <span class="dv">1000</span>, <span class="at">action =</span> <span class="st">&quot;infer&quot;</span>)</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>    started a new runner<span class="sc">:</span>PROCESS <span class="st">&#39;R&#39;</span>, running, pid <span class="dv">4157</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>    fuzzing stringr<span class="sc">:::</span>str_detect [<span class="sc">==</span><span class="er">====</span>] <span class="dv">100</span><span class="sc">/</span><span class="dv">100</span> (<span class="dv">100</span>%) 39s</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>    stopped runner<span class="sc">:</span>PROCESS <span class="st">&#39;R&#39;</span>, running, pid <span class="dv">4157</span></span></code></pre></div>
<p>The <code>infer</code> action will infer types for each call argument
and return value using the type annotation language described in <a
href="https://dl.acm.org/doi/abs/10.1145/3428249">Designing types for R,
empirically</a>. It returns an R data frame with the inferred call
signature in the <code>result</code> column:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> fuzz_results</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="co"># A tibble: 1,000 × 7</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>   args_idx  error                         exit status dispatch     result ts</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>   <span class="sc">&lt;</span>list<span class="sc">&gt;</span>    <span class="er">&lt;</span>chr<span class="sc">&gt;</span>                        <span class="er">&lt;</span>int<span class="sc">&gt;</span>  <span class="er">&lt;</span>int<span class="sc">&gt;</span> <span class="er">&lt;</span>list<span class="sc">&gt;</span>       <span class="er">&lt;</span>chr<span class="sc">&gt;</span>  <span class="er">&lt;</span>drt<span class="sc">&gt;</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="st">&quot;Error in UseMethod(</span><span class="sc">\&quot;</span><span class="st">type\…    NA      1 &lt;named list&gt; NA     0.04…</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a><span class="st"> 2 &lt;int [3]&gt; &quot;</span>Error <span class="cf">in</span> stri_detect_regex…    <span class="cn">NA</span>      <span class="dv">1</span> <span class="sc">&lt;</span>named list<span class="sc">&gt;</span> <span class="cn">NA</span>     <span class="fl">0.04</span>…</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span>  <span class="cn">NA</span>                             <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list<span class="sc">&gt;</span> (logi… <span class="fl">0.04</span>…</span></code></pre></div>
<p>If you are repeating these steps, it is possible that your results
will be different since fuzzing is non-deterministic.</p>
<p>The listing shows three calls: two failed ones (non-zero status) with
an error message, and a successful one with an inferred signature.</p>
<p>You can find all the successful calls for your run of the fuzzer:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> dplyr<span class="sc">::</span><span class="fu">filter</span>(fuzz_results, status <span class="sc">==</span> <span class="dv">0</span>)</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="co"># A tibble: 112 × 7</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>   args_idx  error  exit status dispatch         result                    ts</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>   <span class="sc">&lt;</span>list<span class="sc">&gt;</span>    <span class="er">&lt;</span>chr<span class="sc">&gt;</span> <span class="er">&lt;</span>int<span class="sc">&gt;</span>  <span class="er">&lt;</span>int<span class="sc">&gt;</span> <span class="er">&lt;</span>list<span class="sc">&gt;</span>           <span class="er">&lt;</span>chr<span class="sc">&gt;</span>                     <span class="er">&lt;</span>drt<span class="sc">&gt;</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (logical, character, log… <span class="fl">0.04</span>…</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (character, character, l… <span class="fl">0.04</span>…</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (character, character, d… <span class="fl">0.04</span>…</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (logical, character, log… <span class="fl">0.04</span>…</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (logical[], character, l… <span class="fl">0.04</span>…</span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (character, character, l… <span class="fl">0.04</span>…</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="dv">7</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (character, character, d… <span class="fl">0.04</span>…</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> <span class="dv">8</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (logical[], character, d… <span class="fl">0.04</span>…</span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> <span class="dv">9</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (logical[], character, d… <span class="fl">0.04</span>…</span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a><span class="dv">10</span> <span class="sc">&lt;</span>int [<span class="dv">3</span>]<span class="sc">&gt;</span> <span class="cn">NA</span>       <span class="cn">NA</span>      <span class="dv">0</span> <span class="sc">&lt;</span>named list [<span class="dv">3</span>]<span class="sc">&gt;</span> (logical[], character, d… <span class="fl">0.04</span>…</span></code></pre></div>
<p>The <code>args_idx</code> column contains the database indices of the
argument values. The actual values can be obtained by looking up these
indices in the database:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> <span class="fu">library</span>(sxpdb)</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> db <span class="ot">&lt;-</span> <span class="fu">open_db</span>(<span class="st">&quot;demo.sxpdb&quot;</span>)</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> <span class="fu">get_value_idx</span>(db, <span class="dv">0</span>) <span class="co"># value at index 0</span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a>[<span class="dv">1</span>] <span class="st">&quot;a&quot;</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> <span class="fu">close</span>(db)</span></code></pre></div>
<p>One advantage of using R is that we can use R’s many data analysis
functions. For example, we can look at the resulting signatures:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="sc">&gt;</span> dplyr<span class="sc">::</span><span class="fu">count</span>(fuzz_results, result)</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="co"># A tibble: 20 × 2</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>   result                                             n</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>   <span class="sc">&lt;</span>chr<span class="sc">&gt;</span>                                          <span class="er">&lt;</span>int<span class="sc">&gt;</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="cn">NA</span>                                               <span class="dv">888</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> (character, character, logical) <span class="sc">=&gt;</span> logical        <span class="dv">28</span></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> (character, character, double) <span class="sc">=&gt;</span> logical         <span class="dv">21</span></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> (character, character, logical[]) <span class="sc">=&gt;</span> logical[]    <span class="dv">10</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> (logical, character, logical) <span class="sc">=&gt;</span> logical           <span class="dv">7</span></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> (logical[], character, logical) <span class="sc">=&gt;</span> logical[]       <span class="dv">7</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> <span class="dv">7</span> (null, character, logical) <span class="sc">=&gt;</span> logical[]            <span class="dv">7</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> <span class="dv">8</span> (logical, character, double) <span class="sc">=&gt;</span> logical            <span class="dv">5</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> <span class="dv">9</span> (logical[], character, double) <span class="sc">=&gt;</span> logical[]        <span class="dv">5</span></span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a><span class="dv">10</span> (character[], character, logical) <span class="sc">=&gt;</span> logical[]     <span class="dv">4</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a><span class="dv">11</span> (character[], character, double) <span class="sc">=&gt;</span> logical[]      <span class="dv">3</span></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a><span class="dv">12</span> (double, character, logical) <span class="sc">=&gt;</span> logical            <span class="dv">3</span></span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a><span class="dv">13</span> (null, character, logical[]) <span class="sc">=&gt;</span> logical[]          <span class="dv">3</span></span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a><span class="dv">14</span> (character, character[], logical) <span class="sc">=&gt;</span> logical[]     <span class="dv">2</span></span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a><span class="dv">15</span> (null, character[], logical) <span class="sc">=&gt;</span> logical[]          <span class="dv">2</span></span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a><span class="dv">16</span> (character, character[], double) <span class="sc">=&gt;</span> logical[]      <span class="dv">1</span></span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a><span class="dv">17</span> (double, character, double) <span class="sc">=&gt;</span> logical             <span class="dv">1</span></span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a><span class="dv">18</span> (double, character, logical[]) <span class="sc">=&gt;</span> logical[]        <span class="dv">1</span></span>
<span id="cb16-23"><a href="#cb16-23" aria-hidden="true" tabindex="-1"></a><span class="dv">19</span> (logical, character[], logical) <span class="sc">=&gt;</span> logical[]       <span class="dv">1</span></span>
<span id="cb16-24"><a href="#cb16-24" aria-hidden="true" tabindex="-1"></a><span class="dv">20</span> (logical[], character, logical[]) <span class="sc">=&gt;</span> logical[]     <span class="dv">1</span></span></code></pre></div>
<p>This shows that 112 of the 1,000 calls were successful, yielding 19
distinct signatures. The remaining 888 calls failed and have no inferred
signature (<code>NA</code>).</p>
<h2 id="the-analysis-pipeline">The analysis pipeline</h2>
<p>The following tutorial demonstrates how to run the analysis pipeline
to reproduce the results of the paper. It consists of a series of steps
that, at the end, generate the input for the analysis.</p>
<p>In this write-up, we will run it on a small subset of the original
packages (cf. <code>data/packages.txt</code>). The reason is that the
size of the data required is fairly large. For example, the value
database alone is over 287GB and its generation takes over half a day
(on a 72-core Intel Xeon 6140 2.30GHz server). One would also have to
download and install all the packages and their dependencies, which again
takes space and time. If you are nevertheless interested and have the
computational resources, we will be happy to share the data; please
contact the AEC chair.</p>
<p>There is also a screencast for this part of the artifact. However,
due to size limitations, it is not possible to share it directly on <a
href="https://asciinema.org/">asciinema.org</a>. Instead, it is stored in
compressed form in the <code>assets</code> directory. To replay it
locally (assuming you have installed the <code>asciinema</code> tool),
do the following:</p>
<div class="sourceCode" id="cb17"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> assets</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a><span class="fu">unxz</span> screencast-pipeline.asciinema.xz</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="ex">asciinema</span> play <span class="at">-i</span> 1 <span class="at">-s</span> 10 screencast-pipeline.asciinema</span></code></pre></div>
<p>That will play it at 10x the actual speed, limiting the idle time to 1
second.</p>
<hr />
<p><strong>Note</strong>:</p>
<ul>
<li>You will be running code downloaded from a public repository.
Although CRAN is a curated repository, this should be done with caution.
Run it inside the container.</li>
<li>Most steps take a few minutes at most; long-running ones are
indicated with an estimate.</li>
</ul>
<h3 id="steps">Steps</h3>
<p>The following is essentially what is shown in Figure 1 and Figure 2 of
the paper, packaged in scripts for simpler use, with GNU Parallel for
parallel execution. All steps should be run inside a docker container.
As a reminder, to enter the container, run:</p>
<div class="sourceCode" id="cb18"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./enter.sh</span></span></code></pre></div>
<p>Anytime you want to kill a task, it is good to exit the container and
enter it again so all the child processes are properly killed.</p>
<h3 id="get-the-sample-sxpdb-database">0. get the sample sxpdb
database</h3>
<p>For the experiment we need a value database (sxpdb database) that
will be used for the fuzzing. You can either <a
href="#building-it-yourself">build one yourself</a>, or <a
href="https://owncloud.cesnet.cz/index.php/s/aHprMbas4haELVf">download</a>
one we have prepared using the same steps.</p>
<p>To get the prebuilt database, do the following:</p>
<div class="sourceCode" id="cb19"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> data</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a><span class="fu">wget</span> <span class="at">-O</span> cran_db.tar.xz https://owncloud.cesnet.cz/index.php/s/aHprMbas4haELVf/download</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a><span class="fu">tar</span> xvJf cran_db.tar.xz</span></code></pre></div>
<p>The extracted database takes about 10GB.</p>
<h4 id="building-it-yourself">Building it yourself</h4>
<p>The database generation uses <a
href="https://docs.ropensci.org/targets/">targets</a> to orchestrate the
pipeline.</p>
<p>The database for the SLE paper is obtained by tracing 400 packages
from <code>data/packages-typer-400.txt</code>. The packages to be traced
have to be specified in <code>data/packages.txt</code>, which contains a
new-line separated list of packages to include in the corpus.</p>
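<p>As a sketch of that file format, the following writes a two-package
list for a smaller custom corpus. The choice of <code>stringr</code> and
<code>dplyr</code> is only illustrative, and the command overwrites any
existing <code>data/packages.txt</code>, so it assumes you are happy to
replace the current list:</p>
<pre><code>mkdir -p data
# One package name per line; this overwrites any existing list.
printf '%s\n' stringr dplyr | tee data/packages.txt
# Count the entries that will be traced.
wc -l data/packages.txt</code></pre>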
<p>To start tracing, copy the package list into place, then run the
pipeline from a shell with an adequate number of parallel workers (64
in the example below):</p>
<div class="sourceCode" id="cb20"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="fu">cp</span> data/packages-typer-400.txt data/packages.txt</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> pipeline-dbgen</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a><span class="ex">R</span> <span class="at">-e</span> <span class="st">&#39;targets::tar_make_future(workers = 64)&#39;</span></span></code></pre></div>
<p>The extracted code of the packages will be located in
<code>output/extracted-code</code>. The resulting database will be
generated as <code>output/sxpdb/cran_db</code>. You should move it to
<code>data</code> to follow the next steps. Depending on your machine,
the generation of the database for the 400 packages can take from a few
hours to a few days.</p>
<p>We provide other variants of <code>packages.txt</code>. For instance,
<code>packages-4.txt</code> includes 2 huge and common R packages,
<code>dplyr</code> and <code>ggplot2</code>.</p>
<h3 id="create-a-corpus">1. create a corpus</h3>
<p>The corpus consists of the following:</p>
<ul>
<li>R package sources in <code>data/sources</code></li>
<li>installed R packages <code>data/library</code></li>
<li>extracted code from R packages <code>data/extracted-code</code></li>
<li>corpus metadata file <code>data/corpus.csv</code></li>
</ul>
<p>This is bootstrapped using the <code>data/packages.txt</code>
file.</p>
<p>To create a corpus, run the following:</p>
<div class="sourceCode" id="cb21"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./create-corpus.R</span></span></code></pre></div>
<p>Depending on the number of packages (and their transitive
dependencies), this might take a while. For the sample of 5 packages
(a small corpus, though made up of very popular packages), it takes about
20 minutes.</p>
<p>Some dependencies might fail to install.</p>
<p>The result should be something like:</p>
<pre><code>data/extracted-code  &lt;--- extracted code from R packages
data/library         &lt;--- installed R packages
data/sources         &lt;--- R package sources
data/corpus.csv      &lt;--- corpus metadata</code></pre>
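<p>A quick way to confirm that the corpus is complete before fuzzing
could look like the sketch below. The <code>check_corpus</code> function
is a hypothetical helper, not part of the artifact; it only tests that the
four paths listed above exist.</p>
<div class="sourceCode"><pre
class="sourceCode sh"><code class="sourceCode bash">#!/bin/sh
# Hypothetical helper: report which corpus paths are missing under the
# given data directory; returns non-zero if anything is missing.
check_corpus() {
  dir="${1:-data}"
  missing=0
  for d in extracted-code library sources; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing directory: $dir/$d"
      missing=1
    fi
  done
  if [ ! -f "$dir/corpus.csv" ]; then
    echo "missing file: $dir/corpus.csv"
    missing=1
  fi
  return "$missing"
}

check_corpus data || echo "corpus incomplete, re-run ./create-corpus.R"</code></pre></div>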
<h3 id="fuzz-the-installed-functions">2. fuzz the installed
functions</h3>
<p>Next, we will run the fuzzer using the values from the sample
database:</p>
<div class="sourceCode" id="cb23"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./run-fuzz.sh</span></span></code></pre></div>
<p>By default this will sample 100 functions from
<code>corpus.csv</code> and fuzz each 100 times. Both can be adjusted by
setting the <code>FUNS</code> and <code>BUDGET</code> environment
variables. Using all the functions
(e.g. <code>FUNS=$(wc -l &lt; data/corpus.csv)</code>) and 5000 runs
(e.g. <code>BUDGET=5000</code>), the experiment might take about a day.
That is why we recommend scaling it down so it runs within 30 minutes.
By default, it will run 16 jobs in parallel. This can be changed using
the <code>JOBS</code> environment variable.</p>
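<p>For instance, a scaled-down run could be configured as in the sketch
below. The values 20, 50 and 4 are purely illustrative, and
<code>total_calls</code> is a hypothetical helper that just multiplies the
two settings to show how many fuzzed calls are being requested.</p>
<div class="sourceCode"><pre
class="sourceCode sh"><code class="sourceCode bash">#!/bin/sh
# Illustrative values only; the defaults described above are 100 and 100.
FUNS="${FUNS:-20}"       # functions sampled from corpus.csv
BUDGET="${BUDGET:-50}"   # fuzzing runs per function
JOBS="${JOBS:-4}"        # parallel jobs

# Hypothetical helper: number of fuzzed calls requested (functions x runs).
total_calls() {
  echo $(( $1 * $2 ))
}

echo "requesting $(total_calls "$FUNS" "$BUDGET") calls with $JOBS parallel jobs"

# Launch only when run-fuzz.sh is present in the current directory.
if [ -x ./run-fuzz.sh ]; then
  FUNS="$FUNS" BUDGET="$BUDGET" JOBS="$JOBS" ./run-fuzz.sh
fi</code></pre></div>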
<p>The result will be:</p>
<pre><code>data/fuzz            &lt;--- directory with the fuzzer output
data/run-fuzz.csv    &lt;--- metadata about the run, duration, exitcodes, ...</code></pre>
<p>You can view the intermediate results using the
<code>qcat.R</code> utility. For example:</p>
<div class="sourceCode" id="cb25"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./qcat.R</span> <span class="st">&#39;data/fuzz/dplyr::arg_name&#39;</span></span></code></pre></div>
<p>should show the results for the function <code>arg_name</code> from
the <code>dplyr</code> package:</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="co"># A tibble: 100 × 9</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a>    args_idx  error        exit status dispatch     result ts    fun_n…¹ rdb_p…²</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a>    <span class="sc">&lt;</span>list<span class="sc">&gt;</span>    <span class="er">&lt;</span>chr<span class="sc">&gt;</span>       <span class="er">&lt;</span>int<span class="sc">&gt;</span>  <span class="er">&lt;</span>int<span class="sc">&gt;</span> <span class="er">&lt;</span>list<span class="sc">&gt;</span>        <span class="er">&lt;</span>int<span class="sc">&gt;</span> <span class="er">&lt;</span>drt<span class="sc">&gt;</span> <span class="er">&lt;</span>chr<span class="sc">&gt;</span>   <span class="er">&lt;</span>chr<span class="sc">&gt;</span></span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a>  <span class="dv">1</span> <span class="sc">&lt;</span>int [<span class="dv">2</span>]<span class="sc">&gt;</span> <span class="st">&quot;Error in …    NA      1 &lt;named list&gt;     NA 0.08… dplyr:… ../rdb…</span></span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a><span class="st">  2 &lt;int [2]&gt; &quot;</span>Error <span class="cf">in</span> …    <span class="cn">NA</span>      <span class="dv">1</span> <span class="sc">&lt;</span>named list<span class="sc">&gt;</span>     <span class="cn">NA</span> <span class="fl">0.11</span>… dplyr<span class="sc">:</span>… ..<span class="sc">/</span>rdb…</span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a>  <span class="dv">3</span> <span class="sc">&lt;</span>int [<span class="dv">2</span>]<span class="sc">&gt;</span> <span class="st">&quot;Error in …    NA      1 &lt;named list&gt;     NA 0.14… dplyr:… ../rdb…</span></span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a><span class="st">  4 &lt;int [2]&gt; &quot;</span>Error <span class="cf">in</span> …    <span class="cn">NA</span>      <span class="dv">1</span> <span class="sc">&lt;</span>named list<span class="sc">&gt;</span>     <span class="cn">NA</span> <span class="fl">0.15</span>… dplyr<span class="sc">:</span>… ..<span class="sc">/</span>rdb…</span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a>  <span class="dv">5</span> <span class="sc">&lt;</span>int [<span class="dv">2</span>]<span class="sc">&gt;</span> <span class="st">&quot;Error in …    NA      1 &lt;named list&gt;     NA 0.09… dplyr:… ../rdb…</span></span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a><span class="st">  6 &lt;int [2]&gt; &quot;</span>Error <span class="cf">in</span> …    <span class="cn">NA</span>      <span class="dv">1</span> <span class="sc">&lt;</span>named list<span class="sc">&gt;</span>     <span class="cn">NA</span> <span class="fl">0.53</span>… dplyr<span class="sc">:</span>… ..<span class="sc">/</span>rdb…</span>
<span id="cb26-10"><a href="#cb26-10" aria-hidden="true" tabindex="-1"></a>  <span class="dv">7</span> <span class="sc">&lt;</span>int [<span class="dv">2</span>]<span class="sc">&gt;</span> <span class="st">&quot;Error in …    NA      1 &lt;named list&gt;     NA 0.11… dplyr:… ../rdb…</span></span>
<span id="cb26-11"><a href="#cb26-11" aria-hidden="true" tabindex="-1"></a><span class="st">  8 &lt;int [2]&gt;  NA            NA      0 &lt;named list&gt;     30 0.09… dplyr:… ../rdb…</span></span>
<span id="cb26-12"><a href="#cb26-12" aria-hidden="true" tabindex="-1"></a><span class="st">  9 &lt;int [2]&gt;  NA            NA      0 &lt;named list&gt;     31 0.09… dplyr:… ../rdb…</span></span>
<span id="cb26-13"><a href="#cb26-13" aria-hidden="true" tabindex="-1"></a><span class="st"> 10 &lt;int [2]&gt;  NA            NA      0 &lt;named list&gt;     32 0.09… dplyr:… ../rdb…</span></span>
<span id="cb26-14"><a href="#cb26-14" aria-hidden="true" tabindex="-1"></a><span class="st">...</span></span></code></pre></div>
<p>Among the ten rows shown, 7 calls failed and 3 succeeded. Please note
that due to random sampling your results will likely be different. It is
also possible that there will not be any
<code>data/fuzz/dplyr::arg_name</code> file, as the functions are
selected randomly.</p>
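<p>Since the sampled functions differ from run to run, it can help to
list which functions of a package were actually fuzzed before choosing one
to inspect. The sketch below assumes the output files are named
<code>package::function</code> as in the example above;
<code>list_fuzzed</code> is a hypothetical helper, not part of the
artifact.</p>
<div class="sourceCode"><pre
class="sourceCode sh"><code class="sourceCode bash">#!/bin/sh
# Hypothetical helper: print the fuzzer output files for one package.
# Assumes files are named "package::function" inside the output directory.
list_fuzzed() {
  pkg="$1"
  dir="${2:-data/fuzz}"
  for f in "$dir/$pkg"::*; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] || continue
    basename "$f"
  done
}

list_fuzzed dplyr</code></pre></div>
<p>Any of the printed names can then be passed to <code>qcat.R</code>
with the <code>data/fuzz/</code> prefix.</p>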
<h3 id="type-the-results">3. type the results</h3>
<p>To type the traces, run the following:</p>
<div class="sourceCode" id="cb27"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./run-type.sh</span></span></code></pre></div>
<p>By default, it will run 16 jobs in parallel. This can be changed using
the <code>JOBS</code> environment variable.</p>
<p>The result will be:</p>
<pre><code>data/types            &lt;--- directory with the type output
data/run-type.csv     &lt;--- metadata about the run, duration, exitcodes, ...</code></pre>
<p>We can again peek at the results:</p>
<div class="sourceCode" id="cb29"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./qcat.R</span> <span class="st">&#39;data/types/dplyr::arg_name&#39;</span></span></code></pre></div>
<p>which should show types inferred from the fuzzed calls:</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a><span class="co"># A tibble: 40 × 3</span></span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a>   fun_name           id signature</span>
<span id="cb30-3"><a href="#cb30-3" aria-hidden="true" tabindex="-1"></a>   <span class="sc">&lt;</span>chr<span class="sc">&gt;</span>           <span class="er">&lt;</span>int<span class="sc">&gt;</span> <span class="er">&lt;</span>chr<span class="sc">&gt;</span></span>
<span id="cb30-4"><a href="#cb30-4" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> dplyr<span class="sc">::</span>arg_name     <span class="dv">8</span> (list<span class="sc">&lt;</span>list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>unit, unit_v2<span class="sc">&gt;</span> <span class="er">|</span> double <span class="sc">|</span> integer<span class="sc">&gt;</span> <span class="er">|</span> …</span>
<span id="cb30-5"><a href="#cb30-5" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> dplyr<span class="sc">::</span>arg_name     <span class="dv">9</span> (class<span class="sc">&lt;</span>gList<span class="sc">&gt;</span>, list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>factor<span class="sc">&gt;</span> <span class="er">|</span> double <span class="sc">|</span> integer<span class="sc">&gt;</span>)…</span>
<span id="cb30-6"><a href="#cb30-6" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">10</span> (pairlist, list<span class="sc">&lt;</span>character <span class="sc">|</span> double[]<span class="sc">&gt;</span>) <span class="sc">=&gt;</span> class<span class="sc">&lt;</span>glue, …</span>
<span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">13</span> (list<span class="sc">&lt;</span>list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>matrix<span class="sc">&gt;</span> <span class="er">|</span> double[] <span class="sc">|</span> integer <span class="sc">|</span> intege…</span>
<span id="cb30-8"><a href="#cb30-8" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">14</span> (character[], list<span class="sc">&lt;</span>character <span class="sc">|</span> logical<span class="sc">&gt;</span>) <span class="sc">=&gt;</span> class<span class="sc">&lt;</span>glue…</span>
<span id="cb30-9"><a href="#cb30-9" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">15</span> (list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>unit, unit_v2<span class="sc">&gt;</span><span class="er">&gt;</span>, list<span class="sc">&lt;</span>list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>expectati…</span>
<span id="cb30-10"><a href="#cb30-10" aria-hidden="true" tabindex="-1"></a> <span class="dv">7</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">17</span> (list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>call<span class="sc">&gt;</span><span class="er">&gt;</span>, double[]) <span class="sc">=&gt;</span> class<span class="sc">&lt;</span>glue, character<span class="sc">&gt;</span></span>
<span id="cb30-11"><a href="#cb30-11" aria-hidden="true" tabindex="-1"></a> <span class="dv">8</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">24</span> (list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>margin, simpleUnit, unit, unit_v2<span class="sc">&gt;</span> <span class="er">|</span> class…</span>
<span id="cb30-12"><a href="#cb30-12" aria-hidden="true" tabindex="-1"></a> <span class="dv">9</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">28</span> (class<span class="sc">&lt;</span>matrix<span class="sc">&gt;</span>, list<span class="sc">&lt;</span>class<span class="sc">&lt;</span>expectation_success, expect…</span>
<span id="cb30-13"><a href="#cb30-13" aria-hidden="true" tabindex="-1"></a><span class="dv">10</span> dplyr<span class="sc">::</span>arg_name    <span class="dv">30</span> (double, class<span class="sc">&lt;</span>titleGrob, gTree, grob, gDesc<span class="sc">&gt;</span>) <span class="sc">=&gt;</span> clas…</span></code></pre></div>
<h3 id="fuzz-coverage">4. fuzz coverage</h3>
<p>Computing the function source code coverage from the fuzzed calls is
done by running the following:</p>
<div class="sourceCode" id="cb31"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./run-coverage.sh</span></span></code></pre></div>
<p>This will use the traced data to recreate the calls while using the
<a href="https://covr.r-lib.org/">covr</a> tool to record code coverage.
By default, it will run 16 jobs in parallel. This can be changed using
the <code>JOBS</code> environment variable.</p>
<p>The result will be:</p>
<pre><code>data/coverage          &lt;--- directory with the coverage output
data/run-coverage.csv  &lt;--- metadata about the run, duration, exitcodes, ...</code></pre>
<h3 id="baseline">5. baseline</h3>
<p>To have a comparison, we need to get the baseline data.
Instead of fuzzing, we will simply run the extracted code from the
packages. There are three steps:</p>
<ol type="1">
<li><p>run the extracted code to get the traces</p>
<div class="sourceCode" id="cb33"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./run-baseline.sh</span></span>
<span id="cb33-2"><a href="#cb33-2" aria-hidden="true" tabindex="-1"></a><span class="ex">./traces-baseline.R</span></span></code></pre></div>
<p>This step takes a bit longer, about 15 minutes.</p></li>
<li><p>type the traces</p>
<div class="sourceCode" id="cb34"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./run-type-baseline.sh</span></span></code></pre></div></li>
<li><p>compute the coverage from these traces</p>
<div class="sourceCode" id="cb35"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a><span class="ex">./run-coverage-baseline.sh</span></span></code></pre></div>
<p>This step takes a bit longer, about 15 minutes.</p></li>
</ol>
<p>By default, each step will run 16 jobs in parallel. This can be changed
using the <code>JOBS</code> environment variable.</p>
<p>The results will be in</p>
<pre><code>data/baseline            &lt;--- baseline traces
data/baseline-types      &lt;--- baseline types
data/baseline-coverage   &lt;--- baseline coverage
data/run-*-baseline.csv  &lt;--- metadata about the runs, duration, exitcodes, ...</code></pre>
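<p>The baseline steps can also be chained so that a failure stops the
sequence early. The following is only a sketch: <code>run_steps</code> is
a hypothetical helper, and the scripts are launched only if they are
present in the current directory.</p>
<div class="sourceCode"><pre
class="sourceCode sh"><code class="sourceCode bash">#!/bin/sh
# Hypothetical helper: run each command in order, stop at the first failure.
run_steps() {
  for step in "$@"; do
    echo "running $step"
    if ! "$step"; then
      echo "step failed: $step"
      return 1
    fi
  done
}

# Keep the documented default of 16 parallel jobs unless JOBS is already set.
JOBS="${JOBS:-16}"
export JOBS

if [ -x ./run-baseline.sh ]; then
  run_steps ./run-baseline.sh ./traces-baseline.R \
            ./run-type-baseline.sh ./run-coverage-baseline.sh
fi</code></pre></div>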
<h3 id="create-a-report">6. create a report</h3>
<p>Finally, to render the results, run:</p>
<div class="sourceCode" id="cb37"><pre
class="sourceCode sh"><code class="sourceCode bash"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a><span class="ex">R</span> <span class="at">--slave</span> <span class="at">--quiet</span> <span class="at">-e</span> <span class="st">&#39;rmarkdown::render(&quot;sle.Rmd&quot;)&#39;</span></span></code></pre></div>
<p>This should create a file <code>sle.html</code> which you can open in
a browser (navigate to the directory from which you ran
<code>./enter.sh</code>). It also creates three more files:</p>
<ul>
<li><code>experiment-uf.tex</code>: the data for the paper</li>
<li><code>argsdb-value-distribution.pdf</code>: figure 3 in the paper</li>
<li><code>uf-call-signatures.pdf</code>: figure 4 in the paper</li>
</ul>
<hr />
<p><strong>Note</strong>:</p>
<ul>
<li>Regarding the coverage: with a small number of fuzzed calls, the
fuzzer most likely won’t find any new paths, so the report will show 0
for the coverage improvement.</li>
</ul>