---
title: "3 - SourcererCC Clone Analysis"
output: html_notebook
---
```{r setup, echo=FALSE, results='hide'}
Sys.setenv(R_NOTEBOOK_HOME = getwd())
```
First, copy the tokenized files into a location where SourcererCC will look for its input (`input/dataset/blocks.file`):
```{bash}
# Work from the clone-detector directory (quoted in case the path contains spaces)
cd "$R_NOTEBOOK_HOME/tools/SourcererCC/clone-detector"
cp ../../../datasets/js/tokenized_files.csv input/dataset/blocks.file
```
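
The copy step above can be guarded against a missing or empty input file. The following is a minimal sketch, not part of the original pipeline: `stage_blocks_file` is a hypothetical helper name, and the source and destination paths are passed in as arguments rather than hard-coded.

```bash
# Hypothetical helper (not part of SourcererCC): copy the tokenized files
# into the input location, failing early if the source is missing or empty.
# Usage: stage_blocks_file <source csv> <destination blocks.file>
stage_blocks_file() {
  local src="$1" dest="$2"
  # -s is true only if the file exists and has a size greater than zero.
  if [ ! -s "$src" ]; then
    echo "missing or empty input: $src" >&2
    return 1
  fi
  # Create the destination directory if it does not exist yet.
  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
}
```

Under these assumptions, the call from the clone-detector directory would be `stage_blocks_file ../../../datasets/js/tokenized_files.csv input/dataset/blocks.file`.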
Then, run SourcererCC by invoking the controller:
```{bash}
# Work from the clone-detector directory (quoted in case the path contains spaces)
cd "$R_NOTEBOOK_HOME/tools/SourcererCC/clone-detector"
python controller.py
```
> Running this chunk takes some time (minutes to tens of minutes).

SourcererCC distributes its work across multiple nodes. When it finishes, the per-node results must be joined into a single `sourcerer.csv` file in the dataset directory. Assuming 2 nodes were used, run the following command:
```{bash}
# Work from the clone-detector directory (quoted in case the path contains spaces)
cd "$R_NOTEBOOK_HOME/tools/SourcererCC/clone-detector"
cat NODE_*/output8.0/query_* > ../../../datasets/js/sourcerer.csv
```
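
Because the join is a plain concatenation, a quick sanity check is to confirm that the merged file has as many lines as the per-node files combined. The following is a minimal sketch under that assumption: `join_node_results` is a hypothetical helper name, and the directory layout (`NODE_*/output8.0/query_*`) is taken from the command above.

```bash
# Hypothetical helper (not part of SourcererCC): join the per-node query
# results into one file and verify that no lines were lost.
# Usage: join_node_results <clone-detector dir> <output csv>
join_node_results() {
  local detector_dir="$1" out_file="$2"
  local expected actual
  # Total number of lines across all per-node result files.
  expected=$(cat "$detector_dir"/NODE_*/output8.0/query_* | wc -l | tr -d ' ')
  # Same concatenation as in the chunk above, written to the given output file.
  cat "$detector_dir"/NODE_*/output8.0/query_* > "$out_file"
  actual=$(wc -l < "$out_file" | tr -d ' ')
  if [ "$expected" -ne "$actual" ]; then
    echo "line count mismatch: expected $expected, got $actual" >&2
    return 1
  fi
  echo "joined $actual lines into $out_file"
}
```

From the clone-detector directory, this would be called as `join_node_results . ../../../datasets/js/sourcerer.csv`. The glob `NODE_*` matches however many node directories exist, so the same call works for runs with more than 2 nodes.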
## Next Steps
Continue with [Database Import](4-dbimport.nb.html), in file [`4-dbimport.Rmd`](4-dbimport.Rmd).