---
title: "7 - Metadata Acquisition"
output: html_notebook
---
```{r setup, echo=FALSE, results='hide'}
Sys.setenv(R_NOTEBOOK_HOME = getwd())
source("config.R")
source("helpers.R")
```
The metadata used in the paper come from two sources. The first is the GitHub API, which we query for stars, subscribers, forks, and open issues. The second is the `ghtorrent` snapshot, which provides the number of commits per project.
## Metadata from GitHub
To download the metadata, we provide a function that queries the database and downloads the metadata for projects into a `csv` file. The work can be distributed by using the extra `stride` and `strides` arguments, but for the sake of the VM, we can easily run it without any distribution:
```{r}
downloadMetadata(DATASET_NAME, DATASET_PATH, GITHUB_TOKENS)
```
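As a rough illustration of what distributing by `stride` and `strides` could look like, the sketch below splits a vector of project ids so that each worker handles every `strides`-th project. The exact semantics of the two arguments in `downloadMetadata` are not documented here, so the 1-based `stride` index, the round-robin split, and the helper name `projectsForStride` are all assumptions.

```r
# Hypothetical helper, not part of helpers.R: selects the projects that one
# worker would process, assuming `stride` is the 1-based worker index and
# `strides` is the total number of workers.
projectsForStride <- function(projectIds, stride, strides) {
    stopifnot(strides >= 1, stride >= 1, stride <= strides)
    if (stride > length(projectIds)) {
        return(projectIds[0])  # more workers than projects: nothing to do
    }
    # every strides-th element, starting at position `stride`
    projectIds[seq(stride, length(projectIds), by = strides)]
}

ids <- 1:10
projectsForStride(ids, 1, 3)  # ids 1, 4, 7, 10
projectsForStride(ids, 2, 3)  # ids 2, 5, 8
projectsForStride(ids, 3, 3)  # ids 3, 6, 9
```

Under this assumption, the strides are disjoint and together cover every project exactly once, so the per-stride results can be merged without deduplication.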
When the download is done, the data can be imported. If multiple strides were used, an extra argument with their number must be added:
```{r}
importMetadata(DATASET_NAME, DATASET_PATH)
```
At this point, the `projects` table of the dataset should have extra columns for:
- stars
- subscribers
- number of forks
- number of open issues
Note that for the paper, only the number of stars was used.
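For orientation, the four columns correspond to counters that the GitHub REST API returns for a single repository (`stargazers_count`, `subscribers_count`, `forks_count`, `open_issues_count`). The sketch below maps an already parsed API response, represented as a named list, to those columns. The helper `repoMetadata` and the sample values are made up for illustration, and `downloadMetadata` may extract the fields differently.

```r
# Hypothetical helper: picks the four counters out of a parsed response from
# GET /repos/{owner}/{repo}, given as a plain named list.
repoMetadata <- function(repo) {
    fields <- c(stars = "stargazers_count",
                subscribers = "subscribers_count",
                forks = "forks_count",
                openIssues = "open_issues_count")
    # fields missing from the response become NA instead of raising an error
    vapply(fields, function(f) {
        value <- repo[[f]]
        if (is.null(value)) NA_integer_ else as.integer(value)
    }, integer(1))
}

# Made-up sample response, not real data
response <- list(full_name = "someuser/somerepo", stargazers_count = 42,
                 subscribers_count = 5, forks_count = 3, open_issues_count = 1)
repoMetadata(response)
```

Returning `NA` for a missing counter keeps one row per project even when a response is incomplete, which makes such gaps easy to spot after import.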
## Number of commits
The number of commits is obtained from GHTorrent's tables. Given the downloaded projects and the GHTorrent commits data (`project_commits.csv`), we can run the `sccpreprocessor` to calculate the number of commits per project:
```{bash}
cd "$R_NOTEBOOK_HOME"
cd tools/sccpreprocessor/src
java SccPreprocessor commits ../../../datasets/js ../../../ghtorrent/project_commits.csv
```
To import the data, use the `importCommits` function in `functions.R`:
```{r}
importCommits(DATASET_NAME, DATASET_PATH)
```
This adds the `commits` column to the `projects` table.
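Conceptually, the commit count is the number of rows in the GHTorrent commits data that belong to each downloaded project. The sketch below shows that computation in plain R on a small made-up data frame. It assumes a two-column layout (project id, commit id) and the hypothetical names `projectCommits`, `downloaded`, and `countCommits`; it is not the `sccpreprocessor` implementation, which may work differently.

```r
# Made-up stand-in for the commits data: one row per (project, commit) pair.
projectCommits <- data.frame(
    projectId = c(1, 1, 2, 3, 3, 3),
    commitId  = c(10, 11, 12, 13, 14, 15)
)
# Ids of the projects in the dataset; project 4 has no recorded commits.
downloaded <- c(1, 3, 4)

# Count commits per project, restricted to the downloaded projects.
countCommits <- function(projectCommits, projectIds) {
    kept <- projectCommits$projectId[projectCommits$projectId %in% projectIds]
    # using the project ids as factor levels keeps projects with zero commits
    counts <- table(factor(kept, levels = projectIds))
    data.frame(projectId = projectIds, commits = as.integer(counts))
}

countCommits(projectCommits, downloaded)
```

On this sample, projects 1, 3, and 4 get 2, 3, and 0 commits respectively, and project 2 is dropped because it is not part of the dataset.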
## Next Steps
[Additional Processing](7-processing.nb.html) in file [`7-processing.Rmd`](7-processing.Rmd).