While doing my research on CSDiff, I had to compare many versions of it, meaning I had to run miningframework multiple times.
For a given time interval, the tool downloads every non-fast-forward (merge) commit in full, meaning that even if the commit has 1 file with conflicts and 100 files without conflicts, all 101 files are downloaded. This can easily take about 30 GB of the device's storage if you run the tool on 10 projects over an interval of 1 month.
In my case, I needed only the files where the results of CSDiff and Diff3 differed. To obtain only the information I needed with the current implementation, I had to resort to some workarounds (see this branch).
In summary, I:

1. ran miningframework once for each project (script);
2. deleted every unwanted file after each run, using this (sketched below, together with step 3);
3. then created another CSV with only the relevant data.
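For reference, here is a minimal sketch of what steps 2 and 3 amount to. It assumes one directory per merge scenario and assumes file names for the two tools' outputs (`merge.csdiff`, `merge.diff3`); the actual layout and names in miningframework's output may differ:

```python
import csv
import filecmp
import shutil
from pathlib import Path

# Assumed layout: one directory per merge scenario, each holding the
# merge result produced by each tool. Names below are placeholders.
RESULTS_DIR = Path("output/files")
CSDIFF_NAME = "merge.csdiff"  # assumption
DIFF3_NAME = "merge.diff3"    # assumption

with open("divergent_scenarios.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["scenario", "csdiff_size", "diff3_size"])
    for scenario in RESULTS_DIR.iterdir():
        if not scenario.is_dir():
            continue
        csdiff_file = scenario / CSDIFF_NAME
        diff3_file = scenario / DIFF3_NAME
        if not (csdiff_file.exists() and diff3_file.exists()):
            continue
        if filecmp.cmp(csdiff_file, diff3_file, shallow=False):
            # Both tools produced the same output: not relevant to the
            # study, so reclaim the disk space.
            shutil.rmtree(scenario)
        else:
            # Divergent result: keep the files and record the scenario.
            writer.writerow([scenario.name,
                             csdiff_file.stat().st_size,
                             diff3_file.stat().st_size])
```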
I think this could be done directly in miningframework, probably around here. Since the tool already has filters for commits, it could probably have filters for files too.
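I don't know the framework's internals well enough to point at the exact hook, but conceptually the filter is just a predicate over the three revisions of each file. A minimal sketch, relying on GNU diff3's documented exit status (1 when the three-way merge has conflicts, 0 when it merges cleanly):

```python
import subprocess

def file_has_conflict(left_path: str, base_path: str, right_path: str) -> bool:
    """Return True if a textual three-way merge of the file conflicts.

    GNU diff3 exits with status 1 when the merged result contains
    conflict markers, 0 when the merge is clean, and 2 on error.
    """
    result = subprocess.run(
        ["diff3", "-m", left_path, base_path, right_path],
        capture_output=True,
    )
    return result.returncode == 1
```

The framework would still need to fetch the three revisions to evaluate the predicate, but it could discard clean files immediately instead of keeping all 101 of them on disk.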
This would make it possible to gather more data for future research while using far less disk space.