TCA+ vs Bellwether #8
Comments
This is kind of how NOT to present results. Don't dump 0.39; report 39. Compute the variance of pd, pf, and g for TCA+ and Bellwether for each data set, then write a table with 1+3+3+3 columns.
Note that for columns 8, 9, 10, if the delta is a "small effect" then print a blank space; otherwise print the delta with a "+" or "-" sign in front.
Computing small effect: in the following, i and j are objects with knowledge of (say) pd for TCA+ and Bellwether. They can report n (size), mu (mean), and sd (standard deviation).
from math import sqrt

def hedges(i, j, small=0.38):
    """
    Hedges effect size test.
    Returns true if i,j differ by just a small effect.
    I.e. their difference is so small as to be dull!
    """
    # pooled standard deviation of the two samples
    num = (i.n - 1)*i.sd**2 + (j.n - 1)*j.sd**2
    denom = (i.n - 1) + (j.n - 1)
    sp = sqrt(num / denom)
    delta = abs(i.mu - j.mu) / sp
    # small-sample correction factor
    c = 1 - 3.0 / (4*(i.n + j.n - 2) - 1)
    return delta * c < small
For justification of the above code, see equations 2, 3, 4 and table 9 of https://pdfs.semanticscholar.org/e6be/263f60ccfb294e14422f0e0162b1367063a2.pdf
No, it's not the best effect size test, but it is one justified by an extensive literature review.
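A minimal sketch of how the delta columns (8, 9, 10) could be filled in using the small-effect test. The `Stats` container, the `delta_cell` helper, the column width, the zero-variance guard, and the choice of "Bellwether minus TCA+" as the sign convention are all assumptions made for illustration; the comment above does not fix any of them.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical container: the test only needs n (size), mu (mean), sd (std dev).
Stats = namedtuple("Stats", "n mu sd")

def hedges(i, j, small=0.38):
    """True if i and j differ by only a small effect (same test as above)."""
    num = (i.n - 1) * i.sd ** 2 + (j.n - 1) * j.sd ** 2
    denom = (i.n - 1) + (j.n - 1)
    sp = sqrt(num / denom)
    if sp == 0:
        # Assumption: with zero pooled variance, only equal means count as "small".
        return i.mu == j.mu
    delta = abs(i.mu - j.mu) / sp
    c = 1 - 3.0 / (4 * (i.n + j.n - 2) - 1)
    return delta * c < small

def delta_cell(tca, bell, width=4):
    """Blank cell for a small effect, otherwise the signed delta in percent."""
    if hedges(tca, bell):
        return " " * width
    # Report 39 rather than 0.39; the sign convention is Bellwether minus TCA+.
    return "{:+d}".format(int(round(100 * (bell.mu - tca.mu)))).rjust(width)

if __name__ == "__main__":
    tca = Stats(n=10, mu=0.39, sd=0.05)
    bell = Stats(n=10, mu=0.61, sd=0.05)
    print(repr(delta_cell(tca, bell)))
```

Here the values in `tca` and `bell` are made-up numbers, not results from this project.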
Note that if you are just as good as TCA+ but simpler, then TCA+ is an over-elaboration.
Got it. I'll update the results accordingly.
TCA vs. Bellwether: PDs, PFs, and Gs.
Results
[~]$ python -B ~/git/rahlk/Bellwether/src/TCA+/par_exec.py
Summary (Now compiling....)
Looked at Pd, Pf, and G, where g = sqrt(Pd*(1-Pf)), the geometric mean of Pd and (1-Pf); G was used for sorting.
No clear Bellwether emerged on any of these metrics. We were able to find one on a metric called Balance (the Euclidean distance from the ideal score (Pd, Pf) = (1, 0)). Our results are here.
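For reference, the metrics named above can be sketched as follows. The confusion-matrix definitions of Pd (recall) and Pf (false alarm rate) are the usual ones in defect prediction and are assumed here, since the summary does not spell them out; the function names are illustrative only. The distance is computed exactly as described (raw Euclidean distance from (Pd, Pf) = (1, 0), so smaller is better). Some defect-prediction papers report a normalized variant, 1 - d/sqrt(2), where larger is better; which one was used for these results is not stated.

```python
from math import sqrt

def pd_pf(tp, fn, fp, tn):
    """Recall (pd) and false alarm rate (pf) from confusion-matrix counts."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf

def g_score(pd, pf):
    """g as written in the summary: sqrt(pd * (1 - pf))."""
    return sqrt(pd * (1 - pf))

def distance_from_ideal(pd, pf):
    """Euclidean distance from the ideal point (pd, pf) = (1, 0)."""
    return sqrt((1 - pd) ** 2 + pf ** 2)

if __name__ == "__main__":
    # Made-up counts, purely to show the call sequence.
    pd, pf = pd_pf(tp=30, fn=10, fp=20, tn=40)
    print(pd, pf, g_score(pd, pf), distance_from_ideal(pd, pf))
```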
Now, note that our round-robin evaluation strategy of finding the best dataset significantly outperforms TCA+.
The authors of TCA+ do not report results on this data set. Moreover, they conveniently breeze through the most important part, the TCA algorithm. See sections 3.1 and 3.2 (this was a beast to implement! But offered no major benefits).
The primary contribution of TCA+ is a proposal for choosing the best normalization strategy.
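The round-robin search mentioned above is not spelled out in this thread. One plausible reading, sketched here under that assumption, is: train on each project in turn, test on every other project, and pick the source with the best median score. The `find_bellwether` name, the `evaluate` callable, and the use of the median are hypothetical choices, not the code in this repository.

```python
from statistics import median

def find_bellwether(datasets, evaluate):
    """
    datasets: dict mapping project name -> data (any type).
    evaluate: hypothetical callable (train_data, test_data) -> score,
              where a higher score is better.
    Returns (best_name, medians), where medians maps each source project
    to its median score over all other projects.
    Needs at least two projects, otherwise there is nothing to test on.
    """
    medians = {}
    for source, train in datasets.items():
        scores = [evaluate(train, test)
                  for target, test in datasets.items()
                  if target != source]
        medians[source] = median(scores)
    best = max(medians, key=medians.get)
    return best, medians

if __name__ == "__main__":
    # Toy stand-ins for real data and a real learner.
    toy = {"a": 1, "b": 2, "c": 3}
    print(find_bellwether(toy, lambda train, test: train * 10 - test))
```

For a metric where smaller is better (such as a raw distance from the ideal point), `max` would need to become `min`.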
Comparisons
(I'm sorry, this is a formatted raw dump. I'll create a summary table and paste it here.)
Target projects (TCA+ vs. Bellwether Method for each): ANT, LUCENE, JEDIT, XERCES, XALAN, CAMEL, VELOCITY, POI, LOG4J, IVY.