-
Notifications
You must be signed in to change notification settings - Fork 15
/
TODO
541 lines (403 loc) · 23.4 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
------------------------------------------------------------*- mode: text -*-
BUGS: see https://github.com/MLopez-Ibanez/irace/issues
TODO R package:
* (Patrick Schratz) Usually people do not recommend r via homebrew because it
is compiled with gcc instead of clang which may cause issues at runtime. A
better approach is to use https://github.com/rmacoslib/r-macos-rtools. Of
course you are free to recommend whatever you want, just a pointer.
* Look at https://cran.r-project.org/web/packages/argparser/index.html or
optparse: https://cran.r-project.org/web/packages/optparse/index.html
See examples/optparse.R
NOTE: Currently error messages cannot be
customized, so we cannot use this. But it could be an inspiration to write
our own class.
* Use
https://github.com/seandavi/F1000R_BiocWorkflows/blob/master/rnwConversion/processesRnw.R
to convert the user-guide to Rmarkdown, then use pkgdown to generate a nice
website.
* Add inst/COPYRIGHTS. See race package.
* Fix check-reverse so we can check reverse dependencies before
uploading to CRAN. Usually devs do it via
https://github.com/r-lib/revdepcheck locally once for every release.
(This runs check-reverse, which check every reverse dependency. This
may take some time to download all packages and check them.)
TODO General (in order of importance):
* Fix bugs in https://github.com/MLopez-Ibanez/irace/issues
* Fix all FIXMEs.
* Consume all budget by using post-selection
* Federico's implementation of DAG / Complexity metric.
* Use both cost and time to calculate wins, then use a prop.test
prop.test(x=c(sum(x <0.5),sum(y<0.5)), n=c(length(x), length(y)))
The Fisher Exact probability test is an excellent non-parametric technique for comparing proportions, when the two independent samples are small in size.
* Sensitivity analysis: try values around the best configuration found, one at
a time.
* Give priority to user-provided configurations or to default/preferred
parameter values when choosing between similarly performing configurations.
* boundMax should be a floating-point number.
* Write a report at the end that checks if the instances are homogeneous and
suggests clusters and sub-configurations. One possibility is to replace
missing data with the mean of the row (across configurations), such that the
metric is optimistic. Create a diagnostic scatter-plot of training versus
testing.
* Automatic split training and testing instances given a percentage.
Stratified version that takes into account features.
* Various initialization modes Latin Hyper-cube, random, exhaustive,
Hammersley point set, etc.
* In elitist irace, target-evaluator is always called on elite configurations,
while the corresponding target-runner is called only once. Hence, the
files/data required by target-evaluator must be kept around for future
iterations. Moreover, the use of --sge-cluster requires using a
target-evaluator, even in cases where the actual result does not change. It
would be better to 1) not always require a target-evaluator for
--sge-cluster (but then we need a way pass the result from qsub to
target-runner to irace); or 2) make optional that target-evaluator is
re-evaluated (but we still need to re-evaluate for MO).
* crayon: Colored Terminal Output
https://cran.r-project.org/package=crayon
* The truncated normal distribution actually gives more probability to the
side further away from the mean (that is, the integral of the PDF is larger
on the side where the distance is larger from the mean). This is not really
what we want when optimizing, since the fact that we have moved from the
center of the interval suggests that moving closer to the extreme may be
desirable. A better approach may be to adjust the PDF such that the integral
of the truncated PDF is symmetric with respect to the mean. This is possible
for the non-truncated normal:
https://en.wikipedia.org/wiki/Skew_normal_distribution but less clear for
the truncated case:
https://en.wikipedia.org/wiki/Truncated_normal_distribution It is also not
clear how to set the parameters of the PDF, such that the desired symmetry
is achieved. Actually, this may be as simple as linearly mapping the actual
range to a symmetric range. That is. If the mean is 0.75 within [0,1], then
use the truncated normal to sample from [0, 1.5], then linearly map any
value from [0.75,1.5] to [0.75,1].
* It could be nice to be able to set a different nb of rounding
digits for each real parameters This number thus would be given in
the parameters.txt file. It should require very few
modifications, only in the sampling (generation uniform and normal)
real "--paramreal=" r (1.5, 4.5) digits(5) | condition
* Compute how many possible parameter configurations (w.r.t. discrete
values) can be sampled.
* Compositional parameter values (parameter values that must sum up to 1 or
100) could be transformed into Euclidean space by using isometric log-ratio
transformation (Egozcue, J.J., et al., 2003. Isometric logratio
transformations for compositional data. Mathematical Geology) or mixture
models (Calibrating continuous multi-objective heuristics using mixture
experiments, José Antonio Vázquez-Rodríguez and Sanja Petrovic)
* Detect the case where all possible parameter configurations can be sampled
and there is still enough budget to do at least one test. In that case, it
is not worth it to sample configurations and do several iterations. It is
better to generate all possible configurations (using expand.grid) and do a
single F/t-race.
* Give an error if the user tries to change certain settings when recovering
(maxExperiments, elite, etc.). Other settings such as hookRun,
trainInstancesDir, instanceFile, etc. are ok to change but we have to be
careful to not override them when recovering.
* Weighted sampling of instances. Use the instance parameters to
provide a biased sampling. Note that one can currently achieve this
simply by repeating the instances in instances.txt and increasing
the number of those that should be sampled more.
* The Friedman test here uses the F distribution instead of the
Chi-square:
http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/friedman.htm
(See also function friedman in package agricolae)
* Quade test as an alternative to Friedman:
http://stats.stackexchange.com/questions/83904/friedman-test-vs-wilcoxon-test
* ANOVA with rank transformation instead of Friedman:
https://seriousstats.wordpress.com/2012/02/14/friedman/
* Maybe separate scenario setup from irace parameters? We should have two
files: scenario.txt and irace-conf.txt. This will make clear what
the user really has to change to define a scenario versus other
parameters for irace.
* A new parameter "--hook-error-mode [exit | stop | retry ]" to
control what to do when a call to target-runner fails. Exit is
exit irace with an error; stop is call something like
pskill(Sys.getpid(), signal=SIGSTOP) to put irace to sleep until
the user send a SIGCONT signal, then retry; and retry is retry
without sleeping before.
* Make more use of /tmp in 'inst/bin/parallel-irace-mpi'
to avoid overloading the shared filesystem.
* When parallel > 1, schedule jobs until all CPUs are full. This is
likely hard to implement. People have done this already with their
own implementation of F-race:
https://bitbucket.org/tunnuz/json2run, so the algorithm is
available.
The difficult part is that one would need to handle manually the
concurrency (launching, balancing, watching processes, collecting
output, etc.) for both a single multi-core machine and a MPI
cluster using the low-level primitives of the parallel and Rmpi
packages. Currently we just call high-level functions that do all
the work for us.
Maybe the futures package can help in this:
https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html
* Make the output at beginning of iteration conditional on what is
the tuning goal (time or nb evaluation). For instance it can be
confusing for a newbie user to see "timeUsedSoFar" and
"timeEstimate" always equal to zero, without knowing that they have
no meaning for him (if he's tuning for nb evaluation).
* race package has some recent updates. Check with Mauro what he
changed.
* Implement a regression testing testsuite to check that everything
is working as expect before making a release. See:
- tests subdirectory in http://cran.r-project.org/doc/manuals/R-exts.html#Package-subdirectories
- http://cran.r-project.org/web/packages/RUnit/index.html
- http://cran.r-project.org/web/packages/svUnit/index.html
* Boolean command-line parameters require to provide a value
(1/0). It would be nice if they could be enabled just by using
them, that is, in addition to "--mpi 1" and "--mpi 0", accept just
"--mpi".
* Fix issues with digits, rounding and real parameters. If digits=d,
perhaps it is more correct to sample like:
round(rtnorm(1, mean + (10^-d) / 2, lower, upper + 10^-d) - (10^-d)/2, d)
The same applies to runif().
* generateCandidatesNormal() takes too much time with >800 candidates
and >20 parameters (SPEAR testcase).
* Use the options() mechanism of R, that is,
options(irace.debug.level = 1) and getOption("irace.debug.level").
* Add some criteria for the debug levels. For example:
debug-level 1 -> print one-time info about internal parameters, steps and decisions
debug-level 2 -> also print what is executed and received by irace (messes
up the output slightly)
debug-level 3 -> internal information such as memory use
debug-level 4 -> extra verbose internal information
* It may be possible to byte-compile the conditions as they are read.
This may speed up conditionsSatisfied(). However, compiled TRUE conditions
are slower than byte-compiled, so it doesn't seem a big gain to compile those.
* JEREMIE: I would like to try a version of I-Race with a kind of
bounding in the probability vectors, something equivalent somehow
to MaxMin AS for ACO. This could be a better way to avoid
convergence than trying to "flat" the vectors when it is already
too late.
* JEREMIE: The way we handle initial candidates is to simply inject
them among other candidates generated randomly. When we know some
very good initial candidates, it could be interesting to sample the
remaining ones from them. In other words, to consider the
candidates provided as we consider elites between races. This would
easily allows to hot-start a tuning with the results of a previous
one. I have a real need for this.
* JEREMIE: AFAIK, there are very few (no?) state-of-the-art algorithms
in optimization that do not use local search. I am not aware of any
GA, for instance, that cannot be outperformed by its memetic
counter-part. This is a strong motivation to try to include a
local-search component into irace, since irace is performing search
rather globally. I would like to implement a quick&dirty
improvement of the winner at the end, testing values slightly
smaller/bigger, parameter by parameter (at least 'r', 'i' and 'o'
ones). Should not requires much evaluations.
* Somehow adapt how many instances should be seen. Maybe such
choices should be done based on the perceived heterogeneity of the
instances; I think this is an issue we also could discuss. I also
think that one should be able to measure such things e.g. by the
variability of the results of same configurations or by some
measure of variation of ranks on instances for each configuration.
* If we have too many elites, remove some of them. Add a parameter
--max-elites.
* Make the update model convergence take into account the hierarchy
of parameters.
* Error messages start in uppercase and finish with a period.
* Do not add extra parenthesis around simple expressions. Example:
if ( (!file.exists(installDir)) ) has too many parenthesis.
* Add a hook that after sampling post-processes the candidates.
* Enable saving the logfile within a race. Add a parameter that controls the saving frequency.
* Many of our "objects" could be defined as S4 classes. For example:
setClass("Parameters", slots=c(names="character", types="character",
switches = "character", domain = "list",
conditions = "expression",
isFixed = "logical", nbParameters ="numeric",
nbFixed ="numeric", nbVariable = "numeric"))
------------------------------------------------------------------------------
DONE:
* Avoid duplicated elites. [DONE by Manuel]
* Use 'x <- eval(parse(text=paste0("expression","(1,min(ants,10))")))' to parse domains.
[DONE by Manuel]
* Add a programmatic interface to create parameters (or use paradox): [DONE by Manuel]
d = ParametersNew(p_real('initial_temp', 0.02, 5e4, transf="log"),
p_real('restart_temp_ratio', 1e-5, 1, transf="log"),
p_real('visit', 1.001, 3),
p_real('b', quote(visit), quote(visit + 10)),
p_real('accept', -1e3, -5, quote(no_local_search == "ccc")),
p_ord('no_local_search', '1', 'ccc')
)
* Use saveRDS/readRDS instead of load/save (.Rdata) to avoid overriding
userspace. Do the same for ablation logs. [NOT DONE because "Files produced
by `saveRDS` (or `serialize` to a file connection) are not suitable as an
interchange format between machines, for example to download from a
website. The files produced by `save` have a header identifying the file
type and so are better protected against erroneous use."]
* In src/irace/irace.c, use https://github.com/gpakosz/whereami to find the
directory containing the 'irace' executable, then set
".libPaths(c(.libPaths(), EXEC_DIR))" so that we load the package from the
directory holding the executable. [DONE by Manuel]
* use devtools::release() to release
http://r-pkgs.had.co.nz/release.html#release-submission
[DONE by Manuel]
* system2 now has a timeout parameter (R 3.5). Add a scenario parameter
that specifies the maximum time to wait for any run of target-runner to
avoid infinite loops. [DONE by Manuel]
* Error/Warn about unknown irace parameters. [DONE by Manuel]
* irace.Rdata should keep a list with the all the eliteCandidates after each
race. Right now we only keep the best one. [DONE in iterationElites]
* (Patrick Schratz) Yeah I am actually responsible for the CI for ropensci
and deeply involved in this. Usually I’d recommend
https://github.com/ropensci/tic/ to get started. You can test on all major
platforms via Github Actions. [DONE by Manuel]
* https://github.com/ndangtt/irace-to-pimp [DONE by Nguyen Dang]
* https://builder.r-hub.io/ [DONE by Manuel]
* How to run individual tests from the make. Check testthat documentation.
[DONE by Manuel]
* Use testthat package for testing: http://r-pkgs.had.co.nz/tests.html
[DONE by Manuel]
* Add --par integer >= 1 (PAR=), default 1: penalise timeouts or execution
failures with PAR * value. [DONE by Leslie as boundPar]
* Add example of logarithmic/parameter conversions in target-runner.
[We now handle log conversion internally]
* oneParamUpperBound and oneParamLowerBound are too much. We can
have a single function: paramBound that returns
return (as.numeric(parameters[["domain"]][[paramName]]))
then the user can use directly [1] or [2] to select whatever they
want. [More or less DONE]
* Implement some kind of immediate rejection handling, that is, if
target-runner returns +Inf, just discard the candidate immediately
without doing any more tests. [DONE by Manuel]
* It would be useful to report the command-line string passed to irace in the
output (for logging purposes) and save it in the log file (for further
replicability). [DONE by Manuel]
* Print some info about the forbidden rules read. [DONE by Manuel]
* Add command-line option --cluster=[sge|pbs] to choose the type of
cluster. Then, target-runner calls qsub and prints a jobID. jobID must
be a single word ([0-9a-z._-@]+); if not or length(jobID) != 1 or
exit code is != 0, then complain. The option --cluster only chooses
which type of qstat command is invoked. This will leave parsing the
jobID to the user, since it is very fragile. We should put examples
for PBS and SGE in examples/.
[DONE by Manuel]
* Move the README information to the TR (vignette) to avoid duplication.
[DONE by Leslie, Manuel]
* Pass to target-runner/hook-evaluate an instance index that uniquely
identifies each instance. This has two main advantages: 1) we can
schedule jobs in parallel for different instances, e.g., execute
all jobs for the first-test instances in parallel; 2) target-runner can
use the index to assign a unique seed to each instance.
[DONE by Manuel, Leslie]
* Do not print anything about timeEstimate/timeUsedSoFar if it is
never used (is it ever used?)
[DONE by Leslie]
* Add a --verify option that checks whether all files are in place,
all options are valid, and it runs target-runner on a random
configuration on a random instance.
Implement a test mode that allows the user to check his set-up is
correct (checking the configuration can be read, giving the number
of iterations\candidates that would be used, calling target-runner
once...).
[DONE by Leslie]
* Improve R documentation. [DONE by Leslie]
- Properly document all input/output of exported R functions.
- Document tunerConfig and its variables (some of them are not
visible in the command-line, but they should be documented here).
* Read about vignette here: http://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes
What we need to do to convert/use the IRIDIA TR as the vignette?
Move doc/irace.Rnw to inst/doc/. However, the problem is that R
will try to recreate it in CRAN, so we have to include all the
sources. That would mean also the sources for the IRIDIA TR covers.
[DONE by Leslie]
* A new parameter "--target-runner-retries". Number of times to retry a
target-runner call if the output is not ok. Default 0. Inf allowed.
[DONE by Manuel]
* Handle properly fixed parameters in candidatesFile (see FIXME in
readConfigurations.R).
[DONE by Manuel]
* Use irace.error() for user-level errors and irace.assert() for
internal assertions. All stopfinot() should be irace.assert(). Some
stop() should be irace.error(), a few should be irace.assert(). None
of these functions nor cat() require pasting their arguments.
[DONE by Manuel]
* Parameters should be printed and put in the commandline in exactly
the same order given in the parameter file.
[DONE by Manuel]
* Compute and report the heterogeneity of the instances seen so far.
[DONE by Leslie]
* Resample / Randomize only the instances one has seen. When a race
stops, the instances already seen are randomized but the elite
configurations are not reevaluated on them.
[DONE by Leslie]
* Simplify race.R. We do not need most of what is done there.
[DONE by Leslie]
* Convert README to README.md (markdown) so CRAN renders it as HTML.
[DONE by Manuel]
* How to avoid that MPI creates so many log files?
[FIXED in new versions of Rmpi]
* We could simply read a list of configurations with some NA values,
and if any configuration sampled matches those (minus the NA
values), then it is discarded and we resample. If we execute the
restart mechanism before this check, then resampling the same
forbidden configuration will trigger a restart.
[DONE by implementing forbidden configurations as conditions (Manuel)]
* Add '--version'.
[DONE by Manuel]
* Forbidden parameter combinations (following ParamILS/SMAC).
[DONE by Manuel]
* Implement constraints on values that allow the user to specify
restrictions on the values that parameters may take, such as all
candidates must satisfy that a < b. Such constraints would be
specified in a different file "constraints.txt". One way to handle
them would be by sampling and rejection but perhaps more effective
methods could be implemented.
[DONE by Manuel]
* A way to continue a previous irace run.
[DONE by Leslie]
* [DONE] Add errors for stupid parameter files. For example:
x "" r (1e-4, 5e-4) # with digits=2 this is equal to (0, 0)
y "" i (-1, -2) # the first value should be strictly smaller
z "" c ("a", "a") # Two times the same value!
u "" i/r (0, 0) # This is actually a fixed parameter. Fixed
parameters should be represented with a categorical parameter
with exactly one value. This avoids accidental typos.
* [DONE] We should use list$element instead of list[["element"]] for
indexing elements that must exist. This way if we do a typo, the
whole thing fails to run instead of just returning
NULL. Unfortunately, list$__element__ does not work because
__element__ is not a valid R variable name, so I suggest .ID.,
which is valid.
* [DONE by Manuel] We should produce at least the ouput files in R data format
using save()
* A data.frame/matrix with all experiments performed.
* A data.frame/matrix with all candidate results.
* [FIXED] Use the remaining budget always. But how?
* [DONE] Allow to specify candidate configurations. Perhaps from an extra
input file --candidates=candidates.txt, which gives one candidate
per line in a table format:
param1 param2 param3 param4
10 5 none 8.2
9 NA all 8.3
We would have to check that param names correspond to all
parameters given in parameters.txt and that constraints are
satisfied.
* [DONE] The "c" is not "component" but "categorical". We should rename
everything Component to Categorical, just use sed, check the diff
and test. [Jeremie]
* [DONE] If we want to keep the functions for isReal, isInteger, etc. then
you should use them everywhere. I would propose to have a paramType
function that returns the type, and then isReal(type) would just
test (type == "r"). [Jeremie]
* [DONE] Replace the constraints specifications by an R expression
using: eval(parse(text=constraint), list(a=4, b=3))
* In race.R:
- [DONE] I wonder why first.test should be a single number? It seems
reasonable to choose 10 or more, no? I changed it in the check of
the variable first.test
- [DONE] Add a parameter each.test, to set how often the f-test
should be performed, and the corresponding test. For example, if
each.test == 10, the f-test will be performed only after having
seen 10, 20, 30... instances.
* [DONE] Print elite commandlines (that is, pairs switch-value) for easy
copy-paste.
* Remove the c() from all calls to printMsg. [DONE]
* I think the concept of ghosts parameters could be better handled if
we track the size of the domain of a parameter. Then, we do not
sample/update probabilities for parameters whose domain is size
1. Also, "ghost" is confusing, better would be "Fixed" or "Static".[DONE]
* Rename everything as irace instead ifrace (only after merging all
pending branches!). [DONE]
* Reduce the output of the race package. Only report copyright info
once. Do not duplicate information already given by irace.
[DONE by Jeremie]
------------------------------------------------------------------------------