Hi Team RGCCA,
I encountered the following error when using rgcca_stability:
Bootstrap samples sanity check...OK
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=09s
Error in x[, y, drop = FALSE] : subscript out of bounds
Here is an example to reproduce the error:
set.seed(6)
lambdas <- runif(1000, 0.001, 0.05)
# Generate data with lots of 0 (~RNA-seq count data)
data <- apply(matrix(lambdas), 1, FUN = function(x) {rpois(n = 100, lambda = x)})
# Add 5 rows made up entirely of NA
data <- rbind(data, NA, NA, NA, NA, NA)
rgcca_res <- rgcca(blocks = list(data,
                                 rnorm(n = 105)),
                   response = 2,
                   method = 'sgcca',
                   sparsity = c(0.3, 1))
stability_res <- rgcca_stability(rgcca_res, n_boot = 10)
While reproducing the error, I found that it only occurs when a block contains rows made up entirely of missing values (which can happen when blocks are not observed on fully overlapping sets of individuals). In that case the error message is unclear, which makes the cause hard for the user to identify. By contrast, when several variables have null sd in the bootstrap samples but no row is entirely missing, the situation is handled correctly and a clear, informative message is displayed. The underlying issue could be in rgcca_bootstrap_k, or it could perhaps be caught earlier in generate_resampling.
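To illustrate the kind of early check meant here, below is a minimal sketch in base R. The helpers `find_full_na_rows` and `check_blocks` are hypothetical names and not part of the RGCCA package; where such a check would actually belong (for example before generate_resampling) is an assumption.

```r
# Hypothetical helpers, not part of the RGCCA package.

# Return the indices of rows in which every entry is missing.
find_full_na_rows <- function(block) {
  block <- as.matrix(block)
  which(rowSums(!is.na(block)) == 0)
}

# Warn with an explicit message for each block that has fully missing rows.
check_blocks <- function(blocks) {
  for (j in seq_along(blocks)) {
    idx <- find_full_na_rows(blocks[[j]])
    if (length(idx) > 0) {
      warning(
        sprintf("Block %d has %d row(s) made up entirely of missing values.",
                j, length(idx)),
        call. = FALSE
      )
    }
  }
  invisible(blocks)
}

# Small example: 3 observed rows followed by 2 fully missing rows.
block <- rbind(matrix(1:6, nrow = 3), NA, NA)
full_na <- find_full_na_rows(block)
```

Calling `check_blocks(list(block, rnorm(5)))` on this example would raise a warning for the first block only, which is the kind of message that would make the origin of the problem easier to spot.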
Do you think this type of error could be caught to avoid misunderstandings?
Thank you :)
Elen
Hi @Tenenhaus, I thought about it and it relates to some other problems I am facing with TGCCA.
The core of the problem is that we remove variables with null variance, since they do not contribute to the objective function and we might get into trouble if we try to scale them. I don't think we really need to remove those variables: they will get zeros in the associated weight vectors anyway. To handle the scaling part, we can use a small epsilon when the standard deviation is null to avoid numerical problems; since the variables are centered, the result would be 0 / epsilon = 0 in any case.
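A rough sketch of the epsilon idea in base R, to make it concrete. The function `scale_with_epsilon` and the default `eps = 1e-8` are assumptions chosen for illustration; this is not how the package currently scales blocks.

```r
# Hypothetical helper, not part of the RGCCA package.
# Center each column, then divide by its standard deviation,
# using eps in place of a null (or undefined) standard deviation.
scale_with_epsilon <- function(x, eps = 1e-8) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x, na.rm = TRUE), "-")
  sds <- apply(x, 2, sd, na.rm = TRUE)
  sds[is.na(sds) | sds < eps] <- eps
  sweep(centered, 2, sds, "/")
}

# One constant column and one varying column.
x <- cbind(constant = rep(3, 5), varying = c(1, 2, 3, 4, 5))
scaled <- scale_with_epsilon(x)
```

The constant column is kept and becomes a column of zeros (0 / epsilon), so the number of variables in the output matches the input, while the varying column is standardized as usual.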
It would also solve other problems, such as defining what a constant variable is for TGCCA or multigroup RGCCA, having bootstrap samples with different variables, a sparsity constant that depends on the number of non-constant variables instead of the total number of variables, and outputs having a different number of variables than the inputs. What do you think about it?