Incorrect behaviour of biplot vectors when setting `keepX` #311

vinisalazar · 2024-07-16T03:27:57Z

🐞 Describe the bug:

When running biplot on sparse models, the correlation vectors are drawn at right angles to one another. This only happens when keepX is specified.

🔍 reprex results from reproducible example including sessioninfo():

Take the example from the biplot documentation page, but use spca with keepX instead of pca.

This is also reproducible by following the code example in the sPLS documentation page.

library(mixOmics)
data(nutrimouse)

# the documentation example fits a standard PCA
scale.pca.lipid <- pca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE)

# fit a sparse PCA instead, keeping 5 variables per component
scale.pca.lipid <- spca(nutrimouse$lipid, keepX = c(5, 5, 5), ncomp = 3, scale = TRUE, center = TRUE)

biplot(scale.pca.lipid) # produce the biplot

The behaviour does not occur when keepX is not specified:

scale.pca.lipid <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE) 

biplot(scale.pca.lipid) # produce the biplot

🤔 Expected behavior:

The correlation vectors should not be displayed at right angles; they should look similar to the biplot produced when keepX is not specified.


vinisalazar · 2024-07-16T03:42:45Z

I noticed that the problem appears as keepX moves further below the original number of features:

scale.pca.lipid <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE, keepX=rep(ncol(nutrimouse$lipid) - 10, 3)) 
biplot(scale.pca.lipid)

Bowen0715 · 2024-11-15T14:25:42Z

Hi, @vinisalazar, I don’t think this is a bug.

The behavior you’re observing comes from the fundamental difference between spca (Sparse Principal Component Analysis) and pca (Principal Component Analysis). Unlike pca, which computes principal components from all variables, spca imposes sparsity by shrinking some loadings to exactly zero. The keepX parameter sets the number of non-zero loadings in each Sparse Principal Component (SPC).

When keepX is not explicitly specified, spca defaults to rep(ncol(X), ncomp), meaning all variables are used without sparsity. In this case, spca behaves similarly to pca.

Regarding the correlation vectors forming right angles: in spca, when a variable's loading is zero in one SPC but non-zero in another, the variable has no contribution along the SPC where it is zero, so its arrow lies along the axis of the other SPC. These "zero-in-one" variables therefore line up with the axes and appear orthogonal to each other in the biplot. You can see this by inspecting the variables' loadings in each SPC.
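As a toy illustration of that geometry, here is a minimal base R sketch that does not use mixOmics at all. The loading values and variable names (var_a, var_b, var_c) are made up for illustration; it only shows that an arrow with a zero coordinate on one component ends up on the other component's axis.

```r
# Hypothetical sparse loadings for three variables on two components
# (illustrative values only, not taken from nutrimouse)
loadings <- matrix(
  c(0.8, 0.0,   # var_a: zero on component 2
    0.0, 0.6,   # var_b: zero on component 1
    0.5, 0.5),  # var_c: non-zero on both components
  ncol = 2, byrow = TRUE,
  dimnames = list(c("var_a", "var_b", "var_c"), c("comp1", "comp2"))
)

# Angle of each arrow measured from the comp1 axis, in degrees
angle_deg <- atan2(loadings[, "comp2"], loadings[, "comp1"]) * 180 / pi
angle_deg
```

var_a sits on the comp1 axis and var_b on the comp2 axis, so the two arrows are at right angles, while var_c, which is selected on both components, points in between.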

Here is an example to illustrate this behavior:

scale.pca.lipid <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE,
                        keepX = rep(ncol(nutrimouse$lipid) - 10, 3)) 
biplot(scale.pca.lipid)

######################################################

# loadings of the variables selected on components 1 and 2
df1 <- selectVar(scale.pca.lipid, comp = 1)$value
df2 <- selectVar(scale.pca.lipid, comp = 2)$value

rownames1 <- rownames(df1)
rownames2 <- rownames(df2)

# variables selected on both components, or on only one of them
common_names <- intersect(rownames1, rownames2)
unique_names1 <- setdiff(rownames1, common_names)
unique_names2 <- setdiff(rownames2, common_names)

all_names <- c(common_names, unique_names1, unique_names2)

# one row per variable; a variable not selected on a component gets 0 there
merged_df <- data.frame(
  variable = all_names,
  component_1 = c(df1[common_names, 1], df1[unique_names1, 1], rep(0, length(unique_names2))),
  component_2 = c(df2[common_names, 1], rep(0, length(unique_names1)), df2[unique_names2, 1])
)

merged_df
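A shorter way to get a similar view is to read the loadings matrix directly from the fitted object. This is only a sketch: it assumes the spca result stores its loadings in `$loadings$X` with one row per variable, and the names `res`, `load12` and `on_axis` are made up here.

```r
library(mixOmics)
data(nutrimouse)

# Same sparse model as in the example above
res <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE,
            keepX = rep(ncol(nutrimouse$lipid) - 10, 3))

# Loadings on the first two components (assumed to live in res$loadings$X)
load12 <- res$loadings$X[, 1:2]

# Variables whose loading is zero on exactly one of the two components
on_axis <- xor(load12[, 1] == 0, load12[, 2] == 0)
load12[on_axis, , drop = FALSE]
```

The rows printed at the end are the "zero-in-one" variables, which should be the ones drawn along a single axis in the biplot of components 1 and 2.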

vinisalazar · 2024-11-18T00:31:32Z

Thanks @Bowen0715 for the detailed explanation.

Pinging @evaham1: I'm not sure whether this issue should be closed if this is expected behaviour.

vinisalazar added the bug (Something isn't working) label Jul 16, 2024

vinisalazar assigned aljabadi Jul 16, 2024
