Incorrect behaviour of biplot vectors when setting keepX #311

Open
vinisalazar opened this issue Jul 16, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@vinisalazar

🐞 Describe the bug:

When running biplot on sparse models, the variable (correlation) vectors are drawn at right angles to one another. This only happens when keepX is specified.


🔍 reprex results from reproducible example including sessioninfo():

Take the example from the biplot documentation page, but use spca with keepX instead of pca.

This is also reproducible by following the code example in the sPLS documentation page.

library(mixOmics)
data(nutrimouse)

# original documentation example: standard PCA
scale.pca.lipid <- pca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE)

# use sparse PCA instead, keeping 5 variables on each component
scale.pca.lipid <- spca(nutrimouse$lipid, keepX = c(5, 5, 5), ncomp = 3, scale = TRUE, center = TRUE)

biplot(scale.pca.lipid) # produce the biplot

Screenshot 2024-07-16 at 13 20 14

The same behaviour doesn't occur when not specifying keepX:

scale.pca.lipid <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE) 

biplot(scale.pca.lipid) # produce the biplot

Screenshot 2024-07-16 at 13 23 38


🤔 Expected behavior:

The correlation vectors should not be displayed at right angles; they should behave as they do when keepX is not specified.

@vinisalazar added the bug label on Jul 16, 2024
@vinisalazar
Author

I noticed that the problem appears as keepX moves further below the total number of features:

scale.pca.lipid <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE, keepX=rep(ncol(nutrimouse$lipid) - 10, 3)) 
biplot(scale.pca.lipid)

Screenshot 2024-07-16 at 13 41 40

@Bowen0715

Hi, @vinisalazar, I don’t think this is a bug.

The behavior you’re observing comes from the fundamental difference between spca (sparse principal component analysis) and pca (principal component analysis). pca computes each principal component from all variables, whereas spca imposes sparsity by shrinking some loading coefficients to exactly zero. The keepX parameter sets the number of non-zero coefficients in each sparse principal component (SPC).

When keepX is not explicitly specified, spca defaults to rep(ncol(X), ncomp), meaning all variables are used without sparsity. In this case, spca behaves similarly to pca.

The right angles you see follow from this sparsity. When a variable's coefficient is zero on one SPC but non-zero on the other, its vector has no extent along the first of these, so it is drawn exactly along the other component's axis. Variables selected only on component 1 therefore lie along one axis and variables selected only on component 2 lie along the other, at right angles to each other. Only variables with non-zero coefficients on both plotted components point in an oblique direction. You can confirm this by inspecting each variable's coefficients in each SPC.

Here is an example to illustrate this behavior:

scale.pca.lipid <- spca(nutrimouse$lipid, ncomp = 3, scale = TRUE, center = TRUE,
                        keepX = rep(ncol(nutrimouse$lipid) - 10, 3)) 
biplot(scale.pca.lipid)

# Coefficients of the variables selected on each of the first two components
df1 <- selectVar(scale.pca.lipid, comp = 1)$value
df2 <- selectVar(scale.pca.lipid, comp = 2)$value

rownames1 <- rownames(df1)
rownames2 <- rownames(df2)

# Variables selected on both components, and those selected on only one
common_names <- intersect(rownames1, rownames2)
unique_names1 <- setdiff(rownames1, common_names)
unique_names2 <- setdiff(rownames2, common_names)

all_names <- c(common_names, unique_names1, unique_names2)

# One row per variable; a variable not selected on a component gets 0 there
merged_df <- data.frame(
  variable = all_names,
  component_1 = c(df1[common_names, 1], df1[unique_names1, 1], rep(0, length(unique_names2))),
  component_2 = c(df2[common_names, 1], rep(0, length(unique_names1)), df2[unique_names2, 1])
)

merged_df

image
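
The geometry can also be checked without mixOmics. The sketch below is only an illustration: the loading values and variable names are made up (they are not taken from nutrimouse), and it assumes the biplot arrows are drawn from each variable's coefficients on the two plotted components. It computes the angle of each arrow from the component 1 axis.

```r
# Hypothetical sparse loadings for four variables on two components.
# The numbers are invented for illustration, not output from spca().
loadings <- matrix(
  c(0.8, 0.0,   # var_a: non-zero on component 1 only
    0.6, 0.0,   # var_b: non-zero on component 1 only
    0.0, 0.7,   # var_c: non-zero on component 2 only
    0.5, 0.5),  # var_d: non-zero on both components
  ncol = 2, byrow = TRUE,
  dimnames = list(c("var_a", "var_b", "var_c", "var_d"),
                  c("comp1", "comp2"))
)

# Angle of each arrow measured from the component 1 axis, in degrees.
angle_deg <- atan2(loadings[, "comp2"], loadings[, "comp1"]) * 180 / pi

angle_deg
```

Variables with a zero on one component end up at 0 or 90 degrees, which is the right-angle pattern in the sparse biplot; only var_d, which has non-zero coefficients on both components, points in an oblique direction.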

@vinisalazar
Author

Thanks @Bowen0715 for the detailed explanation.

Pinging @evaham1: I'm not sure whether this issue should be closed if this is expected behaviour.

3 participants