plotDiffHeatmap : ambiguity when multiple clusterings could match #381

SamGG · 2024-01-05T20:48:01Z

IIUC, when multiple columns of cluster_codes could match the levels of cluster_id, the first one is selected silently.

CATALYST/R/plotDiffHeatmap.R

Lines 140 to 151 in f3e294e

    
           if (is.null(k)) { 
        
               kids <- levels(y$cluster_id) 
        
               same <- vapply(cluster_codes(x), function(u)  
        
                   identical(levels(u), kids), logical(1)) 
        
               if (!any(same))  
        
                   stop("Couldn't match any clustering", 
        
                       " in input data 'x' with results in 'y'.") 
        
               k <- names(cluster_codes(x))[same][1] 
        
           } else { 
        
               k <- .check_k(x, k) 
        
           } 
        
           x$cluster_id <- cluster_ids(x, k)

IMO, if sum(same) > 1 a warning must be raised show the selected column/clustering, or, an error must be raised to force the user to select the clustering column (k argument).

The text was updated successfully, but these errors were encountered:

HelenaLC · 2024-01-08T10:12:58Z

Thanks for this! I agree that this is a weak point. I think the most suitable place for this would actually be in .check_k here, independent of plotDiffHeatmap. Specifically, all functions accepting a k argument typically call k <- .check_k(sce, k) and then use cluster_ids(sce, k) to retrieve the corresponding assignments. So, .check_k could include a warning when multiple clusterings match k, as you suggested - agreed? I'll try to implement this soon, and thank you kindly again.

SamGG · 2024-01-08T12:57:53Z

Thanks for your feedback. Here are my thoughts, probably not crystal clear.

The vapply code I cited above smartly matches the levels of the y diffcyt object with the levels of the x CATALYST object. This goal could not be addressed by .check_k because .check_k works only x.
In fact, I think I raised the current issue because I wrongly tried to hack the CATALYST object in order to manage multiple independent clustering. Having reviewed the .check_k and cluster_ids functions, it's now clear to me that it is not possible to add a column to cluster_codes. cluster_codes and cluster_ids are intimately linked together. The aims of cluster_codes is to store the hierarchical merging of high a resolution clustering into lower resolution clusterings. There could be only one single high resolution clustering in cluster_codes. If I want to add a clustering (a high resolution clustering, e.g. the result of kmeans), I need to replace cluster_ids AND cluster_codes together. That's what you wrote in section 8.2 of the vignette, but I read it too quickly.

So, I think that:

there is no need to change .check_k.
there should be no reason to get a more than one result in the vapply code, but it would be good to raise an error in the case there is more than one result.
we could try to improve the vignette by a) first stating the link/complementary between cluster_ids and cluster_codes and b) giving a simpler example such as kmeans or encompassing the current code in a function that returns a simple vector of numerical values.

We can discuss this offline if you want. Sorry for my misunderstanding.

HelenaLC mentioned this issue Jan 8, 2024

plotDiffHeatmap: avoid failure when only one cluster remains #382

Closed

SamGG mentioned this issue Jan 8, 2024

Using a different clustering per cell #187

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plotDiffHeatmap : ambiguity when multiple clusterings could match #381

plotDiffHeatmap : ambiguity when multiple clusterings could match #381

SamGG commented Jan 5, 2024

HelenaLC commented Jan 8, 2024 •

edited

Loading

SamGG commented Jan 8, 2024

plotDiffHeatmap : ambiguity when multiple clusterings could match #381

plotDiffHeatmap : ambiguity when multiple clusterings could match #381

Comments

SamGG commented Jan 5, 2024

HelenaLC commented Jan 8, 2024 • edited Loading

SamGG commented Jan 8, 2024

HelenaLC commented Jan 8, 2024 •

edited

Loading