Use cell-by-gene matrix files, not metadata, to count cells of a given type #95

Open
chmreid opened this issue Dec 19, 2019 · 1 comment

Collaborator

chmreid commented Dec 19, 2019

In the find-cell-type-count notebook, the example constructs an Elasticsearch query to find cells matching a given type, then counts the cells matching those criteria. However, the notebook reads the cell counts from metadata, which is unreliable and confusing (most metadata records report a cell count of 1).

Instead, we should find the file containing the cell-by-gene matrix and count its rows to get the number of cells of that type.
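A minimal sketch of the row-counting idea, not the notebook's actual code. It assumes the matrix is a local CSV file with one row per cell and a single header row of gene names; the function name `count_cells` and the `has_header` parameter are hypothetical, and the real matrix layout may differ.

```python
import csv


def count_cells(matrix_csv_path, has_header=True):
    """Count data rows in a cell-by-gene CSV.

    Assumption: each data row corresponds to one cell.
    """
    with open(matrix_csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            # Skip the header row (assumed to hold gene names).
            next(reader, None)
        # Stream through the file so the matrix is never held in memory;
        # blank lines are ignored.
        return sum(1 for row in reader if row)
```

If the matrix is stored gene-by-cell instead, the count would come from the number of columns in the header row rather than the number of rows.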

chmreid added this to the Q1 2020 Milestone 1 milestone Jan 7, 2020
Collaborator Author

chmreid commented Jan 7, 2020

Unfortunately, getting cell counts this way is very inefficient and data-intensive. Getting a cell count requires downloading the matrix file (CSV format) and reading it to count its lines. If an ES query returns thousands of results, we could end up downloading gigabytes of data just to get the cell counts. I really don't understand why this step isn't being done during ingest/upload.
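The download cost can at least be kept off disk and out of memory by counting lines as the bytes arrive. The sketch below is only an illustration: `count_lines` is a hypothetical helper that accepts any binary file-like object with a `read()` method (for example, the response returned by `urllib.request.urlopen`), and it assumes one line per cell plus one header line. The full file still has to be transferred, so this does not fix the bandwidth problem.

```python
import io


def count_lines(stream, chunk_size=1 << 20):
    """Count lines in a binary stream by reading fixed-size chunks."""
    lines = 0
    last_byte = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines += chunk.count(b"\n")
        last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        lines += 1
    return lines


# Usage with an in-memory stand-in for a downloaded matrix file:
sample = io.BytesIO(b"cell,geneA\nc1,0\nc2,1\n")
cell_count = count_lines(sample) - 1  # subtract the assumed header line
print(cell_count)  # prints 2
```

Counting raw newlines is cheaper than parsing CSV, but it would overcount if any quoted field contained an embedded newline.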
