Use cell-by-gene matrix files, not metadata, to count cells of a given type #95

chmreid · 2019-12-19T20:42:34Z

In the find-cell-type-count notebook, the example constructs an Elasticsearch query to find cells matching a given type, then counts the number of cells matching that criterion. However, the notebook derives the cell counts from metadata, which is unreliable and confusing: most metadata records report a cell count of 1.

Instead, we should find the file containing the cell-by-gene matrix and count its rows; that row count is the number of cells of that type.
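A minimal sketch of the row-counting step, assuming the matrix is a CSV with one row per cell (cells as rows, genes as columns) and a single header row of gene names. The function name `count_cells_in_matrix` and the `has_header` parameter are hypothetical, not part of the notebook. It uses Python's `csv` module rather than counting raw newlines, so quoted fields containing line breaks are not miscounted.

```python
import csv
import io
from typing import TextIO


def count_cells_in_matrix(handle: TextIO, has_header: bool = True) -> int:
    """Count cells in a cell-by-gene CSV matrix (hypothetical helper).

    Assumes one row per cell and, when has_header is True, a single
    header row of gene names that must not be counted.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the gene-name header row, if present
    # csv.reader yields an empty list for blank lines; skip those.
    return sum(1 for row in reader if row)


if __name__ == "__main__":
    example = io.StringIO(
        "cell_id,GENE_A,GENE_B\n"
        "cell_1,0,3\n"
        "cell_2,5,0\n"
        "cell_3,1,1\n"
    )
    print(count_cells_in_matrix(example))  # prints 3
```

If a given matrix is stored genes-by-cells instead, the cell count would be the number of columns (minus any ID column), not rows, so the orientation should be checked before relying on this.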


chmreid · 2020-01-07T19:21:03Z

Unfortunately, getting cell counts this way is very inefficient and data-intensive. Getting a cell count requires downloading the matrix file (CSV format) and reading it to determine how many lines it contains. If an Elasticsearch query returns thousands of results, we could end up downloading gigabytes of data just to get the cell counts. I really don't understand why this count isn't computed during ingest/upload.
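A rough sketch of the ingest-time alternative suggested above, under stated assumptions: the `cell_count` field, the `make_index_record` and `total_cells` helpers, and the in-memory list standing in for the Elasticsearch index are all hypothetical. Streaming keeps memory use small, but every byte of each matrix is still read; doing that once at ingest means a query only has to sum small stored integers instead of downloading every matching matrix.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, TextIO


def count_rows(handle: TextIO) -> int:
    """Stream a cell-by-gene CSV row by row, skipping the header row.

    Memory use stays small, but the whole file is still read.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the gene-name header row
    return sum(1 for row in reader if row)


def make_index_record(file_id: str, handle: TextIO) -> Dict[str, Any]:
    """Hypothetical ingest step: compute the count once, store it as metadata."""
    return {"file_id": file_id, "cell_count": count_rows(handle)}


def total_cells(hits: Iterable[Dict[str, Any]]) -> int:
    """Query time: sum precomputed counts from search hits; no matrix downloads."""
    return sum(int(hit["cell_count"]) for hit in hits)


if __name__ == "__main__":
    matrices = {
        "file-a": "cell_id,GENE_A\nc1,1\nc2,0\n",
        "file-b": "cell_id,GENE_A\nc3,4\n",
    }
    index: List[Dict[str, Any]] = [
        make_index_record(fid, io.StringIO(text)) for fid, text in matrices.items()
    ]
    print(total_cells(index))  # prints 3
```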

chmreid added the added to sprint label Dec 19, 2019

chmreid added this to the Q1 2020 Milestone 1 milestone Jan 7, 2020

chmreid removed the added to sprint label Jan 7, 2020
