In the find-cell-type-count notebook, the example constructs an Elasticsearch query to find cells matching a given type, then counts the number of cells matching those criteria. However, the notebook uses metadata to get cell counts, which is unreliable and confusing (most metadata contains a cell count of 1).
Instead, we should find the matrix file containing the cell-by-gene matrix and count the number of rows in that file to obtain the number of cells of that type.
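A minimal sketch of the row-counting step, assuming the matrix is a CSV file with one row per cell and a single header row naming the genes. The function name `count_cells` and the `has_header` parameter are hypothetical and are not taken from the notebook. The file is streamed row by row, so it is never held in memory all at once.

```python
import csv
import io


def count_cells(matrix_file, has_header=True):
    """Count the data rows in a CSV cell-by-gene matrix.

    Assumes one row per cell; `matrix_file` is any open text-mode
    file-like object. Blank lines are not counted.
    """
    reader = csv.reader(matrix_file)
    if has_header:
        # Skip the header row (assumed to hold gene names), if present.
        next(reader, None)
    return sum(1 for row in reader if row)


# Tiny in-memory stand-in for a downloaded matrix file (made-up values).
sample = io.StringIO("cell_id,GENE_A,GENE_B\nc1,0,3\nc2,5,1\nc3,2,0\n")
print(count_cells(sample))
```

For a file on disk, the same function can be called as `count_cells(open(path, newline=""))`, where `path` is wherever the matrix file was saved.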
Unfortunately, getting cell counts this way is very inefficient and data-intensive. Each count requires downloading the matrix file (CSV format) and reading it to determine how many lines it contains. If an ES query returns thousands of results, we could end up downloading gigabytes of data just to get the cell counts. I really don't understand why this step isn't being done during ingest/upload.