Update release stats #92
Comments
In addition we have a spreadsheet which is almost but not quite the same format as these tsv files. It'd be good to make sure the solution here is also correct for the spreadsheet (or maybe we can get rid of it?)
Part of this is the split between "Plates" and "Datasets". I also often have to figure it out by context. Happy to have the output format from the script be made more explicit.
Bytes from stats.py was my first attempt at a size via SQL. It was pointed out that 1) my query was wrong and 2) it doesn't match what
Yes.
This is a difficult one; it likely hasn't been maintained, or even defined, since Eleanor left.
Again, this is just an easier to read version of
I think we have some diversity here. I'd suggest
👍 for having the solution work for both. I still use the spreadsheet, so until we have everything in one place I'd be 👎 for getting rid of it.
A few additional comments,
Re csv vs spreadsheet: I am pretty sure the headers matched when I created the tsv files. If that's not the case, I am all for re-aligning them, as it should work as cut-n-paste.
Proposed actions:
See IDR/idr.openmicroscopy.org#92 (comment) Use pandas to sum totals Split idrNNNN-aaaa-bbbb/screenA into separate fields
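The two changes named in that commit (summing totals and splitting `idrNNNN-aaaa-bbbb/screenA` into separate fields) could look roughly like the sketch below. It is not the code from the referenced commit: it uses only the standard library instead of pandas to stay dependency-free, it assumes `stats.py` output has been saved as tab-separated text with the header shown in this issue, and the names `split_container`, `sum_columns` and the output keys are made up for illustration.

```python
import csv
import io
import re

# Assumed naming pattern: idrNNNN-<first>-<second>/<container>.
# The second name part may itself contain hyphens.
CONTAINER_RE = re.compile(r"^(idr\d{4})-([^-/]+)-([^/]+)/(.+)$")


def split_container(name):
    """Split e.g. 'idr0052-walther-condensinmap/experimentA' into fields."""
    match = CONTAINER_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected container name: {name!r}")
    study, first, second, container = match.groups()
    return {"study": study, "first": first, "second": second,
            "container": container}


def sum_columns(tsv_text, columns=("Wells", "Images", "Planes")):
    """Sum the integer columns of a tab-separated table with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    totals = dict.fromkeys(columns, 0)
    for row in reader:
        for column in columns:
            totals[column] += int(row[column])
    return totals


if __name__ == "__main__":
    # The example row quoted in this issue, rewritten as TSV.
    table = (
        "Container\tID\tSet\tWells\tImages\tPlanes\tBytes\n"
        "idr0052-walther-condensinmap/experimentA\t752\t44 of 54\t0\t282\t699360\t85.4 GB\n"
    )
    print(split_container("idr0052-walther-condensinmap/experimentA"))
    print(sum_columns(table))
```

The `Set` and `Bytes` columns are left out of the sum on purpose, since their values are not plain integers.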
I think IDR/idr-utils#16 addresses most of the issues raised above related to For
After each release the stats have to be updated. Most figures can be acquired via `omero fs usage` and the `stats.py` script.

Problem 1:
studies.tsv wants:
Study | Container | Introduced | Internal ID | Sets | Wells | Experiments (wells for screens, imaging experiments for non-screens) | Targets (genes, small molecules, geographic locations, or combination of factors (idr0019, 26, 34, 38)) | Acquisitions | 5D Images | Planes | Size (TB) | Size | # of Files | avg. size (MB) | Avg. Image Dim (XYZCT)
From `stats.py` you'll get:
Container | ID | Set | Wells | Images | Planes | Bytes
Example:
idr0052-walther-condensinmap/experimentA | 752 | 44 of 54 | 0 | 282 | 699360 | 85.4 GB
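If the human-readable `Bytes` value is what ends up feeding `Size (TB)`, it first has to be turned back into a number. The sketch below shows one way to do that. Whether `stats.py` means decimal (1000-based) or binary (1024-based) units is not stated in this issue, so the base is left as a parameter, and the function names are hypothetical.

```python
import re

# Unit suffixes in increasing order; the exponent is the list index.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def size_to_bytes(text, base=1000):
    """Convert a string such as '85.4 GB' to a byte count.

    `base` is an assumption: use 1000 for decimal units, 1024 for binary.
    """
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGTP]?B)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    value, unit = match.groups()
    return float(value) * base ** UNITS.index(unit)


def bytes_to_tb(n_bytes, base=1000):
    """Express a byte count in terabytes for the given base."""
    return n_bytes / base ** 4


if __name__ == "__main__":
    n = size_to_bytes("85.4 GB")
    print(n, bytes_to_tb(n))
```

Going through a rounded string like `85.4 GB` loses precision, so an exact byte count would be the better input wherever one is available.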
What does `44 of 54` sets mean? What is `Bytes`, does that have to be used for `Size (TB)` and `Size`?

`omero fs usage` gives you something like `Total disk usage: 115773571855 bytes in 25 files`. What about this size? And is the `25 files` the `# of Files`?

The workflow doc has an HQL query for getting the `Avg. Image Dim (XYZCT)`, but only for projects, not for screens.

And how to get `Targets`? As this can be multiple things, I can't think of an easy/generic script which can go through any annotation.csv and pull the number of unique 'targets'.

Problem 2
releases.tsv wants:
Date | Data release | Code version | Sets | Wells | Experiments | Images | Planes | Size (TB) | Files (Million) | DB Size (GB)
From stats.py you'll get some of it:
Container | ID | Set | Wells | Images | Planes | Bytes
Total | | 13044 | 1213175 | 9150589 | 65571290 | 334.2 TB
But where to get `Files (Million)` from? And how to get `DB Size (GB)`?

/cc @sbesson wasn't really sure where to open the issue, here (stats) or idr-utils (stats.py script).
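If the `omero fs usage` summary line quoted earlier does turn out to be the right source for the size and file-count columns (which is exactly the open question here), the figures could be derived along these lines. This is a sketch under that assumption: the regular expression is based only on the single example line in this issue, the use of decimal units and the reading of `avg. size (MB)` as average file size are guesses, and `DB Size (GB)` is not covered at all.

```python
import re

# Pattern taken from the one example line quoted in this issue.
USAGE_RE = re.compile(r"Total disk usage: (\d+) bytes in (\d+) files?")


def parse_fs_usage(output):
    """Return (total_bytes, n_files) from `omero fs usage` output text."""
    match = USAGE_RE.search(output)
    if match is None:
        raise ValueError("no 'Total disk usage' line found")
    return int(match.group(1)), int(match.group(2))


def release_figures(total_bytes, n_files, base=1000):
    """Derive spreadsheet-style figures; `base` (1000 vs 1024) is assumed."""
    avg_mb = total_bytes / n_files / base ** 2 if n_files else 0.0
    return {
        "Size (TB)": total_bytes / base ** 4,
        "Files (Million)": n_files / 1_000_000,
        "avg. size (MB)": avg_mb,  # assumed to mean average file size
    }


if __name__ == "__main__":
    line = "Total disk usage: 115773571855 bytes in 25 files"
    total_bytes, n_files = parse_fs_usage(line)
    print(release_figures(total_bytes, n_files))
```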