Stats updates #94
Run the latest version of the stats.py script
Looks good. There might be only one issue. I should have remarked on it in IDR/idr0043-uhlen-humanproteinatlas#32 but got confused by the IDR release numbers. If you used the command from that PR, then the figures for HPA run 08 are already included (
Sorry @dominikl, I should have clarified my intent. This should be adjusting the size of For the imminent
idr0086-miron-micrographs experimentD prod85 1161 2 0 0 11 10546 0.004104873039 4104873039 34 120.73155997058824 1018 x 602 x 503 x 2 x 1
idr0087-paci-nuclearimport experimentA prod85 1157 38 0 0 456 50976 0.04848585023 48485850230 1370 35.391131554744526 640 x 640 x 1 x 3 x 37
idr0048-abdeladim-chroms experimentA prod86 1201 1 0 0 2 4479 0.129647463006 129647463006 127 1020.8461654015748 11034 x 9271 x 747 x 3 x 1
idr0085-walsh-mfhrem experimentA prod86 1202 3 0 0 7 15206 0.115773571855 115773571855 25 4630.9428742 2076 x 1681 x 1160 x 2 x 1
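The derived size columns in these rows can be checked from the raw byte count and the file count. A minimal sketch, assuming (as all four rows above are consistent with) that the TB column is bytes / 10^12 and the average file size column is bytes / number of files / 10^6, i.e. decimal units. The function name is hypothetical and not part of stats.py.

```python
def derived_size_columns(size_bytes: int, n_files: int) -> tuple[float, float]:
    """Return (size in TB, average file size in MB), both in decimal units.

    Hypothetical helper: the unit conventions are inferred from the rows above.
    """
    size_tb = size_bytes / 1e12
    avg_mb = size_bytes / n_files / 1e6
    return size_tb, avg_mb


# Raw size and file count copied from the idr0086-miron-micrographs experimentD row
tb, mb = derived_size_columns(4104873039, 34)
print(tb, mb)
```

Recomputing these columns from the two raw numbers, rather than editing them by hand, keeps the TB and MB figures in sync whenever the size or file count is adjusted.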
I can see that these two lines were added at the end of the document, but shouldn't they rather be inserted at the correct position, to keep the rows in ascending order of study number?
I don't think it is a requirement. My assumption is that this TSV file grows as studies get released, so the natural order is rather to have the Introduced column in ascending order.
Consumers of the TSV file like https://idr.openmicroscopy.org/about/studies.html should be able to do the filtering and sorting by their column of choice.
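As an illustration of that consumer-side sorting, a minimal sketch that parses tab-separated study rows and orders them either by study name or by release. It assumes a header row with columns named `Study` and `Introduced` and release labels of the form `prodNN`; those column names and the helpers `release_number` and `sort_studies` are assumptions for illustration, not part of the IDR tooling.

```python
import csv
import io
import re


def release_number(label: str) -> int:
    """Turn a release label such as 'prod85' into 85 so releases sort numerically."""
    match = re.fullmatch(r"prod(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected release label: {label!r}")
    return int(match.group(1))


def sort_studies(tsv_text: str, by: str = "Introduced") -> list[dict[str, str]]:
    """Parse tab-separated text with a header row and sort rows by the chosen column."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if by == "Introduced":
        # Numeric release order first, then study name to break ties
        return sorted(rows, key=lambda r: (release_number(r["Introduced"]), r["Study"]))
    return sorted(rows, key=lambda r: r[by])


# Hypothetical three-column excerpt built from the rows discussed above
example = (
    "Study\tContainer\tIntroduced\n"
    "idr0087-paci-nuclearimport\texperimentA\tprod85\n"
    "idr0048-abdeladim-chroms\texperimentA\tprod86\n"
    "idr0086-miron-micrographs\texperimentD\tprod85\n"
)
for row in sort_studies(example, by="Study"):
    print(row["Study"], row["Introduced"])
```

Parsing the release label into an integer avoids lexical surprises once release numbers gain a digit (for example `prod100` sorting before `prod99` as a plain string).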
As for studies.tsv: when I followed the workflow above and inserted my changes into studies.tsv, I got the following discrepancies with the changes in this PR
The lower line is my change, the upper one the change in this PR.
Correction to the studies.tsv comment #94 (comment) above: when I take just runs 1-7 for the HPA numbers, as indicated in comment #94 (comment) above, the only diff I have with this PR is
which, as far as I can see, amounts only to a rounding error. Edit: No, sorry, there is a discrepancy in the second number from the left, but this is because I did not change that one at all (not sure how to count that). Edit 2: I have recomputed the average file size by dividing the size by the number of files, and now it really is just rounding errors.
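To separate genuine discrepancies from rounding noise when diffing two versions of a row, one option is to compare numeric fields with a relative tolerance and all other fields exactly. A sketch under that assumption: the default tolerance of 1e-6 is an arbitrary illustrative choice, and `rows_match` is a hypothetical helper, not something stats.py provides.

```python
import math


def _as_float(value: str):
    """Return the field as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def rows_match(a: list[str], b: list[str], rel_tol: float = 1e-6) -> bool:
    """Compare two TSV rows field by field, tolerating small float differences.

    Hypothetical helper; numeric fields use math.isclose, text fields must be equal.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        fx, fy = _as_float(x), _as_float(y)
        if fx is not None and fy is not None:
            if not math.isclose(fx, fy, rel_tol=rel_tol):
                return False
        elif x != y:
            return False
    return True


# An average file size written with fewer digits still matches; a different file count does not
print(rows_match(["idr0085-walsh-mfhrem", "4630.9428742"], ["idr0085-walsh-mfhrem", "4630.942874"]))
print(rows_match(["idr0085-walsh-mfhrem", "25"], ["idr0085-walsh-mfhrem", "26"]))
```

A relative rather than absolute tolerance suits this file because the numeric columns span many orders of magnitude, from fractions of a TB to byte counts in the hundreds of billions.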
After running the script from IDR/idr-utils#19 on my studies.tsv file, removing the blank lines between rows and overwriting the existing lines with the new block created by the script, I have a perfect match on
I made 2 comments on the other PR: IDR/idr-utils#19 (comment) and IDR/idr-utils#19 (comment)
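The manual step of dropping blank lines and overwriting existing rows with a regenerated block can be expressed as a keyed merge. A minimal sketch, assuming each row is identified by its first two fields (study and container) and that regenerated rows take precedence; `merge_rows` is hypothetical and does not describe how the idr-utils#19 script itself writes its output.

```python
def merge_rows(existing: list[list[str]], regenerated: list[list[str]]) -> list[list[str]]:
    """Replace rows whose (study, container) key appears in `regenerated`.

    Hypothetical helper: blank rows are dropped, existing rows keep their
    position, and regenerated rows with new keys are appended at the end.
    """
    updates = {tuple(row[:2]): row for row in regenerated}
    merged = []
    seen = set()
    for row in existing:
        # Skip empty or whitespace-only rows left between blocks
        if not any(field.strip() for field in row):
            continue
        key = tuple(row[:2])
        merged.append(updates.get(key, row))
        seen.add(key)
    merged.extend(row for row in regenerated if tuple(row[:2]) not in seen)
    return merged


old = [["idr0085-walsh-mfhrem", "experimentA", "1"], [], ["idr0086-miron-micrographs", "experimentD", "2"]]
new = [["idr0086-miron-micrographs", "experimentD", "3"], ["idr0087-paci-nuclearimport", "experimentA", "4"]]
for row in merge_rows(old, new):
    print("\t".join(row))
```

Appending unseen keys at the end matches the convention discussed earlier in this thread, where new studies extend the file in release order.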
@pwalczysko so barring the rounding errors on HPA (which will be updated with
No objections
I will try tomorrow
Re
@sbesson I tried the following: On
See below
This results in a smooth run of the script and creation of the
Summary of changes
- recompute the stats for studies published between `prod72` and `prod86`
  - depends on the changes from stats.py: fix calculation of Sets idr-utils#18
  - most changes should be minor and only adjust the data size (and the conversion to TB), number of files, average file size and average image dimensions.
- manually adjust the raw data size/number of files for `idr0043` using the numbers from Add size/number of files for all the published and upcoming HPA runs idr0043-uhlen-humanproteinatlas#32
- regenerate the release stats for `prod73` to `prod86` using the `releases.py` script introduced in Add first version of script computing the aggregated stats for a release idr-utils#19
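The idea of per-release aggregated stats can be illustrated by grouping study rows by their Introduced release and summing the additive columns. This is a minimal sketch, not the `releases.py` script from idr-utils#19: the `StudyRow` fields and the choice of summing only size and file count are assumptions, and the sample values are the four rows quoted earlier in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StudyRow:
    # Hypothetical subset of the studies.tsv columns
    study: str
    introduced: str  # release label, e.g. "prod85"
    size_bytes: int
    n_files: int


def aggregate_by_release(rows: list[StudyRow]) -> dict[str, dict[str, float]]:
    """Sum size and file count over all rows introduced in each release."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: {"size_bytes": 0, "n_files": 0})
    for row in rows:
        entry = totals[row.introduced]
        entry["size_bytes"] += row.size_bytes
        entry["n_files"] += row.n_files
    # Same decimal-TB conversion as the per-study size column
    for entry in totals.values():
        entry["size_tb"] = entry["size_bytes"] / 1e12
    return dict(totals)


rows = [
    StudyRow("idr0086-miron-micrographs", "prod85", 4104873039, 34),
    StudyRow("idr0087-paci-nuclearimport", "prod85", 48485850230, 1370),
    StudyRow("idr0048-abdeladim-chroms", "prod86", 129647463006, 127),
    StudyRow("idr0085-walsh-mfhrem", "prod86", 115773571855, 25),
]
agg = aggregate_by_release(rows)
for release in sorted(agg):
    print(release, agg[release])
```

Summing raw bytes and converting to TB only at the end avoids accumulating rounding error from already-rounded per-study TB values, which is the kind of discrepancy discussed in the review above.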