Add first version of script computing the aggregated stats for a release #19

Merged: 8 commits, Sep 4, 2020

Conversation

sbesson
Member

@sbesson sbesson commented Sep 1, 2020

@pwalczysko
Contributor

The script seems to add a newline after every run, but it does not take that newline into account when it is run again on the same file. This results in empty lines.

...
2020-01-16	prod73	0.7.3	12544	1188147		8862811	62872383	158.30338343134298	21.780547	423

2020-03-03	prod80	0.8.0	12569	1195544		9083954	63756762	158.91306092503	22.181109	431
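One possible guard against this, as a minimal sketch rather than the script's actual code: drop any blank lines already in the file and make sure it ends with exactly one newline after appending. The helper name `append_row` and the use of plain file I/O are assumptions made for illustration.

```python
from pathlib import Path


def append_row(path, row):
    """Append a tab-separated row, leaving no blank lines in the file.

    Hypothetical helper: drops empty lines already present and ensures
    the file ends with exactly one newline.
    """
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    # Discard blank lines left behind by earlier runs.
    lines = [line for line in existing if line.strip()]
    lines.append("\t".join(str(value) for value in row))
    path.write_text("\n".join(lines) + "\n")
```

Because the file is rewritten as a whole, running this repeatedly on the same file should not accumulate empty lines, at the cost of reading the full file on each run.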

@pwalczysko
Contributor

The Experiments column seems to be omitted by this script. Is that intentional?

2020-06-30	prod84	0.8.4	12935	1210586	1178081	9104040	64886054	180.8749	23.23	358
2020-07-22	prod85	0.8.5	13027	1210586	1178173	9106609	65211327	181.1312	23.24	359 
2020-01-16	prod73	0.7.3	12544	1188147		8862811	62872383	158.30338343134298	21.780547	423

2020-03-03	prod80	0.8.0	12569	1195544		9083954	63756762	158.91306092503	22.181109	431

The first two rows above are from the previous releases.tsv; the two new lines were added by this script.

@sbesson
Member Author

sbesson commented Sep 3, 2020

Yes, we are now omitting Experiments and Targets, as these concepts need to be redefined (but the schema of the spreadsheet is still used elsewhere). I will look into the empty rows while generating the stats for prod87.
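To illustrate how a row can keep the spreadsheet schema while leaving omitted columns blank, here is a rough sketch using the standard library `csv` module. The column names in `SCHEMA` and the function name `format_row` are hypothetical; the real releases.tsv header and the script's own approach may differ.

```python
import csv
import io

# Hypothetical column names; the real releases.tsv header may differ.
SCHEMA = ["Date", "Release", "Version", "Experiments", "Targets", "Images"]


def format_row(stats, schema=SCHEMA):
    """Return one TSV line following `schema`; missing columns stay empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=schema, delimiter="\t",
        restval="", lineterminator="\n")
    # Keys absent from `stats` are written as empty fields (restval).
    writer.writerow(stats)
    return buffer.getvalue()
```

With this approach, omitted columns such as Experiments show up as consecutive tabs, so downstream consumers that rely on column positions keep working.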

scripts/releases.py (outdated, resolved)
@sbesson
Member Author

sbesson commented Sep 4, 2020

The last commit fixes the empty line issue by passing a csv file to the pandas.DataFrame.to_csv method. It also adds logic that assumes the script is run against studies.tsv in the idr.openmicroscopy.org repository: it checks whether a releases.tsv file is present in the same directory and, if so, appends the new release stats to that file. To test it,

venv/bin/python scripts/releases.py /opt/IDR/idr.openmicroscopy.org/_data/studies.tsv --release-date 2020-09-03 --db-size 412.649968287

should add a line to releases.tsv similar to https://github.com/IDR/idr.openmicroscopy.org/blob/88a32319675166cd0959a445d0eb6bfd31f95c61/_data/releases.tsv#L43
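The sibling-file logic described above could look roughly like the following. This is only a sketch under assumptions: it uses the standard library `csv` module instead of `pandas.DataFrame.to_csv`, and the function name `append_release` is hypothetical.

```python
import csv
from pathlib import Path


def append_release(studies_path, row):
    """Append `row` to releases.tsv next to `studies_path`, if it exists.

    Returns the path written to, or None when no releases.tsv is found.
    """
    releases = Path(studies_path).resolve().parent / "releases.tsv"
    if not releases.is_file():
        return None
    # newline="" lets the csv module control line endings itself.
    with releases.open("a", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerow(row)
    return releases
```

Note that this sketch assumes the existing releases.tsv already ends with a newline; if it does not, the new row would be glued to the last existing line.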

Co-authored-by: Mark Carroll <[email protected]>
parser.add_argument("--format", default="tsv", help=(
"Output format, includes 'string', 'csv', 'tsv' (default), and "
"'json'. "
"'tsv' can be appended to the IDR studies.csv file with no further "
Contributor


Line 80 of the script, which contains the verbose help output, seems misleading. A tsv output can be appended to a studies.csv file? Should that not be releases.tsv instead?
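For reference, a help text that names releases.tsv as the append target might be sketched as below. This is not the script's actual parser: the positional argument name, the `choices` list, the `float` type for `--db-size`, and the help wording are all assumptions; only the flag names `--format`, `--release-date`, and `--db-size` come from this thread.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options mentioned in this thread."""
    parser = argparse.ArgumentParser(
        description="Compute aggregated stats for a release (sketch).")
    # Hypothetical positional name for the studies file path.
    parser.add_argument("studies_file", help="Path to studies.tsv")
    parser.add_argument("--release-date", help="Release date, e.g. 2020-09-03")
    parser.add_argument("--db-size", type=float, help="Database size")
    parser.add_argument(
        "--format", default="tsv",
        choices=["string", "csv", "tsv", "json"],
        # Hypothetical wording that names releases.tsv as the target.
        help="Output format (default: tsv). 'tsv' output can be appended "
             "to the IDR releases.tsv file.")
    return parser
```

Calling `build_parser().parse_args([...])` with the arguments from the test command in this thread would yield `format == "tsv"` by default.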

@pwalczysko
Contributor

Two suggestions for the help text; otherwise it works as described. The empty lines are no longer produced, and the simplified script command (no need to specify a path to releases.tsv) makes things easier, thank you.

@sbesson
Member Author

sbesson commented Sep 4, 2020

Thanks @pwalczysko. Pushed an extra commit to clarify the description of the output and include the new workflow. Once this is signed off, the only remaining step might be to update the Submission workflow accordingly.

@dominikl I assume I should be able to close IDR/idr.openmicroscopy.org#92 with a link to the various PRs, or do you know of any remaining to-dos?

@sbesson sbesson merged commit 871419c into IDR:master Sep 4, 2020
@sbesson sbesson deleted the relases_stats branch September 4, 2020 14:59
3 participants