Determine how release statistics should be stored #30
Comments
Hi,
The only reason these tables existed was for caching purposes, as querying
the old db for this info could take down the site ... especially for ADAMK!
The colour bar on the site (green-amber-red = pass-na-fail) used the
release_summary to build the cache (summary) to avoid expensive calls for
each page load.
I think the release_summary was also used by MetaCPAN, to generate their
view of CPAN Testers for each distro. See release-summary.cgi
Cheers,
Barbie.
--
Birmingham.pm - http://birmingham.pm.org
YAPC Surveys - http://yapc-surveys.org
Perl Jam - http://perljam.info
On Wed, Jun 20, 2018 at 5:44 PM, Doug Bell ***@***.***> wrote:
Presently, the per-release summary statistics are stored in two tables:
release_data and release_summary. These two tables have the exact same
schema, but slightly different uses:
- The release_data table stores one row per test report. One of the
pass, fail, na, unknown columns will have a 1 in it.
- The release_summary table stores one row per distribution version.
The pass, fail, na, and unknown columns will have the count of each
test report grade.
In essence, the release_summary table is the sum of all the related
release_data rows (this is also technically a duplication of the cpanstats
table (which, technically is a duplication of test_report table with data
extracted from the JSON)).
Now that we have a dedicated database server with a few more CPU cycles
than we had previously, we can look at how we store this data: Do we need
the intermediate state of the release_data table, or can we just store
the release_summary? Or, should we avoid the further step of summing the
values and storing them in release_summary and just keep release_data? Or
can we get rid of these tables entirely and just build this data on-the-fly
from cpanstats?
Yep, presently the release summary APIs use the But, we may not need to generate and store the derived data anymore. I find it highly unlikely, for the same reasons you mentioned, but it might be possible to do all of this on-the-fly.

But, if it ends up that we do need to generate and store the derived data, we might not need both steps to be stored. It might be possible to drop More likely, the release summary data may be able to be generated on-the-fly from the I'm not confident that any improvements can be made here, but it's something we can look into.

The smaller the schema we have, the easier it will be to start deriving all this data for other languages (like Perl 6). Also, if we can derive this data easily, we can offer more query options from the API side.
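As a rough illustration of the on-the-fly option, the per-release counts could be computed with conditional sums over a per-report table instead of being stored. This is only a sketch under assumptions: it uses an in-memory SQLite database, assumes the source is a table like `cpanstats` (which the issue body names as a candidate) with hypothetical `dist`, `version`, and `state` columns, and the `release_counts` helper is invented for the example. It says nothing about whether this would be fast enough on the real data.

```python
import sqlite3

# The four grades named in the issue.
GRADES = ("pass", "fail", "na", "unknown")


def release_counts(conn, dist, version):
    """Count reports per grade for one release, straight from the report table."""
    # One conditional SUM per grade. The grade names come from the fixed
    # GRADES tuple, never from user input, so formatting them into SQL is safe.
    sums = ", ".join(
        f"COALESCE(SUM(CASE WHEN state = '{g}' THEN 1 ELSE 0 END), 0)"
        for g in GRADES
    )
    row = conn.execute(
        f"SELECT {sums} FROM cpanstats WHERE dist = ? AND version = ?",
        (dist, version),
    ).fetchone()
    return dict(zip(GRADES, row))


conn = sqlite3.connect(":memory:")
# Assumed, heavily trimmed schema; the real table has more columns.
conn.execute(
    "CREATE TABLE cpanstats (id INTEGER PRIMARY KEY, dist TEXT, version TEXT, state TEXT)"
)
# An index on the lookup columns is what would keep the per-page query cheap.
conn.execute("CREATE INDEX cpanstats_release ON cpanstats (dist, version)")
conn.executemany(
    "INSERT INTO cpanstats (dist, version, state) VALUES (?, ?, ?)",
    [
        ("Foo-Bar", "1.0", "pass"),
        ("Foo-Bar", "1.0", "pass"),
        ("Foo-Bar", "1.0", "fail"),
        ("Foo-Bar", "1.1", "na"),
    ],
)

counts = release_counts(conn, "Foo-Bar", "1.0")
missing = release_counts(conn, "No-Such-Dist", "0.1")
```

A release with no reports comes back with all-zero counts rather than an error, because an aggregate without `GROUP BY` always yields one row.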
Presently, the per-release summary statistics are stored in two tables: `release_data` and `release_summary`. These two tables have the exact same schema, but slightly different uses:

- The `release_data` table stores one row per test report. One of the `pass`, `fail`, `na`, `unknown` columns will have a `1` in it.
- The `release_summary` table stores one row per distribution version. The `pass`, `fail`, `na`, and `unknown` columns will have the count of each test report grade.

In essence, the `release_summary` table is the sum of all the related `release_data` rows (this is also technically a duplication of the `cpanstats` table (which, technically is a duplication of `test_report` table with data extracted from the JSON)).

Now that we have a dedicated database server with a few more CPU cycles than we had previously, we can look at how we store this data: Do we need the intermediate state of the `release_data` table, or can we just store the `release_summary`? Or, should we avoid the further step of summing the values and storing them in `release_summary` and just keep `release_data`? Or can we get rid of these tables entirely and just build this data on-the-fly from `cpanstats`?
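The relationship between the two tables can be sketched as a grouped sum: each `release_summary` row is the column-wise total of the `release_data` rows for one distribution version. The sketch below is a minimal illustration in an in-memory SQLite database, not the project's actual code. Only the four grade columns come from the issue; the `dist` and `version` key columns, the sample rows, and the `summarize` helper are assumptions.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the four grade columns are named
# in the issue; `dist` and `version` are assumed key columns.
SCHEMA = """
CREATE TABLE release_data (
    dist TEXT, version TEXT,
    "pass" INTEGER, "fail" INTEGER, "na" INTEGER, "unknown" INTEGER
);
"""


def summarize(conn):
    """Return one summed row per (dist, version), as release_summary would hold."""
    return conn.execute(
        """
        SELECT dist, version,
               SUM("pass"), SUM("fail"), SUM("na"), SUM("unknown")
        FROM release_data
        GROUP BY dist, version
        ORDER BY dist, version
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# One row per test report: exactly one grade column holds a 1.
conn.executemany(
    "INSERT INTO release_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Foo-Bar", "1.0", 1, 0, 0, 0),
        ("Foo-Bar", "1.0", 1, 0, 0, 0),
        ("Foo-Bar", "1.0", 0, 1, 0, 0),
        ("Foo-Bar", "1.1", 0, 0, 1, 0),
    ],
)

summary = summarize(conn)
```

Here the three reports for version 1.0 collapse into a single row with two passes and one fail, which is the duplication the issue describes.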