
pageserver: reduce ops tracked at per-timeline detail #8245

Merged: jcsp merged 1 commit into main from jcsp/issue-8223-reduce-timeline-metrics on Jul 3, 2024

@jcsp commented Jul 3, 2024

Problem

We record detailed per-timeline histograms for every page_service operation type. Most of these are of little interest, but they make our Prometheus scrapes very large.

Closes: #8223

Summary of changes

  • Only track GetPageAtLsn histograms at per-timeline granularity. For all other operation types, rely on the existing node-wide histograms.
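A minimal sketch of the gating idea. Everything in it is hypothetical: the `PageServiceOp` enum, `tracks_per_timeline`, and the `HashMap`-backed `Metrics` type stand in for the real Prometheus histograms and are not the pageserver's actual code. It shows why the change shrinks scrapes. The node-wide series scale with the number of op types, while the per-timeline series now scale only with the number of timelines.

```rust
use std::collections::HashMap;

/// Hypothetical subset of page_service operation types (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PageServiceOp {
    GetPageAtLsn,
    GetRelExists,
    GetRelSize,
    GetDbSize,
}

impl PageServiceOp {
    /// Only GetPageAtLsn keeps a per-timeline histogram; every other op
    /// is recorded only in the node-wide histogram.
    fn tracks_per_timeline(self) -> bool {
        matches!(self, PageServiceOp::GetPageAtLsn)
    }
}

/// Toy stand-in for Prometheus histograms: raw observations keyed by label set.
#[derive(Default)]
struct Metrics {
    node_wide: HashMap<PageServiceOp, Vec<f64>>,
    per_timeline: HashMap<(String, PageServiceOp), Vec<f64>>,
}

impl Metrics {
    fn observe(&mut self, timeline_id: &str, op: PageServiceOp, seconds: f64) {
        // The node-wide series is always updated.
        self.node_wide.entry(op).or_default().push(seconds);
        // A per-timeline series only exists for ops that opt in.
        if op.tracks_per_timeline() {
            self.per_timeline
                .entry((timeline_id.to_string(), op))
                .or_default()
                .push(seconds);
        }
    }

    /// Number of distinct label sets, a rough proxy for scrape size.
    fn series_count(&self) -> usize {
        self.node_wide.len() + self.per_timeline.len()
    }
}

fn main() {
    let mut m = Metrics::default();
    let ops = [
        PageServiceOp::GetPageAtLsn,
        PageServiceOp::GetRelExists,
        PageServiceOp::GetRelSize,
        PageServiceOp::GetDbSize,
    ];
    for tl in ["tl-a", "tl-b", "tl-c"] {
        for op in ops {
            m.observe(tl, op, 0.001);
        }
    }
    println!("series: {}", m.series_count());
}
```

With three timelines and four op types this prints `series: 7` (4 node-wide plus 3 per-timeline), compared with 16 if every op type kept its own per-timeline histogram.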

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat the commit message so that it does not include the above checklist

@jcsp added the labels c/storage/pageserver (Component: storage: pageserver) and a/tech_debt (Area: related to tech debt) on Jul 3, 2024

github-actions bot commented Jul 3, 2024

3006 tests run: 2891 passed, 0 failed, 115 skipped (full report)


Code coverage* (full report)

  • functions: 32.7% (6932 of 21211 functions)
  • lines: 50.0% (54316 of 108576 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
0e54d27 at 2024-07-03T13:18:33.885Z

@jcsp marked this pull request as ready for review July 3, 2024 13:26
@jcsp requested a review from a team as a code owner July 3, 2024 13:26
@jcsp requested a review from VladLazar July 3, 2024 13:26

@VladLazar left a comment

lgtm

@jcsp merged commit ea0b22a into main Jul 3, 2024
71 checks passed
@jcsp deleted the jcsp/issue-8223-reduce-timeline-metrics branch July 3, 2024 16:27
jcsp added a commit that referenced this pull request Jul 3, 2024
## Problem

The metrics we have today aren't convenient for planning around the
impact of timeline archival on costs.

Closes: #8108

## Summary of changes

- Add metric `pageserver_archive_size`, which indicates the logical
bytes of data which we would expect to write into an archived branch.
- Add metric `pageserver_pitr_history_size`, which indicates the
distance between last_record_lsn and the PITR cutoff.
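As a worked example of the `pageserver_pitr_history_size` definition, the sketch below computes the distance as a plain byte difference between two LSNs. The `Lsn` newtype and `pitr_history_size` function are hypothetical illustrations, and saturating to 0 when the cutoff is at or ahead of last_record_lsn is an assumption, not something the commit message states.

```rust
/// Hypothetical LSN newtype: a byte position in the WAL.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// Sketch: bytes of WAL between the PITR cutoff and last_record_lsn.
/// Assumption: returns 0 if the cutoff is at or ahead of last_record_lsn.
fn pitr_history_size(last_record_lsn: Lsn, pitr_cutoff: Lsn) -> u64 {
    last_record_lsn.0.saturating_sub(pitr_cutoff.0)
}

fn main() {
    let last = Lsn(0x0300_0000);
    let cutoff = Lsn(0x0100_0000);
    // 0x0300_0000 - 0x0100_0000 = 0x0200_0000 bytes (32 MiB)
    println!("{}", pitr_history_size(last, cutoff)); // prints 33554432
}
```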

These metrics are somewhat temporary: when we implement #8088 and
associated consumption metric changes, these will reach a final form.
For now, an "archived" branch is just any branch outside of its parent's
PITR window: later, archival will become an explicit state (which will
_usually_ correspond to falling outside the parent's PITR window).
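The interim definition of "archived" can be sketched as a single comparison. This reads "outside of its parent's PITR window" as "the branch point is older than the parent's PITR cutoff"; that reading, the `Lsn` newtype, and the `is_archived` function are all assumptions for illustration rather than the pageserver's actual logic.

```rust
/// Hypothetical LSN newtype: a byte position in the WAL.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// Assumed interim rule: a child branch counts as "archived" when its
/// branch point lies before the parent's PITR cutoff, i.e. outside the window.
fn is_archived(branch_point: Lsn, parent_pitr_cutoff: Lsn) -> bool {
    branch_point < parent_pitr_cutoff
}

fn main() {
    let cutoff = Lsn(1_000);
    println!("{}", is_archived(Lsn(500), cutoff)); // prints true
    println!("{}", is_archived(Lsn(1_500), cutoff)); // prints false
}
```

Once archival becomes an explicit state, a check like this would at most serve as a heuristic, since the commit message notes the two only _usually_ coincide.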

The overall volume of timeline metrics is something to watch, but we are
removing many more in #8245
than this PR is adding.
Linked issue closed by this pull request: pageserver: reduce per-timeline histogram metrics