
pageserver: compaction "key not found" on RelMap keys error under some pg_regress edge case #8237

Open
jcsp opened this issue Jul 2, 2024 · 2 comments
Labels
c/storage/pageserver (Component: storage: pageserver)
p/high (High priority: use for bugs that need prompt attention, such as crashes or possible corruptions)
t/bug (Issue Type: Bug)
triaged (bugs that were already triaged)

Comments


jcsp commented Jul 2, 2024

This has not been seen on real systems, so we presume it is an edge case that pg_regress exercises.

#8232 reproduces it reliably in debug mode on pg15.

Possible hypothesis: an inconsistency between collect_keyspace and database deletion. collect_keyspace calls relmap_file_key for each database in dbdir, but database deletion removes all such keys for a database.
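Whatever the root cause turns out to be, the failure comes down to an invariant that collect_keyspace relies on: every (spcnode, dbnode) entry in dbdir has a stored relmap file. A minimal sketch of a check for that invariant, assuming a toy in-memory model in which plain maps stand in for the real keyspace; `dbdir_entries_without_relmap` is a hypothetical debugging helper, not a pageserver API:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// (spcnode, dbnode), as used for dbdir entries. Toy type, not pageserver code.
type DbKey = (u32, u32);

/// Hypothetical debugging helper: return every dbdir entry that has no
/// relmap file stored, i.e. every entry whose relmap key, if added to the
/// keyspace by collect_keyspace, could not be read back later.
fn dbdir_entries_without_relmap(
    dbdir: &BTreeSet<DbKey>,
    relmap_files: &BTreeMap<DbKey, Vec<u8>>,
) -> Vec<DbKey> {
    dbdir
        .iter()
        .filter(|entry| !relmap_files.contains_key(*entry))
        .copied()
        .collect()
}
```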

jcsp added the t/bug, c/storage/pageserver, and p/high labels on Jul 2, 2024

jcsp commented Jul 4, 2024

put_rel_creation has a branch that inserts the (dbnode, spcnode) pair into dbdir if it doesn't already exist, but unlike normal database creation, that branch doesn't call put_relmap_file.

collect_keyspace assumes that every database in dbdir also has a relmap file, so it adds a relmap key that was never written, and compaction then fails with "key not found".
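The failure mode can be reproduced in a toy model. This is a minimal sketch under stated assumptions, not pageserver code: dbdir is modelled as a map from (spcnode, dbnode) to a flag recording whether a relmap file was written, and `ToyTimeline`, `collect_keyspace_all`, `collect_keyspace_written` and `compact` are hypothetical names. In this model `put_rel_creation` adds the database to dbdir without writing a relmap file, so a keyspace that includes a relmap key for every dbdir entry contains a key that compaction cannot read. Filtering on the flag is one possible way to restore consistency; writing a relmap file in that branch as well would be another.

```rust
use std::collections::BTreeMap;

/// (spcnode, dbnode)
type DbKey = (u32, u32);

/// Toy timeline state (hypothetical, heavily simplified). The bool on each
/// dbdir entry records whether a relmap file was written for that database.
#[derive(Default)]
struct ToyTimeline {
    dbdir: BTreeMap<DbKey, bool>,
    relmap_files: BTreeMap<DbKey, Vec<u8>>,
}

impl ToyTimeline {
    /// Normal database creation path in this model: the dbdir entry and the
    /// relmap file are written together.
    fn put_relmap_file(&mut self, db: DbKey, img: Vec<u8>) {
        self.dbdir.insert(db, true);
        self.relmap_files.insert(db, img);
    }

    /// Relation creation: if the database is not in dbdir yet, it is added
    /// implicitly, but no relmap file is written (the branch described above).
    fn put_rel_creation(&mut self, db: DbKey) {
        self.dbdir.entry(db).or_insert(false);
    }

    /// Problematic pattern: assume every dbdir entry has a relmap file.
    fn collect_keyspace_all(&self) -> Vec<DbKey> {
        self.dbdir.keys().copied().collect()
    }

    /// Alternative: only emit relmap keys that were actually written.
    fn collect_keyspace_written(&self) -> Vec<DbKey> {
        self.dbdir
            .iter()
            .filter(|(_, has_relmap)| **has_relmap)
            .map(|(db, _)| *db)
            .collect()
    }

    /// Stand-in for compaction: read back every key in the collected keyspace.
    fn compact(&self, keys: &[DbKey]) -> Result<usize, String> {
        for key in keys {
            if !self.relmap_files.contains_key(key) {
                return Err(format!("key not found: relmap {:?}", key));
            }
        }
        Ok(keys.len())
    }
}
```

With one database created normally as (1663, 5) and another added only through `put_rel_creation` as (16502, 5), `compact(&tl.collect_keyspace_all())` returns `Err("key not found: relmap (16502, 5)")`, while the filtered keyspace compacts cleanly.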


jcsp commented Jul 4, 2024

I tried to track down which part of the regression tests creates this scenario, but the offending relation has been deleted by the time the tests end, so it is not easy to look it up in pg_class.

The odd thing is that it uses spcnode == 16502, which is not one of the two usual spcnodes (1663 for pg_default, 1664 for pg_global).
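For triage, spcnode values can be classified using PostgreSQL's fixed OIDs: 1663 is pg_default (DEFAULTTABLESPACE_OID), 1664 is pg_global (GLOBALTABLESPACE_OID), and OIDs at or above FirstNormalObjectId (16384) are assigned to objects created after initdb. By that rule 16502 would be a tablespace created after initdb, presumably by the tests themselves, which would narrow the search to tests that create tablespaces; this is an inference, not something confirmed in this thread. A minimal sketch; `classify_spcnode` and `SpcKind` are hypothetical names:

```rust
/// pg_default tablespace OID (DEFAULTTABLESPACE_OID in PostgreSQL).
const DEFAULTTABLESPACE_OID: u32 = 1663;
/// pg_global tablespace OID (GLOBALTABLESPACE_OID in PostgreSQL).
const GLOBALTABLESPACE_OID: u32 = 1664;
/// First OID handed out to objects created after initdb (FirstNormalObjectId).
const FIRST_NORMAL_OBJECT_ID: u32 = 16384;

#[derive(Debug, PartialEq, Eq)]
enum SpcKind {
    Default,
    Global,
    /// Created after initdb, e.g. by CREATE TABLESPACE.
    UserCreated,
    /// Any other OID below FirstNormalObjectId.
    OtherSystem,
}

/// Hypothetical triage helper: classify a spcnode seen in a pageserver key.
fn classify_spcnode(spcnode: u32) -> SpcKind {
    match spcnode {
        DEFAULTTABLESPACE_OID => SpcKind::Default,
        GLOBALTABLESPACE_OID => SpcKind::Global,
        n if n >= FIRST_NORMAL_OBJECT_ID => SpcKind::UserCreated,
        _ => SpcKind::OtherSystem,
    }
}
```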

jcsp added the triaged label on Jul 4, 2024
jcsp self-assigned this on Jul 4, 2024