Ceph: Detect small DB partition sizes and unused partitions #974

Open
lathiat opened this issue Sep 26, 2024 · 5 comments

Comments

@lathiat
Contributor

lathiat commented Sep 26, 2024

A common fault in Ceph deployments is that the DB devices are incorrectly configured (missing, or allocated from the wrong device) or not big enough. The majority of the time these faults would be picked up by checking for the following (a rough sketch of such checks appears after the list):

  • DB partitions that are obviously far too small, e.g. the default 1GB; ideally we'd also report the DB-to-OSD size ratio informationally
  • Empty partitions that have not been used
  • Empty space on a disk that is not partitioned
  • Volume groups that are mostly unused (effectively the same as empty space on a disk)
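
For illustration, here is a minimal sketch of the first two checks. It assumes `ceph-volume lvm list --format json` returns a dict mapping OSD id to a list of LVs, each carrying a "type" ("block", "db" or "wal") and an "lv_size" in bytes; the 2% ratio floor is an arbitrary placeholder, not an official recommendation:

```python
#!/usr/bin/env python3
"""Sketch: flag OSDs whose DB LV is suspiciously small or missing.

Assumes `ceph-volume lvm list --format json` output: a dict of
OSD id -> list of LVs, each with a "type" ("block"/"db"/"wal")
and an "lv_size" in bytes (older ceph-volume versions may report
human-readable sizes instead).
"""
import json
import subprocess

MIN_DB_RATIO = 0.02  # assumed floor: flag DBs under 2% of the data LV


def lvm_report():
    out = subprocess.check_output(['ceph-volume', 'lvm', 'list',
                                   '--format', 'json'])
    return json.loads(out)


def check_db_sizes(report):
    for osd_id, lvs in report.items():
        # collect the data ("block") and DB LV sizes for this OSD
        sizes = {lv['type']: int(lv['lv_size']) for lv in lvs
                 if lv.get('type') in ('block', 'db')}
        if 'block' not in sizes:
            continue
        if 'db' not in sizes:
            print(f"osd.{osd_id}: no dedicated DB volume")
        elif sizes['db'] / sizes['block'] < MIN_DB_RATIO:
            print(f"osd.{osd_id}: DB/data ratio is only "
                  f"{sizes['db'] / sizes['block']:.2%}")


if __name__ == '__main__':
    check_db_sizes(lvm_report())
```

A similar pass over `vgs --reportformat json`, comparing `vg_free` against `vg_size`, could cover the unpartitioned-space and mostly-unused-VG cases.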
@pponnuvel
Member

Shouldn't this be handled by the deployer, i.e. the Ceph charms and/or the FE team? A hotsos check might be useful, but it's probably not the most effective place for this.

@lathiat
Contributor Author

lathiat commented Sep 26, 2024

Yes, ideally, but in practice it keeps getting missed, so we need to catch it: both when analysing new deployments and when detecting the issue on existing ones.

@lathiat
Contributor Author

lathiat commented Sep 26, 2024

It can also happen because the charm will create OSDs with no DB device if it can't find any free space. So if new OSDs are added while the existing DB devices are full, a customer could silently end up in this state. It can also occur in a field deployment due to an unrelated issue, even when the layout was designed correctly. A rough check for the no-DB case is sketched below.
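
As a minimal sketch of that check, assuming `ceph osd metadata` reports a `bluefs_dedicated_db` field set to "1" when a separate DB device is in use (field name taken from typical BlueStore OSD metadata, worth verifying against the target Ceph release):

```python
#!/usr/bin/env python3
"""Sketch: report OSDs that ended up without a dedicated DB device,
based on the per-OSD metadata exposed by the monitors."""
import json
import subprocess


def osds_without_db():
    out = subprocess.check_output(['ceph', 'osd', 'metadata',
                                   '--format', 'json'])
    for osd in json.loads(out):
        # "bluefs_dedicated_db" is assumed to be "1" only when the
        # OSD was built with a separate DB device
        if osd.get('bluefs_dedicated_db') != '1':
            yield osd['id']


if __name__ == '__main__':
    for osd_id in osds_without_db():
        print(f"osd.{osd_id} has no dedicated DB device")
```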

@dosaboy
Member

dosaboy commented Sep 26, 2024

@pponnuvel I agree that the charm should be doing this as a first port of call, and we should open a bug on the charm to get that done. In the interim, if it is a small enough addition to the checks, we could add this to cover the cases the charm does not yet handle: it has been cropping up repeatedly in deployments, and flagging the issue at the start will help reduce analysis time.

@dosaboy
Member

dosaboy commented Sep 26, 2024

@lathiat this looks like several distinct checks; it might make sense to break it into smaller chunks to make it easier to implement.
