[microceph][ubuntu] Add changes for microceph collections #3291

Merged
merged 1 commit into from
Jul 18, 2023
70 changes: 39 additions & 31 deletions sos/report/plugins/ceph_common.py
@@ -41,45 +41,53 @@ class Ceph_Common(Plugin, RedHatPlugin, UbuntuPlugin):

# This check will enable the plugin regardless of being
# containerized or not
files = ('/etc/ceph/ceph.conf',)
files = ('/etc/ceph/ceph.conf',
'/var/snap/microceph/*',)

def setup(self):
all_logs = self.get_option("all_logs")

self.add_file_tags({
'.*/ceph.conf': 'ceph_conf',
'/var/log/ceph(.*)?/ceph.log.*': 'ceph_log',
})

if not all_logs:
self.add_copy_spec("/var/log/calamari/*.log",)
microceph_pkg = self.policy.package_manager.pkg_by_name('microceph')
if not microceph_pkg:
self.add_file_tags({
'.*/ceph.conf': 'ceph_conf',
'/var/log/ceph(.*)?/ceph.log.*': 'ceph_log',
})

if not all_logs:
self.add_copy_spec("/var/log/calamari/*.log",)
else:
self.add_copy_spec("/var/log/calamari",)

self.add_copy_spec([
"/var/log/ceph/**/ceph.log",
"/var/log/ceph/**/ceph.audit.log*",
"/var/log/calamari/*.log",
"/etc/ceph/",
"/etc/calamari/",
"/var/lib/ceph/tmp/",
])

self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/*keyring*",
"/var/lib/ceph/*/*keyring*",
"/var/lib/ceph/*/*/*keyring*",
"/var/lib/ceph/osd",
"/var/lib/ceph/mon",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])
else:
self.add_copy_spec("/var/log/calamari",)

self.add_copy_spec([
"/var/log/ceph/**/ceph.log",
"/var/log/ceph/**/ceph.audit.log*",
"/var/log/calamari/*.log",
"/etc/ceph/",
"/etc/calamari/",
"/var/lib/ceph/tmp/",
])
self.add_copy_spec([
"/var/snap/microceph/common/logs/ceph.log",
"/var/snap/microceph/common/logs/ceph.audit.log",
])

self.add_cmd_output([
"ceph -v",
])

self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/*keyring*",
"/var/lib/ceph/*/*keyring*",
"/var/lib/ceph/*/*/*keyring*",
"/var/lib/ceph/osd",
"/var/lib/ceph/mon",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])

# vim: set et ts=4 sw=4 :
59 changes: 38 additions & 21 deletions sos/report/plugins/ceph_mon.py
@@ -14,8 +14,8 @@
class CephMON(Plugin, RedHatPlugin, UbuntuPlugin):
"""
This plugin serves to collect information on monitor nodes within a Ceph
cluster. It is designed to collect from several versions of Ceph, including
those versions that serve as the basis for RHCS 4 and RHCS 5.
or microceph cluster. It is designed to collect from several versions of
Ceph, including versions that serve as the basis for RHCS 4 and RHCS 5.

Older versions of Ceph will have collections from locations such as
/var/log/ceph, whereas newer versions (as of this plugin's latest update)
@@ -37,32 +37,49 @@ class CephMON(Plugin, RedHatPlugin, UbuntuPlugin):
# but by default they are not capable of running various ceph commands in
# this plugin - the `ceph` binary is functional directly on the host
containers = ('ceph-(.*-)?mon.*',)
files = ('/var/lib/ceph/mon/', '/var/lib/ceph/*/mon*')
files = ('/var/lib/ceph/mon/', '/var/lib/ceph/*/mon*',
'/var/snap/microceph/common/data/mon/*')
ceph_version = 0

def setup(self):

self.ceph_version = self.get_ceph_version()

self.add_file_tags({
'.*/ceph.conf': 'ceph_conf',
"/var/log/ceph/(.*/)?ceph-.*mon.*.log": 'ceph_mon_log'
})

self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/**/*keyring*",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/**/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])
microceph_pkg = self.policy.package_manager.pkg_by_name('microceph')
if not microceph_pkg:
self.add_file_tags({
'.*/ceph.conf': 'ceph_conf',
"/var/log/ceph/(.*/)?ceph-.*mon.*.log": 'ceph_mon_log'
})

self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/**/*keyring*",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/**/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])

self.add_copy_spec([
"/run/ceph/**/ceph-mon*",
"/var/lib/ceph/**/kv_backend",
"/var/log/ceph/**/*ceph-mon*.log"
])

self.add_copy_spec([
"/run/ceph/**/ceph-mon*",
"/var/lib/ceph/**/kv_backend",
"/var/log/ceph/**/*ceph-mon*.log"
])
else:
Contributor

Instead of duplicating all of the copy specs, could we calculate a path prefix instead?

e.g.

if microceph:
    data_dir = '/var/snap/microceph/common/data'
else:
    data_dir = '/var/lib/ceph'
  
self.add_copy_spec([
    os.path.join(data_dir, 'mon/*/store.db')
])

Something like this used to be done for containers, although I think it has since mostly been replaced by the logic that looks for paths inside containers instead, so most of the code that did something similar seems to be gone now.
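
A slightly fuller sketch of the prefix idea, for illustration only. The helper name `mon_copy_specs` and the `is_microceph` flag are hypothetical; the two base directories and the `mon/*/store.db` pattern are taken from the snippet above. `posixpath` is used instead of `os.path` so the joined result is the same on any platform.

```python
import posixpath

# Base directories taken from the review comment above.
MICROCEPH_DATA_DIR = '/var/snap/microceph/common/data'
CEPH_DATA_DIR = '/var/lib/ceph'


def mon_copy_specs(is_microceph):
    """Hypothetical helper: build monitor copy specs from one path prefix."""
    data_dir = MICROCEPH_DATA_DIR if is_microceph else CEPH_DATA_DIR
    return [posixpath.join(data_dir, 'mon/*/store.db')]


print(mon_copy_specs(True))
print(mon_copy_specs(False))
```

With this shape, a new pattern is added in one place and applies to both layouts, which only helps if both deployments really want the same set of files.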

Contributor Author

I was avoiding touching any of the existing ceph code, so apart from adding an else I tried not to change it. I can do it this way, sure.

Contributor Author

The files we collect aren't the same. For example, the calamari ones are not present in the microceph environment. So in ceph_common now (I am using the snap detection method here):

        microceph_pkg = self.policy.package_manager.pkg_by_name('microceph')
        if not microceph_pkg:
<snip>

            if not all_logs:
                self.add_copy_spec("/var/log/calamari/*.log",)
            else:
                self.add_copy_spec("/var/log/calamari",)

            self.add_copy_spec([
                "/var/log/ceph/**/ceph.log",
                "/var/log/ceph/**/ceph.audit.log*",
                "/var/log/calamari/*.log",
                "/etc/ceph/",
                "/etc/calamari/",
                "/var/lib/ceph/tmp/",
            ])

            self.add_forbidden_path([
                "/etc/ceph/*keyring*",
                "/var/lib/ceph/*keyring*",
                "/var/lib/ceph/*/*keyring*",
                "/var/lib/ceph/*/*/*keyring*",
                "/var/lib/ceph/osd",
                "/var/lib/ceph/mon",
                # Excludes temporary ceph-osd mount location like
                # /var/lib/ceph/tmp/mnt.XXXX from sos collection.
                "/var/lib/ceph/tmp/*mnt*",
                "/etc/ceph/*bindpass*"
            ])
        else:
            self.add_copy_spec([
                "/var/snap/microceph/common/logs/ceph.log",
                "/var/snap/microceph/common/logs/ceph.audit.log",
            ])

The forbidden paths are also unnecessary for microceph, since we don't collect anything but those two logs. Because the microceph log collections are smaller than the ceph ones, and that holds true for the other ceph plugins too, I've separated them out; hence the duplicate add_copy_spec() calls.
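
The separation described above can be sketched outside of sos as a plain function that returns a different plan per deployment. This is only an illustration under stated assumptions: `common_plan` and the shape of its return value are hypothetical, and the path lists are abbreviated from the snippet in this comment.

```python
def common_plan(microceph_installed, all_logs=False):
    """Hypothetical stand-in for the two branches in ceph_common.setup()."""
    if not microceph_installed:
        if all_logs:
            calamari = "/var/log/calamari"
        else:
            calamari = "/var/log/calamari/*.log"
        return {
            "copy": [
                calamari,
                "/var/log/ceph/**/ceph.log",
                "/var/log/ceph/**/ceph.audit.log*",
                "/etc/ceph/",
            ],
            "forbidden": ["/etc/ceph/*keyring*", "/etc/ceph/*bindpass*"],
        }
    # microceph branch: only the two logs, so no forbidden paths are listed
    return {
        "copy": [
            "/var/snap/microceph/common/logs/ceph.log",
            "/var/snap/microceph/common/logs/ceph.audit.log",
        ],
        "forbidden": [],
    }


print(common_plan(True))
```

Each branch can then change without touching the other, at the cost of listing the two shared log names twice.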

Member

I agree with @lathiat: if we have common file paths, it is easier to change one line rather than two for anything we are collecting. That makes it less error-prone, with fewer items to change if any changes are required in the future.

IMO, if a file doesn't exist there isn't a real problem, since it will be skipped by sos anyway. So having a common set should reduce the number of items we add.
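
The point about missing files can be illustrated with plain Python globbing. This is not sos's implementation, only a sketch of the general behaviour assumed here: a pattern that matches nothing expands to an empty list, so a shared list of specs yields only what actually exists. The helper `expand_specs` and the temporary directory layout are hypothetical.

```python
import glob
import os
import tempfile


def expand_specs(root, specs):
    """Hypothetical helper: expand glob specs under root, dropping non-matches."""
    found = []
    for spec in specs:
        found.extend(sorted(glob.glob(os.path.join(root, spec))))
    return found


with tempfile.TemporaryDirectory() as root:
    # Only the microceph-style log exists in this fake tree.
    logs = os.path.join(root, "var/snap/microceph/common/logs")
    os.makedirs(logs)
    open(os.path.join(logs, "ceph.log"), "w").close()

    specs = [
        "var/log/ceph/ceph.log",                    # absent, contributes nothing
        "var/snap/microceph/common/logs/ceph.log",  # present
    ]
    matches = [os.path.relpath(p, root) for p in expand_specs(root, specs)]
    print(matches)
```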

Contributor Author

@nkshirsagar nkshirsagar Jul 17, 2023

Looking at ceph_common as an example, I've tried making the change recommended in the review (a common PATH variable and only one code path for the file collections), but I still feel the code ends up cleaner and easier to follow with the two plugins separated by one big "if".

The microceph branch of the ceph_common plugin currently collects only two log files, as opposed to several more in the ceph branch:

            self.add_copy_spec([
                "/var/snap/microceph/common/logs/ceph.log",
                "/var/snap/microceph/common/logs/ceph.audit.log",
            ])

The ceph branch of ceph_common currently collects more. In the following, the only entries in common are marked with <==:

        if not microceph_pkg:
            self.add_file_tags({
                '.*/ceph.conf': 'ceph_conf',
                '/var/log/ceph(.*)?/ceph.log.*': 'ceph_log',
            })

            if not all_logs:
                self.add_copy_spec("/var/log/calamari/*.log",)
            else:
                self.add_copy_spec("/var/log/calamari",)

            self.add_copy_spec([
                "/var/log/ceph/**/ceph.log", <==
                "/var/log/ceph/**/ceph.audit.log*", <==
                "/var/log/calamari/*.log",
                "/etc/ceph/",
                "/etc/calamari/",
                "/var/lib/ceph/tmp/",
            ])

            self.add_forbidden_path([
                "/etc/ceph/*keyring*",
                "/var/lib/ceph/*keyring*",
                "/var/lib/ceph/*/*keyring*",
                "/var/lib/ceph/*/*/*keyring*",
                "/var/lib/ceph/osd",
                "/var/lib/ceph/mon",
                # Excludes temporary ceph-osd mount location like
                # /var/lib/ceph/tmp/mnt.XXXX from sos collection.
                "/var/lib/ceph/tmp/*mnt*",

So I think it's still more reasonable to separate the two with an "if" instead of trying to have common code. It's also easier to change either plugin in the future if there's a clean separation.

Taking ceph_osd as another example:

            self.add_copy_spec([
                "/var/snap/microceph/common/data/osd/*",
                "/var/snap/microceph/common/logs/*ceph-osd*.log",
            ])

Microceph does not necessarily collect a subset of what the ceph plugin collects; the two may differ in the files they collect (and they already do). For instance, the /var/snap/microceph/common/data/osd/* directory in microceph would currently collect all of these files (minus the keyring, which is in the forbidden paths):

root@demonax:/var/snap/microceph/common/data/osd/ceph-0# ls
bfm_blocks  bfm_blocks_per_key  bfm_bytes_per_block  bfm_size  block  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  require_osd_release  type  whoami
root@demonax:/var/snap/microceph/common/data/osd/ceph-0# 
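
To make the keyring exclusion concrete, here is a small sketch that filters the listing above against the microceph forbidden patterns from this PR. It uses `fnmatch` semantics, where `*` also matches `/`; that is an assumption for illustration, and sos's own matching may behave differently. The names `OSD_DIR`, `LISTING`, `FORBIDDEN` and `allowed` are hypothetical.

```python
from fnmatch import fnmatchcase

OSD_DIR = "/var/snap/microceph/common/data/osd/ceph-0"
LISTING = (
    "bfm_blocks bfm_blocks_per_key bfm_bytes_per_block bfm_size block bluefs "
    "ceph_fsid fsid keyring kv_backend magic mkfs_done ready "
    "require_osd_release type whoami"
).split()

# Forbidden keyring patterns from the microceph branch of this PR.
FORBIDDEN = [
    "/var/snap/microceph/common/**/*keyring*",
    "/var/snap/microceph/current/**/*keyring*",
]


def allowed(path):
    """True if no forbidden pattern matches (fnmatch semantics, an assumption)."""
    return not any(fnmatchcase(path, pat) for pat in FORBIDDEN)


kept = [name for name in LISTING if allowed(f"{OSD_DIR}/{name}")]
print(len(LISTING), len(kept), "keyring" in kept)
```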

I think a common path would make it more complex to maintain this special situation, where we trigger one of two plugins from a single plugin file. So I'd much rather have a clean separation than run common code using path variables.

Please let me know whether this makes sense and whether the separation is acceptable.

Member

ack, makes sense to me

self.add_forbidden_path([
"/var/snap/microceph/common/**/*keyring*",
"/var/snap/microceph/current/**/*keyring*",
"/var/snap/microceph/common/data/mon/*/store.db",
"/var/snap/microceph/common/state/*",
])

self.add_copy_spec([
"/var/snap/microceph/common/data/mon/*",
"/var/snap/microceph/common/logs/*ceph-mon*.log",
"/var/snap/microceph/current/conf/*",
])

self.add_cmd_output("ceph report", tags="ceph_report")
self.add_cmd_output([
123 changes: 69 additions & 54 deletions sos/report/plugins/ceph_osd.py
@@ -31,62 +31,77 @@ class CephOSD(Plugin, RedHatPlugin, UbuntuPlugin):
plugin_name = 'ceph_osd'
profiles = ('storage', 'virt', 'container')
containers = ('ceph-(.*-)?osd.*',)
files = ('/var/lib/ceph/osd/', '/var/lib/ceph/*/osd*')
files = ('/var/lib/ceph/osd/', '/var/lib/ceph/*/osd*',
'/var/snap/microceph/common/data/osd/*')

def setup(self):

self.add_file_tags({
"/var/log/ceph/(.*/)?ceph-(.*-)?osd.*.log": 'ceph_osd_log',
})

self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/**/*keyring*",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/**/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])

# Only collect OSD specific files
self.add_copy_spec([
"/run/ceph/**/ceph-osd*",
"/var/lib/ceph/**/kv_backend",
"/var/log/ceph/**/ceph-osd*.log",
"/var/log/ceph/**/ceph-volume*.log",
])

self.add_cmd_output([
"ceph-disk list",
"ceph-volume lvm list"
])

cmds = [
"bluestore bluefs available",
"config diff",
"config show",
"dump_blacklist",
"dump_blocked_ops",
"dump_historic_ops_by_duration",
"dump_historic_slow_ops",
"dump_mempools",
"dump_ops_in_flight",
"dump_op_pq_state",
"dump_osd_network",
"dump_reservations",
"dump_watchers",
"log dump",
"perf dump",
"perf histogram dump",
"objecter_requests",
"ops",
"status",
"version",
]

self.add_cmd_output(
[f"ceph daemon {i} {c}" for i in self.get_socks() for c in cmds]
)
microceph_pkg = self.policy.package_manager.pkg_by_name('microceph')
if not microceph_pkg:
self.add_file_tags({
"/var/log/ceph/(.*/)?ceph-(.*-)?osd.*.log": 'ceph_osd_log',
})

self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/**/*keyring*",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/**/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])

# Only collect OSD specific files
self.add_copy_spec([
"/run/ceph/**/ceph-osd*",
"/var/lib/ceph/**/kv_backend",
"/var/log/ceph/**/ceph-osd*.log",
"/var/log/ceph/**/ceph-volume*.log",
])

self.add_cmd_output([
"ceph-disk list",
"ceph-volume lvm list"
])

cmds = [
"bluestore bluefs available",
"config diff",
"config show",
"dump_blacklist",
"dump_blocked_ops",
"dump_historic_ops_by_duration",
"dump_historic_slow_ops",
"dump_mempools",
"dump_ops_in_flight",
"dump_op_pq_state",
"dump_osd_network",
"dump_reservations",
"dump_watchers",
"log dump",
"perf dump",
"perf histogram dump",
"objecter_requests",
"ops",
"status",
"version",
]

self.add_cmd_output(
[f"ceph daemon {i} {c}" for i in self.get_socks() for c in cmds]
)

else:
# Only collect microceph files, don't run any commands
Contributor

Why don't we want to run any commands? I would expect to still want all the OSD commands regardless of whether we are using microceph or not.

Contributor Author

In microceph I can run the ceph commands (on the mon nodes), but not the ceph daemon commands that the ceph_osd plugin runs on the OSD nodes:

ubuntu@demonax:~$ sudo -s
root@demonax:/home/ubuntu# ceph -s
  cluster:
    id:     fea9b2ee-20f0-45c1-838e-03bc1d6667bf
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum demonax (age 5w)
    mgr: demonax(active, since 5w)
    osd: 3 osds: 3 up (since 5w), 3 in (since 5w)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   240 MiB used, 11 TiB / 11 TiB avail
    pgs:     1 active+clean
 
root@demonax:/home/ubuntu# ps -efa | grep osd
root       34627       1  0 Jun02 ?        00:00:00 /bin/sh /snap/microceph/338/commands/osd.start
root       36645       1  0 Jun02 ?        02:15:10 ceph-osd --cluster ceph --id 0
root       38275       1  0 Jun02 ?        02:17:26 ceph-osd --cluster ceph --id 1
root       39929       1  0 Jun02 ?        02:16:48 ceph-osd --cluster ceph --id 2
root      195108  195052  0 09:23 pts/1    00:00:00 grep --color=auto osd
root@demonax:/home/ubuntu# ceph daemon osd.0 help
Can't get admin socket path: "ceph-conf" not found
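
One way to avoid queuing commands that are known to fail would be to check for the helper binary first. This is a minimal sketch, assuming the failure above is caused by `ceph-conf` missing from PATH (the error message suggests this, but it is not confirmed here). `daemon_cmds` and its `which` parameter are hypothetical; the command format string matches the one used in ceph_osd.py.

```python
import shutil


def daemon_cmds(socks, cmds, which=shutil.which):
    """Hypothetical helper: build `ceph daemon` commands only if ceph-conf exists."""
    if which("ceph-conf") is None:
        # Assumption: without ceph-conf the admin socket path cannot be resolved.
        return []
    return [f"ceph daemon {sock} {cmd}" for sock in socks for cmd in cmds]


# Simulate both situations by injecting a fake lookup function.
missing = daemon_cmds(["osd.0"], ["status", "version"], which=lambda name: None)
present = daemon_cmds(
    ["osd.0"], ["status", "version"], which=lambda name: "/usr/bin/" + name
)
print(missing)
print(present)
```

Injecting `which` keeps the sketch testable on a machine without Ceph installed.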

Contributor Author

> Why don't we want to run any commands? I would expect to still want all the OSD commands regardless of whether we are using microceph or not.

Actually, I think this might be a bug in microceph, so I've raised canonical/microceph#160. If it's supposed to work, I'll change the code in ceph_osd to also run the ceph daemon commands.

Contributor Author

I'll submit a new PR later to add the ceph daemon command collections once canonical/microceph#160 gets resolved.

self.add_forbidden_path([
"/var/snap/microceph/common/**/*keyring*",
"/var/snap/microceph/current/**/*keyring*",
"/var/snap/microceph/common/state/*",
])

self.add_copy_spec([
"/var/snap/microceph/common/data/osd/*",
"/var/snap/microceph/common/logs/*ceph-osd*.log",
])

def get_socks(self):
"""