-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[microceph][ubuntu] Add changes for microceph collections #3291
Conversation
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
|
Thanks Jose! Will do.. This is a draft PR and it needs more work, I will mark it ready for review once it's completed.
Yes I'll use this appropriately, am still figuring out what to collect without the -all-logs.
iiuc, the usual ceph plugins won't be triggered in a microceph environment. Microceph is ceph running in a snap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With regards to a new plugin for microceph
or not due to the overlap with existing ceph plugins, would it not be possible (or even preferrable) to instead update those existing plugins to enable when the microceph snap is installed?
I was of the opinion its better to have a separate plugin for microceph, and avoid changing the stable ceph plugin code to incorporate microceph in those same (split) plugins. But on the flip side, we'd have to maintain microceph and patch it anytime something changes in the ceph plugin, and also the more I think about it, the split seems unavoidable, otherwise we'd end up duplicating the info in the sos when someone runs sos on the microceph mon node, and then the osd node. Also upon further reflection, its better patching the ceph plugins to incorporate microceph too, so we would not have to then additionally patch sosreport analysis tools (hotsos, xsos, etc) for microceph information. So if its allright having the microceph stuff in the usual ceph folders, like sos_commands/ceph_common , sos_commands/ceph_osd , sos_commands/ceph_mon etc, then I'll modify the PR to make the split, and revert to patching the ceph plugins. |
I can see arguments for both. But I am slightly more inclined to incorporating microceph into the existing ones as the majority of cmds that microceph currently has (or would have in the future) will always be a subset of the existing ones. One thing I'd add is we probably want to keep the microceph commands' outputs in a separate directory. For example, consider a physical machine running a ceph-{mon, mgr, osd} of a cluster also running a different microceph deployed Ceph cluster. So we wouldn't want to put those under ceph_{mon, mgr, osd} directories. So maybe a |
So one of the reasons I didn't want to incorporate microceph into the existing ceph plugins is we would need to trigger 2 plugins from one file. I'm also not sure of how to create the sos_commands/microceph folder to dump all its files in, instead of the default of sos_commands/plugin_name, which would end up dumping microceph files and outputs in sos_commands/ceph_* .. I suppose there's some postprocessing I could do perhaps but I need to investigate further.. Since microceph would also run possibly its mon/osd/mgr/rgw on different nodes, I suppose the split into microceph_mon , microceph_osd, etc is necessary and cannot be avoided. Assuming we're adding microceph collections in the ceph plugins, we'd need to take into consideration when the ceph_* plugin has been triggered due to ceph, or due to microceph, or both, because as Pon pointed out, microceph install can also exist additionally on a regular ceph node. So would that basically be an if condition (if the regular ceph path exists, for eg, vs the microceph path) separating the two file and command collections in the same plugin? I see the lxd plugin determining if a snap is installed, could I use something like that for determining microceph snap being installed?
|
If |
Some testing done for microceph,
|
I am not sure whether we need to collect rgw/mds logs for microceph. The only thing the microceph plugin would collect are the concerned log files (no daemon commands would run in microceph env), which can be asked for individually instead of having them run a sosreport. So for now I am marking this PR ready for review for the microceph changes for mon,mgr and osd. I am sure as microceph evolves, there will be additional changes required, and we can handle the mds/rgw bits at that time. Please let me know what you think @pponnuvel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good, just a note about path checks. We can use Plugin.path_exists()
here to account for non-/ sysroots, such as when sos is run in a container.
sos/report/plugins/ceph_common.py
Outdated
|
||
if not all_logs: | ||
self.add_copy_spec("/var/log/calamari/*.log",) | ||
if not self.path_exists('/var/snap/microceph/common'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack this can be done and in fact was thinking about this approach in the last part of #3291 (comment) , I will change it to this approach
) | ||
|
||
else: | ||
# Only collect microceph files, don't run any commands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we want to run any commands? I would expect to still want all the OSD commands regardless of whether we are using microceph or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in microceph, I can run the ceph commands (on the mon nodes) but not the ceph daemon commands on the OSD nodes that the OSD nodes run for ceph_osd collections.
ubuntu@demonax:~$ sudo -s
root@demonax:/home/ubuntu# ceph -s
cluster:
id: fea9b2ee-20f0-45c1-838e-03bc1d6667bf
health: HEALTH_OK
services:
mon: 1 daemons, quorum demonax (age 5w)
mgr: demonax(active, since 5w)
osd: 3 osds: 3 up (since 5w), 3 in (since 5w)
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 577 KiB
usage: 240 MiB used, 11 TiB / 11 TiB avail
pgs: 1 active+clean
root@demonax:/home/ubuntu# ps -efa | grep osd
root 34627 1 0 Jun02 ? 00:00:00 /bin/sh /snap/microceph/338/commands/osd.start
root 36645 1 0 Jun02 ? 02:15:10 ceph-osd --cluster ceph --id 0
root 38275 1 0 Jun02 ? 02:17:26 ceph-osd --cluster ceph --id 1
root 39929 1 0 Jun02 ? 02:16:48 ceph-osd --cluster ceph --id 2
root 195108 195052 0 09:23 pts/1 00:00:00 grep --color=auto osd
root@demonax:/home/ubuntu# ceph daemon osd.0 help
Can't get admin socket path: "ceph-conf" not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we want to run any commands? I would expect to still want all the OSD commands regardless of whether we are using microceph or not.
actually I think this might be a bug in microceph so I've raised canonical/microceph#160 , if its supposed to work then I'll change the code in ceph_osd to also run the ceph daemon commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll submit a new PR later to add the ceph daemon command collections once canonical/microceph#160 gets resolved.
"/var/lib/ceph/**/kv_backend", | ||
"/var/log/ceph/**/*ceph-mon*.log" | ||
]) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of duplicating all of the copy specs, could we calculate a path prefix instead?
e.g.
if microceph:
data_dir = '/var/snap/microceph/common/data'
else:
data_dir = '/var/lib/ceph'
self.add_copy_spec([
os.path.join(data_dir, 'mon/*/store.db')
])
something like this used to be done for containers although I think it was since mostly replaced with the logic for looking for paths inside containers instead. so most of the code that did similar seems gone now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was avoiding touching any of the existing ceph code, so apart from an else i tried to not change it. I can do it this way, sure..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The files we collect aren't the same, for eg the calamari ones are not present in the microceph env, so for eg, in the ceph_common now, (am using the snap detection method here)
microceph_pkg = self.policy.package_manager.pkg_by_name('microceph')
if not microceph_pkg:
<snip>
if not all_logs:
self.add_copy_spec("/var/log/calamari/*.log",)
else:
self.add_copy_spec("/var/log/calamari",)
self.add_copy_spec([
"/var/log/ceph/**/ceph.log",
"/var/log/ceph/**/ceph.audit.log*",
"/var/log/calamari/*.log",
"/etc/ceph/",
"/etc/calamari/",
"/var/lib/ceph/tmp/",
])
self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/*keyring*",
"/var/lib/ceph/*/*keyring*",
"/var/lib/ceph/*/*/*keyring*",
"/var/lib/ceph/osd",
"/var/lib/ceph/mon",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/tmp/*mnt*",
"/etc/ceph/*bindpass*"
])
else:
self.add_copy_spec([
"/var/snap/microceph/common/logs/ceph.log",
"/var/snap/microceph/common/logs/ceph.audit.log",
])
The forbidden paths are also unnecessary for microceph since we don't collect anything but those two logs. So since the microceph log collections are smaller than ceph, and that holds true for the other ceph plugins too, I've separated them out, hence the duplicate add_copy_spec() calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @lathiat, if we have common file paths, then it would be easier to change one line, rather than 2 lines foe anything we are collecting. It just means less error-prone and less items to change if there are nay changes required in the future.
imo, if a file doesn't exist, then there isn't a real problem, it will be skipped by sos anyway. So, having a common set should reduce the amount of items we add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at ceph_common as an example, I've tried making the change recommended in the review to have a common PATH variable and use only one code path for the file collections, but I still feel the code is eventually cleaner and easier to follow with the separation between the two plugins with one big "if".
The microceph stuff in ceph_common plugin as of now only collects two log files as opposed to several more in the ceph path,
self.add_copy_spec([
"/var/snap/microceph/common/logs/ceph.log",
"/var/snap/microceph/common/logs/ceph.audit.log",
])
The ceph_common plugin as of now collects more stuff, so for eg consider, the only thing in common is marked in <==
if not microceph_pkg:
self.add_file_tags({
'.*/ceph.conf': 'ceph_conf',
'/var/log/ceph(.*)?/ceph.log.*': 'ceph_log',
})
if not all_logs:
self.add_copy_spec("/var/log/calamari/*.log",)
else:
self.add_copy_spec("/var/log/calamari",)
self.add_copy_spec([
"/var/log/ceph/**/ceph.log", <==
"/var/log/ceph/**/ceph.audit.log*", <==
"/var/log/calamari/*.log",
"/etc/ceph/",
"/etc/calamari/",
"/var/lib/ceph/tmp/",
])
self.add_forbidden_path([
"/etc/ceph/*keyring*",
"/var/lib/ceph/*keyring*",
"/var/lib/ceph/*/*keyring*",
"/var/lib/ceph/*/*/*keyring*",
"/var/lib/ceph/osd",
"/var/lib/ceph/mon",
# Excludes temporary ceph-osd mount location like
# /var/lib/ceph/tmp/mnt.XXXX from sos collection.
"/var/lib/ceph/tmp/*mnt*",
So I think it's still more reasonable to separate it out with a "if" instead of trying to have common code. It's also easier in the future to make changes to either plugin if there's a clean separation.
If I look at ceph_osd, as another eg,
self.add_copy_spec([
"/var/snap/microceph/common/data/osd/*",
"/var/snap/microceph/common/logs/*ceph-osd*.log",
])
It's not necessary that microceph would collect a subset of the ceph plugin, the plugins may differ in the files they collect (and they do even now) because the /var/snap/microceph/common/data/osd/* folder in microceph would collect as of now all these files, (minus the keyring which is in the forbidden path)
root@demonax:/var/snap/microceph/common/data/osd/ceph-0# ls
bfm_blocks bfm_blocks_per_key bfm_bytes_per_block bfm_size block bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done ready require_osd_release type whoami
root@demonax:/var/snap/microceph/common/data/osd/ceph-0#
I think if we have a common path, it would become more complex in the future maintaining this special situation we have, where we are triggering one of two plugins from the one plugin file. So I'd much rather have a clean separation instead of trying to run common code using path variables.
Please let me know whether this makes sense and if the separation is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, makes sense to me
Collect the microceph data in the ceph plugins. Signed-off-by: Nikhil Kshirsagar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, based on all the discussions
mgr also needs additions for microceph, so I will add that in a separate PR later |
This work needs #3312 because without the wildcard support for paths in files, it will not trigger the microceph plugin. Initial testing was mistakenly done with -o passed to sos report, which resulted in the plugins triggering in spite of the fact that they should not trigger, due to the wildcard used in
Tested to confirm on a microceph node that microceph data isnt collected in current sos main.
|
Add a new plugin to enable collecting Ubuntu microceph data
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines