Run gprofiler without root/sudo #936

Draft: wants to merge 3 commits into master


@pcarella1 pcarella1 commented Dec 5, 2024

Description

This PR adds support for running gprofiler without root/sudo as discussed in issue 905. There are several assumptions and components that I will mention here.

This PR depends on a change in the granulate-utils repo that defines the run_in_ns_wrapper function: Granulate/granulate-utils#265.

Assumptions when running without root:

  1. When running without root, the user must use --pids to select only user-owned processes.
  2. The user must direct the log and pid files to a user-owned directory (e.g. with the --log-file and --pid-file parameters).
  3. The user must set certain system parameters, such as kernel.perf_event_paranoid, as needed to allow gprofiler to run.
  4. Some of the corner cases that require fallback rw exec directories for POSSIBLE_AP_DIRS may not be resolved.
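
As a rough illustration of assumption 3, the current setting can be read from procfs before launching gprofiler. This is only a sketch and not part of the PR: the helper name `read_perf_event_paranoid` is hypothetical, and which level is sufficient depends on the kernel and on the events being requested.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the sysctl kernel.perf_event_paranoid.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_perf_event_paranoid(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current perf_event_paranoid level, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer.
        return None


if __name__ == "__main__":
    print(f"kernel.perf_event_paranoid = {read_perf_event_paranoid()}")
```

Lower values are more permissive, so a non-root user may need an administrator to lower the value before perf-based profiling works.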

Components:

  1. Replaced the exit/error raised when the is_root check fails in verify_preconditions with this message: "Not running as root, and therefore functionality is limited. Profile is limited to only processes owned by this user that are passed with --pids. Some additional configuration (e.g. perf_event_paranoid) may be configured to operate without root."
  2. Created a run_in_ns_wrapper function that skips entering namespaces when not running as root (we assume we are already in the correct namespace for the processes being profiled).
  3. Added a parameter to the pgrep_maps function to ignore permission errors. Each profiler that calls this function checks whether it is running as root and, if not, passes True.
  4. Redirected the default value of TEMPORARY_STORAGE_PATH to the resources directory.
  5. Added a mkdir_owned_user function, used in main where gprofiler_tmp is created under TEMPORARY_STORAGE_PATH, so that it does not raise an error when we aren't root but still ensures the directory is owned by the current user.
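
The idea behind component 2 can be pictured roughly as follows. This is a hypothetical sketch, not the code from Granulate/granulate-utils#265: `run_in_ns_if_root` and its parameters are made-up names, and `enter_namespaces_and_run` stands in for whatever namespace-entering routine the library actually uses.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def is_root() -> bool:
    """True when the effective user ID is 0."""
    return os.geteuid() == 0


def run_in_ns_if_root(
    callback: Callable[[], T],
    enter_namespaces_and_run: Callable[[Callable[[], T]], T],
    running_as_root: bool,
) -> T:
    """Run callback inside the target namespaces only when running as root.

    Without root we cannot switch namespaces, so the callback runs in place,
    on the assumption that we already share namespaces with the target.
    """
    if running_as_root:
        return enter_namespaces_and_run(callback)
    return callback()
```

A caller would pass `running_as_root=is_root()`; keeping it as a parameter makes both branches easy to exercise in tests.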

Potential issues:

  1. Is there anything I should add/change in the message that is displayed when the is_root check in verify_preconditions fails? Also, is print() to stdout correct here?
  2. Is it fine to redirect TEMPORARY_STORAGE_PATH to the resources directory even in the default case, or should I add a check to only do this when not root?
  3. Do we need to resolve the fallback rw exec directories for POSSIBLE_AP_DIRS?
  4. I've tested this on two systems and it works on both, but on one of them I receive this warning (though gprofiler completes and produces valid results). I discuss this more in the corresponding granulate-utils PR since that is the source of the error.

[2024-12-02 19:59:55,557] WARNING: gprofiler.profilers.java: Failed to enable proc_events listener for exited Java processes (this does not prevent Java profiling)
Traceback (most recent call last):
  File "granulate_utils/linux/proc_events.py", line 222, in start
PermissionError: [Errno 1] Operation not permitted

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "gprofiler/profilers/java.py", line 1395, in start
    proc_events.register_exit_callback(self._proc_exit_callback)
  File "granulate_utils/linux/proc_events.py", line 272, in wrapper
  File "granulate_utils/linux/ns.py", line 305, in run_in_ns_wrapper
  File "granulate_utils/linux/ns.py", line 299, in run_in_ns_wrapper
  File "granulate_utils/linux/proc_events.py", line 260, in _start_listener
  File "granulate_utils/linux/proc_events.py", line 225, in start
PermissionError: This process doesn't have permissions to bind/connect to the process events connector

Related Issue

#905

Motivation and Context

Users of gprofiler have requested this feature: some cloud instances do not provide root access, but users still want to profile the processes they own.

How Has This Been Tested?

I ran stress-ng and pointed gprofiler at the stress-ng PIDs without sudo. It successfully produced flamegraphs.
Sample command line: ./build/x86_64/gprofiler --pids 1421864 -o results/ -d 15 --log-file ./gprofiler.log --pid-file ./gprofiler.pid

I tested this on x86, building with the scripts/build_x86_64_executable.sh script, on CentOS Stream 9 with kernel 6.6.

I also tested with sudo, targeting specific PID(s) and system-wide, and it still works.

I was not able to run tests/test.sh, as it requires an apt-get/Debian environment.

Screenshots

Checklist:

The code is linted.

I have not updated the README.md doc here. Might need some guidance.

  • I have read the CONTRIBUTING document.
  • I have updated the relevant documentation.
  • I have added tests for new logic.

…exit. Added run_in_ns_wrapper to only run in namespace when root is detected. Updated pgrep_maps to provide parameter that ignores permission errors when not root.
…er_tmp directory. Added pids_to_process to discover_appropriate_perf_event(), so it will not error out on perf record -a while not root. Changed TEMPORARY_STORAGE_PATH to the resources directory.
Contributor

This file defines is_root, which you've moved to granulate-utils. Could you remove it here and use the copy from granulate-utils? We avoid maintaining 2 copies.

@@ -82,6 +77,13 @@ def resource_path(relative_path: str = "") -> str:
raise Exception(f"Resource {relative_path!r} not found!") from e


TEMPORARY_STORAGE_PATH = (
f"{resource_path(GPROFILER_DIRECTORY_NAME)}"
Contributor

/tmp is writable for non-root users, and it's best practice to keep our temporary files there. Plus, this path is used as the prefix for async-profiler paths, for which we should keep the path constant across runs of gProfiler - and by using resource_path, you cause it to change.

You might've hit permission errors because you had already run gProfiler as root, so /tmp/gprofiler_tmp is now owned by root. If you remove that directory, a non-root process can create /tmp/gprofiler_tmp without problems.
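
A quick way to check for the stale-directory situation described above is to compare the directory's owner with the current effective user. This is a minimal sketch with a hypothetical helper name (`tmp_dir_usable`); it is not part of gProfiler.

```python
import os


def tmp_dir_usable(path: str) -> bool:
    """Return True if path does not exist yet or is owned by the current user.

    A directory left behind by an earlier run as root fails this check for a
    non-root user, which matches the permission errors described above.
    """
    try:
        return os.stat(path).st_uid == os.geteuid()
    except FileNotFoundError:
        # Nothing there yet, so the current user can create it.
        return True


if __name__ == "__main__":
    print(tmp_dir_usable("/tmp/gprofiler_tmp"))
```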

Author

@pcarella1 pcarella1 Dec 17, 2024

The problem is that mkdir_owned_user expects the parent directory to be owned by the user, which /tmp is not. I can alter mkdir_owned_user to expect the parent to be owned by either the user or root.

Ahh I see this is made irrelevant by your later comment that mkdir_owned_user is not necessary.

I've changed this and it now works fine in /tmp as non-root

and (
line.endswith(b"/maps: No such file or directory")
or line.endswith(b"/maps: No such process")
or b"Permission denied" in line
Contributor

  1. Please make the check more strict, e.g. b"/maps: Permission denied" in line, like the previous checks.
  2. You're not handling ignore_permission_errors here?
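
Both points could be addressed along these lines. This is a sketch only: `is_ignorable_error_line` is a hypothetical helper, not the actual pgrep_maps code, and it uses endswith for the permission case to mirror the two existing checks.

```python
# Suffixes that are always safe to ignore: the process exited mid-scan.
ALWAYS_IGNORABLE = (
    b"/maps: No such file or directory",
    b"/maps: No such process",
)
# Only ignorable when the caller explicitly opts in (e.g. not running as root).
PERMISSION_DENIED = b"/maps: Permission denied"


def is_ignorable_error_line(line: bytes, ignore_permission_errors: bool) -> bool:
    """Decide whether a stderr line from scanning maps files can be skipped."""
    if line.endswith(ALWAYS_IGNORABLE):
        return True
    return ignore_permission_errors and line.endswith(PERMISSION_DENIED)
```

Matching on the "/maps: " prefix keeps unrelated "Permission denied" errors visible, and the flag ensures permission errors are still reported when running as root.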

Comment on lines 141 to 145
if is_root():
    ignore_permission_errors = False
else:
    ignore_permission_errors = True
return pgrep_maps(self.DETECTED_RUBY_PROCESSES_REGEX, ignore_permission_errors)
Contributor

Much more concise:

Suggested change
-if is_root():
-    ignore_permission_errors = False
-else:
-    ignore_permission_errors = True
-return pgrep_maps(self.DETECTED_RUBY_PROCESSES_REGEX, ignore_permission_errors)
+return pgrep_maps(self.DETECTED_RUBY_PROCESSES_REGEX, ignore_permission_errors=not is_root())

@@ -351,7 +353,7 @@ def pgrep_exe(match: str) -> List[Process]:
return procs


-def pgrep_maps(match: str) -> List[Process]:
+def pgrep_maps(match: str, ignore_permission_errors: bool = False) -> List[Process]:
Contributor

Let's remove this parameter, and instead - call is_root here to decide (instead of having all callers forced to call this and pass as an argument)

return statbuf.st_uid == os.geteuid() and statbuf.st_gid == os.getegid()


def mkdir_owned_user(path: Union[str, Path], mode: int = 0o755) -> None:
Contributor

This behavior is not needed when we're not root. So either add a wrapper that checks whether we're root and calls it, and otherwise simply creates the directory; or, if there are only 1-2 call sites, add the check there directly:

if is_root():
   mkdir_owned_root(...)
else:
   mkdir(...)
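
The wrapper option might look roughly like this. It is a sketch under assumptions: `mkdir_for_current_user` is a made-up name, and the root-only helper is passed in as a callable rather than imported, since its real signature is not shown in this thread.

```python
import os
from typing import Callable


def mkdir_for_current_user(
    path: str,
    mode: int = 0o755,
    *,
    running_as_root: bool,
    mkdir_owned_root: Callable[[str, int], None],
) -> None:
    """Create path, applying the root-ownership checks only when root."""
    if running_as_root:
        # Root: delegate to the stricter helper that verifies ownership.
        mkdir_owned_root(path, mode)
    else:
        # Non-root: a plain mkdir is enough; an existing directory is fine.
        os.makedirs(path, mode=mode, exist_ok=True)
```

With only one or two call sites, the inline if/else from the comment above is just as reasonable; the wrapper mainly pays off if more callers appear.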

@@ -95,7 +99,7 @@ def discover_appropriate_perf_event(tmp_dir: Path, stop_event: Event) -> Support
is_dwarf=False,
inject_jit=False,
extra_args=current_extra_args,
-processes_to_profile=None,
+processes_to_profile=pids,
Contributor

Does it mean anything if we pass only specific PIDs here, permission wise?

Contributor

Jongy commented Dec 10, 2024

Is there anything I should add/change in the message that is displayed when the is_root check in verify_preconditions fails? Also, is print() to stdout correct here?

I prefer that rootless mode be opt-in rather than the default for a non-root user (which would allow misconfigured setups to keep misusing the profiler).
So add a flag like --rootless that enables this behavior and checks that we're not root. The verify_preconditions() function would then suggest using --rootless.
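
To make the opt-in idea concrete, here is a minimal sketch using argparse. Only the --rootless flag name comes from this thread; `build_parser`, `check_privileges`, and the message texts are hypothetical and do not reflect gProfiler's actual argument handling.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gprofiler-sketch")
    parser.add_argument(
        "--rootless",
        action="store_true",
        help="run without root, with limited functionality",
    )
    return parser


def check_privileges(rootless: bool, euid: int) -> Optional[str]:
    """Return an error message if the flag and privileges disagree, else None."""
    if rootless and euid == 0:
        return "--rootless was passed, but the process is running as root"
    if not rootless and euid != 0:
        return "not running as root; re-run as root or pass --rootless"
    return None
```

A caller would run `check_privileges(args.rootless, os.geteuid())` early in startup and exit with the returned message if it is not None.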

Is it fine to redirect TEMPORARY_STORAGE_PATH to the resources directory even in the default case, or should I add a check to only do this when not root?

I left some comments on the PR about this topic.

Do we need to resolve the fallback rw exec directories for POSSIBLE_AP_DIRS?

Not sure I got you here.

I've tested this on two systems and it works on both, but on one of them I receive this warning (though gprofiler completes and produces valid results). I discuss this more in the corresponding granulate-utils PR since that is the source of the error.

Yeah, it's fine - as I commented on granulate-utils, proc_events are expected to fail in rootless.

… is_root function (now in granulate-utils). Added mkdir_owned_root_wrapper. Moved TEMPORARY_STORAGE_PATH back to /tmp. pgrep_maps root check now inside function
Author

I added a commit that should address all of your comments, but let me know if there is anything I should change, in particular with the new --rootless option and mkdir_owned_root_wrapper.
