Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#5505 drpt2ir: Integrate Streaming Decoding for Intel-PT Trace #6179

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

dolanzhao
Copy link
Contributor

@dolanzhao dolanzhao commented Jun 28, 2023

This update enhances the drpt2ir module to decode Intel Processor Trace (PT) data using streaming decoding, and optimizes the syscall_pt_trace tool to use a shared perf event file for all syscalls per thread.

  • (1) Integrate streaming decoding
    The drpt2ir module now supports streaming decoding of PT data, which allows it to decode data as it is received without waiting for the entire data stream. However, this method results in each chunk of decoded data missing some instructions at the end, so it cannot be used to decode discontinuous chunks of data that require a precise instruction list. However, it works well for decoding syscall's kernel PT data because the recorded instructions are typically wrapped by noise instructions, so the missing part at the end of each chunk does not affect the middle instructions.

  • (2) Optimized online syscall's kernel PT data recording
    The syscall_pt_trace tool has been optimized to use a shared handle for all syscalls. This means that the tool only needs to open one perf event file, instead of opening a separate file for each syscall. This reduces the overhead of online tracing, and makes it more efficient to record syscall kernel PT data.

Issue: #5505

@dolanzhao dolanzhao changed the title I5505 pt stream decoding i#5505 drpt2ir: Integrate Streaming Decoding for Intel-PT Trace Jun 28, 2023
@dolanzhao dolanzhao marked this pull request as ready for review June 28, 2023 16:14
@dolanzhao dolanzhao requested a review from abhinav92003 June 28, 2023 16:14
clients/drcachesim/drpt2trace/drpt2trace.cpp Outdated Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Outdated Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.h Outdated Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Outdated Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Outdated Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Outdated Show resolved Hide resolved
clients/drcachesim/drpt2trace/pt2ir.cpp Outdated Show resolved Hide resolved
DROPTION_SCOPE_ALL, "kernel_tracing_unified_perf", false,
"Open one perf file per thread for kernel tracing.",
"By default, each syscall in offline tracing open its own perf event file. When "
"enabled, a single perf event file is used per thread, streamlining syscall tracing "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe drawbacks of keeping a single perf event file open. E.g. what if there are many threads in the app.

START_PACKED_STRUCTURE
struct _pt_metadata_ext_t {
/**
* Indicates whether all PT raw data chunk is collected via a single perf file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we guarantee that we'll have a single perf file with this strategy? If there are too many threads in the app, we may need to close an inactive thread's perf FD to allow an active thread to get a trace of its upcoming system call.

@@ -799,6 +799,12 @@ droption_t<bool> op_enable_kernel_tracing(
"syscall's PT and metadata to files in -outdir/kernel.raw/ for later offline "
"analysis. And this feature is available only on Intel CPUs that support Intel@ "
"Processor Trace.");
droption_t<bool> op_kernel_tracing_unified_perf(
DROPTION_SCOPE_ALL, "kernel_tracing_unified_perf", false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this to kernel_tracing_persist_perf_fd or alternatively kernel_tracing_close_perf_fd_after_syscall.

/**
* It indicates whether pt2ir decodes the PT raw trace using streaming decoding.
*/
bool stream_mode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can an application open any number of perf FDs?

When -kernel_tracing_unified_perf is set, are we assuming that we'll never close the perf event FD for the threads? Is there any limit to how many perf event FDs we can keep open? If there is a limit, then we may want to close the perf event FD for inactive threads (threads not executing at all or threads not executing a system call right now) when we want to free up an FD for a different active thread.

@@ -330,6 +349,16 @@ struct pt2ir_config_t {
sb_config.time_mult = metadata.time_mult;
sb_config.time_zero = metadata.time_zero;

/* When system call PT data is gathered through a unified perf file, each system
* call's PT data might not always begin or conclude with a full PT packet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add under what conditions does PT generate partial PT packets?

* @param pt_data The PT raw trace.
* @param pt_data_size The size of PT raw trace.
* @param drir The drir object.
* @return pt2ir_convert_status_t. If the conversion is successful, the function
* @return pt2ir_convert_status_t. If the convertion is successful, the function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/convertion/conversion/

.* total function return address markers
.* total function argument markers
.* total function return value markers
.* total physical address \+ virtual address marker pairs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other templatex files that avoid adding a separate line for each of these.


/* An Intel PT packet decoder.
* The type mirrors the struct pt_packet_decoder found in
* third_party/libipt/libipt/internal/include/pt_packet_decoder.h.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory structure may be unknown to upstream.

*/

/* An Intel PT packet decoder.
* The type mirrors the struct pt_packet_decoder found in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derekbruening what's the recommended approach for such structs. Do we add them to a separate file with some different copyright banner?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it needs to remain separate and satisfy its license which requires keeping the copyright and license header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants