\subpage page_home
diff --git a/api/docs/license.dox b/api/docs/license.dox
index 1a9f20ea5ca..511c0bce5b2 100644
--- a/api/docs/license.dox
+++ b/api/docs/license.dox
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2010-2021 Google, Inc. All rights reserved.
+ * Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* Copyright (c) 2009-2010 VMware, Inc. All rights reserved.
* **********************************************************/
@@ -46,7 +46,7 @@ on this page is licensed under the following BSD license:
\verbatim
-Copyright (c) 2010-2013 Google, Inc. licensed under the terms of the BSD. All other rights reserved.
+Copyright (c) 2010-2024 Google, Inc. licensed under the terms of the BSD. All other rights reserved.
Copyright (c) 2000-2010 VMware, Inc. licensed under the terms of the BSD. All other rights reserved.
@@ -599,6 +599,189 @@ DAMAGES.
END OF TERMS AND CONDITIONS
\endverbatim
+
+***************************************************************************
+\section sec_lgpl3_licenses drsyms Extension use of elfutils: LGPL 3
+
+The \p drsyms Extension (see \ref page_drsyms) on Linux is linked with
+static libraries from the [elfutils
+project](https://sourceware.org/elfutils/). The source code for
+elfutils is available at git://sourceware.org/git/elfutils.git. We
+choose the LGPL 3 license (one of the license choices elfutils offers) for our
+use of these libraries. The \p drsyms Extension and the elfutils static
+libraries are provided as libraries distinct from the rest of
+DynamoRIO. The details of the LGPL 3 license are below:
+
+\verbatim
+ GNU LESSER GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc.
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+
+ This version of the GNU Lesser General Public License incorporates
+the terms and conditions of version 3 of the GNU General Public
+License, supplemented by the additional permissions listed below.
+
+ 0. Additional Definitions.
+
+ As used herein, "this License" refers to version 3 of the GNU Lesser
+General Public License, and the "GNU GPL" refers to version 3 of the GNU
+General Public License.
+
+ "The Library" refers to a covered work governed by this License,
+other than an Application or a Combined Work as defined below.
+
+ An "Application" is any work that makes use of an interface provided
+by the Library, but which is not otherwise based on the Library.
+Defining a subclass of a class defined by the Library is deemed a mode
+of using an interface provided by the Library.
+
+ A "Combined Work" is a work produced by combining or linking an
+Application with the Library. The particular version of the Library
+with which the Combined Work was made is also called the "Linked
+Version".
+
+ The "Minimal Corresponding Source" for a Combined Work means the
+Corresponding Source for the Combined Work, excluding any source code
+for portions of the Combined Work that, considered in isolation, are
+based on the Application, and not on the Linked Version.
+
+ The "Corresponding Application Code" for a Combined Work means the
+object code and/or source code for the Application, including any data
+and utility programs needed for reproducing the Combined Work from the
+Application, but excluding the System Libraries of the Combined Work.
+
+ 1. Exception to Section 3 of the GNU GPL.
+
+ You may convey a covered work under sections 3 and 4 of this License
+without being bound by section 3 of the GNU GPL.
+
+ 2. Conveying Modified Versions.
+
+ If you modify a copy of the Library, and, in your modifications, a
+facility refers to a function or data to be supplied by an Application
+that uses the facility (other than as an argument passed when the
+facility is invoked), then you may convey a copy of the modified
+version:
+
+ a) under this License, provided that you make a good faith effort to
+ ensure that, in the event an Application does not supply the
+ function or data, the facility still operates, and performs
+ whatever part of its purpose remains meaningful, or
+
+ b) under the GNU GPL, with none of the additional permissions of
+ this License applicable to that copy.
+
+ 3. Object Code Incorporating Material from Library Header Files.
+
+ The object code form of an Application may incorporate material from
+a header file that is part of the Library. You may convey such object
+code under terms of your choice, provided that, if the incorporated
+material is not limited to numerical parameters, data structure
+layouts and accessors, or small macros, inline functions and templates
+(ten or fewer lines in length), you do both of the following:
+
+ a) Give prominent notice with each copy of the object code that the
+ Library is used in it and that the Library and its use are
+ covered by this License.
+
+ b) Accompany the object code with a copy of the GNU GPL and this license
+ document.
+
+ 4. Combined Works.
+
+ You may convey a Combined Work under terms of your choice that,
+taken together, effectively do not restrict modification of the
+portions of the Library contained in the Combined Work and reverse
+engineering for debugging such modifications, if you also do each of
+the following:
+
+ a) Give prominent notice with each copy of the Combined Work that
+ the Library is used in it and that the Library and its use are
+ covered by this License.
+
+ b) Accompany the Combined Work with a copy of the GNU GPL and this license
+ document.
+
+ c) For a Combined Work that displays copyright notices during
+ execution, include the copyright notice for the Library among
+ these notices, as well as a reference directing the user to the
+ copies of the GNU GPL and this license document.
+
+ d) Do one of the following:
+
+ 0) Convey the Minimal Corresponding Source under the terms of this
+ License, and the Corresponding Application Code in a form
+ suitable for, and under terms that permit, the user to
+ recombine or relink the Application with a modified version of
+ the Linked Version to produce a modified Combined Work, in the
+ manner specified by section 6 of the GNU GPL for conveying
+ Corresponding Source.
+
+ 1) Use a suitable shared library mechanism for linking with the
+ Library. A suitable mechanism is one that (a) uses at run time
+ a copy of the Library already present on the user's computer
+ system, and (b) will operate properly with a modified version
+ of the Library that is interface-compatible with the Linked
+ Version.
+
+ e) Provide Installation Information, but only if you would otherwise
+ be required to provide such information under section 6 of the
+ GNU GPL, and only to the extent that such information is
+ necessary to install and execute a modified version of the
+ Combined Work produced by recombining or relinking the
+ Application with a modified version of the Linked Version. (If
+ you use option 4d0, the Installation Information must accompany
+ the Minimal Corresponding Source and Corresponding Application
+ Code. If you use option 4d1, you must provide the Installation
+ Information in the manner specified by section 6 of the GNU GPL
+ for conveying Corresponding Source.)
+
+ 5. Combined Libraries.
+
+ You may place library facilities that are a work based on the
+Library side by side in a single library together with other library
+facilities that are not Applications and are not covered by this
+License, and convey such a combined library under terms of your
+choice, if you do both of the following:
+
+ a) Accompany the combined library with a copy of the same work based
+ on the Library, uncombined with any other library facilities,
+ conveyed under the terms of this License.
+
+ b) Give prominent notice with the combined library that part of it
+ is a work based on the Library, and explaining where to find the
+ accompanying uncombined form of the same work.
+
+ 6. Revised Versions of the GNU Lesser General Public License.
+
+ The Free Software Foundation may publish revised and/or new versions
+of the GNU Lesser General Public License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Library as you received it specifies that a certain numbered version
+of the GNU Lesser General Public License "or any later version"
+applies to it, you have the option of following the terms and
+conditions either of that published version or of any later version
+published by the Free Software Foundation. If the Library as you
+received it does not specify a version number of the GNU Lesser
+General Public License, you may choose any version of the GNU Lesser
+General Public License ever published by the Free Software Foundation.
+
+ If the Library as you received it specifies that a proxy can decide
+whether future versions of the GNU Lesser General Public License shall
+apply, that proxy's public statement of acceptance of any version is
+permanent authorization for you to choose that version for the
+Library.
+
+\endverbatim
+
+
***************************************************************************
\section sec_gpl_licenses Code Coverage genhtml: GPL 2
diff --git a/api/docs/release.dox b/api/docs/release.dox
index 1fa147bd9c7..d73b1ed0848 100644
--- a/api/docs/release.dox
+++ b/api/docs/release.dox
@@ -1,5 +1,5 @@
/* ******************************************************************************
- * Copyright (c) 2010-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* Copyright (c) 2011 Massachusetts Institute of Technology All rights reserved.
* Copyright (c) 2008-2010 VMware, Inc. All rights reserved.
* ******************************************************************************/
@@ -142,8 +142,15 @@ changes:
refers to timestamps and direct switches, which is what most users should want.
- Rename the macro INSTR_CREATE_mul_sve to INSTR_CREATE_mul_sve_imm to
differentiate it from the other SVE MUL instructions.
+ - Renamed a protected data member in #dynamorio::drmemtrace::analyzer_tmpl_t from
+ merged_interval_snapshots_ to whole_trace_interval_snapshots_ (may be relevant for
+ users sub-classing analyzer_tmpl_t).
+ - Converted #dynamorio::drmemtrace::analysis_tool_tmpl_t::interval_state_snapshot_t
+ into a class with all its data members marked private with public accessor functions.
Further non-compatibility-affecting changes include:
+ - Added DWARF-5 support to the drsyms library by linking in 4 static libraries
+ from elfutils. These libraries have LGPL licenses.
- Added raw2trace support to inject system call kernel trace templates collected from
elsewhere (e.g., QEMU, Gem5) into the user-space drmemtrace traces at the
corresponding system call number marker. This is done by specifying the path to the
@@ -189,6 +196,22 @@ Further non-compatibility-affecting changes include:
- Added opportunity to run multiple drcachesim analysis tools simultaneously.
- Added support of loading separately-built analysis tools to drcachesim dynamically.
- Added instr_is_opnd_store_source().
+ - Added kernel context switch sequence injection support to the drmemtrace scheduler.
+ - Added dr_running_under_dynamorio().
+ - Added instr_get_category_name() API that returns the string version (as char*) of a
+ category.
+ - Added #dynamorio::drmemtrace::TRACE_MARKER_TYPE_VECTOR_LENGTH marker to indicate the
+ current vector length for architectures with a hardware defined or runtime changeable
+ vector length (such as AArch64's SVE scalable vectors).
+ - Added a new drmemtrace analyzer option \p -interval_instr_count that enables trace
+ analyzer interval results for every given count of instrs in each shard. This mode
+ does not support merging the shard interval snapshots to output the whole-trace
+ interval snapshots. Instead, the print_interval_results() API is called separately
+ for each shard with the interval state snapshots of that shard.
+ - Added a new finalize_interval_snapshots() API to
+ #dynamorio::drmemtrace::analysis_tool_t to allow the tool to make holistic
+ adjustments to the interval snapshots after all have been generated, and before
+ they are used for merging across shards (potentially), and printing the results.
**************************************************
@@ -791,7 +814,7 @@ Further non-compatibility-affecting changes include:
executed on along with an optional simulator scheduling feature to
schedule threads on simulated cores to match the recorded execution on
physical cpus.
- - Added #DR_DISALLOW_UNSAFE_STATIC and dr_disallow_unsafe_static_behavior()
+ - Added #DR_DISALLOW_UNSAFE_STATIC and dr_allow_unsafe_static_behavior()
for sanity checks to help support statically-linked clients.
- Added drmgr_register_pre_syscall_event_user_data() and
drmgr_unregister_pre_syscall_event_user_data() to enable passing of user data.
diff --git a/api/docs/test_suite.dox b/api/docs/test_suite.dox
index 81dd856f9e0..c6f6e396ca7 100644
--- a/api/docs/test_suite.dox
+++ b/api/docs/test_suite.dox
@@ -1,5 +1,5 @@
/* ******************************************************************************
- * Copyright (c) 2010-2021 Google, Inc. All rights reserved.
+ * Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* ******************************************************************************/
/*
@@ -71,7 +71,76 @@ Our CI setups provide "trybot" functionality for nearly every platform via pull
## Debugging Tests on Github Actions Runner
-Test failures that happen only on Github Actions and are not reproducible locally can be hard to debug. Fortunately, there's a way to SSH into a Github Actions runner to debug the test. This can be done using `tmate`: https://github.com/marketplace/actions/debugging-with-tmate. Follow instructions on the page to make a temporary change to the Github Actions workflow config in your branch, and use the link output by `tmate` to ssh into the runner. You can install `gdb` if needed on the runner. `tmate` also allows web shell access; note that you may need to press `q` one time if the web page doesn't show anything.
+Test failures that happen only on Github Actions and are not reproducible
+locally can be hard to debug. Fortunately, there's a way to SSH into a Github
+Actions runner to debug the test. This can be done using `tmate`:
+https://github.com/marketplace/actions/debugging-with-tmate.
+
+Using tmate requires sudo on the Actions runners, which is only available for
+pull requests from branches within the repository; it will not work with pull
+requests created from external forks of the repository.
+
+First, identify the Actions workflow file which contains the job with the
+failure. From the Actions run page with the failing test (usually reached from
+the links in the job runs for a pull request), click on "Workflow file" in the
+bottom of the left sidebar. It will present the file contents with its path at
+the top. It will be something like ".github/workflows/ci-windows.yml". That is
+the path within the git repository.
+
+Next, go and edit that file in your branch. Delete all jobs except the failing
+one (just remove those lines from the file). For the failing one, add these 3
+lines as a new step right after the "Run Suite" step but before the "Send
+failure email" step. Be sure to match the surrounding indentation, as
+indentation is significant in .yml files.
+
+ - name: Setup tmate session
+ if: ${{ failure() }}
+ uses: mxschmitt/action-tmate@v3
+
+Next, delete all the other workflow files in the ".github/workflows/" directory.
+This saves time and resources in general by running only the single
+target job.
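+
+As a sketch (the job and step names here are illustrative, not the exact
+contents of any one workflow file), the pared-down workflow might look like:
+
+```yml
+name: ci-windows
+on: [pull_request]
+jobs:
+  failing-job:
+    runs-on: windows-2019
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run Suite
+        run: suite/runsuite_wrapper.pl
+      # The added step: opens an SSH-able tmate session only when the
+      # preceding step failed.
+      - name: Setup tmate session
+        if: ${{ failure() }}
+        uses: mxschmitt/action-tmate@v3
+```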
+
+You can see an example of these workflow file deletions and edits in this
+commit:
+https://github.com/DynamoRIO/dynamorio/pull/6414/commits/08b96200cdb9fd4d39a4c89e2aa9eafed92027f4
+
+If you want to focus on just one test, you can use a label like the
+`TMATE_DEBUG` label in the linked commit to run only that one test, though that
+is not necessary. The key lines are:
+
+In runsuite.cmake after the arg parsing:
+
+```
+set(extra_ctest_args INCLUDE_LABEL TMATE_DEBUG) # TEMPORARY
+```
+
+At the bottom of suite/tests/CMakeLists.txt, add the label to the target test:
+
+```
+set_tests_properties(code_api|tool.drcacheoff.burst_traceopts PROPERTIES
+ LABELS TMATE_DEBUG) # TEMPORARY
+```
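+
+Before pushing, you can sanity-check the label from an existing local build
+directory (`-L` selects tests by label; `-N` lists matches without running):
+
+```
+# List the tests carrying the label.
+ctest -N -L TMATE_DEBUG
+# Run just those tests.
+ctest -L TMATE_DEBUG
+```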
+
+Commit these changes with a title that starts with "DO NOT COMMIT" so it's clear
+these are temporary debugging changes, and send to Github with `git review`.
+
+Now go to your pull request page and click on the details for the target
+workflow. Wait for it to reach the "Setup tmate session" step (this may take
+from a few minutes to 15-20 minutes depending on the job; you can look at prior
+instances of jobs to see how long they typically take). It will print a command
+like "ssh JhJp879nThXEKUAPWEGuatJ3J@sfo2.tmate.io". Run that command and you
+will have an interactive shell in the base build directory.
+
+By default, the ssh session may start in tmux review mode; you would need to
+quit out of that (e.g., by pressing `q`) to get to the terminal. The shell
+lasts until the action times out (after 6 hours) or until you terminate the
+initial connection.
+
+The connection is through tmux, so you can create new panes and shells using
+tmux commands. You can install `gdb` if needed.
+
+`tmate` also allows web shell access; note that you may need to press `q` one
+time if the web page doesn't show anything.
# Regression Test Suite
@@ -315,6 +384,18 @@ The comments at the top of runsuite_ssh.cmake describe additional options.
Unfortunately our test suite is not as clean as it could be. Some tests can be flaky and while they pass on the machines of the existing developers and on our automated test machines, they may fail occasionally on a new machine. Please search the issue tracker before filing a new issue to see if a test failure has already been seen once before. We welcome contributions to fix flaky tests.
+Flaky tests are marked in one of two ways:
+
+ - Append "_FLAKY" to the test's name
+ - See [suite/tests/CMakeLists.txt](https://github.com/DynamoRIO/dynamorio/blob/master/suite/tests/CMakeLists.txt) for examples
+ - Mention the test in runsuite_wrapper.pl
+ - See [suite/runsuite_wrapper.pl](https://github.com/DynamoRIO/dynamorio/blob/master/suite/runsuite_wrapper.pl) for examples
+
+In both cases, make sure an issue is filed to fix the test, and reference that
+issue at the place where the test is marked as flaky.
+
+The latter is preferred for tests that should be fixed first.
+
## Missing Tests
Some features that were tested in our pre-cmake infrastructure have not been ported to cmake. We welcome contributions in this area:
diff --git a/api/docs/tool.gendox b/api/docs/tool.gendox
index f3a90dfa5b6..9ecf3890d89 100644
--- a/api/docs/tool.gendox
+++ b/api/docs/tool.gendox
@@ -83,23 +83,30 @@ should point at the local documentation provided with the release package.
/**
\page page_drstrace System Call Tracer for Windows
-\p drstrace is a system call tracing tool for Windows. It is part of the
-
Dr. Memory tool suite. It is also
-included with DynamoRIO versions 5.0.0 and higher. If this documentation
-is part of a DynamoRIO public release,
this link should
+\p drstrace is a system call tracing tool for Windows.
+It is part of the
+
Dr. Memory tool suite. It is also
+included with DynamoRIO versions 5.0.0 and higher.
+
+If this documentation is part of a DynamoRIO public release,
+
this link should
point at the local documentation provided with the release package.
+
This one points to the online
+documentation.
*/
/**
\page page_drltrace Library Call Tracer
\p drltrace is a library call tracing tool for all platforms. It is part of the
-
Dr. Memory tool suite. It is also
-included with DynamoRIO versions 5.0.0 and higher. If this documentation
-is part of a DynamoRIO public release,
this link should
-point at the local documentation provided with the release package.
+
Dr. Memory tool suite. It is also
+included with DynamoRIO versions 5.0.0 and higher.
+
+If this documentation is part of a DynamoRIO public release,
+
this link
+should point at the local documentation provided with the release package.
+
This one points to the online
+documentation.
*/
/**
@@ -107,10 +114,13 @@ point at the local documentation provided with the release package.
\p symquery is a symbol querying tool that operates on Linux, Mac, and
Windows and supports the Windows PDB, Linux ELF, Mac Mach-O, and Windows
-PECOFF formats with DWARF2 line information. It is part of the
Dr. Memory tool suite. It is also included
-with DynamoRIO versions 5.0.0 and higher. If this documentation is part of
-a DynamoRIO public release,
this link
+PECOFF formats with DWARF2 line information. It is part of the
+
Dr. Memory tool suite. It is also included
+with DynamoRIO versions 5.0.0 and higher.
+
+If this documentation is part of a DynamoRIO public release,
+
this link
should point at the local documentation provided with the release package.
+
This one points to the online
+documentation.
*/
diff --git a/api/docs/workflow.dox b/api/docs/workflow.dox
index 84c1e5aab9b..468639508b9 100644
--- a/api/docs/workflow.dox
+++ b/api/docs/workflow.dox
@@ -1,5 +1,5 @@
/* ******************************************************************************
- * Copyright (c) 2010-2021 Google, Inc. All rights reserved.
+ * Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* ******************************************************************************/
/*
@@ -53,13 +53,13 @@ Clone the repository, either via ssh if you've set up ssh keys in your
Github profile:
~~~{.unparsed}
-git clone git@github.com:DynamoRIO/dynamorio.git
+git clone --recurse-submodules -j4 git@github.com:DynamoRIO/dynamorio.git
~~~
Or via https:
~~~{.unparsed}
-git clone https://github.com/DynamoRIO/dynamorio.git
+git clone --recurse-submodules -j4 https://github.com/DynamoRIO/dynamorio.git
~~~
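+
+If you already have a clone made without --recurse-submodules, you can fetch
+the submodules after the fact with standard git:
+
+~~~{.unparsed}
+git submodule update --init --recursive
+~~~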
# Configuring Author Information and Aliases
diff --git a/clients/drcachesim/CMakeLists.txt b/clients/drcachesim/CMakeLists.txt
index 540e789200b..75b366ed495 100644
--- a/clients/drcachesim/CMakeLists.txt
+++ b/clients/drcachesim/CMakeLists.txt
@@ -277,7 +277,7 @@ target_link_libraries(drcachesim drmemtrace_simulator drmemtrace_reuse_distance
drmemtrace_histogram drmemtrace_reuse_time drmemtrace_basic_counts
drmemtrace_opcode_mix drmemtrace_syscall_mix drmemtrace_view drmemtrace_func_view
drmemtrace_raw2trace directory_iterator drmemtrace_invariant_checker
- drmemtrace_schedule_stats)
+ drmemtrace_schedule_stats drmemtrace_record_filter)
if (UNIX)
target_link_libraries(drcachesim dl)
endif ()
@@ -512,6 +512,9 @@ macro(add_drmemtrace name type)
if (liblz4)
target_link_libraries(${name} lz4)
endif ()
+ if (RISCV64)
+ target_link_libraries(${name} atomic)
+ endif ()
add_dependencies(${name} api_headers)
install_target(${name} ${INSTALL_CLIENTS_LIB})
endmacro()
@@ -819,7 +822,7 @@ if (BUILD_TESTS)
drmemtrace_histogram drmemtrace_reuse_time drmemtrace_basic_counts
drmemtrace_opcode_mix drmemtrace_syscall_mix drmemtrace_view drmemtrace_func_view
drmemtrace_raw2trace directory_iterator drmemtrace_invariant_checker
- drmemtrace_schedule_stats drmemtrace_analyzer)
+ drmemtrace_schedule_stats drmemtrace_analyzer drmemtrace_record_filter)
if (UNIX)
target_link_libraries(tool.drcachesim.core_sharded dl)
endif ()
diff --git a/clients/drcachesim/analysis_tool.h b/clients/drcachesim/analysis_tool.h
index 16c4df7e8a8..24306cd7534 100644
--- a/clients/drcachesim/analysis_tool.h
+++ b/clients/drcachesim/analysis_tool.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2016-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2016-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -189,14 +189,24 @@ template <typename RecordType> class analysis_tool_tmpl_t {
print_results() = 0;
/**
- * Struct that stores details of a tool's state snapshot at an interval. This is
+ * Type that stores details of a tool's state snapshot at an interval. This is
* useful for computing and combining interval results. Tools should inherit from
- * this struct to define their own state snapshot structs. Tools do not need to
- * supply any values to construct this base struct; they can simply use the
+ * this type to define their own state snapshot types. Tools do not need to
+ * supply any values to construct this base class; they can simply use the
* default constructor. The members of this base class will be set by the
- * framework automatically.
+ * framework automatically, and must not be modified by the tool at any point.
*/
- struct interval_state_snapshot_t {
+ class interval_state_snapshot_t {
+ // Allow the analyzer framework access to private data members to set them
+ // during trace interval analysis. Tools have read-only access via the public
+ // accessor functions.
+ // Note that we expect X to be same as RecordType. But friend declarations
+ // cannot refer to partial specializations so we go with the separate template
+ // parameter X.
+        template <typename X, typename Y> friend class analyzer_tmpl_t;
+
+ public:
// This constructor is only for convenience in unit tests. The tool does not
// need to provide these values, and can simply use the default constructor
// below.
@@ -204,63 +214,98 @@ template <typename RecordType> class analysis_tool_tmpl_t {
uint64_t interval_end_timestamp,
uint64_t instr_count_cumulative,
uint64_t instr_count_delta)
- : shard_id(shard_id)
- , interval_id(interval_id)
- , interval_end_timestamp(interval_end_timestamp)
- , instr_count_cumulative(instr_count_cumulative)
- , instr_count_delta(instr_count_delta)
+ : shard_id_(shard_id)
+ , interval_id_(interval_id)
+ , interval_end_timestamp_(interval_end_timestamp)
+ , instr_count_cumulative_(instr_count_cumulative)
+ , instr_count_delta_(instr_count_delta)
{
}
+ // This constructor should be used by tools that subclass
+ // interval_state_snapshot_t. The data members will be set by the framework
+ // automatically when the tool returns a pointer to their created object from
+ // generate_*interval_snapshot or combine_interval_snapshots.
interval_state_snapshot_t()
{
}
+ virtual ~interval_state_snapshot_t() = default;
+ int64_t
+ get_shard_id() const
+ {
+ return shard_id_;
+ }
+ uint64_t
+ get_interval_id() const
+ {
+ return interval_id_;
+ }
+ uint64_t
+ get_interval_end_timestamp() const
+ {
+ return interval_end_timestamp_;
+ }
+ uint64_t
+ get_instr_count_cumulative() const
+ {
+ return instr_count_cumulative_;
+ }
+ uint64_t
+ get_instr_count_delta() const
+ {
+ return instr_count_delta_;
+ }
+
+ static constexpr int64_t WHOLE_TRACE_SHARD_ID = -1;
+
+ private:
// The following fields are set automatically by the analyzer framework after
// the tool returns the interval_state_snapshot_t* in the
// generate_*interval_snapshot APIs. So they'll be available to the tool in
- // the combine_interval_snapshots and print_interval_results APIs.
+ // the finalize_interval_snapshots(), combine_interval_snapshots() (for the
+ // parameter snapshots), and print_interval_results() APIs via the above
+ // public accessor functions.
// Identifier for the shard to which this interval belongs. Currently, shards
// map only to threads, so this is the thread id. Set to WHOLE_TRACE_SHARD_ID
// for the whole trace interval snapshots.
- int64_t shard_id = 0;
- uint64_t interval_id = 0;
+ int64_t shard_id_ = 0;
+ uint64_t interval_id_ = 0;
// Stores the timestamp (exclusive) when the above interval ends. Note
// that this is not the last timestamp actually seen in the trace interval,
// but simply the abstract boundary of the interval. This will be aligned
// to the specified -interval_microseconds.
- uint64_t interval_end_timestamp = 0;
-
- // Count of instructions: cumulative till this interval, and the incremental
- // delta in this interval vs the previous one. May be useful for tools to
- // compute PKI (per kilo instruction) metrics; obviates the need for each
- // tool to duplicate this.
- uint64_t instr_count_cumulative = 0;
- uint64_t instr_count_delta = 0;
+ uint64_t interval_end_timestamp_ = 0;
- static constexpr int64_t WHOLE_TRACE_SHARD_ID = -1;
-
- virtual ~interval_state_snapshot_t() = default;
+ // Count of instructions: cumulative till this interval's end, and the
+ // incremental delta in this interval vs the previous one. May be useful for
+ // tools to compute PKI (per kilo instruction) metrics; obviates the need for
+ // each tool to duplicate this.
+ uint64_t instr_count_cumulative_ = 0;
+ uint64_t instr_count_delta_ = 0;
};
/**
* Notifies the analysis tool that the given trace \p interval_id has ended so
- * that it can generate a snapshot of its internal state in a struct derived
+ * that it can generate a snapshot of its internal state in a type derived
* from \p interval_state_snapshot_t, and return a pointer to it. The returned
- * pointer will be provided to the tool in later combine_interval_snapshots()
+ * pointer will be provided to the tool in later finalize_interval_snapshots(),
* and print_interval_result() calls.
*
* \p interval_id is a positive ordinal of the trace interval that just ended.
- * Trace intervals have a length equal to the \p -interval_microseconds specified
- * to the framework. Trace intervals are measured using the value of the
- * #TRACE_MARKER_TYPE_TIMESTAMP markers. The provided \p interval_id
- * values will be monotonically increasing but may not be continuous,
- * i.e. the tool may not see some \p interval_id if the trace did not have
- * any activity in that interval.
+ * Trace intervals have a length equal to either \p -interval_microseconds or
+ * \p -interval_instr_count. Time-based intervals are measured using the value
+ * of the #TRACE_MARKER_TYPE_TIMESTAMP markers. Instruction count intervals are
+ * measured in terms of shard-local instrs.
*
- * The returned \p interval_state_snapshot_t* will be passed to the
- * combine_interval_snapshots() API which is invoked by the framework to merge
- * multiple \p interval_state_snapshot_t from different shards in the parallel
- * mode of the analyzer.
+ * The provided \p interval_id values will be monotonically increasing. For
+ * \p -interval_microseconds intervals, these values may not be continuous,
+ * i.e. the tool may not see some \p interval_id if the trace did not have any
+ * activity in that interval.
+ *
+ * After all interval state snapshots are generated, the list of all returned
+ * \p interval_state_snapshot_t* is passed to finalize_interval_snapshots()
+ * to allow the tool the opportunity to make any holistic adjustments to the
+ * snapshots.
*
* Finally, the print_interval_result() API is invoked with a list of
* \p interval_state_snapshot_t* representing interval snapshots for the
@@ -277,6 +322,40 @@ template <typename RecordType> class analysis_tool_tmpl_t {
{
return nullptr;
}
+ /**
+ * Finalizes the interval snapshots in the given \p interval_snapshots list.
+ * This callback provides an opportunity for tools to make any holistic
+ * adjustments to the snapshot list now that we have all of them together. This
+ * may include, for example, computing the diff with the previous snapshot.
+ *
+ * Tools can modify the individual snapshots and also the list of snapshots itself.
+ * If some snapshots are removed, release_interval_snapshot() will not be invoked
+ * for them and the tool is responsible for de-allocating their resources. Adding new
+ * snapshots to the list is undefined behavior; tools should operate only on the
+ * provided snapshots which were generated in prior generate_*interval_snapshot
+ * calls.
+ *
+ * Tools cannot modify any data set by the framework in the base
+ * \p interval_state_snapshot_t; those private data members are in any case
+ * accessible only via read-only public accessor functions.
+ *
+ * In the parallel mode, this is invoked for each list of shard-local snapshots
+ * before they are possibly merged to create whole-trace snapshots using
+ * combine_interval_snapshots() and passed to print_interval_results(). In the
+ * serial mode, this is invoked with the list of whole-trace snapshots before it
+ * is passed to print_interval_results().
+ *
+ * This is an optional API. If a tool chooses not to override this, the snapshot
+ * list will simply continue unmodified.
+ *
+ * Returns whether it was successful.
+ */
+ virtual bool
+ finalize_interval_snapshots(
+ std::vector<interval_state_snapshot_t *> &interval_snapshots)
+ {
+ return true;
+ }
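As a usage sketch (not part of this diff), a tool might override finalize_interval_snapshots() to convert cumulative metrics into per-interval deltas, which is the example the doc comment above mentions. The standalone model below uses a hypothetical minimal snapshot type and a free function standing in for the override:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal stand-in for a tool's interval_state_snapshot_t subclass.
struct my_snapshot_t {
    uint64_t cumulative_count = 0; // running total at the interval's end
    uint64_t delta_count = 0;      // filled in during finalization
};

// Sketch of a finalize_interval_snapshots() body: compute each interval's
// delta as the difference from the previous interval's cumulative total.
bool
finalize_interval_snapshots(std::vector<my_snapshot_t *> &snapshots)
{
    uint64_t prev_cumulative = 0;
    for (my_snapshot_t *snapshot : snapshots) {
        snapshot->delta_count = snapshot->cumulative_count - prev_cumulative;
        prev_cumulative = snapshot->cumulative_count;
    }
    return true;
}
```

Since the snapshots arrive in interval order, a single pass with one carried value suffices; no snapshot is added or removed, so the default release path still applies to every element.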
/**
* Invoked by the framework to combine the shard-local \p interval_state_snapshot_t
* objects pointed at by \p latest_shard_snapshots, to create the combined
@@ -302,6 +381,10 @@ template class analysis_tool_tmpl_t {
* \p interval_end_timestamp)
* - or if the tool mixes cumulative and delta metrics: some field-specific logic that
* combines the above two strategies.
+ *
+ * Note that after the given snapshots have been combined into the whole-trace
+ * snapshot using this API, any changes the tool makes to the given snapshots'
+ * contents will not have any effect.
*/
virtual interval_state_snapshot_t *
combine_interval_snapshots(
@@ -314,14 +397,14 @@ template class analysis_tool_tmpl_t {
* Prints the interval results for the given series of interval state snapshots in
* \p interval_snapshots.
*
- * This is currently invoked with the list of whole-trace interval snapshots (for
- * the parallel mode, these are the snapshots created by merging the shard-local
- * snapshots).
+ * This is invoked with the list of whole-trace interval snapshots (for the
+ * parallel mode, these are the snapshots created by merging the shard-local
+ * snapshots). For the \p -interval_instr_count snapshots in parallel mode, this is
+ * invoked separately for the snapshots of each shard.
*
 * The framework should be able to invoke this multiple times, possibly with a
 * different list of interval snapshots, so the tool should avoid freeing memory or
- * changing global state. This is to keep open the possibility of the framework
- * printing interval results for each shard separately in future.
+ * changing global state.
*/
virtual bool
print_interval_results(
@@ -334,6 +417,10 @@ template class analysis_tool_tmpl_t {
* by \p interval_snapshot is no longer needed by the framework. The tool may
* de-allocate it right away or later, as it needs. Returns whether it was
* successful.
+ *
+ * Note that if the tool removed some snapshot from the list passed to
+ * finalize_interval_snapshots(), then release_interval_snapshot() will not be
+ * invoked for that snapshot.
*/
virtual bool
release_interval_snapshot(interval_state_snapshot_t *interval_snapshot)
@@ -387,7 +474,8 @@ template class analysis_tool_tmpl_t {
/**
* Invoked once for each trace shard prior to calling parallel_shard_memref() for
* that shard, this allows a tool to create data local to a shard. The \p
- * shard_index is a unique identifier allowing shard data to be stored into a global
+ * shard_index is the 0-based ordinal of the shard, serving as a unique identifier
+ * allowing shard data to be stored into a global
* table if desired (typically for aggregation use in print_results()). The \p
* worker_data is the return value of parallel_worker_init() for the worker thread
* who will exclusively operate on this shard. The \p shard_stream allows tools to
@@ -439,10 +527,10 @@ template class analysis_tool_tmpl_t {
/**
* Notifies the analysis tool that the given trace \p interval_id in the shard
* represented by the given \p shard_data has ended, so that it can generate a
- * snapshot of its internal state in a struct derived from \p
+ * snapshot of its internal state in a type derived from \p
* interval_state_snapshot_t, and return a pointer to it. The returned pointer will
- * be provided to the tool in later combine_interval_snapshots() and
- * print_interval_result() calls.
+ * be provided to the tool in later combine_interval_snapshots(),
+ * finalize_interval_snapshots(), and print_interval_result() calls.
*
* Note that the provided \p interval_id is local to the shard that is
* represented by the given \p shard_data, and not the whole-trace interval. The
@@ -451,30 +539,22 @@ template class analysis_tool_tmpl_t {
* shard-local \p interval_state_snapshot_t corresponding to that whole-trace
* interval.
*
- * \p interval_id is a positive ordinal of the trace interval that just ended.
- * Trace intervals have a length equal to the \p -interval_microseconds specified
- * to the framework. Trace intervals are measured using the value of the
- * #TRACE_MARKER_TYPE_TIMESTAMP markers. The provided \p interval_id
- * values will be monotonically increasing but may not be continuous,
- * i.e. the tool may not see some \p interval_id if the trace shard did not
- * have any activity in that interval.
+ * The \p interval_id parameter is defined similarly to the same parameter in
+ * generate_interval_snapshot().
*
- * The returned \p interval_state_snapshot_t* will be passed to the
- * combine_interval_snapshot() API which is invoked by the framework to merge
- * multiple \p interval_state_snapshot_t from different shards in the parallel
- * mode of the analyzer.
- *
- * Finally, the print_interval_result() API is invoked with a list of
- * \p interval_state_snapshot_t* representing interval snapshots for the
- * whole trace. In the parallel mode of the analyzer, this list is computed by
- * combining the shard-local \p interval_state_snapshot_t using the tool's
- * combine_interval_snapshot() API.
+ * The returned \p interval_state_snapshot_t* is treated in the same manner as
+ * in generate_interval_snapshot(), with the following additions:
*
- * The tool must not de-allocate the state snapshot until
- * release_interval_snapshot() is invoked by the framework.
+ * In case of \p -interval_microseconds in the parallel mode: after
+ * finalize_interval_snapshots() has been invoked, the \p interval_state_snapshot_t*
+ * objects generated for the same time period across different shards are passed to
+ * the combine_interval_snapshots() API by the framework, which merges them to create
+ * the whole-trace interval snapshots. The print_interval_results() API is then
+ * invoked with the list of whole-trace \p interval_state_snapshot_t* thus obtained.
*
- * An example use case of this API is to create a time series of some output
- * metric over the whole trace.
+ * In case of \p -interval_instr_count in the parallel mode: no merging across
+ * shards is done, and the print_interval_results() API is invoked for each list
+ * of shard-local \p interval_state_snapshot_t*.
*/
virtual interval_state_snapshot_t *
generate_shard_interval_snapshot(void *shard_data, uint64_t interval_id)
diff --git a/clients/drcachesim/analyzer.cpp b/clients/drcachesim/analyzer.cpp
index c158c00cd84..97159ed342a 100644
--- a/clients/drcachesim/analyzer.cpp
+++ b/clients/drcachesim/analyzer.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2016-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2016-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -32,13 +32,6 @@
#include "analyzer.h"
-#ifdef WINDOWS
-# define WIN32_LEAN_AND_MEAN
-# include <windows.h>
-#else
-# include <sys/time.h>
-#endif
-
#include
#include
@@ -47,7 +40,6 @@
#include
#include
#include
-#include
#include
#include
#include
@@ -122,6 +114,13 @@ analyzer_t::record_is_timestamp(const memref_t &record)
record.marker.marker_type == TRACE_MARKER_TYPE_TIMESTAMP;
}
+template <>
+bool
+analyzer_t::record_is_instr(const memref_t &record)
+{
+ return type_is_instr(record.instr.type);
+}
+
template <>
memref_t
analyzer_t::create_wait_marker()
@@ -182,6 +181,13 @@ record_analyzer_t::record_is_timestamp(const trace_entry_t &record)
return record.type == TRACE_TYPE_MARKER && record.size == TRACE_MARKER_TYPE_TIMESTAMP;
}
+template <>
+bool
+record_analyzer_t::record_is_instr(const trace_entry_t &record)
+{
+ return type_is_instr(static_cast<trace_type_t>(record.type));
+}
+
template <>
trace_entry_t
record_analyzer_t::create_wait_marker()
@@ -223,7 +229,7 @@ template
bool
analyzer_tmpl_t<RecordType, ReaderType>::init_scheduler(
const std::string &trace_path, memref_tid_t only_thread, int verbosity,
- typename sched_type_t::scheduler_options_t *options)
+ typename sched_type_t::scheduler_options_t options)
{
verbosity_ = verbosity;
if (trace_path.empty()) {
@@ -242,14 +248,14 @@ analyzer_tmpl_t::init_scheduler(
if (only_thread != INVALID_THREAD_ID) {
workload.only_threads.insert(only_thread);
}
- return init_scheduler_common(workload, options);
+ return init_scheduler_common(workload, std::move(options));
}
template <typename RecordType, typename ReaderType>
bool
analyzer_tmpl_t<RecordType, ReaderType>::init_scheduler(
    std::unique_ptr<ReaderType> reader, std::unique_ptr<ReaderType> reader_end,
- int verbosity, typename sched_type_t::scheduler_options_t *options)
+ int verbosity, typename sched_type_t::scheduler_options_t options)
{
verbosity_ = verbosity;
if (!reader || !reader_end) {
@@ -257,20 +263,21 @@ analyzer_tmpl_t::init_scheduler(
return false;
}
std::vector<typename sched_type_t::input_reader_t> readers;
- // With no modifiers or only_threads the tid doesn't matter.
- readers.emplace_back(std::move(reader), std::move(reader_end), /*tid=*/1);
+ // Use a sentinel for the tid so the scheduler will use the memref record tid.
+ readers.emplace_back(std::move(reader), std::move(reader_end),
+ /*tid=*/INVALID_THREAD_ID);
std::vector<typename sched_type_t::range_t> regions;
if (skip_instrs_ > 0)
regions.emplace_back(skip_instrs_ + 1, 0);
typename sched_type_t::input_workload_t workload(std::move(readers), regions);
- return init_scheduler_common(workload, options);
+ return init_scheduler_common(workload, std::move(options));
}
template <typename RecordType, typename ReaderType>
bool
analyzer_tmpl_t<RecordType, ReaderType>::init_scheduler_common(
typename sched_type_t::input_workload_t &workload,
- typename sched_type_t::scheduler_options_t *options)
+ typename sched_type_t::scheduler_options_t options)
{
for (int i = 0; i < num_tools_; ++i) {
if (parallel_ && !tools_[i]->parallel_shard_supported()) {
@@ -282,25 +289,37 @@ analyzer_tmpl_t::init_scheduler_common(
sched_inputs[0] = std::move(workload);
typename sched_type_t::scheduler_options_t sched_ops;
+ int output_count = worker_count_;
if (shard_type_ == SHARD_BY_CORE) {
// Subclass must pass us options and set worker_count_ to # cores.
- if (options == nullptr || worker_count_ <= 0) {
+ if (worker_count_ <= 0) {
error_string_ = "For -core_sharded, core count must be > 0";
return false;
}
- sched_ops = *options;
+ sched_ops = std::move(options);
if (sched_ops.quantum_unit == sched_type_t::QUANTUM_TIME)
sched_by_time_ = true;
+ if (!parallel_) {
+ // output_count remains the # of virtual cores, but we have just
+ // one worker thread. The scheduler multiplexes the output_count output
+ // cores onto a single stream for us with this option:
+ sched_ops.single_lockstep_output = true;
+ worker_count_ = 1;
+ }
} else if (parallel_) {
sched_ops = sched_type_t::make_scheduler_parallel_options(verbosity_);
+ sched_ops.read_inputs_in_init = options.read_inputs_in_init;
if (worker_count_ <= 0)
worker_count_ = std::thread::hardware_concurrency();
+ output_count = worker_count_;
} else {
sched_ops = sched_type_t::make_scheduler_serial_options(verbosity_);
+ sched_ops.read_inputs_in_init = options.read_inputs_in_init;
worker_count_ = 1;
+ output_count = 1;
}
- int output_count = worker_count_;
- if (scheduler_.init(sched_inputs, output_count, sched_ops) !=
+ sched_mapping_ = options.mapping;
+ if (scheduler_.init(sched_inputs, output_count, std::move(sched_ops)) !=
sched_type_t::STATUS_SUCCESS) {
ERRMSG("Failed to initialize scheduler: %s\n",
scheduler_.get_error_string().c_str());
@@ -309,6 +328,14 @@ analyzer_tmpl_t::init_scheduler_common(
for (int i = 0; i < worker_count_; ++i) {
worker_data_.push_back(analyzer_worker_data_t(i, scheduler_.get_stream(i)));
+ if (options.read_inputs_in_init) {
+ // The docs say we can query the filetype up front.
+ uint64_t filetype = scheduler_.get_stream(i)->get_filetype();
+ VPRINT(this, 2, "Worker %d filetype %" PRIx64 "\n", i, filetype);
+ if (TESTANY(OFFLINE_FILE_TYPE_CORE_SHARDED, filetype)) {
+ shard_type_ = SHARD_BY_CORE;
+ }
+ }
}
return true;
@@ -318,7 +345,7 @@ template
analyzer_tmpl_t<RecordType, ReaderType>::analyzer_tmpl_t(
const std::string &trace_path, analysis_tool_tmpl_t **tools,
int num_tools, int worker_count, uint64_t skip_instrs, uint64_t interval_microseconds,
- int verbosity)
+ uint64_t interval_instr_count, int verbosity)
: success_(true)
, num_tools_(num_tools)
, tools_(tools)
@@ -326,11 +353,18 @@ analyzer_tmpl_t::analyzer_tmpl_t(
, worker_count_(worker_count)
, skip_instrs_(skip_instrs)
, interval_microseconds_(interval_microseconds)
+ , interval_instr_count_(interval_instr_count)
, verbosity_(verbosity)
{
+ if (interval_microseconds_ > 0 && interval_instr_count_ > 0) {
+ success_ = false;
+ error_string_ = "Cannot enable both kinds of interval analysis";
+ return;
+ }
// The scheduler will call reader_t::init() for each input file. We assume
// that won't block (analyzer_multi_t separates out IPC readers).
- if (!init_scheduler(trace_path, INVALID_THREAD_ID, verbosity)) {
+ typename sched_type_t::scheduler_options_t sched_ops;
+ if (!init_scheduler(trace_path, INVALID_THREAD_ID, verbosity, std::move(sched_ops))) {
success_ = false;
error_string_ = "Failed to create scheduler";
return;
@@ -376,28 +410,16 @@ template
uint64_t
analyzer_tmpl_t<RecordType, ReaderType>::get_current_microseconds()
{
-#ifdef UNIX
- struct timeval time;
- if (gettimeofday(&time, nullptr) != 0)
- return 0;
- return time.tv_sec * 1000000 + time.tv_usec;
-#else
- SYSTEMTIME sys_time;
- GetSystemTime(&sys_time);
- FILETIME file_time;
- if (!SystemTimeToFileTime(&sys_time, &file_time))
- return 0;
- return file_time.dwLowDateTime +
- (static_cast(file_time.dwHighDateTime) << 32);
-#endif
+ return get_microsecond_timestamp();
}
template <typename RecordType, typename ReaderType>
uint64_t
-analyzer_tmpl_t<RecordType, ReaderType>::compute_interval_id(uint64_t first_timestamp,
-                                                             uint64_t latest_timestamp)
+analyzer_tmpl_t<RecordType, ReaderType>::compute_timestamp_interval_id(
+    uint64_t first_timestamp, uint64_t latest_timestamp)
{
assert(first_timestamp <= latest_timestamp);
+ assert(interval_microseconds_ > 0);
// We keep the interval end timestamps independent of the first timestamp of the
// trace. For the parallel mode, where we need to merge intervals from different
// shards that were active during the same final whole-trace interval, having aligned
@@ -408,17 +430,34 @@ analyzer_tmpl_t::compute_interval_id(uint64_t first_time
first_timestamp / interval_microseconds_ + 1;
}
+template <typename RecordType, typename ReaderType>
+uint64_t
+analyzer_tmpl_t<RecordType, ReaderType>::compute_instr_count_interval_id(
+    uint64_t cur_instr_count)
+{
+ assert(interval_instr_count_ > 0);
+ if (cur_instr_count == 0)
+ return 1;
+ // We want all memory access entries following an instr to stay in the same
+ // interval as the instr, so we increment interval_id at instr entries. Also,
+ // we want the last instr in each interval to have an ordinal that's a multiple
+ // of interval_instr_count_.
+ return (cur_instr_count - 1) / interval_instr_count_ + 1;
+}
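The comment above fixes the last instruction of each interval at a multiple of the interval length. That arithmetic can be checked in isolation with a hypothetical free-function version of the logic, taking the interval length as an explicit parameter:

```cpp
#include <cassert>
#include <cstdint>

// Maps a 1-based instruction ordinal to a 1-based interval id such that the
// last instruction of each interval has an ordinal that is a multiple of
// interval_instr_count (e.g. instrs 1..100 -> interval 1 for a count of 100).
uint64_t
instr_count_interval_id(uint64_t cur_instr_count, uint64_t interval_instr_count)
{
    if (cur_instr_count == 0)
        return 1;
    return (cur_instr_count - 1) / interval_instr_count + 1;
}
```

The subtract-one-then-divide shape is what keeps ordinal 100 in interval 1 and ordinal 101 in interval 2 for an interval length of 100.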
+
template <typename RecordType, typename ReaderType>
uint64_t
analyzer_tmpl_t<RecordType, ReaderType>::compute_interval_end_timestamp(
    uint64_t first_timestamp, uint64_t interval_id)
{
+ assert(interval_microseconds_ > 0);
assert(interval_id >= 1);
uint64_t end_timestamp =
(first_timestamp / interval_microseconds_ + interval_id) * interval_microseconds_;
// Since the interval's end timestamp is exclusive, the end_timestamp would actually
// fall under the next interval.
- assert(compute_interval_id(first_timestamp, end_timestamp) == interval_id + 1);
+ assert(compute_timestamp_interval_id(first_timestamp, end_timestamp) ==
+ interval_id + 1);
return end_timestamp;
}
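The assert above relies on the interval id and end-timestamp computations being inverses up to the exclusive-end convention. A standalone sketch (hypothetical free-function versions with the interval length passed explicitly) shows the round trip:

```cpp
#include <cassert>
#include <cstdint>

// Interval ids are aligned to absolute time (first / interval_us), so shards
// active during the same wall-clock window map to the same id.
uint64_t
timestamp_interval_id(uint64_t first, uint64_t latest, uint64_t interval_us)
{
    return latest / interval_us - first / interval_us + 1;
}

// The exclusive end timestamp of a given interval; being exclusive, it itself
// falls into the next interval.
uint64_t
interval_end_timestamp(uint64_t first, uint64_t interval_id, uint64_t interval_us)
{
    return (first / interval_us + interval_id) * interval_us;
}
```

Plugging the end timestamp back into the id computation yields interval_id + 1, which is exactly what the assert in the diff checks.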
@@ -427,19 +466,33 @@ bool
analyzer_tmpl_t<RecordType, ReaderType>::advance_interval_id(
typename scheduler_tmpl_t::stream_t *stream,
analyzer_shard_data_t *shard, uint64_t &prev_interval_index,
- uint64_t &prev_interval_init_instr_count)
+ uint64_t &prev_interval_init_instr_count, bool at_instr_record)
{
- if (interval_microseconds_ == 0) {
+ uint64_t next_interval_index = 0;
+ if (interval_microseconds_ > 0) {
+ next_interval_index = compute_timestamp_interval_id(stream->get_first_timestamp(),
+ stream->get_last_timestamp());
+ } else if (interval_instr_count_ > 0) {
+ // The interval callbacks are invoked just prior to the process_memref or
+ // parallel_shard_memref callback for the first instr of the new interval; this
+ // keeps the instr's memory accesses in the same interval as the instr.
+ next_interval_index =
+ compute_instr_count_interval_id(stream->get_instruction_ordinal());
+ } else {
return false;
}
- uint64_t next_interval_index =
- compute_interval_id(stream->get_first_timestamp(), stream->get_last_timestamp());
if (next_interval_index != shard->cur_interval_index) {
assert(next_interval_index > shard->cur_interval_index);
prev_interval_index = shard->cur_interval_index;
prev_interval_init_instr_count = shard->cur_interval_init_instr_count;
shard->cur_interval_index = next_interval_index;
- shard->cur_interval_init_instr_count = stream->get_instruction_ordinal();
+ // If the next record to be presented to the tools is an instr record, we need to
+ // adjust for the fact that the record has already been read from the stream.
+ // Since we know that the next record is a part of the new interval and
+ // cur_interval_init_instr_count is supposed to be the count just prior to the
+ // new interval, we need to subtract one count for the instr.
+ shard->cur_interval_init_instr_count =
+ stream->get_instruction_ordinal() - (at_instr_record ? 1 : 0);
return true;
}
return false;
@@ -452,7 +505,7 @@ analyzer_tmpl_t::process_serial(analyzer_worker_data_t &
std::vector<void *> user_worker_data(num_tools_);
worker.shard_data[0].tool_data.resize(num_tools_);
- if (interval_microseconds_ != 0)
+ if (interval_microseconds_ != 0 || interval_instr_count_ != 0)
worker.shard_data[0].cur_interval_index = 1;
for (int i = 0; i < num_tools_; ++i) {
worker.error = tools_[i]->initialize_stream(worker.stream);
@@ -469,7 +522,12 @@ analyzer_tmpl_t::process_serial(analyzer_worker_data_t &
uint64_t cur_micros = sched_by_time_ ? get_current_microseconds() : 0;
typename sched_type_t::stream_status_t status =
worker.stream->next_record(record, cur_micros);
- if (status != sched_type_t::STATUS_OK) {
+ if (status == sched_type_t::STATUS_WAIT) {
+ record = create_wait_marker();
+ } else if (status == sched_type_t::STATUS_IDLE) {
+ assert(shard_type_ == SHARD_BY_CORE);
+ record = create_idle_marker();
+ } else if (status != sched_type_t::STATUS_OK) {
if (status != sched_type_t::STATUS_EOF) {
if (status == sched_type_t::STATUS_REGION_INVALID) {
worker.error =
@@ -478,21 +536,24 @@ analyzer_tmpl_t::process_serial(analyzer_worker_data_t &
worker.error =
"Failed to read from trace: " + worker.stream->get_stream_name();
}
- } else if (interval_microseconds_ != 0) {
- process_interval(worker.shard_data[0].cur_interval_index,
- worker.shard_data[0].cur_interval_init_instr_count,
- &worker,
- /*parallel=*/false);
+ } else if (interval_microseconds_ != 0 || interval_instr_count_ != 0) {
+ if (!process_interval(worker.shard_data[0].cur_interval_index,
+ worker.shard_data[0].cur_interval_init_instr_count,
+ &worker,
+ /*parallel=*/false, /*at_instr_record=*/false) ||
+ !finalize_interval_snapshots(&worker, /*parallel=*/false))
+ return;
}
return;
}
uint64_t prev_interval_index;
uint64_t prev_interval_init_instr_count;
- if (record_is_timestamp(record) &&
+ if ((record_is_timestamp(record) || record_is_instr(record)) &&
advance_interval_id(worker.stream, &worker.shard_data[0], prev_interval_index,
- prev_interval_init_instr_count) &&
+ prev_interval_init_instr_count,
+ record_is_instr(record)) &&
!process_interval(prev_interval_index, prev_interval_init_instr_count,
- &worker, /*parallel=*/false)) {
+ &worker, /*parallel=*/false, record_is_instr(record))) {
return;
}
for (int i = 0; i < num_tools_; ++i) {
@@ -515,11 +576,12 @@ analyzer_tmpl_t::process_shard_exit(
VPRINT(this, 1, "Worker %d finished trace shard %s\n", worker->index,
worker->stream->get_stream_name().c_str());
worker->shard_data[shard_index].exited = true;
- if (interval_microseconds_ != 0 &&
- !process_interval(worker->shard_data[shard_index].cur_interval_index,
- worker->shard_data[shard_index].cur_interval_init_instr_count,
- worker,
- /*parallel=*/true, shard_index))
+ if ((interval_microseconds_ != 0 || interval_instr_count_ != 0) &&
+ (!process_interval(worker->shard_data[shard_index].cur_interval_index,
+ worker->shard_data[shard_index].cur_interval_init_instr_count,
+ worker,
+ /*parallel=*/true, /*at_instr_record=*/false, shard_index) ||
+ !finalize_interval_snapshots(worker, /*parallel=*/true, shard_index)))
return false;
for (int i = 0; i < num_tools_; ++i) {
if (!tools_[i]->parallel_shard_exit(
@@ -536,8 +598,9 @@ analyzer_tmpl_t::process_shard_exit(
}
template <typename RecordType, typename ReaderType>
-void
-analyzer_tmpl_t<RecordType, ReaderType>::process_tasks(analyzer_worker_data_t *worker)
+bool
+analyzer_tmpl_t<RecordType, ReaderType>::process_tasks_internal(
+    analyzer_worker_data_t *worker)
{
std::vector<void *> user_worker_data(num_tools_);
@@ -573,16 +636,14 @@ analyzer_tmpl_t::process_tasks(analyzer_worker_data_t *w
worker->error =
"Failed to read from trace: " + worker->stream->get_stream_name();
}
- return;
+ return false;
}
- int shard_index = shard_type_ == SHARD_BY_CORE
- ? worker->index
- : worker->stream->get_input_stream_ordinal();
+ int shard_index = worker->stream->get_shard_index();
if (worker->shard_data.find(shard_index) == worker->shard_data.end()) {
VPRINT(this, 1, "Worker %d starting on trace shard %d stream is %p\n",
worker->index, shard_index, worker->stream);
worker->shard_data[shard_index].tool_data.resize(num_tools_);
- if (interval_microseconds_ != 0)
+ if (interval_microseconds_ != 0 || interval_instr_count_ != 0)
worker->shard_data[shard_index].cur_interval_index = 1;
for (int i = 0; i < num_tools_; ++i) {
worker->shard_data[shard_index].tool_data[i].shard_data =
@@ -600,12 +661,13 @@ analyzer_tmpl_t::process_tasks(analyzer_worker_data_t *w
}
uint64_t prev_interval_index;
uint64_t prev_interval_init_instr_count;
- if (record_is_timestamp(record) &&
+ if ((record_is_timestamp(record) || record_is_instr(record)) &&
advance_interval_id(worker->stream, &worker->shard_data[shard_index],
- prev_interval_index, prev_interval_init_instr_count) &&
+ prev_interval_index, prev_interval_init_instr_count,
+ record_is_instr(record)) &&
!process_interval(prev_interval_index, prev_interval_init_instr_count, worker,
- /*parallel=*/true, shard_index)) {
- return;
+ /*parallel=*/true, record_is_instr(record), shard_index)) {
+ return false;
}
for (int i = 0; i < num_tools_; ++i) {
if (!tools_[i]->parallel_shard_memref(
@@ -615,24 +677,27 @@ analyzer_tmpl_t::process_tasks(analyzer_worker_data_t *w
VPRINT(this, 1, "Worker %d hit shard memref error %s on trace shard %s\n",
worker->index, worker->error.c_str(),
worker->stream->get_stream_name().c_str());
- return;
+ return false;
}
}
if (record_is_thread_final(record) && shard_type_ != SHARD_BY_CORE) {
- if (!process_shard_exit(worker, shard_index))
- return;
+ if (!process_shard_exit(worker, shard_index)) {
+ return false;
+ }
}
}
if (shard_type_ == SHARD_BY_CORE) {
if (worker->shard_data.find(worker->index) != worker->shard_data.end()) {
- if (!process_shard_exit(worker, worker->index))
- return;
+ if (!process_shard_exit(worker, worker->index)) {
+ return false;
+ }
}
}
for (const auto &keyval : worker->shard_data) {
if (!keyval.second.exited) {
- if (!process_shard_exit(worker, keyval.second.shard_index))
- return;
+ if (!process_shard_exit(worker, keyval.second.shard_index)) {
+ return false;
+ }
}
}
for (int i = 0; i < num_tools_; ++i) {
@@ -641,7 +706,28 @@ analyzer_tmpl_t::process_tasks(analyzer_worker_data_t *w
worker->error = error;
VPRINT(this, 1, "Worker %d hit worker exit error %s\n", worker->index,
error.c_str());
- return;
+ return false;
+ }
+ }
+ return true;
+}
+
+template <typename RecordType, typename ReaderType>
+void
+analyzer_tmpl_t<RecordType, ReaderType>::process_tasks(analyzer_worker_data_t *worker)
+{
+ if (!process_tasks_internal(worker)) {
+ if (sched_mapping_ == sched_type_t::MAP_TO_ANY_OUTPUT) {
+ // Avoid a hang in the scheduler if we leave our current input stranded.
+ // XXX: Better to just do a global exit and not let the other threads
+ // keep running? That breaks the current model where errors are
+ // propagated to the user to decide what to do.
+ // We could perhaps add thread synch points to have other threads
+ // exit earlier: but maybe some use cases consider one shard error
+ // to not affect others and not be fatal?
+ if (worker->stream->set_active(false) != sched_type_t::STATUS_OK) {
+ ERRMSG("Failed to set failing worker to inactive; may hang");
+ }
}
}
}
@@ -658,20 +744,21 @@ analyzer_tmpl_t::combine_interval_snapshots(
result = tools_[tool_idx]->combine_interval_snapshots(latest_shard_snapshots,
interval_end_timestamp);
if (result == nullptr) {
- error_string_ = "combine_interval_snapshots unexpectedly returned nullptr";
+ error_string_ = "combine_interval_snapshots unexpectedly returned nullptr: " +
+ tools_[tool_idx]->get_error_string();
return false;
}
- result->instr_count_delta = 0;
- result->instr_count_cumulative = 0;
+ result->instr_count_delta_ = 0;
+ result->instr_count_cumulative_ = 0;
for (auto snapshot : latest_shard_snapshots) {
if (snapshot == nullptr)
continue;
// As discussed in the doc for analysis_tool_t::combine_interval_snapshots,
// we combine all shard's latest snapshots for cumulative metrics, whereas
// we combine only the shards active in current interval for delta metrics.
- result->instr_count_cumulative += snapshot->instr_count_cumulative;
- if (snapshot->interval_end_timestamp == interval_end_timestamp)
- result->instr_count_delta += snapshot->instr_count_delta;
+ result->instr_count_cumulative_ += snapshot->instr_count_cumulative_;
+ if (snapshot->interval_end_timestamp_ == interval_end_timestamp)
+ result->instr_count_delta_ += snapshot->instr_count_delta_;
}
return true;
}
@@ -679,11 +766,9 @@ analyzer_tmpl_t::combine_interval_snapshots(
template <typename RecordType, typename ReaderType>
bool
analyzer_tmpl_t<RecordType, ReaderType>::merge_shard_interval_results(
- // intervals[shard_idx] is a queue of interval_state_snapshot_t*
- // representing the interval snapshots for that shard. This is a queue as we
- // process the intervals here in a FIFO manner. Using a queue also makes code
- // a bit simpler.
-    std::vector<std::queue<
-        typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>
+    std::vector<std::vector<
+        typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>
&intervals,
// This function will write the resulting whole-trace intervals to
@@ -698,6 +783,7 @@ analyzer_tmpl_t::merge_shard_interval_results(
// numbered by the earliest shard's timestamp.
uint64_t earliest_ever_interval_end_timestamp = std::numeric_limits<uint64_t>::max();
size_t shard_count = intervals.size();
+ std::vector<size_t> at_idx(shard_count, 0);
bool any_shard_has_results_left = true;
std::vector<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>
last_snapshot_per_shard(shard_count, nullptr);
@@ -706,11 +792,11 @@ analyzer_tmpl_t::merge_shard_interval_results(
// one with the earliest interval-end timestamp.
uint64_t earliest_interval_end_timestamp = std::numeric_limits<uint64_t>::max();
for (size_t shard_idx = 0; shard_idx < shard_count; ++shard_idx) {
- if (intervals[shard_idx].empty())
+ if (at_idx[shard_idx] == intervals[shard_idx].size())
continue;
- earliest_interval_end_timestamp =
- std::min(earliest_interval_end_timestamp,
- intervals[shard_idx].front()->interval_end_timestamp);
+ earliest_interval_end_timestamp = std::min(
+ earliest_interval_end_timestamp,
+ intervals[shard_idx][at_idx[shard_idx]]->interval_end_timestamp_);
}
// We're done if no shard has any interval left unprocessed.
if (earliest_interval_end_timestamp == std::numeric_limits<uint64_t>::max()) {
@@ -725,10 +811,10 @@ analyzer_tmpl_t::merge_shard_interval_results(
// Update last_snapshot_per_shard for shards that were active during this
// interval, which have a timestamp == earliest_interval_end_timestamp.
for (size_t shard_idx = 0; shard_idx < shard_count; ++shard_idx) {
- if (intervals[shard_idx].empty())
+ if (at_idx[shard_idx] == intervals[shard_idx].size())
continue;
uint64_t cur_interval_end_timestamp =
- intervals[shard_idx].front()->interval_end_timestamp;
+ intervals[shard_idx][at_idx[shard_idx]]->interval_end_timestamp_;
assert(cur_interval_end_timestamp >= earliest_interval_end_timestamp);
if (cur_interval_end_timestamp > earliest_interval_end_timestamp)
continue;
@@ -741,8 +827,8 @@ analyzer_tmpl_t::merge_shard_interval_results(
return false;
}
}
- last_snapshot_per_shard[shard_idx] = intervals[shard_idx].front();
- intervals[shard_idx].pop();
+ last_snapshot_per_shard[shard_idx] = intervals[shard_idx][at_idx[shard_idx]];
+ ++at_idx[shard_idx];
}
// Merge last_snapshot_per_shard to form the result of the current
// whole-trace interval.
@@ -759,10 +845,10 @@ analyzer_tmpl_t::merge_shard_interval_results(
cur_merged_interval))
return false;
// Add the merged interval to the result list of whole trace intervals.
- cur_merged_interval->shard_id = analysis_tool_tmpl_t<
+ cur_merged_interval->shard_id_ = analysis_tool_tmpl_t<
RecordType>::interval_state_snapshot_t::WHOLE_TRACE_SHARD_ID;
- cur_merged_interval->interval_end_timestamp = earliest_interval_end_timestamp;
- cur_merged_interval->interval_id = compute_interval_id(
+ cur_merged_interval->interval_end_timestamp_ = earliest_interval_end_timestamp;
+ cur_merged_interval->interval_id_ = compute_timestamp_interval_id(
earliest_ever_interval_end_timestamp, earliest_interval_end_timestamp);
merged_intervals.push_back(cur_merged_interval);
}
@@ -776,31 +862,77 @@ analyzer_tmpl_t::merge_shard_interval_results(
return true;
}
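The cursor-based merge loop above repeatedly picks the earliest unprocessed interval-end timestamp across all shards, advances every shard cursor sitting at that timestamp, and emits one whole-trace interval per distinct timestamp. A simplified standalone model (plain timestamps as stand-ins for the snapshot objects, with hypothetical names) captures the control flow:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Returns the sorted union of per-shard interval-end timestamps, mirroring how
// the merge loop forms one whole-trace interval per distinct earliest
// timestamp across shards.
std::vector<uint64_t>
merge_interval_end_timestamps(const std::vector<std::vector<uint64_t>> &per_shard)
{
    std::vector<size_t> at_idx(per_shard.size(), 0); // per-shard cursors
    std::vector<uint64_t> merged;
    while (true) {
        // Find the earliest unprocessed interval-end timestamp.
        uint64_t earliest = std::numeric_limits<uint64_t>::max();
        for (size_t s = 0; s < per_shard.size(); ++s) {
            if (at_idx[s] < per_shard[s].size())
                earliest = std::min(earliest, per_shard[s][at_idx[s]]);
        }
        if (earliest == std::numeric_limits<uint64_t>::max())
            break; // Every shard's list is exhausted.
        // Consume this timestamp from each shard active in this interval.
        for (size_t s = 0; s < per_shard.size(); ++s) {
            if (at_idx[s] < per_shard[s].size() && per_shard[s][at_idx[s]] == earliest)
                ++at_idx[s];
        }
        merged.push_back(earliest);
    }
    return merged;
}
```

Shards whose next timestamp is later than the current earliest simply keep their cursor in place, which is how a shard idle during one whole-trace interval still contributes its latest snapshot to later ones.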
+template <typename RecordType, typename ReaderType>
+void
+analyzer_tmpl_t<RecordType, ReaderType>::populate_unmerged_shard_interval_results()
+{
+ for (auto &worker : worker_data_) {
+ for (auto &shard_data : worker.shard_data) {
+ assert(static_cast(shard_data.second.tool_data.size()) == num_tools_);
+ for (int tool_idx = 0; tool_idx < num_tools_; ++tool_idx) {
+ key_tool_shard_t tool_shard_key = { tool_idx,
+ shard_data.second.shard_index };
+ per_shard_interval_snapshots_[tool_shard_key] = std::move(
+ shard_data.second.tool_data[tool_idx].interval_snapshot_data);
+ }
+ }
+ }
+}
+
+template <typename RecordType, typename ReaderType>
+void
+analyzer_tmpl_t<RecordType, ReaderType>::populate_serial_interval_results()
+{
+ assert(whole_trace_interval_snapshots_.empty());
+ whole_trace_interval_snapshots_.resize(num_tools_);
+ assert(worker_data_.size() == 1);
+ assert(worker_data_[0].shard_data.size() == 1 &&
+ worker_data_[0].shard_data.count(0) == 1);
+ assert(static_cast(worker_data_[0].shard_data[0].tool_data.size()) ==
+ num_tools_);
+ for (int tool_idx = 0; tool_idx < num_tools_; ++tool_idx) {
+ whole_trace_interval_snapshots_[tool_idx] = std::move(
+ worker_data_[0].shard_data[0].tool_data[tool_idx].interval_snapshot_data);
+ }
+}
+
template <typename RecordType, typename ReaderType>
bool
analyzer_tmpl_t<RecordType, ReaderType>::collect_and_maybe_merge_shard_interval_results()
{
- // all_intervals[tool_idx][shard_idx] contains a queue of the
+ assert(interval_microseconds_ != 0 || interval_instr_count_ != 0);
+ if (!parallel_) {
+ populate_serial_interval_results();
+ return true;
+ }
+ if (interval_instr_count_ > 0) {
+ // We do not merge interval state snapshots across shards. See comment by
+ // per_shard_interval_snapshots for more details.
+ populate_unmerged_shard_interval_results();
+ return true;
+ }
+ // all_intervals[tool_idx][shard_idx] contains a vector of the
// interval_state_snapshot_t* that were output by that tool for that shard.
-    std::vector<std::vector<std::queue<
-        typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>>
+    std::vector<std::vector<std::vector<
+        typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>>
all_intervals(num_tools_);
for (const auto &worker : worker_data_) {
for (const auto &shard_data : worker.shard_data) {
+ assert(static_cast(shard_data.second.tool_data.size()) == num_tools_);
for (int tool_idx = 0; tool_idx < num_tools_; ++tool_idx) {
all_intervals[tool_idx].emplace_back(std::move(
shard_data.second.tool_data[tool_idx].interval_snapshot_data));
}
}
}
- assert(merged_interval_snapshots_.empty());
- merged_interval_snapshots_.resize(num_tools_);
+ assert(whole_trace_interval_snapshots_.empty());
+ whole_trace_interval_snapshots_.resize(num_tools_);
for (int tool_idx = 0; tool_idx < num_tools_; ++tool_idx) {
// We need to do this separately per tool because all tools may not
// generate an interval_state_snapshot_t for the same intervals (even though
// the framework notifies all tools of all intervals).
if (!merge_shard_interval_results(all_intervals[tool_idx],
- merged_interval_snapshots_[tool_idx],
+ whole_trace_interval_snapshots_[tool_idx],
tool_idx)) {
return false;
}
@@ -848,12 +980,20 @@ analyzer_tmpl_t<RecordType, ReaderType>::run()
}
}
}
- if (interval_microseconds_ != 0) {
+ if (interval_microseconds_ != 0 || interval_instr_count_ != 0) {
return collect_and_maybe_merge_shard_interval_results();
}
return true;
}
+static void
+print_output_separator()
+{
+
+ std::cerr << "\n=========================================================="
+ "=================\n";
+}
+
template <typename RecordType, typename ReaderType>
bool
analyzer_tmpl_t<RecordType, ReaderType>::print_stats()
@@ -865,25 +1005,84 @@ analyzer_tmpl_t<RecordType, ReaderType>::print_stats()
error_string_ = tools_[i]->get_error_string();
return false;
}
- if (interval_microseconds_ != 0 && !merged_interval_snapshots_.empty()) {
- // merged_interval_snapshots_ may be empty depending on the derived class's
- // implementation of collect_and_maybe_merge_shard_interval_results.
- if (!merged_interval_snapshots_[i].empty() &&
- !tools_[i]->print_interval_results(merged_interval_snapshots_[i])) {
+ if (i + 1 < num_tools_) {
+ // Separate tool output.
+ print_output_separator();
+ }
+ }
+ // Now print interval results.
+ // Should not have both whole-trace and per-shard interval snapshots.
+ assert(whole_trace_interval_snapshots_.empty() ||
+ per_shard_interval_snapshots_.empty());
+ // We may have whole-trace interval snapshots for instr count intervals in serial
+ // mode, and for timestamp (microsecond) intervals in both serial and parallel mode.
+ if (!whole_trace_interval_snapshots_.empty()) {
+ // Separate non-interval and interval outputs.
+ print_output_separator();
+ std::cerr << "Printing whole-trace interval results:\n";
+ for (int i = 0; i < num_tools_; ++i) {
+ // whole_trace_interval_snapshots_[i] may be empty if the corresponding tool
+ // did not produce any interval results.
+ if (!whole_trace_interval_snapshots_[i].empty() &&
+ !tools_[i]->print_interval_results(whole_trace_interval_snapshots_[i])) {
error_string_ = tools_[i]->get_error_string();
return false;
}
- for (auto snapshot : merged_interval_snapshots_[i]) {
+ for (auto snapshot : whole_trace_interval_snapshots_[i]) {
if (!tools_[i]->release_interval_snapshot(snapshot)) {
error_string_ = tools_[i]->get_error_string();
return false;
}
}
+ if (i + 1 < num_tools_) {
+ // Separate tool output.
+ print_output_separator();
+ }
}
- if (i + 1 < num_tools_) {
- // Separate tool output.
- std::cerr << "\n=========================================================="
- "=================\n";
+ } else if (!per_shard_interval_snapshots_.empty()) {
+ // Separate non-interval and interval outputs.
+ print_output_separator();
+ std::cerr << "Printing unmerged per-shard interval results:\n";
+ for (auto &interval_snapshots : per_shard_interval_snapshots_) {
+ int tool_idx = interval_snapshots.first.tool_idx;
+ if (!interval_snapshots.second.empty() &&
+ !tools_[tool_idx]->print_interval_results(interval_snapshots.second)) {
+ error_string_ = tools_[tool_idx]->get_error_string();
+ return false;
+ }
+ for (auto snapshot : interval_snapshots.second) {
+ if (!tools_[tool_idx]->release_interval_snapshot(snapshot)) {
+ error_string_ = tools_[tool_idx]->get_error_string();
+ return false;
+ }
+ }
+ print_output_separator();
+ }
+ }
+ return true;
+}
+
+template <typename RecordType, typename ReaderType>
+bool
+analyzer_tmpl_t<RecordType, ReaderType>::finalize_interval_snapshots(
+ analyzer_worker_data_t *worker, bool parallel, int shard_idx)
+{
+ assert(parallel ||
+ shard_idx == 0); // Only parallel mode supports a non-zero shard_idx.
+ for (int tool_idx = 0; tool_idx < num_tools_; ++tool_idx) {
+ if (!worker->shard_data[shard_idx]
+ .tool_data[tool_idx]
+ .interval_snapshot_data.empty() &&
+ !tools_[tool_idx]->finalize_interval_snapshots(worker->shard_data[shard_idx]
+ .tool_data[tool_idx]
+ .interval_snapshot_data)) {
+ worker->error = tools_[tool_idx]->get_error_string();
+ VPRINT(this, 1,
+ "Worker %d hit finalize_interval_snapshots error %s during %s "
+ "analysis in trace shard %s\n",
+ worker->index, worker->error.c_str(), parallel ? "parallel" : "serial",
+ worker->stream->get_stream_name().c_str());
+ return false;
}
}
return true;
@@ -893,9 +1092,10 @@ template <typename RecordType, typename ReaderType>
bool
analyzer_tmpl_t<RecordType, ReaderType>::process_interval(
uint64_t interval_id, uint64_t interval_init_instr_count,
- analyzer_worker_data_t *worker, bool parallel, int shard_idx)
+ analyzer_worker_data_t *worker, bool parallel, bool at_instr_record, int shard_idx)
{
- assert(parallel || shard_idx == 0); // Default to zero for the serial mode.
+ assert(parallel ||
+ shard_idx == 0); // Only parallel mode supports a non-zero shard_idx.
for (int tool_idx = 0; tool_idx < num_tools_; ++tool_idx) {
typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *snapshot;
if (parallel) {
@@ -916,18 +1116,31 @@ analyzer_tmpl_t<RecordType, ReaderType>::process_interval(
return false;
}
if (snapshot != nullptr) {
- snapshot->shard_id = parallel
+ snapshot->shard_id_ = parallel
? worker->shard_data[shard_idx].shard_id
: analysis_tool_tmpl_t<
RecordType>::interval_state_snapshot_t::WHOLE_TRACE_SHARD_ID;
- snapshot->interval_id = interval_id;
- snapshot->interval_end_timestamp = compute_interval_end_timestamp(
- worker->stream->get_first_timestamp(), interval_id);
- snapshot->instr_count_cumulative = worker->stream->get_instruction_ordinal();
- snapshot->instr_count_delta =
- snapshot->instr_count_cumulative - interval_init_instr_count;
- worker->shard_data[shard_idx].tool_data[tool_idx].interval_snapshot_data.push(
- snapshot);
+ snapshot->interval_id_ = interval_id;
+ if (interval_microseconds_ > 0) {
+ // For timestamp intervals, the interval_end_timestamp is the abstract
+ // non-inclusive end timestamp for the interval_id. This is to make it
+ // easier to line up the corresponding shard interval snapshots so that
+ // we can merge them to form the whole-trace interval snapshots.
+ snapshot->interval_end_timestamp_ = compute_interval_end_timestamp(
+ worker->stream->get_first_timestamp(), interval_id);
+ } else {
+ snapshot->interval_end_timestamp_ = worker->stream->get_last_timestamp();
+ }
+ // instr_count_cumulative for the interval snapshot is supposed to be
+ // inclusive, so if the first record after the interval (that is, the record
+ // we're at right now) is an instr, it must be subtracted.
+ snapshot->instr_count_cumulative_ =
+ worker->stream->get_instruction_ordinal() - (at_instr_record ? 1 : 0);
+ snapshot->instr_count_delta_ =
+ snapshot->instr_count_cumulative_ - interval_init_instr_count;
+ worker->shard_data[shard_idx]
+ .tool_data[tool_idx]
+ .interval_snapshot_data.push_back(snapshot);
}
}
return true;
diff --git a/clients/drcachesim/analyzer.h b/clients/drcachesim/analyzer.h
index 8ebc10547b2..63a196bed43 100644
--- a/clients/drcachesim/analyzer.h
+++ b/clients/drcachesim/analyzer.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2016-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2016-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -47,7 +47,6 @@
#include <iterator>
#include <memory>
-#include <queue>
#include <string>
#include <thread>
#include <unordered_map>
@@ -119,7 +118,8 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
analyzer_tmpl_t(const std::string &trace_path,
analysis_tool_tmpl_t<RecordType> **tools, int num_tools,
int worker_count = 0, uint64_t skip_instrs = 0,
- uint64_t interval_microseconds = 0, int verbosity = 0);
+ uint64_t interval_microseconds = 0, uint64_t interval_instr_count = 0,
+ int verbosity = 0);
/** Launches the analysis process. */
virtual bool
run();
@@ -144,9 +144,10 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
}
void *shard_data;
- // This is a queue as merge_shard_interval_results processes the intervals in a
- // FIFO manner. Using a queue also makes code a bit simpler.
- std::queue<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>
+ // Stores the interval state snapshots generated by this tool for this shard
+ // in the same order as they are generated.
+ std::vector<
typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>
interval_snapshot_data;
private:
@@ -167,6 +168,8 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
}
uint64_t cur_interval_index;
+ // Cumulative instr count as it was just before the start of the current
+ // interval.
uint64_t cur_interval_init_instr_count;
// Identifier for the shard (thread or core id).
int64_t shard_id;
@@ -213,20 +216,23 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
operator=(const analyzer_worker_data_t &) = delete;
};
+ // Pass INVALID_THREAD_ID for only_thread to include all threads.
bool
- init_scheduler(const std::string &trace_path,
- memref_tid_t only_thread = INVALID_THREAD_ID, int verbosity = 0,
- typename sched_type_t::scheduler_options_t *options = nullptr);
+ init_scheduler(const std::string &trace_path, memref_tid_t only_thread, int verbosity,
+ typename sched_type_t::scheduler_options_t options);
+ // For core-sharded, worker_count_ must be set prior to calling this; for parallel
+ // mode if it is not set it will be set to the underlying core count.
+ // For core-sharded, all of "options" is used; otherwise, only the
+ // read_inputs_in_init field is preserved.
bool
- init_scheduler(
- std::unique_ptr<ReaderType> reader = std::unique_ptr<ReaderType>(nullptr),
- std::unique_ptr<ReaderType> reader_end = std::unique_ptr<ReaderType>(nullptr),
- int verbosity = 0, typename sched_type_t::scheduler_options_t *options = nullptr);
+ init_scheduler(std::unique_ptr<ReaderType> reader,
+ std::unique_ptr<ReaderType> reader_end, int verbosity,
+ typename sched_type_t::scheduler_options_t options);
bool
init_scheduler_common(typename sched_type_t::input_workload_t &workload,
- typename sched_type_t::scheduler_options_t *options);
+ typename sched_type_t::scheduler_options_t options);
// Used for std::thread so we need an rvalue (so no &worker).
void
@@ -235,6 +241,10 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
void
process_serial(analyzer_worker_data_t &worker);
+ // Helper for process_tasks().
+ bool
+ process_tasks_internal(analyzer_worker_data_t *worker);
+
// Helper for process_tasks() which calls parallel_shard_exit() in each tool.
// Returns false if there was an error and the caller should return early.
bool
@@ -249,27 +259,45 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
bool
record_is_timestamp(const RecordType &record);
+ bool
+ record_is_instr(const RecordType &record);
+
RecordType
create_wait_marker();
RecordType
create_idle_marker();
+ // Invoked after all interval state snapshots have been generated for the given
+ // shard_idx and before any merging or printing of interval snapshots. This
+ // invokes the finalize_interval_snapshots API for all tools that returned some
+ // non-null interval snapshot.
+ bool
+ finalize_interval_snapshots(analyzer_worker_data_t *worker, bool parallel,
+ int shard_idx = 0);
+
// Invoked when the given interval finishes during serial or parallel
// analysis of the trace. For parallel analysis, the shard_id
// parameter should be set to the shard_id for which the interval
// finished. For serial analysis, it should remain the default value.
bool
process_interval(uint64_t interval_id, uint64_t interval_init_instr_count,
- analyzer_worker_data_t *worker, bool parallel, int shard_idx = 0);
+ analyzer_worker_data_t *worker, bool parallel, bool at_instr_record,
+ int shard_idx = 0);
// Compute interval id for the given latest_timestamp, assuming the trace (or
- // trace shard) starts at the given first_timestamp.
+ // trace shard) starts at the given first_timestamp. This is relevant when
+ // timestamp intervals are enabled using interval_microseconds_.
+ uint64_t
+ compute_timestamp_interval_id(uint64_t first_timestamp, uint64_t latest_timestamp);
+
+ // Compute interval id at the given instr count. This is relevant when instr count
+ // intervals are enabled using interval_instr_count_.
uint64_t
- compute_interval_id(uint64_t first_timestamp, uint64_t latest_timestamp);
+ compute_instr_count_interval_id(uint64_t cur_instr_count);
- // Compute the interval end timestamp for the given interval_id, assuming the trace
- // (or trace shard) starts at the given first_timestamp.
+ // Compute the interval end timestamp (non-inclusive) for the given interval_id,
+ // assuming the trace (or trace shard) starts at the given first_timestamp.
uint64_t
compute_interval_end_timestamp(uint64_t first_timestamp, uint64_t interval_id);
@@ -277,11 +305,13 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
// on the most recent seen timestamp in the trace stream. Returns whether the
// current interval id was updated, and if so also sets the previous interval index
// in prev_interval_index.
+ // at_instr_record indicates that the next record that will be presented to
+ // the analysis tools is an instr record.
bool
advance_interval_id(
typename scheduler_tmpl_t<RecordType, ReaderType>::stream_t *stream,
analyzer_shard_data_t *shard, uint64_t &prev_interval_index,
- uint64_t &prev_interval_init_instr_count);
+ uint64_t &prev_interval_init_instr_count, bool at_instr_record);
// Collects interval results for all shards from the workers, and then optionally
// merges the shard-local intervals to form the whole-trace interval results using
@@ -290,20 +320,30 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
virtual bool
collect_and_maybe_merge_shard_interval_results();
- // Computes and stores the interval results in merged_interval_snapshots_. For
+ // Computes and stores the interval results in whole_trace_interval_snapshots_. For
// serial analysis where we already have only a single shard, this involves
// simply copying interval_state_snapshot_t* from the input. For parallel
// analysis, this involves merging results from multiple shards for intervals
// that map to the same final whole-trace interval.
bool
merge_shard_interval_results(
- std::vector<std::queue<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>
+ std::vector<std::vector<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>
&intervals,
std::vector<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t
*> &merged_intervals,
int tool_idx);
+ // Populates the per_shard_interval_snapshots_ field based on the interval snapshots
+ // stored in worker_data_.
+ void
+ populate_unmerged_shard_interval_results();
+
+ // Populates the whole_trace_interval_snapshots_ field based on the interval snapshots
+ // stored in the only entry of worker_data_.
+ void
+ populate_serial_interval_results();
+
// Combines all interval snapshots in the given vector to create the interval
// snapshot for the whole-trace interval ending at interval_end_timestamp and
// stores it in 'result'. These snapshots are for the tool at tool_idx. Returns
@@ -328,24 +368,67 @@ template <typename RecordType, typename ReaderType> class analyzer_tmpl_t {
std::vector worker_data_;
int num_tools_;
analysis_tool_tmpl_t<RecordType> **tools_;
- // Stores the interval state snapshots for the whole trace, which for the parallel
- // mode are the resulting interval state snapshots after merging from all shards
- // in merge_shard_interval_results.
- // merged_interval_snapshots_[tool_idx] is a vector of the interval snapshots
- // (in order of the intervals) for that tool.
- // This may not be set, depending on the derived class's implementation of
- // collect_and_maybe_merge_shard_interval_results.
+ // Stores the interval state snapshots, merged across shards. These are
+ // produced when timestamp intervals are enabled using interval_microseconds_.
+ //
+ // whole_trace_interval_snapshots_[tool_idx] is a vector of the interval snapshots
+ // (in order of the intervals) for that tool. For the parallel mode, these
+ // interval state snapshots are produced after merging corresponding shard
+ // interval snapshots using merge_shard_interval_results.
std::vector<std::vector<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>>
- merged_interval_snapshots_;
+ whole_trace_interval_snapshots_;
+
+ // Key that combines tool and shard idx for use with an std::unordered_map.
+ struct key_tool_shard_t {
+ int tool_idx;
+ int shard_idx;
+ bool
+ operator==(const key_tool_shard_t &rhs) const
+ {
+ return tool_idx == rhs.tool_idx && shard_idx == rhs.shard_idx;
+ }
+ };
+ struct key_tool_shard_hash_t {
+ std::size_t
+ operator()(const key_tool_shard_t &t) const
+ {
+ return std::hash<int>()(t.tool_idx ^ t.shard_idx);
+ }
+ };
+
+ // Stores the interval state snapshots, unmerged across shards. These are
+ // produced when instr count intervals are enabled using interval_instr_count_.
+ //
+ // per_shard_interval_snapshots_[(tool_idx, shard_idx)] is a vector
+ // of the interval snapshots for that tool and shard. Note that the snapshots for
+ // each shard are separate; they are not merged across shards.
+ //
+ // TODO i#6643: Figure out a useful way to merge instr count intervals across shards.
+ // One way is to merge the shard interval snapshots that correspond to the same
+ // [interval_instr_count_ * interval_id, interval_instr_count_ * (interval_id + 1))
+ // shard-local instrs. But it is not clear whether this is useful.
+ // Another way is to merge the shard interval snapshots that correspond to the same
+ // [interval_instr_count_ * interval_id, interval_instr_count_ * (interval_id + 1))
+ // whole-trace instrs. But that is much harder to compute. We'd need some way to
+ // identify the whole-trace interval boundaries in each shard's stream (since we
+ // process each shard separately); this would likely need a pre-processing pass.
+ std::unordered_map<key_tool_shard_t,
+ std::vector<typename analysis_tool_tmpl_t<RecordType>::interval_state_snapshot_t *>,
+ key_tool_shard_hash_t>
+ per_shard_interval_snapshots_;
+
bool parallel_;
int worker_count_;
const char *output_prefix_ = "[analyzer]";
uint64_t skip_instrs_ = 0;
uint64_t interval_microseconds_ = 0;
+ uint64_t interval_instr_count_ = 0;
int verbosity_ = 0;
shard_type_t shard_type_ = SHARD_BY_THREAD;
bool sched_by_time_ = false;
+ typename sched_type_t::mapping_t sched_mapping_ = sched_type_t::MAP_TO_ANY_OUTPUT;
private:
bool
diff --git a/clients/drcachesim/analyzer_multi.cpp b/clients/drcachesim/analyzer_multi.cpp
index 9310abe2b02..492e87c499e 100644
--- a/clients/drcachesim/analyzer_multi.cpp
+++ b/clients/drcachesim/analyzer_multi.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2016-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2016-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -53,6 +53,7 @@
#include "simulator/cache_simulator_create.h"
#include "simulator/tlb_simulator_create.h"
#include "tools/basic_counts_create.h"
+#include "tools/filter/record_filter_create.h"
#include "tools/func_view_create.h"
#include "tools/histogram_create.h"
#include "tools/invariant_checker.h"
@@ -65,6 +66,7 @@
#include "tools/view_create.h"
#include "tools/loader/external_config_file.h"
#include "tools/loader/external_tool_creator.h"
+#include "tools/filter/record_filter_create.h"
namespace dynamorio {
namespace drmemtrace {
@@ -72,6 +74,26 @@ namespace drmemtrace {
using ::dynamorio::droption::droption_parser_t;
using ::dynamorio::droption::DROPTION_SCOPE_ALL;
+/****************************************************************
+ * Specializations for analyzer_multi_tmpl_t<memref_t, reader_t>,
+ * aka analyzer_multi_t.
+ */
+
+template <>
+std::unique_ptr<reader_t>
+analyzer_multi_t::create_ipc_reader(const char *name, int verbose)
+{
+ return std::unique_ptr<reader_t>(new ipc_reader_t(name, verbose));
+}
+
+template <>
+std::unique_ptr<reader_t>
+analyzer_multi_t::create_ipc_reader_end()
+{
+ return std::unique_ptr<reader_t>(new ipc_reader_t());
+}
+
+template <>
analysis_tool_t *
analyzer_multi_t::create_external_tool(const std::string &tool_name)
{
@@ -106,26 +128,245 @@ analyzer_multi_t::create_external_tool(const std::string &tool_name)
return tool;
}
-analyzer_multi_t::analyzer_multi_t()
+template <>
+analysis_tool_t *
+analyzer_multi_t::create_invariant_checker()
+{
+ if (op_offline.get_value()) {
+ // TODO i#5538: Locate and open the schedule files and pass to the
+ // reader(s) for seeking. For now we only read them for this test.
+ // TODO i#5843: Share this code with scheduler_t or pass in for all
+ // tools from here for fast skipping in serial and per-cpu modes.
+ std::string tracedir =
+ raw2trace_directory_t::tracedir_from_rawdir(op_indir.get_value());
+ if (directory_iterator_t::is_directory(tracedir)) {
+ directory_iterator_t end;
+ directory_iterator_t iter(tracedir);
+ if (!iter) {
+ this->error_string_ = "Failed to list directory: " + iter.error_string();
+ return nullptr;
+ }
+ for (; iter != end; ++iter) {
+ const std::string fname = *iter;
+ const std::string fpath = tracedir + DIRSEP + fname;
+ if (starts_with(fname, DRMEMTRACE_SERIAL_SCHEDULE_FILENAME)) {
+ if (ends_with(fname, ".gz")) {
+#ifdef HAS_ZLIB
+ this->serial_schedule_file_ =
+ std::unique_ptr(new gzip_istream_t(fpath));
+#endif
+ } else {
+ this->serial_schedule_file_ = std::unique_ptr(
+ new std::ifstream(fpath, std::ifstream::binary));
+ }
+ if (this->serial_schedule_file_ && !*serial_schedule_file_) {
+ this->error_string_ =
+ "Failed to open serial schedule file " + fpath;
+ return nullptr;
+ }
+ } else if (fname == DRMEMTRACE_CPU_SCHEDULE_FILENAME) {
+#ifdef HAS_ZIP
+ this->cpu_schedule_file_ =
+ std::unique_ptr(new zipfile_istream_t(fpath));
+#endif
+ }
+ }
+ }
+ }
+ return new invariant_checker_t(op_offline.get_value(), op_verbose.get_value(),
+ op_test_mode_name.get_value(),
+ serial_schedule_file_.get(), cpu_schedule_file_.get());
+}
+
+template <>
+analysis_tool_t *
+analyzer_multi_t::create_analysis_tool_from_options(const std::string &simulator_type)
+{
+ if (simulator_type == CPU_CACHE) {
+ const std::string &config_file = op_config_file.get_value();
+ if (!config_file.empty()) {
+ return cache_simulator_create(config_file);
+ } else {
+ cache_simulator_knobs_t *knobs = get_cache_simulator_knobs();
+ return cache_simulator_create(*knobs);
+ }
+ } else if (simulator_type == MISS_ANALYZER) {
+ cache_simulator_knobs_t *knobs = get_cache_simulator_knobs();
+ return cache_miss_analyzer_create(*knobs, op_miss_count_threshold.get_value(),
+ op_miss_frac_threshold.get_value(),
+ op_confidence_threshold.get_value());
+ } else if (simulator_type == TLB) {
+ tlb_simulator_knobs_t knobs;
+ knobs.num_cores = op_num_cores.get_value();
+ knobs.page_size = op_page_size.get_value();
+ knobs.TLB_L1I_entries = op_TLB_L1I_entries.get_value();
+ knobs.TLB_L1D_entries = op_TLB_L1D_entries.get_value();
+ knobs.TLB_L1I_assoc = op_TLB_L1I_assoc.get_value();
+ knobs.TLB_L1D_assoc = op_TLB_L1D_assoc.get_value();
+ knobs.TLB_L2_entries = op_TLB_L2_entries.get_value();
+ knobs.TLB_L2_assoc = op_TLB_L2_assoc.get_value();
+ knobs.TLB_replace_policy = op_TLB_replace_policy.get_value();
+ knobs.skip_refs = op_skip_refs.get_value();
+ knobs.warmup_refs = op_warmup_refs.get_value();
+ knobs.warmup_fraction = op_warmup_fraction.get_value();
+ knobs.sim_refs = op_sim_refs.get_value();
+ knobs.verbose = op_verbose.get_value();
+ knobs.cpu_scheduling = op_cpu_scheduling.get_value();
+ knobs.use_physical = op_use_physical.get_value();
+ return tlb_simulator_create(knobs);
+ } else if (simulator_type == HISTOGRAM) {
+ return histogram_tool_create(op_line_size.get_value(), op_report_top.get_value(),
+ op_verbose.get_value());
+ } else if (simulator_type == REUSE_DIST) {
+ reuse_distance_knobs_t knobs;
+ knobs.line_size = op_line_size.get_value();
+ knobs.report_histogram = op_reuse_distance_histogram.get_value();
+ knobs.distance_threshold = op_reuse_distance_threshold.get_value();
+ knobs.report_top = op_report_top.get_value();
+ knobs.skip_list_distance = op_reuse_skip_dist.get_value();
+ knobs.distance_limit = op_reuse_distance_limit.get_value();
+ knobs.verify_skip = op_reuse_verify_skip.get_value();
+ knobs.histogram_bin_multiplier = op_reuse_histogram_bin_multiplier.get_value();
+ if (knobs.histogram_bin_multiplier < 1.0) {
+ ERRMSG("Usage error: reuse_histogram_bin_multiplier must be >= 1.0\n");
+ return nullptr;
+ }
+ knobs.verbose = op_verbose.get_value();
+ return reuse_distance_tool_create(knobs);
+ } else if (simulator_type == REUSE_TIME) {
+ return reuse_time_tool_create(op_line_size.get_value(), op_verbose.get_value());
+ } else if (simulator_type == BASIC_COUNTS) {
+ return basic_counts_tool_create(op_verbose.get_value());
+ } else if (simulator_type == OPCODE_MIX) {
+ std::string module_file_path = get_module_file_path();
+ if (module_file_path.empty() && op_indir.get_value().empty() &&
+ op_infile.get_value().empty() && !op_instr_encodings.get_value()) {
+ ERRMSG("Usage error: the opcode_mix tool requires offline traces, or "
+ "-instr_encodings for online traces.\n");
+ return nullptr;
+ }
+ return opcode_mix_tool_create(module_file_path, op_verbose.get_value(),
+ op_alt_module_dir.get_value());
+ } else if (simulator_type == SYSCALL_MIX) {
+ return syscall_mix_tool_create(op_verbose.get_value());
+ } else if (simulator_type == VIEW) {
+ std::string module_file_path = get_module_file_path();
+ // The module file is optional so we don't check for emptiness.
+ return view_tool_create(module_file_path, op_skip_refs.get_value(),
+ op_sim_refs.get_value(), op_view_syntax.get_value(),
+ op_verbose.get_value(), op_alt_module_dir.get_value());
+ } else if (simulator_type == FUNC_VIEW) {
+ std::string funclist_file_path = get_aux_file_path(
+ op_funclist_file.get_value(), DRMEMTRACE_FUNCTION_LIST_FILENAME);
+ if (funclist_file_path.empty()) {
+ ERRMSG("Usage error: the func_view tool requires offline traces.\n");
+ return nullptr;
+ }
+ return func_view_tool_create(funclist_file_path, op_show_func_trace.get_value(),
+ op_verbose.get_value());
+ } else if (simulator_type == INVARIANT_CHECKER) {
+ return create_invariant_checker();
+ } else if (simulator_type == SCHEDULE_STATS) {
+ return schedule_stats_tool_create(op_schedule_stats_print_every.get_value(),
+ op_verbose.get_value());
+ } else {
+ auto tool = create_external_tool(simulator_type);
+ if (tool == nullptr) {
+ ERRMSG("Usage error: unsupported analyzer type \"%s\". "
+ "Please choose " CPU_CACHE ", " MISS_ANALYZER ", " TLB ", " HISTOGRAM
+ ", " REUSE_DIST ", " BASIC_COUNTS ", " OPCODE_MIX ", " SYSCALL_MIX
+ ", " VIEW ", " FUNC_VIEW ", or some external analyzer.\n",
+ simulator_type.c_str());
+ }
+ return tool;
+ }
+}
+
+/******************************************************************************
+ * Specializations for analyzer_multi_tmpl_t<trace_entry_t, record_reader_t>, aka
+ * record_analyzer_multi_t.
+ */
+
+template <>
+std::unique_ptr<record_reader_t>
+record_analyzer_multi_t::create_ipc_reader(const char *name, int verbose)
+{
+ error_string_ = "Online analysis is not supported for record_filter";
+ ERRMSG("%s\n", error_string_.c_str());
+ return std::unique_ptr<record_reader_t>();
+}
+
+template <>
+std::unique_ptr<record_reader_t>
+record_analyzer_multi_t::create_ipc_reader_end()
+{
+ error_string_ = "Online analysis is not supported for record_filter";
+ ERRMSG("%s\n", error_string_.c_str());
+ return std::unique_ptr<record_reader_t>();
+}
+
+template <>
+record_analysis_tool_t *
+record_analyzer_multi_t::create_external_tool(const std::string &tool_name)
{
- worker_count_ = op_jobs.get_value();
- skip_instrs_ = op_skip_instrs.get_value();
- interval_microseconds_ = op_interval_microseconds.get_value();
+ error_string_ = "External tools are not supported for record analysis";
+ ERRMSG("%s\n", error_string_.c_str());
+ return nullptr;
+}
+
+template <>
+record_analysis_tool_t *
+record_analyzer_multi_t::create_invariant_checker()
+{
+ error_string_ = "Invariant checker is not supported for record analysis";
+ ERRMSG("%s\n", error_string_.c_str());
+ return nullptr;
+}
+
+template <>
+record_analysis_tool_t *
+record_analyzer_multi_t::create_analysis_tool_from_options(
+ const std::string &simulator_type)
+{
+ if (simulator_type == RECORD_FILTER) {
+ return record_filter_tool_create(
+ op_outdir.get_value(), op_filter_stop_timestamp.get_value(),
+ op_filter_cache_size.get_value(), op_filter_trace_types.get_value(),
+ op_filter_marker_types.get_value(), op_trim_before_timestamp.get_value(),
+ op_trim_after_timestamp.get_value(), op_verbose.get_value());
+ }
+ ERRMSG("Usage error: unsupported record analyzer type \"%s\". Only " RECORD_FILTER
+ " is supported.\n",
+ simulator_type.c_str());
+ return nullptr;
+}
+
+/********************************************************************
+ * Other analyzer_multi_tmpl_t routines that do not need to be specialized.
+ */
+
+template <typename RecordType, typename ReaderType>
+analyzer_multi_tmpl_t<RecordType, ReaderType>::analyzer_multi_tmpl_t()
+{
+ this->worker_count_ = op_jobs.get_value();
+ this->skip_instrs_ = op_skip_instrs.get_value();
+ this->interval_microseconds_ = op_interval_microseconds.get_value();
+ this->interval_instr_count_ = op_interval_instr_count.get_value();
// Initial measurements show it's sometimes faster to keep the parallel model
// of using single-file readers but use them sequentially, as opposed to
// the every-file interleaving reader, but the user can specify -jobs 1, so
// we still keep the serial vs parallel split for 0.
- if (worker_count_ == 0)
- parallel_ = false;
+ if (this->worker_count_ == 0)
+ this->parallel_ = false;
if (!op_indir.get_value().empty() || !op_infile.get_value().empty())
op_offline.set_value(true); // Some tools check this on post-proc runs.
// XXX: add a "required" flag to droption to avoid needing this here
if (op_indir.get_value().empty() && op_infile.get_value().empty() &&
op_ipc_name.get_value().empty()) {
- error_string_ =
+ this->error_string_ =
"Usage error: -ipc_name or -indir or -infile is required\nUsage:\n" +
droption_parser_t::usage_short(DROPTION_SCOPE_ALL);
- success_ = false;
+ this->success_ = false;
return;
}
if (!op_indir.get_value().empty()) {
@@ -163,8 +404,8 @@ analyzer_multi_t::analyzer_multi_t()
dir.initialize(op_indir.get_value(), "", op_trace_compress.get_value(),
op_syscall_template_file.get_value());
if (!dir_err.empty()) {
- success_ = false;
- error_string_ = "Directory setup failed: " + dir_err;
+ this->success_ = false;
+ this->error_string_ = "Directory setup failed: " + dir_err;
return;
}
raw2trace_t raw2trace(
@@ -176,69 +417,71 @@ analyzer_multi_t::analyzer_multi_t()
std::move(dir.syscall_template_file_reader_));
std::string error = raw2trace.do_conversion();
if (!error.empty()) {
- success_ = false;
- error_string_ = "raw2trace failed: " + error;
+ this->success_ = false;
+ this->error_string_ = "raw2trace failed: " + error;
}
}
}
// Create the tools after post-processing so we have the schedule files for
// test_mode.
if (!create_analysis_tools()) {
- success_ = false;
- error_string_ = "Failed to create analysis tool:" + error_string_;
+ this->success_ = false;
+ this->error_string_ = "Failed to create analysis tool:" + this->error_string_;
return;
}
- scheduler_t::scheduler_options_t sched_ops;
- scheduler_t::scheduler_options_t *sched_ops_ptr = nullptr;
+ typename sched_type_t::scheduler_options_t sched_ops;
if (op_core_sharded.get_value() || op_core_serial.get_value()) {
if (op_core_serial.get_value()) {
- // TODO i#5694: Add serial core-sharded support by having the
- // analyzer create #cores streams but walk them in lockstep.
- // Then, update drcachesim to use get_output_cpuid().
- error_string_ = "-core_serial is not yet implemented";
- success_ = false;
- return;
+ this->parallel_ = false;
}
sched_ops = init_dynamic_schedule();
- sched_ops_ptr = &sched_ops;
}
if (!op_indir.get_value().empty()) {
std::string tracedir =
raw2trace_directory_t::tracedir_from_rawdir(op_indir.get_value());
- if (!init_scheduler(tracedir, op_only_thread.get_value(), op_verbose.get_value(),
- sched_ops_ptr))
- success_ = false;
+ if (!this->init_scheduler(tracedir, op_only_thread.get_value(),
+ op_verbose.get_value(), std::move(sched_ops)))
+ this->success_ = false;
} else if (op_infile.get_value().empty()) {
// XXX i#3323: Add parallel analysis support for online tools.
- parallel_ = false;
- auto reader = std::unique_ptr(
- new ipc_reader_t(op_ipc_name.get_value().c_str(), op_verbose.get_value()));
- auto end = std::unique_ptr(new ipc_reader_t());
- if (!init_scheduler(std::move(reader), std::move(end), op_verbose.get_value(),
- sched_ops_ptr)) {
- success_ = false;
+ this->parallel_ = false;
+ auto reader =
+ create_ipc_reader(op_ipc_name.get_value().c_str(), op_verbose.get_value());
+ if (!reader) {
+ this->error_string_ = "Failed to create IPC reader: " + this->error_string_;
+ this->success_ = false;
+ return;
+ }
+ auto end = create_ipc_reader_end();
+ // We do not want the scheduler's init() to block.
+ sched_ops.read_inputs_in_init = false;
+ if (!this->init_scheduler(std::move(reader), std::move(end),
+ op_verbose.get_value(), std::move(sched_ops))) {
+ this->success_ = false;
}
} else {
// Legacy file.
- if (!init_scheduler(op_infile.get_value(), INVALID_THREAD_ID /*all threads*/,
- op_verbose.get_value(), sched_ops_ptr))
- success_ = false;
+ if (!this->init_scheduler(op_infile.get_value(),
+ INVALID_THREAD_ID /*all threads*/,
+ op_verbose.get_value(), std::move(sched_ops)))
+ this->success_ = false;
}
if (!init_analysis_tools()) {
- success_ = false;
+ this->success_ = false;
return;
}
// We can't call serial_trace_iter_->init() here as it blocks for ipc_reader_t.
}
-analyzer_multi_t::~analyzer_multi_t()
+template <typename RecordType, typename ReaderType>
+analyzer_multi_tmpl_t<RecordType, ReaderType>::~analyzer_multi_tmpl_t()
{
#ifdef HAS_ZIP
if (!op_record_file.get_value().empty()) {
- if (scheduler_.write_recorded_schedule() != scheduler_t::STATUS_SUCCESS) {
+ if (this->scheduler_.write_recorded_schedule() != sched_type_t::STATUS_SUCCESS) {
ERRMSG("Failed to write schedule to %s", op_record_file.get_value().c_str());
}
}
@@ -246,23 +489,25 @@ analyzer_multi_t::~analyzer_multi_t()
destroy_analysis_tools();
}
-scheduler_t::scheduler_options_t
-analyzer_multi_t::init_dynamic_schedule()
+template <typename RecordType, typename ReaderType>
+typename scheduler_tmpl_t<RecordType, ReaderType>::scheduler_options_t
+analyzer_multi_tmpl_t<RecordType, ReaderType>::init_dynamic_schedule()
{
- shard_type_ = SHARD_BY_CORE;
- worker_count_ = op_num_cores.get_value();
- scheduler_t::scheduler_options_t sched_ops(
- scheduler_t::MAP_TO_ANY_OUTPUT,
- op_sched_order_time.get_value() ? scheduler_t::DEPENDENCY_TIMESTAMPS
- : scheduler_t::DEPENDENCY_IGNORE,
- scheduler_t::SCHEDULER_DEFAULTS, op_verbose.get_value());
+ this->shard_type_ = SHARD_BY_CORE;
+ this->worker_count_ = op_num_cores.get_value();
+ typename sched_type_t::scheduler_options_t sched_ops(
+ sched_type_t::MAP_TO_ANY_OUTPUT,
+ op_sched_order_time.get_value() ? sched_type_t::DEPENDENCY_TIMESTAMPS
+ : sched_type_t::DEPENDENCY_IGNORE,
+ sched_type_t::SCHEDULER_DEFAULTS, op_verbose.get_value());
sched_ops.quantum_duration = op_sched_quantum.get_value();
if (op_sched_time.get_value())
- sched_ops.quantum_unit = scheduler_t::QUANTUM_TIME;
+ sched_ops.quantum_unit = sched_type_t::QUANTUM_TIME;
sched_ops.syscall_switch_threshold = op_sched_syscall_switch_us.get_value();
sched_ops.blocking_switch_threshold = op_sched_blocking_switch_us.get_value();
sched_ops.block_time_scale = op_sched_block_scale.get_value();
sched_ops.block_time_max = op_sched_block_max_us.get_value();
+ sched_ops.randomize_next_input = op_sched_randomize.get_value();
#ifdef HAS_ZIP
if (!op_record_file.get_value().empty()) {
record_schedule_zip_.reset(new zipfile_ostream_t(op_record_file.get_value()));
@@ -270,32 +515,30 @@ analyzer_multi_t::init_dynamic_schedule()
} else if (!op_replay_file.get_value().empty()) {
replay_schedule_zip_.reset(new zipfile_istream_t(op_replay_file.get_value()));
sched_ops.schedule_replay_istream = replay_schedule_zip_.get();
- sched_ops.mapping = scheduler_t::MAP_AS_PREVIOUSLY;
- sched_ops.deps = scheduler_t::DEPENDENCY_TIMESTAMPS;
+ sched_ops.mapping = sched_type_t::MAP_AS_PREVIOUSLY;
+ sched_ops.deps = sched_type_t::DEPENDENCY_TIMESTAMPS;
} else if (!op_cpu_schedule_file.get_value().empty()) {
cpu_schedule_zip_.reset(new zipfile_istream_t(op_cpu_schedule_file.get_value()));
- sched_ops.mapping = scheduler_t::MAP_TO_RECORDED_OUTPUT;
- sched_ops.deps = scheduler_t::DEPENDENCY_TIMESTAMPS;
+ sched_ops.mapping = sched_type_t::MAP_TO_RECORDED_OUTPUT;
+ sched_ops.deps = sched_type_t::DEPENDENCY_TIMESTAMPS;
sched_ops.replay_as_traced_istream = cpu_schedule_zip_.get();
}
#endif
+ sched_ops.kernel_switch_trace_path = op_sched_switch_file.get_value();
return sched_ops;
}
+template <typename RecordType, typename ReaderType>
bool
-analyzer_multi_t::create_analysis_tools()
+analyzer_multi_tmpl_t<RecordType, ReaderType>::create_analysis_tools()
{
- /* TODO i#2006: add multiple tool support. */
- /* TODO i#2006: create a single top-level tool for multi-component
- * tools.
- */
- tools_ = new analysis_tool_t *[max_num_tools_];
+ this->tools_ = new analysis_tool_tmpl_t<RecordType> *[this->max_num_tools_];
if (!op_simulator_type.get_value().empty()) {
std::stringstream stream(op_simulator_type.get_value());
std::string type;
while (std::getline(stream, type, ':')) {
- if (num_tools_ >= max_num_tools_ - 1) {
- error_string_ = "Only " + std::to_string(max_num_tools_ - 1) +
+ if (this->num_tools_ >= this->max_num_tools_ - 1) {
+ this->error_string_ = "Only " + std::to_string(this->max_num_tools_ - 1) +
" simulators are allowed simultaneously";
return false;
}
@@ -306,196 +549,49 @@ analyzer_multi_t::create_analysis_tools()
std::string tool_error = tool->get_error_string();
if (tool_error.empty())
tool_error = "no error message provided.";
- error_string_ = "Tool failed to initialize: " + tool_error;
+ this->error_string_ = "Tool failed to initialize: " + tool_error;
delete tool;
return false;
}
- tools_[num_tools_++] = tool;
+ this->tools_[this->num_tools_++] = tool;
}
}
if (op_test_mode.get_value()) {
- tools_[num_tools_] = create_invariant_checker();
- if (tools_[num_tools_] == NULL)
+ // This will return nullptr for record_ instantiation; we just don't support
+ // -test_mode for record_.
+ this->tools_[this->num_tools_] = create_invariant_checker();
+ if (this->tools_[this->num_tools_] == NULL)
return false;
- if (!*tools_[num_tools_]) {
- error_string_ = tools_[num_tools_]->get_error_string();
- delete tools_[num_tools_];
- tools_[num_tools_] = NULL;
+ if (!*this->tools_[this->num_tools_]) {
+ this->error_string_ = this->tools_[this->num_tools_]->get_error_string();
+ delete this->tools_[this->num_tools_];
+ this->tools_[this->num_tools_] = NULL;
return false;
}
- num_tools_++;
+ this->num_tools_++;
}
- return (num_tools_ != 0);
+ return (this->num_tools_ != 0);
}
+template <typename RecordType, typename ReaderType>
bool
-analyzer_multi_t::init_analysis_tools()
+analyzer_multi_tmpl_t<RecordType, ReaderType>::init_analysis_tools()
{
// initialize_stream() is now called from analyzer_t::run().
return true;
}
+template <typename RecordType, typename ReaderType>
void
-analyzer_multi_t::destroy_analysis_tools()
+analyzer_multi_tmpl_t<RecordType, ReaderType>::destroy_analysis_tools()
{
- if (!success_)
+ if (!this->success_)
return;
- for (int i = 0; i < num_tools_; i++)
- delete tools_[i];
- delete[] tools_;
-}
-
-analysis_tool_t *
-analyzer_multi_t::create_analysis_tool_from_options(const std::string &simulator_type)
-{
- if (simulator_type == CPU_CACHE) {
- const std::string &config_file = op_config_file.get_value();
- if (!config_file.empty()) {
- return cache_simulator_create(config_file);
- } else {
- cache_simulator_knobs_t *knobs = get_cache_simulator_knobs();
- return cache_simulator_create(*knobs);
- }
- } else if (simulator_type == MISS_ANALYZER) {
- cache_simulator_knobs_t *knobs = get_cache_simulator_knobs();
- return cache_miss_analyzer_create(*knobs, op_miss_count_threshold.get_value(),
- op_miss_frac_threshold.get_value(),
- op_confidence_threshold.get_value());
- } else if (simulator_type == TLB) {
- tlb_simulator_knobs_t knobs;
- knobs.num_cores = op_num_cores.get_value();
- knobs.page_size = op_page_size.get_value();
- knobs.TLB_L1I_entries = op_TLB_L1I_entries.get_value();
- knobs.TLB_L1D_entries = op_TLB_L1D_entries.get_value();
- knobs.TLB_L1I_assoc = op_TLB_L1I_assoc.get_value();
- knobs.TLB_L1D_assoc = op_TLB_L1D_assoc.get_value();
- knobs.TLB_L2_entries = op_TLB_L2_entries.get_value();
- knobs.TLB_L2_assoc = op_TLB_L2_assoc.get_value();
- knobs.TLB_replace_policy = op_TLB_replace_policy.get_value();
- knobs.skip_refs = op_skip_refs.get_value();
- knobs.warmup_refs = op_warmup_refs.get_value();
- knobs.warmup_fraction = op_warmup_fraction.get_value();
- knobs.sim_refs = op_sim_refs.get_value();
- knobs.verbose = op_verbose.get_value();
- knobs.cpu_scheduling = op_cpu_scheduling.get_value();
- knobs.use_physical = op_use_physical.get_value();
- return tlb_simulator_create(knobs);
- } else if (simulator_type == HISTOGRAM) {
- return histogram_tool_create(op_line_size.get_value(), op_report_top.get_value(),
- op_verbose.get_value());
- } else if (simulator_type == REUSE_DIST) {
- reuse_distance_knobs_t knobs;
- knobs.line_size = op_line_size.get_value();
- knobs.report_histogram = op_reuse_distance_histogram.get_value();
- knobs.distance_threshold = op_reuse_distance_threshold.get_value();
- knobs.report_top = op_report_top.get_value();
- knobs.skip_list_distance = op_reuse_skip_dist.get_value();
- knobs.distance_limit = op_reuse_distance_limit.get_value();
- knobs.verify_skip = op_reuse_verify_skip.get_value();
- knobs.histogram_bin_multiplier = op_reuse_histogram_bin_multiplier.get_value();
- if (knobs.histogram_bin_multiplier < 1.0) {
- ERRMSG("Usage error: reuse_histogram_bin_multiplier must be >= 1.0\n");
- return nullptr;
- }
- knobs.verbose = op_verbose.get_value();
- return reuse_distance_tool_create(knobs);
- } else if (simulator_type == REUSE_TIME) {
- return reuse_time_tool_create(op_line_size.get_value(), op_verbose.get_value());
- } else if (simulator_type == BASIC_COUNTS) {
- return basic_counts_tool_create(op_verbose.get_value());
- } else if (simulator_type == OPCODE_MIX) {
- std::string module_file_path = get_module_file_path();
- if (module_file_path.empty() && op_indir.get_value().empty() &&
- op_infile.get_value().empty() && !op_instr_encodings.get_value()) {
- ERRMSG("Usage error: the opcode_mix tool requires offline traces, or "
- "-instr_encodings for online traces.\n");
- return nullptr;
- }
- return opcode_mix_tool_create(module_file_path, op_verbose.get_value(),
- op_alt_module_dir.get_value());
- } else if (simulator_type == SYSCALL_MIX) {
- return syscall_mix_tool_create(op_verbose.get_value());
- } else if (simulator_type == VIEW) {
- std::string module_file_path = get_module_file_path();
- // The module file is optional so we don't check for emptiness.
- return view_tool_create(module_file_path, op_skip_refs.get_value(),
- op_sim_refs.get_value(), op_view_syntax.get_value(),
- op_verbose.get_value(), op_alt_module_dir.get_value());
- } else if (simulator_type == FUNC_VIEW) {
- std::string funclist_file_path = get_aux_file_path(
- op_funclist_file.get_value(), DRMEMTRACE_FUNCTION_LIST_FILENAME);
- if (funclist_file_path.empty()) {
- ERRMSG("Usage error: the func_view tool requires offline traces.\n");
- return nullptr;
- }
- return func_view_tool_create(funclist_file_path, op_show_func_trace.get_value(),
- op_verbose.get_value());
- } else if (simulator_type == INVARIANT_CHECKER) {
- return create_invariant_checker();
- } else if (simulator_type == SCHEDULE_STATS) {
- return schedule_stats_tool_create(op_schedule_stats_print_every.get_value(),
- op_verbose.get_value());
- } else {
- auto tool = create_external_tool(simulator_type);
- if (tool == nullptr) {
- ERRMSG("Usage error: unsupported analyzer type \"%s\". "
- "Please choose " CPU_CACHE ", " MISS_ANALYZER ", " TLB ", " HISTOGRAM
- ", " REUSE_DIST ", " BASIC_COUNTS ", " OPCODE_MIX ", " SYSCALL_MIX
- ", " VIEW ", " FUNC_VIEW ", or some external analyzer.\n",
- simulator_type.c_str());
- }
- return tool;
- }
-}
-
-analysis_tool_t *
-analyzer_multi_t::create_invariant_checker()
-{
- if (op_offline.get_value()) {
- // TODO i#5538: Locate and open the schedule files and pass to the
- // reader(s) for seeking. For now we only read them for this test.
- // TODO i#5843: Share this code with scheduler_t or pass in for all
- // tools from here for fast skipping in serial and per-cpu modes.
- std::string tracedir =
- raw2trace_directory_t::tracedir_from_rawdir(op_indir.get_value());
- if (directory_iterator_t::is_directory(tracedir)) {
- directory_iterator_t end;
- directory_iterator_t iter(tracedir);
- if (!iter) {
- error_string_ = "Failed to list directory: " + iter.error_string();
- return nullptr;
- }
- for (; iter != end; ++iter) {
- const std::string fname = *iter;
- const std::string fpath = tracedir + DIRSEP + fname;
- if (starts_with(fname, DRMEMTRACE_SERIAL_SCHEDULE_FILENAME)) {
- if (ends_with(fname, ".gz")) {
-#ifdef HAS_ZLIB
- serial_schedule_file_ =
- std::unique_ptr<std::istream>(new gzip_istream_t(fpath));
-#endif
- } else {
- serial_schedule_file_ = std::unique_ptr<std::istream>(
- new std::ifstream(fpath, std::ifstream::binary));
- }
- if (serial_schedule_file_ && !*serial_schedule_file_) {
- error_string_ = "Failed to open serial schedule file " + fpath;
- return nullptr;
- }
- } else if (fname == DRMEMTRACE_CPU_SCHEDULE_FILENAME) {
-#ifdef HAS_ZIP
- cpu_schedule_file_ =
- std::unique_ptr<std::istream>(new zipfile_istream_t(fpath));
-#endif
- }
- }
- }
- }
- return new invariant_checker_t(op_offline.get_value(), op_verbose.get_value(),
- op_test_mode_name.get_value(),
- serial_schedule_file_.get(), cpu_schedule_file_.get());
+ for (int i = 0; i < this->num_tools_; i++)
+ delete this->tools_[i];
+ delete[] this->tools_;
}
/* Get the path to an auxiliary file by examining
@@ -504,8 +600,10 @@ analyzer_multi_t::create_invariant_checker()
* If a trace file is provided instead of a trace directory, it searches in the
* directory which contains the trace file.
*/
+template <typename RecordType, typename ReaderType>
std::string
-analyzer_multi_t::get_aux_file_path(std::string option_val, std::string default_filename)
+analyzer_multi_tmpl_t<RecordType, ReaderType>::get_aux_file_path(
+ std::string option_val, std::string default_filename)
{
std::string file_path;
if (!option_val.empty())
@@ -543,8 +641,9 @@ analyzer_multi_t::get_aux_file_path(std::string option_val, std::string default_
return file_path;
}
+template <typename RecordType, typename ReaderType>
std::string
-analyzer_multi_t::get_module_file_path()
+analyzer_multi_tmpl_t<RecordType, ReaderType>::get_module_file_path()
{
return get_aux_file_path(op_module_file.get_value(), DRMEMTRACE_MODULE_LIST_FILENAME);
}
@@ -552,8 +651,9 @@ analyzer_multi_t::get_module_file_path()
/* Get the cache simulator knobs used by the cache simulator
* and the cache miss analyzer.
*/
+template <typename RecordType, typename ReaderType>
cache_simulator_knobs_t *
-analyzer_multi_t::get_cache_simulator_knobs()
+analyzer_multi_tmpl_t<RecordType, ReaderType>::get_cache_simulator_knobs()
{
cache_simulator_knobs_t *knobs = new cache_simulator_knobs_t;
knobs->num_cores = op_num_cores.get_value();
@@ -578,5 +678,9 @@ analyzer_multi_t::get_cache_simulator_knobs()
return knobs;
}
+template class analyzer_multi_tmpl_t<memref_t, reader_t>;
+template class analyzer_multi_tmpl_t<trace_entry_t, record_reader_t>;
+
} // namespace drmemtrace
} // namespace dynamorio
diff --git a/clients/drcachesim/analyzer_multi.h b/clients/drcachesim/analyzer_multi.h
index 5d8e5068c42..4699e09c4ea 100644
--- a/clients/drcachesim/analyzer_multi.h
+++ b/clients/drcachesim/analyzer_multi.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2016-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2016-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -45,15 +45,18 @@
namespace dynamorio {
namespace drmemtrace {
-class analyzer_multi_t : public analyzer_t {
+template <typename RecordType, typename ReaderType>
+class analyzer_multi_tmpl_t : public analyzer_tmpl_t<RecordType, ReaderType> {
public:
// Usage: errors encountered during the constructor will set a flag that should
// be queried via operator!.
- analyzer_multi_t();
- virtual ~analyzer_multi_t();
+ analyzer_multi_tmpl_t();
+ virtual ~analyzer_multi_tmpl_t();
protected:
- scheduler_t::scheduler_options_t
+ typedef scheduler_tmpl_t<RecordType, ReaderType> sched_type_t;
+
+ typename scheduler_tmpl_t<RecordType, ReaderType>::scheduler_options_t
init_dynamic_schedule();
bool
create_analysis_tools();
@@ -62,13 +65,19 @@ class analyzer_multi_t : public analyzer_t {
void
destroy_analysis_tools();
- analysis_tool_t *
+ std::unique_ptr<ReaderType>
+ create_ipc_reader(const char *name, int verbose);
+
+ std::unique_ptr<ReaderType>
+ create_ipc_reader_end();
+
+ analysis_tool_tmpl_t<RecordType> *
create_analysis_tool_from_options(const std::string &type);
- analysis_tool_t *
+ analysis_tool_tmpl_t<RecordType> *
create_external_tool(const std::string &id);
- analysis_tool_t *
+ analysis_tool_tmpl_t<RecordType> *
create_invariant_checker();
std::string
@@ -96,6 +105,11 @@ class analyzer_multi_t : public analyzer_t {
static const int max_num_tools_ = 8;
};
+typedef analyzer_multi_tmpl_t<memref_t, reader_t> analyzer_multi_t;
+
+typedef analyzer_multi_tmpl_t<trace_entry_t, record_reader_t>
+ record_analyzer_multi_t;
+
} // namespace drmemtrace
} // namespace dynamorio
diff --git a/clients/drcachesim/common/memtrace_stream.h b/clients/drcachesim/common/memtrace_stream.h
index ac187ff352d..23e4d3af274 100644
--- a/clients/drcachesim/common/memtrace_stream.h
+++ b/clients/drcachesim/common/memtrace_stream.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2022-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2022-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -47,6 +47,7 @@
#include
#include
+#include <unordered_map>
/**
* @file drmemtrace/memtrace_stream.h
@@ -155,10 +156,25 @@ class memtrace_stream_t {
return false;
}
+ /**
+ * Returns the 0-based ordinal for the current shard. For parallel analysis,
+ * this equals the \p shard_index passed to parallel_shard_init_stream().
+ * This is more useful for serial modes where there is no other convenience mechanism
+ * to determine such an index; it allows a tool to compute per-shard results even in
+ * serial mode. The shard orderings in serial mode may not always mach the ordering
+ * in parallel mode. If not implemented, -1 is returned.
+ */
+ virtual int
+ get_shard_index() const
+ {
+ return -1;
+ }
+
/**
* Returns a unique identifier for the current "output cpu". Generally this only
* applies when using #SHARD_BY_CORE. For dynamic schedules, the identifier is
- * typically an output cpu ordinal. For replaying an as-traced schedule, the
+ * typically an output cpu ordinal equal to get_shard_index(). For replaying an
+ * as-traced schedule, the
* identifier is typically the original input cpu which is now mapped directly
* to this output. If not implemented for the current mode, -1 is returned.
*/
@@ -192,6 +208,17 @@ class memtrace_stream_t {
return -1;
}
+ /**
+ * Returns the thread identifier for the current input trace.
+ * This is a convenience method for use in parallel_shard_init_stream()
+ * prior to access to any #memref_t records.
+ */
+ virtual int64_t
+ get_tid() const
+ {
+ return -1;
+ }
+
/**
* Returns the stream interface for the current input trace. This differs from
* "this" for #SHARD_BY_CORE where multiple inputs are interleaved on one
@@ -203,6 +230,16 @@ class memtrace_stream_t {
{
return nullptr;
}
+
+ /**
+ * Returns whether the current record is from a part of the trace corresponding
+ * to kernel execution.
+ */
+ virtual bool
+ is_record_kernel() const
+ {
+ return false;
+ }
};
/**
@@ -274,8 +311,53 @@ class default_memtrace_stream_t : public memtrace_stream_t {
return 0;
}
+ void
+ set_output_cpuid(int64_t cpuid)
+ {
+ cpuid_ = cpuid;
+ }
+ int64_t
+ get_output_cpuid() const override
+ {
+ return cpuid_;
+ }
+ void
+ set_shard_index(int index)
+ {
+ shard_ = index;
+ }
+ int
+ get_shard_index() const override
+ {
+ return shard_;
+ }
+ // Also sets the shard index to the dynamic-discovery-order tid ordinal.
+ void
+ set_tid(int64_t tid)
+ {
+ tid_ = tid;
+ auto exists = tid2shard_.find(tid);
+ if (exists == tid2shard_.end()) {
+ int index = static_cast<int>(tid2shard_.size());
+ tid2shard_[tid] = index;
+ set_shard_index(index);
+ } else {
+ set_shard_index(exists->second);
+ }
+ }
+ int64_t
+ get_tid() const override
+ {
+ return tid_;
+ }
+
private:
- uint64_t *record_ordinal_;
+ uint64_t *record_ordinal_ = nullptr;
+ int64_t cpuid_ = 0;
+ int shard_ = 0;
+ int64_t tid_ = 0;
+ // To let a test set just the tid and get a shard index for free.
+ std::unordered_map<int64_t, int> tid2shard_;
};
} // namespace drmemtrace
diff --git a/clients/drcachesim/common/options.cpp b/clients/drcachesim/common/options.cpp
index 36b3f9dd994..5bdee7aef05 100644
--- a/clients/drcachesim/common/options.cpp
+++ b/clients/drcachesim/common/options.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2015-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2015-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -35,6 +35,7 @@
#include "options.h"
#include
+#include <limits>
#include
#include "dr_api.h" // For IF_X86_ELSE.
@@ -289,8 +290,9 @@ droption_t<bool> op_cpu_scheduling(
"round-robin fashion. This option causes the scheduler to instead use the recorded "
"cpu that each thread executed on (at a granularity of the trace buffer size) "
"for scheduling, mapping traced cpu's to cores and running each segment of each "
- "thread "
- "on the core that owns the recorded cpu for that segment.");
+ "thread on the core that owns the recorded cpu for that segment. "
+ "This option is not supported with -core_serial; use "
+ "-cpu_schedule_file with -core_serial instead.");
droption_t<bytesize_t> op_max_trace_size(
DROPTION_SCOPE_CLIENT, "max_trace_size", 0,
@@ -456,13 +458,17 @@ droption_t
"Specifies the replacement policy for TLBs. "
"Supported policies: LFU (Least Frequently Used).");
+// TODO i#6660: Add "-tool" alias as these are not all "simulators".
droption_t<std::string>
op_simulator_type(DROPTION_SCOPE_FRONTEND, "simulator_type", CPU_CACHE,
- "Specifies the types of simulators, separated by a colon (\":\").",
+ "Specifies which trace analysis tool(s) to run. Multiple tools "
+ "can be specified, separated by a colon (\":\").",
"Predefined types: " CPU_CACHE ", " MISS_ANALYZER ", " TLB
", " REUSE_DIST ", " REUSE_TIME ", " HISTOGRAM ", " BASIC_COUNTS
- ", " INVARIANT_CHECKER ", or " SCHEDULE_STATS
- ". The external types: name of a tool identified by a "
+ ", " INVARIANT_CHECKER ", " SCHEDULE_STATS ", or " RECORD_FILTER
+ ". The " RECORD_FILTER " tool cannot be combined with the others "
+ "as it operates on raw disk records. "
+ "To invoke an external tool: specify its name as identified by a "
"name.drcachesim config file in the DR tools directory.");
droption_t<unsigned int> op_verbose(DROPTION_SCOPE_ALL, "verbose", 0, 0, 64,
@@ -520,7 +526,17 @@ droption_t<uint64_t> op_interval_microseconds(
"Enable periodic heartbeats for intervals of given microseconds in the trace.",
"Desired length of each trace interval, defined in microseconds of trace time. "
"Trace intervals are measured using the TRACE_MARKER_TYPE_TIMESTAMP marker values. "
- "If set, analysis tools receive a callback at the end of each interval.");
+ "If set, analysis tools receive a callback at the end of each interval, and one "
+ "at the end of trace analysis to print the whole-trace interval results.");
+
+droption_t<uint64_t> op_interval_instr_count(
+ DROPTION_SCOPE_FRONTEND, "interval_instr_count", 0,
+ "Enable periodic heartbeats for intervals of given per-shard instr count. ",
+ "Desired length of each trace interval, defined in instr count of each shard. "
+ "With -parallel, this does not support whole trace intervals, only per-shard "
+ "intervals. If set, analysis tools receive a callback at the end of each interval, "
+ "and separate callbacks per shard at the end of trace analysis to print each "
+ "shard's interval results.");
droption_t<int>
op_only_thread(DROPTION_SCOPE_FRONTEND, "only_thread", 0,
@@ -888,6 +904,21 @@ droption_t
"Applies to -core_sharded and -core_serial. "
"Path with stored as-traced schedule for replay.");
#endif
+droption_t<std::string> op_sched_switch_file(
+ DROPTION_SCOPE_FRONTEND, "sched_switch_file", "",
+ "Path to file holding context switch sequences",
+ "Applies to -core_sharded and -core_serial. Path to file holding context switch "
+ "sequences. The file can contain multiple sequences each with regular trace headers "
+ "and the sequence proper bracketed by TRACE_MARKER_TYPE_CONTEXT_SWITCH_START and "
+ "TRACE_MARKER_TYPE_CONTEXT_SWITCH_END markers.");
+
+droption_t<bool> op_sched_randomize(
+ DROPTION_SCOPE_FRONTEND, "sched_randomize", false,
+ "Pick next inputs randomly on context switches",
+ "Applies to -core_sharded and -core_serial. Disables the normal methods of "
+ "choosing the next input based on priority, timestamps (if -sched_order_time is "
+ "set), and FIFO order and instead selects the next input randomly. "
+ "This is intended for experimental use in sensitivity studies.");
// Schedule_stats options.
droption_t<uint64_t>
@@ -902,5 +933,48 @@ droption_t<std::string> op_syscall_template_file(
"If set, system call traces will be injected from the file "
"into the resulting trace.");
+// Record filter options.
+droption_t<uint64_t> op_filter_stop_timestamp(
+ DROPTION_SCOPE_FRONTEND, "filter_stop_timestamp", 0, 0,
+ // Wrap max in parens to work around Visual Studio compiler issues with the
+ // max macro (even despite NOMINMAX defined above).
+ (std::numeric_limits<uint64_t>::max)(),
+ "Timestamp (in us) in the trace when to stop filtering.",
+ "Record filtering will be disabled (everything will be output) "
+ "when the tool sees a TRACE_MARKER_TYPE_TIMESTAMP marker with "
+ "timestamp greater than the specified value.");
+
+droption_t<uint64_t> op_filter_cache_size(
+ DROPTION_SCOPE_FRONTEND, "filter_cache_size", 0,
+ "Enable data cache filter with given size (in bytes).",
+ "Enable data cache filter with given size (in bytes), with 64 byte "
+ "line size and a direct mapped LRU cache.");
+
+droption_t<std::string>
+ op_filter_trace_types(DROPTION_SCOPE_FRONTEND, "filter_trace_types", "",
+ "Comma-separated integers for trace types to remove.",
+ "Comma-separated integers for trace types to remove. "
+ "See trace_type_t for the list of trace entry types.");
+
+droption_t<std::string>
+ op_filter_marker_types(DROPTION_SCOPE_FRONTEND, "filter_marker_types", "",
+ "Comma-separated integers for marker types to remove.",
+ "Comma-separated integers for marker types to remove. "
+ "See trace_marker_type_t for the list of marker types.");
+
+droption_t<uint64_t> op_trim_before_timestamp(
+ DROPTION_SCOPE_ALL, "trim_before_timestamp", 0, 0,
+ (std::numeric_limits<uint64_t>::max)(),
+ "Trim records until this timestamp (in us) in the trace.",
+ "Removes all records (after headers) before the first TRACE_MARKER_TYPE_TIMESTAMP "
+ "marker in the trace with timestamp greater than or equal to the specified value.");
+
+droption_t<uint64_t> op_trim_after_timestamp(
+ DROPTION_SCOPE_ALL, "trim_after_timestamp", (std::numeric_limits<uint64_t>::max)(), 0,
+ (std::numeric_limits<uint64_t>::max)(),
+ "Trim records after this timestamp (in us) in the trace.",
+ "Removes all records from the first TRACE_MARKER_TYPE_TIMESTAMP marker with "
+ "timestamp larger than the specified value.");
+
} // namespace drmemtrace
} // namespace dynamorio
diff --git a/clients/drcachesim/common/options.h b/clients/drcachesim/common/options.h
index dbf1c57ca47..13b8b4268f3 100644
--- a/clients/drcachesim/common/options.h
+++ b/clients/drcachesim/common/options.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2015-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2015-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -36,6 +36,9 @@
#define _OPTIONS_H_ 1
// Tool names (for -simulator_type option).
+// TODO i#6660: When we add "-tool", add "cache_simulator" or "drcachesim"
+// instead of just "-tool cache". Ditto for "TLB".
+#define CPU_CACHE "cache"
#define MISS_ANALYZER "miss_analyzer"
#define TLB "TLB"
#define HISTOGRAM "histogram"
@@ -48,6 +51,7 @@
#define FUNC_VIEW "func_view"
#define INVARIANT_CHECKER "invariant_checker"
#define SCHEDULE_STATS "schedule_stats"
+#define RECORD_FILTER "record_filter"
// Constants used by specific tools.
#define REPLACE_POLICY_NON_SPECIFIED ""
@@ -56,7 +60,6 @@
#define REPLACE_POLICY_FIFO "FIFO"
#define PREFETCH_POLICY_NEXTLINE "nextline"
#define PREFETCH_POLICY_NONE "none"
-#define CPU_CACHE "cache"
#define CACHE_TYPE_INSTRUCTION "instruction"
#define CACHE_TYPE_DATA "data"
#define CACHE_TYPE_UNIFIED "unified"
@@ -158,6 +161,8 @@ extern dynamorio::droption::droption_t<std::string> op_tracer_alt;
extern dynamorio::droption::droption_t<std::string> op_tracer_ops;
extern dynamorio::droption::droption_t<uint64_t>
op_interval_microseconds;
+extern dynamorio::droption::droption_t<uint64_t>
+ op_interval_instr_count;
extern dynamorio::droption::droption_t<int> op_only_thread;
extern dynamorio::droption::droption_t<uint64_t> op_skip_instrs;
extern dynamorio::droption::droption_t<uint64_t> op_skip_refs;
@@ -200,8 +205,16 @@ extern dynamorio::droption::droption_t<std::string> op_record_file;
extern dynamorio::droption::droption_t<std::string> op_replay_file;
extern dynamorio::droption::droption_t<std::string> op_cpu_schedule_file;
#endif
+extern dynamorio::droption::droption_t<std::string> op_sched_switch_file;
+extern dynamorio::droption::droption_t<bool> op_sched_randomize;
extern dynamorio::droption::droption_t<uint64_t> op_schedule_stats_print_every;
extern dynamorio::droption::droption_t<std::string> op_syscall_template_file;
+extern dynamorio::droption::droption_t<uint64_t> op_filter_stop_timestamp;
+extern dynamorio::droption::droption_t<uint64_t> op_filter_cache_size;
+extern dynamorio::droption::droption_t<std::string> op_filter_trace_types;
+extern dynamorio::droption::droption_t<std::string> op_filter_marker_types;
+extern dynamorio::droption::droption_t<uint64_t> op_trim_before_timestamp;
+extern dynamorio::droption::droption_t<uint64_t> op_trim_after_timestamp;
} // namespace drmemtrace
} // namespace dynamorio
diff --git a/clients/drcachesim/common/trace_entry.h b/clients/drcachesim/common/trace_entry.h
index 63cadfcb5cb..342ebd8b252 100644
--- a/clients/drcachesim/common/trace_entry.h
+++ b/clients/drcachesim/common/trace_entry.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2015-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2015-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -598,6 +598,35 @@ typedef enum {
*/
TRACE_MARKER_TYPE_CORE_IDLE,
+ /**
+ * Indicates a point in the trace where context switch's kernel trace starts.
+ * The value of the marker is set to the switch type enum value from
+ * #dynamorio::drmemtrace::scheduler_tmpl_t::switch_type_t.
+ */
+ TRACE_MARKER_TYPE_CONTEXT_SWITCH_START,
+
+ /**
+ * Indicates a point in the trace where a context switch's kernel trace ends.
+ * The value of the marker is set to the switch type enum value from
+ * #dynamorio::drmemtrace::scheduler_tmpl_t::switch_type_t.
+ */
+ TRACE_MARKER_TYPE_CONTEXT_SWITCH_END,
+
+ /**
+ * This marker's value is the current thread's vector length in bytes, for
+ * architectures with a dynamic vector length. It is currently only used on AArch64.
+ *
+ * On AArch64 the marker's value contains the SVE vector length. The marker is
+ * emitted with the thread header to establish the initial vector length for that
+ * thread. In the future it will also be emitted later in the trace if the app
+ * changes the vector length at runtime (TODO i#6625). In all cases the vector
+ * length value is specific to the current thread.
+ * The vector length affects how some SVE instructions are decoded so any tools which
+ * decode instructions should clear any cached data and set the vector length used by
+ * the decoder using dr_set_sve_vector_length().
+ */
+ TRACE_MARKER_TYPE_VECTOR_LENGTH,
+
// ...
// These values are reserved for future built-in marker types.
// ...
@@ -894,11 +923,11 @@ typedef enum {
*/
OFFLINE_FILE_TYPE_BLOCKING_SYSCALLS = 0x800,
/**
- * Kernel traces of syscalls are included.
- * The included kernel traces are provided either by the -syscall_template_file to
- * raw2trace (see #OFFLINE_FILE_TYPE_KERNEL_SYSCALL_TRACE_TEMPLATES), or on x86 using
- * the -enable_kernel_tracing option that uses Intel® Processor Trace to collect a
- * trace for system call execution.
+ * Kernel traces (both instructions and memory addresses) of syscalls are included. If
+ * only kernel instructions are included the file type is
+ * #OFFLINE_FILE_TYPE_KERNEL_SYSCALL_INSTR_ONLY instead. The included kernel traces
+ * are provided by the -syscall_template_file to raw2trace (see
+ * #OFFLINE_FILE_TYPE_KERNEL_SYSCALL_TRACE_TEMPLATES).
*/
OFFLINE_FILE_TYPE_KERNEL_SYSCALLS = 0x1000,
/**
@@ -925,6 +954,19 @@ typedef enum {
* the future.
*/
OFFLINE_FILE_TYPE_KERNEL_SYSCALL_TRACE_TEMPLATES = 0x4000,
+ /**
+ * Kernel instruction traces of syscalls are included. When memory addresses are
+ * also included for kernel execution, the file type is
+ * #OFFLINE_FILE_TYPE_KERNEL_SYSCALLS instead.
+ * On x86, the kernel trace is enabled by the -enable_kernel_tracing option that
+ * uses Intel® Processor Trace to collect an instruction trace for system call
+ * execution.
+ */
+ OFFLINE_FILE_TYPE_KERNEL_SYSCALL_INSTR_ONLY = 0x8000,
+ /**
+ * Each trace shard represents one core and contains interleaved software threads.
+ */
+ OFFLINE_FILE_TYPE_CORE_SHARDED = 0x10000,
} offline_file_type_t;
static inline const char *
diff --git a/clients/drcachesim/common/utils.h b/clients/drcachesim/common/utils.h
index 7bc5bec97ff..f49832443ce 100644
--- a/clients/drcachesim/common/utils.h
+++ b/clients/drcachesim/common/utils.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2015-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2015-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -35,17 +35,34 @@
#ifndef _UTILS_H_
#define _UTILS_H_ 1
+#include <stdint.h>
#include
#include
#include
#include
#include
+#if defined(_WIN32) || defined(_WIN64) || defined(WINDOWS)
+# define WIN32_LEAN_AND_MEAN
+# define UNICODE // For Windows headers.
+# define _UNICODE // For C headers.
+# define NOMINMAX // Avoid windows.h messing up std::min.
+# include <windows.h>
+#else
+# include <sys/time.h>
+#endif
+
namespace dynamorio {
namespace drmemtrace {
// XXX: DR should export this
#define INVALID_THREAD_ID 0
+// We avoid collisions with DR's INVALID_PROCESS_ID by using our own name.
+#define INVALID_PID -1
+// A separate sentinel for an idle core with no software thread.
+// XXX i#6703: Export this in scheduler.h as part of its API when we have
+// the scheduler insert synthetic headers.
+#define IDLE_THREAD_ID -1
// XXX: perhaps we should use a C++-ish stream approach instead
// This cannot be named ERROR as that conflicts with Windows headers.
@@ -182,6 +199,27 @@ split_by(std::string s, const std::string &sep)
return vec;
}
+// Returns a timestamp with at least microsecond granularity.
+// On UNIX this is an absolute timestamp, but on Windows, where we found
+// the GetSystemTime* functions not granular enough, it is the processor's
+// timestamp counter.
+// (We avoid dr_get_microseconds() because not all targets link
+// in the DR library.)
+static inline uint64_t
+get_microsecond_timestamp()
+{
+#if defined(_WIN32) || defined(_WIN64) || defined(WINDOWS)
+ uint64_t res;
+ QueryPerformanceCounter((LARGE_INTEGER *)&res);
+ return res;
+#else
+ struct timeval time;
+ if (gettimeofday(&time, nullptr) != 0)
+ return 0;
+ return static_cast<uint64_t>(time.tv_sec) * 1000000 + time.tv_usec;
+#endif
+}
+
} // namespace drmemtrace
} // namespace dynamorio
diff --git a/clients/drcachesim/docs/drcachesim.dox.in b/clients/drcachesim/docs/drcachesim.dox.in
index 447fd74764f..5f1a6751a6c 100644
--- a/clients/drcachesim/docs/drcachesim.dox.in
+++ b/clients/drcachesim/docs/drcachesim.dox.in
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2015-2023 Google, Inc. All rights reserved.
+ * Copyright (c) 2015-2024 Google, Inc. All rights reserved.
* **********************************************************/
/*
@@ -125,7 +125,11 @@ using the drdecode decoder or any other decoder. An additional field
information should be invalidated due to possibly changed application
code. (For online traces, encodings are not provided unless the
option `-instr_encodings` is passed, as encodings add overhead and
-are not needed for many tools.)
+are not needed for many tools.) Cached decoding information might also
+need to be discarded if there is a
+#dynamorio::drmemtrace::TRACE_MARKER_TYPE_VECTOR_LENGTH marker entry
+indicating a change of vector length on architectures such as AArch64
+which have a dynamic vector length.
Older legacy traces may not contain instruction encodings. For those
traces, encodings for static code can be obtained by
@@ -272,6 +276,7 @@ tools can also be created, as described in \ref sec_drcachesim_newtool.
- \ref sec_tool_histogram
- \ref sec_tool_invariant_checker
- \ref sec_tool_syscall_mix
+- \ref sec_tool_record_filter
\section sec_tool_cache_sim Cache Simulator
@@ -890,6 +895,52 @@ Syscall mix tool results:
1 : 273
\endcode
+\section sec_tool_record_filter Record Filter
+
+The record filter tool modifies a target trace. It contains several varieties of
+filters which selectively remove records from the trace. The filters currently
+provided include:
+
+- Removing records of types specified by the -filter_trace_types option.
+ The types are identified by their #dynamorio::drmemtrace::trace_type_t
+ enum numeric value.
+
+- Removing marker records of marker types specified by the -filter_marker_types option.
+ The types are identified by their #dynamorio::drmemtrace::trace_marker_type_t
+ enum numeric value.
+
+- Running a simple data cache filter and removing hits. The cache is enabled
+ and its size specified by the -filter_cache_size option.
+
+- Trimming the start (via -trim_before_timestamp) and end (via -trim_after_timestamp)
+ of a trace. Any now-empty shards are deleted entirely.
+
+A filter can be applied to only the initial portion of a trace by specifying
+the -filter_stop_timestamp option.
+
+Example of removing function markers:
+
+\code
+$ bin64/drrun -t drcachesim -indir mytracedir -simulator_type basic_counts
+...
+ 9009 total function id markers
+ 5006 total function return address markers
+ 6007 total function argument markers
+ 4003 total function return value markers
+...
+
+$ bin64/drrun -t drcachesim -simulator_type record_filter -filter_marker_types 4,5,6,7 -indir mytracedir -outdir newdir
+Output 1280800 entries from 1304825 entries.
+
+$ bin64/drrun -t drcachesim -indir newdir -simulator_type basic_counts
+...
+ 0 total function id markers
+ 0 total function return address markers
+ 0 total function argument markers
+ 0 total function return value markers
+...
+\endcode
+
****************************************************************************
\page google_workload_traces Google Workload Traces
diff --git a/clients/drcachesim/drpt2trace/ir2trace.cpp b/clients/drcachesim/drpt2trace/ir2trace.cpp
index 57e27e931d1..9cf3a33f140 100644
--- a/clients/drcachesim/drpt2trace/ir2trace.cpp
+++ b/clients/drcachesim/drpt2trace/ir2trace.cpp
@@ -32,6 +32,12 @@
#include "ir2trace.h"
#include "dr_api.h"
+#include "drir.h"
+#include "trace_entry.h"
+
+#include
+#include
+#include