i#6662 public traces, part 1: synthetic ISA #6691

Merged
merged 95 commits into from
Apr 10, 2024
Changes from 91 commits
Commits
95 commits
f9bebe0
i#6662 synthetic ISA: encoding
edeiana Feb 28, 2024
4b70a9d
Forgot to include encode header for synthetic ISA.
edeiana Feb 28, 2024
d301f3e
Added synthetic encoding header to CMakeList.
edeiana Feb 29, 2024
a0bbf6f
Added initial implementation of synthetic encoding/decoding.
edeiana Mar 6, 2024
8b29dcc
Using x86 encode/decode for synthetic encoding/decoding
edeiana Mar 6, 2024
d2115e6
Added test for synthetic encoding/decoding.
edeiana Mar 6, 2024
44c4c3c
Merge branch 'master' into i6662-synthetic-isa
edeiana Mar 6, 2024
246003c
Added encoding/deconding of operands, although virtual registers
edeiana Mar 15, 2024
85ac64d
Fixed description of synthetic ISA mode.
edeiana Mar 15, 2024
79b736b
Updated uses of spelled out DR_ISA_SYNTHETIC.
edeiana Mar 15, 2024
9e67807
Merge remote-tracking branch 'origin/master' into i6662-synthetic-isa
edeiana Mar 15, 2024
1e8a686
Updated synthetic ISA enum use.
edeiana Mar 15, 2024
8bc947a
Updated test for synthetic ISA encoding/decoding.
edeiana Mar 15, 2024
df82fcf
Now returning next instruction's PC after synthetic ISA
edeiana Mar 15, 2024
21e8d77
Improved description in comment.
edeiana Mar 15, 2024
b3cac08
Fixed building error.
edeiana Mar 15, 2024
89e80d7
clang-format.
edeiana Mar 15, 2024
b4b1fd8
Addressed memory alignment, eflags, constants, description of encoding.
edeiana Mar 18, 2024
8df36c1
Fixed minosr issue in Synthetic ISA test: merged lines.
edeiana Mar 18, 2024
dc5f750
Setting instr_t length according to the size of the synthetic
edeiana Mar 20, 2024
8400177
Now considering registers in destination operands that are memory
edeiana Mar 20, 2024
d346c27
Improved test.
edeiana Mar 20, 2024
8d63bbf
Check for instr_t synthetic ISA moved to instr_encode()
edeiana Mar 20, 2024
4787fad
We want to minimize the impact of synthetic ISA.
edeiana Mar 20, 2024
978ed90
Now setting arithetic, operand, and raw bits flags valid after decoding.
edeiana Mar 20, 2024
efc6cf2
We don't need to temporarly change instr_t ISA mode to retrieve flags
edeiana Mar 20, 2024
46bacdb
Documented that we use instr_t ISA mode for decoding when
edeiana Mar 20, 2024
9c96008
Now checking flags, instr length, and PC.
edeiana Mar 20, 2024
615d55f
Fixed "warning C4018: '<': signed/unsigned mismatch" treated
edeiana Mar 20, 2024
df3c68d
Fixed __attribute__ extention not supported by our windows compiler.
edeiana Mar 20, 2024
dd5a979
Whoops, forgot to zero-out the first 4 bytes before encoding.
edeiana Mar 20, 2024
d4f1b3e
Improved comments.
edeiana Mar 21, 2024
7a95872
Now using dcontext ISA mode for decoding in x86.
edeiana Mar 21, 2024
806a72c
Moved encode_to_synth() from instr_encode() to instr_encode_to_copy().
edeiana Mar 21, 2024
e13b25b
Using local variable for header bytes of encoding, instead of
edeiana Mar 21, 2024
ef8560c
Improved description.
edeiana Mar 21, 2024
ab65804
Addressed PR comments.
edeiana Mar 21, 2024
b1bd6d5
Fixed grammar.
edeiana Mar 21, 2024
23995d3
Added decode_from_synth() into decode() of every architecture.
edeiana Mar 22, 2024
b1f844a
Removed synthetic ISA related tests from ir_x86.c set of tests.
edeiana Mar 22, 2024
ce3c5e1
Fixed AARCH64 and RISCV64 tests.
edeiana Mar 22, 2024
0e318dd
Added DR_ISA_SYNTHETIC to is_isa_mode_legal() for all arches.
edeiana Mar 22, 2024
224fadf
Removed decode_from_synth() from decode_from_copy(), we only want it in
edeiana Mar 22, 2024
ddf7b10
Now allowing DR_ISA_SYNTHETIC for all arches in instr_set_isa_mode().
edeiana Mar 22, 2024
363ef2e
Fixed ARM 32 test.
edeiana Mar 22, 2024
c29f344
Addressed misc (mostly minor) review comments.
edeiana Mar 23, 2024
1a2a6b1
Mapping sub-registers to their canonical register (i.e., the largest …
edeiana Mar 24, 2024
035d014
Fixed register size encoding.
edeiana Mar 24, 2024
6d4d4cf
Check if src or dst operands are present before looping through
edeiana Mar 24, 2024
96133f8
Biggest encoded synthetis instruction is now 20 bytes.
edeiana Mar 24, 2024
91981df
Fixed offset when encoding/decoding src operands.
edeiana Mar 24, 2024
bc22a59
Moved decode_from_synth() from decode() to decode_common() for all
edeiana Mar 26, 2024
2856ab0
Now we only have a single operation size field (1 byte encoding).
edeiana Mar 26, 2024
70f6942
Fix building errors for aarch64 and riscv64.
edeiana Mar 26, 2024
4caf3aa
Fix failure on AARCH64.
edeiana Mar 27, 2024
9bc4a80
Fix building error (typo).
edeiana Mar 27, 2024
f88f9b2
Renamed DR_ISA_SYNTHETIC to DR_ISA_REGDEPS.
edeiana Mar 27, 2024
a9ff2d5
Forgot one DR_ISA_SYNTHETIC in aarch64. Fixed.
edeiana Mar 27, 2024
5cd8856
Added instr_convert_to_isa_regdeps() API to translate an instr_t
edeiana Mar 29, 2024
cc34ec5
Added new API to release doc.
edeiana Mar 29, 2024
87d60ab
Updated tests.
edeiana Mar 29, 2024
ef7554c
Minor cleanup.
edeiana Mar 29, 2024
9e7c7f8
Merge branch 'master' into i6662-synthetic-isa
edeiana Mar 29, 2024
3d0106f
clang-format-14 run.
edeiana Mar 29, 2024
4553779
Fixed warning as error on windows.
edeiana Mar 29, 2024
55f737c
Added #mem_ops (i.e., loads + stores) to encoding.
edeiana Mar 31, 2024
f05c367
Added a couple more x86 instructions to test with implicit
edeiana Mar 31, 2024
2374332
Increased size of array containing encoding to max encoding size of 16
edeiana Mar 31, 2024
1b01ed5
Reverted back to no #mem_ops.
edeiana Apr 2, 2024
28b779a
Added DR_ISA_REGDEPS description.
edeiana Apr 4, 2024
c284c6b
Changed convert_to_isa_regdeps() signature.
edeiana Apr 4, 2024
9d785e1
Removed unnecessary cast to (void *).
edeiana Apr 4, 2024
a7149e9
Renaming: from synthetic to isa_regdeps.
edeiana Apr 5, 2024
547d39b
Renaming: encoding/decoding functions using isa_regdeps name.
edeiana Apr 5, 2024
2a0ada8
Added operation_size, modified encoding/decoding accordingly.
edeiana Apr 6, 2024
0b58ad7
Improved description of encoding scheme and its readability.
edeiana Apr 6, 2024
e67df8a
Improved test.
edeiana Apr 6, 2024
d41e21e
Updated documentation.
edeiana Apr 7, 2024
8db9a06
Fixed typo in documentation.
edeiana Apr 7, 2024
4615d17
clang-format-14 run.
edeiana Apr 7, 2024
8d08c07
clang-format-14 run, again.
edeiana Apr 7, 2024
cefaba5
Added encode + decode of instructions generated by INSTR_CREATE_.
edeiana Apr 7, 2024
2b0ad5e
Removed INSTR_CREATE_adr from tests for aarch64 because of i#4847.
edeiana Apr 7, 2024
d125010
Removed setting of instr_t.length, we don't use it anymore,
edeiana Apr 7, 2024
be82357
Updated test's expected output.
edeiana Apr 7, 2024
e639309
Added instr_encode_common() with is now the common routine interposed
edeiana Apr 8, 2024
b29d8a1
Added valid encoding to bytes field of instr_t when converting and
edeiana Apr 9, 2024
74c681f
Formatting fixed.
edeiana Apr 9, 2024
b931392
Addressed PR feedback.
edeiana Apr 9, 2024
1340b90
Fromatting fixed.
edeiana Apr 9, 2024
0d57ec9
Removed unnecessary header.
edeiana Apr 9, 2024
1c828b8
Improved comments and code documentation (doxygen).
edeiana Apr 9, 2024
5ad7405
Merge branch 'master' into i6662-synthetic-isa
edeiana Apr 9, 2024
2aad8e0
Fixed doxygen. Cannot reference #DR_ISA_REGDEPS in a pre-comment for
edeiana Apr 9, 2024
01ef05c
Trying to fix doxygen.
edeiana Apr 10, 2024
4 changes: 4 additions & 0 deletions api/docs/release.dox
@@ -217,6 +217,10 @@ Further non-compatibility-affecting changes include:
is set to true by default to match the existing behavior of the invariant checker.
- Added a new instr API instr_is_xrstor() that tells whether an instruction is any
variant of the x86 xrstor opcode.
- Added a new #dr_isa_mode_t: #DR_ISA_REGDEPS, which is a synthetic ISA with the main
purpose of preserving register dependencies.
- Added instr_convert_to_isa_regdeps() API that converts an #instr_t from a real ISA
(e.g., #DR_ISA_AMD64) to the #DR_ISA_REGDEPS synthetic ISA.

**************************************************
<hr>
2 changes: 2 additions & 0 deletions core/CMakeLists.txt
@@ -279,6 +279,8 @@ set(DECODER_SRCS
ir/${ARCH_NAME}/decode.c
ir/encode_shared.c
ir/${ARCH_NAME}/encode.c
ir/isa_regdeps/encode.c
ir/isa_regdeps/decode.c
ir/disassemble_shared.c
ir/${ARCH_NAME}/disassemble.c
ir/ir_utils_shared.c
10 changes: 10 additions & 0 deletions core/ir/aarch64/codec.c
@@ -40,9 +40,11 @@

#include <stdint.h>
#include "../globals.h"
#include "../isa_regdeps/decode.h"
#include "arch.h"
#include "decode.h"
#include "disassemble.h"
#include "encode_api.h"
#include "instr.h"
#include "instr_create_shared.h"

@@ -9717,6 +9719,14 @@ decode_category(uint encoding, instr_t *instr)
byte *
decode_common(dcontext_t *dcontext, byte *pc, byte *orig_pc, instr_t *instr)
{
/* #DR_ISA_REGDEPS synthetic ISA has its own decoder.
* XXX i#1684: when DR can be built with full dynamic architecture selection we won't
* need to pollute the decoding of other architectures with this synthetic ISA special
* case.
*/
if (dr_get_isa_mode(dcontext) == DR_ISA_REGDEPS)
return decode_isa_regdeps(dcontext, pc, instr);

byte *next_pc = pc + 4;
uint enc = *(uint *)pc;
uint eflags = 0;
3 changes: 2 additions & 1 deletion core/ir/aarch64/decode.c
@@ -32,6 +32,7 @@
*/

#include "../globals.h"
#include "encode_api.h"
#include "instr.h"
#include "decode.h"
#include "decode_fast.h" /* ensure we export decode_next_pc, decode_sizeof */
@@ -41,7 +42,7 @@
 bool
 is_isa_mode_legal(dr_isa_mode_t mode)
 {
-    return (mode == DR_ISA_ARM_A64);
+    return (mode == DR_ISA_ARM_A64 || mode == DR_ISA_REGDEPS);
 }

app_pc
4 changes: 2 additions & 2 deletions core/ir/aarch64/instr.c
@@ -45,9 +45,9 @@
 bool
 instr_set_isa_mode(instr_t *instr, dr_isa_mode_t mode)
 {
-    if (mode != DR_ISA_ARM_A64)
+    if (mode != DR_ISA_ARM_A64 && mode != DR_ISA_REGDEPS)
         return false;
-    instr->isa_mode = DR_ISA_ARM_A64;
+    instr->isa_mode = mode;
     return true;
 }
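
Both changed lines of `instr_set_isa_mode()` have to move together: once the check admits a second legal mode, the assignment can no longer hard-code `DR_ISA_ARM_A64`, or a request for `DR_ISA_REGDEPS` would report success while storing the wrong mode. A minimal, self-contained sketch of that failure follows. Every `demo_` name is a hypothetical stand-in and not part of DynamoRIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for dr_isa_mode_t and instr_t. */
typedef enum { DEMO_ISA_ARM_A64, DEMO_ISA_REGDEPS, DEMO_ISA_OTHER } demo_isa_mode_t;

typedef struct {
    demo_isa_mode_t isa_mode;
} demo_instr_t;

/* Widened check combined with the old hard-coded assignment: accepts
 * DEMO_ISA_REGDEPS but silently stores DEMO_ISA_ARM_A64.
 */
bool
demo_set_isa_mode_half_fixed(demo_instr_t *instr, demo_isa_mode_t mode)
{
    assert(instr != NULL);
    if (mode != DEMO_ISA_ARM_A64 && mode != DEMO_ISA_REGDEPS)
        return false;
    instr->isa_mode = DEMO_ISA_ARM_A64;
    return true;
}

/* Same shape as the merged change: store whichever legal mode was requested. */
bool
demo_set_isa_mode(demo_instr_t *instr, demo_isa_mode_t mode)
{
    assert(instr != NULL);
    if (mode != DEMO_ISA_ARM_A64 && mode != DEMO_ISA_REGDEPS)
        return false;
    instr->isa_mode = mode;
    return true;
}
```

With the half-fixed variant, a later check of the instruction's mode would not see the synthetic ISA even though the setter returned `true`.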

12 changes: 11 additions & 1 deletion core/ir/arm/decode.c
@@ -31,6 +31,8 @@
*/

#include "../globals.h"
#include "../isa_regdeps/decode.h"
#include "encode_api.h"
#include "instr.h"
#include "decode.h"
#include "decode_private.h"
@@ -172,7 +174,7 @@ decode_in_it_block(decode_state_t *state, app_pc pc, decode_info_t *di)
 bool
 is_isa_mode_legal(dr_isa_mode_t mode)
 {
-    return (mode == DR_ISA_ARM_THUMB || DR_ISA_ARM_A32);
+    return (mode == DR_ISA_ARM_THUMB || mode == DR_ISA_ARM_A32 || mode == DR_ISA_REGDEPS);
 }

/* We need to call canonicalize_pc_target() on all next_tag-writing
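
Besides adding `DR_ISA_REGDEPS`, the ARM `is_isa_mode_legal()` change repairs the old expression `mode == DR_ISA_ARM_THUMB || DR_ISA_ARM_A32`. The right-hand operand of `||` there is a bare enum constant, so whenever that constant is nonzero the whole expression is true for every `mode`. A minimal, self-contained sketch of the difference follows. The `demo_` enum and its numeric values are assumptions made for illustration and are not DynamoRIO's actual `dr_isa_mode_t` values.

```c
#include <stdbool.h>

/* Hypothetical stand-in for dr_isa_mode_t; only the fact that
 * DEMO_ISA_ARM_A32 is nonzero matters for this example.
 */
typedef enum {
    DEMO_ISA_X86 = 0,
    DEMO_ISA_ARM_A32,
    DEMO_ISA_ARM_THUMB,
    DEMO_ISA_ARM_A64,
    DEMO_ISA_REGDEPS
} demo_isa_mode_t;

/* Old form: "|| DEMO_ISA_ARM_A32" tests the constant itself, not `mode`,
 * so the function accepts every mode. Some compilers warn about this.
 */
bool
demo_is_legal_old(demo_isa_mode_t mode)
{
    return (mode == DEMO_ISA_ARM_THUMB || DEMO_ISA_ARM_A32);
}

/* New form: each alternative compares `mode` explicitly. */
bool
demo_is_legal_new(demo_isa_mode_t mode)
{
    return (mode == DEMO_ISA_ARM_THUMB || mode == DEMO_ISA_ARM_A32 ||
            mode == DEMO_ISA_REGDEPS);
}
```

In this sketch the old form accepts `DEMO_ISA_X86` and `DEMO_ISA_ARM_A64`, while the new form rejects both and accepts only the three listed modes.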
@@ -2428,6 +2430,14 @@ decode_opcode(dcontext_t *dcontext, byte *pc, instr_t *instr)
static byte *
decode_common(dcontext_t *dcontext, byte *pc, byte *orig_pc, instr_t *instr)
{
/* #DR_ISA_REGDEPS synthetic ISA has its own decoder.
* XXX i#1684: when DR can be built with full dynamic architecture selection we won't
* need to pollute the decoding of other architectures with this synthetic ISA special
* case.
*/
if (dr_get_isa_mode(dcontext) == DR_ISA_REGDEPS)
return decode_isa_regdeps(dcontext, pc, instr);

const instr_info_t *info = &invalid_instr;
decode_info_t di;
byte *next_pc;
2 changes: 1 addition & 1 deletion core/ir/arm/instr.c
@@ -43,7 +43,7 @@
 bool
 instr_set_isa_mode(instr_t *instr, dr_isa_mode_t mode)
 {
-    if (mode != DR_ISA_ARM_THUMB && mode != DR_ISA_ARM_A32) {
+    if (mode != DR_ISA_ARM_THUMB && mode != DR_ISA_ARM_A32 && mode != DR_ISA_REGDEPS) {
         return false;
     }
     instr->isa_mode = mode;
46 changes: 46 additions & 0 deletions core/ir/encode_api.h
@@ -51,6 +51,52 @@ typedef enum _dr_isa_mode_t {
DR_ISA_ARM_THUMB, /**< Thumb (ARM T32). */
DR_ISA_ARM_A64, /**< ARM A64 (AArch64). */
DR_ISA_RV64IMAFDC, /**< RISC-V (rv64imafdc). */
/**
* This is a synthetic ISA that has the purpose of
* preserving register dependencies and giving hints on the
* type of operations each instruction is performing.
*
* For this reason some operations that work on instructions
* coming from an actual ISA (e.g., #DR_ISA_AMD64) are not
* supported.
*
* Currently we support:
* - instr_convert_to_isa_regdeps(), which converts an
* #instr_t of an actual ISA to a #DR_ISA_REGDEPS
* instruction.
* - instr_encode() and instr_encode_to_copy(), to encode a
* #DR_ISA_REGDEPS #instr_t into a sequence of contiguous
* bytes.
* - decode() and decode_from_copy(), to decode an encoded
* #DR_ISA_REGDEPS instruction into an #instr_t.
*
* A #DR_ISA_REGDEPS #instr_t has the following information:
* - categories (composed of #dr_instr_category_t values),
* to indicate the type of operation performed (e.g., load,
* store, floating point math operation, branch, etc.).
* Note that more than one category can be set.
* - arithmetic flags, with no distinction between different
* flags, we only report if at least one arithmetic flag
* was read and/or written.
* - number of source and destination register operands.
* - source operation size, which is the largest source
* operand the instruction operates on.
* - list of register operand identifiers (contained in
* #opnd_t), separated in source and destination.
* Note that these #reg_id_t identifiers are virtual and
* they should not be assumed to be equal to any DR_REG_
* enum values of any specific architecture, they are
* meant for tracking dependencies with respect to other
* #DR_ISA_REGDEPS instructions only.
* - ISA mode, which is #DR_ISA_REGDEPS.
*
* Querying additional #instr_t related information outside
* of those described above (e.g., the instruction opcode)
* will return the corresponding zeroed value set by
* instr_create() or instr_init().
*/
DR_ISA_REGDEPS,

} dr_isa_mode_t;

DR_API
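
The `DR_ISA_REGDEPS` comment above says the virtual register identifiers are meant only for tracking dependencies between `DR_ISA_REGDEPS` instructions. The sketch below illustrates what such a consumer might do. It is a self-contained model, not DynamoRIO code: the `demo_` types, the fixed capacity of 8 operands, and the identifier values are all assumptions, and a real client would read the operands from an `#instr_t` instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_REG_OPNDS 8 /* assumed capacity, for illustration only */

/* Hypothetical stand-in for a virtual register identifier. */
typedef unsigned short demo_reg_id_t;

/* Hypothetical, simplified view of a DR_ISA_REGDEPS instruction: only the
 * source and destination register identifiers are modeled.
 */
typedef struct {
    demo_reg_id_t srcs[DEMO_MAX_REG_OPNDS];
    size_t num_srcs;
    demo_reg_id_t dsts[DEMO_MAX_REG_OPNDS];
    size_t num_dsts;
} demo_regdeps_instr_t;

/* Returns true if `consumer` reads a register that `producer` writes, i.e.
 * a read-after-write dependency. Identifiers are only compared with each
 * other and never interpreted as registers of a real architecture.
 */
bool
demo_has_raw_dependency(const demo_regdeps_instr_t *producer,
                        const demo_regdeps_instr_t *consumer)
{
    assert(producer != NULL && consumer != NULL);
    for (size_t i = 0; i < producer->num_dsts; i++) {
        for (size_t j = 0; j < consumer->num_srcs; j++) {
            if (producer->dsts[i] == consumer->srcs[j])
                return true;
        }
    }
    return false;
}
```

The comparison never maps an identifier back to a `DR_REG_` value, which matches the constraint stated in the comment: equality between identifiers is the only meaningful operation.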
29 changes: 24 additions & 5 deletions core/ir/encode_shared.c
@@ -38,6 +38,7 @@
/* encode_shared.c -- cross-platform encodingn routines */

#include "../globals.h"
#include "isa_regdeps/encode.h"
#include "arch.h"
#include "instr.h"
#include "decode.h"
@@ -111,28 +112,46 @@ get_encoding_info(instr_t *instr)
     return info;
 }

+static byte *
+instr_encode_common(dcontext_t *dcontext, instr_t *instr, byte *copy_pc, byte *final_pc,
+                    bool check_reachable,
+                    bool *has_instr_opnds /*OUT OPTIONAL*/
+                    _IF_DEBUG(bool assert_reachable))
+{
+    /* #DR_ISA_REGDEPS synthetic ISA has its own encoder.
+     * XXX i#1684: when DR can be built with full dynamic architecture selection we won't
+     * need to pollute the encoding of other architectures with this synthetic ISA special
+     * case.
+     */
+    if (instr_get_isa_mode(instr) == DR_ISA_REGDEPS)
+        return encode_isa_regdeps(dcontext, instr, copy_pc);
+
+    return instr_encode_arch(dcontext, instr, copy_pc, final_pc, check_reachable,
+                             has_instr_opnds _IF_DEBUG(assert_reachable));
+}
+
 /* completely ignores reachability and predication failures */
 byte *
 instr_encode_ignore_reachability(dcontext_t *dcontext, instr_t *instr, byte *pc)
 {
-    return instr_encode_arch(dcontext, instr, pc, pc, false, NULL _IF_DEBUG(false));
+    return instr_encode_common(dcontext, instr, pc, pc, false, NULL _IF_DEBUG(false));
 }

 /* just like instr_encode but doesn't assert on reachability or predication failures */
 byte *
 instr_encode_check_reachability(dcontext_t *dcontext, instr_t *instr, byte *pc,
                                 bool *has_instr_opnds /*OUT OPTIONAL*/)
 {
-    return instr_encode_arch(dcontext, instr, pc, pc, true,
-                             has_instr_opnds _IF_DEBUG(false));
+    return instr_encode_common(dcontext, instr, pc, pc, true,
+                               has_instr_opnds _IF_DEBUG(false));
 }

 byte *
 instr_encode_to_copy(void *drcontext, instr_t *instr, byte *copy_pc, byte *final_pc)
 {
     dcontext_t *dcontext = (dcontext_t *)drcontext;
-    return instr_encode_arch(dcontext, instr, copy_pc, final_pc, true,
-                             NULL _IF_DEBUG(true));
+    return instr_encode_common(dcontext, instr, copy_pc, final_pc, true,
+                               NULL _IF_DEBUG(true));
 }

 byte *
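
`instr_encode_common()` is a small interposition layer: every public encode entry point now goes through it, and it returns early into the synthetic encoder when the instruction's own ISA mode is `DR_ISA_REGDEPS`. The sketch below models that control flow in isolation. All `demo_` names, the marker bytes, and the encoded lengths are hypothetical; the real routines take a `dcontext_t` and further arguments that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical stand-ins for dr_isa_mode_t and instr_t. */
typedef enum { DEMO_ISA_NATIVE, DEMO_ISA_REGDEPS } demo_isa_mode_t;

typedef struct {
    demo_isa_mode_t isa_mode;
} demo_instr_t;

/* Stand-in for the per-architecture encoder: writes one marker byte. */
byte *
demo_encode_arch(const demo_instr_t *instr, byte *pc)
{
    (void)instr;
    pc[0] = 0xA0;
    return pc + 1;
}

/* Stand-in for the synthetic-ISA encoder: writes a 4-byte dummy encoding. */
byte *
demo_encode_regdeps(const demo_instr_t *instr, byte *pc)
{
    (void)instr;
    pc[0] = 0xD0;
    pc[1] = pc[2] = pc[3] = 0;
    return pc + 4;
}

/* Common entry point: dispatch on the instruction's ISA mode, then fall
 * through to the architecture encoder. Returns the next pc.
 */
byte *
demo_encode_common(const demo_instr_t *instr, byte *pc)
{
    assert(instr != NULL && pc != NULL);
    if (instr->isa_mode == DEMO_ISA_REGDEPS)
        return demo_encode_regdeps(instr, pc);
    return demo_encode_arch(instr, pc);
}
```

In this diff the encode side keys the dispatch on the instruction (`instr_get_isa_mode(instr)`), whereas the `decode_common()` hunks key it on the context (`dr_get_isa_mode(dcontext)`). The `encode_isa_regdeps()` call also receives only `copy_pc`, so `final_pc` and the reachability arguments are not forwarded to the synthetic encoder.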
28 changes: 26 additions & 2 deletions core/ir/instr_api.h
@@ -298,10 +298,19 @@ struct _instr_t {

     uint opcode;

+    union {
 # ifdef X86
-    /* PR 251479: offset into instr's raw bytes of rip-relative 4-byte displacement */
-    byte rip_rel_pos;
+        /* PR 251479: offset into instr's raw bytes of rip-relative 4-byte displacement.
+         * This field is valid when instr_t isa_mode is DR_ISA_X86.
+         */
+        byte rip_rel_pos;
 # endif
+        /* PR #6691: size of source data (i.e., read) an instruction operates on.
+         * This field is valid when instr_t isa_mode is DR_ISA_REGDEPS.
+         * Note that opnd_size_t is an alias of byte.
+         */
+        opnd_size_t operation_size;
+    };

     /* we dynamically allocate dst and src arrays b/c x86 instrs can have
      * up to 8 of each of them, but most have <=2 dsts and <=3 srcs, and we
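
The new union lets `operation_size` share storage with `rip_rel_pos`, and the comments state that each member is valid only under a particular ISA mode. That makes `isa_mode` act as the tag of the union. The sketch below shows one way such a tagged field can be guarded. It is a self-contained model with hypothetical `demo_` names, it relies on C11 anonymous unions, and the accessor behavior (returning 0 when the tag does not match) is an assumption, not something this diff shows for DynamoRIO's own accessors.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;
typedef byte demo_opnd_size_t; /* mirrors "opnd_size_t is an alias of byte" */

/* Hypothetical stand-in for dr_isa_mode_t. */
typedef enum { DEMO_ISA_X86, DEMO_ISA_REGDEPS } demo_isa_mode_t;

/* Hypothetical, reduced instr_t: the ISA mode tags which member is valid. */
typedef struct {
    demo_isa_mode_t isa_mode;
    union {
        byte rip_rel_pos;                /* valid when isa_mode is DEMO_ISA_X86 */
        demo_opnd_size_t operation_size; /* valid when isa_mode is DEMO_ISA_REGDEPS */
    };
} demo_instr_t;

/* Stores the operation size; only meaningful for the synthetic ISA. */
void
demo_set_operation_size(demo_instr_t *instr, demo_opnd_size_t size)
{
    assert(instr != NULL && instr->isa_mode == DEMO_ISA_REGDEPS);
    instr->operation_size = size;
}

/* Reads the operation size, returning 0 when the tag says the shared byte
 * holds something else.
 */
demo_opnd_size_t
demo_get_operation_size(const demo_instr_t *instr)
{
    assert(instr != NULL);
    return instr->isa_mode == DEMO_ISA_REGDEPS ? instr->operation_size : 0;
}
```

Because both members are one byte wide, sharing them this way is a plausible way to add the field without growing the struct, though the diff itself does not state that motivation.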
@@ -2095,6 +2104,21 @@ DR_API
instr_t *
instr_convert_short_meta_jmp_to_long(void *drcontext, instrlist_t *ilist, instr_t *instr);

DR_API
/**
* Converts a real ISA (e.g., #DR_ISA_AMD64) instruction \p instr_real_isa into a
* #DR_ISA_REGDEPS instruction and stores it into \p instr_regdeps_isa.
* Assumes \p instr_regdeps_isa has been allocated by the caller (e.g., using
* instr_create()).
* Assumes \p instr_real_isa is a fully-decoded or synthesized instruction of a real ISA
* with valid operand information.
* \note \p instr_regdeps_isa will contain only the information of a #DR_ISA_REGDEPS
* synthetic instruction.
*/
void
instr_convert_to_isa_regdeps(void *drcontext, instr_t *instr_real_isa,
instr_t *instr_regdeps_isa);

DR_API
/**
* Given \p eflags, returns whether or not the conditional branch, \p