before_block_exec_invalidate_opt: called before execution of every basic block, with the option to invalidate the TB
Callback ID: PANDA_CB_BEFORE_BLOCK_EXEC_INVALIDATE_OPT
Arguments:
CPUState *env: the current CPU state
TranslationBlock *tb: the TB we are about to execute
Return value:
true if we should invalidate the current translation block and retranslate, false otherwise
Signature:
bool (*before_block_exec_invalidate_opt)(CPUState *env, TranslationBlock *tb);
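A minimal sketch of how the return value might be used follows. It requests retranslation exactly once after some plugin-internal setting changes, so that the same block is not invalidated on every execution. The FakeTB type is a stand-in for TranslationBlock, the CPU state is reduced to an opaque pointer, and request_retranslate and invalidate_cb are hypothetical names, not part of PANDA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for TranslationBlock; only the guest PC is modeled. */
typedef struct { uint64_t pc; } FakeTB;

/* Set when the plugin changes its instrumentation settings. */
static bool retranslate_pending = false;

/* Hypothetical helper: ask for the next block to be retranslated. */
void request_retranslate(void) { retranslate_pending = true; }

/* Shape of a before_block_exec_invalidate_opt callback. It returns
 * true once per request and false on every other call. */
bool invalidate_cb(void *env, FakeTB *tb) {
    (void)env;  /* unused in this sketch */
    (void)tb;   /* unused in this sketch */
    if (retranslate_pending) {
        retranslate_pending = false;
        return true;   /* invalidate and retranslate this TB */
    }
    return false;      /* execute the TB as already translated */
}
```

Clearing the flag before returning true is the important design choice here: a callback that always returns true would force retranslation of every block on every execution.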
before_block_exec: called before execution of every basic block
Callback ID: PANDA_CB_BEFORE_BLOCK_EXEC
Arguments:
CPUState *env: the current CPU state
TranslationBlock *tb: the TB we are about to execute
Return value:
unused
Signature:
int (*before_block_exec)(CPUState *env, TranslationBlock *tb);
after_block_exec: called after execution of every basic block
Callback ID: PANDA_CB_AFTER_BLOCK_EXEC
Arguments:
CPUState *env: the current CPU state
TranslationBlock *tb: the TB we just executed
TranslationBlock *next_tb: the TB we will execute next (may be NULL)
Return value:
unused
Signature:
int (*after_block_exec)(CPUState *env, TranslationBlock *tb, TranslationBlock *next_tb);
before_block_translate: called before translation of each basic block
Callback ID: PANDA_CB_BEFORE_BLOCK_TRANSLATE
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC we are about to translate
Return value:
unused
Signature:
int (*before_block_translate)(CPUState *env, target_ulong pc);
after_block_translate: called after the translation of each basic block
Callback ID: PANDA_CB_AFTER_BLOCK_TRANSLATE
Arguments:
CPUState *env: the current CPU state
TranslationBlock *tb: the TB we just translated
Return value:
unused
Notes:
This is a good place to perform extra passes over the generated code (particularly by manipulating the LLVM code).
FIXME: How would this actually work? By this point the out ASM has already been generated. Modify the IR and then regenerate?
Signature:
int (*after_block_translate)(CPUState *env, TranslationBlock *tb);
insn_translate: called before the translation of each instruction
Callback ID: PANDA_CB_INSN_TRANSLATE
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC we are about to translate
Return value:
true if PANDA should insert instrumentation into the generated code, false otherwise
Notes:
This allows a plugin writer to instrument only a small number of instructions, avoiding the performance hit of instrumenting everything. If you do want to instrument every single instruction, just return true. See the documentation for PANDA_CB_INSN_EXEC for more detail.
Signature:
bool (*insn_translate)(CPUState *env, target_ulong pc);
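As a minimal sketch of selective instrumentation, the predicate below returns true only for PCs inside a single address range. The guest_addr_t typedef stands in for target_ulong, the CPU state is reduced to an opaque pointer, and RANGE_START, RANGE_END, pc_in_range, and translate_filter are hypothetical names chosen for illustration, not part of PANDA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PANDA's target_ulong (assumption: 64-bit guest). */
typedef uint64_t guest_addr_t;

/* Hypothetical region of interest: [RANGE_START, RANGE_END). */
#define RANGE_START 0x00400000ULL
#define RANGE_END   0x00401000ULL

/* Returns true when pc falls inside the region of interest. */
static bool pc_in_range(guest_addr_t pc) {
    return pc >= RANGE_START && pc < RANGE_END;
}

/* Shape of an insn_translate callback. Returning true asks for the
 * instruction at pc to be instrumented; false leaves it alone. */
bool translate_filter(void *env, guest_addr_t pc) {
    (void)env;  /* unused in this sketch */
    return pc_in_range(pc);
}
```

Only instructions for which this returns true would pay the cost of the PANDA_CB_INSN_EXEC helper call.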
insn_exec: called before execution of any instruction identified by the PANDA_CB_INSN_TRANSLATE callback
Callback ID: PANDA_CB_INSN_EXEC
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC we are about to execute
Return value:
unused
Notes:
This instrumentation is implemented by generating a call to a helper function just before the instruction itself is generated. This is fairly expensive, which is why it's only enabled via the PANDA_CB_INSN_TRANSLATE callback.
Signature:
int (*insn_exec)(CPUState *env, target_ulong pc);
guest_hypercall: called when a program inside the guest makes a hypercall to pass information from inside the guest to a plugin
Callback ID: PANDA_CB_GUEST_HYPERCALL
Arguments:
CPUState *env: the current CPU state
Return value:
unused
Notes:
On x86, this is called whenever CPUID is executed. Plugins then check for magic values in the registers to determine whether it really is a guest hypercall. Parameters can be passed in other registers. We have modified translate.c to make CPUID instructions end translation blocks. This is useful if, for example, you want a hypercall that turns on LLVM and enables heavyweight instrumentation at a specific point in execution.
S2E accomplishes this by using a (currently) undefined opcode. We have instead opted to use an existing instruction to make development easier (we can use inline asm rather than defining the raw bytes).
AMD's SVM and Intel's VT define hypercalls, but they are privileged instructions, meaning the guest must be in ring 0 to execute them.
For hypercalls in ARM, we use the MCR instruction (move to coprocessor from ARM register), moving to coprocessor 7. CP 7 is reserved by ARM, and isn't implemented in QEMU. The MCR instruction is present in all versions of ARM, and it is an unprivileged instruction in this scenario. Plugins can also check for magic values in registers on ARM.
Signature:
int (*guest_hypercall)(CPUState *env);
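The magic-value check described in the notes can be sketched as follows. This is an illustration under stated assumptions: FakeCPU stands in for the x86 CPUState and models only the general-purpose registers, HYPERCALL_MAGIC is a hypothetical value that the guest program and plugin would agree on, and passing the parameter in EBX is one possible convention rather than something PANDA prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the x86 CPU state; register index names are local
 * to this sketch. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3, NUM_REGS = 8 };
typedef struct { uint32_t regs[NUM_REGS]; } FakeCPU;

/* Hypothetical magic value shared by guest program and plugin. */
#define HYPERCALL_MAGIC 0xdeadbeefU

/* Last parameter received through a hypercall (for illustration). */
static uint32_t last_param = 0;

/* True when the CPUID execution is addressed to this plugin. */
static bool is_hypercall(const FakeCPU *env) {
    return env->regs[REG_EAX] == HYPERCALL_MAGIC;
}

/* Shape of a guest_hypercall callback: ordinary CPUID executions
 * are ignored, hypercalls deliver their parameter in EBX. */
int hypercall_cb(FakeCPU *env) {
    if (is_hypercall(env)) {
        last_param = env->regs[REG_EBX];
    }
    return 0;  /* the return value is unused */
}
```

Because every plugin sees every CPUID execution, the early check keeps unrelated CPUID calls from being misread as hypercalls.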
monitor: called when someone uses the plugin_cmd monitor command
Callback ID: PANDA_CB_MONITOR
Arguments:
Monitor *mon: a pointer to the Monitor
const char *cmd: the command string passed to plugin_cmd
Return value:
unused
Notes:
The command is passed as a single string. No parsing is performed on the string before it is passed to the plugin, so each plugin must parse the string as it deems appropriate (e.g. by using strtok and getopt) to do more complex option processing.
It is recommended that each plugin implementing this callback respond to the "help" message by listing the commands supported by the plugin.
Note that every loaded plugin will have the opportunity to respond to each plugin_cmd; thus it is a good idea to ensure that your plugin's monitor commands are uniquely named, e.g. by using the plugin name as a prefix (sample_do_foo rather than do_foo).
Signature:
int (*monitor)(Monitor *mon, const char *cmd);
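The parsing advice above can be sketched with strtok. The handler below answers "help", handles one prefixed command, and ignores everything else so that other plugins can respond. The handle_cmd function, the output-buffer convention, and the do_foo command are hypothetical; a real callback would have the monitor signature shown above and would report through the Monitor rather than a buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin prefix, following the naming advice above. */
#define PREFIX "sample_"

/* Parses one plugin_cmd string. Returns 1 if the command belonged to
 * this plugin and 0 if it should be ignored. The reply goes to out. */
int handle_cmd(const char *cmd, char *out, size_t outlen) {
    char copy[256];
    /* strtok modifies its input, so work on a bounded local copy. */
    strncpy(copy, cmd, sizeof(copy) - 1);
    copy[sizeof(copy) - 1] = '\0';

    char *name = strtok(copy, " \t");
    if (name == NULL) {
        return 0;  /* empty command */
    }
    if (strcmp(name, "help") == 0) {
        snprintf(out, outlen, "sample commands: %sdo_foo <arg>", PREFIX);
        return 1;
    }
    if (strcmp(name, PREFIX "do_foo") == 0) {
        char *arg = strtok(NULL, " \t");
        snprintf(out, outlen, "do_foo(%s)", arg ? arg : "");
        return 1;
    }
    return 0;  /* some other plugin's command */
}
```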
virt_mem_read: called after memory is read
Callback ID: PANDA_CB_VIRT_MEM_READ
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC doing the read
target_ulong addr: the (virtual) address being read
target_ulong size: the size of the read
void *buf: pointer to the data that was read
Return value:
unused
Notes:
You must call panda_enable_memcb() to turn on memory callbacks before this callback will take effect.
Signature:
int (*virt_mem_read)(CPUState *env, target_ulong pc, target_ulong addr, target_ulong size, void *buf);
virt_mem_write: called before memory is written
Callback ID: PANDA_CB_VIRT_MEM_WRITE
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC doing the write
target_ulong addr: the (virtual) address being written
target_ulong size: the size of the write
void *buf: pointer to the data that is to be written
Return value:
unused
Notes:
You must call panda_enable_memcb() to turn on memory callbacks before this callback will take effect.
Signature:
int (*virt_mem_write)(CPUState *env, target_ulong pc, target_ulong addr, target_ulong size, void *buf);
phys_mem_read: called after memory is read
Callback ID: PANDA_CB_PHYS_MEM_READ
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC doing the read
target_ulong addr: the (physical) address being read
target_ulong size: the size of the read
void *buf: pointer to the data that was read
Return value:
unused
Notes:
You must call panda_enable_memcb() to turn on memory callbacks before this callback will take effect.
Signature:
int (*phys_mem_read)(CPUState *env, target_ulong pc, target_ulong addr, target_ulong size, void *buf);
phys_mem_write: called before memory is written
Callback ID: PANDA_CB_PHYS_MEM_WRITE
Arguments:
CPUState *env: the current CPU state
target_ulong pc: the guest PC doing the write
target_ulong addr: the (physical) address being written
target_ulong size: the size of the write
void *buf: pointer to the data that is to be written
Return value:
unused
Notes:
You must call panda_enable_memcb() to turn on memory callbacks before this callback will take effect.
Signature:
int (*phys_mem_write)(CPUState *env, target_ulong pc, target_ulong addr, target_ulong size, void *buf);
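All four memory callbacks receive the same pc, addr, size, and buf arguments, so one helper can serve them all. The sketch below formats a single access as text. The guest_addr_t typedef stands in for target_ulong, and format_access is a hypothetical helper; a real plugin would call something like it from inside a callback registered as described above, after calling panda_enable_memcb().

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PANDA's target_ulong (assumption: 64-bit guest). */
typedef uint64_t guest_addr_t;

/* Formats one memory access as "pc=.. addr=.. size=..: bytes" into out.
 * Returns the length the full text would need, excluding the NUL. */
int format_access(char *out, size_t outlen, guest_addr_t pc,
                  guest_addr_t addr, guest_addr_t size, const void *buf) {
    const unsigned char *bytes = buf;
    int n = snprintf(out, outlen, "pc=%llx addr=%llx size=%llu:",
                     (unsigned long long)pc, (unsigned long long)addr,
                     (unsigned long long)size);
    /* Append each byte as two hex digits; stop once out is full. */
    for (guest_addr_t i = 0; i < size && n >= 0 && (size_t)n < outlen; i++) {
        n += snprintf(out + n, outlen - (size_t)n, " %02x", bytes[i]);
    }
    return n;
}
```

Note that buf holds the data that was read for the read callbacks and the data about to be written for the write callbacks, so the same formatting applies in both directions.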
cb_cpu_restore_state: Called inside of cpu_restore_state(), when there is a CPU fault/exception
Callback ID: PANDA_CB_CPU_RESTORE_STATE
Arguments:
CPUState *env: the current CPU state
TranslationBlock *tb: the current translation block
Return value: unused
Signature:
int (*cb_cpu_restore_state)(CPUState *env, TranslationBlock *tb);
user_before_syscall: Called before a syscall for QEMU user mode.
Callback ID: PANDA_CB_USER_BEFORE_SYSCALL
Arguments:
void *cpu_env: pointer to CPUState
bitmask_transtbl *fcntl_flags_tbl: syscall flags table from syscall.c
int num: syscall number
abi_long arg1..arg8: syscall arguments
Return value: unused
Notes: Some system call arguments need additional processing, as is evident in linux-user/syscall.c. If your plugin is particularly interested in system call arguments, be sure to process them in similar ways.
Additionally, this callback depends on running QEMU in linux-user mode, a mode for which PANDA support is being phased out. To use this callback you will need to wrap the code in #ifdefs. See the 'taint' or 'llvm_trace' PANDA plugins for examples of legacy usage. This callback will likely be removed in future versions of PANDA.
Signature:
int (*user_before_syscall)(void *cpu_env, bitmask_transtbl *fcntl_flags_tbl,
                           int num, abi_long arg1, abi_long arg2,
                           abi_long arg3, abi_long arg4, abi_long arg5,
                           abi_long arg6, abi_long arg7, abi_long arg8);
user_after_syscall: Called after a syscall for QEMU user mode
Callback ID: PANDA_CB_USER_AFTER_SYSCALL
Arguments:
void *cpu_env: pointer to CPUState
bitmask_transtbl *fcntl_flags_tbl: syscall flags table from syscall.c
int num: syscall number
abi_long arg1..arg8: syscall arguments
void *p: void pointer used for processing of some arguments
abi_long ret: return value of syscall
Return value: unused
Notes: Some system call arguments need additional processing, as is evident in linux-user/syscall.c. If your plugin is particularly interested in system call arguments, be sure to process them in similar ways.
Additionally, this callback depends on running QEMU in linux-user mode, a mode for which PANDA support is being phased out. To use this callback you will need to wrap the code in #ifdefs. See the 'taint' or 'llvm_trace' PANDA plugins for examples of legacy usage. This callback will likely be removed in future versions of PANDA.
Signature:
int (*user_after_syscall)(void *cpu_env, bitmask_transtbl *fcntl_flags_tbl,
                          int num, abi_long arg1, abi_long arg2,
                          abi_long arg3, abi_long arg4, abi_long arg5,
                          abi_long arg6, abi_long arg7, abi_long arg8,
                          void *p, abi_long ret);
replay_hd_transfer: Called during a replay of a hard drive transfer action
Callback ID: PANDA_CB_REPLAY_HD_TRANSFER
Arguments:
CPUState *env: pointer to CPUState
uint32_t type: type of transfer (Hd_transfer_type)
uint64_t src_addr: address for src
uint64_t dest_addr: address for dest
uint32_t num_bytes: size of transfer in bytes
Return value: unused
Notes: Called in replay only, for some kind of data transfer involving the hard drive. NB: this callback is neither before nor after the transfer. In replay the transfer doesn't really happen; the callback fires at the point at which it happened during recording. Even though the transfer doesn't happen in replay, useful instrumentation (such as taint analysis) can still be applied accurately.
Signature:
int (*replay_hd_transfer)(CPUState *env, uint32_t type, uint64_t src_addr,
                          uint64_t dest_addr, uint32_t num_bytes);
replay_before_cpu_physical_mem_rw_ram: In replay only, we are about to DMA from some QEMU buffer to guest memory
Callback ID: PANDA_CB_REPLAY_BEFORE_CPU_PHYSICAL_MEM_RW_RAM
Arguments:
CPUState *env: pointer to CPUState
uint32_t is_write: type of transfer going on (is_write == 1 means IO -> RAM, else RAM -> IO)
uint64_t src_addr: src of DMA
uint64_t dest_addr: dest of DMA
uint32_t num_bytes: size of transfer
Return value: unused
Notes: In the current version of QEMU, this appears to be a less commonly used method of performing DMA with the hard drive device. For the hard drive, the most common DMA mechanism can be seen in the PANDA_CB_REPLAY_HD_TRANSFER callback under type HD_TRANSFER_HD_TO_RAM (and vice versa). Other devices still appear to use cpu_physical_memory_rw(), though.
Signature:
int (*replay_before_cpu_physical_mem_rw_ram)(
    CPUState *env, uint32_t is_write, uint64_t src_addr, uint64_t dest_addr,
    uint32_t num_bytes);
replay_handle_packet: TODO: This will be used for network packet replay.
Callback ID: PANDA_CB_REPLAY_HANDLE_PACKET
Arguments:
CPUState *env: pointer to CPUState
uint8_t *buf: buffer containing packet data
int size: num bytes in buffer
uint8_t direction: XXX read or write. not sure which is which.
uint64_t old_buf_addr: XXX this is a mystery
Signature:
int (*replay_handle_packet)(CPUState *env, uint8_t *buf, int size,
                            uint8_t direction, uint64_t old_buf_addr);
To make the information in the preceding sections concrete, we will now show how to implement a low-overhead x86 system call monitor as a PANDA plugin. To do so, we will use the PANDA_CB_INSN_TRANSLATE and PANDA_CB_INSN_EXEC callbacks to create instrumentation that will execute only when the sysenter instruction is executed on x86.
First, we will create a Makefile for our plugin, and place it in panda/qemu/panda_plugins/syscalls:
# Don't forget to add your plugin to config.panda!

# Set your plugin name here. It does not have to correspond to the name
# of the directory in which your plugin resides.
PLUGIN_NAME=syscalls

# Include the PANDA Makefile rules
include ../panda.mak

# If you need custom CFLAGS or LIBS, set them up here
# CFLAGS+=
# LIBS+=

# The main rule for your plugin. Please stick with the panda_ naming
# convention.
panda_$(PLUGIN_NAME).so: $(PLUGIN_TARGET_DIR)/$(PLUGIN_NAME).o
	$(call quiet-command,$(CC) $(QEMU_CFLAGS) -shared -o $(SRC_PATH)/$(TARGET_DIR)/$@ $^ $(LIBS)," PLUGIN $@")

all: panda_$(PLUGIN_NAME).so
Next, we'll create the main code for the plugin, and put it in panda/qemu/panda_plugins/syscalls.c:
#include "config.h"
#include "qemu-common.h"
#include "cpu.h"
#include "panda_plugin.h"

#include <stdio.h>
#include <stdlib.h>

bool translate_callback(CPUState *env, target_ulong pc);
int exec_callback(CPUState *env, target_ulong pc);
bool init_plugin(void *);
void uninit_plugin(void *);

// This is where we'll write out the syscall data
FILE *plugin_log;

// Check if the instruction is sysenter (0F 34)
bool translate_callback(CPUState *env, target_ulong pc) {
    unsigned char buf[2];
    cpu_memory_rw_debug(env, pc, buf, 2, 0);
    if (buf[0] == 0x0F && buf[1] == 0x34)
        return true;
    else
        return false;
}

// This will only be called for instructions where the
// translate_callback returned true
int exec_callback(CPUState *env, target_ulong pc) {
#ifdef TARGET_I386
    // On Windows and Linux, the system call id is in EAX
    fprintf(plugin_log,
            "PC=" TARGET_FMT_lx ", SYSCALL=" TARGET_FMT_lx "\n",
            pc, env->regs[R_EAX]);
#endif
    return 0;
}

bool init_plugin(void *self) {
    // Don't bother if we're not on x86
#ifdef TARGET_I386
    panda_cb pcb;

    pcb.insn_translate = translate_callback;
    panda_register_callback(self, PANDA_CB_INSN_TRANSLATE, pcb);
    pcb.insn_exec = exec_callback;
    panda_register_callback(self, PANDA_CB_INSN_EXEC, pcb);
#endif

    plugin_log = fopen("syscalls.txt", "w");
    if (!plugin_log) return false;
    else return true;
}

void uninit_plugin(void *self) {
    fclose(plugin_log);
}
The init_plugin function registers the callbacks for instruction translation and execution. Because we are only implementing an x86 system call monitor, we wrap the callback registration in an #ifdef TARGET_I386; this means that on other architectures the plugin won't do anything (since no callbacks will be registered). init_plugin also opens a text file that the plugin will use to log the system calls executed by the guest; if opening the file fails, init_plugin returns false, which will cause PANDA to unload the plugin immediately.
The translate_callback function uses cpu_memory_rw_debug to read the bytes that make up the instruction that QEMU is about to translate, and checks whether it is a sysenter instruction. If so, it returns true, which tells PANDA to insert instrumentation that will cause the exec_callback function to be called when the instruction is executed by the guest.
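The byte comparison at the heart of translate_callback can be isolated as a pure function, which makes it easy to check on its own. The sketch below is an illustration rather than part of the plugin: classify_insn and the syscall_insn enum are hypothetical names, and the syscall (0F 05) and int 0x80 (CD 80) encodings are additions that the tutorial plugin does not check for.

```c
#include <stddef.h>

/* Kinds of x86 system call entry instructions recognized here. */
typedef enum { SC_NONE, SC_SYSENTER, SC_SYSCALL, SC_INT80 } syscall_insn;

/* Classifies the first two bytes read from the current guest PC. */
syscall_insn classify_insn(const unsigned char *buf, size_t len) {
    if (len < 2) {
        return SC_NONE;  /* not enough bytes to decide */
    }
    if (buf[0] == 0x0F && buf[1] == 0x34) {
        return SC_SYSENTER;  /* sysenter, as checked by the plugin */
    }
    if (buf[0] == 0x0F && buf[1] == 0x05) {
        return SC_SYSCALL;   /* syscall */
    }
    if (buf[0] == 0xCD && buf[1] == 0x80) {
        return SC_INT80;     /* int 0x80 */
    }
    return SC_NONE;
}
```

A translate callback built on this would return true whenever the result is not SC_NONE, after filling buf with cpu_memory_rw_debug as the plugin does.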
Inside exec_callback, we simply log the current program counter (EIP) and the contents of the EAX register, which is used on both Windows and Linux to hold the system call number.
Finally, in uninit_plugin, we simply close the plugin log file.
To make the plugin, we add it to the list of plugins in panda/qemu/panda_plugins/config.panda:
PANDA_PLUGINS = sample taintcap textfinder textprinter syscalls
Then run make from the base QEMU directory:
brendan@laredo3:~/hg/panda/qemu$ make
CC /home/brendan/hg/panda/qemu//x86_64-softmmu//panda_plugins/syscalls.o
PLUGIN panda_syscalls.so
CC /home/brendan/hg/panda/qemu//i386-linux-user//panda_plugins/syscalls.o
PLUGIN panda_syscalls.so
CC /home/brendan/hg/panda/qemu//arm-linux-user//panda_plugins/syscalls.o
PLUGIN panda_syscalls.so
CC /home/brendan/hg/panda/qemu//arm-softmmu//panda_plugins/syscalls.o
PLUGIN panda_syscalls.so
Finally, you can run QEMU with the plugin enabled:
x86_64-softmmu/qemu-system-x86_64 -m 1024 -vnc :0 -monitor stdio \
-hda /scratch/qcows/qcows/win7.1.qcow2 -loadvm booted -k en-us \
-panda syscalls
When run on a Windows 7 VM, this plugin produces output in syscalls.txt that looks like:
PC=0000000077bd70b2, SYSCALL=0000000000000153
PC=0000000077bd70b2, SYSCALL=0000000000000188
PC=0000000077bd70b2, SYSCALL=00000000000011fa
PC=0000000077bd70b2, SYSCALL=00000000000011c7
PC=0000000077bd70b2, SYSCALL=00000000000011c7
PC=0000000077bd70b2, SYSCALL=0000000000001232
PC=0000000077bd70b2, SYSCALL=0000000000001232
PC=0000000077bd70b2, SYSCALL=000000000000114d
PC=0000000077bd70b2, SYSCALL=0000000000001275
The raw system call numbers could also be translated into their names, e.g. by using Volatility's list of Windows 7 system calls.
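As a first step toward such a translation, each line of syscalls.txt has to be parsed back into numbers. A minimal sketch, assuming the log format shown above, follows; parse_log_line is a hypothetical helper, and the table mapping numbers to names is deliberately left out because it depends on the guest OS version.

```c
#include <stdio.h>

/* Parses one line of the form "PC=<hex>, SYSCALL=<hex>".
 * Returns 1 and fills pc and nr on success, 0 otherwise. */
int parse_log_line(const char *line, unsigned long long *pc,
                   unsigned long long *nr) {
    /* %llx accepts the zero-padded hex values the plugin writes. */
    return sscanf(line, "PC=%llx, SYSCALL=%llx", pc, nr) == 2;
}
```

The parsed system call number can then be used as a key into whatever name table is appropriate for the guest.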