diff --git a/README.md b/README.md
index c52e514..13bb21d 100644
--- a/README.md
+++ b/README.md
@@ -5,19 +5,20 @@
[![Total alerts](https://img.shields.io/lgtm/alerts/g/Kazhuu/asm2cfg.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/Kazhuu/asm2cfg/alerts/)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/Kazhuu/asm2cfg.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/Kazhuu/asm2cfg/context:python)
-Python command-line tool and GDB extension to view and save x86, ARM and objdump
-assembly files as control-flow graph (CFG) pdf files. From GDB debugging session
-use `viewcfg` command to view CFG and use `savecfg` command to save it to the
-pdf file.
+Asm2cfg is a Python command-line tool and GDB extension to view and save x86 and
+ARM assembly from GDB, objdump or CSV files as control-flow graph (CFG) pdf
+files.
+From a GDB debugging session, use the `viewcfg` command to view the CFG and use
+the `savecfg` command to save it to a pdf file.
-Program has been developed to support X86, ARM and objdump assembly outputs.
-Program is mostly tested with x86 assembly. ARM and objdump formats might not be
-fully supported. If you have any suggestions or find bugs, please open an issue
-or create a pull request. If you want to contribute, check
+Asm2cfg supports x86 and ARM assembly produced by GDB or objdump, as well as
+CSV files. The program is mostly tested with x86 assembly; ARM, objdump and CSV
+inputs might not be fully supported. If you have any suggestions or find bugs,
+please open an issue or create a pull request. If you want to contribute, check
[Development](#development) how to get started.
## Table of Content
@@ -26,10 +27,10 @@ or create a pull request. If you want to contribute, check
* [Install](#install)
* [Usage From GDB](#usage-from-gdb)
-* [Usage as Standalone](#usage-as-standalone)
- * [Knowing Function Name](#knowing-function-name)
+* [Standalone Usage](#standalone-usage)
+ * [Get Function Names](#get-function-names)
* [Disassemble Function](#disassemble-function)
- * [Draw CFG](#draw-cfg)
+ * [Draw CFGs](#draw-cfgs)
* [Examples](#examples)
* [Development](#development)
* [Python Environment](#python-environment)
@@ -43,102 +44,107 @@ or create a pull request. If you want to contribute, check
## Install
-Project can be installed with pip
+The project can be installed using pip:
```
pip install asm2cfg
```
-To be able to view the dot files from GDB. External dot viewer is required. For
-this purpose [xdot](https://pypi.org/project/xdot/) can be used for example. Any
-other dot viewer will also do. To install this on Debian based distro run
+To view the dot files from GDB, an external dot viewer is required. For this
+purpose you can use e.g. [xdot](https://pypi.org/project/xdot/), but any
+other dot viewer will also do.
+To install xdot on Debian-based distros, run:
```
sudo apt install xdot
```
-Or Arch based
+On Arch-based systems, run:
```
sudo pacman -S xdot
```
-To add extension to GDB you need to source the pip installed plugin to it. To
-find where pip placed GDB extension run `which gdb_asm2cfg` or in case if you
+To add the extension to GDB, you need to source the pip-installed plugin. To
+find where pip placed the GDB extension, run `which gdb_asm2cfg`, or, if you
use pyenv use `pyenv which gdb_asm2cfg`. Copy the path to the clipboard.
-Then in you home directory if not already add `.gdbinit` file
-and place following line in it and replace path from the earlier step.
+Then, if it does not exist yet, create a `.gdbinit` file in your home directory
+and add the following line to it, replacing the path with the result of the
+previous step.
```
source
```
-For example in my Linux machine line end up to be
+For example, on my Linux machine the line is:
```
source ~/.local/bin/gdb_asm2cfg.py
```
-Now when you start GDB no errors should be displayed and you are ready to go.
+Now when you start GDB, there should be no errors and you are ready to go.
## Usage From GDB
-In GDB session this extension provides command `viewcfg` to view CFG with
-external dot viewer. Command `savecfg` saves the CFG to pdf file to current
-working directory with same name as the function being dumped. Both commands
-disassemble the current execution frame/function when the command is issued. To
-see help for these commands use `help` command like `help viewcfg`.
+In a GDB session, this extension provides the `viewcfg` command to view the CFG
+with an external dot viewer. The `savecfg` command saves the CFG to a pdf file
+in the current working directory, named after the function being dumped. Both
+commands disassemble the current execution frame/function when the command is
+issued. To see the help for these commands, use the `help` command, e.g.
+`help viewcfg`.
-For example let's view main function from you favorite non-stripped executable.
-First run GDB until main function
+For example, let's view the `main` function of your favorite non-stripped
+executable. First, run GDB until the `main` function:
```
gdb -ex 'b main' -ex 'run'
```
-Now run `viewcfg` to view CFG as a dot graph with external editor. Or run `savecfg`
-to save CFG to pdf file named `main.pdf` to current working directory. If
-function is stripped then memory address of the function will used as a name
-instead. For example `0x555555555faf-0x555555557008.pdf`.
+Now run `viewcfg` to view the CFG as a dot graph with an external viewer.
+Or run `savecfg` to save the CFG as a pdf file named `main.pdf` in the current
+working directory. If the function is stripped, the memory range of the
+function is used as the name instead, for example
+`0x555555555faf-0x555555557008.pdf`.
-If assembly function is very large with a lot of jumps and calls to other
+If the assembly function is very large with a lot of jumps and calls to other
functions. Then rendering the CFG can take a long time. So be patient or cancel
-rendering with Ctrl-C. To make the rendering faster you can skip function calls
-instructions from splitting the code to more blocks. To set this run `set
-skipcalls on` and then run earlier command again. Note that if function is long
-and has a lot of jumps inside itself, then rendering is still gonna take a long
-time. To have normal behavior again run `set skipcalls off`.
+rendering with Ctrl-C. To make the rendering faster, you can prevent function
+call instructions from splitting the code into more blocks. To enable this
+option, run `set skipcalls on` and then run the previous command again. Note
+that if the function is long and has a lot of jumps inside itself, rendering
+will still take a long time. To restore the normal behavior, run
+`set skipcalls off`.
-## Usage as Standalone
+## Standalone Usage
-This method can be used with assembly files saved from ouput of objdump and GDB
-disassembly. Pip installation will come with `asm2cfg` command-line tool for
-this purpose.
+Asm2cfg can also be used on disassembly files saved from the output of
+objdump and GDB. The pip installation comes with the `asm2cfg` command-line
+tool for this purpose.
To use as standalone script you first need to dump assembly from GDB or objdump
to the file which is explained below.
-### Knowing Function Name
+### Get Function Names
-If you don't know the name of function you're looking for then you can also list
-all function names using GDB:
+If you don't know the name of the function you're looking for then you can also
+list all function names using GDB:
```
gdb -batch -ex 'b main' -ex r -ex 'info functions' ./test_executable
```
-This will set breakpoint at function `main`, then
+This will set a breakpoint at the function `main`, then
run the program and print symbols from all loaded libraries.
-For functions which come from main executable you can avoid running the program
-and simply do
+For functions which come from the main executable you can avoid running the
+program and simply do:
```
gdb -batch -ex 'info functions' ./test_executable
```
-If you want to narrow the search down you can also use regexp
+If you want to narrow the search down, you can also use a regular expression:
```
gdb ... -ex 'info functions ' ...
@@ -146,22 +152,22 @@ gdb ... -ex 'info functions ' ...
### Disassemble Function
-Once you have the function name, you can produce its disassembly via
+Once you have the function name, you can produce its disassembly via:
```
gdb -batch -ex 'b main' -ex r -ex 'pipe disassemble test_function | tee test_function.asm' ./test_executable
```
-or
+Or:
```
gdb -batch -ex 'set breakpoints pending on' -ex 'b test_function' -ex r -ex 'pipe disassemble | tee test_function.asm' ./test_executable
```
-(the `set breakpoint pending on` command enables pending breakpoints and
-could be added to your `.gdbinit` instead)
+(The `set breakpoint pending on` command enables pending breakpoints and
+could be added to your `.gdbinit` instead.)
-For functions from main executable it's enough to do
+For functions from the main executable it's enough to do:
```
gdb -batch -ex 'pipe disassemble test_function | tee test_function.asm' ./test_executable
@@ -173,57 +179,56 @@ You can also extract function's disassembly from `objdump` output:
objdump -d ./test_executable | sed -ne '/ test_executable.asm
```
-(this may be useful for specific non-native targets which lack GDB support).
+(This may be useful for specific non-native targets which lack GDB support.)
-### Draw CFG
+### Draw CFGs
-Now you have the assembly file. Time to turn that to CFG pdf file. Do that by giving it
-to `asm2cfg` command-line tool like so
+Now, with the assembly file, it is time to use asm2cfg to generate a pdf file
+containing the CFG. Just give the assembly file to the `asm2cfg`
+command-line tool:
```
asm2cfg test_function.asm
```
-Asm2cfg by default expects x86 assembly files. If you want to use ARM assembly files,
-then provide `--target arm` command-line flag.
+Asm2cfg by default expects x86 assembly files. If you want to use ARM assembly
+files, then provide the `--target arm` command-line flag.
-Above command should output `test_function.pdf` file in the same directory where
-the executable was ran. If the assembly file is stripped then the function
-memory range is used as a name instead. For example
-`0x555555555faf-0x555555557008.pdf`.
+The above command writes `test_function.pdf` to the current working
+directory. If the assembly file is stripped, the function memory range is
+used as the name instead, for example `0x555555555faf-0x555555557008.pdf`.
-To view CFG instead of saving provide `-v` flag. And to skip function calls from
-splitting the code to further blocks provide `-c` flag. To show the help use
-`-h`.
+To view the CFG instead of saving it, provide the `-v` flag. To prevent
+function calls from splitting the code into further blocks, provide the `-c`
+flag. To show the help, use `-h`.
### Examples
-Repository includes examples which can be used to test the standalone
+The repository includes examples which can be used to test the standalone
functionality for x86, ARM and objdump.
-File `test_function.asm` is non-stripped assembly file and its
-corresponding output `test_function.pdf`.
+The file `test_function.asm` is a non-stripped assembly file and its
+corresponding output file is `test_function.pdf`.
-File `stripped_function.asm` contains
-stripped function and its corresponding output
-`stripped_function.pdf`.
+The file `stripped_function.asm` contains a stripped function and its
+corresponding output file is `stripped_function.pdf`.
-File `att_syntax.asm` is an example of non-stripped AT&T assembly.
+The file `att_syntax.asm` is an example of a non-stripped AT&T assembly file.
-File `huge.asm` is a large stripped
-assembly function and its corresponding output `huge.pdf`. This can be used to
-test processing time of big functions.
+The file `huge.asm` is a large stripped assembly function and its
+corresponding output file is `huge.pdf`. This file can be used to
+test the processing time of big functions.
-Files `objdump.asm` and `stripped_objdump.asm` are the regular and stripped
+The files `objdump.asm` and `stripped_objdump.asm` are the regular and stripped
objdump-based disassemblies of short functions.
-File `arm.asm` is ARM based assembly file and its corresponding pdf file is
+The file `arm.asm` is an ARM-based assembly file and its corresponding pdf file is
`arm.pdf`.
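+
+The file `dataframe.csv` is an example of the CSV input format. As a rough
+guide based on that file and the current parser: the first line must be
+exactly the header `address;bytes;operator;operand`, every following line
+describes one instruction with a decimal address, and an operand that is a
+plain decimal number is treated as a direct branch target. An excerpt:
+
+```
+address;bytes;operator;operand
+4608;55;pushq;%rbp
+4618;e827410000;callq;4176
+4625;7437;je;4642
+```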
## Development
You want to contribute? You're very welcome to do so! This section will give you
-guidance how to setup development environment and test things locally.
+guidance on how to set up the development environment and test things locally.
### Python Environment
@@ -234,23 +239,23 @@ normal pip and virtualenv usage.
Install pipenv for your system following the guide
[here](https://pipenv.pypa.io/en/latest/).
-After installing pipenv. Create virtual environment and install all required
-packages to it. Run following at project root
+After installing pipenv, create a virtual environment and install all required
+packages by running the following at the project root:
```
pipenv install -d
```
-Now you can activate the virtual environment with
+Now you can activate the virtual environment with:
```
pipenv shell
```
-Now your `python` and `pip` commands will correspond to created virtual environment
-instead of your system's Python installation.
+Now your `python` and `pip` commands will correspond to the created virtual
+environment instead of your system's Python installation.
-To deactivate the environment, use
+To deactivate the environment, use:
```
exit
@@ -260,31 +265,32 @@ exit
This project uses [pytest](https://pypi.org/project/pytest/) for testing. Some
test are written using Python's own unittest testing framework, but they work
-with pytest out of the box. Pytest style is preferred way to write tests.
+with pytest out of the box. The pytest style is the preferred way to write
+tests.
-To run tests from project root, use `pytest` or
+To run the tests from the project root, use `pytest` or:
```
pipenv run pytest
```
-During testing dot viewer might be opened if you have it installed. This is
-because GDB integration command `viewcfg` is tested, which will open external
-dot viewer. Just close it after it's opened. It should not affect the test run
-itself.
+During testing, the dot viewer might open if you have it installed. This is
+because the GDB integration command `viewcfg` is tested, and it opens the
+external dot viewer. Just close it after it opens; this should not affect
+the test run itself.
### Code Linting
Project uses [flake8](https://flake8.pycqa.org/en/latest/) and
[pylint](https://pylint.org/) for code linting.
-To run flake8, use
+To run flake8, use:
```
flake8
```
-And to run pylint use
+And to run pylint use:
```
pylint src test
@@ -294,16 +300,16 @@ Both commands should not print any errors.
### Command-Line Interface
-To test command-line interface of asm2cfg wihtout installing the package. You
-can execute module directly. For example to print help
+To test the command-line interface of asm2cfg without installing the package
+you can execute the module directly. For example to print the help message:
```
python -m src.asm2cfg -h
```
-Standalone method can be used to try out the examples under `examples` folder as
-well. For example following command should generate `main.pdf` file to current
-working directory.
+The standalone method can be used to try out the examples in the `examples`
+folder as well. For example the following command should generate the
+`main.pdf` file in the current working directory:
```
python -m src.asm2cfg -c examples/huge.asm
@@ -311,37 +317,37 @@ python -m src.asm2cfg -c examples/huge.asm
### GDB Integration
-Before testing GDB functionality, make sure asm2cfg is not installed with pip!
-This can lead to GDB using code from pip installed asm2cfg package instead of
-code from this repository!
+Before testing the GDB functionality, make sure asm2cfg is not installed with
+pip! This can lead to GDB using code from the asm2cfg package installed by pip
+instead of the code from this repository!
-Also pipenv cannot be used with GDB. You need to install required packages to
-your system's Python pip. This is because your installed GDB is linked against
-system's Python interpreter and will use it, instead of active virtual
-environment. If packages are not installed to your system's pip. You are likely
-to receive following error messages when trying to use asm2cfg with GDB
+Also, pipenv cannot be used with GDB. You need to install the required packages
+with your system's Python pip. This is because your installed GDB is linked
+against your system's Python interpreter and uses it instead of the active
+virtual environment. If the packages are not installed with your system's pip,
+you are likely to receive the following error message when trying to use
+asm2cfg with GDB:
```
ModuleNotFoundError: No module named 'graphviz'
```
To fix this, install required packages to your system's pip without active
-virtual environment. Currently GDB integration only requires graphviz.
+virtual environment. Currently the GDB integration only requires graphviz.
```
pip install graphviz
```
-To use asm2cfg GDB related functionality. Use following line from
-project root.
+To use the GDB-related functionality of asm2cfg, run the following line from
+the project root:
```
PYTHONPATH=${PWD}/src gdb -ex 'source src/gdb_asm2cfg.py'
```
-This will set Python import path so that GDB can import code from this
+This will set the Python import path so that GDB can import code from this
repository without installing the package. After this you should be able to use
-commands `viewcfg` and `savecfg`.
+the commands `viewcfg` and `savecfg`.
### Current Development Goals
@@ -351,5 +357,5 @@ lines. If you encounter such problems please open an issue.
Current developed goals are best described in issues section. Please open a new
one if existing one does not exist.
-If you want to talk to me, you can contact me at Discord with name
+If you want to talk to me, you can contact me on Discord with the name
`Kazhuu#3121`.
diff --git a/examples/dataframe.csv b/examples/dataframe.csv
new file mode 100644
index 0000000..252ca4a
--- /dev/null
+++ b/examples/dataframe.csv
@@ -0,0 +1,60 @@
+address;bytes;operator;operand
+4608;55;pushq;%rbp
+4609;53;pushq;%rbx
+4610;50;pushq;%rax
+4611;488b5e08;movq;8(%rsi), %rbx
+4615;4889df;movq;%rbx, %rdi
+4618;e827410000;callq;4176
+4623;85c0;testl;%eax, %eax
+4625;7437;je;4642
+4627;8d48ff;leal;-1(%rax), %ecx
+4630;83f903;cmpl;$3, %ecx
+4633;732e;jae;4649
+4635;31ed;xorl;%ebp, %ebp
+4637;e9fa000000;jmp;4718
+4642;31ed;xorl;%ebp, %ebp
+4644;e9a1010000;jmp;4777
+4649;89c1;movl;%eax, %ecx
+4651;83e1fc;andl;$-4, %ecx
+4654;31ed;xorl;%ebp, %ebp
+4656;0f1f840000000000;imull;$131, %ebp, %edx
+4664;69d531010000;movsbl;(%rbx), %esi
+4670;0fbe33;addl;%edx, %esi
+4673;01d6;imull;$131, %esi, %edx
+4675;69d631010000;movsbl;1(%rbx), %esi
+4681;0fbe7301;addl;%edx, %esi
+4685;01d6;imull;$131, %esi, %edx
+4687;69d631010000;movsbl;2(%rbx), %esi
+4693;0fbe7302;addl;%edx, %esi
+4697;01d6;imull;$131, %esi, %edx
+4699;69d631010000;movsbl;3(%rbx), %ebp
+4705;0fbe6b03;addl;%edx, %ebp
+4709;01d5;addq;$4, %rbx
+4711;4883c304;addl;$-4, %ecx
+4715;83c1fc;jne;4656
+4718;0f854cffffff;testb;$3, %al
+4724;a803;je;4760
+4726;746c;andl;$3, %eax
+4728;83e003;xorl;%ecx, %ecx
+4731;31c9;imull;$131, %ebp, %edx
+4733;66662e0f1f840000000000;movsbl;(%rbx,%rcx), %ebp
+4744;69d531010000;addl;%edx, %ebp
+4750;0fbe2c0b;addq;$1, %rcx
+4754;01d5;cmpl;%ecx, %eax
+4756;4883c101;jne;4731
+4760;39c8;cmpl;$-1114471758, %ebp
+4762;75bc;jne;4777
+4764;81fda8e8b8eb;movl;$-1114471758, %ebp
+4770;7526;leaq;.Lstr.1(%rip), %rdi
+4772;bda8e8b8eb;jmp;4784
+4777;488d3dae000000;leaq;.Lstr(%rip), %rdi
+4784;eb0f;callq;4144
+4786;488d3d82000000;leaq;.L.str.2(%rip), %rdi
+4793;e8d63e0000;movl;%ebp, %esi
+4798;488d3d52000000;xorl;%eax, %eax
+4805;89ee;callq;4128
+4807;31c0;xorl;%eax, %eax
+4809;e88a3e0000;addq;$8, %rsp
+4814;31c0;popq;%rbx
+4816;4883c408;popq;%rbp
+4820;5b;retq;None
diff --git a/src/asm2cfg/asm2cfg.py b/src/asm2cfg/asm2cfg.py
index 2c426ae..fbcae1e 100644
--- a/src/asm2cfg/asm2cfg.py
+++ b/src/asm2cfg/asm2cfg.py
@@ -2,29 +2,71 @@
Module containing main building blocks to parse assembly and draw CFGs.
"""
+from abc import ABC, abstractmethod
import re
import sys
import tempfile
-
+from enum import Enum
from graphviz import Digraph
# TODO: make this a command-line flag
VERBOSE = 0
+# Common regexes
+HEX_PATTERN = r'[0-9a-fA-F]+'
+HEX_LONG_PATTERN = r'(?:0x0*)?' + HEX_PATTERN
-def escape(instruction):
+
+class InputFormat(Enum):
"""
- Escape used dot graph characters in given instruction so they will be
- displayed correctly.
+ An enum which represents various supported input formats
"""
- instruction = instruction.replace('<', r'\<')
- instruction = instruction.replace('>', r'\>')
- instruction = instruction.replace('|', r'\|')
- instruction = instruction.replace('{', r'\{')
- instruction = instruction.replace('}', r'\}')
- instruction = instruction.replace(' ', ' ')
- return instruction
+
+ GDB = 'GDB'
+ OBJDUMP = 'OBJDUMP'
+ CSV = 'CSV'
+
+
+class JumpTable:
+ """
+ Holds info about branch sources and destinations in asm function.
+ """
+
+ def __init__(self, instructions):
+        # Map from the address where a jump begins to the address it jumps
+        # to. This also includes calls.
+ self.abs_sources = {}
+ self.rel_sources = {}
+
+ # Addresses where jumps end inside the current function.
+ self.abs_destinations = set()
+ self.rel_destinations = set()
+
+ # Iterate over the lines and collect jump targets and branching points.
+ for inst in instructions:
+ if inst is None or not inst.is_direct_jump():
+ continue
+
+ self.abs_sources[inst.address.abs] = inst.target
+ self.abs_destinations.add(inst.target.abs)
+
+ self.rel_sources[inst.address.offset] = inst.target
+ self.rel_destinations.add(inst.target.offset)
+
+ def is_destination(self, address):
+ if address.abs is not None:
+ return address.abs in self.abs_destinations
+ if address.offset is not None:
+ return address.offset in self.rel_destinations
+ return False
+
+ def get_target(self, address):
+ if address.abs is not None:
+ return self.abs_sources.get(address.abs)
+ if address.offset is not None:
+ return self.rel_sources.get(address.offset)
+ return None
class BasicBlock:
@@ -85,69 +127,11 @@ def __repr__(self):
return '\n'.join([i.text for i in self.instructions])
-def print_assembly(basic_blocks):
- """
- Debug function to print the assembly.
- """
- for basic_block in basic_blocks.values():
- print(basic_block)
-
-
-def read_lines(file_path):
- """ Read lines from the file and return then as a list. """
- lines = []
- with open(file_path, 'r', encoding='utf8') as asm_file:
- lines = asm_file.readlines()
- return lines
-
-
-# Common regexes
-HEX_PATTERN = r'[0-9a-fA-F]+'
-HEX_LONG_PATTERN = r'(?:0x0*)?' + HEX_PATTERN
-
-
-class InputFormat: # pylint: disable=too-few-public-methods
- """
- An enum which represents various supported input formats
- """
- GDB = 'GDB'
- OBJDUMP = 'OBJDUMP'
-
-
-def parse_function_header(line):
- """
- Return function name of memory range from the given string line.
-
- Match lines for non-stripped binaries:
- 'Dump of assembler code for function test_function:'
- lines for stripped binaries:
- 'Dump of assembler code from 0x555555555faf to 0x555555557008:'
- and lines for obdjdump disassembly:
- '0000000000016bb0 <_obstack_allocated_p@@Base>:'
- """
-
- objdump_name_pattern = re.compile(fr'{HEX_PATTERN} <([a-zA-Z_0-9@.]+)>:')
- function_name = objdump_name_pattern.search(line)
- if function_name is not None:
- return InputFormat.OBJDUMP, function_name[1]
-
- function_name_pattern = re.compile(r'function (\w+):$')
- function_name = function_name_pattern.search(line)
- if function_name is not None:
- return InputFormat.GDB, function_name[1]
-
- memory_range_pattern = re.compile(fr'(?:Address range|from) ({HEX_LONG_PATTERN}) to ({HEX_LONG_PATTERN}):$')
- memory_range = memory_range_pattern.search(line)
- if memory_range is not None:
- return InputFormat.GDB, f'{memory_range[1]}-{memory_range[2]}'
-
- return None, None
-
-
class Address:
"""
Represents location in program which may be absolute or relative
"""
+
def __init__(self, abs_addr, base=None, offset=None):
self.abs = abs_addr
self.base = base
@@ -182,6 +166,7 @@ class Encoding:
e.g. the '31 c0' in
'16bd3: 31 c0 xor %eax,%eax'
"""
+
def __init__(self, bites):
self.bites = bites
@@ -192,7 +177,46 @@ def __str__(self):
return ' '.join(map(lambda b: f'{b:#x}', self.bites))
-class X86TargetInfo:
+class TargetInfo(ABC):
+ """
+ Abstract class, contains instruction info for the targets.
+ """
+
+ def __init__(self):
+ pass
+
+ @abstractmethod
+ def comment(self):
+ """
+ Returns the comment symbol for the target.
+ """
+
+ @abstractmethod
+ def is_call(self, instruction):
+ """
+ Returns True if the instruction is of type call.
+ """
+
+ @abstractmethod
+ def is_jump(self, instruction):
+ """
+ Returns True if the instruction is of type jump.
+ """
+
+ @abstractmethod
+ def is_unconditional_jump(self, instruction):
+ """
+        Returns True if the instruction is an unconditional jump.
+ """
+
+ @abstractmethod
+ def is_sink(self, instruction):
+ """
+ Is this an instruction which terminates function execution e.g. return?
+ """
+
+
+class X86TargetInfo(TargetInfo):
"""
Contains instruction info for X86-compatible targets.
"""
@@ -223,7 +247,7 @@ def is_sink(self, instruction):
return instruction.opcode.startswith('ret')
-class ARMTargetInfo:
+class ARMTargetInfo(TargetInfo):
"""
Contains instruction info for ARM-compatible targets.
"""
@@ -266,6 +290,7 @@ class Instruction:
Represents a single assembly instruction with it operands, location and
optional branch target
"""
+
def __init__(self, body, text, lineno, address, opcode, ops, target, imm, target_info): # noqa
self.body = body
self.text = text
@@ -303,6 +328,69 @@ def __str__(self):
return result
+def escape(instruction):
+ """
+ Escape used dot graph characters in given instruction so they will be
+ displayed correctly.
+ """
+ instruction = instruction.replace('<', r'\<')
+ instruction = instruction.replace('>', r'\>')
+ instruction = instruction.replace('|', r'\|')
+ instruction = instruction.replace('{', r'\{')
+ instruction = instruction.replace('}', r'\}')
+ instruction = instruction.replace(' ', ' ')
+ return instruction
+
+
+def print_assembly(basic_blocks):
+ """
+ Debug function to print the assembly.
+ """
+ for basic_block in basic_blocks.values():
+ print(basic_block)
+
+
+def read_lines(file_path):
+    """ Read lines from the file and return them as a list. """
+ lines = []
+ with open(file_path, 'r', encoding='utf8') as asm_file:
+ lines = asm_file.readlines()
+ return lines
+
+
+def parse_function_header(line):
+ """
+    Return the input format and the function name or memory range from the
+    given string line.
+
+    Match lines for non-stripped binaries:
+        'Dump of assembler code for function test_function:'
+    lines for stripped binaries:
+        'Dump of assembler code from 0x555555555faf to 0x555555557008:'
+    lines for objdump disassembly:
+        '0000000000016bb0 <_obstack_allocated_p@@Base>:'
+    and the header line of CSV dumps (the function name is then None):
+        'address;bytes;operator;operand'
+ """
+
+ objdump_name_pattern = re.compile(fr'{HEX_PATTERN} <([a-zA-Z_0-9@.]+)>:')
+ function_name = objdump_name_pattern.search(line)
+ if function_name is not None:
+ return InputFormat.OBJDUMP, function_name[1]
+
+ function_name_pattern = re.compile(r'function (\w+):$')
+ function_name = function_name_pattern.search(line)
+ if function_name is not None:
+ return InputFormat.GDB, function_name[1]
+
+ memory_range_pattern = re.compile(fr'(?:Address range|from) ({HEX_LONG_PATTERN}) to ({HEX_LONG_PATTERN}):$')
+ memory_range = memory_range_pattern.search(line)
+ if memory_range is not None:
+ return InputFormat.GDB, f'{memory_range[1]}-{memory_range[2]}'
+
+ if line.strip() == 'address;bytes;operator;operand':
+ return InputFormat.CSV, None
+
+ return None, None
+
+
def parse_address(line):
"""
Parses leading address of instruction
@@ -397,10 +485,40 @@ def parse_comment(line, target_info):
return target, imm_match[3]
+def parse_line_csv(line: str, lineno, target_info):
+    """
+    Parse a single line of a CSV dump (address;bytes;operator;operand) to
+    create an Instruction instance. Return None for empty lines.
+    """
+    # Strip the trailing newline so it does not end up in the operands
+    line = line.strip()
+    if not line:
+        return None
+    elements: list[str] = line.split(';')
+    addr: Address = Address(int(elements[0]))
+    operands: str = elements[3]
+    target: Address | None = None
+    # A purely decimal operand is treated as a direct branch target address
+    if re.match(r'^\d+$', operands):
+        target = Address(int(operands))
+    return Instruction(
+        body=None,
+        text=line,
+        lineno=lineno,
+        address=addr,
+        opcode=elements[2],
+        ops=operands,
+        target=target,
+        imm=None,
+        target_info=target_info,
+    )
+
+
def parse_line(line, lineno, function_name, fmt, target_info):
"""
Parses a single line of assembly to create Instruction instance
"""
+ original_line = line
+
+ if fmt == InputFormat.CSV:
+ return parse_line_csv(line, lineno, target_info)
# Strip GDB prefix and leading whites
if line.startswith('=> '):
@@ -417,7 +535,6 @@ def parse_line(line, lineno, function_name, fmt, target_info):
if not line:
return encoding
- original_line = line
body, opcode, ops, line = parse_body(line, target_info)
if opcode is None:
return None
@@ -438,47 +555,6 @@ def parse_line(line, lineno, function_name, fmt, target_info):
return Instruction(body, original_line.strip(), lineno, address, opcode, ops, target, imm, target_info)
-class JumpTable:
- """
- Holds info about branch sources and destinations in asm function.
- """
-
- def __init__(self, instructions):
- # Address where the jump begins and value which address
- # to jump to. This also includes calls.
- self.abs_sources = {}
- self.rel_sources = {}
-
- # Addresses where jumps end inside the current function.
- self.abs_destinations = set()
- self.rel_destinations = set()
-
- # Iterate over the lines and collect jump targets and branching points.
- for inst in instructions:
- if inst is None or not inst.is_direct_jump():
- continue
-
- self.abs_sources[inst.address.abs] = inst.target
- self.abs_destinations.add(inst.target.abs)
-
- self.rel_sources[inst.address.offset] = inst.target
- self.rel_destinations.add(inst.target.offset)
-
- def is_destination(self, address):
- if address.abs is not None:
- return address.abs in self.abs_destinations
- if address.offset is not None:
- return address.offset in self.rel_destinations
- return False
-
- def get_target(self, address):
- if address.abs is not None:
- return self.abs_sources.get(address.abs)
- if address.offset is not None:
- return self.rel_sources.get(address.offset)
- return None
-
-
def parse_lines(lines, skip_calls, target_name): # noqa pylint: disable=unused-argument
if target_name == 'x86':
target_info = X86TargetInfo()
@@ -492,6 +568,8 @@ def parse_lines(lines, skip_calls, target_name): # noqa pylint: disable=unused-
current_function_name = current_format = None
for num, line in enumerate(lines, 1):
fmt, function_name = parse_function_header(line)
+ if fmt == InputFormat.CSV:
+ function_name = 'CSV'
if function_name is not None:
assert current_function_name is None, 'we handle only one function for now'
if VERBOSE:
@@ -501,6 +579,7 @@ def parse_lines(lines, skip_calls, target_name): # noqa pylint: disable=unused-
continue
instruction_or_encoding = parse_line(line, num, current_function_name, current_format, target_info)
if isinstance(instruction_or_encoding, Encoding):
# Partial encoding for previous instruction, skip it
continue
@@ -508,12 +587,18 @@ def parse_lines(lines, skip_calls, target_name): # noqa pylint: disable=unused-
instructions.append(instruction_or_encoding)
continue
+ # Ignore the last line of gdb informing about the end of the dump
if line.startswith('End of assembler dump') or not line:
continue
+ # Ignore empty lines
if line.strip() == '':
continue
+ # Ignore the header of the CSV file
+ if line.strip() == 'address;bytes;operator;operand':
+ continue
+
print(f'Unexpected assembly at line {num}:\n {line}')
sys.exit(1)