x86_64-w64-mingw32-gcc support. PLEASE #9

Open
swang206 opened this issue Jan 14, 2022 · 8 comments

@swang206

swang206 commented Jan 14, 2022

I really, really need mingw-w64 support. I have a lot of programs built with GCC that target Windows. Using Microsoft compiler isn't an option at all.

I really need that.
https://github.com/GrammaTech/gtirb-pprinter/blob/011f9202744bfb67f902deba34a614cc8edaf86d/src/gtirb_pprinter/AttPrettyPrinter.cpp

You can just add several things to AttPrettyPrinter to support PE. It should work without much effort, and it would solve a lot of my issues. Thank you. Please.

I love u.

@kwarrick
Contributor

kwarrick commented Jan 19, 2022

Hi @swang206!

I've wanted to add GAS assembly output for PE for some time but it keeps getting pushed down the list.

As you suggest, it shouldn't be much work to get AT&T output. In fact, all you would really need to do is register the printer for that syntax:

void registerPrettyPrinters() {
  // ...
  registerPrinter({"pe", "raw"}, {"x86", "x64"}, {"att"}, {"gas"},
                  std::make_shared<AttPrettyPrinterFactory>(), false, false);
}

Then use that syntax:

$ gtirb-pprinter --assembler gas --syntax att --asm out.s out.gtirb

Unfortunately, that is where it starts to get more difficult. The output assembly has many issues and will not assemble. I will outline some of the immediate problems here, and maybe we can find solutions together.


  1. __ImageBase is a linker-defined symbol that we need for base-relative code.
# WARNING: integral symbol __ImageBase may not have been correctly relocated
.extern __ImageBase, 0x400000

How should we define this with GAS directives?


  2. Symbols from libraries.

Symbols imported from DLLs such as kernel32.dll and msvcrt.dll are not resolving:

/tmp/ccT6QuDQ.o:fake:(.text+0x74): undefined reference to `fmode'
/tmp/ccT6QuDQ.o:fake:(.text+0x1de): undefined reference to `Sleep'
/tmp/ccT6QuDQ.o:fake:(.text+0x276): undefined reference to `SetUnhandledExceptionFilter'
/tmp/ccT6QuDQ.o:fake:(.text+0x294): undefined reference to `acmdln'

I believe this can be fixed by using the __imp___ prefixed version of the symbol:

-            movl _fmode,%eax
+            movl __imp___fmode,%eax

That will require some additional work in the pretty printer for outputting the correct reference.
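
Until the pretty printer emits the right reference itself, the substitution shown in the diff could be applied as a post-processing step on the generated assembly. The following is only a sketch: `use_import_symbols` is a hypothetical helper (not part of gtirb-pprinter), the set of imported names has to be supplied by the caller, and the default `__imp__` prefix is an assumption taken from the 32-bit diff (x64 objects use `__imp_`, and decorated stdcall names are not handled).

```python
import re

def use_import_symbols(asm_line, imported, prefix="__imp__"):
    """Prefix every imported symbol name found in one line of AT&T assembly."""
    if not imported:
        return asm_line
    # Try longer names first so a short name never wins over a longer one
    # that starts with it.
    names = sorted(imported, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # The lookarounds keep a match from starting or ending inside a longer
    # identifier (including one that already carries the prefix).
    pattern = re.compile(r"(?<![\w$.@])(" + alternatives + r")(?![\w$.@])")
    return pattern.sub(lambda m: prefix + m.group(1), asm_line)


print(use_import_symbols("movl _fmode,%eax", {"_fmode"}))
```

A text-level rewrite like this is fragile (it cannot tell a symbol from an identical string in a comment), so it is best seen as a way to experiment with the linker before changing the printer.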


  3. Entrypoint/Main

We approach PE and ELF binaries a little differently. By default, for ELF binaries, we elide __libc_start_main and recompile with a simple gcc ex.s. For PE binaries, we create a symbol __EntryPoint and do not elide the CRT main "wrapper".

This leads to problems if you intend to use gcc ex.s for PE.

/usr/lib/gcc/i686-w64-mingw32/7.3-win32/../../../../i686-w64-mingw32/lib/../lib/libmingw32.a(lib32_libmingw32_a-crt0_c.o): In function `main':
./mingw-w64-crt/crt/crt0_c.c:18: undefined reference to `WinMain@16'

We can do a couple of things here, neither of which is clearly the "correct" solution.

  • Infer a "main" function so that gcc ex.s works.
  • Use the assembler directly and link afterwards.

The first will require some additional work on ddisasm's part. The second seems like an immediate solution, but will also require some adjustment.

Perhaps I just don't have the commands right yet.

$ i686-w64-mingw32-as -c out.s
out.s: Assembler messages:
out.s:134: Warning: `%es:0(%edi)' is not valid here (expected `(%edi)')
$ i686-w64-mingw32-ld ex.o msvcrt.lib KERNEL32.lib -o out.exe -L/usr/i686-w64-mingw32/lib -entry=_EntryPoint -subsystem=console
i686-w64-mingw32-ld: warning: cannot find entry symbol _EntryPoint; defaulting to 0000000000401000
ex.o:ex.c:(.text+0x2c): undefined reference to `__main'

What do you think?


  4. Syntax errors.

Note that the warning above indicates that we have incorrect assembly output:
out.s:134: Warning: `%es:0(%edi)' is not valid here (expected `(%edi)')

I expect that after we get initial support these assembly problems will be a non-significant amount of the effort.


We definitely would like GAS assembly outputs for PE targets, but we still need to solve the challenges above before we can get there. Any help you can provide in addressing these issues, even if it is just determining the correct directives and commands to reassemble the output assembly, would be greatly appreciated.

@kwarrick
Contributor

@swang206 Also,

Using Microsoft compiler isn't an option at all.

Even if you cannot use the Microsoft compiler, you might still be able to reassemble the MASM output with other tools. Consider these examples.

# Install `llvm-dlltool`.
$ sudo apt-get install llvm

Build and disassemble a PE32 binary:

$ cd ddisasm/examples/ex1
$ i686-w64-mingw32-gcc ex.c -o ex.exe
$ ddisasm --asm out.asm --generate-import-libs ex.exe

Reassemble MASM with UASM.

$ uasm -coff out.asm

UASM is an open source MASM-compatible assembler: http://www.terraspace.co.uk/uasm.html

Link the object with MinGW:

$ i686-w64-mingw32-ld out.o KERNEL32.lib msvcrt.lib -entry __EntryPoint -subsystem console
$ wine a.exe
!!!Hello World!!!

... or ...

Link the object with LLVM lld-link:

$ sudo apt-get install lld
$ lld-link-6.0 out.o /machine:x86 /subsystem:console /entry:_EntryPoint
$ wine out.exe
!!!Hello World!!!

@kwarrick
Contributor

@swang206

You may find this project helpful while we are still working on GAS assembler output: https://github.com/mstorsjo/msvc-wine

Set up a Wine-driven MSVC environment in a Docker container:

$ git clone https://github.com/mstorsjo/msvc-wine
$ cd msvc-wine
$ docker build --tag msvc-wine .

Build and disassemble a PE32 binary:

$ cd ddisasm/examples/ex1
$ i686-w64-mingw32-gcc ex.c -o ex.exe
$ ddisasm --asm out.asm --generate-import-libs ex.exe

Reassemble and link the MASM output in the docker container, with the Microsoft compiler:

$ docker run -it --rm -v $PWD:/ex1 -w /ex1 msvc-wine
$ /opt/msvc/bin/x86/ml out.asm /link /subsystem:console /entry:_EntryPoint /machine:x86
$ exit
$ wine out.exe
!!!Hello World!!!

Note that the container provides script wrappers around all of the MSVC tools:

root@499689ee9927:/ex1# ls /opt/msvc/bin/x86
armasm        cl       dumpbin      link      ml64        mt.exe     rc.exe
armasm.exe    cl.exe   dumpbin.exe  link.exe  ml64.exe    nmake      wine-msvc.sh
armasm64      cmd      lib          ml        msvcenv.sh  nmake.exe
armasm64.exe  cmd.exe  lib.exe      ml.exe    mt          rc

So you could build the binary with CL.exe as well:

$ docker run -it --rm -v $PWD:/ex1 -w /ex1 msvc-wine
root@499689ee9927:/ex1# /opt/msvc/bin/x86/cl ex.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30138 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

ex.c
Microsoft (R) Incremental Linker Version 14.29.30138.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:ex.exe
ex.obj

@R2IDefense

R2IDefense commented May 12, 2023

Kevin,
I've been doing a little poking at this as I'm interested in getting the assembly output to work with clang-cl / LLVM.

In my tests, __ImageBase is defined by the linker, and I didn't have to do anything special for the instructions that use it. The one issue I did run into was jump tables that use it in a calculation. For example:
Original MASM:

$L_140081138           DWORD IMAGEREL $L_14002c6e2
                       DWORD IMAGEREL $L_14002c6db
                       DWORD IMAGEREL $L_14002c6cd
                       DWORD IMAGEREL $L_14002c6bf

would get converted to:

.L_140081138:
          .long .L_14002c6e2-__ImageBase
          .long .L_14002c6db-__ImageBase
          .long .L_14002c6cd-__ImageBase
          .long .L_14002c6bf-__ImageBase

This failed to assemble because __ImageBase was not known at that point and could not be used in a calculation in that section. I posted a question about converting the jump table to something that LLVM would understand, and it was suggested to use the .rva directive instead:

.L_140081138:
          .rva .L_14002c6e2
          .rva .L_14002c6db
          .rva .L_14002c6cd
          .rva .L_14002c6bf
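
The rewrite from the `.long label-__ImageBase` form to the `.rva label` form is mechanical, so it could be scripted while the pretty printer does not yet emit it directly. A minimal sketch follows, assuming every image-relative entry sits on its own line in exactly the shape shown above; `to_rva` is a hypothetical helper name, not an existing gtirb-pprinter function.

```python
import re

# Matches "<indent>.long <label>-__ImageBase" with optional surrounding spaces.
IMAGE_RELATIVE = re.compile(r"^(\s*)\.long\s+(\S+?)\s*-\s*__ImageBase\s*$")

def to_rva(asm_line):
    """Rewrite one image-relative .long entry as .rva; leave other lines alone."""
    match = IMAGE_RELATIVE.match(asm_line)
    if match is None:
        return asm_line
    indent, label = match.groups()
    return f"{indent}.rva {label}"


table = [
    ".L_140081138:",
    "          .long .L_14002c6e2-__ImageBase",
    "          .long .L_14002c6db-__ImageBase",
]
print("\n".join(to_rva(line) for line in table))
```

Lines that do not match the pattern, such as the label itself or ordinary `.long` constants, pass through unchanged.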

It also "seemed" to be correct to leave them as .long, remove the calculation, and keep them in the _RDATA section so that the linker resolves them. I haven't been able to verify this completely: the builds succeed, but the resulting binaries do not run successfully. The pretty printer also isn't emitting the instructions that use those labels correctly (yet). Instead of the label, it writes out a constant: in this example, the instruction was printed as movl 528696(%r10,%r11,4),%r11d, with 528696 in place of .L_140081138.
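
One observation that may help narrow this down: 528696 is 0x81138 in hexadecimal, which is what remains of 0x140081138 after subtracting 0x140000000. The sketch below checks that arithmetic. It assumes that the label name .L_140081138 encodes the original virtual address and that the binary uses 0x140000000 as its image base (the usual default for 64-bit executables); neither is confirmed in this thread. If both hold, the printed constant is the label's image-relative address, which would suggest the operand lost its symbolic expression instead of being computed incorrectly.

```python
# Assumptions (not confirmed in this thread): the label name encodes the
# original virtual address, and the image base is 0x140000000.
ASSUMED_IMAGE_BASE = 0x140000000

def label_rva(label, image_base=ASSUMED_IMAGE_BASE):
    """Return the image-relative address encoded in a label such as .L_140081138."""
    address = int(label.rsplit("_", 1)[1], 16)
    return address - image_base


rva = label_rva(".L_140081138")
print(rva, hex(rva))  # 528696 0x81138
```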

The usage of __imp_ for the external functions works in my tests. I've not run into any issues with resolving the names at link time doing so.

For the entry point when compiling, I simply had to add a .globl __EntryPoint at the top and tell the linker what the entry was.
ie: clang-cl f2_cl_att.s /link /ENTRY:__EntryPoint /SUBSYSTEM:CONSOLE user32.lib kernel32.lib shlwapi.lib /LARGEADDRESSAWARE:NO
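
Adding the directive can also be automated. The sketch below prepends `.globl __EntryPoint` to the assembly text unless an equivalent line is already present; `export_entry_point` is a hypothetical helper, and the check only recognizes the `.globl` and `.global` spellings on a line of their own.

```python
import re

def export_entry_point(asm_text, symbol="__EntryPoint"):
    """Prepend a .globl directive for the entry symbol if the text lacks one."""
    already_global = re.compile(
        r"^\s*\.globa?l\s+" + re.escape(symbol) + r"\s*$", re.MULTILINE
    )
    if already_global.search(asm_text):
        return asm_text
    return f".globl {symbol}\n{asm_text}"


print(export_entry_point("__EntryPoint:\n    ret\n"))
```

The function is idempotent, so running it twice over the same file does not add a second directive.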

Would be interested in your thoughts on the jump table stuff and pointers on where to look at adjusting those.
Thanks.

@nlapinski

nlapinski commented May 18, 2023

llvm-ml will compile MASM under msys2, if uasm is not working.

The flags are odd and apparently undocumented, but this will produce your object file:
llvm-ml --assemble main.asm

Then link with lld-link

@R2IDefense

llvm-ml will compile MASM under msys2, if uasm is not working.

The flags are odd and apparently undocumented, but this will produce your object file: llvm-ml --assemble main.asm

Then link with lld-link

It seems unable to handle anything with relative addresses like this:
movzx ECX,BYTE PTR [R8+RAX*4+(IMAGEREL $L_14008bc12)]

@R2IDefense

I was able to get pprinter generating AT&T assembly syntax for Windows PE executables that can be compiled with clang/clang-cl. I need to do some additional testing and code cleanup, but will contribute back once done. It's available in our fork in the meantime if anyone is interested. https://github.com/R2IDefense/gtirb-pprinter

@kwarrick
Contributor

@R2IDefense

Yay! I really want to add llvm-ml support. I think it makes sense to add a syntax variant at this point. I will write something up so we can run the end-to-end tests for ddisasm and see what issues remain.
