ReleaseSmall macOS hello world binary 51K vs. only 33K with C #20468

Open
mk12 opened this issue Jul 1, 2024 · 3 comments
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

mk12 (Contributor) commented Jul 1, 2024

Zig Version

0.14.0-dev.130+cb308ba3a

Steps to Reproduce and Observed Behavior

Environment

I'm on macOS Sonoma 14.5 on a MacBook Pro with the Apple M1 Pro chip.

Reproduce

Hello world in C compiled with clang is 33K:

printf '#include <stdio.h>\n int main(void) { puts("Hello"); return 0; }' > hello.c
clang -Oz -o hello-c hello.c

wc -c hello-c
# 33432 hello-c

Hello world in Zig is 51K:

printf 'pub fn main() void { @import("std").debug.print("Hello\\n", .{}); }' > hello.zig
zig build-exe -O ReleaseSmall -fsingle-threaded -femit-bin=hello-zig hello.zig

wc -c hello-zig
# 51320 hello-zig

Investigation

Here's my theory. The hello-c binary has no __DATA segment:

bloaty --domain=file hello-c -v
# FILE MAP:
# [0, 20] [Mach-O Headers], [Mach-O Headers]
# [20, 68] [Mach-O Headers], [Mach-O Headers]
# [68, 1f0] __TEXT, [Mach-O Headers]
# [1f0, 288] __TEXT, [Mach-O Headers]
# [288, 2d0] __TEXT, [Mach-O Headers]
# [2d0, 2e0] __TEXT, [Mach-O Headers]
# [2e0, 2f0] __TEXT, [Mach-O Headers]
# [2f0, 308] __TEXT, [Mach-O Headers]
# [308, 358] __TEXT, [Mach-O Headers]
# [358, 378] __TEXT, [Mach-O Headers]
# [378, 390] __TEXT, [Mach-O Headers]
# [390, 3b0] __TEXT, [Mach-O Headers]
# [3b0, 3c0] __TEXT, [Mach-O Headers]
# [3c0, 3d8] __TEXT, [Mach-O Headers]
# [3d8, 410] __TEXT, [Mach-O Headers]
# [410, 420] __TEXT, [Mach-O Headers]
# [420, 430] __TEXT, [Mach-O Headers]
# [430, 440] __TEXT, [Mach-O Headers]
# [440, 3f74] __TEXT, [__TEXT]
# [3f74, 3f94] __TEXT, __TEXT,__text
# [3f94, 3fa0] __TEXT, __TEXT,__stubs
# [3fa0, 3fa6] __TEXT, __TEXT,__cstring
# [3fa6, 3fa8] __TEXT, [__TEXT]
# [3fa8, 4000] __TEXT, __TEXT,__unwind_info
# [4000, 4008] __DATA_CONST, __DATA_CONST,__got
# [4008, 8000] __DATA_CONST, [__DATA_CONST]
# [8000, 8090] __LINKEDIT, [__LINKEDIT]
# [8090, 8098] __LINKEDIT, Function Start Addresses
# [8098, 80c8] __LINKEDIT, Symbol Table
# [80c8, 80d0] __LINKEDIT, Indirect Symbol Table
# [80d0, 80f8] __LINKEDIT, String Table
# [80f8, 8100] __LINKEDIT, [__LINKEDIT]
# [8100, 8298] __LINKEDIT, Code Signature
# ...

On the other hand, hello-zig has an 80-byte __DATA segment, which pushes the final __LINKEDIT segment to offset 0xc000 because of 16K page alignment:

bloaty --domain=file hello-zig -v
# FILE MAP:
# [0, 20] [Mach-O Headers], [Mach-O Headers]
# [20, 68] [Mach-O Headers], [Mach-O Headers]
# [68, 2e0] __TEXT, [Mach-O Headers]
# [2e0, 378] __TEXT, [Mach-O Headers]
# [378, 500] __TEXT, [Mach-O Headers]
# [500, 548] __TEXT, [Mach-O Headers]
# [548, 578] __TEXT, [Mach-O Headers]
# [578, 588] __TEXT, [Mach-O Headers]
# [588, 598] __TEXT, [Mach-O Headers]
# [598, 5b0] __TEXT, [Mach-O Headers]
# [5b0, 600] __TEXT, [Mach-O Headers]
# [600, 620] __TEXT, [Mach-O Headers]
# [620, 638] __TEXT, [Mach-O Headers]
# [638, 648] __TEXT, [Mach-O Headers]
# [648, 668] __TEXT, [Mach-O Headers]
# [668, 680] __TEXT, [Mach-O Headers]
# [680, 6b8] __TEXT, [Mach-O Headers]
# [6b8, 6c8] __TEXT, [Mach-O Headers]
# [6c8, 12f4] __TEXT, __TEXT,__text
# [12f4, 133c] __TEXT, __TEXT,__stubs
# [133c, 139c] __TEXT, __TEXT,__stub_helper
# [139c, 13a0] __TEXT, [__TEXT]
# [13a0, 140c] __TEXT, __TEXT,__const
# [140c, 14c3] __TEXT, __TEXT,__cstring
# [14c3, 14c4] __TEXT, [__TEXT]
# [14c4, 24fc] __TEXT, __TEXT,__unwind_info
# [24fc, 2500] __TEXT, [__TEXT]
# [2500, 2718] __TEXT, __TEXT,__eh_frame
# [2718, 4000] [Unmapped], [Unmapped]
# [4000, 4008] __DATA_CONST, __DATA_CONST,__got
# [4008, 8000] [Unmapped], [Unmapped]
# [8000, 8030] __DATA, __DATA,__la_symbol_ptr
# [8030, 8040] __DATA, __DATA,__const
# [8040, 8050] __DATA, __DATA,__data
# [8050, c000] [Unmapped], [Unmapped]
# [c000, c008] __LINKEDIT, Rebase Info
# [c008, c020] __LINKEDIT, Binding Info
# [c020, c088] __LINKEDIT, Lazy Binding Info
# [c088, c0b8] __LINKEDIT, Export Info
# [c0b8, c338] __LINKEDIT, Symbol Table
# [c338, c36c] __LINKEDIT, Indirect Symbol Table
# [c36c, c370] __LINKEDIT, [__LINKEDIT]
# [c370, c776] __LINKEDIT, String Table
# [c776, c780] __LINKEDIT, [__LINKEDIT]
# [c780, c878] __LINKEDIT, Code Signature
# ...

I'm not sure about the other sections, but let's look at __data for hello-c:

otool -d hello-c
# hello-c:

nm -s __DATA __data hello-c
# (no output)

And for hello-zig:

otool -d hello-zig
# hello-zig:
# (__DATA,__data) section
# 0000000100008040	00000000 00000000 ffffffff ffffffff

nm -s __DATA __data hello-zig
# 0000000100008048 d _Progress.stderr_mutex.0
# 0000000100008040 d ___dso_handle
# 0000000100008040 d dyld_private

That _Progress.stderr_mutex.0 looks suspicious to me. I believe it comes from here:

var stderr_mutex = std.Thread.Mutex.Recursive.init;

But I'm not using std.Progress anywhere so I don't know how it makes it into the final binary.

Expected Behavior

Zig should be able to match the binary size of C/clang.

mk12 added the bug label Jul 1, 2024

alexrp (Sponsor, Contributor) commented Jul 1, 2024

  • zig/lib/std/debug.zig

    Lines 93 to 98 in cb308ba

    pub fn print(comptime fmt: []const u8, args: anytype) void {
        lockStdErr();
        defer unlockStdErr();
        const stderr = io.getStdErr().writer();
        nosuspend stderr.print(fmt, args) catch return;
    }
  • zig/lib/std/debug.zig

    Lines 83 to 85 in cb308ba

    pub fn lockStdErr() void {
        std.Progress.lockStdErr();
    }
  • zig/lib/std/Progress.zig

    Lines 528 to 531 in cb308ba

    pub fn lockStdErr() void {
        stderr_mutex.lock();
        clearWrittenWithEscapeCodes() catch {};
    }

Note that for "real" code, you should use std.log.

Edit: That said, it seems like std.log.defaultLog also uses std.Progress, so I suppose you'd have to supply your own logFn to avoid pulling it in entirely.

mk12 (Contributor, Author) commented Jul 1, 2024

@alexrp Ah, good point. I had hoped -fsingle-threaded would eliminate everything to do with locking, but I guess it doesn't get rid of the global variable. However, even when using std.io.getStdErr directly, I still see the mutex:

pub fn main() void {
    @import("std").io.getStdErr().writer().writeAll("Hello\n") catch unreachable;
}

Same binary size, same otool and nm output.

alexrp (Sponsor, Contributor) commented Jul 1, 2024

Ok, that I can't explain. That seems odd.

I guess this is why the mutex global variable is not completely eliminated though:

const SingleThreadedImpl = struct {
    is_locked: bool = false,
    fn tryLock(self: *@This()) bool {
        if (self.is_locked) return false;
        self.is_locked = true;
        return true;
    }
    fn lock(self: *@This()) void {
        if (!self.tryLock()) {
            unreachable; // deadlock detected
        }
    }
    fn unlock(self: *@This()) void {
        assert(self.is_locked);
        self.is_locked = false;
    }
};
