add -flto=thin to mac aarch64 build cflags for ez 10% gain #469

Merged


@fighet-parnet (Contributor) commented Jun 22, 2023

Adds a roughly 5-10% improvement in wall time.

I've run all tests and booted several fakeships and comets with this change installed, and everything seems to work properly.

Mac x86 support is coming; I just don't have access to an Intel Mac right now (but soon...).
Linux will come later once I figure out some bugs.

The following is the time to boot a fakeship from a brass pill:

Without LTO:
________________________________________________________
Executed in  180.46 secs    fish           external
usr time   17.83 secs    0.10 millis   17.83 secs
sys time    0.23 secs    1.86 millis    0.23 secs


With LTO:
________________________________________________________
Executed in  164.87 secs    fish           external
usr time   15.65 secs    0.12 millis   15.65 secs
sys time    0.19 secs    1.94 millis    0.19 secs

180.46 / 164.87 -> 1.09


~barter-simsum

Brass pill boot:

x86 linux without lto: ~140s
x86 linux with lto: ~128s

~8.6% improvement. Note that we may be able to squeeze out a bit more if we can
apply -flto to all dependencies and not just the urbit binaries. This is currently
an issue on x86 Linux, though, due to some weird, uninvestigated behavior with
libsigsegv.
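The ratios quoted above can be reproduced with a few lines of Python. This is plain arithmetic on the numbers posted in this description. The Linux inputs are the approximate ~140s/~128s figures, so that result is only as precise as those values.

```python
# Boot times (seconds) for a fakeship from a brass pill, copied from above.
# macOS aarch64 uses the fish `time` wall-clock values; x86 Linux uses the
# approximate (~) values, so its result is only as precise as those inputs.
MEASUREMENTS = {
    "macOS aarch64, -flto=thin": (180.46, 164.87),
    "x86 Linux, -flto": (140.0, 128.0),
}


def summarize(before: float, after: float) -> tuple:
    """Return (speedup ratio, percent of wall time saved) for one build pair."""
    speedup = before / after
    saved_pct = 100.0 * (before - after) / before
    return speedup, saved_pct


if __name__ == "__main__":
    for label, (before, after) in MEASUREMENTS.items():
        speedup, saved_pct = summarize(before, after)
        print(f"{label}: {speedup:.2f}x speedup, {saved_pct:.1f}% less wall time")
```

Both builds come out at roughly a 1.09x speedup, or about 8.6% less wall time.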

For those curious, the following hints at what LTO touched (file-local symbols that GCC renamed with a .lto_priv suffix):

readelf -s ./bazel-bin/pkg/vere/urbit | grep lto_priv
  1233: 000000000042d0f0   361 FUNC    LOCAL  DEFAULT    2 _rebalance.lto_priv.1
  1237: 000000000042d6c0   435 FUNC    LOCAL  DEFAULT    2 _rebalance.lto_priv.0
 13002: 0000000000d24a00   200 OBJECT  GLOBAL HIDDEN    14 u3_Signal.lto_priv.0
 13358: 0000000000d36de0     8 OBJECT  GLOBAL HIDDEN    14 sec_u.lto_priv.0
 13429: 0000000000d24ae0    80 OBJECT  GLOBAL HIDDEN    14 u3V.lto_priv.0
 13785: 00000000004516e0   110 FUNC    GLOBAL HIDDEN     2 _tap_in.lto_priv.0
 14094: 0000000000d3aeb0     4 OBJECT  GLOBAL HIDDEN    14 sag_w.lto_priv.0
 14100: 0000000000469af0   971 FUNC    GLOBAL HIDDEN     2 _lord_stop.lto_priv.0
 14120: 00000000004581f0   393 FUNC    GLOBAL HIDDEN     2 _cj_nail.lto_priv.0
 14178: 0000000000456a30   450 FUNC    GLOBAL HIDDEN     2 _cj_fine.lto_priv.0
 14732: 0000000000463eb0    82 FUNC    GLOBAL HIDDEN     2 _box_free.lto_priv.0
 14849: 000000000044a320   658 FUNC    GLOBAL HIDDEN     2 _n_find.lto_priv.0
 15325: 0000000000478dd0  1487 FUNC    GLOBAL HIDDEN     2 _pier_init.lto_priv.0
 15413: 00000000004674a0   939 FUNC    GLOBAL HIDDEN     2 _ca_willoc.lto_priv.0
 15841: 0000000000477300   394 FUNC    GLOBAL HIDDEN     2 _dawn_fail.lto_priv.0
 16128: 00000000004449d0  1746 FUNC    GLOBAL HIDDEN     2 _cr_sing.lto_priv.0
 16214: 0000000000474c40   500 FUNC    GLOBAL HIDDEN     2 _ttyf_loja.lto_priv.0
 16277: 000000000078fe48     8 OBJECT  GLOBAL HIDDEN     5 ver_hos_c.lto_priv.0
 16887: 0000000000443ab0   460 FUNC    GLOBAL HIDDEN     2 _n_bam.lto_priv.0
 17346: 000000000045cc90   540 FUNC    GLOBAL HIDDEN     2 _cj_minx.lto_priv.0
 17897: 0000000000d249e8     8 OBJECT  GLOBAL HIDDEN    14 _file_u.lto_priv.0
 18325: 00000000004447a0    58 FUNC    GLOBAL HIDDEN     2 _n_feb.lto_priv.0
 18517: 00000000004453e0  3981 FUNC    GLOBAL HIDDEN     2 _n_comp.lto_priv.0
 18739: 0000000000d249f6     1 OBJECT  GLOBAL HIDDEN    14 _ct_lop_o.lto_priv.0
 18880: 000000000044fda0  1650 FUNC    GLOBAL HIDDEN     2 _find_home.lto_priv.0
 19091: 0000000000474e40   500 FUNC    GLOBAL HIDDEN     2 _ttyf_hija.lto_priv.0
 19508: 0000000000450420  1393 FUNC    GLOBAL HIDDEN     2 _pave_home.lto_priv.0
 19595: 000000000042a3d0   687 FUNC    GLOBAL HIDDEN     2 _in_uni.lto_priv.0
 19822: 0000000000430e20  1095 FUNC    GLOBAL HIDDEN     2 _block_rip.lto_priv.0
 20236: 0000000000457030   331 FUNC    GLOBAL HIDDEN     2 _cj_axis.lto_priv.0
 20306: 0000000000457180   514 FUNC    GLOBAL HIDDEN     2 _cj_gust.lto_priv.0
 20866: 0000000000456c00   667 FUNC    GLOBAL HIDDEN     2 _cj_cast.lto_priv.0
 21317: 0000000000446fe0 11412 FUNC    GLOBAL HIDDEN     2 _n_burn.lto_priv.0
 21430: 000000000045c820  1130 FUNC    GLOBAL HIDDEN     2 _cj_spot.lto_priv.0
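Pulling the renamed symbols out of a large `readelf -s` dump is easy to script. Here is a minimal sketch that parses the standard eight-column symbol rows and strips the `.lto_priv.N` suffix to recover the original names, so they can be compared against the source.

A caveat on interpretation: to our understanding, GCC adds this suffix when it has to make a file-local (static) symbol visible across LTO partitions. That makes the list a hint rather than a complete record of inlining. The script name `lto_priv.py` is hypothetical.

```python
import re
import sys

# Matches names like "_n_burn.lto_priv.0" -> base "_n_burn", index 0.
LTO_PRIV = re.compile(r"^(?P<base>.+)\.lto_priv\.(?P<n>\d+)$")


def parse_lto_priv(lines):
    """Yield (base_name, suffix_index, sym_type, size) for `.lto_priv.N` symbols.

    Expects `readelf -s` rows of the form:
      Num: Value Size Type Bind Vis Ndx Name
    Header lines and rows without a name are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 8 or not fields[0].endswith(":"):
            continue
        _num, _value, size, sym_type, _bind, _vis, _ndx, name = fields
        match = LTO_PRIV.match(name)
        if match is None:
            continue
        # readelf prints very large sizes in hex with a 0x prefix.
        size_bytes = int(size, 16) if size.startswith("0x") else int(size)
        yield match.group("base"), int(match.group("n")), sym_type, size_bytes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   readelf -s ./bazel-bin/pkg/vere/urbit > syms.txt
    #   python3 lto_priv.py syms.txt
    with open(sys.argv[1]) as f:
        rows = sorted(parse_lto_priv(f), key=lambda r: r[3], reverse=True)
    for base, n, sym_type, size in rows:
        print(f"{sym_type:7} {size:6} {base} (lto_priv.{n})")
```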

@fighet-parnet requested a review from a team as a code owner June 22, 2023 22:49

@barter-simsum (Member) left a comment

I have no problem approving this for macOS/clang since, AFAIK, there are no issues with combining -flto and debug symbols in the build (unlike with GCC). Missing debug symbols in release builds may be a reason not to accept this for Linux, but I'm not sure. I'd leave that decision up to @joemfb.

@fighet-parnet (Contributor, Author) commented Jun 23, 2023

I'm not sure what you mean. Currently on Linux, debug symbols seem to be intact, at least in my local builds. According to
https://hubicka.blogspot.com/2018/06/gcc-8-link-time-and-interprocedural.html
this should have been fixed in GCC 8? We're using GCC 9.

~/d/c/d/vere (lto) [2]> file bazel-bin/pkg/vere/urbit
bazel-bin/pkg/vere/urbit: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, with debug_info, not stripped
(gdb) disas worker
Dump of assembler code for function worker:
   0x00000000005a0a80 <+0>:	push   rbp
   0x00000000005a0a81 <+1>:	push   rbx
   0x00000000005a0a82 <+2>:	sub    rsp,0x8
   0x00000000005a0a86 <+6>:	call   0x780734 <sem_post>
   0x00000000005a0a8b <+11>:	test   eax,eax
   0x00000000005a0a8d <+13>:	jne    0x4013ca <worker.cold>
   0x00000000005a0a93 <+19>:	mov    edi,0xd0bea0
   0x00000000005a0a98 <+24>:	call   0x77fe35 <pthread_mutex_lock>
   0x00000000005a0a9d <+29>:	test   eax,eax

Unless I'm missing something?

@barter-simsum (Member)

I'm unsure whether this is a bug in GCC or gdb, or, more likely, that I am just
doing something wrong.

Debug info is clearly intact on Linux with the -flto option. That said, in gdb, anything that
involves parsing DWARF file/line/column info seems completely broken. Other things, like printing
local vars/args, are also broken ("i args" and "i lo" both return "No symbol table info available.").
Some things still work, like disassembly, obviously, and commands that print global variables.

There are definitely some expected changes in the binary. For instance, the functions _find_home
and _pave_home no longer show up in the symbol table under those names. Instead, I see
_find_home.lto_priv.0 and _pave_home.lto_priv.0.

readelf -s /home/d/src/urbit/vere2/bazel-bin/pkg/vere/urbit | grep lto_priv
 12907: 0000000000d21640   200 OBJECT  GLOBAL HIDDEN    14 u3_Signal.lto_priv.0
 13254: 0000000000d23860     8 OBJECT  GLOBAL HIDDEN    14 sec_u.lto_priv.0
 13324: 0000000000d21720    80 OBJECT  GLOBAL HIDDEN    14 u3V.lto_priv.0
 13677: 000000000045c880   110 FUNC    GLOBAL HIDDEN     2 _tap_in.lto_priv.0
 13979: 0000000000d33990     4 OBJECT  GLOBAL HIDDEN    14 sag_w.lto_priv.0
 13984: 000000000046b210   971 FUNC    GLOBAL HIDDEN     2 _lord_stop.lto_priv.0
 14003: 000000000043f2e0   393 FUNC    GLOBAL HIDDEN     2 _cj_nail.lto_priv.0
 14061: 000000000043f540   450 FUNC    GLOBAL HIDDEN     2 _cj_fine.lto_priv.0
 14620: 0000000000469240    82 FUNC    GLOBAL HIDDEN     2 _box_free.lto_priv.0
 14739: 0000000000452f00   658 FUNC    GLOBAL HIDDEN     2 _n_find.lto_priv.0
 15304: 0000000000469490   939 FUNC    GLOBAL HIDDEN     2 _ca_willoc.lto_priv.0
 15363: 00000000004216d0   356 FUNC    GLOBAL HIDDEN     2 _next.lto_priv.0
 15721: 0000000000421b10   200 FUNC    GLOBAL HIDDEN     2 _last.lto_priv.0
 15735: 0000000000471730   394 FUNC    GLOBAL HIDDEN     2 _dawn_fail.lto_priv.0
 16024: 000000000044d050  1746 FUNC    GLOBAL HIDDEN     2 _cr_sing.lto_priv.0
 16111: 000000000046fd60   500 FUNC    GLOBAL HIDDEN     2 _ttyf_loja.lto_priv.0
 16778: 00000000004482c0   460 FUNC    GLOBAL HIDDEN     2 _n_bam.lto_priv.0
 17236: 000000000043b6b0   540 FUNC    GLOBAL HIDDEN     2 _cj_minx.lto_priv.0
 17780: 0000000000d21618     8 OBJECT  GLOBAL HIDDEN    14 _file_u.lto_priv.0
 18216: 0000000000448280    58 FUNC    GLOBAL HIDDEN     2 _n_feb.lto_priv.0
 18411: 0000000000451920  3981 FUNC    GLOBAL HIDDEN     2 _n_comp.lto_priv.0
 18635: 0000000000d21626     1 OBJECT  GLOBAL HIDDEN    14 _ct_lop_o.lto_priv.0
 18776: 0000000000458120  1618 FUNC    GLOBAL HIDDEN     2 _find_home.lto_priv.0
 18989: 000000000046ff60   500 FUNC    GLOBAL HIDDEN     2 _ttyf_hija.lto_priv.0
 19408: 0000000000458780  1393 FUNC    GLOBAL HIDDEN     2 _pave_home.lto_priv.0
 19495: 000000000042cbb0   687 FUNC    GLOBAL HIDDEN     2 _in_uni.lto_priv.0
 19724: 00000000004324e0  1095 FUNC    GLOBAL HIDDEN     2 _block_rip.lto_priv.0
 19899: 0000000000471d50   195 FUNC    GLOBAL HIDDEN     2 _auto_link.lto_priv.0
 20138: 000000000043fab0   331 FUNC    GLOBAL HIDDEN     2 _cj_axis.lto_priv.0
 20208: 000000000043fc00   514 FUNC    GLOBAL HIDDEN     2 _cj_gust.lto_priv.0
 20770: 000000000043f710   667 FUNC    GLOBAL HIDDEN     2 _cj_cast.lto_priv.0
 21222: 000000000044dc00 11412 FUNC    GLOBAL HIDDEN     2 _n_burn.lto_priv.0
 21336: 000000000043fe10  1130 FUNC    GLOBAL HIDDEN     2 _cj_spot.lto_priv.0

If I look at the DWARF info in more detail for, say, u3m_pave, nothing really seems out of place
with or without LTO. Observe that both show the nuu_o param.


WITH LTO:

 <1><89f9f>: Abbrev Number: 44 (DW_TAG_subprogram)
    <89fa0>   DW_AT_external    : 1
    <89fa0>   DW_AT_name        : (indirect string, offset: 0xd22e): u3m_pave
    <89fa4>   DW_AT_decl_file   : 26
    <89fa5>   DW_AT_decl_line   : 648
    <89fa7>   DW_AT_decl_column : 1
    <89fa8>   DW_AT_prototyped  : 1
    <89fa8>   DW_AT_sibling     : <0x89fba>
 <2><89fac>: Abbrev Number: 45 (DW_TAG_formal_parameter)
    <89fad>   DW_AT_name        : (indirect string, offset: 0xd0db): nuu_o
    <89fb1>   DW_AT_decl_file   : 26
    <89fb2>   DW_AT_decl_line   : 648
    <89fb4>   DW_AT_decl_column : 15
    <89fb5>   DW_AT_type        : <0x8879f>


WITHOUT LTO:

 <1><fdf55>: Abbrev Number: 44 (DW_TAG_subprogram)
    <fdf56>   DW_AT_external    : 1
    <fdf56>   DW_AT_name        : (indirect string, offset: 0x10fac): u3m_pave
    <fdf5a>   DW_AT_decl_file   : 1
    <fdf5b>   DW_AT_decl_line   : 648
    <fdf5d>   DW_AT_decl_column : 1
    <fdf5e>   DW_AT_prototyped  : 1
    <fdf5e>   DW_AT_inline      : 1	(inlined)
    <fdf5f>   DW_AT_sibling     : <0xfdf71>
 <2><fdf63>: Abbrev Number: 45 (DW_TAG_formal_parameter)
    <fdf64>   DW_AT_name        : (indirect string, offset: 0x10df6): nuu_o
    <fdf68>   DW_AT_decl_file   : 1
    <fdf69>   DW_AT_decl_line   : 648
    <fdf6b>   DW_AT_decl_column : 15
    <fdf6c>   DW_AT_type        : <0xf8cac>


Additionally, observe that with LTO the file index (26, i.e. manage.c:648) is correct:

readelf --debug-dump=rawline /home/d/src/urbit/vere2/bazel-bin/pkg/vere/urbit
Raw dump of debug contents of section .debug_line:

...

The Directory Table (offset 0x6df0):
  1	pkg/c3
  2	pkg/noun
  3	/usr/local/x86_64-linux-musl/x86_64-linux-musl/include/bits
  4	/usr/local/x86_64-linux-musl/x86_64-linux-musl/include

 The File Name Table (offset 0x6e74):
  Entry	Dir	Time	Size	Name
  1	3	0	0	alltypes.h
  2	0	0	0	<built-in>
  3	4	0	0	unistd.h
  4	4	0	0	stdio.h
  5	1	0	0	types.h
  6	4	0	0	errno.h
  7	2	0	0	types.h
  8	2	0	0	options.h
  9	2	0	0	log.c

 No Line Number Statements.
  Offset:                      0x6edf
  Length:                      633
  DWARF Version:               4
  Prologue Length:             627
  Minimum Instruction Length:  1
  Maximum Ops per Instruction: 1
  Initial value of 'is_stmt':  1
  Line Base:                   -10
  Line Range:                  242
  Opcode Base:                 13

 Opcodes:
  Opcode 1 has 0 args
  Opcode 2 has 1 arg
  Opcode 3 has 1 arg
  Opcode 4 has 1 arg
  Opcode 5 has 1 arg
  Opcode 6 has 0 args
  Opcode 7 has 0 args
  Opcode 8 has 0 args
  Opcode 9 has 1 arg
  Opcode 10 has 0 args
  Opcode 11 has 0 args
  Opcode 12 has 1 arg

 The Directory Table (offset 0x6efb):
  1	bazel-out/k8-fastbuild/bin/external/gmp/gmp/include
  2	bazel-out/k8-fastbuild/bin/external/sigsegv/sigsegv/include
  3	pkg/c3
  4	pkg/noun
  5	/usr/local/x86_64-linux-musl/x86_64-linux-musl/include/bits
  6	/usr/local/x86_64-linux-musl/x86_64-linux-musl/include/sys
  7	/usr/local/x86_64-linux-musl/x86_64-linux-musl/include

 The File Name Table (offset 0x702a):
  Entry	Dir	Time	Size	Name
  1	5	0	0	alltypes.h
  2	0	0	0	<built-in>
  3	7	0	0	unistd.h
  4	5	0	0	setjmp.h
  5	7	0	0	setjmp.h
  6	7	0	0	stdio.h
  7	7	0	0	signal.h
  8	5	0	0	signal.h
  9	6	0	0	time.h
  10	6	0	0	resource.h
  11	3	0	0	types.h
  12	7	0	0	errno.h
  13	3	0	0	defs.h
  14	4	0	0	types.h
  15	4	0	0	version.h
  16	5	0	0	stat.h
  17	4	0	0	options.h
  18	4	0	0	allocate.h
  19	4	0	0	events.h
  20	1	0	0	gmp.h
  21	4	0	0	jets.h
  22	7	0	0	time.h
  23	4	0	0	trace.h
  24	4	0	0	vortex.h
  25	2	0	0	sigsegv.h
  26	4	0	0	manage.c
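The mapping from DW_AT_decl_file 26 to manage.c can be checked by hand against the second Directory and File Name Tables in the dump above. The sketch below assumes, as that dump suggests, that this line table belongs to the compile unit containing u3m_pave.

In DWARF 2-4, file entries are numbered from 1. Directory index 0 means the compilation directory, which readelf does not print here, so the sketch treats it as an empty prefix.

```python
import posixpath

# Directory Table (offset 0x6efb) from the readelf rawline dump above.
DIRECTORIES = {
    1: "bazel-out/k8-fastbuild/bin/external/gmp/gmp/include",
    2: "bazel-out/k8-fastbuild/bin/external/sigsegv/sigsegv/include",
    3: "pkg/c3",
    4: "pkg/noun",
    5: "/usr/local/x86_64-linux-musl/x86_64-linux-musl/include/bits",
    6: "/usr/local/x86_64-linux-musl/x86_64-linux-musl/include/sys",
    7: "/usr/local/x86_64-linux-musl/x86_64-linux-musl/include",
}

# File Name Table (offset 0x702a): entry N is FILE_NAMES[N - 1] = (dir, name).
FILE_NAMES = [
    (5, "alltypes.h"), (0, "<built-in>"), (7, "unistd.h"), (5, "setjmp.h"),
    (7, "setjmp.h"), (7, "stdio.h"), (7, "signal.h"), (5, "signal.h"),
    (6, "time.h"), (6, "resource.h"), (3, "types.h"), (7, "errno.h"),
    (3, "defs.h"), (4, "types.h"), (4, "version.h"), (5, "stat.h"),
    (4, "options.h"), (4, "allocate.h"), (4, "events.h"), (1, "gmp.h"),
    (4, "jets.h"), (7, "time.h"), (4, "trace.h"), (4, "vortex.h"),
    (2, "sigsegv.h"), (4, "manage.c"),
]


def resolve(file_index: int, comp_dir: str = "") -> str:
    """Turn a DWARF 2-4 DW_AT_decl_file index into a path using the tables above."""
    if not 1 <= file_index <= len(FILE_NAMES):
        raise IndexError(f"no file entry {file_index}")
    dir_index, name = FILE_NAMES[file_index - 1]
    # Directory 0 is the compilation directory (not listed in the dump).
    directory = comp_dir if dir_index == 0 else DIRECTORIES[dir_index]
    return posixpath.join(directory, name)


if __name__ == "__main__":
    print(resolve(26))
```

Running it resolves entry 26 to pkg/noun/manage.c, which matches the u3m_pave DIE's decl_file/decl_line of 26/648.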

I really have no idea what's causing the bad behavior in gdb then. @fighet-parnet, any ideas? Can
you repro?

FYI, gdb version is 13.1. I also tried pinning the DWARF version with -gdwarf-5, -gdwarf-3,
etc., but no dice. I did this because I noticed that when -flto is specified, GCC defaults to
DWARF 4, while without it, it was using DWARF 3.

@barter-simsum (Member)

Ha, figured it out after reviewing https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. Will update later.

Review thread on bazel/toolchain/BUILD.bazel (outdated, resolved)

@barter-simsum (Member) commented Jun 23, 2023

Following up on the gdb weirdness I experienced: it looks like we basically need a

build:linux --linkopt="-g"

in .bazelrc.

The GNU optimization docs for -flto state: "If any of the input files at link time were built with debug info generation enabled the link will enable debug info generation as well." So I don't know why this is necessary.

Further, note this warning: "Any elaborate debug info settings like the dwarf level -gdwarf-5 need to be explicitly repeated at the linker command line and mixing different settings in different translation units is discouraged."

It doesn't look like there's any problem with the --per_file_copt hack we're using in .bazelrc for faster builds, but I guess there's potential to run into issues given that "mixing different settings in different translation units is discouraged."

Anyway, without the explicit linkopt="-g", source-level debugging in gdb is broken. With it, all is good.

P.S.

Look at these binary sizes:

No linkopt, no -flto:     9.8M
No linkopt, with -flto:   8.1M
With linkopt, with -flto: 11M

@fighet-parnet, I think you were surprised to find that enabling -flto with build --per_file_copt='pkg/.*@-flto' on Linux actually decreased the size of the binary, which is unexpected because more inlining tends to have the opposite effect. When you add build:linux --linkopt="-g" to .bazelrc, the binary is larger, as expected. So the size decrease may be explained by less debug info being present.
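The size argument above can be made concrete with relative changes. This is a minimal sketch over the three posted sizes. They are rounded human-readable "M" values, so the percentages are approximate, and no exact byte counts were given.

```python
# Binary sizes in "M" as posted above; rounded, so results are approximate.
BASELINE_M = 9.8  # no linkopt, no -flto
SIZES_M = {
    "no linkopt, -flto": 8.1,
    "linkopt -g, -flto": 11.0,
}


def relative_change(size: float, baseline: float) -> float:
    """Percent change of `size` relative to `baseline` (negative = smaller)."""
    return 100.0 * (size - baseline) / baseline


if __name__ == "__main__":
    for label, size in SIZES_M.items():
        print(f"{label}: {relative_change(size, BASELINE_M):+.1f}% vs. the no-LTO build")
    # Growth from re-enabling -g at link time, with -flto held constant.
    extra = SIZES_M["linkopt -g, -flto"] - SIZES_M["no linkopt, -flto"]
    print(f"-g at link time adds about {extra:.1f}M")
```

The LTO build without the linkopt is about 17% smaller than the baseline. Adding -g at link time puts back roughly 2.9M. This is consistent with, though it doesn't prove, the idea that the shrinkage came mostly from dropped debug info.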

@fighet-parnet (Contributor, Author)

Okay @barter-simsum, what do you think about this appropriately grug-brained way of adding copts with much finer-grained control than before?

@fighet-parnet (Contributor, Author)

Also note that this enables LTO on Linux, but only for our code, so based on my previous testing it should work fine (though I haven't built this on Linux yet).

@barter-simsum (Member) commented Jun 27, 2023

@fighet-parnet Fantastic. This looks like it solves the issue I ran into: after removing the .bazelrc debug/release configurations and moving them to a dual cc_library, none of the copts specified with --per_file_copt were passed to the vere libraries. I'll review a bit more thoroughly later and approve.

@barter-simsum (Member)

@fighet-parnet One thing I noticed: on Linux builds, the vere_library and vere_binary
macros should include -g in the linkopts. Otherwise, we'll probably break gdb source
debugging when -flto is included, as I previously experienced (and elaborated on above).

This adds a macro, "vere_library", which supports our concepts of debug and
release builds and gives finer-grained control over which copts/linkopts are
passed and when.

Takes advantage of Bazel's "compilation_mode={dbg,opt}" to control
debug/optimized builds.
@fighet-parnet (Contributor, Author)

Ah yes, for some reason I thought -flto was the missing link-time parameter. Updated.

@barter-simsum (Member)

@fighet-parnet OK, I think the following changes would make a bit more sense.

In the top-level BUILD.bazel:

1) Just define release and debug config_settings rather than thinlto, lto, and debug.

(Side note: why precisely is this necessary? It seems like we're basically aliasing compilation_mode.)

config_setting(
    name = "rel",
    values = {
        "compilation_mode": "opt"
    }
)

config_setting(
    name = "dbg",
    values = {
        "compilation_mode": "dbg"
    }
)

2) Do the macOS/Linux dispatch in bazel/common_settings.bzl instead of the
top-level BUILD.bazel.

Something like this:

def vere_library(...):
    ...
    "//:dbg": ...,
    "//:rel": ...
        "@platforms//os:linux": ["-flto"],
        "@platforms//os:macos": ["-flto=thin"]
    ...
    etc

@fighet-parnet (Contributor, Author) commented Jun 29, 2023

https://bazel.build/docs/configurable-attributes#and-chaining
As seen here, we actually can't dispatch with AND in a select statement (rel AND mac), thanks to the infinite wisdom of Google, hence the need for the two configs per platform.

I could move the config_settings (or just the LTO ones) out of the root BUILD, but that seemed like a good spot for them. We're aliasing compilation_mode because there doesn't seem to be a way to dispatch on compilation_mode directly in a select statement.
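To see why one config_setting per (platform, compilation mode) pair is needed, here is a toy model in plain Python. It is not Starlark or Bazel code. It models a select() that must resolve to a single matching key.

Several parts of the model are assumptions. The "os" key stands in for the @platforms//os constraint. The "lto"/"thinlto" settings borrow the names mentioned in the review. Debug builds are assumed to get no LTO flags. Real Bazel also accepts overlapping matches when one setting strictly specializes another, which this sketch does not model.

```python
from typing import NamedTuple


class ConfigSetting(NamedTuple):
    """Toy stand-in for a Bazel config_setting: matches when every value matches."""
    name: str
    values: tuple  # ((flag, expected_value), ...)

    def matches(self, build: dict) -> bool:
        return all(build.get(flag) == want for flag, want in self.values)


def select(branches, build, default=None):
    """Pick the value of the single matching branch, loosely like Bazel's select().

    This toy version rejects any overlap; real Bazel allows it when one
    matching config_setting is a strict specialization of the others.
    """
    matches = [value for setting, value in branches if setting.matches(build)]
    if len(matches) > 1:
        raise ValueError("ambiguous select(): more than one config_setting matches")
    if matches:
        return matches[0]
    if default is None:
        raise ValueError("no config_setting matches and no default was given")
    return default


# Hypothetical combined settings: each one encodes "platform AND mode" itself.
LTO = ConfigSetting("lto", (("os", "linux"), ("compilation_mode", "opt")))
THINLTO = ConfigSetting("thinlto", (("os", "macos"), ("compilation_mode", "opt")))
LTO_COPTS = [(LTO, ["-flto"]), (THINLTO, ["-flto=thin"])]

# Separate settings: a single select() can only pick one key, not "rel AND linux".
REL = ConfigSetting("rel", (("compilation_mode", "opt"),))
LINUX = ConfigSetting("linux", (("os", "linux"),))

if __name__ == "__main__":
    print(select(LTO_COPTS, {"os": "linux", "compilation_mode": "opt"}, default=[]))
    print(select(LTO_COPTS, {"os": "macos", "compilation_mode": "dbg"}, default=[]))
    try:
        select([(REL, ["-flto"]), (LINUX, ["-flto"])],
               {"os": "linux", "compilation_mode": "opt"}, default=[])
    except ValueError as err:
        print("error:", err)
```

With combined settings, each build matches at most one key. With separate "rel" and "linux" keys, an optimized Linux build matches both, so the model reports an ambiguous match instead of ANDing them.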

@barter-simsum (Member)

@fighet-parnet Ah, that's too bad. I didn't realize select statements had that limitation. And
since this is the only use case of a dispatch on platform + , I
don't want to add a dependency on "Bazel Skylib" for config_setting_group as
suggested by the docs you linked.

LGETM

@barter-simsum (Member)

Oh, one last thing: INSTALL.md needs to be updated with the new build commands.

s/--config=dbg/--compilation_mode=dbg

I'd update this myself, but it's on a branch you own.

@barter-simsum mentioned this pull request Jul 3, 2023
@barter-simsum self-requested a review July 18, 2023 15:56

@barter-simsum (Member) left a comment

Aed

@barter-simsum merged commit 50d602e into urbit:develop Jul 18, 2023 (5 checks passed)