Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: use the LLVM Attributor to move allocations to the stack #4582

Open
eliasnaur opened this issue Nov 4, 2024 · 0 comments
Open

Comments

@eliasnaur
Copy link
Contributor

eliasnaur commented Nov 4, 2024

As suggested by @aykevl, the LLVM Attributor's HeapToStack optimization pass potentially replaces TinyGo's limited ability to prove allocations safe to move to the stack.

I've played with the attributor in LLVM 18.1 and so far have limited luck. First and foremost, merely replacing the function-attrs pass with the attributor pass,

diff --git a/transform/optimizer.go b/transform/optimizer.go
index 54f9762b..f600e1e5 100644
--- a/transform/optimizer.go
+++ b/transform/optimizer.go
@@ -53,7 +51,7 @@ func Optimize(mod llvm.Module, config *compileopts.Config) []error {
                // Run some preparatory passes for the Go optimizer.
                po := llvm.NewPassBuilderOptions()
                defer po.Dispose()
-               optPasses := "globaldce,globalopt,ipsccp,instcombine<no-verify-fixpoint>,adce,function-attrs"
+               optPasses := "globaldce,globalopt,ipsccp,instcombine<no-verify-fixpoint>,adce,attributor"
                if llvmutil.Version() < 18 {
                        // LLVM 17 doesn't have the no-verify-fixpoint flag.
                        optPasses = "globaldce,globalopt,ipsccp,instcombine,adce,function-attrs"

results in significant slowdown in compilation of a medium sized test program, from a few seconds to almost 2 minutes. I don't know whether the attributor can be run incrementally to amortize the time cost. There is also the much faster attributor-light pass, but I haven't seen it eliminate any allocations, not even for LLVM's own heap_to_stack.ll tests. I suspect the H2S pass doesn't work or is disabled in attributor-light.

Then, I failed to eliminate allocations for "append" style functions (append, strconv.Append* etc.). For example:

declare dso_local noalias nonnull ptr @runtime.alloc(i64 %size, ptr %layout) local_unnamed_addr allockind("alloc,zeroed") allocsize(0)

declare fastcc void @runtime.printuint8(i8 %n)

declare fastcc void @runtime.memmove(ptr nocapture nofree %dst, ptr nocapture nofree %src) 

define internal fastcc { ptr, i64, i64 } @runtime.sliceGrow(ptr %oldBuf) {
entry:
  %0 = insertvalue { ptr, i64, i64 } zeroinitializer, ptr %oldBuf, 0
  ret { ptr, i64, i64 } %0
}

define fastcc void @escaping_slice() unnamed_addr {
entry:
  %makeslice = call ptr @runtime.alloc(i64 noundef 20, ptr inttoptr (i64 3 to ptr))

  %elemsBuf.priv = alloca i8, align 1

  %1 = call fastcc { ptr, i64, i64 } @runtime.sliceGrow(ptr %makeslice)

  %2 = extractvalue { ptr, i64, i64 } %1, 0
  call fastcc void @runtime.memmove(ptr %2, ptr %elemsBuf.priv)

  ; inlined and simplified print.
  %4 = load i8, ptr %makeslice, align 1
  call fastcc void @runtime.printuint8(i8 %4)
  ret void
}

define fastcc void @stack_slice() unnamed_addr {
entry:
  %makeslice = call ptr @runtime.alloc(i64 noundef 20, ptr inttoptr (i64 3 to ptr))

  %elemsBuf.priv = alloca i8, align 1

  ; This is runtime.sliceGrow inlined, the only difference to escaping_slice.
  %1 = insertvalue { ptr, i64, i64 } zeroinitializer, ptr %makeslice, 0

  %2 = extractvalue { ptr, i64, i64 } %1, 0
  call fastcc void @runtime.memmove(ptr %2, ptr %elemsBuf.priv)

  ; inlined and simplified print.
  %4 = load i8, ptr %makeslice, align 1
  call fastcc void @runtime.printuint8(i8 %4)
  ret void
}

is a reduced form of the translation of the program

package main

func main() {
	buf := make([]byte, 20)
	buf = append(buf[:0], 'g')
	for _, c := range buf {
		println(c)
	}
}

The result of running the same optimization passes that TinyGo runs:

$ opt -S -passes='globaldce,globalopt,ipsccp,instcombine<no-verify-fixpoint>,attributor' ~/Downloads/alloca.ll
; ModuleID = '/Users/a/Downloads/alloca.ll'
source_filename = "/Users/a/Downloads/alloca.ll"

; Function Attrs: allockind("alloc,zeroed") allocsize(0)
declare dso_local noalias nonnull ptr @runtime.alloc(i64, ptr) local_unnamed_addr #0

declare fastcc void @runtime.printuint8(i8) local_unnamed_addr

declare fastcc void @runtime.memmove(ptr nocapture nofree, ptr nocapture nofree) local_unnamed_addr

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define internal fastcc { ptr, i64, i64 } @runtime.sliceGrow(ptr noalias nofree noundef nonnull readnone dereferenceable(20) %oldBuf) unnamed_addr #1 {
entry:
  %0 = insertvalue { ptr, i64, i64 } zeroinitializer, ptr %oldBuf, 0
  ret { ptr, i64, i64 } %0
}

define fastcc void @escaping_slice() unnamed_addr {
entry:
  %makeslice = call nonnull dereferenceable(20) ptr @runtime.alloc(i64 noundef 20, ptr noundef nonnull inttoptr (i64 3 to ptr))
  %elemsBuf.priv = alloca i8, align 1
  %0 = call fastcc { ptr, i64, i64 } @runtime.sliceGrow(ptr noalias nofree noundef nonnull readnone dereferenceable(20) %makeslice) #3
  %1 = extractvalue { ptr, i64, i64 } %0, 0
  call fastcc void @runtime.memmove(ptr %1, ptr nocapture noundef nonnull %elemsBuf.priv)
  %2 = load i8, ptr %makeslice, align 1
  call fastcc void @runtime.printuint8(i8 %2)
  ret void
}

define fastcc void @stack_slice() unnamed_addr {
entry:
  %makeslice.h2s = alloca i8, i64 20, align 1
  call void @llvm.memset.p0.i64(ptr %makeslice.h2s, i8 0, i64 20, i1 false)
  %elemsBuf.priv = alloca i8, align 1
  call fastcc void @runtime.memmove(ptr nocapture nofree noundef nonnull %makeslice.h2s, ptr nocapture noundef nonnull %elemsBuf.priv)
  %0 = load i8, ptr %makeslice.h2s, align 1
  call fastcc void @runtime.printuint8(i8 %0)
  ret void
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #2

attributes #0 = { allockind("alloc,zeroed") allocsize(0) }
attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #3 = { nounwind memory(none) }

Note how stack_slice has its allocation eliminated, whereas escaping_slice hasn't. The only difference between the functions is the call to sliceGrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant