llvm.sdiv results in ld.lld error: undefined symbol __divdi3 #261

Open
xlinsist opened this issue Jul 15, 2023 · 1 comment

Comments

@xlinsist
Contributor

The conv_2d_nchw_fchw im2col implementation (conv2d-nchw-fchw-im2col.mlir) contains llvm.sdiv operations after lowering, which cause the following link errors for __divdi3 and __moddi3:

bash-5.2$ time mill -i 'tests.run[conv2d-nchw-fchw-im2col.mlir].run'
[7/438] tests.cases.buddy[conv2d-nchw-fchw-im2col.mlir].elf 
ld.lld: error: undefined symbol: __divdi3
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
>>> referenced 1 more times

ld.lld: error: undefined symbol: __moddi3
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
1 targets failed
tests.cases.buddy[conv2d-nchw-fchw-im2col.mlir].elf os.SubprocessException: CommandResult 1

    os.proc.call(ProcessOps.scala:85)
    ammonite.$file.build$tests$cases$Case.$anonfun$elf$5(build.sc:330)
    mill.define.Task$TraverseCtx.evaluate(Task.scala:380)

Are there any suggestions to fix or bypass it?

conv2d-nchw-fchw-im2col.mlir is implemented as follows:

// BUDDY-OPT
// --convert-vector-to-scf
// --convert-scf-to-cf
// --expand-strided-metadata
// --lower-affine
// --convert-vector-to-llvm
// --memref-expand
// --arith-expand
// --convert-arith-to-llvm
// --finalize-memref-to-llvm
// --convert-math-to-llvm
// --convert-func-to-llvm
// --reconcile-unrealized-casts
// BUDDY-OPT-END

#map = affine_map<(d0) -> (d0 floordiv 9)>
#map1 = affine_map<(d0, d1) -> (d0 floordiv 56 + (d1 mod 9) floordiv 3)>
#map2 = affine_map<(d0, d1) -> (d0 + d1 - (d0 floordiv 56) * 56 - (d1 floordiv 3) * 3)>

memref.global "private" @gv_input_i32 : memref<1x64x58x58xi32>
memref.global "private" @gv_kernel_i32 : memref<64x64x3x3xi32>
memref.global "private" @gv_output_i32 : memref<1x64x56x56xi32>
memref.global "private" @gv_input_collapse_i32 : memref<1x576x3136xi32>

func.func @test() -> i32 {
  
  %input1 = memref.get_global @gv_input_i32 : memref<1x64x58x58xi32>
  %input2 = memref.get_global @gv_kernel_i32 : memref<64x64x3x3xi32>
  %output = memref.get_global @gv_output_i32 : memref<1x64x56x56xi32>
  %input_collapse = memref.get_global @gv_input_collapse_i32 : memref<1x576x3136xi32>
  
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c576 = arith.constant 576 : index // 576 = 64 * 3 * 3 = kernel's c*h*w
  %c3136 = arith.constant 3136 : index // 3136 = 56 * 56 = output's h*w
  %c64 = arith.constant 64 : index
  %kernel_collapse = memref.collapse_shape %input2 [[0], [1, 2, 3]] : memref<64x64x3x3xi32> into memref<64x576xi32>
  %output_collapse = memref.collapse_shape %output [[0], [1], [2, 3]] : memref<1x64x56x56xi32> into memref<1x64x3136xi32>

  scf.for %idx0 = %c0 to %c1 step %c1 {
    scf.for %idx1 = %c0 to %c576 step %c1 {
      scf.for %idx2 = %c0 to %c3136 step %c1 {
        %0 = affine.apply #map(%idx1)
        %1 = affine.apply #map1(%idx2, %idx1)
        %2 = affine.apply #map2(%idx2, %idx1)
        %3 = memref.load %input1[%idx0, %0, %1, %2] : memref<1x64x58x58xi32>
        memref.store %3, %input_collapse[%idx0, %idx1, %idx2] : memref<1x576x3136xi32>
      }
    }
  }

  scf.for %idx0 = %c0 to %c1 step %c1 {
    scf.for %idx1 = %c0 to %c64 step %c1 {
      scf.for %idx2 = %c0 to %c3136 step %c1 {
        scf.for %idx3 = %c0 to %c576 step %c1 {
          %0 = memref.load %kernel_collapse[%idx1, %idx3] : memref<64x576xi32>
          %1 = memref.load %input_collapse[%idx0, %idx3, %idx2] : memref<1x576x3136xi32>
          %2 = memref.load %output_collapse[%idx0, %idx1, %idx2] : memref<1x64x3136xi32>
          %3 = arith.muli %0, %1 : i32
          %4 = arith.addi %3, %2 : i32
          memref.store %4, %output_collapse[%idx0, %idx1, %idx2] : memref<1x64x3136xi32>
        }
      }
    }
  }

  %result_mem = memref.expand_shape %output_collapse [[0], [1], [2, 3]] : memref<1x64x3136xi32> into memref<1x64x56x56xi32>
  %result = vector.load %result_mem[%c0, %c0, %c0, %c0] : memref<1x64x56x56xi32>, vector<8xi32>

  %mask_res = arith.constant dense<1> : vector<8xi1>
  %c1_i32 = arith.constant 1 : i32
  %evl = arith.constant 8 : i32
  %res_reduce_add_mask_driven = "llvm.intr.vp.reduce.add" (%c1_i32, %result, %mask_res, %evl) :
        (i32, vector<8xi32>, vector<8xi1>, i32) -> i32

  return %res_reduce_add_mask_driven : i32
}
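
For anyone reading the three affine maps, here is a minimal C sketch of what they appear to compute: the first loop nest treats %idx1 as a flattened (channel, kh, kw) triple and %idx2 as a flattened (oh, ow) pair, and the maps recover (channel, oh + kh, ow + kw) as the input coordinates. The function names and the checker are hypothetical helpers, not part of the test case; C's truncating `/` and `%` stand in for `floordiv` and `mod`, which is only valid because every index here is non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Shapes from the test case: 64 channels, 3x3 kernel, 56x56 output, 58x58 input. */
enum { CH = 64, KH = 3, KW = 3, OH = 56, OW = 56, IH = 58, IW = 58 };

/* #map: d0 floordiv 9, applied to %idx1. */
int64_t map_channel(int64_t idx1) { return idx1 / 9; }

/* #map1: d0 floordiv 56 + (d1 mod 9) floordiv 3, applied to (%idx2, %idx1). */
int64_t map_row(int64_t idx2, int64_t idx1) { return idx2 / 56 + (idx1 % 9) / 3; }

/* #map2: d0 + d1 - (d0 floordiv 56) * 56 - (d1 floordiv 3) * 3. */
int64_t map_col(int64_t idx2, int64_t idx1) {
    return idx2 + idx1 - (idx2 / 56) * 56 - (idx1 / 3) * 3;
}

/* Hypothetical checker: counts index pairs where the maps disagree with the
   explicit decomposition (channel, oh + kh, ow + kw). */
long count_map_mismatches(void) {
    long bad = 0;
    for (int64_t c = 0; c < CH; ++c)
    for (int64_t kh = 0; kh < KH; ++kh)
    for (int64_t kw = 0; kw < KW; ++kw)
    for (int64_t oh = 0; oh < OH; ++oh)
    for (int64_t ow = 0; ow < OW; ++ow) {
        int64_t idx1 = (c * KH + kh) * KW + kw; /* row of the im2col matrix */
        int64_t idx2 = oh * OW + ow;            /* column of the im2col matrix */
        int64_t row = map_row(idx2, idx1);
        int64_t col = map_col(idx2, idx1);
        assert(row < IH && col < IW);           /* stays inside the 58x58 input */
        if (map_channel(idx1) != c || row != oh + kh || col != ow + kw) ++bad;
    }
    return bad;
}
```

Calling `count_map_mismatches()` from a small `main` should return 0 if this reading of the maps is right. Every `/` and `%` above corresponds to a division or remainder that `--lower-affine` has to emit, which is where the llvm.sdiv operations come from.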

@xlinsist
Contributor Author

Further observation shows that the sdiv operation generated in this demo has type i64: array indices in MLIR are represented by the index type, which is converted to i64 automatically during lowering. Since the vector repo only supports i32 for now, this implementation requires specific modifications.
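
To illustrate why narrowing the index arithmetic looks safe for this particular test, here is a small C sketch. It assumes the target is a 32-bit one, where a signed 64-bit `/` or `%` is typically compiled into a call to the runtime helpers `__divdi3` and `__moddi3`, while the 32-bit forms are not. The function names are hypothetical, and the sketch only compares the two widths on the host; it does not show how to change the MLIR lowering itself.

```c
#include <assert.h>
#include <stdint.h>

/* Loop bounds from the test case: %idx1 < 576, %idx2 < 3136. */
enum { IDX1_END = 576, IDX2_END = 3136 };

/* #map1 evaluated as the lowered code does today, in 64-bit integers.
   On a 32-bit target these operators are the usual source of
   __divdi3 / __moddi3 references. */
int64_t row_i64(int64_t idx2, int64_t idx1) {
    return idx2 / 56 + (idx1 % 9) / 3;
}

/* The same map in 32-bit integers (hypothetical i32 variant). */
int32_t row_i32(int32_t idx2, int32_t idx1) {
    return idx2 / 56 + (idx1 % 9) / 3;
}

/* Counts index pairs for which the 32-bit result differs from the 64-bit one. */
long count_narrowing_mismatches(void) {
    long bad = 0;
    for (int64_t idx1 = 0; idx1 < IDX1_END; ++idx1) {
        for (int64_t idx2 = 0; idx2 < IDX2_END; ++idx2) {
            /* Narrowing is only meaningful if the indices fit in i32. */
            assert(idx1 <= INT32_MAX && idx2 <= INT32_MAX);
            int32_t narrow = row_i32((int32_t)idx2, (int32_t)idx1);
            if (row_i64(idx2, idx1) != (int64_t)narrow) ++bad;
        }
    }
    return bad;
}
```

`count_narrowing_mismatches()` should return 0 for these bounds. The largest buffer in the test, 1x576x3136, holds 1,806,336 elements, so linear offsets also fit in a signed 32-bit integer. This would not hold for arbitrary shapes.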
