llvm.sdiv results in ld.lld error: undefined symbol __divdi3 #261

xlinsist · 2023-07-15T17:27:18Z

There is an im2col implementation of conv_2d_nchw_fchw (conv2d-nchw-fchw-im2col.mlir) that contains llvm.sdiv after lowering, which causes the following undefined-symbol errors for __divdi3 (and __moddi3) at link time:

bash-5.2$ time mill -i 'tests.run[conv2d-nchw-fchw-im2col.mlir].run'
[7/438] tests.cases.buddy[conv2d-nchw-fchw-im2col.mlir].elf 
ld.lld: error: undefined symbol: __divdi3
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
>>> referenced 1 more times

ld.lld: error: undefined symbol: __moddi3
>>> referenced by LLVMDialectModule
>>>               /tmp/nix-shell.gop5Y7/conv2d-nchw-fchw-im2col-d7bfd8.o:(test)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
1 targets failed
tests.cases.buddy[conv2d-nchw-fchw-im2col.mlir].elf os.SubprocessException: CommandResult 1

    os.proc.call(ProcessOps.scala:85)
    ammonite.$file.build$tests$cases$Case.$anonfun$elf$5(build.sc:330)
    mill.define.Task$TraverseCtx.evaluate(Task.scala:380)
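For context: __divdi3 and __moddi3 are the runtime helpers (provided by libgcc and by compiler-rt's builtins library) that LLVM calls for a signed 64-bit sdiv / srem when the target has no native 64-bit divide, which is typical of 32-bit targets. The errors above mean the i64 division was turned into such calls but nothing providing them was linked, so linking a builtins library for the target is the usual fix. The C sketch below only illustrates what those helpers compute, using a plain shift-subtract loop; soft_sdiv64 and soft_srem64 are hypothetical names (a real replacement would have to be exported as __divdi3 / __moddi3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restoring (shift-subtract) unsigned division, one quotient bit per step.
   Requires 0 < d <= 2^63 so that (r << 1) cannot overflow; that covers
   every magnitude of an int64_t divisor. */
static uint64_t udivmod64(uint64_t n, uint64_t d, uint64_t *rem) {
    assert(d != 0 && d <= ((uint64_t)1 << 63));
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= d) {
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }
    if (rem != NULL) *rem = r;
    return q;
}

/* |x| as an unsigned value; well defined even for INT64_MIN. */
static uint64_t magnitude(int64_t x) {
    return x < 0 ? 0 - (uint64_t)x : (uint64_t)x;
}

/* Hypothetical stand-in for __divdi3: signed division truncating toward zero. */
int64_t soft_sdiv64(int64_t a, int64_t b) {
    assert(!(a == INT64_MIN && b == -1));          /* overflow, UB for sdiv too */
    uint64_t q = udivmod64(magnitude(a), magnitude(b), NULL);
    if (q > (uint64_t)INT64_MAX) return INT64_MIN; /* only INT64_MIN / 1 */
    return ((a < 0) != (b < 0)) ? -(int64_t)q : (int64_t)q;
}

/* Hypothetical stand-in for __moddi3: the remainder takes the dividend's sign. */
int64_t soft_srem64(int64_t a, int64_t b) {
    uint64_t r;
    udivmod64(magnitude(a), magnitude(b), &r);
    return a < 0 ? -(int64_t)r : (int64_t)r;
}
```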

Are there any suggestions to fix or bypass it?

conv2d-nchw-fchw-im2col.mlir is implemented as follows:

// BUDDY-OPT
// --convert-vector-to-scf
// --convert-scf-to-cf
// --expand-strided-metadata
// --lower-affine
// --convert-vector-to-llvm
// --memref-expand
// --arith-expand
// --convert-arith-to-llvm
// --finalize-memref-to-llvm
// --convert-math-to-llvm
// --convert-func-to-llvm
// --reconcile-unrealized-casts
// BUDDY-OPT-END

#map = affine_map<(d0) -> (d0 floordiv 9)>
#map1 = affine_map<(d0, d1) -> (d0 floordiv 56 + (d1 mod 9) floordiv 3)>
#map2 = affine_map<(d0, d1) -> (d0 + d1 - (d0 floordiv 56) * 56 - (d1 floordiv 3) * 3)>

memref.global "private" @gv_input_i32 : memref<1x64x58x58xi32>
memref.global "private" @gv_kernel_i32 : memref<64x64x3x3xi32>
memref.global "private" @gv_output_i32 : memref<1x64x56x56xi32>
memref.global "private" @gv_input_collapse_i32 : memref<1x576x3136xi32>

func.func @test() -> i32 {
  
  %input1 = memref.get_global @gv_input_i32 : memref<1x64x58x58xi32>
  %input2 = memref.get_global @gv_kernel_i32 : memref<64x64x3x3xi32>
  %output = memref.get_global @gv_output_i32 : memref<1x64x56x56xi32>
  %input_collapse = memref.get_global @gv_input_collapse_i32 : memref<1x576x3136xi32>
  
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c576 = arith.constant 576 : index // 576 = 64 * 3 * 3 = kernel's c*h*w
  %c3136 = arith.constant 3136 : index // 3136 = 56 * 56 = output's h*w
  %c64 = arith.constant 64 : index
  %kernel_collapse = memref.collapse_shape %input2 [[0], [1, 2, 3]] : memref<64x64x3x3xi32> into memref<64x576xi32>
  %output_collapse = memref.collapse_shape %output [[0], [1], [2, 3]] : memref<1x64x56x56xi32> into memref<1x64x3136xi32>

  scf.for %idx0 = %c0 to %c1 step %c1 {
    scf.for %idx1 = %c0 to %c576 step %c1 {
      scf.for %idx2 = %c0 to %c3136 step %c1 {
        %0 = affine.apply #map(%idx1)
        %1 = affine.apply #map1(%idx2, %idx1)
        %2 = affine.apply #map2(%idx2, %idx1)
        %3 = memref.load %input1[%idx0, %0, %1, %2] : memref<1x64x58x58xi32>
        memref.store %3, %input_collapse[%idx0, %idx1, %idx2] : memref<1x576x3136xi32>
      }
    }
  }

  scf.for %idx0 = %c0 to %c1 step %c1 {
    scf.for %idx1 = %c0 to %c64 step %c1 {
      scf.for %idx2 = %c0 to %c3136 step %c1 {
        scf.for %idx3 = %c0 to %c576 step %c1 {
          %0 = memref.load %kernel_collapse[%idx1, %idx3] : memref<64x576xi32>
          %1 = memref.load %input_collapse[%idx0, %idx3, %idx2] : memref<1x576x3136xi32>
          %2 = memref.load %output_collapse[%idx0, %idx1, %idx2] : memref<1x64x3136xi32>
          %3 = arith.muli %0, %1 : i32
          %4 = arith.addi %3, %2 : i32
          memref.store %4, %output_collapse[%idx0, %idx1, %idx2] : memref<1x64x3136xi32>
        }
      }
    }
  }

  %result_mem = memref.expand_shape %output_collapse [[0], [1], [2, 3]] : memref<1x64x3136xi32> into memref<1x64x56x56xi32>
  %result = vector.load %result_mem[%c0, %c0, %c0, %c0] : memref<1x64x56x56xi32>, vector<8xi32>

  %mask_res = arith.constant dense<1> : vector<8xi1>
  %c1_i32 = arith.constant 1 : i32
  %evl = arith.constant 8 : i32
  %res_reduce_add_mask_driven = "llvm.intr.vp.reduce.add" (%c1_i32, %result, %mask_res, %evl) :
        (i32, vector<8xi32>, vector<8xi1>, i32) -> i32

  return %res_reduce_add_mask_driven : i32
}
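The llvm.sdiv (and the srem behind __moddi3) come from the floordiv and mod in #map, #map1 and #map2: --lower-affine expands each of them into a signed division or remainder on index values, plus a compare and select so the result rounds toward negative infinity. A rough C model of that expansion for a positive constant divisor follows; the helper names are hypothetical, and the exact sequence buddy-opt emits may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Model of affine floordiv for a constant b > 0: one truncating signed
   division, with a select on each side so the result rounds toward -inf. */
static int64_t affine_floordiv(int64_t a, int64_t b) {
    assert(b > 0);
    int negative = a < 0;
    int64_t dividend = negative ? -1 - a : a;
    int64_t quotient = dividend / b;            /* ends up as llvm.sdiv i64 */
    return negative ? -1 - quotient : quotient;
}

/* Model of affine mod for a constant b > 0: result is in [0, b). */
static int64_t affine_mod(int64_t a, int64_t b) {
    assert(b > 0);
    int64_t r = a % b;                          /* ends up as llvm.srem i64 */
    return r < 0 ? r + b : r;
}

/* The three maps as used in the first loop nest:
   map_c(%idx1), map_h(%idx2, %idx1), map_w(%idx2, %idx1). */
static int64_t map_c(int64_t d0) { return affine_floordiv(d0, 9); }          /* #map  */
static int64_t map_h(int64_t d0, int64_t d1) {                                /* #map1 */
    return affine_floordiv(d0, 56) + affine_floordiv(affine_mod(d1, 9), 3);
}
static int64_t map_w(int64_t d0, int64_t d1) {                                /* #map2 */
    return d0 + d1 - affine_floordiv(d0, 56) * 56 - affine_floordiv(d1, 3) * 3;
}
```

Because %idx1 and %idx2 are never negative, the negative branches are dead here, and #map2 reduces to (d0 mod 56) + (d1 mod 3), i.e. output column plus kernel column. At the last iteration (%idx2 = 3135, %idx1 = 575) both map_h and map_w give 57, the last valid index of the 58x58 input.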

xlinsist · 2023-07-21T11:09:33Z

Further investigation shows that the sdiv generated in this demo operates on i64: array indices in MLIR have the index type, which is converted to i64 by default during lowering. Since the vector repo only supports i32 for now, the i64 division cannot be handled natively, so this implementation requires specific modifications.
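One possible modification, sketched here as an assumption rather than a tested fix: since every operand of those divisions is a non-negative loop index, the first loop nest can iterate over the decomposed coordinates (c, kh, kw, oh, ow) directly and rebuild %idx1 and %idx2 with multiply-add, leaving --lower-affine no floordiv or mod to expand. The C model below (names hypothetical) checks that this enumeration agrees with #map, #map1 and #map2 over the whole 576 x 3136 iteration space.

```c
#include <assert.h>
#include <stdint.h>

/* Shapes from the demo; the batch loop %idx0 has one iteration and is omitted. */
enum { IC = 64, KH = 3, KW = 3, OH = 56, OW = 56 };

/* Walks (c, kh, kw, oh, ow) directly, rebuilds the flat indices with
   multiply-add only, and counts how often the div/mod-based affine maps
   disagree with the coordinates the nest already knows. 0 = equivalent. */
static long count_im2col_mismatches(void) {
    long bad = 0;
    for (int64_t c = 0; c < IC; ++c)
        for (int64_t kh = 0; kh < KH; ++kh)
            for (int64_t kw = 0; kw < KW; ++kw)
                for (int64_t oh = 0; oh < OH; ++oh)
                    for (int64_t ow = 0; ow < OW; ++ow) {
                        int64_t idx1 = (c * KH + kh) * KW + kw;   /* < 576  */
                        int64_t idx2 = oh * OW + ow;              /* < 3136 */
                        /* #map, #map1, #map2; operands are non-negative,
                           so C '/' and '%' equal floordiv and mod here. */
                        int64_t h = idx2 / 56 + (idx1 % 9) / 3;
                        int64_t w = idx2 + idx1 - (idx2 / 56) * 56 - (idx1 / 3) * 3;
                        if (idx1 / 9 != c || h != oh + kh || w != ow + kw) ++bad;
                    }
    return bad;
}
```

In MLIR terms this would mean five nested scf.for loops whose load indices are oh + kh and ow + kw (arith.addi) and whose store indices are built with arith.muli / arith.addi. The other direction, lowering index to i32 so the division becomes a 32-bit one, also looks safe for this demo, assuming the target has a native 32-bit divide: the largest buffer, @gv_input_collapse_i32, has 1 x 576 x 3136 = 1,806,336 elements, far below 2^31. I believe several of the *-to-llvm conversion passes accept an index-bitwidth option for this, but it would have to be set consistently across all of them.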