Releases: ROCm/MISA

MISA v2022.03.02

02 Mar 05:52

This release is mainly for navi21 NCHWc kernels.

  • support navi21 NCHWc kernel
  • support fp16x8, fp16x4, int8x16, int8x8, int8x4, int4x32, int4x16, int4x8
  • filter layout supports both KCYXc and CYXKc
  • support tile-based conv in NCHWc kernel
  • change name from iGEMMgen to MISA
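
For readers unfamiliar with vectorized layouts, the sketch below shows one common reading of the layout names above. It is an assumption, not taken from the MISA sources: NCHWc is treated as N x (C/vec) x H x W x vec, with vec consecutive channels packed innermost (for example vec=8 for fp16x8), and the two filter layouts are read the same way. The function names are hypothetical.

```python
# Illustrative only: assumes the trailing "c" is a vector of `vec` consecutive
# channels stored innermost. This interpretation is not confirmed by the notes.

def nchwc_offset(n, c, h, w, C, H, W, vec):
    """Element offset of logical (n, c, h, w) in an N x (C/vec) x H x W x vec buffer."""
    assert C % vec == 0, "sketch assumes C is a multiple of the vector length"
    c_blk, c_in = divmod(c, vec)
    return (((n * (C // vec) + c_blk) * H + h) * W + w) * vec + c_in


def kcyxc_offset(k, c, y, x, C, Y, X, vec):
    """Filter offset assuming KCYXc means K x (C/vec) x Y x X x vec."""
    c_blk, c_in = divmod(c, vec)
    return (((k * (C // vec) + c_blk) * Y + y) * X + x) * vec + c_in


def cyxkc_offset(k, c, y, x, K, Y, X, vec):
    """Filter offset assuming CYXKc means (C/vec) x Y x X x K x vec."""
    c_blk, c_in = divmod(c, vec)
    return (((c_blk * Y + y) * X + x) * K + k) * vec + c_in
```

Under this reading, the two filter layouts differ only in where the K dimension sits relative to C/vec, Y and X. The innermost vector of channels is contiguous in both.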

generator v2021.10.17

17 Oct 13:12
92dd200
  • support gfx90a NHWC fp16 alt implementation
  • support gfx90a NHWC bf16

generator v2021.07.22

22 Jul 10:05
17c305c
  • support NHWC fp32/fp16 fwd/bwd/wrw xdlops
  • support gfx90a NHWC fp32/fp16 fwd/bwd/wrw
  • reorganize Python code structure

generator v2021.05.25

25 May 14:12
f9937a2

This release mainly adds a feature named "global memory access pattern", or gmap for short.

As the name suggests, gmap dumps the memory access pattern of the input/weight/output tensors, organized by individual block and by individual read/write request.

This feature is controlled by the environment variable IGEMM_DUMP_GMAP. Example usage:

python3 igemm_codegen.py config/igemm_bwd_gtc_gfx908_nhwc_fp16.config
cd out/
IGEMM_DUMP_GMAP=1 ./conv_driver.exe convfp16 -n 2 -c 1024 -H 40 -W 52 -k 512 -y 1 -x 1 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -t 1 -F 2 --in_layout NHWC --fil_layout NHWC --out_layout NHWC

Currently only NHWC fwd/bwd with fp32/fp16 is supported. Support for more layouts and precisions is to be added.
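
To help interpret a dump, it can be useful to know the tensor shapes involved in the example command. The sketch below computes the output spatial size with the standard convolution formula. It assumes the driver flags follow the MIOpenDriver convention (-H/-W input size, -y/-x filter size, -p/-q padding, -u/-v stride, -l/-j dilation), and the helper name is hypothetical.

```python
# Illustrative helper, not part of the generator: standard convolution
# output-size formula, applied per spatial dimension.

def conv_out_size(in_size, pad, dilation, filt, stride):
    """Output length along one spatial dimension of a convolution."""
    return (in_size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1


# Values taken from the example command (flag meanings are an assumption):
# -H 40 -W 52 -y 1 -x 1 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1
ho = conv_out_size(40, pad=0, dilation=1, filt=1, stride=2)
wo = conv_out_size(52, pad=0, dilation=1, filt=1, stride=2)
print(ho, wo)  # 20 26
```

With -n 2 and -k 512, the output tensor in this example would then be 2 x 20 x 26 x 512 in NHWC order.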

generator v2021.05.18

18 May 07:48

New Features

  • support NCHW fp32 fwd/bwd/wrw xdlops
  • support NHWC fp32 fwd/bwd xdlops
  • support NHWC fp16 fwd/bwd xdlops

generator v0.5.0

31 Dec 06:48
b9d3a22

New Features

  • support fp32 NCHW on gfx908, fwd/bwd/wrw direction.
  • support group conv in all three directions.
  • support generating a single kernel based on the input config.
  • support auto generation based on sequence mode in the config file
  • support GPU reference kernel as verification backend; much faster than CPU verification.
  • fwd/bwd use magic numbers for integer division
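
The release notes do not say which magic-number scheme the kernels use. The sketch below shows one widely used variant (a precomputed 32-bit multiplier plus a shift) to illustrate how division by a runtime-constant divisor can be replaced with a multiply, an add and a shift. The function names are hypothetical, and this should be read as an illustration of the technique, not as the generator's actual code.

```python
# Illustrative sketch of magic-number division for unsigned integers.
# Host side: precompute (magic, shift) once per divisor.
# Device side: replace n // d with a multiply-high, an add and a shift.

def magic_div_u32_gen(d):
    """Return (magic, shift) for a divisor 1 <= d <= 2**31."""
    assert 1 <= d <= 2**31
    shift = 0
    while (1 << shift) < d:  # shift = ceil(log2(d))
        shift += 1
    magic = ((1 << 32) * ((1 << shift) - d)) // d + 1
    assert magic < (1 << 32)  # fits in a 32-bit register
    return magic, shift


def magic_div_u32(n, magic, shift):
    """Compute n // d using the precomputed pair, for 0 <= n < 2**32."""
    hi = (n * magic) >> 32  # high 32 bits of the 64-bit product
    # Python integers do not overflow; on 32-bit hardware hi + n can wrap,
    # which is why n is commonly restricted (for example to below 2**31).
    return (hi + n) >> shift


magic, shift = magic_div_u32_gen(7)
print(magic_div_u32(100, magic, shift))  # 14
```

The pair only needs to be computed once per divisor on the host and can then be passed to the kernel as arguments, so the per-element cost on the GPU avoids an integer divide.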