GitHub - guoshzhao/antares: Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore and Intel-OneAPI backends.

What is Antares:

Antares is an automatic engine to generate multi-platform kernels with optimization for DNN developers (targeting to backends like CUDA/ROCm/CPU/DirectX12/Graphcore/OneAPI/..). It is also a framework for Hardware developers to extend new backends/hareware quickly and easily. Antares provides IR that follows "One Language Syntax for All Platforms", and general-purpose device access APIs that hide the differences of not only DNN description but also device mapping.

Features
- Backend Extension
- Effective Auto Tuning
- Einsum-based Antares IR
- Framework JIT Extension (Op Maker Plugin for Pytorch/Tensorflow/Tensorflow2)
How to Use Antares
- Senario-1: Quick Start for Developers that Use Antares to Tune Operator/Sub-graph in Foreground Terminal
- Senario-2: Quick Start for Developers that Use Antares to Extend Operator/Sub-graph in Pytorch/Tensorflow
Antares Pre-dependencies for Different Backends
- Linux-based: cuda, rocm, mcpu, scpu, gc, sycl_intel, sycl_cuda, ocl_amdgpu, ocl_nvidia, ..
- Windows-based: cuda_win64, rocm_win64, hlsl_win64, ..
About Microsft Open Source

About Antares Features:

a. Backend Extension

The current version of Antares supports code generation for the following backends (in orange blocks) and devices (in black blocks):

b. Effective Auto Tuning

Auto tuning by Antares contributes to not only much less tuning time, but also equivalent or better performance for Intra-op/Inter-op execution (against TVM Ansor).

c. Einsum-based Antares IR

Antares IR is the frontend of both kernel generation and automatic optimization.
The syntax of Antares IR is slim to describe most MLP/CNN/DNN/LSTM/Transformer based models like MNIST/ResNet/BERT/GPT/..

E.g. The following computation logic describes a layer of standard BERT transformer:

  merged_layer_local[R, B, S1, N1, H1] +=! input_tensor[B, S1, N, H] * qkv_weight[R, N, H, N1, H1];
  merged_layer_trans[R, B, N1, S1, H1] = merged_layer_local[R, B, S1, N1, H1] + qkv_bias[R, N1, H1];
  attention_scores[B, N1, S1, S2] +=! merged_layer_trans[0, B, N1, S1, H1] * merged_layer_trans[1, B, N1, S2, H1] / const({H}).cast(`float32`);
    softmax_1_temp0[B, N1] >=! attention_scores[B, N1, S1, S2];
    softmax_1_temp1[B, N1] +=! (attention_scores[B, N1, S1, S2] - softmax_1_temp0[B, N1]).call(`exp`);
  attention_probs[B, N1, S1, S2] = (attention_scores[B, N1, S1, S2] - softmax_1_temp0[B, N1]).call(`exp`) / softmax_1_temp1[B, N1];
  ... ...
  layer_norm_2_src[B, S1, N2, H2] = layer_output[B, S1, N2, H2] + attention_output_norm[B, S1, N2, H2];
    layer_norm_2_temp0[B, S1] += layer_norm_2_src[B, S1, N2, H2];
    layer_norm_2_temp1[B, S1] += layer_norm_2_src[B, S1, N2, H2] * layer_norm_2_src[B, S1, N2, H2];
  layer_output_norm[B, S1, N2, H2] = (layer_norm_2_src[B, S1, N2, H2] * {N * H} - layer_norm_2_temp0[B, S1]) * (layer_norm_2_temp0[B, S1] * {N * H} - layer_norm_2_temp1[B, S1] * layer_norm_2_temp1[B, S1]).call(`max`, [1e-8]).call(`rsqrt`);

For more IR usage or examples, please follow documentation here: Antares IR & Examples

d. Pytorch/Tensorflow/Tensorflow2 Op Maker (JIT Plugin)

Antares provides JIT plugin for Pytorch/Tensorflow/Tensorflow2 to help frameworks to easily extend new operators, e.g.:

# Tensorflow/Tensorflow2 Example:
op = antares.make_op(ir='dot_0[N, M] +=! data[N, K] * weight[K, M]', feed_dict={'data': x, 'weight': y}).emit()
result_1 = sess.run(op)
print('The custom result_1 is:\n%s' % result_1)
result_2 = sess.run(tf.add(op, op))
print('The custom result_2 is:\n%s' % result_2)  

# Pytorch Example:
custom_op = CustomOp(ir='dot_0[N, M] +=! data[N, K] * weight[K, M]', feed_dict={'data': x, 'weight': y}).to(device, dtype).emit()
result = custom_op()
print('The custom result is:', result)

For complete programs, please follow examples here: Antares Examples for Pytorch and Antares Examples for TF/TF2

How to Use Antares?

Senario-1: Quick Start for Developers that Use Antares to Tune Operator/Sub-graph in Foreground Terminal:

Step-1: Prepare Environment

sudo apt install docker.io
git clone https://github.com/microsoft/antares --branch v0.2.x
cd antares/

# To set the backend type to environment variable `BACKEND` to build the corresponding environment:
echo 'c-cuda' > backend.default

# Build the environment for this backend: (if this step failed, please go to "Pre-dependencies" section to check which "backend-related dependencies" are missing)
make

All valid backends are listed in directory antares/backends

Step-2: Tune a Specific Workload in Foreground (e.g. tuning an MNIST-inference using 1000 trials)

# Run the following command in bash:
COMMIT=force STEP=1000 COMPUTE_V1='- einstein_v2(input_dict={"data": {"dtype": "float32", "shape": [64, 784]}, "weight_0": {"dtype": "float32", "shape": [784, 512]}, "weight_1": {"dtype": "float32", "shape": [512, 512]}, "weight_2": {"dtype": "float32", "shape": [512, 10]}, "bias_0": {"dtype": "float32", "shape": [512]}, "bias_1": {"dtype": "float32", "shape": [512]}, "bias_2": {"dtype": "float32", "shape": [10]}}, extra_outputs=[], exprss="data_0[N, M] +=!  data[N, K] * weight_0[K, M];   data_1[N, K] =   (data_0[N, K] + bias_0[K]).call(`max`, [0.0]);   data_2[N, M] +=!  data_1[N, K] * weight_1[K, M];   data_3[N, K] =   (data_2[N, K] + bias_1[K]).call(`max`, [0.0]);   data_4[N, M] +=!  data_3[N, K] * weight_2[K, M];   data_5[N, K] =   (data_4[N, K] + bias_2[K]);")' make

Apart from detailed reporting logs during the tuning procedure, the best kernel record will be saved to directory antares/codehub. If you don't want to create/overwrite existing kernel record in codehub, environment variable COMMIT=force in the tuning command can be removed.

Senario-2: Quick Start for Developers that Use Antares to Extend Operator/Sub-graph in Pytorch/Tensorflow (only for CUDA & ROCm backend currently):

Step-1: Prepare Environment

You need to follow Step-1 from Senario-1 to finish environment preparation beforehand. This prevents many environmental issues when walking to the next step.
Step-2: Set up Background Codegen Service
```
make rest-server
```
By default, it listens on TCP port = 8880, and the purpose of this service is to avoid bringing heavy backend-related dependencies in Pytorch/Tensorflow, which helps JIT plugin to be light-weighted.

Step-3: Set up a corresponding TF/TF2/Pytorch version that matches your CUDA/ROCm driver version. (If you have installed TF/TF2/Pytorch, please just ignore this step)

Here we provide several prebuilt package sources that match different environment requirements:

  For Tensorflow 1.x & 2.x: Recommended Packages (tested in Ubuntu 20.04):
  #   Tensorflow-1 for NVIDIA CUDA 10.0:
  python3 -m pip install --upgrade pip && python3 -m pip install tensorflow-gpu==1.15.4
  #   Tensorflow-1 for NVIDIA CUDA 11.0:
  python3 -m pip install --upgrade pip && python3 -m pip install https://github.com/ghostplant/tensorflow-wheel-collections/releases/download/cuda-11/tensorflow_gpu-1.15.4_cuda11+nv-cp38-cp38-linux_x86_64.whl
  #   Tensorflow-1 for AMD ROCm 4.0:
  python3 -m pip install tensorflow-rocm==1.15.9

  #   Tensorflow-2 for NVIDIA CUDA 11.0:
  python3 -m pip install --upgrade pip && python3 -m pip install tensorflow-gpu==2.4.0
  #   Tensorflow-2 for AMD ROCm 4.0:
  python3 -m pip install tensorflow-rocm==2.4.0

  For Pytorch 1.x: Recommended Packages (tested in Ubuntu 20.04):
  #   Pytorch for NVIDIA CUDA 10.0:
  python3 -m pip install torch==1.5.0 torchvision==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
  #   Pytorch for NVIDIA CUDA 11.0:
  python3 -m pip install torch===1.7.1+cu110 torchvision===0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  #   Pytorch for AMD ROCm 4.0:
  python3 -m pip install torch torchvision -f https://download.pytorch.org/whl/rocm4.0.1/torch_stable.html

Step-4: Install JIT Plugin Client and Run Examples

# Set up JIT Plugin for Pytorch:
sudo python3 ./frameworks/pytorch/setup.py

# Set up JIT Plugin for Tensorflow/Tensorflow2:
sudo python3 ./frameworks/tensorflow/setup.py

# Test Examples for Pytorch:
cd ./frameworks/pytorch/examples
./1_hello_world.py

# Test Examples for Tensorflow:
cd ./frameworks/tensorflow/examples
./1_hello_world.py

More examples here: Antares Examples for Pytorch and Antares Examples for TF/TF2

Antares Predependencies for Different Backends:

Before running make command in antares root directory, you need to ensure the corresponding backend driver is installed correctly.

Predependencies for backend c-cuda, c-sycl_cuda:

Requirement: Ubuntu >= 18.04

Requirement: Install NVIDIA CUDA toolkit (>= 10.0) on Host OS

Requirement: docker
Predependencies for backend c-ocl_nvidia:

Requirement: Ubuntu >= 18.04

Requirement: Install NVIDIA CUDA toolkit (>= 10.0) to Host OS

Requirement: run bash command "make install_host" in antares root directory beforehand
Predependencies for backend c-rocm, c-ocl_amdgpu:

Requirement: Ubuntu >= 18.04

Requirement: Install AMD ROCm (>= 4.0) package "rock-dkms" & "rock-dkms-firmware" from repo http://repo.radeon.com/rocm/apt/debian to Host OS

Requirement: docker
Predependencies for backend c-gc:

Requirement: Ubuntu >= 18.04

Requirement: Install Poplar SDK to Host OS, ensure "popc" command exists in system PATH

Requirement: run bash command "make install_host" in antares root directory beforehand
Predependencies for backend c-scpu, c-mcpu, c-sycl_intel:

Requirement: Ubuntu >= 18.04

Requirement: docker
Predependencies for backend c-hlsl_win64, c-hlsl_xbox:

Requirement: Windows 10 64 bit (>= 2004), run "dxdiag.exe" to ensure Direct3D 12.0 Accleration is enabled

Requirement: Windows Subsystem Linux 1.0 How to Install WSL 1.0

Requirement: GIT clones antares repo inside WSL environment, and the path of antares directory should be **visible to Windows**, (e.g. "/../c/Users/me/Desktop/antares" would be OK, but "/home/me/antares" won't).

Requirement: run bash command "make install_host" in antares root directory beforehand
Predependencies for backend c-rocm_win64:

Requirement: Windows 10 64 bit (>= 2004)

Requirement: Windows Subsystem Linux 1.0 How to Install WSL 1.0

Requirement: Install Official AMD GPU driver (release version >= 2020.11). Ensure C:\Windows\System32\amdhip64.dll exists after installation.

Requirement: GIT clones antares repo inside WSL environment, and the path of antares directory should be **visible to Windows**, (e.g. "/../c/Users/me/Desktop/antares" would be OK, but "/home/me/antares" won't).

Requirement: run bash command "make install_host" in antares root directory beforehand
Predependencies for backend c-cuda_win64:

Requirement: Windows 10 64 bit (>= 2004)

Requirement: Windows Subsystem Linux 1.0 How to Install WSL 1.0

Requirement: Install Official NVIDIA CUDA driver (>= 10.0). Ensure C:\Windows\System32\nvcuda.dll exists after installation.

Requirement: GIT clones antares repo inside WSL environment, and the path of antares directory should be **visible to Windows**, (e.g. "/../c/Users/me/Desktop/antares" would be OK, but "/home/me/antares" won't).

Requirement: run bash command "make install_host" in antares root directory beforehand

Current Support Table:

	HIP-C(c-rocm/c-rocm_win64)	CUDA(c-cuda/c-cuda_win64)	CPU(c-mcpu/c-scpu)	DirectX12(c-hlsl_win64)	Graphcore(c-gc)	Intel OneAPI(c-sycl_intel)	Codeplay DPCPP (c-sycl_cuda)
Deploy Environment	Linux/WSL1	Linux	Linux	WSL1	Linux	Linux
Target Device	AMDGPU	NVGPU	Generic CPU	Generic Graphic Card	IPU Device	Intel CPU/HD Graphic/FPGA	NVGPU
Global schedules	Y	Y	Y	Y	Y	Y	Y
Local schedules	Y	Y	Y	Y		Y	Y
Head fusion	Y	Y	Y	Y	Y	Y	Y
Tail fusion	Y	Y		Y			Y
Evaluator	Y	Y	Y	Y	Y	Y	Y
Tensorflow Plugin	Y	Y
Pytorch Plugin	Y	Y
Multi Kernel Eval	Y	Y	Y	Y		Y	Y

About Microsft Open Source

For more information about Microsoft Open Source Policy, please see Microsoft Open Source Code of Conduct

Name		Name	Last commit message	Last commit date
Latest commit History 216 Commits
antares		antares
backends		backends
codehub		codehub
docker		docker
engine		engine
frameworks		frameworks
graph_evaluator		graph_evaluator
hardware		hardware
images		images
lang		lang
public		public
tuner		tuner
.dockerignore		.dockerignore
AntaresIR.md		AntaresIR.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.TXT		LICENSE.TXT
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

What is Antares:

About Antares Features:

a. Backend Extension

b. Effective Auto Tuning

c. Einsum-based Antares IR

d. Pytorch/Tensorflow/Tensorflow2 Op Maker (JIT Plugin)

How to Use Antares?

Senario-1: Quick Start for Developers that Use Antares to Tune Operator/Sub-graph in Foreground Terminal:

Senario-2: Quick Start for Developers that Use Antares to Extend Operator/Sub-graph in Pytorch/Tensorflow (only for CUDA & ROCm backend currently):

Antares Predependencies for Different Backends:

Current Support Table:

About Microsft Open Source

About

Releases

Packages

Languages

License

guoshzhao/antares

Folders and files

Latest commit

History

Repository files navigation

What is Antares:

About Antares Features:

a. Backend Extension

b. Effective Auto Tuning

c. Einsum-based Antares IR

d. Pytorch/Tensorflow/Tensorflow2 Op Maker (JIT Plugin)

How to Use Antares?

Senario-1: Quick Start for Developers that Use Antares to Tune Operator/Sub-graph in Foreground Terminal:

Senario-2: Quick Start for Developers that Use Antares to Extend Operator/Sub-graph in Pytorch/Tensorflow (only for CUDA & ROCm backend currently):

Antares Predependencies for Different Backends:

Current Support Table:

About Microsft Open Source

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages