
Add quantized_maxpool_2d for xpu #1049

Open · wants to merge 9 commits into main

Conversation

@gaopengff (Contributor) commented Nov 6, 2024

Currently only the uint8 (Byte) dtype is supported, following the stock PyTorch CPU implementation (see code).
Waiting for #921 to be merged.
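As background on why the op can restrict itself to uint8 and pool the raw quantized values directly: affine dequantization `(q - zero_point) * scale` is monotonically increasing (scale > 0), so the max of the uint8 values picks the same element as the max of the dequantized floats. A minimal standalone sketch of that property (function names are illustrative, not the kernel's API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Affine dequantization: q maps to (q - zero_point) * scale with scale > 0.
// Because this mapping is monotonically increasing, max over raw uint8
// values and max over dequantized floats select the same element, which is
// why quantized max pooling can operate on uint8 data without dequantizing.
float dequantize(uint8_t q, float scale, int32_t zero_point) {
  return (static_cast<int32_t>(q) - zero_point) * scale;
}

uint8_t max_q(const std::vector<uint8_t>& window) {
  return *std::max_element(window.begin(), window.end());
}
```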

@gaopengff gaopengff changed the title Add quantized_maxpool_2d for xpu [WIP]Add quantized_maxpool_2d for xpu Nov 6, 2024
@gaopengff gaopengff changed the title [WIP]Add quantized_maxpool_2d for xpu Add quantized_maxpool_2d for xpu Nov 8, 2024
@gaopengff (Contributor, Author)

The relevant unit tests have passed in the latest CI run:

2024-11-07T09:10:50.0722836Z quantization/core/test_quantized_op_xpu.py::TestQuantizedOpsXPU::test_max_pool2d_nhwc_xpu PASSED [ 48%]
2024-11-07T09:10:50.1551494Z quantization/core/test_quantized_op_xpu.py::TestQuantizedOpsXPU::test_max_pool2d_pt2e_xpu PASSED [ 50%]
2024-11-07T09:10:50.5847030Z quantization/core/test_quantized_op_xpu.py::TestQuantizedOpsXPU::test_max_pool2d_xpu PASSED [ 51%]

w_start += dW_;

// Stock PyTorch's CPU implementation uses vectorized instructions
// across channels (e.g. AVX-512); we use a plain for-loop instead.
Contributor


Just to confirm: is the work-item indexing optimized for channels-last (cl)? I.e., the tensor is channels-last and the inner-most work-item dim is on channels?

Contributor Author


It's not optimized for channels-last; the inner-most dim is nbatch, like the DilatedMaxPool2d implementation. Our GPU does not have a vectorized method, so I use an unrolled loop to simulate it. If the compiler optimizes the loop into vectorized code, it will help. I just followed the stock implementation.
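As a rough standalone sketch of the scalar per-channel loop described above (this is an illustration, not the actual SYCL kernel; names and layout are hypothetical):

```cpp
#include <cstdint>

// For one output pixel, scan the pooling window and keep a running max per
// channel. The stock CPU code does this across channels with vectorized
// instructions (e.g. AVX-512); here it is a plain loop that the compiler
// may auto-vectorize or unroll.
// `in` is one image laid out HWC-style: in[(h * width + w) * channels + c].
void max_pool_window_u8(const uint8_t* in, uint8_t* out, int channels,
                        int width, int h0, int h1, int w0, int w1) {
  for (int c = 0; c < channels; ++c)
    out[c] = 0;  // uint8 minimum
  for (int h = h0; h < h1; ++h)
    for (int w = w0; w < w1; ++w) {
      const uint8_t* p = in + (h * width + w) * channels;
      for (int c = 0; c < channels; ++c)
        out[c] = p[c] > out[c] ? p[c] : out[c];
    }
}
```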

@ZhiweiYan-96 (Contributor) commented Nov 14, 2024


Thanks for the explanation. I'm OK with keeping this implementation; functionality is the priority for now.

Hi @EikanWang, FYI: if performance is also important for us currently, could we add further optimization after this PR?

@ZhiweiYan-96 (Contributor) left a comment


Please use TORCH_CHECK for the dtype check before launching kernels.
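For reference, `TORCH_CHECK(cond, msg...)` raises a `c10::Error` with the message when the condition is false. A minimal standalone stand-in sketching the requested guard (the macro and dtype enum below are simplified stand-ins, not the real c10 API):

```cpp
#include <stdexcept>
#include <string>

// Simplified stand-in for TORCH_CHECK: report an unsupported dtype with a
// clear error instead of failing inside the kernel.
#define CHECK_OR_THROW(cond, msg) \
  do { if (!(cond)) throw std::runtime_error(msg); } while (0)

enum class ScalarType { Byte, Char, Float };  // stand-in for c10::ScalarType

void quantized_max_pool2d_check(ScalarType dtype) {
  // Guard before launching the kernel: only uint8 (Byte) is supported.
  CHECK_OR_THROW(dtype == ScalarType::Byte,
                 "quantized_max_pool2d_xpu only supports uint8 (Byte) input");
}
```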
