[Tracker] All issues related to the e2e shark test suite #812

Open
pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments

@pdhirajkumarprasad

pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker is at: #564

Running model

In the alt_e2e test suite:

export CACHE_DIR="some path where the models will be downloaded"

If building torch-mlir and iree from source:

source /path/to/iree-build/.env && export PYTHONPATH
export PYTHONPATH=/path/to/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir:/path/to/torch-mlir/test/python/fx_importer:$PYTHONPATH
export PATH=/path/to/iree-build/tools/:/path/to/torch-mlir/build/bin/:$PATH

python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t ModelName
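
The setup above can fail in non-obvious ways when a variable or tool is missing, so a small pre-flight check may help. The following is only a sketch: the function names (`require_cache_dir`, `require_tools`, `run_model`) and the `DRY_RUN` variable are hypothetical helpers, not part of the test suite, and the `run.py` flags are copied verbatim from the command above.

```shell
# Hypothetical helpers; none of these names come from the test suite itself.

# Succeed only if CACHE_DIR is set and non-empty.
require_cache_dir() {
  if [ -z "${CACHE_DIR:-}" ]; then
    echo "CACHE_DIR is not set" >&2
    return 1
  fi
}

# Succeed only if every named command can be found on PATH.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run one model with the flags shown above.
# With DRY_RUN set, only print the command instead of running it.
run_model() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t $1"
  else
    python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t "$1"
  fi
}
```

After the exports above, one possible invocation is `require_cache_dir && require_tools iree-compile torch-mlir-opt && run_model ModelName`. Which tools `run.py` actually needs on `PATH` is an assumption here; adjust the list to your build.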

For onnx/models/

critical issues

CPU

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|---|---|---|---|---|---|---|
| 1 | CPU | "onnx.Resize" failed to legalize operation 'torch.operator' that was explicitly marked illegal | 599 | 11 | modelList | @aldesilv | Needs bump + llvm/torch-mlir#3870 |
| 2 | CPU | One or more operations with large vector sizes (8192 bytes) were found | 19058 | 4 | | @pashu123 | |
| 3 | CPU | failed to legalize unresolved materialization from ('i64') to 'index' that remained live after conversion | 18899 | 4 | | @zjgarvey | |
| 6 | CPU | "onnx.NonMaxSuppression" failed to legalize operation 'torch.operator' that was explicitly marked illegal | 881 | 2 | | @jinchen62 | |
| 7 | CPU | 'func.func' op exceeded stack allocation limit of 32768 bytes for function. Got 1048576 bytes | 19027 | 2 | modelList | @pashu123 | |
| 8 | CPU | 'tensor.reshape' op source and destination tensor should have the same number of elements | | 1 | modelList | @zjgarvey | model now fails due to issue 2 |
| 9 | CPU | onnx.LSTM | | 1 | modelList | | |
| 10 | CPU | torch.aten.convolution 1-d grouped | | 1 | modelList | @AmosLewis | |
| 11 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | 876 | 1 | modelList | | |
| 12 | CPU | failed to legalize operation onnx.NonZero | 820 | 1 | modelList | @renxida @AmosLewis | will message xida |
| 13 | CPU | failed to legalize operation onnx.if | 882 | 1 | | @AmosLewis | |
| 14 | CPU | boolean indexing ops: AtenNonzeroOp, AtenIndexTensorOp, AtenMaskedSelectOp | 3293 | | | @renxida | |
| 15 | CPU | Add TorchToLinalg lowering for MaxUnpool operation | 718 | | | @jinchen62 | |
| 16 | CPU | Fix Onnx.DFT Torch->Linalg lowering | 800 | | | @PhaneeshB | |

import and setup failures

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|---|---|---|---|---|---|---|
| 3 | N/A | OOM during ORT | #862 | 3 | model list | | |
| 4 | N/A | OOM import, missing dim_params, ORT PASS | #860 #861 | 21 | model list | | |
| 5 | N/A | Unable to update opset ver due to BatchNormalization, ORT PASS | #859 | 5 | model list | | |
| 6 | N/A | Unable to update opset ver due to BN, OOM import, ORT PASS | #859 #861 | 1 | model list | | |
| 7 | N/A | duplicate metadata_prop keys, ORT PASS | #863 | 1 | model list | | |
| 8 | N/A | OOM import, ORT PASS | #861 | 25 | model list | | |

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|---|---|---|---|---|---|---|
| 3 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 100+ | | | |

iree runtime

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|---|---|---|---|---|---|---|
| 1 | CPU | Abort | 18741 | 515+ | modelList | | |

numerics

| # | device | issue type | issue no | # models impacted | model list | assignee |
|---|---|---|---|---|---|---|
| 1 | CPU | numeric | need_to_analyze | 101 | modelList | |
| 2 | | [numerics]: element at index 0 (0.332534) does not match the expected (0.308342); for LSTM ops | 18441 | 2 | | |
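
To get a feel for how large the LSTM mismatch quoted above is, the absolute and relative differences can be worked out directly. This is a minimal sketch: `mismatch_report` is a hypothetical helper, and the tolerances the test suite actually applies are not stated in this issue, so no pass/fail threshold is assumed here.

```shell
# Hypothetical helper: print the absolute and relative difference between
# an observed value ($1) and the expected value ($2).
mismatch_report() {
  awk -v got="$1" -v want="$2" 'BEGIN {
    abs = got - want; if (abs < 0) abs = -abs
    denom = want;     if (denom < 0) denom = -denom
    # Guard against division by zero when the expected value is 0.
    printf "abs=%.6f rel=%.4f\n", abs, (denom > 0 ? abs / denom : 0)
  }'
}

# Values taken from the error message in the table above.
mismatch_report 0.332534 0.308342
```

For these two values the relative difference comes out at roughly 8%, which is well beyond ordinary floating-point rounding noise.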

IREE EP only issues

iree-compile fails with ElementsAttr does not provide iteration facilities for type 'mlir::Attribute' on int8 models at QuantizeLinear op

low priority

Issue no 828: Turbine Camp
Issue no 797: Ops not in model

@zjgarvey
Collaborator

Can you update the modelList links?

@jinchen62
Contributor

Could you also attach links to the issues you referred to, so we know whether we cover all model paths? Also, it seems this does not include #801, right?

@pdhirajkumarprasad
Author

@zjgarvey the model lists contain only the updated links.

@jinchen62 Yes, so far the report is based only on the ONNX models of the e2e shark test suite.

@jinchen62
Contributor

jinchen62 commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error message.

I think the onnx.Transpose failure listed under onnx to torch is the shape inference issue that I was dealing with. I fixed it by setting the opset version to 21 with a locally built torch-mlir in the shark test suite (llvm/torch-mlir#3593). @zjgarvey I realized that this doesn't seem to work for the CI job, right? Any ideas?
