ctest test_concurrency intermittent error on AArch64 with gcc #1690

Open
AmyWignall-arm opened this issue Jul 21, 2023 · 1 comment
Labels: bug (A confirmed library bug), help wanted, platform:cpu-aarch64 (Codeowner: @oneapi-src/onednn-cpu-aarch64)

Comments

@AmyWignall-arm
Contributor

Summary

The ctest test test_concurrency fails intermittently with the following error:

43: Test command: /home/amywig01/oneDNN/naclBuild/tests/gtests/test_concurrency
43: Test timeout computed to be: 10000000
43: Note: Google Test filter = *:-*_GPU*
43: [==========] Running 1 test from 1 test suite.
43: [----------] Global test environment set-up.
43: [----------] 1 test from test_concurrency_t
43: [ RUN      ] test_concurrency_t.Basic
43: test_concurrency: pthread_mutex_lock.c:117: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
    Test #43: test_concurrency .................Child aborted***Exception:   0.32 sec

This happens with both reference and ACL builds.
It usually fails within 10 to 20 runs; in my example runs it always failed within 50 runs.

Version

onednn_verbose,info,oneDNN v3.2.0 (commit 1f428df708d943b2fb1bcb4c7f7e209cafaa7d22)

Environment

cpu: m6g.16xlarge:

$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              1
Core(s) per socket:              64
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           1
Model name:                      Neoverse-N1
Stepping:                        r3p1
BogoMIPS:                        243.75
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        32 MiB
NUMA node0 CPU(s):               0-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

It also fails with the same error on CPU c6g.16xlarge:

$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              1
Core(s) per socket:              64
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           1
Stepping:                        r1p1
BogoMIPS:                        2100.00
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        32 MiB
NUMA node0 CPU(s):               0-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng

  • OS version (uname -a): Ubuntu 20.04.1
  • Compiler version (gcc --version): gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
  • CMake version (cmake --version): cmake version 3.16.3
  • git hash (git log -1 --format=%H): 1f428df708d943b2fb1bcb4c7f7e209cafaa7d22

Steps to reproduce

ctest -VV -R test_concurrency --repeat-until-fail 100
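
Because the failure is intermittent, it can help to know on which run it first occurs. The following is a minimal sketch, assuming bash is available; `repeat_until_fail` is a hypothetical helper, not part of oneDNN or ctest, and the binary path in the usage comment is a placeholder for the actual build directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it fails or
# max_runs is reached. Prints the 1-based index of the first failing
# run and returns 1, or reports that no failure was seen and returns 0.
repeat_until_fail() {
    local max_runs=$1
    shift
    local i
    for ((i = 1; i <= max_runs; i++)); do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            echo "failed on run $i"
            return 1
        fi
    done
    echo "no failure in $max_runs runs"
    return 0
}

# Example usage (placeholder path, adjust to the build directory):
# repeat_until_fail 100 ./tests/gtests/test_concurrency
```

Running the test binary directly like this bypasses ctest, so any environment that ctest sets up for the test would need to be reproduced by hand.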

Observed behavior

$ ctest -VV -R test_concurrency --repeat-until-fail 100
...
43: Test command: /home/amywig01/oneDNN/naclBuild/tests/gtests/test_concurrency
43: Test timeout computed to be: 10000000
43: Note: Google Test filter = *:-*_GPU*
43: [==========] Running 1 test from 1 test suite.
43: [----------] Global test environment set-up.
43: [----------] 1 test from test_concurrency_t
43: [ RUN      ] test_concurrency_t.Basic
43: test_concurrency: pthread_mutex_lock.c:117: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
    Test #43: test_concurrency .................Child aborted***Exception:   0.34 sec

Expected behavior

The test passes on every run.

AmyWignall-arm added the "sighting" label (Suspicious library behavior. Should be promoted to a bug when confirmed) on Jul 21, 2023
vpirogov added the "platform:cpu-aarch64" label (Codeowner: @oneapi-src/onednn-cpu-aarch64) on Mar 29, 2024
vpirogov added the "bug" (A confirmed library bug) and "help wanted" labels and removed the "sighting" label on Jul 16, 2024
@michalowski-arm
Contributor

As far as I can tell, this is no longer an issue. It seems to be fixed by commit 557f3f0; I was not able to reproduce the error after that commit.
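
Anyone still seeing the failure may want to confirm that their checkout already contains that commit. A minimal sketch, assuming it is run inside a oneDNN git checkout; `includes_commit` is a hypothetical helper name, while `git merge-base --is-ancestor` is a standard git command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a ref (default HEAD) contains a
# given commit, using git's ancestry check.
includes_commit() {
    local commit=$1
    local ref=${2:-HEAD}
    if git merge-base --is-ancestor "$commit" "$ref"; then
        echo "$ref includes $commit"
    else
        echo "$ref does not include $commit"
        return 1
    fi
}

# Example usage, with the commit mentioned in this comment:
# includes_commit 557f3f0
```

If the commit is not included, rebuilding from a newer revision before re-running the test would be the natural next step.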
