Running llama2 on two CPUs fails when dtype is set to int8 or bf16_int8 #56

Open
zhm-algo opened this issue Nov 15, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@zhm-algo

The script is in the attachment:
llama2-7b.zip

The error output for each dtype is shown below.

  1. int8

memory node number: 16
HBM SNC4 mode
llama2-7b.sh: 17: Bad substitution
llama2-7b.sh: 17: Bad substitution
llama2-7b.sh: 17: Bad substitution
llama2-7b.sh: 17: Bad substitution
FP16 Performance
FP16 Performance
FP16 Performance
FP16 Performance
llama2-7b.sh: 17: Bad substitution
llama2-7b.sh: 17: Bad substitution
llama2-7b.sh: 17: Bad substitution
llama2-7b.sh: 17: Bad substitution
FP16 Performance
FP16 Performance
FP16 Performance
FP16 Performance
Segmentation fault (core dumped)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 21023 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 21024 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 21025 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 21026 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 5 PID 21028 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 6 PID 21029 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 7 PID 21030 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

  2. bf16_int8
    memory node number: 16
    HBM SNC4 mode
    llama2-7b.sh: 17: Bad substitution
    llama2-7b.sh: 17: Bad substitution
    FP16 Performance
    llama2-7b.sh: 17: Bad substitution
    FP16 Performance
    FP16 Performance
    llama2-7b.sh: 17: Bad substitution
    llama2-7b.sh: 17: Bad substitution
    llama2-7b.sh: 17: Bad substitution
    llama2-7b.sh: 17: Bad substitution
    FP16 Performance
    FP16 Performance
    llama2-7b.sh: 17: Bad substitution
    FP16 Performance
    FP16 Performance
    FP16 Performance
    Segmentation fault (core dumped)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 21300 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 21301 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 21302 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 21303 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 21304 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 6 PID 21306 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 7 PID 21307 RUNNING AT ubuntu-desktop
= KILLED BY SIGNAL: 9 (Killed)

@Duyi-Wang
Contributor

With the latest code, it seems to be related to the segmentation when the rank is greater than 8, and it is unrelated to 2-socket or HBM SNC4. It fails when rank=4 or 6 in v1.0.0. ChatGLM 1 & 2 are OK.
It can be reproduced with the following command using the C++ example.

OMP_NUM_THREADS=6 mpirun -n 8 numactl -N 0 ./example -m /data/llama-2-7b-cpu/ -t /data/llama-2-7b-hf/tokenizer.model --loop 1 --output_len 1 --dtype int8
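To narrow down which rank counts trigger the crash, the command above could be wrapped in a small sweep. This is only a sketch: the paths and flags are copied from the command above, while `repro_cmd`, `sweep`, and the `DRY_RUN` switch are hypothetical helpers introduced here, and the rank list is just an example. By default it only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: sweep MPI rank counts for the reproduction command.
# MODEL_DIR and TOKENIZER are taken from the command above; adjust to your setup.
MODEL_DIR=/data/llama-2-7b-cpu/
TOKENIZER=/data/llama-2-7b-hf/tokenizer.model

# Print the reproduction command for a given rank count ($1) and dtype ($2).
repro_cmd() {
  printf 'OMP_NUM_THREADS=6 mpirun -n %s numactl -N 0 ./example -m %s -t %s --loop 1 --output_len 1 --dtype %s\n' \
    "$1" "$MODEL_DIR" "$TOKENIZER" "$2"
}

# For a dtype ($1) and a list of rank counts (remaining args), print each
# command (DRY_RUN=1, the default) or run it and report its exit status.
sweep() {
  sweep_dtype=$1
  shift
  for sweep_n in "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      repro_cmd "$sweep_n" "$sweep_dtype"
    else
      eval "$(repro_cmd "$sweep_n" "$sweep_dtype")"
      echo "ranks=$sweep_n dtype=$sweep_dtype exit=$?"
    fi
  done
}

# Example rank counts; 4, 6 and 8 are the values mentioned in this thread.
sweep int8 1 2 4 6 8
```

Running the same sweep with `bf16_int8` as the first argument covers the second dtype from the report.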
