I tried to reproduce the experiment on an NVIDIA A100 80GB, but ran into an OOM error, even with the 6.7B OPT model:
```
python -m flexgen.flex_opt --model facebook/opt-6.7b --path DUMMY --percent 100 0 100 0 100 0 --overlap False --gpu-batch-size 64 --prompt-len 2048 --gen-len 2048 --hh-long-seq --hh-ratio 0.2 --hh-all 2048
model size: 12.386 GB, cache size: 128.000 GB, hidden size (prefill): 2.000 GB
init weight...
== after init weight ==
used: 126 GB, free: 1271 GB, cached: 370 GB, available: 1619 GB
warmup - generate
benchmark - generate
== after init cache ==
used: 126 GB, free: 1271 GB, cached: 370 GB, available: 1619 GB
Traceback (most recent call last):
  File "/opt/conda/envs/ptca/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/ptca/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/flex_opt.py", line 1423, in <module>
    run_flexgen(args)
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/flex_opt.py", line 1318, in run_flexgen
    output_ids = model.generate(
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/flex_opt.py", line 942, in generate
    self.generation_loop_normal()
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/flex_opt.py", line 983, in generation_loop_normal
    self.compute_layer(i, j, k)
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/flex_opt.py", line 841, in compute_layer
    self.layers[j].forward(self.hidden[i][j][k], self.cache_read_buf[j][k],
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/flex_opt.py", line 482, in forward
    h, new_k_cache, new_v_cache, acc = self.compute.mha(h, mask, w_q, b_q,
  File "/scratch/AzureBlobStorage_INPUT1/vc_data_blob/users/yunanzhang/H2O/h2o_flexgen/flexgen/pytorch_backend.py", line 365, in mha
    attn_weights = F.softmax(attn_weights, dim=2, dtype=torch.float32).to(torch.float16)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/nn/functional.py", line 1843, in softmax
    ret = input.softmax(dim, dtype=dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 GiB (GPU 0; 79.17 GiB total capacity; 59.77 GiB already allocated; 17.68 GiB free; 60.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
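A quick back-of-the-envelope check (my own arithmetic, assuming OPT-6.7B's standard shape of 32 layers, 32 attention heads, and hidden size 4096) suggests the failing 32.00 GiB allocation is exactly the float32 attention-score tensor that the softmax at `pytorch_backend.py:365` materializes during prefill, and that the 128 GB cache size reported at startup is consistent with the same numbers:

```python
# Sanity check of the two big tensors (my own numbers, assuming
# OPT-6.7B: 32 layers, 32 attention heads, hidden size 4096).
batch, heads, seq = 64, 32, 2048  # --gpu-batch-size, heads, --prompt-len

# attn_weights is cast to float32 before the softmax (see the traceback),
# so prefill materializes a (batch, heads, seq, seq) tensor at 4 bytes/elem.
softmax_bytes = batch * heads * seq * seq * 4
print(f"softmax buffer: {softmax_bytes / 2**30:.2f} GiB")  # 32.00 GiB -> matches the error

# fp16 KV cache: K and V per layer, sized for --prompt-len + --gen-len tokens.
layers, hidden, kv_len = 32, 4096, 2048 + 2048
cache_bytes = 2 * layers * batch * kv_len * hidden * 2
print(f"KV cache: {cache_bytes / 2**30:.2f} GiB")  # 128.00 GiB -> matches the log
```

That 32 GiB fp32 buffer alone is larger than the ~17.7 GiB the error message says was still free on the card, so the OOM at this batch size and prompt length looks expected rather than a fluke of fragmentation.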
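Based on that arithmetic I'd expect either shrinking the per-GPU batch or offloading the KV cache to CPU to get past this point. Untested sketches (the `--percent` split below assumes the standard FlexGen order of weight/cache/activation GPU-CPU percentages; I haven't verified these runs myself):

```sh
# Untested: smaller GPU batch -> (16, 32, 2048, 2048) fp32 scores, ~8 GiB
python -m flexgen.flex_opt --model facebook/opt-6.7b --path DUMMY \
  --percent 100 0 100 0 100 0 --overlap False --gpu-batch-size 16 \
  --prompt-len 2048 --gen-len 2048 --hh-long-seq --hh-ratio 0.2 --hh-all 2048

# Untested: keep batch 64 but place the 128 GB KV cache on CPU
python -m flexgen.flex_opt --model facebook/opt-6.7b --path DUMMY \
  --percent 100 0 0 100 100 0 --overlap False --gpu-batch-size 64 \
  --prompt-len 2048 --gen-len 2048 --hh-long-seq --hh-ratio 0.2 --hh-all 2048
```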