Issues: hpcaitech/ColossalAI
[FEATURE]: Support SP+PP in Llama etc. (enhancement, shardformer) #5866, opened Jun 27, 2024 by GuangyaoZhang
[BUG]: ColossalChat train sft is skipped with opt-1.3b model (bug) #5865, opened Jun 27, 2024 by smash1999
[BUG]: Colossal AI failed to load ChatGLM2 (bug) #5861, opened Jun 26, 2024 by hiprince
[BUG]: loading OPT 66B model - CPU runs out of memory (bug) #5855, opened Jun 25, 2024 by PurvangL
[FEATURE]: Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (enhancement, shardformer) #5853, opened Jun 25, 2024 by GuangyaoZhang
Use the Gemini plugin and LowLevelZero to run llama2_7b. In the Gemini plugin, set the placement policy to static and set shard_param_frac, offload_optim_frac, and offload_param_frac to 0.0, making Gemini equivalent to ZeRO-2; in LowLevelZero, set stage to 2. Training with bf16 and comparing the two plugins, we found that Gemini's GPU memory usage is higher than LowLevelZero's. Why is this? In principle, Gemini should save more GPU memory. #5830, opened Jun 18, 2024 by JJGSBGQ
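The setup described in that question can be sketched with ColossalAI's booster API. This is a minimal, illustrative config fragment, assuming the GeminiPlugin/LowLevelZeroPlugin parameter names from recent ColossalAI releases; it needs colossalai and a CUDA environment to actually run, so it is not verified here:

```python
# Sketch of the two configurations compared in the issue (assumption:
# plugin and parameter names follow ColossalAI's booster plugin API).
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

# Gemini as configured by the reporter: static placement with all sharding
# and offloading fractions at 0.0, expected to behave like ZeRO-2.
gemini_plugin = GeminiPlugin(
    placement_policy="static",
    shard_param_frac=0.0,
    offload_optim_frac=0.0,
    offload_param_frac=0.0,
    precision="bf16",
)

# The baseline: LowLevelZero at ZeRO stage 2 with bf16 mixed precision.
zero2_plugin = LowLevelZeroPlugin(stage=2, precision="bf16")

# Swap one plugin for the other, train identically, and compare peak
# GPU memory between the two runs.
booster = Booster(plugin=gemini_plugin)
```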
[FEATURE]: LoRA with sharded model (enhancement) #5826, opened Jun 17, 2024 by KaiLv69
Gradients are None after booster.backward (bug) #5792, opened Jun 11, 2024 by ArnaudFickinger
[BUG]: Shardformer failure with torch 2.3 (bug) #5757, opened May 27, 2024 by Edenzzzz
[BUG]: docker build cuda extension error (bug) #5732, opened May 20, 2024 by apachemycat
[BUG]: TypeError: LlamaInferenceForwards.llama_causal_lm_forward() got an unexpected keyword argument 'shard_config' (bug) #5729, opened May 17, 2024 by hiprince
[BUG]: No module named 'dropout_layer_norm' (bug) #5726, opened May 17, 2024 by apachemycat
[BUG]: TypeError: _gen_python_code() got an unexpected keyword argument 'verbose' (bug) #5673, opened Apr 29, 2024 by Xingzhi107
[BUG]: GROK-1 does not support do_sample (bug) #5672, opened Apr 28, 2024 by vsmelov
[PROPOSAL]: Fix potential github action smells (enhancement) #5667, opened Apr 28, 2024 by ceddy4395
[BUG]: ColossalMoE Train: AssertionError: Parameters are expected to have the same dtype torch.bfloat16, but got torch.float32 (bug) #5664, opened Apr 26, 2024 by Camille7777
[BUG]: re-join str type error_msgs using \n\t in general_checkpoint_io (bug) #5615, opened Apr 21, 2024 by ericxsun
[BUG] [Shardformer]: Error in blip2 testing with half precision (bug) #5600, opened Apr 15, 2024 by insujang
[BUG]: pretraining llama2 using "gemini" plugin, cannot resume from saved checkpoints (bug) #5597, opened Apr 15, 2024 by jiejie1993
[BUG]: Running ColossalAI in H800 with torch 2.0 (bug) #5594, opened Apr 13, 2024 by wxthu