Hello! I'm a new NLP graduate student, currently working on a project involving vertical-domain LLMs. I've heard from my senior labmates that if you don't mix general-domain data into the continual pre-training stage, the model will "train itself dumb" (i.e., catastrophically forget its general abilities). After reading the paper, I noticed that TongGu's continual pre-training involves only classical Chinese and modern Chinese (wiki-zh). A few questions:

1. Does wiki-zh play the role of the general-domain data here?
2. How did you decide on the mixing ratio between classical-Chinese and general-domain data? (See the sketch below for the kind of setup I mean.)
3. Why was Baichuan2 chosen as the base model?
4. How did you evaluate the model's performance after continual pre-training but before SFT?

Thanks in advance!
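To clarify what I mean by "mixing ratio" in question 2, here is a minimal sketch of a weighted-sampling mixture (assuming Hugging Face `datasets`; the 80/20 split and file names are illustrative guesses, not TongGu's actual settings):

```python
# Minimal sketch: mix in-domain and general-domain corpora for continual
# pre-training via weighted sampling. The ratio and paths are hypothetical.
from datasets import load_dataset, interleave_datasets

ancient = load_dataset("json", data_files="ancient_chinese.jsonl", split="train")
general = load_dataset("json", data_files="wiki_zh.jsonl", split="train")

# Draw ~80% classical-Chinese and ~20% general-domain examples, so the
# model keeps seeing general text and is less likely to forget it.
mixed = interleave_datasets(
    [ancient, general],
    probabilities=[0.8, 0.2],
    seed=42,
    stopping_strategy="all_exhausted",
)
```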
A follow-up question: why do different categories of the classical-Chinese data get different numbers of epochs during continual pre-training?
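For context, the kind of per-category epoch schedule I'm asking about looks roughly like this (again a hypothetical sketch with Hugging Face `datasets`; the category names and epoch counts are made up, not taken from the paper):

```python
# Sketch: give each data category its own number of passes (epochs) by
# repeating that subset before shuffling everything into one corpus.
from datasets import load_dataset, concatenate_datasets

epochs_per_category = {"poetry": 3, "prose": 2, "annotations": 1}  # hypothetical

parts = []
for name, n_epochs in epochs_per_category.items():
    subset = load_dataset("json", data_files=f"{name}.jsonl", split="train")
    parts.extend([subset] * n_epochs)  # n_epochs passes over this category

train_corpus = concatenate_datasets(parts).shuffle(seed=42)
```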