The custom dataset documentation is hard to understand #1323
Comments
dataset_id is simply the ModelScope dataset_id.
dataset_path is either a local file path or a Hugging Face dataset_id, written as HF::{dataset_id}.
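As a rough illustration of the distinction described in the two replies above, the sketch below classifies a dataset string as a Hugging Face id, a local file, or a ModelScope id. The helper name `classify_dataset_spec` and the order of the checks are assumptions made for this example; only the `HF::` prefix comes from the thread, and this is not swift's actual parsing code.

```python
import os
from typing import Tuple

HF_PREFIX = "HF::"  # prefix quoted in the reply above


def classify_dataset_spec(spec: str) -> Tuple[str, str]:
    """Hypothetical helper: return (kind, value) for a dataset string."""
    if spec.startswith(HF_PREFIX):
        # Strip the prefix to get the Hugging Face dataset_id.
        return "huggingface", spec[len(HF_PREFIX):]
    if os.path.isfile(spec):
        # An existing file on disk is treated as a local dataset_path.
        return "local_file", os.path.abspath(spec)
    # Assumption: anything else is treated as a ModelScope dataset_id.
    return "modelscope", spec


print(classify_dataset_spec("HF::some-user/some-dataset"))
```

The example id `some-user/some-dataset` is a placeholder, not a real dataset.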
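It is hard to explain this if you don't want to read the source code. swift uses this same mechanism internally to maintain its datasets. At its core, a custom dataset needs one function:
def get_custom_dataset(
    # these are passed positionally, in this order
    dataset_id: str,
    subsets,
    preprocess_func,
    splits,
    dataset_sample,
    # the rest are passed as keyword arguments
    random_state=random_state,
    dataset_test_ratio=dataset_test_ratio,
    remove_useless_columns=remove_useless_columns,
    use_hf=use_hf,
    # your own custom parameters
    **kwargs,
)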
If you only want to register a dataset hosted on ModelScope or HF, setting get_function to get_dataset_from_repo is enough. Then use a custom (or, in some cases, no) ...
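To make the calling convention in the reply above more concrete, here is a minimal, self-contained sketch of a function that accepts those arguments. It is only an illustration: the parameter names follow the thread, but the body, the return value (a train/validation pair of lists), the use of `random.Random` for `random_state`, and the custom `records` keyword are all assumptions for this example, not swift's real behavior.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]


def get_custom_dataset(
    dataset_id: str,
    subsets,
    preprocess_func: Optional[Callable[[Row], Row]],
    splits,
    dataset_sample: int,
    *,
    random_state: Optional[random.Random] = None,
    dataset_test_ratio: float = 0.0,
    remove_useless_columns: bool = True,
    use_hf: bool = False,
    **kwargs,
) -> Tuple[List[Row], Optional[List[Row]]]:
    # Hypothetical custom parameter: in-memory rows passed through **kwargs.
    rows = list(kwargs["records"])
    # Apply the caller's preprocessing to every row, if one was given.
    if preprocess_func is not None:
        rows = [preprocess_func(row) for row in rows]
    # Shuffle reproducibly, then keep at most dataset_sample rows.
    rng = random_state if random_state is not None else random.Random(42)
    rng.shuffle(rows)
    if dataset_sample > 0:  # assumption: non-positive means "use everything"
        rows = rows[:dataset_sample]
    # Hold out a fraction of the rows as a validation set.
    n_val = int(len(rows) * dataset_test_ratio)
    if n_val == 0:
        return rows, None
    # subsets, splits, remove_useless_columns and use_hf are accepted but
    # ignored in this sketch.
    return rows[n_val:], rows[:n_val]


train, val = get_custom_dataset(
    "demo", None, lambda r: {"query": r["q"]}, ["train"], 8,
    random_state=random.Random(0), dataset_test_ratio=0.25,
    remove_useless_columns=True, use_hf=False,
    records=[{"q": str(i)} for i in range(10)],
)
print(len(train), len(val))
```

The first five arguments are given positionally and the rest by keyword, mirroring the order listed in the reply.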
@Jintao-Huang @zodiacg Hello, I am working on the SFT of InternVL. Do I have to specify the absolute path for the ...
Just using --dataset is fine.
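Could the custom dataset documentation be written in more detail? For the recommended approach of passing arguments directly on the command line, where does the dataset_id come from? Do I make up the dataset_id myself, or does it come from somewhere else? I would rather not read the source code.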
https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#-%E6%8E%A8%E8%8D%90%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0%E7%9A%84%E5%BD%A2%E5%BC%8F