Reproducibility issues with some few-shot SuperGLUE datasets #15
I happen to have the same question. In my SuperGLUE experiments I found that the scores on several datasets are highly sensitive to the random seed (and even more so to the codebase used: running with Jiant versus AllenNLP also gives differences of several points). On CB the score can even swing from the 70s to the 90s. How did the authors handle these sources of randomness?
On CB, just running BERT-BASE-UNCASED with ten different random seeds already shows differences of this magnitude (results from Jiant).
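For reference, a minimal sketch (with placeholder scores, not actual results from any run) of how such per-seed variance can be summarized:

```python
# Illustrative only: aggregate accuracy across several random seeds to quantify
# the variance discussed above. The per-seed CB scores below are placeholders.
import statistics

cb_scores = [0.732, 0.768, 0.821, 0.857, 0.875, 0.893, 0.768, 0.804, 0.839, 0.911]

mean = statistics.mean(cb_scores)
std = statistics.stdev(cb_scores)
print(f"CB accuracy over {len(cb_scores)} seeds: {mean:.3f} ± {std:.3f} "
      f"(min {min(cb_scores):.3f}, max {max(cb_scores):.3f})")
```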
When you ran it, did you set the emb size to 768? Did you modify any other code? I am running RTE and the metric is very low, only around 30-40, and I don't know why.
It is indeed puzzling. I tried running the CB script and got an error: the prompt embedding size defaults to 128, so it does not match when substituted into the BERT embedding, yet the CB script does not specify a value for this embedding argument?
Thanks for your efforts in reproducing P-tuning on few-shot SuperGLUE. In practice, we find that few-shot reproducibility depends heavily on the environment, the hyper-parameters (e.g., batch size, gradient accumulation steps), and the number of parallel GPUs. For example, in our experiments we train each dataset on 8 V100 GPUs; with fewer GPUs or a different GPU type, performance can vary greatly. In light of this volatility, the follow-up work FewNLU by @zheng-yanan presents a more robust evaluation framework for few-shot SuperGLUE, and P-tuning is re-implemented in the FewNLU framework. Please check it out if you have trouble setting up an identical environment for a fair comparison.
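For illustration, a minimal sketch (the numbers are assumptions, not the paper's actual settings) of how the effective batch size follows from these factors, and how gradient accumulation can compensate for fewer GPUs:

```python
# The effective (global) batch size is the product of the three factors below,
# so running on fewer GPUs with unchanged per-device settings changes the
# optimization trajectory. Example values only.
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    return per_device_batch * grad_accum_steps * num_gpus

multi_gpu  = effective_batch_size(per_device_batch=2, grad_accum_steps=1, num_gpus=8)  # 16
single_gpu = effective_batch_size(per_device_batch=2, grad_accum_steps=8, num_gpus=1)  # 16

# Raising grad_accum_steps on a single GPU can match the global batch size,
# though GPU type and other environment factors may still cause differences.
print(multi_gpu, single_gpu)
```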
Does the prompt embedding size need to match the pretrained model's embedding_dim? Running the authors' code as-is throws a dimension-mismatch error.
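For reference, a minimal sketch (not the repository's code; the bert-base-uncased backbone and the prompt length are assumptions) of checking that the prompt embedding dimension matches the backbone's hidden size, which is exactly the 128-vs-768 mismatch reported above:

```python
# Continuous prompt vectors are inserted into the backbone's input embeddings,
# so their dimension must equal the backbone's hidden size.
import torch
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
hidden_size = config.hidden_size  # 768 for bert-base-uncased

prompt_length = 3  # hypothetical number of prompt tokens
prompt_embedding = torch.nn.Embedding(prompt_length, hidden_size)
print(prompt_embedding.weight.shape)  # torch.Size([3, 768]) -- matches the backbone
```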
Hello, when reproducing the few-shot SuperGLUE experiments (i.e., the FewGLUE_32dev data), my results on the CB, WSC, and COPA datasets show a noticeable gap from those in the paper (all reproduction runs use the albert-xxlarge-v2 pretrained model, consistent with the paper's design, with seed=42 and no modifications). Differences in experimental setup:
Experiments on the CB dataset
Experiments on the WSC dataset
Experiments on the COPA dataset
Differences in Python library versions
Since version differences might affect reproduction, here are the Python library versions corresponding to requirements.txt (the versions from the project's requirements are given in parentheses):
Because my device's CUDA version is constrained, the torch-related library versions differ from those in the code; other libraries such as tqdm and tensorboardX should not affect the results.
Could these library version differences explain the gap in results? (A sketch for recording installed versions follows below.)
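For reference, a minimal sketch (the package list is illustrative) for recording the exact installed versions so they can be compared against requirements.txt:

```python
# Print the installed versions of a few key packages for a reproduction report.
import importlib.metadata as metadata

for pkg in ["torch", "transformers", "tqdm", "tensorboardX"]:
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```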
Hardware differences
All reproduction experiments were run on a single GeForce RTX 3090. How should I interpret the differences in model performance?