Lack of validation set? #13
Comments
Hi, it's an interesting question. We did have this kind of discussion at an early stage. We used to run validation during training and found that the validation loss would be extremely high and could not reflect the quality of the generated results. Our conclusion was that "overfitting" is, to some extent, an important or even necessary ingredient of a good generative LM: models with a higher validation loss might generate better results because they have a higher probability of "remembering" good sentences from humans. I recall a paper mentioning this phenomenon as well (but I forget its title...). Furthermore, the quality depends hugely on another factor, the "sampling" strategy at inference time. Combining the two factors, we considered that the runtime validation loss might not be very useful, so we dropped it in all following work.
Hi, thanks for the detailed reply. I remember a beginner course project where I supervised some students training on the Bach chorale dataset with a CNN. The results turned out to be pretty good, with all kinds of voice leading and contrapuntal movement, and I was a bit surprised that a CNN could produce such good results. After diving deep into the code, I realized that there was no validation set involved, and after some exploration it became clear that the generated results were basically "copying" whatever the model had seen in the training set, which couldn't reflect the generation and generalization ability of the model. Have you checked for such a "plagiarism" effect in the generated results? I still believe a validation/test set is needed during training. Otherwise, why bother using a SOTA model (i.e. a Transformer)? Why not just use a heavily overfitting CNN with many more parameters, which would give equally good results? Regarding sampling, I believe you only used top-k/top-p/temperature-regularized sampling, right (correct me if I'm wrong)? Given the overfitting behavior, the logits would tend to be heavily concentrated on the memorized token (e.g. [1e4, 1e1, 1e-1, 1e-2]), so top-p/top-k wouldn't change much, I believe, unless you applied a very high temperature (see the sketch below). Happy to discuss!
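To make that last point concrete, here is a minimal sketch (not from this repository; the logit values are just the hypothetical numbers above) of how temperature interacts with top-p (nucleus) sampling when the logits are extremely peaked:

```python
# Minimal sketch: temperature vs. top-p on an extremely peaked logit
# distribution, as an overfit model might produce. The logit values are
# the hypothetical example from the comment above, not from the repo.
import numpy as np

def softmax(x):
    x = x - np.max(x)          # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

def top_p_count(probs, p=0.9):
    """Number of tokens in the smallest set whose cumulative probability >= p."""
    order = np.argsort(probs)[::-1]          # sort tokens by descending prob
    cum = np.cumsum(probs[order])
    return int(np.searchsorted(cum, p) + 1)  # tokens kept by nucleus sampling

logits = np.array([1e4, 1e1, 1e-1, 1e-2])   # hypothetical overfit logits

for temp in (1.0, 100.0, 5000.0):
    probs = softmax(logits / temp)
    kept = top_p_count(probs, p=0.9)
    print(f"T={temp:>7}: probs={np.round(probs, 4)}, tokens kept by top-p(0.9)={kept}")
```

With these numbers, top-p(0.9) keeps a single token at T=1 and still at T=100; only an extremely large temperature spreads enough probability mass for nucleus sampling to make any difference.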
Hi, how do I generate validation_songs.json? There seems to be no mention of it in the description of the dataset files. I would appreciate it if you could answer me.
Hi there,
Thanks for the implementation! I'd appreciate it if you could share more insight on why there's no validation/test set involved during training.
Best,