Now fluentspeech.py also manages validation set. #271

umbertocappellazzo · 2022-06-14T13:23:25Z

Hi, I modified the fluentspeech.py file such that it can deal with the validation set as well (it was missing).
Cheers

TLESORT · 2022-06-14T13:31:15Z

Hi Umberto,
Thanks for your PR. :)

The thing is that in CL, you are not supposed to have a valid set in the beginning. You are supposed to produce it at each task from the training data.
I would prefer that you adapt split_train_val(dataset, split_val = 0.1) to your use case or create a new function compatible with your task type to realize the same function.

umbertocappellazzo · 2022-06-14T14:15:16Z

Hmm, I see. FluentSpeech Commands dataset provides the valid_data.csv file together w/ the train and test set, so I can't extract a portion of the train set because the valid already exists. Well, I'll see how to pull it off anyways 👍 Thank you Timothée :)

TLESORT · 2022-06-14T15:45:07Z

Is it possible to merge train and valid to create train data?
If there is no way out, we can maybe keep the "valid" option you propose but this is not a recommended practice... :)

umbertocappellazzo · 2022-06-14T16:07:27Z

I think, by and large, that it is possible, yet if we merge the train and valid, and then we split again, the original validation samples will not be the same as the new ones brought by split_train_val . In fact the validation set includes utterances from a certain number of speakers, and these speakers are different from the train set (and from the test set), and each speaker's utterances are kept into a specific subfolder. Therefore I think it would not be possible :/

arthurdouillard · 2022-06-15T13:05:10Z

@TLESORT I think we can allow having a define val dataset, I think a few others datasets in Continuum proposes that. It can be useful when everyone compare on the same val set.

TLESORT · 2022-06-15T14:10:33Z

Yes, we can potentially accept it.

But first, having the same valid set is not supposed to be a requirement. The mandatory thing is to have the same test set.
Second, if the valid set is designed with additional knowledge, it is considered supplementary supervision. In particular, if the valid set is designed with the knowledge of the whole dataset, it could mean there is some bias toward the future ...

I also think we have already some datasets with valid sets, but it is against my will ;D :P

So, we can merge this, but we need to keep in mind that some particularities in the scenario make it not perfectly rigorous...

umbertocappellazzo · 2022-06-15T17:14:21Z

Well, the split into train, test and valid has been made by the authors who created the corpus and I don't know whether they crafted then different sets. Since I'm the first to use FSC in a CL scenario, I think it could be ok to proceed in this way, and I understand your rigorousness for this matter. So, you have the last word about this.
I take advantage of this thread for asking one thing: does Continuum handle the case of unbalanced classes for rehearsal? I had a look at the I suppose not, but I wanna be sure. If the dataset contains unbalanced classes, it's not fair to keep the same # of samples for each class. If continnum doesn't cover this case, I can come up with a solution for my project and then I can make a PR (if you think this is worth it).

arthurdouillard · 2022-06-16T09:22:06Z

@umbertocappellazzo I'm merging this PR, and I create an issue to discuss what you're just talked about as it's quite different matter (#272).

arthurdouillard · 2022-06-16T09:38:39Z

@umbertocappellazzo note that it has been deployed in the version 1.2.4. :)

Thanks for your contrib!

Now fluentspeech.py also manages validation set.

9ded457

arthurdouillard merged commit 9c0458c into Continvvm:master Jun 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Now fluentspeech.py also manages validation set. #271

Now fluentspeech.py also manages validation set. #271

umbertocappellazzo commented Jun 14, 2022

TLESORT commented Jun 14, 2022 •

edited

Loading

umbertocappellazzo commented Jun 14, 2022

TLESORT commented Jun 14, 2022

umbertocappellazzo commented Jun 14, 2022

arthurdouillard commented Jun 15, 2022

TLESORT commented Jun 15, 2022

umbertocappellazzo commented Jun 15, 2022

arthurdouillard commented Jun 16, 2022

arthurdouillard commented Jun 16, 2022

Now fluentspeech.py also manages validation set. #271

Now fluentspeech.py also manages validation set. #271

Conversation

umbertocappellazzo commented Jun 14, 2022

TLESORT commented Jun 14, 2022 • edited Loading

umbertocappellazzo commented Jun 14, 2022

TLESORT commented Jun 14, 2022

umbertocappellazzo commented Jun 14, 2022

arthurdouillard commented Jun 15, 2022

TLESORT commented Jun 15, 2022

umbertocappellazzo commented Jun 15, 2022

arthurdouillard commented Jun 16, 2022

arthurdouillard commented Jun 16, 2022

TLESORT commented Jun 14, 2022 •

edited

Loading