Student forcing options/roll-out #77

kylebgorman · 2023-06-28T20:36:11Z

After #71, we can now control, for a given training batch, whether teacher or student forcing is used. Some recent work suggests that sequence-to-sequence models benefit from training with student forcing. Other work recommends gradually rolling out student forcing during training. I propose that we:

1. experiment with a flag that simply enables student forcing during training and see if things still converge
2. also experiment with a linear, batchwise rollout of student forcing; that is:
   - for each batch, we draw a random sample such that with probability p we use teacher forcing and with probability 1 - p we use student forcing
   - we initialize with p = 1 and after the warmup phase, linearly decrement p so that p = 0 for the last batch
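
To make the batchwise rollout concrete, here is a minimal sketch of the schedule. The names `teacher_forcing_prob`, `use_teacher_forcing`, `total_steps`, and `warmup_steps` are hypothetical and not part of the existing code; the sketch also assumes the total number of training batches is known in advance.

```python
import random


def teacher_forcing_prob(step: int, total_steps: int, warmup_steps: int) -> float:
    """Returns p, the probability of using teacher forcing for this batch.

    p stays at 1 through the warmup phase, then decays linearly so that
    p = 0 at the last batch (step == total_steps - 1).
    """
    if step < warmup_steps:
        return 1.0
    decay_steps = total_steps - 1 - warmup_steps
    if decay_steps <= 0:
        # No room left for a linear decay after warmup.
        return 0.0
    return max(0.0, 1.0 - (step - warmup_steps) / decay_steps)


def use_teacher_forcing(
    step: int, total_steps: int, warmup_steps: int, rng=random
) -> bool:
    """Draws one Bernoulli sample per batch: True means teacher forcing."""
    return rng.random() < teacher_forcing_prob(step, total_steps, warmup_steps)
```

The training loop would call `use_teacher_forcing` once per batch and pass the result to whatever switch #71 exposes. Because `random.random()` returns values in [0, 1), p = 1 always selects teacher forcing and p = 0 always selects student forcing.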

Note that the stochastic option (the second one) is somewhat different from what Bengio et al. do: they do this at the token level. However, this seems harder and slower to implement, so I am suggesting something simpler to start out with.
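
For contrast, a token-level variant flips the coin once per decoding step rather than once per batch. The following is only a rough sketch under stated assumptions: `token_level_inputs` and `predict_next` are hypothetical names, and `predict_next` stands in for a call to the model that returns its prediction given the decoder inputs so far.

```python
import random
from typing import Callable, List, Optional, Sequence


def token_level_inputs(
    gold: Sequence[int],
    predict_next: Callable[[List[int]], int],
    p: float,
    start_symbol: int = 0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Builds decoder inputs, choosing gold vs. predicted symbol per step.

    With probability p the next input is the gold symbol (teacher forcing);
    otherwise it is the model's own prediction (student forcing).
    """
    rng = rng or random.Random()
    inputs = [start_symbol]
    for t in range(len(gold) - 1):
        if rng.random() < p:
            inputs.append(gold[t])
        else:
            # Requires running the model on the prefix at every such step.
            inputs.append(predict_next(inputs))
    return inputs
```

Since any step may need a model prediction, the decoder has to be stepped sequentially within a batch, which is one reason the batchwise version is simpler to start with.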

Both of these can be thought of as hyperparameter-free (beyond the boolean decision of whether or not to use student forcing during training at all). If either works, we can incorporate it into the master branch.

kylebgorman added the enhancement New feature or request label Jun 28, 2023

bonham79 mentioned this issue Feb 20, 2024

Hard Monotonic Transducer #165

Closed

bonham79 mentioned this issue Jun 10, 2024

Generalization of expert teacher_forcing and monotonicity across model architectures #198

Open
