Student forcing options/roll-out #77

Open
kylebgorman opened this issue Jun 28, 2023 · 0 comments
Labels
enhancement New feature or request

kylebgorman commented Jun 28, 2023

After #71, we can now control, for any given training batch, whether teacher forcing or student forcing is used. Some recent work suggests that sequence-to-sequence models benefit from training with student forcing, and other work recommends rolling student forcing out gradually over the course of training. I propose that we:

  • experiment with a flag that simply enables student forcing during training and see if things still converge
  • also experiment with a linear, batchwise rollout of student forcing; that is:
    • for each batch, we draw a random sample such that with probability p we use teacher forcing and with probability 1 - p we use student forcing
    • we initialize with p = 1 and, after the warmup phase, linearly decrement p so that p = 0 at the last batch
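
To make the proposed schedule concrete, here is a minimal sketch of the linear, batchwise rollout in plain Python. The names `teacher_forcing_prob`, `use_teacher_forcing`, `warmup_steps`, and `total_steps` are hypothetical and not part of the existing codebase; the sketch also assumes that the total number of training batches is known in advance and that batches are counted from 0.

```python
import random


def teacher_forcing_prob(step: int, warmup_steps: int, total_steps: int) -> float:
    """Returns p, the probability of teacher forcing for the batch at `step`.

    Assumption: `step` is a 0-indexed global batch counter and `total_steps`
    is the total number of training batches, so the last batch is
    `total_steps - 1`.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be positive")
    last_step = total_steps - 1
    # During warmup (or if warmup covers all of training), p stays at 1.
    if step <= warmup_steps or last_step <= warmup_steps:
        return 1.0
    # Linear decay from p = 1 at the end of warmup to p = 0 at the last batch.
    p = 1.0 - (step - warmup_steps) / (last_step - warmup_steps)
    return max(0.0, p)


def use_teacher_forcing(
    step: int, warmup_steps: int, total_steps: int, rng: random.Random
) -> bool:
    """Draws once per batch: True means teacher forcing, False student forcing."""
    # rng.random() lies in [0, 1), so p = 1 always yields True and p = 0
    # always yields False.
    return rng.random() < teacher_forcing_prob(step, warmup_steps, total_steps)
```

The returned boolean could then be passed to whatever per-batch switch #71 introduced. Keeping the schedule a pure function of the step counter makes it easy to unit-test separately from the model.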

Note that the stochastic option (the second one) differs somewhat from what Bengio et al. do: they sample at the token level rather than once per batch. However, that seems harder and slower to implement, so I am suggesting something simpler to start out with.
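
The difference between the two sampling granularities can be illustrated with a toy decoding loop. This is only a sketch: `decoder_inputs` and `predict_next` are hypothetical stand-ins (a real decoder would operate on batched tensors), and the coarse variant here draws once per sequence as a simplification of drawing once per batch.

```python
import random
from typing import Callable, List, Sequence


def decoder_inputs(
    gold: Sequence[str],
    predict_next: Callable[[str], str],
    p: float,
    rng: random.Random,
    token_level: bool,
    start: str = "<s>",
) -> List[str]:
    """Returns the symbols fed to a toy decoder at each time step.

    `predict_next` is a hypothetical stand-in for one decoder step: it maps
    the previous input symbol to the model's predicted symbol.
    """
    # Coarse variant: a single draw decides the mode for the whole sequence.
    sequence_draw = rng.random() < p
    inputs = [start]
    for t in range(len(gold) - 1):
        # Token-level variant: a fresh draw at every position.
        teacher = (rng.random() < p) if token_level else sequence_draw
        prediction = predict_next(inputs[-1])
        # Teacher forcing feeds the gold symbol; student forcing feeds the
        # model's own previous prediction.
        inputs.append(gold[t] if teacher else prediction)
    return inputs
```

In the token-level variant the mode can change at any position, so the decoder has to be stepped one symbol at a time even when most draws come out as teacher forcing, which is one plausible reason it would be slower.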

Both of these can be thought of as hyperparameter-free (beyond the boolean decision of whether to use student forcing during training at all). If either works, we can merge it into the master branch.
