added debugging to train.py #88

Open
wants to merge 1 commit into
base: main

Subhanshusethi

  1. Added a valid-indices check when loading the tokens file: it verifies that the number of captions matches the number of embeddings and, if it does not, filters out the mismatched entries.
  2. Fine-tuning GPT-2 depends heavily on the dataset, so captions are now cleaned before training: single letters, special characters, and stop words (NLTK's default list) are removed to reduce the influence of connector words when training in the embedding space.
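
For reviewers who want a concrete picture of the two changes, here is a minimal sketch. It is not the code in this PR. The function names, the `embedding_index` field, and the lowercasing step are assumptions made for illustration, and a small inline stop-word set stands in for NLTK's list so the snippet runs without downloading any corpus.

```python
import re

# Stand-in for NLTK's default stop-word list (assumption: kept tiny and
# inline so this sketch has no external dependencies).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "is", "are", "to", "with"}


def filter_valid_indices(captions, embeddings):
    """Keep only captions whose embedding index refers to an existing embedding.

    `captions` is assumed to be a list of dicts with a hypothetical
    `embedding_index` key; `embeddings` is any sized sequence.
    """
    valid = [
        i for i, cap in enumerate(captions)
        if 0 <= cap["embedding_index"] < len(embeddings)
    ]
    return [captions[i] for i in valid]


def clean_caption(text, stop_words=STOP_WORDS):
    """Remove special characters, single-character tokens, and stop words."""
    # Lowercase, then replace anything that is not a letter, digit, or
    # whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    # Drop single-character tokens and stop words.
    tokens = [t for t in text.split() if len(t) > 1 and t not in stop_words]
    return " ".join(tokens)
```

To use NLTK's list instead, `stop_words` can be set to `set(nltk.corpus.stopwords.words("english"))` after running `nltk.download("stopwords")`; this assumes the English list is the intended default.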
