NLP-project

This is the repository for our AM205 project (Fall 2021, Harvard University). It contains the code used to fine-tune a GPT-2 model on Wiktionary data; specifically, two models were generated.

Demo.ipynb contains a demo for using the models.
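As a rough illustration of how such a fine-tuned model can be used (the exact interface lives in Demo.ipynb), here is a minimal sketch using the Hugging Face transformers library. The checkpoint path and the "word:" prompt format are assumptions for illustration, not taken from this repository.

```python
# Minimal sketch (not the repo's exact code): load a fine-tuned GPT-2
# checkpoint with Hugging Face transformers and sample a continuation.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

checkpoint = "path/to/finetuned-gpt2"   # hypothetical local checkpoint directory
tokenizer = GPT2TokenizerFast.from_pretrained(checkpoint)
model = GPT2LMHeadModel.from_pretrained(checkpoint)
model.eval()

prompt = "serendipity:"                 # assumed "word: definition" style prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=60,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```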

Tuning

The code for fine-tuning GPT-2 can be found in tune/.

Some of the files therein:

  • benchmark.py -- Code for benchmarking the models on the train/validation/test splits of the dataset
  • demoing.py -- Simple demo script for sampling and extracting features from the model
  • get_vec.py -- Simple script for extracting a "word vector" from our model (see the sketch after this list)
  • guess.py -- Script for running the guessing game on definitions
  • main.py -- Script for actually fine-tuning GPT-2 on the Wiktionary data
  • preprocess.py -- Script that takes the raw data obtained from Wiktionary and generates the splits of the dataset
  • sampler.py -- A script specifically for sampling from our tuned model
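One plausible way to obtain a "word vector" from a GPT-2 model, as get_vec.py does, is to mean-pool the final-layer hidden states of the word's tokens. The sketch below is an illustrative guess at that approach, not the repository's code; the checkpoint path and helper name are placeholders.

```python
# Hedged sketch of extracting a "word vector" from GPT-2 by averaging the
# last hidden states of the word's tokens. Illustrative only.
import torch
from transformers import GPT2Model, GPT2TokenizerFast

checkpoint = "path/to/finetuned-gpt2"   # hypothetical checkpoint directory
tokenizer = GPT2TokenizerFast.from_pretrained(checkpoint)
model = GPT2Model.from_pretrained(checkpoint)
model.eval()

def word_vector(word: str) -> torch.Tensor:
    """Return a single vector for `word` by mean-pooling its token states."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, n_tokens, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)            # (hidden_dim,)

vec = word_vector("serendipity")
print(vec.shape)  # e.g. torch.Size([768]) for the base GPT-2 architecture
```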

Sub-projects

This project comprises three sub-projects, listed in the order in which they appear in our LaTeX write-up:

  • 3.2 Representation Geometry: the code can be found in dimreduct/ (a minimal sketch follows this list)
  • 3.3 Model Limitations & Gender Bias: the code can be found in Limitations_Bias/
  • 3.4 Feature Representation: the code can be found in Feature_Representations/
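The dimreduct/ code is not reproduced here; the following is only a minimal sketch of what a representation-geometry analysis typically looks like, projecting a few word vectors to two dimensions with PCA. It assumes the hypothetical word_vector helper sketched earlier and uses arbitrary example words.

```python
# Minimal sketch, not the dimreduct/ code: project a handful of word
# vectors to 2D with PCA so their geometry can be inspected or plotted.
import numpy as np
from sklearn.decomposition import PCA

words = ["cat", "dog", "justice", "freedom"]           # arbitrary example words
vectors = np.stack([word_vector(w).numpy() for w in words])

coords = PCA(n_components=2).fit_transform(vectors)    # shape: (n_words, 2)
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.3f}, {y:.3f})")
```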

LaTeX Write-up

Lastly, our report, which describes the project in detail, is available as the PDF file AM205_group_project_WRITEUP.pdf.
