natural-stories-maze

This repository contains the materials, data, code, and write-ups associated with the Natural Stories Maze paper.

Analysis

read_results.R takes in Data/raw_data and produces Data/cleaned.rds
nat_stories.Rmd has the non-modelling analysis of accuracy, comprehension, and participant feedback; it takes in Data/cleaned.rds and produces Analysis/models/comp.rds
models.Rmd has the modelling analysis
models/ has saved summaries of models and other pre-processed data objects for inclusion in the paper (the paper should build without re-running any of the models)

Data

raw_data - what Ibex produces
cleaned.rds (generated by Analysis/read_results.R)
maze_pre_error.Rds is a cleaned-up version of only the pre-mistake data used for modelling, created by models.Rmd
SPR/ contains raw data from Futrell et al.; first.rds is a cleaned-up version of first stories only, created by models.Rmd
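
The data flow between the analysis scripts can be summarised as a small dependency table. The following is a minimal Python sketch, not part of the repository: the `STEPS` table and `run_order` helper are hypothetical names, the directory prefixes are inferred from the section each file is listed under, and only the inputs and outputs stated in this README are included (models.Rmd may read other files that are not listed here).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Inputs and outputs per script, as stated in this README.
# Directory prefixes are an assumption based on the README sections.
STEPS = {
    "read_results.R": {
        "inputs": ["Data/raw_data"],
        "outputs": ["Data/cleaned.rds"],
    },
    "nat_stories.Rmd": {
        "inputs": ["Data/cleaned.rds"],
        "outputs": ["Analysis/models/comp.rds"],
    },
    "models.Rmd": {
        # natural_stories_surprisals.rds is listed under Prep_code
        "inputs": ["Prep_code/natural_stories_surprisals.rds"],
        "outputs": ["Data/maze_pre_error.Rds", "Data/SPR/first.rds"],
    },
}


def run_order(steps):
    """Return script names ordered so producers come before consumers."""
    # Map each output file to the script that produces it.
    producer = {out: name for name, s in steps.items() for out in s["outputs"]}
    # For each script, collect the scripts whose outputs it reads.
    graph = {
        name: {producer[i] for i in s["inputs"] if i in producer}
        for name, s in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in run_order(STEPS):
        print(name)
```

Under these assumptions, read_results.R always comes before nat_stories.Rmd, since the latter reads Data/cleaned.rds; the position of models.Rmd is unconstrained by the inputs listed here.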

Materials

for_ns.js is the code to run the experiment (insert into Ibex maze framework)
for_ibex.txt is the natural stories text split into sentences with distractors
ibex_questions.txt is the natural stories questions
natural_stories_sentences.tsv is the text split into sentences
raw_questions.txt is the raw natural stories questions
practice.txt is the text and questions of practice items
practice_post_maze.txt is the practice items with distractors in Ibex maze format

Prep_code

nat_stories_prep.Rmd - takes the raw Natural Stories materials and processes them for labels, Maze, and model surprisals; it also takes in tokenizations and surprisals and makes a nice table of them. This generates some of the files in Materials/
useful.py manages formatting before and after running surprisals. (Note: ngram, txl and grnn were run on a cluster with a precursor to lm-zoo. GPT was run with lm-zoo. For replicating or altering, I recommend using lm-zoo. TXL is not currently on lm-zoo.)
natural_stories_surprisals.rds is used in models.Rmd
ns_pre_maze.txt is the natural stories sentences ready to get Maze distractors
other files are inputs or intermediate outputs to reformatting the natural stories materials for the experiment
predictors/ contains all surprisal and frequency predictions and model tokenization patterns

Papers

Papers/Paper has the actual manuscript
Amlap_2020_talk contains the abstract and slides for the presentation given at AMLaP 2020
UCI_2021 contains slides for a lab meeting presentation
Images/ and many loose image files are just that
lab_meeting_2020 (.tex and .pdf) is from a pre-AMLaP lab meeting presentation
