E-QGen Dataset

This is the dataset for E-QGen: Educational Lecture Abstract-based Question Generation System.

Introduction

This dataset was constructed using the method described in the paper E-QGen: Educational Lecture Abstract-based Question Generation System. We collected course transcripts from online courses on YouTube and matched them with the corresponding questions in the comment section. The dataset mainly focuses on lectures related to computer science. In total, 356 golden pairs, 4,434 silver pairs, and 4,829 platinum pairs were collected. Please see the paper for a more detailed description of the collection procedure and the dataset.

In this repo, we provide direct access to our dataset, which consists of paragraph and question pairs.

Usage

Golden Pairs

Golden pairs are constructed by matching the timestamps in question comments back to the corresponding parts of the transcripts.

golden_pair_3agree.csv, golden_pair_2agree.csv
- The suffix of the file name (3agree, 2agree) indicates the number of LLMs that agreed when filtering questions out of the comments.
golden_pair_3agree_notime_gpt4.csv, golden_pair_2agree_notime_notime_gpt4.csv
- The suffix _notime_gpt4 means that the timestamps have been removed from the questions. Since removing timestamps can leave a sentence reading unnaturally, we use GPT-4 to refine the question comments.
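
The naming convention above can be read programmatically. The following is a minimal sketch, assuming the file names follow the pattern seen in the four names listed above; the function name `describe_golden_file` and the dictionary keys are hypothetical and not part of the dataset.

```python
import re

# Pattern inferred from the golden-pair file names listed in this README.
# It is an assumption, not an official naming specification.
_GOLDEN_NAME = re.compile(r"^golden_pair_(\d+)agree((?:_notime)*)(_gpt4)?\.csv$")


def describe_golden_file(filename):
    """Return what the suffixes of a golden-pair file name indicate."""
    match = _GOLDEN_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognised golden-pair file name: {filename}")
    return {
        # Number in front of "agree" (the LLM count used during filtering).
        "llm_count": int(match.group(1)),
        # True when at least one "_notime" suffix is present.
        "timestamps_removed": bool(match.group(2)),
        # True when the "_gpt4" suffix is present.
        "refined_with_gpt4": match.group(3) is not None,
    }


if __name__ == "__main__":
    for name in ("golden_pair_3agree.csv", "golden_pair_3agree_notime_gpt4.csv"):
        print(name, describe_golden_file(name))
```

A helper like this makes it easy to select, for example, only the files whose timestamps were removed, without hard-coding each name.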

Silver Pairs

The pairs in silver_pairs.csv are collected by matching comments without timestamps to lecture paragraphs. We compute the cosine similarity using PaLM, PaLM embeddings, and Sentence Transformer embeddings.
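
To illustrate the matching idea, here is a minimal sketch of cosine-similarity matching between one comment embedding and a set of paragraph embeddings. It is not the authors' pipeline: the README does not state how scores from the different embedding models are combined or which threshold is used, so the sketch only picks the highest-scoring paragraph for a single embedding. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def best_paragraph(comment_vec, paragraph_vecs):
    """Return (index, score) of the paragraph most similar to the comment."""
    scores = [cosine_similarity(comment_vec, p) for p in paragraph_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embeddings would come from a model.
    comment = [1.0, 0.0]
    paragraphs = [[0.0, 1.0], [2.0, 0.0]]
    print(best_paragraph(comment, paragraphs))
```

Cosine similarity ignores vector length, so a paragraph embedding pointing in the same direction as the comment embedding scores highest regardless of its magnitude.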

Platinum Pairs

The pairs in platinum_pairs.csv are generated by the OpenAI GPT-4 model. We ask GPT-4 to generate 20 questions for each lecture paragraph.
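
One way to inspect the file is to count how many questions are attached to each paragraph. The sketch below uses only the Python standard library. The column name `paragraph` is an assumption, since this README does not list the CSV headers; check the header row of the actual file and pass the correct name.

```python
import csv
import io
from collections import Counter


def questions_per_paragraph(csv_file, paragraph_column="paragraph"):
    """Count rows (questions) per distinct paragraph in an open CSV file.

    `paragraph_column` is a hypothetical column name; adjust it to match
    the real header of the file being read.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or paragraph_column not in reader.fieldnames:
        raise KeyError(f"column {paragraph_column!r} not found in {reader.fieldnames}")
    return Counter(row[paragraph_column] for row in reader)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file, for demonstration only.
    sample = io.StringIO("paragraph,question\nP1,Q1\nP1,Q2\nP2,Q3\n")
    print(questions_per_paragraph(sample))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function; the path depends on where the repository was cloned.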

Data Sources

Paragraph and question pairs are collected from the MIT OpenCourseWare and Stanford Online YouTube channels.

NYCU-NLP-Lab/E-QGen