This file summarizes the data contained in the `.csv` files used in this project.
Contains written books from all main-series Elder Scrolls games (Arena, Daggerfall, Morrowind, Oblivion, Skyrim, Elder Scrolls Online) with the following information:
Column Name | Description |
---|---|
title | Name of the document in-game |
author | In-game author of the document |
description | (Optional, may be empty) A brief summary of the text |
game | Python list of strings identifying which game(s) the document has appeared in |
text | The actual text, scraped from a fan wiki |
word_count | The book's word count (according to `nltk.word_tokenize`) |
url | URL of the scraped webpage |
There are:
- 5446 Entries
- 2106 Unique Authors (all anonymous authors are counted as one)
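Because `game` holds a Python list, it will usually come back as a string such as `"['Morrowind', 'Skyrim']"` when the `.csv` is read. The following is a minimal sketch of converting it back, assuming that string format. The sample rows and the in-memory file are invented for illustration and are not actual data.

```python
import ast
import csv
import io

# Hypothetical two-row sample standing in for the real .csv (not actual data).
sample = io.StringIO(
    'title,author,game\n'
    '"Book A","Anonymous","[\'Morrowind\', \'Skyrim\']"\n'
    '"Book B","Some Author","[\'Oblivion\']"\n'
)

def parse_games(cell):
    """Turn the stringified list in the `game` column into a real list."""
    return ast.literal_eval(cell) if cell else []

rows = list(csv.DictReader(sample))
for row in rows:
    row["game"] = parse_games(row["game"])

# Select the documents that appeared in a given game.
in_skyrim = [row["title"] for row in rows if "Skyrim" in row["game"]]
print(in_skyrim)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text that was scraped from the web.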
Contains the quest dialogue from Torchlight II with the following information:
Column Name | Description |
---|---|
speaker | Who says/displays the text |
text | Text from the quest |
word_count | The text's word count (according to `nltk.word_tokenize`) |
dialogtype | Describes the purpose of the dialogue in relation to the quest |
quest_displayname | Name of the quest associated with the text, as displayed in-game |
quest_name | Name of the quest associated with the text, as used in the game engine (NON-QUEST entries are hooked up to names here) |
There are:
- 1008 Entries
- 9 Dialogue Types
- 84 Unique Speakers
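As a rough sketch of how these columns fit together, the snippet below groups dialogue lines by `quest_displayname` and tallies `dialogtype` values. The rows, speaker names, and dialogue type labels are invented for illustration; the real file's values will differ.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows; speakers, dialogue types, and quest names are invented.
sample = io.StringIO(
    "speaker,text,dialogtype,quest_displayname\n"
    "Guard,Help us!,intro,Quest One\n"
    "Guard,Thank you.,complete,Quest One\n"
    "Merchant,Find my cart.,intro,Quest Two\n"
)

lines_by_quest = defaultdict(list)  # quest name -> [(speaker, text), ...]
type_counts = Counter()             # dialogue type -> number of entries

for row in csv.DictReader(sample):
    lines_by_quest[row["quest_displayname"]].append((row["speaker"], row["text"]))
    type_counts[row["dialogtype"]] += 1

print(dict(type_counts))
print(lines_by_quest["Quest One"])
```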
Abbreviated as KOTOR. The data is stored in `kotor.pkl` and contains the following information:
Column Name | Description |
---|---|
speaker | Speaker of the dialogue |
listener | Intended in-game listener |
text | Transcription of the dialogue |
word_count | The text's word count (according to `nltk.word_tokenize`) |
animation | Python list describing which animations the character performs while saying the dialogue |
next | Python list containing the next "chunks" of dialogue (if any) |
previous | Python list containing the previous "chunks" of dialogue (if any) |
comment | Comments (if any) left by the developer in the in-game files |
Data pertaining to the animations listed in the `animation` column can be found in `meta_kotor.pkl`.
There are:
- 29213 Entries
- 538 Unique Speakers
- 152 Unique Listeners (Including the PLAYER)
- 31 Associated Animations
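The `next` and `previous` columns make the dialogue a linked structure that can be walked chunk by chunk. The sketch below assumes, purely for illustration, that each `next` list holds the keys of the following chunks. How `kotor.pkl` actually identifies chunks is not documented here, so the record layout and the `walk` helper are hypothetical.

```python
# Hypothetical records; the real kotor.pkl may identify linked chunks differently.
records = {
    0: {"speaker": "A", "text": "Hello.", "next": [1, 2]},
    1: {"speaker": "PLAYER", "text": "Hi.", "next": [3]},
    2: {"speaker": "PLAYER", "text": "Go away.", "next": []},
    3: {"speaker": "A", "text": "Welcome.", "next": []},
}

def walk(records, start):
    """Yield every chunk reachable from `start`, depth-first, visiting each once."""
    seen = set()
    stack = [start]
    while stack:
        idx = stack.pop()
        if idx in seen or idx not in records:
            continue
        seen.add(idx)
        yield idx, records[idx]
        # Reverse so the first listed branch is explored first.
        stack.extend(reversed(records[idx]["next"]))

order = [idx for idx, _ in walk(records, 0)]
print(order)
```

The `seen` set guards against loops, which branching dialogue can contain when a conversation returns to an earlier chunk.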
Very simple `.csv`, containing only the following information:
Column Name | Description |
---|---|
character | Name of the speaker |
text | Text dump containing all dialogue (partially tagged in `.xml` format) from the speaker throughout their various appearances in the game |
word_count | The text's word count (according to `nltk.word_tokenize`) |
There are:
- 55 Characters
- Tags include: `<description>` (with associated end tags)
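Since the text is only partially tagged, it is unlikely to parse as well-formed XML, so a regular expression may be the more forgiving tool. The sketch below pulls out the contents of a tag pair and strips tags from the rest. The sample string is made up, and the helper names `extract_tagged` and `strip_tags` are hypothetical.

```python
import re

# Matches <tag>...</tag> pairs; DOTALL lets the content span line breaks.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def extract_tagged(text, tag):
    """Return the contents of every <tag>...</tag> pair found in `text`."""
    return [body.strip() for name, body in TAG_PATTERN.findall(text) if name == tag]

def strip_tags(text):
    """Remove opening and closing tags but keep the text between them."""
    return re.sub(r"</?\w+>", "", text)

# Made-up sample, not actual game text.
sample = "Untagged line. <description>A made-up description.</description> More text."
print(extract_tagged(sample, "description"))
print(strip_tags(sample))
```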
Another extremely simple `.csv`, made by using spaCy's `ner` model to tag entities within the texts, containing only the following information:
Column Name | Description |
---|---|
entity | Name of the entity |
tag | `ent.label_` value exactly as produced by the model (may be inaccurate) |
source | Game from which the entity came |
NOTE: Only the Hollow Knight and Torchlight II entities come from the full text data; the KOTOR and Elder Scrolls entities were taken from samples due to memory limitations. Tagger accuracy is estimated at around 89%, per spaCy's website.
A `.csv` containing the following information about various orders and requests captured by my regular expressions and spaCy's Matcher:
Column Name | Description |
---|---|
speech_act_type | Type of speech act |
text | Text captured |
game | Original game |
NOTE: Despite my best efforts, there are false negatives and positives. Although I assume the precision and recall rates are good, there have been no formal tests to prove this. Use at your own discretion.
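To illustrate why keyword-based capture produces both false positives and false negatives, here is a minimal sketch of regex-driven extraction that emits rows shaped like the columns above. The patterns, the `request` and `order` labels, and the `find_speech_acts` helper are all hypothetical. They are not the patterns used to build this file, and the spaCy Matcher half of the pipeline is not shown.

```python
import re

# Illustrative patterns only; these are NOT the patterns used to build the .csv.
PATTERNS = {
    "request": re.compile(
        r"\b(?:please|could you|can you|would you)\b[^.!?]*[.!?]", re.IGNORECASE
    ),
    "order": re.compile(r"\b(?:you must|do not|don't)\b[^.!?]*[.!?]", re.IGNORECASE),
}

def find_speech_acts(text, game):
    """Return one row per match, using the column names described above."""
    rows = []
    for act_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append(
                {"speech_act_type": act_type, "text": match.group(0).strip(), "game": game}
            )
    return rows

# Made-up sample text and game name.
sample = "Could you bring me the key? You must leave at once. The weather is fine."
rows = find_speech_acts(sample, "Example Game")
for row in rows:
    print(row)
```

A pattern like this would tag "Can you believe it?" as a request (a false positive) and miss a bare imperative such as "Bring me the key." (a false negative), which is the kind of error the note above warns about.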