Data Labelling #3

Open
viswajithiii opened this issue Mar 6, 2017 · 1 comment

Comments

@viswajithiii
Owner

viswajithiii commented Mar 6, 2017

As a first pass, we will do the following labelling.

Bookkeeping spreadsheet: here. Make sure to add a row to it every time you create a new article, assigning the article a new id and recording its link.

For each article, create a text file named article_id.txt with the following format:

Line 1: article_id (a001, a002 ...)
Line 2: URL
Line 3: Headline
Line 4: Byline (If multiple authors, separate by semicolon ("Poorna Kumar; Viswajith Venugopal"))
Line 5 onwards: Body
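
To make the layout above concrete, here is a minimal sketch of a reader for one article file. The names `Article` and `parse_article` are hypothetical (they are not part of the project), and the sketch assumes the four header lines are always present and that the body is everything from line 5 on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    article_id: str
    url: str
    headline: str
    authors: List[str]
    body: str


def parse_article(text: str) -> Article:
    """Parse the contents of one article_id.txt file (hypothetical helper)."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 header lines: id, URL, headline, byline")
    article_id, url, headline, byline = (line.strip() for line in lines[:4])
    # Multiple authors are separated by semicolons on the byline.
    authors = [name.strip() for name in byline.split(";") if name.strip()]
    # Everything from line 5 onwards is the body.
    body = "\n".join(lines[4:]).strip()
    return Article(article_id, url, headline, authors, body)
```

Calling `parse_article(open("a001.txt").read())` would then give the id, URL, headline, author list and body as separate fields.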

(For TSVs, article_id can be a001p, a002p, ... for Poorna's and a001v, a002v, ... for Viswa's.)

The annotation goes in a text file named article_id.tsv, with the following format (one line per person mentioned):

Full Name, Gender, Number of times mentioned (only by part or full name, NOT by pronoun), Says something (yes/no), Number of words quoted, Source/subject (src/sub), Adjectives (a comma-separated list), Expert/non-expert, Profession/Role(s)
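
As a rough illustration of the row format above, the following sketch parses one annotation line. It assumes the columns are tab-separated (going by the .tsv extension) and that the two counts are written as plain integers; `PersonAnnotation` and `parse_annotation_line` are hypothetical names. The gender, source/subject and expert fields are kept as raw strings, since their allowed values are not fully spelled out here, and any columns beyond the ninth are kept untouched in `extra`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonAnnotation:
    full_name: str
    gender: str
    mention_count: int        # mentions by part or full name, not by pronoun
    says_something: bool      # "yes" / "no" in the file
    words_quoted: int
    source_or_subject: str    # "src" or "sub", kept as written
    adjectives: List[str]
    expertise: str            # expert / non-expert, kept as written
    roles: str
    extra: List[str] = field(default_factory=list)  # any trailing columns


def parse_annotation_line(line: str) -> PersonAnnotation:
    """Parse one line of an article_id.tsv file (assumes tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(cols)}")
    # Adjectives are a comma-separated list inside a single column.
    adjectives = [a.strip() for a in cols[6].split(",") if a.strip()]
    return PersonAnnotation(
        full_name=cols[0].strip(),
        gender=cols[1].strip(),
        mention_count=int(cols[2]),
        says_something=cols[3].strip().lower() == "yes",
        words_quoted=int(cols[4]),
        source_or_subject=cols[5].strip(),
        adjectives=adjectives,
        expertise=cols[7].strip(),
        roles=cols[8].strip(),
        extra=cols[9:],
    )
```

Running this over every line of a file is also a cheap consistency check: a row with a missing column or a non-numeric count raises an error instead of silently skewing the totals.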

UPDATE (Week of March 13th): As per Maneesh's instructions, I'm also adding a 'Quotes' column at the end for new articles I annotate. It contains the raw quotes that the person says. Different quotes from the same person are delimited by the special token ''.

@poorna-kumar
Collaborator

poorna-kumar commented May 16, 2017

List of small problems, to be looked at by Viswa if possible:

  • a068: Article on Afghan cricket scene. Really not clear on who is a source versus a subject.
  • a069: Article on American citizens detained in North Korea. Not clear about whether the detainees are subjects or neither source nor subject.
  • a070: Article on Met's Opera House. Should we mark Verdi and Strauss as mentions?
  • a096: Is Hughes a source? (I think yes). In that case, this is a good example of an article where the source is in the first paragraph. Also, in general, is Sean Spicer an expert source? In this article I have called him a source (debatable) and a non-expert (debatable).
