Readings for courses that I teach on natural language processing
Akkasi 2016: On the challenges of tokenizing biomedical text.
Artstein and Poesio: Calculation and interpretation of inter-annotator agreement.
Banko and Brill: How to think critically about the effects of data quantity.
Chapman Negex: The classic work on negation in clinical texts.
Church LiLT: What we gain and what we lose when we focus on machine learning for natural language processing.
Cohen and Demner-Fushman: Book-length coverage of the fields and history of biomedical natural language processing.
Conway and O'Connor: Social media, mental health, and Big Data.
Cruz-Diaz: On the complexity of tokenization of biomedical text.
Fokkens 2013: Things that you would not believe affect reproducibility in natural language processing, and yet they do, they do.
Fort, Amazon Mechanical Turk: Ethics of linguistic data construction.
Friedman-Kra-Rzhetsky: Sketches of two very distinct forms of biomedical language: clinical documents and scientific journal articles.
Goutte and Gaussier: How to think about precision, recall, and F-measure.
Hand 2006: The problem with complex classifiers.
He and Kayaalp: The complexities of tokenization of biomedical text.
NaturallyOccurringDataAssumption: What is the best way to test natural language processing systems?
Névéol and Zweigenbaum: A review of clinical natural language processing research.
NominalizationAlternations: A quantitative descriptive study of a common phenomenon in biomedical language that is more complicated than it might look.
Pedersen Empiricism Is Not A Matter Of Faith: On the importance of making your code available.
Pestian Sentiment Analysis of Suicide Notes: Using natural language processing to study suicidality.
Reinlander: Natural language processing at scale: lessons learned from a case study in the biomedical domain.
Sarker et al.: Lots of good information on social media and on pharmacovigilance. Also a nice example of how to write a review article.
Steedman: The implications of Zipf's Law for natural language processing, and for linguistics.
Temnikova: Mathematical properties of the language of a genre of clinical texts.
Wu: How to think critically about the relationship between optimization and generalization.