This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

Flight-School/pos


pos

pos is a command-line utility that tags each word in a text with its part of speech (POS).

$ echo "The quick brown fox jumps over the lazy dog." | pos
DETERMINER	The
ADJECTIVE	quick
ADJECTIVE	brown
NOUN	fox
VERB	jumps
PREPOSITION	over
DETERMINER	the
ADJECTIVE	lazy
NOUN	dog

For more information about natural language processing, check out Chapter 7 of the Flight School Guide to Swift Strings.


Requirements

  • macOS 10.12+

Installation

Install pos with Homebrew using the following command:

$ brew install flight-school/formulae/pos

Usage

Text can be read from either standard input or file arguments. Tagged words are written to standard output on separate lines.

Reading from Piped Standard Input

$ echo "Designed by Apple in California." | pos
VERB	Designed
PREPOSITION	by
NOUN	Apple
PREPOSITION	in
NOUN	California

Reading from Standard Input Interactively

$ pos
This text is being typed into standard input.
DETERMINER	This
NOUN	text
VERB	is
VERB	being
VERB	typed
PREPOSITION	into
ADJECTIVE	standard
NOUN	input

Reading from a File

$ cat german-pangram.txt
Falsches Üben von Xylophonmusik quält jeden größeren Zwerg

$ pos german-pangram.txt
ADJECTIVE	Falsches
NOUN	Üben
PREPOSITION	von
NOUN	Xylophonmusik
VERB	quält
DETERMINER	jeden
ADJECTIVE	größeren
NOUN	Zwerg

Advanced Usage

pos can be chained with Unix text processing commands, like cut, sort, uniq, comm, grep, sed, and awk.

Filtering Tags

$ pos german-pangram.txt | grep NOUN | cut -f2
Üben
Xylophonmusik
Zwerg
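
Because every line starts with the tag, the same kind of pipeline can also summarize a text. The following is a minimal sketch that counts how often each tag occurs. So that it runs without pos installed, the tagged function is a hypothetical stand-in that prints the tagged German pangram shown earlier; in practice it would be replaced by a call such as pos german-pangram.txt. The function names are illustrative and not part of pos.

```shell
# Hypothetical stand-in for `pos german-pangram.txt`: prints the tagged
# output shown earlier (tag, tab, token), so the sketch runs without pos.
tagged() {
  printf '%s\t%s\n' \
    ADJECTIVE Falsches \
    NOUN Üben \
    PREPOSITION von \
    NOUN Xylophonmusik \
    VERB quält \
    DETERMINER jeden \
    ADJECTIVE größeren \
    NOUN Zwerg
}

# Keep only the tag column, group identical tags, count them,
# and list the most frequent tag first.
tag_counts() {
  cut -f1 | sort | uniq -c | sort -rn
}

tagged | tag_counts
```

For this input, NOUN is listed first with a count of 3, followed by ADJECTIVE with 2. The exact column spacing produced by uniq -c varies between systems.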

Additional Details

Tagged words are written to standard output on separate lines. Each line consists of the part of speech tag (see Part of Speech Tags below), followed by a tab (\t), followed by the token:

^(?<tag>[A-Z]+)\t(?<token>.+)$
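
Since the separator is a single tab, a line can also be split into its two fields without a regular expression engine. The following is a minimal sketch, assuming a POSIX shell; the function name describe_tags and the sample input are illustrative and not part of pos.

```shell
# Read "TAG<tab>token" lines from standard input and print one
# human-readable sentence per line.
describe_tags() {
  tab=$(printf '\t')
  # IFS is set to a tab only, so fields are never split on spaces.
  while IFS="$tab" read -r tag token; do
    printf '%s is tagged as %s\n' "$token" "$tag"
  done
}

# Sample input in the documented format.
# In practice: echo "Some text." | pos | describe_tags
printf 'NOUN\tfox\nVERB\tjumps\n' | describe_tags
```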

pos uses NLTagger when available (macOS 10.14 and later), falling back on NSLinguisticTagger on older versions of macOS.

Part of Speech Tags

  • ADJECTIVE
  • ADVERB
  • CLASSIFIER
  • CONJUNCTION
  • DETERMINER
  • IDIOM
  • INTERJECTION
  • NOUN
  • NUMBER
  • PARTICLE
  • PREPOSITION
  • PRONOUN
  • VERB
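
Because the tag always occupies the first tab-separated field, a tag from the list above can be matched exactly rather than anywhere on the line. This matters for overlapping names: a plain grep NOUN also matches lines tagged PRONOUN. The following is a minimal sketch using awk; the function name tokens_with_tag and the sample lines are illustrative assumptions, not part of pos.

```shell
# Print the tokens whose tag (field 1) is exactly the tag passed as $1.
tokens_with_tag() {
  awk -F '\t' -v want="$1" '$1 == want { print $2 }'
}

# Sample input in the documented format. The PRONOUN line would be
# matched by `grep NOUN`, but is skipped by the exact comparison here.
printf 'PRONOUN\tit\nNOUN\tfox\nVERB\tjumps\nNOUN\tdog\n' | tokens_with_tag NOUN
```

In practice the sample printf would be replaced by a pos invocation, for example pos german-pangram.txt | tokens_with_tag NOUN.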

License

MIT

Contact

Mattt (@mattt)