Part-of-Speech (POS) Tagging

So far every step has treated words as interchangeable atoms: a token is a token, counted and filtered without regard to its role in the sentence. But the same word can play different roles. In "I caught a fish", "fish" is a noun. In "We fish every weekend", "fish" is a verb. To a frequency counter these are identical; grammatically they could not be more different.

Part-of-speech (POS) tagging assigns each word its grammatical category — noun, verb, adjective, adverb, and so on. It is the first step that pays attention to syntax rather than just the word's spelling, and it unlocks a lot: better lemmatization, extracting "all the nouns", recognizing names, and finding grammatical patterns.

The problem POS tagging solves: words wear many hats

A word's category depends on context, not just the word itself. This is why POS tagging cannot be a simple dictionary lookup — the same spelling needs different tags in different sentences.
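A toy lookup tagger makes the limitation concrete. This is only a sketch, not how NLTK works: `LOOKUP` and `lookup_tag` are hypothetical names introduced here, and the dictionary is hand-written for illustration.

```python
# A hypothetical one-tag-per-word dictionary (hand-written, not taken from NLTK).
LOOKUP = {
    "i": "PRP", "we": "PRP", "caught": "VBD", "a": "DT",
    "fish": "NN", "every": "DT", "weekend": "NN",
}

def lookup_tag(tokens):
    """Tag each token from the fixed dictionary, ignoring its neighbors."""
    return [(tok, LOOKUP.get(tok.lower(), "UNK")) for tok in tokens]

noun_use = lookup_tag("I caught a fish".split())
verb_use = lookup_tag("We fish every weekend".split())

print(noun_use)
print(verb_use)
# "fish" gets the same tag in both sentences, so one of the two must be wrong.
print(dict(noun_use)["fish"] == dict(verb_use)["fish"])  # True
```

Whichever single tag the dictionary stores for "fish", one of the two sentences ends up mislabeled.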

A tagger must read the surrounding words to decide. NLTK's default tagger, the averaged perceptron tagger, does exactly that: it was trained on a large hand-tagged corpus and predicts each word's tag from features of the word and its neighbors. Let us watch it disambiguate "fish".
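The comparison might be run along these lines. This is a sketch under assumptions: NLTK is installed, and its tokenizer and tagger data have been fetched once with `nltk.download` (the resource names differ between NLTK versions). `tag_sentence` and `tag_of` are helper names introduced here, not part of NLTK.

```python
SENTENCES = ["I caught a fish", "We fish every weekend"]

def tag_sentence(sentence):
    """Tokenize one sentence and tag it with NLTK's default tagger."""
    # Imported inside the function so this file still loads if NLTK is absent.
    from nltk import pos_tag, word_tokenize
    return pos_tag(word_tokenize(sentence))

def tag_of(word, tagged):
    """Return the tag of the first occurrence of `word`, or None if absent."""
    for token, tag in tagged:
        if token == word:
            return tag
    return None

# Usage (requires NLTK and its downloaded data):
#   for sentence in SENTENCES:
#       print(sentence, "->", tag_of("fish", tag_sentence(sentence)))
```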

Same word, two sentences, two different tags: NN (singular noun) in the first, VBP (present-tense verb, not third-person singular) in the second. The tagger got there from context, which is why POS tagging is more than a lookup table.

Reading the tags: the Penn Treebank tagset

pos_tag returns a list of (word, tag) tuples. The tags come from the Penn Treebank tagset, which is more fine-grained than "noun/verb" — it distinguishes singular from plural nouns, verb tenses, and so on. You do not need to memorize all ~36 tags, but you should recognize the common families.

Tag	Meaning	Example
`NN` / `NNS`	noun, singular / plural	dog / dogs
`NNP` / `NNPS`	proper noun, singular / plural	Alice / Americas
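`VB` / `VBD` / `VBG` / `VBN` / `VBP` / `VBZ`	verb: base / past tense / gerund or present participle / past participle / present, not 3rd-person singular / present, 3rd-person singular	eat / ate / eating / eaten / eat / eats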
`JJ` / `JJR` / `JJS`	adjective / comparative / superlative	big / bigger / biggest
`RB`	adverb	quickly
`DT`	determiner	the, a
`IN`	preposition / subordinating conjunction	in, of
`PRP`	personal pronoun	I, you, it
`CC`	coordinating conjunction	and, but

The first letter is the shortcut

You can decode most tags from their first letter: N… is a noun, V… is a verb, J… is an adjective, R… is an adverb. This is so useful that we will use it in code in a moment to convert Penn tags into the simpler categories a lemmatizer wants. When you only care about coarse categories, tag[0] or tag.startswith("NN") is often all you need.
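As a minimal sketch of the first-letter shortcut, the helper below collapses Penn tags into coarse categories. `COARSE` and `coarse_category` are names made up for this example.

```python
# First letter of a Penn Treebank tag -> coarse category.
COARSE = {"N": "noun", "V": "verb", "J": "adjective", "R": "adverb"}

def coarse_category(tag):
    """Collapse a Penn tag to a coarse category using only its first letter."""
    # Caveat: RP (particle) also starts with "R"; test tag.startswith("RB")
    # instead if that distinction matters for your task.
    return COARSE.get(tag[:1], "other")

for tag in ["NN", "NNPS", "VBD", "JJR", "RB", "DT"]:
    print(tag, "->", coarse_category(tag))
```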

Here is a full sentence tagged, laid out as a tree of word-to-tag mappings.
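One way to print such a layout yourself is sketched below. The tags in `TAGGED` are written by hand for illustration, not produced by the tagger, and `render_tree` is a hypothetical helper.

```python
# Hand-written (word, tag) pairs in the shape pos_tag returns.
TAGGED = [
    ("Alice", "NNP"), ("quickly", "RB"), ("read", "VBD"),
    ("the", "DT"), ("long", "JJ"), ("report", "NN"), (".", "."),
]

def render_tree(tagged, root="sentence"):
    """Draw each word-to-tag mapping as a branch under a single root."""
    lines = [root]
    last = len(tagged) - 1
    for i, (word, tag) in enumerate(tagged):
        branch = "`-- " if i == last else "|-- "
        lines.append(f"{branch}{word} -> {tag}")
    return "\n".join(lines)

print(render_tree(TAGGED))
```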

A real payoff: extracting syntactic patterns

Once words carry tags, you can pull out grammatical patterns cheaply. Want every noun in a document (to guess what it is about)? Keep tokens whose tag starts with NN. Want adjectives (useful for opinion mining)? Keep JJ. This "filter by tag" move is the backbone of simple information extraction.
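The "filter by tag" move fits in one list comprehension. In this sketch, `keep_by_tag` is a hypothetical helper and the tagged sentence is hand-written in the shape `pos_tag` returns, so it runs without the tagger.

```python
# Hand-written (word, tag) pairs standing in for tagger output.
TAGGED = [
    ("The", "DT"), ("new", "JJ"), ("phone", "NN"), ("has", "VBZ"),
    ("a", "DT"), ("great", "JJ"), ("battery", "NN"), ("and", "CC"),
    ("fast", "JJ"), ("chargers", "NNS"), (".", "."),
]

def keep_by_tag(tagged, prefix):
    """Return the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = keep_by_tag(TAGGED, "NN")       # matches NN, NNS, NNP, NNPS
adjectives = keep_by_tag(TAGGED, "JJ")  # matches JJ, JJR, JJS

print(nouns)       # ['phone', 'battery', 'chargers']
print(adjectives)  # ['new', 'great', 'fast']
```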

POS tagging makes lemmatization accurate

Remember the big lemmatization gotcha: WordNetLemmatizer treats every word as a noun unless told otherwise, so most verb forms come back unchanged ("running" stays "running"). POS tagging is the missing piece. Tag first, convert each Penn tag to the coarse category WordNet understands (noun, verb, adjective, or adverb), then lemmatize with that category. This is the standard, accurate lemmatization recipe.
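The recipe can be sketched as follows, assuming NLTK and its WordNet data are installed. `penn_to_wordnet` and `lemmatize_tagged` are helper names introduced here, and the two tagged words are hand-written rather than taken from a tagger run.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to WordNet's one-letter POS code."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also the lemmatizer's default

def lemmatize_tagged(tagged, use_pos=True):
    """Lemmatize (word, tag) pairs, with or without the POS hint."""
    # Imported here so the mapping above works even without NLTK installed.
    from nltk.stem import WordNetLemmatizer
    lemmatizer = WordNetLemmatizer()
    if not use_pos:
        return [lemmatizer.lemmatize(word) for word, _ in tagged]
    return [lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
            for word, tag in tagged]

# Hand-written tags in the shape pos_tag returns.
TAGGED = [("running", "VBG"), ("eaten", "VBN")]

# Usage (requires NLTK and its WordNet data):
#   print(lemmatize_tagged(TAGGED, use_pos=False))
#   print(lemmatize_tagged(TAGGED, use_pos=True))
```

Defaulting unknown tags to the noun code mirrors what the lemmatizer would do on its own, so the mapping never makes things worse than the untagged call.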

Compare the two lines. Without POS, "running" and "eaten" survive unchanged. With POS, "running" → "run" and "eaten" → "eat". The tagger supplied the context the lemmatizer needed. This tag-then-lemmatize pattern is worth committing to memory — it is how accurate normalization is done in practice.

Tagging is contextual, and not perfect

Because the tagger predicts from context, it can be wrong — especially on short, ungrammatical, or unusual text (headlines, tweets, product titles). It is right the large majority of the time on well-formed English, but treat its output as a strong guess, not gospel. Lowercasing or stripping punctuation before tagging tends to hurt accuracy, since the tagger uses capitalization and punctuation as clues — another reason to tag relatively early.
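To check the effect of lowercasing on your own text, a rough sketch like the one below tags the same tokens twice and reports where the tags diverge. It assumes NLTK and its data are installed; `tag_changes_after_lowercasing` is a hypothetical helper, and which tokens change (if any) depends on the input and on the NLTK version.

```python
def tag_changes_after_lowercasing(text):
    """List (token, original_tag, lowercased_tag) where the two tags differ."""
    # Imported here so this file still loads if NLTK is absent.
    from nltk import pos_tag, word_tokenize
    tokens = word_tokenize(text)
    original = pos_tag(tokens)
    # Lowercase after tokenizing so both runs see the same token boundaries.
    lowered = pos_tag([tok.lower() for tok in tokens])
    return [
        (tok, before, after)
        for (tok, before), (_, after) in zip(original, lowered)
        if before != after
    ]

# Usage (requires NLTK and its data):
#   print(tag_changes_after_lowercasing("Alice moved to New York in May."))
```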

Question (select one)

Why can't part-of-speech tagging be done with a simple dictionary that maps each word to one fixed tag?

Dictionaries are too slow to look words up in

A word's part of speech depends on context — "fish" is a noun in "I caught a fish" but a verb in "we fish on weekends" — so the same word needs different tags in different sentences

Every word in English has exactly one possible part of speech

Tagging only works on numbers

Real-world uses of POS tags

Accurate lemmatization (as above) — the most common pairing.
Named-entity recognition builds on proper-noun tags (NNP) to find people, places, and organizations.
Information extraction: pull all noun phrases to summarize "what" a document discusses, or adjective–noun pairs for opinion mining ("great battery", "slow service").
Grammar and writing tools flag, e.g., a sentence with no verb.
Search: knowing a query word is a verb vs. a noun can sharpen results.
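Two of these uses reduce to a few lines once tags are available. The sketch below works on hand-written (word, tag) pairs, and `adjective_noun_pairs` and `has_verb` are hypothetical helpers, not NLTK functions.

```python
def adjective_noun_pairs(tagged):
    """Return (adjective, noun) pairs for adjacent JJ* + NN* tokens."""
    pairs = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1.startswith("JJ") and t2.startswith("NN"):
            pairs.append((w1, w2))
    return pairs

def has_verb(tagged):
    """True if any token carries a VB* tag (modals, tagged MD, do not count)."""
    return any(tag.startswith("VB") for _, tag in tagged)

# Hand-written tags in the shape pos_tag returns.
REVIEW = [
    ("Great", "JJ"), ("battery", "NN"), ("but", "CC"),
    ("slow", "JJ"), ("service", "NN"), (".", "."),
]

print(adjective_noun_pairs(REVIEW))  # [('Great', 'battery'), ('slow', 'service')]
print(has_verb(REVIEW))              # False
```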

Your turn: extract the nouns

Write a function get_nouns(text) that returns a list of the words in text that are tagged as nouns — that is, words whose Penn Treebank tag starts with "NN" (this covers NN, NNS, NNP, and NNPS, so both common and proper nouns count).

Steps: word-tokenize the text, run pos_tag on the tokens, then keep the words whose tag starts with "NN".

For example, get_nouns("The hungry cat chased a small mouse.") should return ["cat", "mouse"].

Check your understanding

Question (select one)

What does pos_tag return when given a list of tokens?

A single string naming the sentence's overall grammar

A list of (word, tag) tuples, one per token, where the tag is the word's part of speech in context

A list of only the nouns

The lemmatized form of each word

Question (select one)

In the Penn Treebank tagset, which tags would the filter tag.startswith("NN") match?

Only NN (singular common nouns)

NN, NNS, NNP, and NNPS — singular and plural common nouns and proper nouns

All verbs

Determiners and prepositions

Question (select one)

Why does tagging before lemmatizing produce better results than lemmatizing alone?

Tagging makes the text shorter

The tag tells the lemmatizer each word's part of speech, so verbs lemmatize as verbs ("running" → "run") instead of being left unchanged under the default noun assumption

Lemmatizing first would delete all the verbs

Tagging removes stopwords automatically

Question (select one)

A teammate runs the tagger on a batch of all-lowercase, punctuation-stripped product titles and gets noticeably worse tags than on clean sentences. What is the most likely reason?

The tagger only works in the morning

The tagger uses capitalization and punctuation as contextual clues; stripping them away beforehand removes signal the model relies on, lowering accuracy

POS tagging cannot be applied to product titles at all

The titles need to be translated first

You can now label words by grammatical role. Notice a limitation though: every step so far treats words individually. But "New York", "machine learning", and "not good" are multi-word units whose meaning lives in the combination. To capture local word order, we turn to n-grams.
