Logical and Character Vectors
Two specialized vector types that power filtering, categorization, and labeling — the bread and butter of real data work.
So far our vectors have held numbers. But in a real dataset, you will see two other types just as often:
- Logical vectors: every element is TRUE or FALSE. These are the engine of filtering.
- Character vectors: every element is a string. These hold names, categories, labels, IDs, anything textual.
This page is about both.
Logical vectors
A logical vector is just a vector whose elements are TRUE or
FALSE:
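A minimal example. The variable name `flags` and its values are invented for illustration:

```r
# A logical vector typed out by hand (illustrative values)
flags <- c(TRUE, FALSE, TRUE, TRUE)

class(flags)   # "logical"
length(flags)  # 4: one TRUE/FALSE per element
```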
You rarely type logical vectors out by hand. Almost always, you produce them by comparing a vector to something:
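A sketch using a hypothetical `scores` vector (the numbers are made up). Each comparison returns one TRUE or FALSE per element:

```r
# Hypothetical exam scores, used only for illustration
scores <- c(72, 95, 58, 100, 81)

scores >= 80   # FALSE TRUE FALSE TRUE TRUE
scores == 100  # FALSE FALSE FALSE TRUE FALSE
scores != 58   # TRUE TRUE FALSE TRUE TRUE
```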
Notice the result is a vector of TRUE/FALSE values, one per
input. Every comparison operator is vectorized, just like
arithmetic.
Counting and summarizing logicals
Here is one of the most useful R tricks of all time: sum() and
mean() work on logical vectors, because R quietly treats
TRUE as 1 and FALSE as 0.
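Here is how that looks with the same hypothetical `scores` vector. The comments show the results worked out by hand for these five values:

```r
scores <- c(72, 95, 58, 100, 81)  # hypothetical scores

as.integer(c(TRUE, FALSE))  # 1 0: how R sees logicals in arithmetic

sum(scores >= 80)   # 3: three TRUEs, each counted as 1
mean(scores >= 80)  # 0.6: 3 TRUEs out of 5 elements
any(scores == 100)  # TRUE: at least one element is TRUE
all(scores > 50)    # TRUE: every element is TRUE
```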
Read that out loud:
- sum(scores >= 80) = "count the scores that are 80 or above"
- mean(scores >= 80) = "proportion of scores that are 80 or above"
- any(scores == 100) = "is there at least one perfect score?"
- all(scores > 50) = "are they all above 50?"
These four patterns cover an astonishing number of real-world analytical questions.
Combining logical conditions
You can combine logicals with the operators & (and), | (or),
and ! (not):
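A short sketch, again on hypothetical scores. Comparisons are evaluated before & and |, and the parentheses around the negated condition are there only for readability:

```r
scores <- c(72, 95, 58, 100, 81)  # hypothetical scores

scores >= 70 & scores < 90   # TRUE FALSE FALSE FALSE TRUE (both conditions hold)
scores < 60 | scores == 100  # FALSE FALSE TRUE TRUE FALSE (either condition holds)
!(scores >= 80)              # TRUE FALSE TRUE FALSE FALSE (each value flipped)
```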
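Use & and | (single character) for element-wise logic on vectors. R also
has && and || (double character), but these are for
single-value control flow such as if () conditions: they expect a single
TRUE or FALSE on each side (recent versions of R raise an error for longer
vectors) and stop evaluating as soon as the answer is known. For data
analysis, always use & and |.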
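A small contrast with invented values. & works position by position, while && takes a single value on each side and skips the right-hand side entirely when the left side already decides the answer, which is why it suits if () conditions:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b  # TRUE FALSE FALSE: one result per position

# && short-circuits: the left side is FALSE, so stop() is never called
FALSE && stop("never reached")  # FALSE

# Typical use of && in control flow, with a single value
x <- 5
if (is.numeric(x) && x > 0) message("positive number")
```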
Filtering with logicals
The real payoff: you can use a logical vector as an index into
another vector. R keeps only the positions where the logical is
TRUE.
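A minimal sketch with hypothetical scores and student names. The logical index can come from the same vector or from a different vector of the same length:

```r
scores   <- c(72, 95, 58, 100, 81)             # hypothetical scores
students <- c("Ava", "Ben", "Cy", "Dana", "Eve") # hypothetical names, same order

keep <- scores >= 80   # FALSE TRUE FALSE TRUE TRUE
scores[keep]           # 95 100 81

# Usually written in one step
scores[scores >= 80]   # 95 100 81

# Indexing a different vector with the same logical
students[scores >= 80] # "Ben" "Dana" "Eve"
```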
This is the foundation of filtering. Later we will see how
dplyr::filter() builds on this exact pattern for whole data
frames — but understanding it on vectors first is what makes the
whole library click.
Character vectors
Strings in R live inside character vectors. Each element is a string enclosed in double or single quotes:
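For example, with made-up city names. Double and single quotes produce identical strings, and length() counts strings, not letters:

```r
# Hypothetical values; mixing quote styles is allowed
cities <- c("Lisbon", 'Oslo', "Quito")

class(cities)             # "character"
length(cities)            # 3: three strings
identical("Oslo", 'Oslo') # TRUE: the quote style makes no difference
```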
Like most of R, character functions are vectorized:
toupper() isn't limited to one string. Give it a whole vector and it
returns a transformed vector of the same length.
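A few base R string functions applied to a whole vector at once (the city names are illustrative). Each call returns one result per input string:

```r
cities <- c("Lisbon", "Oslo", "Quito")  # hypothetical values

toupper(cities)       # "LISBON" "OSLO" "QUITO"
nchar(cities)         # 6 4 5: number of characters in each string
substr(cities, 1, 3)  # "Lis" "Osl" "Qui": first three characters of each
```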
Pasting strings together
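paste() and paste0() glue values together element by element, converting
numbers and other types to text first. paste() puts a space between the
pieces by default; paste0() puts nothing: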
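A minimal sketch with hypothetical names. Shorter inputs such as the single string "ID-" are recycled across the longer one:

```r
first <- c("Ana", "Bo")     # hypothetical names
last  <- c("Silva", "Park")

paste(first, last)   # "Ana Silva" "Bo Park": space by default, position by position
paste0("ID-", 1:3)   # "ID-1" "ID-2" "ID-3": no separator; 1:3 converted to text
paste("Total:", 42)  # "Total: 42": the number becomes a string
```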
The two arguments people most confuse are sep and collapse.
The rule is:
- sep = "what goes between the parallel inputs"
- collapse = "what goes between the elements of the result, when folding into one string"
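The difference is easiest to see side by side. The names below are hypothetical; note that when both arguments are given, sep is applied first and collapse then joins the results:

```r
first <- c("Ana", "Bo", "Cara")   # hypothetical names
last  <- c("Silva", "Park", "Diaz")

# sep: placed between the parallel inputs, position by position
paste(first, last, sep = "_")     # "Ana_Silva" "Bo_Park" "Cara_Diaz"

# collapse: folds the elements of the result into one string
paste(first, collapse = ", ")     # "Ana, Bo, Cara"

# Both: sep builds each pair, collapse joins the pairs
paste(first, last, sep = " ", collapse = "; ")  # "Ana Silva; Bo Park; Cara Diaz"
```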
Detecting patterns in text
For checking which strings contain something, the workhorse is
grepl() ("global regular expression, logical"). It takes a
pattern and a vector of strings, and returns a logical vector.
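A sketch with made-up email addresses. Because grepl() returns a logical vector, its result plugs straight into sum() or into an index, exactly like a comparison:

```r
emails <- c("ana@example.com", "bo@test.org", "cara@example.com")  # hypothetical

has_domain <- grepl("example.com", emails, fixed = TRUE)
has_domain          # TRUE FALSE TRUE

sum(has_domain)     # 2: how many addresses match
emails[has_domain]  # "ana@example.com" "cara@example.com"
```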
(Setting fixed = TRUE tells grepl() to treat the pattern as a
plain string, not a regular expression. Regex is a whole topic of
its own — for now, fixed = TRUE keeps things simple.)
Categorical data: a peek at factors
When the elements of a character vector come from a fixed set of categories (like "low" / "medium" / "high"), R has a specialized type called a factor. We won't dwell on factors here, but here is a quick look:
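A minimal sketch using hypothetical ratings. Passing levels fixes both the set of valid categories and their order; a value outside that set becomes NA:

```r
# Hypothetical ratings drawn from a fixed set of categories
ratings <- c("high", "low", "medium", "low")
f <- factor(ratings, levels = c("low", "medium", "high"))

levels(f)      # "low" "medium" "high": valid categories, in the order given
table(f)       # counts per level, in level order: low 2, medium 1, high 1
as.integer(f)  # 3 1 2 1: each value is stored as its position in levels

# "extreme" is not a valid level, so it becomes NA
factor(c("low", "extreme"), levels = c("low", "medium", "high"))
```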
Factors look like character vectors but carry information about which categories are valid and in what order. They become important when you plot or run models — but for today, character vectors will do.
Test your understanding
Given x <- c(10, 20, 30, 40), what does sum(x > 15) return?
Hint: x > 15 is a logical vector, and sum() adds up TRUE (1) and FALSE (0) — it never sees the original numbers.
- 90 (the sum of values greater than 15)
- 3
- TRUE
- 0.75
What does mean(c(TRUE, TRUE, FALSE, TRUE)) evaluate to?
- 3
- 0.75
- TRUE
- An error
Given names <- c("Ana", "Bo", "Cara") and ages <- c(40, 25, 33), what does names[ages > 30] return?
- c(40, 33)
- c(TRUE, FALSE, TRUE)
- c("Ana", "Cara")
- An error
Mini challenge: who passed?
You're given names and exam scores. Using a logical index, build a
character vector passed containing only the names of students who
scored 70 or higher.
One last topic before we leave vectors behind: what happens when a
value is simply missing — R's special NA marker, and the
small set of rules for dealing with it.