Vectorized Computation
Why R lets you write `prices * 1.08` to add tax to a thousand prices at once — and why that one idea changes how you think about programming.
In most programming languages, if you want to add 1 to every number in a list, you write a loop:
    for each item in list:
        item = item + 1
In R, you write:
    xs + 1
That's it. No loop. R applies the + 1 to every element of xs
automatically. This is called vectorized computation, and once
you internalize it, your R code becomes shorter, clearer, and
often much faster.
Arithmetic on whole vectors
Every basic operator works element-by-element on vectors:
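As a minimal sketch, assuming a made-up vector named prices (the name and values are illustrative only):

```r
# Hypothetical example data: three prices.
prices <- c(10, 20, 30)

with_tax <- prices * 1.08   # multiply every element by 1.08
shifted  <- prices + 5      # 15 25 35
halved   <- prices / 2      # 5 10 15
squared  <- prices ^ 2      # 100 400 900
leftover <- prices %% 3     # remainder after dividing by 3: 1 2 0

print(with_tax)
```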
No loops, no temporary variables. The expression describes the shape of the answer, and R fills it in.
Two vectors of the same length
When you combine two vectors, R lines them up element-by-element:
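A small sketch of that pairing, assuming two hypothetical vectors, price and quantity, that describe the same three items:

```r
# Hypothetical parallel vectors: one entry per item, in the same order.
price    <- c(10, 20, 30)
quantity <- c(2, 1, 4)

# Element 1 pairs with element 1, element 2 with element 2, and so on.
revenue <- price * quantity   # 20 20 120

total <- sum(revenue)         # 160
print(revenue)
```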
This is the heart of how data analysts think: you have parallel columns, and you operate on them as wholes.
Recycling: when lengths differ
If the two vectors are different lengths, R recycles the shorter one — it repeats it as needed to match the length of the longer one.
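A minimal sketch of recycling, using made-up values:

```r
# The length-2 vector is repeated twice to match the length-4 vector,
# so this is the same as c(1, 2, 3, 4) + c(10, 20, 10, 20).
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# A length-1 vector is recycled across every element.
doubled <- c(1, 2, 3, 4) * 2            # 2 4 6 8

print(recycled)
print(doubled)
```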
This is most useful in the very common case where one vector is length 1 (a "scalar"). When the longer length is not a whole multiple of the shorter one, R will warn you, and that warning almost always means a bug:
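A sketch of the mismatched case, assuming a hypothetical length-5 vector on the left:

```r
# Length 5 is not a multiple of length 2, so R recycles c(10, 20)
# to c(10, 20, 10, 20, 10) and emits a warning:
#   longer object length is not a multiple of shorter object length
result <- c(1, 2, 3, 4, 5) + c(10, 20)

print(result)   # 11 22 13 24 15
```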
R still computes a result, recycling c(10, 20) to c(10, 20, 10, 20, 10) before adding, but the warning is R asking: "Are you sure you
meant this?" Almost always, the answer is no. Either trim or extend one of the
vectors deliberately.
Built-in summary functions
R has a small army of functions that take a vector and return a single summary value. You will use these constantly:
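A sketch of the most common ones, applied to a hypothetical vector named scores:

```r
# Hypothetical example data.
scores <- c(3, 5, 7, 9, 11)

total  <- sum(scores)      # 35
avg    <- mean(scores)     # 7
middle <- median(scores)   # 7
lowest <- min(scores)      # 3
top    <- max(scores)      # 11
spread <- sd(scores)       # sample standard deviation, sqrt(10)
n      <- length(scores)   # 5

print(c(total = total, mean = avg, median = middle, sd = spread))
```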
Notice how every one of these functions has the same shape: vector in, summary out. Once you have this mental model, half of "data analysis in R" is just knowing which summary function to call.
Cumulative and rolling operations
A few functions take a vector in and give a vector of the same length out, computing as they go:
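A minimal sketch using base R's cumulative functions on a made-up vector x:

```r
# Hypothetical example data.
x <- c(3, 1, 4, 1, 5)

running_total   <- cumsum(x)    # 3 4 8 9 14
running_product <- cumprod(x)   # 3 3 12 12 60
running_max     <- cummax(x)    # 3 3 4 4 5
running_min     <- cummin(x)    # 3 1 1 1 1

print(running_total)
```

Each result has the same length as x: position i summarizes elements 1 through i.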
A small but mighty example: standardizing a vector
A common data-prep step is standardizing a vector — subtracting the mean and dividing by the standard deviation, so the result has mean 0 and standard deviation 1.
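One way this might look, assuming a hypothetical numeric vector x (the names centered and z are illustrative):

```r
x <- c(2, 4, 6, 8, 10)        # hypothetical example data
centered <- x - mean(x)       # subtract the mean from every element
z <- centered / sd(x)         # divide every element by the standard deviation
```

Checking mean(z) and sd(z) afterwards should give 0 and 1, up to floating-point rounding.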
Three lines, no loops. This is the archetypal R operation: a small chain of vectorized expressions producing a transformed column.
Why vectorized code is faster, too
In most interpreted languages, a for loop in user code pays
interpreter overhead at every step. In R, the built-in vectorized
operations are implemented in compiled C code under the hood: when
you write x + y, R does the whole computation in one call to a fast
internal routine.
This means vectorized R can be 10–100x faster than the equivalent loop, depending on the operation and the size of the vectors, and it is shorter and easier to read as well.
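To see the difference on your own machine, a rough timing sketch along these lines can help. The vector size and variable names are assumptions, and the measured times will vary with hardware and R version, so no expected numbers are shown:

```r
n <- 1e5
x <- runif(n)   # hypothetical random data
y <- runif(n)

# Loop version: one interpreted step per element.
loop_sum <- numeric(n)   # pre-allocate the result
loop_time <- system.time(
  for (i in seq_len(n)) loop_sum[i] <- x[i] + y[i]
)

# Vectorized version: a single call into R's internal routine.
vec_time <- system.time(vec_sum <- x + y)

print(loop_time["elapsed"])
print(vec_time["elapsed"])
```

Both versions produce the same numbers; only the time taken differs.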
You can — and sometimes should — write loops in R. But your first instinct should always be: "is there a vectorized way to express this?" 95% of the time, there is.
Test your understanding
What does c(1, 2, 3) * 10 return?
60 (the product of all elements times 10)
10 20 30
An error — different lengths
c(10, 2, 3)
What value does mean(c(2, 4, 6, 8)) produce?
4
5
6
20
In R, you almost never need to write a for loop to apply an operation to every element of a vector. Why?
Because for loops are forbidden in R.
Because operations and most built-in functions are vectorized — they automatically apply to every element.
Because R secretly rewrites loops as parallel code.
Because R cannot iterate over vectors.
Mini challenge: convert Celsius to Fahrenheit
Given a vector of temperatures in Celsius, produce a vector of
temperatures in Fahrenheit using the formula F = C * 9/5 + 32.
Given celsius, compute fahrenheit as a vector. No loops needed — use vectorized arithmetic.
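One possible solution, sketched with made-up sample temperatures (the values in celsius are an assumption, not part of the exercise):

```r
# Hypothetical sample input.
celsius <- c(-40, 0, 37, 100)

# The formula is applied to every element at once.
fahrenheit <- celsius * 9/5 + 32

print(fahrenheit)   # -40 32 98.6 212
```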
Next we will look at the two specialized vector types that get heavy use in data work: logical vectors (which power filtering) and character vectors (which carry every label, name, and category in your dataset).