Variables and Assignment
How R stores and retrieves values by name, why R uses the strange-looking `<-` arrow, and how the simple act of naming things is the most underrated skill in programming.
Programming, fundamentally, is naming things and acting on those names. R's variable system is delightfully simple: you write a name, an arrow, and a value, and R remembers it.
The leftward-pointing arrow <- is R's assignment operator.
You will see it everywhere.
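A minimal sketch of the name, arrow, value pattern described above. The variable names (`age`, `greeting`, `age_next_year`) are made up for illustration.

```r
# name, arrow, value: R stores the value under that name
age <- 30
greeting <- "hello"

# typing a name on its own line retrieves (and prints) the stored value
age
greeting

# a stored name can be used anywhere the value itself could be used
age_next_year <- age + 1
age_next_year
```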
Why <- and not =?
If you have used most other languages, you have used = for
assignment. R does support = for assignment too — and it works in
most places. But the <- arrow is strongly preferred, for one
historical and two practical reasons:
- History. S used <- from the start (it was originally typed as a single character on APL keyboards back in the 1970s). R inherited the convention.
- Distinctness from ==. In R, x = 5 and x == 5 are both meaningful (assignment vs. comparison). The arrow <- removes any ambiguity at a glance.
- Distinctness from function arguments. In a function call like mean(x = 1:10), the = is not assignment; it is naming an argument. Reserving <- for true assignment keeps the two worlds visually separate.
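The three roles of these symbols can be seen side by side in a short sketch. The variable names here are illustrative, and the final check assumes no variable called `x` was defined earlier in the session.

```r
# At the top level, `=` and `<-` both create a variable
a = 5
b <- 5
same_value <- identical(a, b)   # TRUE: both hold the number 5

# `==` does not assign; it compares and returns a logical value
is_five <- a == 5               # TRUE

# Inside a call, `=` names an argument and creates no variable
avg <- mean(x = 1:10)           # 5.5

# Assuming no `x` existed before, there is still none after the call
x_created <- exists("x", inherits = FALSE)
```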
Most modern editors (including the editor on this page) let you
type <- with a single shortcut. In RStudio it is Alt + - (Option + - on
macOS). In this course we will use <- everywhere.
What is a "valid" variable name?
R variable names can include letters, numbers, ., and _, but
must start with a letter or a dot (and a leading dot cannot be followed by a digit). They are case-sensitive.
Some examples:
x # ok
my_var # ok and preferred style
myVar # also ok (camelCase)
my.var # ok in R (the dot is allowed!)
.x # ok (starts with a dot — used for "hidden" variables)
2x # NOT ok — cannot start with a number
my-var # NOT ok: `-` is the subtraction operator
A few names are reserved by the language and cannot be used as
variables: if, else, for, while, repeat, next, break, function,
TRUE, FALSE, NULL, NA, Inf, NaN. (Mostly you would
never want to use these as variable names anyway. Note that return is an ordinary function rather than a reserved word, but overwriting it is still a bad idea.)
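One way to check these rules yourself is base R's make.names(), which rewrites any string into a syntactically valid name. The helper `is_valid_name` below is a hypothetical convenience, not a built-in: it treats a name as valid when make.names() leaves it unchanged.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# (base R) has no need to change it
is_valid_name <- function(name) make.names(name) == name

candidates <- c("x", "my_var", "myVar", "my.var", ".x", "2x", "my-var", "if")

# One TRUE/FALSE per candidate, labelled with the candidate itself
validity <- is_valid_name(candidates)
names(validity) <- candidates
validity
```

The first five candidates come back TRUE; 2x, my-var, and the reserved word if come back FALSE.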
The diagram captures everything that happens. The "workspace" (formally the global environment) is just a key-value store: names point to values. When R sees a name in an expression, it looks it up in the workspace.
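The key-value idea can be made explicit with a small sketch. It uses a separate environment as a stand-in for the workspace (the name `workspace_demo` is hypothetical), so the store and lookup steps are visible as function calls rather than hidden behind `<-`.

```r
# A fresh environment standing in for the workspace (hypothetical name)
workspace_demo <- new.env()

# Storing: bind the name "price" to the value 10
assign("price", 10, envir = workspace_demo)

# Looking up: ask whether the name exists, then fetch its value
has_price <- exists("price", envir = workspace_demo, inherits = FALSE)
price_value <- get("price", envir = workspace_demo)

# Listing every name currently stored
stored_names <- ls(workspace_demo)
```

In everyday code you would simply write price <- 10 and then price; the calls above only spell out what those two lines do.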
The naming question is the whole game
Most beginners think the hard part is the syntax. It is not. The hard part is naming.
Compare these two equivalent computations:
# Version A: terrible names
x <- read.csv("data.csv")
y <- x[x$z > 100, ]
m <- mean(y$w)
# Version B: meaningful names
sales <- read.csv("data.csv")
big_orders <- sales[sales$amount > 100, ]
avg_revenue <- mean(big_orders$revenue)
Apart from the names, the two versions do exactly the same thing as far as the computer is concerned. They are worlds apart for a human reader. Reading version B, you can almost guess the analysis without thinking. Reading version A, you have to work out what each name means as you go.
Three quick guidelines for naming:
- Use descriptive names. Prefer customer_age over x.
- Use lowercase with underscores. customer_age, not CustomerAge or customerAge. (Both styles work, but lower_snake_case is the dominant convention in modern R.)
- Save abbreviations for things that are common in your domain. bmi is fine. cag for "customer age group" is not; write customer_age_group.
Assignment is silent
When you assign, R does not print anything. Only by referencing the variable (or by wrapping the assignment in parentheses) do you see the value.
The "wrap in parentheses" trick is sometimes useful for quick interactive exploration. Most of the time, though, you will assign on one line and inspect on the next.
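A short sketch of the three cases just described, with illustrative variable names:

```r
total <- 40 + 2          # assignment alone prints nothing

total                    # referencing the name prints the value

(doubled <- total * 2)   # parentheses: assign and print in one step
```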
Updating variables
You can overwrite a variable by assigning again. R does not warn you — it just replaces the old value.
Notice the line count <- count + 1. This is not a math
equation (which would be nonsense — no number equals itself plus
one). It is a command: "compute count + 1, then store the
result back into count." The right side is evaluated first, the
left side receives the result. This pattern shows up constantly.
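The update pattern can be sketched as follows. The `status` variable is an illustrative addition, included to show that overwriting happens without any warning.

```r
count <- 1
count <- count + 1   # right side first: 1 + 1, then stored back into count
count <- count + 1   # right side first: 2 + 1, then stored back into count
count

# Overwriting is silent: the old value is simply replaced
status <- "open"
status <- "closed"
status
```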
What a "value" really is
When you write x <- c(1, 2, 3), R does three things:
- Creates a vector with the three numbers.
- Stores that vector somewhere in memory.
- Adds an entry to your workspace mapping the name x to that stored vector.
When you later write x + 1, R:
- Looks up x in the workspace.
- Finds the vector it points to.
- Evaluates the expression on that vector.
This is a mental model that pays off later. When we talk about
copying data, modifying data, and what happens "behind the
scenes" inside dplyr pipelines, you will be able to reason
about it because you understand that names are pointers to
values, not values themselves.
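A small sketch of this mental model, using the same x as above plus an illustrative name, `shifted`: evaluating x + 1 produces a new value and leaves x pointing at the original vector until you assign again.

```r
x <- c(1, 2, 3)

# R looks up x, finds its vector, and computes a NEW vector from it
shifted <- x + 1

x_before_rebind <- x   # still 1 2 3: the expression did not touch x
shifted                # 2 3 4

# Only an assignment changes what the name x points to
x <- x + 1
x                      # now 2 3 4
```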
A subtle but important point: copy-on-modify
In R, when you assign one variable to another, the two names generally behave as independent copies: changing one does not change the other. (Strictly, both names share the same value in memory until one of them is modified; only then does R make the copy.)
This copy-on-modify behavior is one of R's most thoughtful design choices. It makes R code very safe: you can pass a variable into a function, and the function cannot secretly mutate your data. The downside is that R can be memory-inefficient for very large datasets, but for the kind of work in this course, it's exactly what you want.
(Many other languages — Python, Java, JavaScript — work differently here. If you are coming from one of those, this is one of R's most pleasant surprises.)
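A minimal sketch of both claims, using made-up names. The function `zero_first` is hypothetical and exists only to show that changes made inside a function stay inside it.

```r
original <- c(10, 20, 30)
duplicate <- original      # a second name for the same values

duplicate[1] <- 99         # modifying one name...
original                   # ...leaves the other untouched: 10 20 30
duplicate                  # 99 20 30

# Hypothetical function that modifies its argument internally
zero_first <- function(values) {
  values[1] <- 0           # changes only the function's own copy
  values
}

result <- zero_first(original)
result                     # 0 20 30
original                   # still 10 20 30
```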
Listing and removing variables
You can ask R what variables exist in the current workspace, and you can remove them.
In WebR, every code block starts with a fresh workspace, so you
rarely need rm() interactively. But in long sessions or shared
scripts it can be useful for cleaning up.
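A short sketch using base R's ls() and rm(). The variable names are illustrative, and the exact output of ls() depends on what else exists in your session.

```r
scratch_total <- 10
scratch_label <- "temp"

# ls() returns the names in the current workspace as a character vector
ls()
had_total_before <- "scratch_total" %in% ls()

# rm() removes a name; its value is no longer reachable through that name
rm(scratch_total)
has_total_after <- "scratch_total" %in% ls()

# Other variables are unaffected
still_has_label <- "scratch_label" %in% ls()
```

To clear everything at once, rm(list = ls()) removes every name that ls() reports. Use it with care in a long session.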
Test your understanding
Which line of R assigns the value 7 to a variable called count?
count == 7
count <- 7
count ~ 7
count := 7
After running:
x <- 5
y <- x
y <- y + 1
What is the value of x?
6
5
NA
An error
Which of these is not a valid R variable name?
my_var
avg.price
2nd_quarter
.hidden
Why does R conventionally use <- for assignment instead of =?
= is forbidden in R.
To keep assignment visually distinct from comparison (==) and from named function arguments (mean(x = 1:10)).
Because = is reserved for SQL.
Because R does not understand the = symbol.
Mini challenge: a running budget
Define a variable budget (your starting balance), then make
three "purchase" assignments that each subtract some amount from
budget. After the last assignment, budget should hold the
correct remaining balance.
Start with budget <- 100. Then make three updates:
- subtract 25 (groceries)
- subtract 12 (coffee)
- subtract 8 (parking)
After all three updates, budget should equal 100 - 25 - 12 - 8 = 55.
With variables in hand, we are ready to talk about R's most distinctive design choice: the fact that everything is a vector. That is the next page.