Practical R for Beginners

From S to R

The unlikely story of two New Zealand statisticians who wrote a teaching tool that quietly grew into the language of modern statistics, displacing the commercial system it was originally meant to imitate.

In 1991, Ross Ihaka and Robert Gentleman were colleagues in the statistics department at the University of Auckland, New Zealand. They were both interested in computing — Ihaka had a PhD in statistics from Berkeley, Gentleman had done postdoctoral work on genetics — and both had used the S language at previous institutions.

They had a small, local problem: they wanted to teach introductory statistics with a hands-on computer lab. They wanted students to play with data, not just read about it. The natural tool for this was S-PLUS — but Auckland could not afford site licenses for every student machine.

So they decided to write their own.

A weekend project that wouldn't end

They began on what they later described as a small experiment. They took inspiration from S in syntax — they wanted the language to feel familiar to anyone who already knew S — but built the implementation from scratch, drawing also on a Lisp-family language called Scheme for the way variables and environments worked under the hood.
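
The Scheme influence is easiest to see in how a function remembers the environment it was created in. The sketch below is only an illustration: the names make_scaler and to_cm are invented for this example and do not come from the original R sources.

```r
# A function that builds and returns another function.
# The inner function keeps access to `factor` after make_scaler() has
# returned, because it carries its defining environment with it.
make_scaler <- function(factor) {
  function(x) x * factor
}

# Hypothetical helper: convert inches to centimetres.
to_cm <- make_scaler(2.54)

heights_cm <- to_cm(c(60, 65, 70))
print(heights_cm)
```

Each call to make_scaler() creates a fresh environment, so a second scaler built with a different factor would not interfere with to_cm.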

They named it R: partly because both their first names started with R, partly as a nod to "S minus 1" in the alphabet, partly because, like S, the single-letter name was easy to type at a prompt.

The first internal version ran in late 1991. For about two years it was used only by the two of them and their students. Then, in 1993, they made a fateful decision: they posted binary copies of R to the StatLib archive at Carnegie Mellon, freely available, and sent a quiet announcement to the s-news mailing list. The source code followed in 1995, released under the GNU General Public License.

People started using it.

What made R take off

By the late 1990s, R was a curiosity in a fast-growing ecosystem of open-source tools (Linux had reached 1.0 in 1994, Apache in 1995, Perl was at its peak). But it was not obvious it would win. SAS was firmly entrenched in business. S-PLUS was the polished commercial version of S that already had everything R was trying to build. SPSS had captured the social sciences. Why did R succeed?

A few reasons stand out.

It was free, and the license meant it would stay that way. R was released under the GNU General Public License (GPL). This meant anyone — students, researchers, companies in developing countries, statisticians at small institutions — could install it, share it, and build on it without paying anyone. For a research field that runs on graduate students, this was decisive.

It was a real programming language. Unlike SAS or SPSS, where "writing your own analysis" felt like fighting the system, R was designed so that users could extend it themselves. Anyone could write a function, package it up, and share it with the world.

CRAN. In 1997, Kurt Hornik and Friedrich Leisch — Austrian statisticians collaborating with the R team — launched the Comprehensive R Archive Network. CRAN was a single, central repository where anyone could submit a package, and where anyone could install one with a single command. Combined with R's extensibility, this turned R into a platform — a place where the world's statistical methods were collected.

Academic credibility. R was built by statisticians, used by statisticians, and adopted in the statistics curriculum at one university after another through the 2000s. New PhDs entered industry already fluent in R, and brought it with them.

By 2010, CRAN had over 2,000 packages. By 2020, over 16,000. Today there are more than 20,000. Whatever obscure statistical method you need — from non-parametric regression to phylogenetic tree estimation — there is almost certainly an R package for it, often written by the original inventor of the method.
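
To make the "real programming language" point concrete, here is a minimal sketch of a user-defined function. The name standard_error is chosen for this illustration; it is not a built-in R function, and the built-in women dataset is used only as convenient sample data.

```r
# Standard error of the mean: the sample standard deviation
# divided by the square root of the sample size.
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# Try it on a column of a dataset that ships with R.
se_height <- standard_error(women$height)
print(se_height)
```

A function like this is the raw material of a package: collect a few related functions, document them, and they can be shared through CRAN.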

R was always an "S dialect"

Crucially, R was never advertised as "better than S." Ihaka and Gentleman wrote in their famous 1996 paper:

"We have developed a language for data analysis and graphics which is closely related to the S language… The new language, which we call R, attempts to be both compatible with S and different in important ways."

The intentional compatibility meant that code written in S could often run in R with little or no change. Books written for S were useful as books for R. Decades of accumulated wisdom about statistical computing came with the language for free.

Compare that to the cost of starting a brand new language: every book, every tutorial, every method would have had to be rewritten from scratch. R inherited the entire S culture and then expanded it.

In 2017, John Chambers — the original creator of S — gave the keynote at the useR! conference. He spoke generously about R as "the realization of S in the open-source world." S-PLUS, the commercial product, was already in decline. The free, community-built descendant had outgrown the commercial parent.

What R inherited from S — and what it added

Let's look at one idea, fitting a straight line, written so that it would run in S in the early 1990s (when model formulas arrived) or in R today. The semantics are nearly identical.
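
The original listing is not reproduced here, so the following is a reconstruction under assumptions: the vectors heights and weights hold made-up values (centimetres and kilograms), chosen only to give the formula something to work on.

```r
# Hypothetical measurements, invented for this illustration.
heights <- c(150, 160, 170, 180, 190)  # cm
weights <- c(52, 58, 66, 75, 83)       # kg

# Fit a straight line: weights as a function of heights.
fit <- lm(weights ~ heights)

# Intercept and slope of the fitted line.
print(coef(fit))
```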

That same expression — weights ~ heights as a model formula — was an S innovation. R inherited it directly. Notice how readable it is: "regress weights on heights." That is the fingerprint of a language designed by people who wanted to make analysis feel close to how you'd describe it in a sentence.

But R added things S never had. A modern R script can chain operations together with a pipe operator instead of nesting function calls.

The |> operator (called the native pipe, added to R in 4.1 in 2021) lets you read code left-to-right: "take mtcars, group by cylinder, then summarize." That is not S syntax — it is a modern R idiom built on top of decades of community experimentation. The same query in raw S would have been three or four nested function calls, much harder to read.
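
A minimal sketch of that left-to-right style follows. It deliberately sticks to base R functions (split and sapply) so that it runs without extra packages; that is a choice made for this illustration, and the grouped-summary verbs from dplyr covered later in the course express the same idea. It assumes R 4.1 or newer.

```r
# Piped version: take the mpg column, split it by cylinder count,
# then average each group.
avg_mpg_piped <- mtcars$mpg |>
  split(mtcars$cyl) |>
  sapply(mean)

# The same computation written as nested calls, read inside-out.
avg_mpg_nested <- sapply(split(mtcars$mpg, mtcars$cyl), mean)

print(avg_mpg_piped)
```

Both versions produce the same named vector, one mean per cylinder count; the pipe only changes the order in which you read the steps.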

R today

In 2025 R is one of the two most-used languages in data science (the other being Python). They are often used together. R's strengths are:

Statistical depth: if a method exists in academia, R probably has a high-quality implementation.
Visualization: ggplot2 is widely regarded as one of the best general-purpose plotting libraries in any language.
Reproducibility: tools like R Markdown and Quarto make it easy to produce documents (reports, papers, slides) that include live code and embedded results.
Domain ecosystems: in genomics (Bioconductor), in epidemiology, in psychometrics, in econometrics, R is the lingua franca.

R is not the right tool for every problem. It is not what you would pick to build a web backend, a large machine-learning training pipeline, or a real-time control system. But for thinking with data — exploring, modeling, visualizing, communicating — it is extraordinarily well-suited.

Test your understanding

Question (select one)

Who created R?

John Chambers and Rick Becker at Bell Labs.

Ross Ihaka and Robert Gentleman at the University of Auckland.

Hadley Wickham as part of the tidyverse project.

The R Foundation as an institutional project.

Question (select one)

What is CRAN?

The interactive R user interface.

A book series about R.

The Comprehensive R Archive Network — a worldwide repository of R packages.

The compiler used to build the R interpreter.

Question (select one)

Which of the following best explains why R has been so successful in academic statistics?

R is faster than any other language for numerical computing.

It is free, extensible, and lets users contribute packages that the entire community can use.

R was bundled with university computer science textbooks.

R is the only language that supports the t-test.

Mini challenge: build a simple analysis

Many famous datasets ship with R. One of them is women, which records the average heights and weights of American women aged 30 to 39 in 15 rows. Practice the S-inherited idioms: subset a vector, compute a summary, and fit a linear model.

Fit a height/weight model

Using the built-in women dataset (already loaded):

Assign the women's average weight (mean of women$weight) to avg_weight.
Fit a linear model weight ~ height and assign it to model.
Extract the slope coefficient (the "height" coefficient) into a numeric scalar slope.
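
One possible solution is sketched below. It is not the only valid answer; in particular, using unname() to drop the coefficient's label is one reading of "numeric scalar" and may not be required by the exercise checker.

```r
# 1. Average weight across the 15 rows.
avg_weight <- mean(women$weight)

# 2. Linear model of weight as a function of height.
model <- lm(weight ~ height, data = women)

# 3. Slope: the coefficient named "height", with its name removed
#    so that only a plain number of length one remains.
slope <- unname(coef(model)["height"])

print(avg_weight)
print(slope)
```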

In the next chapter we will zoom out one more time and answer the "so what?" question — why R, specifically, matters in 2025, even in a world that also has Python, Julia, and a thousand other tools.
