From S to R
The unlikely story of two New Zealand statisticians who wrote a teaching tool that quietly grew into the language of modern statistics, displacing the commercial system it was originally meant to imitate.
In 1991, Ross Ihaka and Robert Gentleman were colleagues in the statistics department at the University of Auckland, New Zealand. They were both interested in computing — Ihaka had a PhD in statistics from Berkeley, Gentleman had done postdoctoral work on genetics — and both had used the S language at previous institutions.
They had a small, local problem: they wanted to teach introductory statistics with a hands-on computer lab. They wanted students to play with data, not just read about it. The natural tool for this was S-PLUS — but Auckland could not afford site licenses for every student machine.
So they decided to write their own.
A weekend project that wouldn't end
They began on what they later described as a small experiment. They took inspiration from S in syntax — they wanted the language to feel familiar to anyone who already knew S — but built the implementation from scratch, drawing also on a Lisp-family language called Scheme for the way variables and environments worked under the hood.
They named it R: partly because both their first names started with R, partly as a nod to "S minus 1" in the alphabet, partly because, like S, the single-letter name was easy to type at a prompt.
The first internal version ran in late 1991. For about two years it
was used only by the two of them and their students. Then, in 1993, they
made a fateful decision: they posted binary copies of R to the
StatLib archive at Carnegie Mellon, freely available, and sent
a quiet announcement to the s-news mailing list.
People started using it.
What made R take off
By the late 1990s, R was a curiosity in a fast-growing ecosystem of open-source tools (Linux had reached 1.0 in 1994, Apache in 1995, Perl was at its peak). But it was not obvious it would win. SAS was firmly entrenched in business. S-PLUS was the polished commercial version of S that already had everything R was trying to build. SPSS had captured the social sciences. Why did R succeed?
A few reasons stand out.
It was free, and the license meant it would stay that way. R was released under the GNU General Public License (GPL). This meant anyone — students, researchers, companies in developing countries, statisticians at small institutions — could install it, share it, and build on it without paying anyone. For a research field that runs on graduate students, this was decisive.
It was a real programming language. Unlike SAS or SPSS, where "writing your own analysis" felt like fighting the system, R was designed so that users could extend it themselves. Anyone could write a function, package it up, and share it with the world.
CRAN. In 1997, Kurt Hornik and Friedrich Leisch — Austrian statisticians collaborating with the R team — launched the Comprehensive R Archive Network. CRAN was a single, central repository where anyone could submit a package, and where anyone could install one with a single command. Combined with R's extensibility, this turned R into a platform — a place where the world's statistical methods were collected.
Academic credibility. R was built by statisticians, used by statisticians, and adopted in the statistics curriculum at one university after another through the 2000s. New PhDs entered industry already fluent in R, and brought it with them.
By 2010, CRAN had over 2,000 packages. By 2020, over 16,000. Today there are more than 20,000. Whatever obscure statistical method you need — from non-parametric regression to phylogenetic tree estimation — there is almost certainly an R package for it, often written by the original inventor of the method.
R was always an "S dialect"
Crucially, R was never advertised as "better than S." Ihaka and Gentleman wrote in their famous 1996 paper:
"We have developed a language for data analysis and graphics which is closely related to the S language… The new language, which we call R, attempts to be both compatible with S and different in important ways."
The intentional compatibility meant that code written in S could often run in R with little or no change. Books written for S were useful as books for R. Decades of accumulated wisdom about statistical computing came with the language for free.
Compare that to the cost of starting a brand new language: every book, every tutorial, every method would have had to be rewritten from scratch. R inherited the entire S culture and then expanded it.
In 2017, John Chambers — the original creator of S — gave the keynote at the useR! conference. He spoke generously about R as "the realization of S in the open-source world." S-PLUS, the commercial product, was already in decline. The free, community-built descendant had outgrown the commercial parent.
What R inherited from S — and what it added
Let's look at the very same code idea, written so it would run in S in the early 1990s, once model formulas had arrived, or in R today. The semantics are nearly identical.
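A minimal sketch of that idea, using the built-in women data as a stand-in. The vector names heights and weights and the object name fit are assumptions chosen for illustration; lm(), coef(), and the ~ formula are standard R.

```r
# Two numeric vectors taken from the built-in `women` dataset
# (the names `heights` and `weights` are illustrative).
heights <- women$height   # inches
weights <- women$weight   # pounds

# Fit a straight line: regress weights on heights.
fit <- lm(weights ~ heights)

# Named vector holding the intercept and the slope of the fitted line.
coef(fit)
```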
That same expression — weights ~ heights as a model formula —
was an S innovation. R inherited it directly. Notice how
readable it is: "regress weights on heights." That is the
fingerprint of a language designed by people who wanted to make
analysis feel close to how you'd describe it in a sentence.
But R added things S never had. A modern R script can use lazy evaluation tricks to build entire chains of operations:
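A minimal sketch of such a chain, using the built-in mtcars data. The base R version needs nothing beyond R 4.1. The dplyr version assumes that package is installed, which is why it is guarded. The names by_cyl, by_cyl_df, and nested are illustrative.

```r
# Base R only: split mpg by cylinder count, then average each group.
by_cyl <- split(mtcars$mpg, mtcars$cyl) |> sapply(mean)

# The same idea with dplyr verbs: take mtcars, group by cylinder, summarize.
# Runs only if the dplyr package happens to be installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_cyl_df <- mtcars |>
    dplyr::group_by(cyl) |>
    dplyr::summarise(mean_mpg = mean(mpg))
}

# Without the pipe, the base R version has to be read inside-out.
nested <- sapply(split(mtcars$mpg, mtcars$cyl), mean)
```

Both by_cyl and nested hold the same named vector of group means; the pipe changes only the order in which the steps are written, not the result.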
The |> operator (called the native pipe, added to R in 4.1 in
2021) lets you read code left-to-right: "take mtcars, group by
cylinder, then summarize." That is not S syntax — it is a
modern R idiom built on top of decades of community experimentation.
The same query in raw S would have been three or four nested
function calls, much harder to read.
R today
In 2025 R is one of the two most-used languages in data science (the other being Python). They are often used together. R's strengths are:
- Statistical depth: if a method exists in academia, R probably has a high-quality implementation.
- Visualization: ggplot2 is widely considered the best general plotting library in any language.
- Reproducibility: tools like R Markdown and Quarto make it easy to produce documents (reports, papers, slides) that include live code and embedded results.
- Domain ecosystems: in genomics (Bioconductor), in epidemiology, in psychometrics, in econometrics, R is the lingua franca.
R is not the right tool for every problem. It is not what you would pick to build a web backend, a large machine-learning training pipeline, or a real-time control system. But for thinking with data — exploring, modeling, visualizing, communicating — it is extraordinarily well-suited.
Test your understanding
Who created R?
- John Chambers and Rick Becker at Bell Labs.
- Ross Ihaka and Robert Gentleman at the University of Auckland.
- Hadley Wickham as part of the tidyverse project.
- The R Foundation as an institutional project.
What is CRAN?
- The interactive R user interface.
- A book series about R.
- The Comprehensive R Archive Network, a worldwide repository of R packages.
- The compiler used to build the R interpreter.
Which of the following best explains why R has been so successful in academic statistics?
- R is faster than any other language for numerical computing.
- It is free, extensible, and lets users contribute packages that the entire community can use.
- R was bundled with university computer science textbooks.
- R is the only language that supports the t-test.
Mini challenge: build a simple analysis
Many famous datasets ship with R. One of them is women, with
heights and weights of 15 American women. Practice the
S-inherited idioms: subset a vector, compute a summary, and fit a
linear model.
Using the built-in women dataset (already loaded):
- Assign the women's average weight (mean of women$weight) to avg_weight.
- Fit a linear model weight ~ height and assign it to model.
- Extract the slope coefficient (the "height" coefficient) into a numeric scalar slope.
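One possible solution to the challenge, offered as a sketch rather than the only acceptable answer. It relies only on base R; unname() is one of several ways to turn the named coefficient into a plain scalar.

```r
# `women` ships with R in the datasets package, so no loading step is needed.
avg_weight <- mean(women$weight)

# Regress weight on height using the S-style model formula.
model <- lm(weight ~ height, data = women)

# coef() returns a named vector; unname() leaves a plain numeric scalar.
slope <- unname(coef(model)["height"])
```

With this data, slope is the estimated change in weight (pounds) for each additional inch of height.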
In the next chapter we will zoom out one more time and answer the "so what?" question — why R, specifically, matters in 2025, even in a world that also has Python, Julia, and a thousand other tools.