Practical R for Beginners

Statistics Before Computers

A short history of statistics from the gambling tables of the 17th century to the rooms full of human "computers" who carried out calculations by hand — and why their constraints still shape how we work today.

The word "computer" used to mean a person.

If you read a scientific paper from the 1930s and it says "the computations were performed by Miss J. Smith," it does not mean some machine. It means a real woman, sitting at a desk with a mechanical adding machine and a stack of paper, working through equations by hand for weeks at a time.

To appreciate why R looks the way it does, it helps to remember what statistics was like in that world.

A craft built around scarcity

The intellectual roots of statistics go back to the 1600s, when a small group of mathematicians began studying games of chance.

Blaise Pascal and Pierre de Fermat corresponded in 1654 about how to fairly split a wager when a game was interrupted — inventing the basic logic of probability in the process.
John Graunt published Natural and Political Observations Made upon the Bills of Mortality in 1662, using parish death records to argue, for the first time, that you could learn things about populations by carefully counting them.
Jakob Bernoulli proved the law of large numbers (published posthumously in 1713): if you flip a fair coin enough times, the proportion of heads will, with high probability, come as close to one half as you like. This sounds obvious now, but it was the first mathematical bridge between probability (a theoretical idea) and data (what we actually observe).
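
Bernoulli's result is easy to watch in action today. The following is a minimal sketch, not part of the historical record: the seed and the number of flips are arbitrary choices made only so the run is repeatable.

```r
# Simulate fair coin flips and track the running proportion of heads.
set.seed(1)                       # arbitrary seed, chosen only for repeatability
n_flips <- 10000                  # arbitrary number of flips
flips <- sample(c(0, 1), size = n_flips, replace = TRUE)  # 1 = heads, 0 = tails

# Proportion of heads after each flip: cumulative heads / flips so far.
running_prop <- cumsum(flips) / seq_len(n_flips)

# Proportion of heads after 10, 100, 1000 and 10000 flips.
running_prop[c(10, 100, 1000, 10000)]
```

The early proportions can wander noticeably, while the later ones should sit close to 0.5, which is the behavior the law of large numbers describes.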

For the next two hundred years, statistics grew slowly. A few names you may have heard of:

Carl Friedrich Gauss (early 1800s) — developed least-squares estimation while trying to predict the orbit of the asteroid Ceres from a handful of telescope sightings.
Adolphe Quetelet (1830s) — coined "social physics" and introduced the idea of the average person.
Francis Galton (late 1800s) — invented regression and correlation while studying inheritance.
Karl Pearson and R. A. Fisher (early 1900s) — built the foundations of modern statistical inference.

Every one of these breakthroughs was achieved without a computer. The arithmetic was done by hand, sometimes with mechanical calculators (the Brunsviga, the Marchant, the Friden), but most of the thinking — and most of the bookkeeping — was paper-and-pencil work.

What a real analysis looked like in 1920

Imagine you are R. A. Fisher, working at Rothamsted Experimental Station in the 1920s. You want to know whether some varieties of wheat yield more than others. You design an experiment with, say, 8 varieties planted in 5 plots each, for 40 plots in total. At harvest, you have 40 numbers.

The analysis you want to perform is what we now call ANOVA (analysis of variance) — a way to ask "is the difference between varieties bigger than the random variation between plots?" The formula involves computing sums, sums of squares, and ratios of those quantities.

For just 40 numbers, this might be:

An hour to enter the numbers neatly into a worksheet (probably twice, by two different people, to catch transcription errors).
Two hours to compute the various sums of squares by hand.
An hour to assemble the ANOVA table and look up the F-statistic in a printed table.

That is about half a day of careful work to analyze one experiment, and Fisher did this kind of thing routinely. His famous Statistical Methods for Research Workers (1925) is full of worked examples that each took many hours of arithmetic.
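
To get a feel for the arithmetic involved, here is a hedged sketch of the same steps in R. The yields are simulated (Fisher's actual numbers are not given here), and the variable names are our own; only the layout of 8 varieties by 5 plots comes from the example above.

```r
# Hypothetical data: 8 varieties x 5 plots each = 40 simulated yields.
set.seed(42)                                       # arbitrary seed
variety <- factor(rep(paste0("V", 1:8), each = 5))
yield   <- rnorm(40, mean = 30, sd = 3)            # made-up yields

# Step 1: the means a human computer would have tabulated.
grand_mean  <- mean(yield)
group_means <- tapply(yield, variety, mean)
n_per_group <- tapply(yield, variety, length)

# Step 2: sums of squares between varieties and within varieties.
ss_between <- sum(n_per_group * (group_means - grand_mean)^2)
ss_within  <- sum((yield - ave(yield, variety))^2)  # ave() gives each plot its variety mean

# Step 3: mean squares and the F ratio.
df_between <- nlevels(variety) - 1                  # 8 - 1
df_within  <- length(yield) - nlevels(variety)      # 40 - 8
f_stat <- (ss_between / df_between) / (ss_within / df_within)

# Step 4: the p-value, which replaces the lookup in a printed F table.
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Cross-check against R's built-in ANOVA table.
fit <- anova(lm(yield ~ variety))
all.equal(f_stat, fit[["F value"]][1])
```

Each numbered step corresponds to one of the hand-calculation stages listed above; the final line confirms that the long way and the built-in way agree.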

Why statistical methods are so 'computational'

This is why many classical statistical techniques have such strange-looking shortcuts and rules of thumb. They were not designed for elegance — they were designed to minimize the number of multiplications a human had to do by hand. When you see weird formulas in old textbooks, that is often what you are looking at: hand-calculation tricks frozen into the method.

Tables, slides, and human computers

To make this feasible, the statisticians of the early 20th century built three kinds of tools.

Printed tables. Books of values for the normal distribution, the t-distribution, the F-distribution, the chi-squared distribution — all computed once, painstakingly, and printed so that everyone else could look up the answer instead of computing it. Some of these tables took years of work to produce.

Mechanical calculators. The Brunsviga (introduced in 1892) and its successors let you crank a handle to multiply or divide. Logarithms and slide rules helped further. By the 1930s a typical research office had several of these.

Human computers. Large projects — astronomy, ballistics, actuarial science — were carried out by teams of (often female) mathematicians doing arithmetic in parallel. NASA's early space programs ran this way well into the 1960s, as immortalized in the book and film Hidden Figures.

That whole pipeline, every step of it, was manual. If you found a transcription error late in the process, you might have to redo hours of work.
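
For comparison, the lookups that once required a printed table are single function calls in base R. This is a small illustrative sketch; the particular probabilities and degrees of freedom are our own choices, picked because they are common textbook cases.

```r
# Critical values that once had to be looked up in printed tables.
z_crit    <- qnorm(0.975)                 # normal distribution, two-sided 5% level
t_crit    <- qt(0.975, df = 10)           # Student's t with 10 degrees of freedom
chi2_crit <- qchisq(0.95, df = 1)         # chi-squared with 1 degree of freedom
f_crit    <- qf(0.95, df1 = 7, df2 = 32)  # F for an 8-group, 40-observation layout

round(c(normal = z_crit, t = t_crit, chi_squared = chi2_crit, F = f_crit), 3)
```

The `q` prefix means "quantile"; the matching `p` functions (`pnorm`, `pt`, `pchisq`, `pf`) go the other way, from a statistic to a probability.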

A taste of "manual statistics" in R

Let us reproduce a calculation Fisher might have done. We have yields from 5 plots growing two different varieties of wheat and want to know whether they look reliably different.

[Interactive R code block; the code itself is not preserved in this copy.]
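
As a stand-in, here is a minimal sketch of what such a step-by-step calculation might look like. Everything specific in it is an assumption: the yields are invented, we assume 5 plots per variety, and we use the classical pooled-variance (Student's) two-sample t-statistic.

```r
# Hypothetical yields (arbitrary units), assuming 5 plots per variety.
variety_a <- c(29.1, 31.4, 30.2, 28.7, 32.0)
variety_b <- c(33.5, 34.1, 31.9, 35.2, 33.0)

# Sample sizes.
n_a <- length(variety_a)
n_b <- length(variety_b)

# Means: add up the column, divide by the count.
mean_a <- sum(variety_a) / n_a
mean_b <- sum(variety_b) / n_b

# Sample variances: sum of squared deviations over (n - 1).
var_a <- sum((variety_a - mean_a)^2) / (n_a - 1)
var_b <- sum((variety_b - mean_b)^2) / (n_b - 1)

# Pooled variance and the standard error of the difference in means.
pooled_var <- ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
se_diff    <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# The t-statistic and its degrees of freedom.
t_manual <- (mean_a - mean_b) / se_diff
df       <- n_a + n_b - 2
t_manual
```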

Every line above is a step Fisher would have done by hand with a pencil and a Brunsviga. With R, it took us a second. And — crucially — if we want to check this against R's built-in t-test, it is one more line:

[Interactive R code block; the code itself is not preserved in this copy.]
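
A hedged sketch of that one-line check follows. The yields are invented for illustration, and `var.equal = TRUE` is our choice so that `t.test()` uses the same pooled-variance formula as the manual calculation (by default it runs Welch's version instead).

```r
# Hypothetical yields (arbitrary units), assuming 5 plots per variety.
variety_a <- c(29.1, 31.4, 30.2, 28.7, 32.0)
variety_b <- c(33.5, 34.1, 31.9, 35.2, 33.0)

# The built-in test: one line.
result <- t.test(variety_a, variety_b, var.equal = TRUE)
result

# The same t-statistic the long way, for comparison.
n_a <- length(variety_a)
n_b <- length(variety_b)
pooled_var <- ((n_a - 1) * var(variety_a) + (n_b - 1) * var(variety_b)) /
  (n_a + n_b - 2)
t_manual <- (mean(variety_a) - mean(variety_b)) /
  sqrt(pooled_var * (1 / n_a + 1 / n_b))

# TRUE if the built-in and manual values agree.
all.equal(t_manual, unname(result$statistic))
```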

Look at the t value in the output — it matches what we computed manually. That is the gift of programming for statistics: you can build up an answer the long way to understand it, then trust the short way for production.

What carried over — and what didn't

The constraints of the pre-computer era shaped statistics in ways that are still with us:

The obsession with small samples. Many classical tests (the t-test, ANOVA) were designed for n=10 or n=30, not n=10,000. That is because hand-computing anything bigger was infeasible.
The use of closed-form formulas. A formula is something you can compute once and be done. Modern alternatives — bootstrapping, simulation, Bayesian sampling — require thousands of computations per analysis and were simply impossible before computers.
The cultural emphasis on publication of methods, not data. Datasets were small enough to fit in a paper appendix. There was no need for a system to share gigabyte-scale data files.
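
To make the contrast concrete, here is a small sketch comparing a closed-form standard error with a bootstrap estimate of the same quantity. The data, the seed, and the number of resamples are all invented for illustration.

```r
# Hypothetical sample of 10 yields (arbitrary units).
yields <- c(29.1, 31.4, 30.2, 28.7, 32.0, 33.5, 34.1, 31.9, 35.2, 33.0)

# Closed-form route: one formula, computed once.
formula_se <- sd(yields) / sqrt(length(yields))

# Bootstrap route: resample the data with replacement thousands of times
# and see how much the mean moves from resample to resample.
set.seed(123)     # arbitrary seed, for repeatability
n_boot <- 5000    # arbitrary number of resamples
boot_means <- replicate(n_boot, mean(sample(yields, replace = TRUE)))
boot_se <- sd(boot_means)

c(formula = formula_se, bootstrap = boot_se)

# A simple percentile interval for the mean, straight from the resamples.
quantile(boot_means, c(0.025, 0.975))
```

The formula needs a handful of arithmetic operations; the bootstrap needs 5,000 resampled means. On a laptop both finish instantly, but only the first was within reach of a person with a mechanical calculator.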

But once computers arrived in the second half of the 20th century, the possibilities of statistics exploded. Suddenly an analyst could try methods that had been theoretically known for decades but practically unreachable. We will see that next.

Test your understanding

Question (select one)

Why were many classical statistical methods designed around very small sample sizes (say, n = 10 to n = 30)?

Larger samples are statistically invalid.

Hand-computation made larger samples impractical at the time.

Statisticians had not yet discovered that more data is better.

The math was always intended for small n on philosophical grounds.

Question (select one)

In the early 1900s, what did the word "computer" usually refer to?

A mainframe machine in a glass-walled room.

A research grant for computation.

A person — often a woman — whose job was to do arithmetic.

A pocket calculator.

Question (select one)

Why did pre-computer statisticians rely so heavily on printed tables of distribution values (e.g. the normal, t, F, chi-squared distributions)?

The math was too complicated to ever understand.

It was illegal to compute these values yourself.

Computing those values from scratch was prohibitively expensive — so once computed, the answers were published once and reused everywhere.

They contained errors that statisticians could correct over time.

A small history-flavored challenge

In the 1908 paper that introduced the t-test, William Sealy Gosset (writing under the pen-name "Student") used a dataset of yields from a barley experiment. Let's reproduce a tiny version. Compute and store three quantities about the vector yield: the mean, the standard deviation, and the t-statistic against the hypothesis that the true mean is 40.

Challenge

Compute a one-sample t-statistic by hand

Given the vector yield (already loaded), define:

m — the mean of yield
s — the standard deviation of yield
t_stat — the one-sample t-statistic, (m - 40) / (s / sqrt(n)), where n is the length of yield.

You can use base R only — no packages needed.
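
If you want to check your approach, here is one possible sketch. The `yield` values below are stand-ins we made up, not the data loaded by the exercise, so your numbers will differ.

```r
# Stand-in values only; the exercise supplies its own `yield` vector.
yield <- c(41.2, 38.9, 43.5, 40.1, 42.7, 39.8)

n <- length(yield)                   # sample size
m <- mean(yield)                     # sample mean
s <- sd(yield)                       # sample standard deviation
t_stat <- (m - 40) / (s / sqrt(n))   # one-sample t-statistic against a mean of 40

# Optional cross-check against the built-in one-sample t-test.
all.equal(t_stat, unname(t.test(yield, mu = 40)$statistic))
```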

In the next chapter, we will see how the arrival of computers turned this slow, careful craft into something altogether new — and why the most influential of those tools came from an unexpected place: Bell Labs.
