Practical R for Beginners

Intuition for Inference

Confidence intervals and p-values are the lingua franca of applied statistics, and among the most commonly misinterpreted ideas in science. Let's build correct intuition for what they really mean.

You've now seen that any statistic computed from a sample wiggles. Inference is the art of saying useful things about a population despite that wiggling.

We'll cover three workhorse ideas:

Confidence intervals — a range of plausible values for an unknown parameter.
p-values — how surprising your data would be under a skeptical "no effect" assumption.
Effect size — how big a difference is, separate from how confident we are that it exists.

We'll do all of them in R using built-in functions like t.test(). The math is secondary; the meaning is the goal.

A confidence interval, plainly

A 95% confidence interval for a parameter is a range constructed from your data such that, across many hypothetical repetitions of the study, 95% of intervals constructed this way would contain the true parameter.

Read that carefully. It is not "there's a 95% chance the true value is in this specific interval." The true value is fixed; the interval is what wiggles from sample to sample.

The intuition is easier to see in a simulation:
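A minimal sketch of such a simulation, assuming a normal population with a true mean of 50 and standard deviation of 10 (values chosen purely for illustration, as are the variable names):

```r
set.seed(42)              # fixed seed so the sketch is repeatable

true_mean   <- 50         # assumed "truth" for the simulated population
n_studies   <- 1000       # number of hypothetical repeated studies
n_per_study <- 30         # observations per study

# For each study: draw a sample, build a 95% CI, record whether it covers the truth
covers <- replicate(n_studies, {
  x  <- rnorm(n_per_study, mean = true_mean, sd = 10)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covers)  # fraction of intervals containing the true mean
coverage
```

Removing the set.seed() line lets the result vary from run to run.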

Run it a few times. You'll see the fraction of CIs containing the truth hovers around 0.95. That's what "95% confidence" means. Each individual CI either covers the truth or doesn't — we just don't know which.

Visualize it:
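One way to draw this, sketched with base R graphics so it needs no extra packages (the same picture could be built with ggplot2). The population values, the number of intervals, and the colors are assumptions for illustration:

```r
set.seed(1)
true_mean   <- 50
n_intervals <- 50

# One row per simulated study: lower and upper CI bounds
cis <- t(replicate(n_intervals, {
  x <- rnorm(30, mean = true_mean, sd = 10)
  as.numeric(t.test(x)$conf.int)
}))

# TRUE for intervals that fail to contain the true mean
misses <- cis[, 1] > true_mean | cis[, 2] < true_mean

# Empty canvas, then one horizontal segment per interval
plot(NULL, xlim = range(cis), ylim = c(1, n_intervals),
     xlab = "95% CI for the mean", ylab = "Simulated study")
segments(x0 = cis[, 1], y0 = seq_len(n_intervals),
         x1 = cis[, 2], y1 = seq_len(n_intervals),
         col = ifelse(misses, "orange", "grey40"))
abline(v = true_mean, col = "red", lwd = 2)  # the fixed truth
```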

Most lines cross the red truth-line. A few miss. That's sampling variability made visible.

A p-value, plainly

A p-value is the probability, if there were no real effect, of seeing a result as extreme as the one you got — or more extreme.

Small p-value → your observed result is surprising under "no effect" → you have evidence of some effect.

Large p-value → your observed result is unsurprising under "no effect" → you have no strong evidence of an effect (but also no strong evidence against one — absence of evidence is not evidence of absence).
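The definition can be made concrete with a label-shuffling (permutation) sketch. This is a different technique from the t-test used on this page, shown only because it computes "how often would no-effect data look this extreme?" directly. The measurements below are made-up numbers.

```r
set.seed(7)

# Hypothetical measurements for two groups (made-up numbers for illustration)
group_a <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 5.6)
group_b <- c(6.2, 5.9, 6.8, 6.1, 7.0, 5.8, 6.5, 6.4)

observed_gap <- mean(group_b) - mean(group_a)

# Under "no effect" the group labels are arbitrary, so shuffle them many times
pooled <- c(group_a, group_b)
n_a    <- length(group_a)
shuffled_gaps <- replicate(5000, {
  idx <- sample(length(pooled), n_a)        # random stand-in for "group a"
  mean(pooled[-idx]) - mean(pooled[idx])
})

# Share of shuffles at least as extreme as the observed gap (two-sided)
p_perm <- mean(abs(shuffled_gaps) >= abs(observed_gap))
p_perm
```

The resulting proportion plays the role of a p-value: the smaller it is, the more surprising the observed gap would be if the labels truly did not matter.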

Let's run a quick test:
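A minimal sketch of such a test on simulated data, assuming two groups whose population means differ by 5 units (the group names and all numbers are illustrative):

```r
set.seed(123)

# Simulated data: group_b's population mean is assumed to be 5 units higher
group_a <- rnorm(40, mean = 100, sd = 10)
group_b <- rnorm(40, mean = 105, sd = 10)

# Welch two-sample t-test (R's default; does not assume equal variances)
result <- t.test(group_a, group_b)
result
```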

The output gives you:

t: a standardized measure of how far apart the groups look relative to the noise.
df: degrees of freedom (sample-size-related).
p-value: how surprising this gap (or bigger) would be if the two populations actually had identical means.
95 percent confidence interval: a plausible range for the true difference in means.

If p < 0.05, by convention we say the result is "statistically significant." That convention is useful — but also widely abused.

Three things p-values do NOT mean

NOT "the probability the null hypothesis is true."
NOT "the probability your result was a fluke."
NOT "the probability your finding will replicate."

A p-value answers exactly one question: given a specific skeptical assumption (the null), how surprising is what we saw? That's it.

Effect size: significance ≠ importance

Two studies can both have p < 0.001 and tell wildly different stories:
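A sketch of two such studies on simulated data. The setup is an assumption chosen to make the contrast stark: study A has a million observations per group and a true gap of 0.02, while study B has 50 per group and a true gap of 2.

```r
set.seed(2024)

# Study A: enormous samples, true gap assumed to be 0.02 SD (tiny)
a_control <- rnorm(1e6, mean = 0,    sd = 1)
a_treated <- rnorm(1e6, mean = 0.02, sd = 1)

# Study B: small samples, true gap assumed to be 2 SD (large)
b_control <- rnorm(50, mean = 0, sd = 1)
b_treated <- rnorm(50, mean = 2, sd = 1)

test_a <- t.test(a_treated, a_control)
test_b <- t.test(b_treated, b_control)

# p-values for the two studies
c(A = test_a$p.value, B = test_b$p.value)

# Estimated differences in means (the raw effect sizes)
effect_a <- mean(a_treated) - mean(a_control)
effect_b <- mean(b_treated) - mean(b_control)
c(A = effect_a, B = effect_b)
```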

A and B might both look "highly significant" by p-value, but the effect sizes differ by ~100×. A is a real but trivial difference; B is a real and large one. Always report — and think about — the effect, not just the p.

A complete inference workflow

Putting the pieces together: load data, summarize, plot, test, interpret.
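A sketch of that workflow, assuming the data in question is R's built-in sleep dataset (extra hours of sleep under two drugs). That choice is a guess based on the two-drug reading that follows. The measurements in sleep are actually paired, but this sketch treats the groups as independent to keep the example simple.

```r
# 1. Load: `sleep` ships with R (columns: extra, group, ID)
data(sleep)

# 2. Summarize: mean extra sleep per drug group
group_means <- aggregate(extra ~ group, data = sleep, FUN = mean)
group_means

# 3. Plot: side-by-side boxplots
boxplot(extra ~ group, data = sleep,
        xlab = "Drug group", ylab = "Extra hours of sleep")

# 4. Test: Welch two-sample t-test
tt <- t.test(extra ~ group, data = sleep)
tt

# 5. Interpret: pull out the pieces worth reading
tt$estimate   # the two group means
tt$conf.int   # 95% CI for mean(group 1) - mean(group 2)
tt$p.value
```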

A grounded reading of that output:

The boxplot suggests group 2 trends higher on average.
The estimated difference is reported, with a CI.
If p < 0.05, we have decent evidence the two drugs differ in their effect — but always re-check the CI to see by how much.

That last step (looking at the CI, not only the p-value) is the single best habit you can develop. The CI shows the direction and magnitude of the difference, and how much larger or smaller it could plausibly be.

Common pitfalls

p-hacking. Trying many tests and reporting only the "significant" ones. With 20 independent tests where no effect exists, you expect about 1 to land below p = 0.05 by chance alone (20 × 0.05 = 1).
Overinterpreting non-significance. "p = 0.07" is not "no effect"; it's "we couldn't reliably distinguish it from zero with this sample size."
Ignoring sample size. With huge n, even a tiny real gap becomes significant. With small n, even large real effects can be missed.
Confusing CI with prediction interval. A CI is about the parameter, not about individual future values.
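The p-hacking arithmetic can be checked with a short simulation. This sketch assumes every test compares two samples drawn from the same normal population, so any "significant" result is a false alarm. The sample sizes and number of repetitions are arbitrary choices.

```r
set.seed(99)

# Each "test" compares two samples from the SAME population, so no effect exists
null_p_values <- replicate(2000, {
  x <- rnorm(25)
  y <- rnorm(25)
  t.test(x, y)$p.value
})

# Share of tests that come out "significant" purely by chance (expected near 0.05)
false_positive_rate <- mean(null_p_values < 0.05)
false_positive_rate

# Chance that at least one of 20 independent null tests dips below 0.05
p_at_least_one <- 1 - 0.95^20
p_at_least_one
```

The second number is why reporting only the best of many tests is misleading: with 20 tries and no real effect anywhere, at least one "hit" is more likely than not.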

Test your understanding

Question (select one)

"A 95% confidence interval for the mean is [45, 55]." The most accurate reading is:

Hint: the true mean is a fixed (if unknown) number; what changes from study to study is the interval you compute.

There is a 95% chance the true mean is between 45 and 55.

95% of the data falls between 45 and 55.

If we repeated the study many times and built CIs the same way each time, about 95% of those intervals would contain the true mean.

The mean is exactly 50.

Question (select one)

A study reports p = 0.001 for a difference in means. Which is correct?

Hint: a p-value assumes the null hypothesis is true and asks how surprising the data would be under it — not how likely the null itself is.

The probability the null hypothesis is true is 0.001.

The probability the finding is a fluke is 0.001.

If there were no real difference between the groups, the chance of seeing a difference at least this extreme would be about 0.001.

The difference is definitely large.

Question (select one)

Why is it important to report effect size in addition to a p-value?

p-values are unreliable.

Effect size determines the p-value entirely.

A result can be statistically significant (especially with large n) yet practically meaningless — effect size tells you whether it matters in the real world.

Confidence intervals are not used in practice.

Mini challenge: full t-test workflow

Using the built-in mtcars dataset, test whether cars with automatic transmission (am == 0) and manual transmission (am == 1) have different mean miles-per-gallon. Save the output of t.test() to a variable tt, and pull out:

est_diff: the difference of sample means (manual − auto)
ci: the 95% confidence interval returned by t.test() (length-2 numeric). Note that tt$conf.int is computed as automatic − manual, so its sign is opposite to est_diff.
p: the p-value

Challenge: t-test workflow on mtcars

Run t.test(mpg ~ am, data = mtcars) and from its returned object set: tt to the test result, est_diff to mean(manual) - mean(automatic) (note: am = 1 is manual, am = 0 is automatic), ci to the confidence interval as a length-2 numeric, and p to the p-value.
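As a hint, here is a sketch of pulling the same kinds of pieces out of a t.test() result, using the built-in ToothGrowth dataset instead of mtcars so the challenge itself stays open. All variable names ending in _example are illustrative.

```r
# A different built-in dataset, so the challenge itself is left for you
tt_example <- t.test(len ~ supp, data = ToothGrowth)

names(tt_example)                      # list the components available

# estimate holds the two group means, in factor-level order (OJ, then VC)
group_means  <- unname(tt_example$estimate)
diff_example <- group_means[2] - group_means[1]   # mean(VC) - mean(OJ)

# conf.int is for first level minus second level (OJ - VC);
# as.numeric() drops the conf.level attribute, leaving a plain length-2 vector
ci_example <- as.numeric(tt_example$conf.int)

p_example <- tt_example$p.value
```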

You now understand the conceptual core of statistical inference: sampling, uncertainty, intervals, and tests. The next two pages shift back into the craft of writing analysis code that's clean, reusable, and reproducible.
