Practical R for Beginners

Data Frames

A data frame is R's spreadsheet on steroids: a collection of equal-length vectors. That simple idea is enough to organize most of the data you'll ever work with.

A real dataset rarely lives as a single vector. It lives as a table: rows for observations, columns for variables. In R, that table is called a data frame — and it's the central data structure for nearly all statistical and data work.

A data frame is, fundamentally, just a list of equal-length vectors, displayed as a table. Each column is a vector. All columns must have the same length. That's it.

Creating a data frame from scratch

You can build one with data.frame():
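
A minimal sketch of such a call, assuming a hypothetical employees table. The column names and types follow the description on this page, but every value is invented for illustration.

```r
# Hypothetical employee records; every value here is invented for illustration.
employees <- data.frame(
  name   = c("Alice", "Bob", "Carmen", "Deepak"),
  dept   = c("Sales", "Engineering", "Engineering", "Marketing"),
  salary = c(52000, 71000, 68000, 58000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

employees                 # print the whole table
sapply(employees, class)  # report the type of each column
```

Each argument to data.frame() is one named vector, and all four vectors have length 4, so the result has four rows.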

When you print a data frame in WebR, you get a real rendered HTML table. Each column has its own type — name and dept are character, salary is numeric, remote is logical.

The columns are vectors. The table just displays them side-by-side.

Inspecting a data frame

R provides a small toolkit of inspection functions. You will use these at the start of every real analysis:
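
As a sketch of what such a first look might involve, the base R functions below are a common choice. Which functions the toolkit covers beyond str() and summary() is an assumption, and the employees table is hypothetical.

```r
# Hypothetical employees table; values are invented for illustration.
employees <- data.frame(
  name   = c("Alice", "Bob", "Carmen", "Deepak"),
  dept   = c("Sales", "Engineering", "Engineering", "Marketing"),
  salary = c(52000, 71000, 68000, 58000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

head(employees, 2)   # first two rows
dim(employees)       # number of rows, then number of columns
names(employees)     # column names
str(employees)       # compact structure: size, column types, first values
summary(employees)   # per-column summary (quartiles and mean for numbers)
```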

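Of these, str() and summary() are the two you will use the most. str() shows the structure of the data: its dimensions, the type of each column, and the first few values. summary() shows the content: a statistical summary of each column.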

Accessing columns

A data frame is a list of columns. You access a column with $ or with [["..."]]:
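
A short sketch of both access styles, using a hypothetical employees table with invented values. The variable names salaries and same are illustrative only.

```r
# Hypothetical employees table; values are invented for illustration.
employees <- data.frame(
  name   = c("Alice", "Bob", "Carmen", "Deepak"),
  dept   = c("Sales", "Engineering", "Engineering", "Marketing"),
  salary = c(52000, 71000, 68000, 58000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

is.list(employees)                 # TRUE: a data frame is a list underneath

salaries <- employees$salary       # access by name with $
same     <- employees[["salary"]]  # access by name with [[ ]]
identical(salaries, same)          # both return the same plain vector

mean(salaries)                     # ordinary vector functions apply
employees$name[employees$remote]   # logical indexing on a column
```

The [[ ]] form is the one to reach for when the column name is stored in a variable, since $ takes the name literally.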

Once you've grabbed a column, you're back in vector land — every trick from the last four pages works.

Indexing rows and columns: `[row, col]`

A data frame can also be indexed with two-dimensional [ ] notation. The convention is [rows, columns]. Leave a slot empty to mean "all":
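
A sketch of the [rows, columns] convention on a hypothetical employees table. The values and the 60000 threshold are invented for illustration.

```r
# Hypothetical employees table; values are invented for illustration.
employees <- data.frame(
  name   = c("Alice", "Bob", "Carmen", "Deepak"),
  dept   = c("Sales", "Engineering", "Engineering", "Marketing"),
  salary = c(52000, 71000, 68000, 58000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

employees[2, 3]                    # row 2, column 3: a single value
employees[1:2, ]                   # rows 1 and 2, all columns
employees[, c("name", "salary")]   # all rows, two columns chosen by name

# Keep only the rows where the logical condition is TRUE, with all columns.
high_paid <- employees[employees$salary > 60000, ]
high_paid
```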

That last line is the classic data-frame filter: use a logical vector built from one of the columns to select rows.

Adding and modifying columns

Adding a new column works like ordinary assignment: assign a vector to a new column name with $. Assigning to an existing column name overwrites that column.
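
A minimal sketch on a hypothetical employees table. The column name bonus and the flat raise of 1000 are assumptions chosen for illustration.

```r
# Hypothetical employees table; values are invented for illustration.
employees <- data.frame(
  name   = c("Alice", "Bob", "Carmen", "Deepak"),
  dept   = c("Sales", "Engineering", "Engineering", "Marketing"),
  salary = c(52000, 71000, 68000, 58000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

# Add a column: the name "bonus" does not exist yet, so it is created.
employees$bonus <- employees$salary * 0.10

# Modify a column: "salary" already exists, so it is overwritten.
employees$salary <- employees$salary + 1000

employees
```

Note that bonus was computed from the old salaries and does not update itself when salary changes; a column is a stored vector, not a spreadsheet formula.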

Vectorized arithmetic works exactly as you'd hope — employees$salary * 0.10 produces a new vector of bonuses, which is then stored as a new column.

Built-in datasets: your sandbox

R comes with dozens of built-in datasets you can experiment with freely. A few favorites we'll use throughout the course:
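
A quick sketch of how to look at a few of them. mtcars appears in the questions further down this page; treating iris and airquality as course favorites is an assumption, although all three ship with base R.

```r
# These datasets come with base R, so no loading step is needed.
head(mtcars, 3)       # fuel economy and specs for 32 car models
str(iris)             # flower measurements; note the Sepal.Length-style names
summary(airquality)   # daily air quality readings; the summary reports NA counts

# Count the missing values in each column of airquality.
colSums(is.na(airquality))

# Running data() in an interactive session lists every available dataset.
```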

Real datasets, even classics like these, have quirks: missing values, weird scales, oddly-named columns. Half the joy of exploratory data analysis (EDA) is finding them.

Test your understanding

Question (select one)

Fundamentally, what is a data frame in R?

A single multi-dimensional array

An Excel file

A list of equal-length vectors (one per column), displayed as a table

A SQL table

Question (select one)

Which expression returns the mpg column of mtcars as a vector?

mtcars[mpg]

mtcars$mpg

mtcars.mpg

mtcars(mpg)

Question (select one)

What does mtcars[mtcars$mpg > 25, ] return?

The single value of mpg greater than 25

The column mpg filtered

All rows of mtcars where mpg > 25, with all columns kept

An error

Mini challenge: build and summarize a small data frame

Build a data frame students with columns name (character), grade (integer), and score (numeric), then compute the avg_score of all students.

Your first data frame

Create a data frame called students with these three rows:

"Ada", grade 10, score 92
"Ben", grade 11, score 78
"Cleo", grade 10, score 85

Then assign avg_score to the mean of the score column.

Now that we can create and inspect data frames, the next page focuses on the very first thing a real data analyst does with a new dataset: look at it.
