
Data Frames

R's spreadsheet on steroids. A data frame is just a collection of equal-length vectors, but that simple idea is enough to organize most of the tabular data you'll ever work with.

A real dataset rarely lives as a single vector. It lives as a table: rows for observations, columns for variables. In R, that table is called a data frame — and it's the central data structure for nearly all statistical and data work.

A data frame is, fundamentally, just a list of equal-length vectors, displayed as a table. Each column is a vector. All columns must have the same length. That's it.
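
The "list of equal-length vectors" claim can be checked directly. The following is a minimal sketch; the tiny data frame df and its columns x and y are hypothetical names used only for illustration, and it relies on data.frame(), which the next section introduces.

```r
# Hypothetical two-column data frame, used only for illustration
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.list(df)   # a data frame really is a list underneath
lengths(df)   # length of each column; they all match

# Columns whose lengths cannot be reconciled are rejected.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
result
```

Note that data.frame() will recycle a shorter atomic vector when its length divides the longer one evenly, so the failing case here deliberately uses lengths 3 and 2.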

Creating a data frame from scratch

You can build one with data.frame():

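
As a minimal sketch, assuming a small employees table: the column names name, dept, salary, and remote come from the text, while the four rows of values are made up for illustration.

```r
# Illustrative values only; the column names follow the text
employees <- data.frame(
  name   = c("Ana", "Raj", "Mia", "Tom"),
  dept   = c("Sales", "Engineering", "Engineering", "HR"),
  salary = c(52000, 88000, 95000, 61000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

employees                  # print the table
sapply(employees, class)   # one type per column
```

Each argument to data.frame() becomes one column, and the argument name becomes the column name.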

When you print a data frame in WebR, you get a real rendered HTML table. Each column has its own type — name and dept are character, salary is numeric, remote is logical.

The columns are vectors. The table just displays them side-by-side.

Inspecting a data frame

R provides a small toolkit of inspection functions. You will use these at the start of every real analysis:

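
A sketch of a typical inspection pass follows. Only str() and summary() are named in the text; dim(), names(), and head() are an assumption about what else belongs in the toolkit, and the employees values are made up for illustration.

```r
# Illustrative data; the values are assumptions
employees <- data.frame(
  name   = c("Ana", "Raj", "Mia", "Tom"),
  dept   = c("Sales", "Engineering", "Engineering", "HR"),
  salary = c(52000, 88000, 95000, 61000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

dim(employees)       # number of rows, then number of columns
names(employees)     # column names
head(employees, 2)   # first two rows
str(employees)       # column types and the first few values
summary(employees)   # per-column summaries
```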

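Of these, str() and summary() are the two you will use the most. str() shows the structure of the data: its dimensions, each column's type, and the first few values. summary() shows the content: per-column summaries such as the minimum, mean, and maximum of each numeric column.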

Accessing columns

A data frame is a list of columns. You access a column with $ or with [["..."]]:

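
A minimal sketch of both access styles, assuming the same made-up employees values used for illustration on this page:

```r
# Illustrative data; the values are assumptions
employees <- data.frame(
  name   = c("Ana", "Raj", "Mia", "Tom"),
  dept   = c("Sales", "Engineering", "Engineering", "HR"),
  salary = c(52000, 88000, 95000, 61000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

employees$salary        # the salary column as a numeric vector
employees[["salary"]]   # the same vector, selected by name as a string

# The result is an ordinary vector, so vector tools apply
mean(employees$salary)
employees$name[employees$salary > 60000]
```

The [["..."]] form is handy when the column name is stored in a variable, since $ needs the name typed literally.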

Once you've grabbed a column, you're back in vector land — every trick from the last four pages works.

Indexing rows and columns: [row, col]

A data frame can also be indexed with two-dimensional [ ] notation. The convention is [rows, columns]. Leave a slot empty to mean "all":

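
A sketch of the [rows, columns] convention, ending with a logical filter. The employees values and the 60000 threshold are assumptions chosen for illustration.

```r
# Illustrative data; the values are assumptions
employees <- data.frame(
  name   = c("Ana", "Raj", "Mia", "Tom"),
  dept   = c("Sales", "Engineering", "Engineering", "HR"),
  salary = c(52000, 88000, 95000, 61000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

employees[1, ]                        # first row, all columns
employees[, "salary"]                 # all rows, one column (a vector)
employees[1:2, c("name", "dept")]     # first two rows, two columns
employees[employees$salary > 60000, ] # rows where salary exceeds 60000
```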

That last line is the classic data-frame filter: use a logical vector built from one of the columns to select rows.

Adding and modifying columns

To add a new column, assign a vector to a new name with $:

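
A minimal sketch of adding and then modifying columns. The bonus expression follows the text; the employees values and the upper-casing of dept are assumptions added for illustration.

```r
# Illustrative data; the values are assumptions
employees <- data.frame(
  name   = c("Ana", "Raj", "Mia", "Tom"),
  dept   = c("Sales", "Engineering", "Engineering", "HR"),
  salary = c(52000, 88000, 95000, 61000),
  remote = c(TRUE, FALSE, TRUE, FALSE)
)

# Add a column: a 10% bonus computed from salary, one value per row
employees$bonus <- employees$salary * 0.10

# Modify a column: assigning to an existing name overwrites it
employees$dept <- toupper(employees$dept)

employees
```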

Vectorized arithmetic works exactly as you'd hope — employees$salary * 0.10 produces a new vector of bonuses, which is then stored as a new column.

Built-in datasets: your sandbox

R comes with dozens of built-in datasets you can experiment with freely. A few favorites we'll use throughout the course:

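
A sketch of a first look at some built-in datasets. mtcars appears in the quiz on this page; iris and airquality are an assumption about which other datasets count as favorites. All three ship with R's datasets package.

```r
head(mtcars)   # first six rows of the motor-car data
str(iris)      # column types of the iris measurements

# Count missing values per column in airquality
na_counts <- colSums(is.na(airquality))
na_counts
```

Counting NA values per column like this is one quick way to surface quirks in a dataset.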

Real datasets, even classics like these, have quirks: missing values, weird scales, oddly named columns. Half the joy of exploratory data analysis (EDA) is finding them.

Test your understanding

Question (select one)

Fundamentally, what is a data frame in R?

A single multi-dimensional array

An Excel file

A list of equal-length vectors (one per column), displayed as a table

A SQL table

Question (select one)

Which expression returns the mpg column of mtcars as a vector?

mtcars[mpg]

mtcars$mpg

mtcars.mpg

mtcars(mpg)

Question (select one)

What does mtcars[mtcars$mpg > 25, ] return?

The single value of mpg greater than 25

The column mpg filtered

All rows of mtcars where mpg > 25, with all columns kept

An error

Mini challenge: build and summarize a small data frame

Build a data frame students with columns name (character), grade (integer), and score (numeric), then compute avg_score, the mean score across all students.

Your first data frame

Create a data frame called students with these three rows:

  • "Ada", grade 10, score 92
  • "Ben", grade 11, score 78
  • "Cleo", grade 10, score 85

Then assign avg_score to the mean of the score column.

Now that we can create and inspect data frames, the next page focuses on the very first thing a real data analyst does with a new dataset: look at it.
