Aggregation Basics

Sum, mean, median, min, max — the simple reductions that turn many rows into one number, and the subtle choices that change what they mean.

An aggregation turns many values into one. The mean of a column, the count of rows, the maximum price, the sum of revenue — these are all aggregations.

This chapter covers single-Series aggregations. The next chapter introduces groupby, which applies the same aggregations once per group.

The reduction methods

Every numeric Series supports these:

Method	Returns the…
`.sum()`	Sum
`.mean()`	Arithmetic mean
`.median()`	Middle value
`.min()`	Minimum
`.max()`	Maximum
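`.std()`	Standard deviation (sample, `ddof=1` by default)
`.var()`	Variance (sample, `ddof=1` by default)
`.quantile(q)`	Quantile at fraction q (e.g. 0.25 for the first quartile)
`.count()`	Number of non-null values
`.nunique()`	Number of distinct non-null values
`.iloc[0]`	First value (an indexer, not a method; `.first()` is a groupby aggregation, not a plain Series reduction)
`.iloc[-1]`	Last value (the same caveat applies to `.last()`)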
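As a quick illustration of the table above, here is a minimal sketch that calls several of these reductions on one Series. The `prices` data is made up for the example and does not come from any real dataset.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
prices = pd.Series([12.0, 7.5, 9.0, 7.5, 20.0], name="price")

total = prices.sum()            # 56.0
average = prices.mean()         # 11.2
middle = prices.median()        # 9.0 (sorted: 7.5, 7.5, 9.0, 12.0, 20.0)
lowest = prices.min()           # 7.5
highest = prices.max()          # 20.0
first_quartile = prices.quantile(0.25)  # 7.5
n_values = prices.count()       # 5 non-null values
n_distinct = prices.nunique()   # 4 distinct values
sample_variance = prices.var()  # divides by n - 1 (ddof=1) by default
```

Each call collapses the whole Series into a single scalar, which is what makes these methods reductions.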

How aggregations handle missing values

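By default, NaN values are skipped. That sounds harmless, but it changes what the result means.

Take a five-element Series whose non-null values are 10, 20 and 40, with the other two entries missing. Dividing the sum (70.0) by 5, the length, would give 14.0. Pandas instead divides by 3, the count of non-null values, and returns 23.33 (= (10 + 20 + 40) / 3). This is almost always what you want, but be aware of it: passing `skipna=False` makes the result NaN whenever a value is missing.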
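The arithmetic above can be reproduced with a short sketch. The non-null values (10, 20, 40) and the length of 5 follow the text; where the two NaN entries sit in the Series is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Positions of the NaN entries are assumed; only the non-null values
# and the overall length come from the surrounding text.
s = pd.Series([10, np.nan, 20, np.nan, 40])

total = s.sum()                     # 70.0, NaN entries are skipped
skipping_mean = s.mean()            # 70 / 3, about 23.33
naive_mean = s.sum() / len(s)       # 70 / 5 = 14.0
strict_mean = s.mean(skipna=False)  # NaN, because missing values are present
non_null = s.count()                # 3
```

Comparing `len(s)` with `s.count()` is a cheap way to see how many values a reduction silently ignored.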

Mean vs median

The two most-cited "averages" mean different things.

Mean — sum / count. Sensitive to outliers.
Median — middle value when sorted. Robust to outliers.

In a small sample, a single outlier can move the mean from about 63 (where the typical person sits) to over 100, while the median stays where it was. When reporting "typical" values to a non-technical audience, the median is often the better headline number, and you should usually report both.
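A minimal sketch of this effect, using made-up salary figures (the numbers are hypothetical, picked so the mean starts at 63):

```python
import pandas as pd

# Hypothetical salaries, in thousands.
salaries = pd.Series([50, 60, 62, 65, 78])

# Same sample, but the highest earner is replaced by an extreme value.
with_outlier = pd.Series([50, 60, 62, 65, 300])

mean_before = salaries.mean()        # 63.0
median_before = salaries.median()    # 62.0

mean_after = with_outlier.mean()     # 107.4
median_after = with_outlier.median() # 62.0, unchanged
```

Only one value changed, yet the mean rose by more than 40 while the middle value of the sorted data stayed the same.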

A useful habit

For any "typical X" question, look at the mean and the median. If they agree, the distribution is roughly symmetric. If they diverge, dig in: outliers or skew may be hiding important information.
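One way to build this habit is a small helper that reports both statistics side by side. The function name `mean_vs_median` and the sample data are hypothetical, and the sketch deliberately sets no threshold: how large a gap is worth investigating depends on the data.

```python
import pandas as pd

def mean_vs_median(values: pd.Series) -> dict:
    """Return the mean, the median, and their difference for a Series."""
    mean = values.mean()
    median = values.median()
    return {"mean": mean, "median": median, "gap": mean - median}

# Hypothetical, roughly symmetric data: the two statistics agree.
symmetric = mean_vs_median(pd.Series([48, 50, 52, 50, 50]))

# Hypothetical data with one extreme value: the statistics diverge.
skewed = mean_vs_median(pd.Series([20, 22, 25, 27, 30, 400]))
```

For the symmetric sample the gap is zero; for the skewed one the mean sits far above the median, which is the signal to look for outliers or skew.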

Check your understanding

Question 1 (select one)

A salary column has values [50, 55, 60, 60, 65, 70, 75, 500]. Which statistic is least affected by the 500 outlier?

A. The mean
B. The sum
C. The median
D. The max

Question 2 (select one)

What does `s.value_counts(normalize=True)` produce?

A. The counts of unique values
B. A list of unique values
C. The proportions (fractions of total) of each unique value
D. A sorted Series
