Measures of Center

You have a column of numbers — salaries, response times, order values — and someone asks the most innocent question in data science: "what's the typical value?" That word "typical" is doing a lot of work. It hides a choice between three different summaries — the mean, the median, and the mode — that can disagree wildly. Picking the wrong one is how a perfectly honest analyst ends up reporting a number that no real person in the data would recognize.

A measure of center collapses a whole distribution down to one number that's supposed to stand in for "the middle." The trouble is that "the middle" isn't one idea. This page is about knowing which center you're actually asking for, and when each one quietly lies.

The three centers, and what each one means

Mean (the arithmetic average): add everything up, divide by the count. It's the balance point of the data — the spot where the values on either side exactly counterweight each other.
Median: sort the values and take the middle one (or the average of the two middle ones). Half the data sits below it, half above. It answers "what's the value of the typical member?"
Mode: the most frequently occurring value. It answers "what's the most common outcome?" — the only one of the three that makes sense for categories like favorite color or payment method.
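As a minimal sketch of the three definitions, the snippet below uses Python's standard statistics module on a made-up list of daily ticket counts (the numbers are illustrative, not from a real dataset). The values are symmetric around a single peak, so all three centers come out equal.

```python
import statistics

# Hypothetical daily ticket counts, chosen to be symmetric with a single peak.
tickets = [18, 19, 20, 20, 20, 21, 22]

mean_value = statistics.mean(tickets)      # sum of the values divided by the count
median_value = statistics.median(tickets)  # middle value after sorting
mode_value = statistics.mode(tickets)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # all three equal 20 for this list
```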

When data is roughly symmetric and has no extreme values (a steady daily ticket count, for example), all three land in the same neighborhood, and the choice barely matters. The interesting cases are when they diverge. That divergence is not a nuisance; it's information about the shape of your data.

They agree when the data is symmetric and single-peaked

For a perfectly symmetric, single-peaked distribution, the mean, median, and mode all coincide. As a rule of thumb, the further apart they drift, the more skewed your data is, so the gap between mean and median is itself a quick read on shape (which we explore in Shape and Outliers).

Where the mean lies: skew and outliers

The mean's defining property — it's the balance point — is also its weakness. A balance point is dragged toward heavy values, no matter how few of them there are. One billionaire in a room of teachers pulls the average net worth into the millions, even though that number describes nobody present. This is not a rare edge case; it's the default for money, sizes, durations, and counts, which are almost all right-skewed (a long tail stretching toward large values).
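To make the pull of one heavy value concrete, here is a small sketch with invented salaries for a nine-person company, followed by the same payroll with one highly paid executive added. All figures are assumptions chosen for round numbers.

```python
import statistics

# Hypothetical salaries for a nine-person company (illustrative numbers only).
salaries = [45_000, 50_000, 55_000, 58_000, 60_000, 62_000, 65_000, 70_000, 75_000]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

# Add one very highly paid executive (also a made-up figure).
with_executive = salaries + [460_000]

mean_after = statistics.mean(with_executive)
median_after = statistics.median(with_executive)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")      # 60,000 -> 100,000
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")  # 60,000 -> 61,000
```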

Add a single very highly paid person to a small company's payroll, and the mean can jump by tens of thousands while the median is nudged by almost nothing. That's the whole story in one example: the median is robust (resistant to a handful of extreme values), while the mean is sensitive to them. Neither is "right" in the abstract; they answer different questions. But if you report the mean salary of that company, you'll quote a number that's higher than what almost everyone earns.

Misconception: the mean is always 'the typical value'

On skewed data the mean is not typical — it's pulled toward the long tail and can sit above most of your observations. "Average household income" is famously misleading for exactly this reason: a few very high earners lift the mean well above what a middle household actually makes. On right-skewed data, the median is usually the honest "typical."

Misconception: 'average' always means the mean

In everyday speech "the average" defaults to the mean, but statistically average just means "a measure of center" — the median and mode are averages too. When a report says "the average user," always ask which center they computed and whether the data is skewed.

A quick MCQ before we go further

Question (select one)

A dataset of home prices in a neighborhood is strongly right-skewed: most homes are modest, but a few mansions sell for 10x the rest. A realtor wants to advertise the "typical" price. Which measure is the most honest summary?

The mean, because it uses every value in the data

The median, because half the homes are above it and half below, unaffected by the extreme high sales

The mode, because it's the single most common price

The mean, because medians throw away information

When each measure is the right tool

The decision is mostly about data type and shape:

Mode is the only center that works for categorical data (you can't average "red" and "blue"), and it's the right call for asking "what's the most common outcome?" It also flags multimodal data — two peaks usually mean two groups mixed together, and no single center describes them well.
Mean is the natural choice for symmetric numeric data with no extreme outliers. It uses every value, has convenient mathematical properties, and underlies most of the inference later in this course (the mean of a sample is what the central limit theorem is about).
Median is the right call whenever data is skewed or outlier-prone: incomes, house prices, response times, file sizes, wait times. It answers "what does the middle case look like?"
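The mode's two jobs from the list above can be sketched with the standard library. The survey answers and wait times below are invented for illustration; statistics.mode handles the categorical column, and statistics.multimode returns every value tied for the highest count.

```python
import statistics
from collections import Counter

# Hypothetical survey answers; a categorical column has no mean or median.
payments = ["credit", "cash", "credit", "mobile", "debit", "credit", "cash"]

most_common = statistics.mode(payments)  # the most frequently chosen category
counts = Counter(payments)               # full frequency table, for context

# Made-up wait times with two clusters, as if two groups were mixed together.
wait_minutes = [2, 2, 2, 3, 9, 10, 10, 10]
peaks = statistics.multimode(wait_minutes)  # every value tied for the top count

print(most_common, dict(counts))
print(peaks)
```

Note that multimode only detects exact ties in the counts, so treat it as a rough flag. For continuous measurements a histogram is usually a better way to spot two peaks.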

A practical habit: report both

When you're unsure, compute the mean and the median and look at the gap. If they're close, report the mean. If they diverge, that gap is telling you the data is skewed — report the median as the headline and mention the mean only with context. Showing both is often the most honest move.

Three useful variations

Most of the time, mean/median/mode are all you need. But three relatives show up often enough to recognize:

Trimmed mean. Chop off a percentage from each end (say the top and bottom 10%), then take the mean of what's left. It's a compromise: more robust than the mean, but still uses most of the data. The same idea shows up in judged sports that drop the highest and lowest scores, and in many sensor pipelines that discard extremes before averaging.
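A trimmed mean can be sketched in a few lines. The helper name trimmed_mean and the judges' scores are hypothetical; the helper drops a rounded-down share of values from each end, which is one common convention rather than the only one.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end (hypothetical helper)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values to drop per side, rounded down
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# Made-up judges' scores with one very low and one very high mark.
scores = [2.0, 8.5, 8.7, 8.8, 9.0, 9.1, 9.2, 9.3, 9.4, 10.0]

plain = statistics.mean(scores)
trimmed = trimmed_mean(scores, 0.10)  # drops the 2.0 and the 10.0

print(f"plain mean: {plain:.2f}, 10% trimmed mean: {trimmed:.2f}")
```

With these numbers the single 2.0 drags the plain mean to about 8.4, while the trimmed mean stays at about 9.0, close to where most of the scores sit.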

Weighted mean. When some observations count more than others — larger stores, more-reliable sensors, bigger survey strata — give each a weight. A company-wide average satisfaction score should weight each team by headcount, not treat a 3-person team the same as a 300-person one.
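As a small illustration of the headcount example, the sketch below compares an unweighted and a weighted mean for two invented teams. The scores and headcounts are assumptions made up for the example.

```python
# Hypothetical team satisfaction scores (1 to 5 scale) and team headcounts.
team_scores = [4.8, 3.2]
headcounts = [3, 300]

# Unweighted: every team counts the same, regardless of size.
unweighted = sum(team_scores) / len(team_scores)

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted = sum(w * x for w, x in zip(headcounts, team_scores)) / sum(headcounts)

print(f"unweighted: {unweighted:.2f}, weighted by headcount: {weighted:.2f}")
```

The unweighted figure lands at 4.0, but the headcount-weighted figure is about 3.2, because nearly everyone in this made-up company sits in the larger, less satisfied team.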

Geometric mean. For rates, ratios, and growth factors (returns, percent changes, fold-changes), the arithmetic mean overstates typical growth. The geometric mean multiplies the values and takes the nth root, which is the correct "average factor" for things that compound.
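Here is a minimal sketch of that calculation for four invented yearly growth factors. The factors are assumptions; math.prod needs Python 3.8 or later. The last two printed lines compare each mean, raised to the 4th power, against the true compounded growth.

```python
import math
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%, +15%.
factors = [1.10, 1.20, 0.90, 1.15]

total_growth = math.prod(factors)                  # what compounding actually produced
arithmetic = statistics.mean(factors)              # add up, divide by n
geometric = total_growth ** (1 / len(factors))     # multiply, take the nth root

print(f"arithmetic mean: {arithmetic:.4f}")
print(f"geometric mean:  {geometric:.4f}")
print(f"geometric ** 4  = {geometric ** 4:.4f}  vs  total growth = {total_growth:.4f}")
print(f"arithmetic ** 4 = {arithmetic ** 4:.4f}  (overstates the total)")
```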

For n growth factors, raising the geometric mean to the nth power reproduces the total compounded growth (the 4th power for four factors, up to floating-point rounding), while the arithmetic mean raised to the same power would not. That's the tell that you're in geometric-mean territory: whenever the quantities multiply rather than add.

When to reach for the geometric mean

Use it for anything expressed as a rate or multiplier that compounds: investment returns, population growth, "our traffic grew 3x then 0.5x then 2x." Averaging those with the arithmetic mean overstates typical growth. For plain additive quantities (heights, temperatures, dollars), stick with the arithmetic mean or median.

Putting it together: a robust center summary

In real EDA you rarely report a single number — you report a small summary and let the gaps between numbers tell you about shape. Here's the pattern you'll reuse constantly: compute several centers at once and read them together.
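One way to sketch that pattern with only the standard library is shown below. The helper name center_summary, its trim parameter, and the response times are all hypothetical; the key mean_minus_median is simply the mean minus the median.

```python
import statistics

def center_summary(values, trim=0.10):
    """Return several centers at once (hypothetical helper for illustration)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)             # values to drop from each end
    trimmed = ordered[k:len(ordered) - k]
    mean = statistics.mean(ordered)
    median = statistics.median(ordered)
    return {
        "mean": mean,
        "median": median,
        "trimmed_mean": statistics.mean(trimmed),
        "mean_minus_median": mean - median,  # positive suggests a right tail
    }

# Made-up response times in milliseconds with a long right tail.
response_ms = [110, 120, 125, 130, 135, 140, 150, 160, 400, 930]

summary = center_summary(response_ms)
for name, value in summary.items():
    print(f"{name:>18}: {value:.1f}")
```

For these invented numbers the mean is 240.0, the median 137.5, and the 10% trimmed mean 170.0, so the gap of 102.5 points to a right tail.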

A positive mean_minus_median (the mean minus the median) is your skew alarm. When it's large and positive, lead with the median.

You're given a pandas Series prices of right-skewed product prices. Build a summary you can trust on skewed data.

Compute a dictionary called result with exactly these keys (all values plain Python float):

"mean" — the arithmetic mean
"median" — the median
"trimmed_mean" — the 10% trimmed mean (drop 10% from each end) using scipy.stats.trim_mean
"skew_gap" — mean minus median

Then set a boolean report_median to True if skew_gap is greater than 0 (i.e. right-skewed, so the median is the more honest headline), else False.

Use the provided prices Series.

Check your understanding

Question (select one)

You compute the mean monthly spend of your users as $84 and the median as $41. What is the most defensible reading of this gap?

The data is left-skewed, so a few very low spenders pull the mean down

The data is right-skewed; a minority of high spenders inflate the mean, so the median ($41) better reflects a typical user

The mean must be wrong because it should be close to the median

Half of all users spend exactly $84

Question (select one)

A survey asks for respondents' favorite payment method (cash, credit, debit, mobile). Which measure of center even makes sense for this column?

The mean of the four options

The median payment method

The mode — the most frequently chosen payment method

None of them; categorical data has no center

Question (select one)

An investment returns +50% one year and −50% the next. Someone reports the "average annual return" as 0% using the arithmetic mean. Why is the geometric mean the better tool here?

The arithmetic mean of percentages is always undefined

Returns compound multiplicatively, and $100 → $150 → $75 is a real loss; the geometric mean of the growth factors (1.5 and 0.5) captures that, while the arithmetic mean hides it

The geometric mean is just a more precise version of the arithmetic mean

Because percentages can be negative

Question (select one)

Which statement about the trimmed mean is accurate?

It is identical to the median

It is more sensitive to outliers than the ordinary mean

It discards a fixed percentage of the smallest and largest values, then averages what remains — a middle ground between mean and median in robustness

It can only be used on symmetric data

Key takeaways

Mean = balance point; uses all data; great for symmetric data but dragged around by skew and outliers.
Median = middle value; robust; the honest "typical" for skewed data like income, prices, and durations.
Mode = most common value; the only center for categorical data and a flag for multimodality.
The gap between mean and median is a free read on skew — when they diverge, lead with the median.
Reach for the geometric mean for compounding rates and the trimmed/weighted mean when you need robustness or unequal weights.
