Model Interpretation and Feature Importance

A model that works is not enough; you usually need to know why it works. This lesson shows how to ask a model which features drove its predictions, and flags the traps in every answer it gives.

Suppose you have built a model that predicts, with 96% accuracy, which breast tumors are malignant. Wonderful — but a doctor about to act on that prediction will immediately ask a question your accuracy score cannot answer: why? Which measurements made the model say "malignant"? Would it have said the same thing if one value were a little different? A number on a held-out set tells you that the model is good. It does not tell you how the model thinks, and for most real decisions, that second question matters just as much.

This page is about interpretation — opening the box and asking a trained model which features it leaned on, and how much. You will learn the three most common ways to do this in scikit-learn, and, just as important, the ways each one can quietly mislead you.

Why interpretability matters

A perfectly accurate model you cannot explain is, in many settings, nearly useless. Interpretability is not a nicety bolted on at the end; it is often the difference between a model that ships and one that sits in a notebook.

Trust. A clinician, a loan officer, or a judge will not — and often legally cannot — act on a prediction they cannot scrutinize. "The model said so" is not a reason anyone can stand behind.
Debugging. When a model behaves strangely, importance scores are your flashlight. If a customer-churn model leans almost entirely on a customer_id column, you have found a bug — that column should carry no real signal, and the model has latched onto an artifact.
Fairness. If a model's decisions hinge heavily on a feature that is a proxy for a protected attribute (a zip code standing in for race, say), you need to see that before it causes harm.
Stakeholder buy-in. "Late payments and high credit utilization drive this risk score" is a sentence a business can understand, challenge, and act on. A wall of weights is not.
Scientific insight. Sometimes the model is a means to an end: understanding which factors relate to an outcome is the actual goal, and the predictions are secondary.

Interpretation answers a different question than evaluation

Evaluation asks "is this model any good?" and is answered by metrics on held-out data. Interpretation asks "why does it predict what it predicts?" and is answered by importance scores and explanations. A model can score well and still be uninterpretable, or be perfectly interpretable and score poorly. You usually need both kinds of answer.

Coefficients as importance — for linear and logistic models

The most directly interpretable models are the linear ones. A LinearRegression or LogisticRegression assigns each feature a single number, its coefficient, and builds the prediction by multiplying each feature by its coefficient, adding the products up, and adding an intercept. (Logistic regression then passes that sum through a sigmoid to turn it into a probability.) A coefficient's sign tells you the direction of the relationship; its magnitude hints at how strongly that feature pushes the prediction.

This is the appeal of linear models: their parameters are an explanation, for free. Let us read the coefficients of a logistic regression on the breast-cancer data, which ships with named features.
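A minimal sketch of how such a reading might look. The pipeline layout, `max_iter=1000`, and the variable names (`model`, `coefs`, `order`) are illustrative choices, not necessarily the setup this lesson originally used.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target            # in this dataset, 0 = malignant and 1 = benign
feature_names = data.feature_names

# Standardize first so every coefficient means "per one standard deviation".
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
coefs = model.named_steps["logisticregression"].coef_[0]

# Rank by absolute size; the sign gives the direction (positive pushes toward class 1).
order = np.argsort(np.abs(coefs))[::-1]
for i in order[:5]:
    print(f"{feature_names[i]:<25} {coefs[i]:+.3f}")
```

Because class 1 is "benign" here, a negative coefficient pushes the prediction toward "malignant".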

Each coefficient is a small story: this measurement, all else equal, pushes the prediction one way or the other by that much. For a stakeholder, that is gold.

But there is a catch, and it is a big one.

Coefficients depend on feature SCALE

A coefficient's size depends on the units of its feature. Measure a feature in millimeters instead of meters and its values become a thousand times larger, so its coefficient becomes a thousand times smaller, yet nothing about the model's behavior has changed. You cannot compare raw coefficients across features on different scales. Either standardize the features first, so that a one-unit change means "one standard deviation" for every feature, or do not rank features by raw coefficient magnitude at all.
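The unit dependence is easy to check on synthetic data. The sketch below invents a length feature and a target (both hypothetical), fits an ordinary least-squares regression twice, and changes only the unit of the feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
length_m = rng.uniform(0.1, 2.0, size=200)              # hypothetical lengths, in meters
y = 5.0 * length_m + rng.normal(scale=0.1, size=200)    # target depends on length

length_mm = length_m * 1000.0                           # the same quantity, in millimeters

coef_m = LinearRegression().fit(length_m.reshape(-1, 1), y).coef_[0]
coef_mm = LinearRegression().fit(length_mm.reshape(-1, 1), y).coef_[0]

print(f"coefficient per meter:      {coef_m:.4f}")
print(f"coefficient per millimeter: {coef_mm:.6f}")
print(f"ratio: {coef_m / coef_mm:.1f}")
```

The two fits make the same predictions; only the number attached to the feature changes, by exactly the unit conversion factor.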

There is a second, subtler trap. When two features are correlated, linear models can split their shared influence between them almost arbitrarily — giving one a large coefficient and the other a tiny one, or even flipping a sign — without changing predictions at all. The model is indifferent to how it divides credit between redundant features, so reading a single coefficient as "this feature's importance" can badly mislead you.

Correlated features confuse coefficients

If two features carry overlapping information, a linear model may load all the weight onto one and starve the other, even reversing a coefficient's sign. The pair together matters, but neither coefficient alone tells you that. With correlated features, treat individual coefficients with suspicion. The breast-cancer data, for instance, has several near-duplicate "mean / worst / error" measurements of the same quantity.
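A small synthetic experiment can make this visible. In the sketch below (all data invented for illustration), `x2` is a near-copy of `x1`, and the same regression is refit on 20 fresh random samples. The individual coefficients swing from run to run while their sum stays close to the true effect of 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

sums, first_coefs = [], []
for seed in range(20):
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.01, size=200)       # near-duplicate of x1
    y = 3.0 * x1 + rng.normal(scale=0.5, size=200)   # the true combined effect is 3

    coef = LinearRegression().fit(np.column_stack([x1, x2]), y).coef_
    first_coefs.append(coef[0])
    sums.append(coef.sum())

print("spread of x1's coefficient across runs:", np.std(first_coefs))
print("spread of the coefficient sum:         ", np.std(sums))
```

The pair's joint contribution is well determined; how it is divided between the two columns is not.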

Tree and forest importances — `feature_importances_`

Tree-based models offer a different, built-in notion of importance. As a decision tree grows, every split on a feature reduces impurity (it makes the resulting groups purer in their labels). scikit-learn adds up how much each feature reduced impurity across all the splits that used it, weighting each split by the share of training samples that reach it, and reports the normalized totals as feature_importances_. A feature that was chosen for many high-value splits scores high; one never split on scores zero.

This works for a single tree and, more reliably, for a whole random forest (averaging over many trees smooths out the noise of any single one).
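As a sketch, reading impurity importances from a forest on the breast-cancer data might look like this. The hyperparameters (`n_estimators=200`, `random_state=0`) are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# One non-negative number per feature, averaged over the trees of the forest.
importances = forest.feature_importances_

order = np.argsort(importances)[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]:<25} {importances[i]:.3f}")
```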

Two conveniences jump out. Tree importances are always non-negative and sum to 1, so you can read them as "share of the model's decision-making attributed to this feature." And unlike coefficients, they are scale invariant — a tree splits on thresholds, so the units of a feature do not matter. That alone makes them friendlier than raw coefficients.

But impurity-based importance has well-known biases you must know about.

Impurity importance is biased toward high-cardinality features

Impurity-based feature_importances_ tends to inflate features with many distinct values (high cardinality) — continuous numbers, or an ID-like column — because such features offer more places to split and can reduce impurity on the training data almost by accident. The notorious symptom: add a column of pure random noise with many unique values, and a forest may assign it a non-trivial importance. Treat impurity importance as a useful first look, not the final word.
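The noise-column symptom can be reproduced with a few lines. This sketch appends one column of random numbers (an invented feature) to the breast-cancer data and reads its impurity importance; the exact value will vary with the seed and the library version.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rng = np.random.default_rng(0)
noise = rng.normal(size=(data.data.shape[0], 1))   # pure noise, every value distinct
X_noisy = np.hstack([data.data, noise])

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_noisy, data.target)

noise_importance = forest.feature_importances_[-1]
n_beaten = int((forest.feature_importances_[:-1] < noise_importance).sum())
print(f"impurity importance of the noise column: {noise_importance:.4f}")
print(f"real features ranked below the noise column: {n_beaten}")
```

The column carries no information about the target, so any importance it receives comes from splits that fit chance patterns in the training data.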

A second limitation: impurity importance is computed from how the tree was built on the training data, so it reflects training-set structure, not necessarily what helps on new data. For an importance measure tied to actual predictive performance, we need a different idea.

Permutation importance — model-agnostic and harder to fool

Permutation importance asks the most direct question imaginable: if I destroy the information in this feature, how much worse does the model get?

The procedure is beautifully simple. Take a trained model and a held-out set. Measure its score. Now randomly shuffle one feature's column — scrambling the link between that feature and the target while keeping its distribution identical — and measure the score again. If the score collapses, that feature was important; if the score barely moves, the model was not really relying on it. Repeat for every feature.

This has three big advantages. It is model-agnostic: it works on any fitted estimator, from a linear model to a forest to a pipeline, because it only needs the model's predictions and a way to score them. It measures importance against real predictive performance (the score you care about), not training-set impurity. And when you evaluate on held-out data, it asks what helps on new data, not what the model memorized.
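scikit-learn provides this procedure as `sklearn.inspection.permutation_importance`. The sketch below applies it to a forest on the breast-cancer data; the split sizes, forest settings, and seeds are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0, stratify=data.target
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each column of the held-out set 20 times and record the drop in score.
# With scoring left unset, the estimator's own .score (accuracy here) is used.
result = permutation_importance(forest, X_test, y_test, n_repeats=20, random_state=0)

order = result.importances_mean.argsort()[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]:<25} "
          f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```

`result.importances` holds the raw score drops, one row per feature and one column per repeat, if you want more than the mean and standard deviation.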

Because the shuffle is random, permutation importance is itself a random quantity. That is why the shuffle is repeated several times (scikit-learn's n_repeats argument, for example n_repeats=20) and summarized as a mean and a standard deviation. A feature whose importance is well above its own noise band is genuinely being used; one whose importance is a hair from zero (within its standard deviation) probably is not.

Permutation importance on TEST data, not training data

Run permutation importance on held-out data whenever you can. On the training set, an overfit model may look like it depends heavily on features it merely memorized. On held-out data, you measure what actually helps the model generalize — which is almost always the question you care about.

Permutation importance is not flawless either. With strongly correlated features it can understate importance: if two columns carry the same information, shuffling just one leaves the other intact, so the model barely suffers and both look unimportant — even though the information they share is crucial. No single importance method is immune to correlated features; this is a recurring theme, not a quirk of one technique.
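To see the effect in isolation, the sketch below uses invented data in which the target depends on a single column, `signal`. It measures that column's held-out permutation importance twice: once on its own, and once with an exact duplicate sitting beside it. A logistic regression is used here only because its behavior with duplicated columns is easy to reason about; the helper `importance_of_signal` is hypothetical.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
noise = rng.normal(size=1000)
y = (signal > 0).astype(int)               # the target depends only on `signal`

def importance_of_signal(X):
    """Held-out permutation importance of column 0 for a model fit on X."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    return result.importances_mean[0]

alone = importance_of_signal(np.column_stack([signal, noise]))
with_copy = importance_of_signal(np.column_stack([signal, signal, noise]))

print(f"importance of signal on its own:       {alone:.3f}")
print(f"importance of signal next to its copy: {with_copy:.3f}")
```

The information in `signal` is just as essential in both runs, but with a duplicate present the model can fall back on the untouched copy, so the measured drop is smaller.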

Three lenses, three biases

You now have three ways to ask "which features matter," and each sees the model differently:

Coefficients (linear models): a direction and a magnitude per feature, but only comparable after scaling, and shaky under correlation.
Impurity importance (feature_importances_): free from trees, scale invariant, but biased toward high-cardinality features and tied to the training set.
Permutation importance: model-agnostic and tied to real held-out performance, but can understate correlated features and costs extra compute.

When they agree, you can put more trust in the ranking. When they disagree, that disagreement is itself a clue worth investigating.

A picture is worth a hundred printed numbers

Importance scores are far easier to read as a bar chart than as a column of numbers. Here we fit a forest and draw its top impurity importances with matplotlib — the kind of plot you will make constantly.
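A sketch of such a plot, assuming the wine dataset and a forest with illustrative settings. The number of bars shown (8) and the output filename are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

importances = forest.feature_importances_
top = np.argsort(importances)[::-1][:8]             # indices of the 8 largest importances
top_names = [data.feature_names[i] for i in top]

fig, ax = plt.subplots(figsize=(7, 4))
# barh draws the first entry at the bottom, so reverse to put the largest bar on top.
ax.barh(top_names[::-1], importances[top][::-1])
ax.set_xlabel("Impurity importance")
ax.set_title("Random forest on the wine dataset")
fig.tight_layout()
fig.savefig("wine_importances.png")                 # arbitrary filename; use plt.show() interactively
```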

A horizontal bar chart, sorted, with the most important feature on top, is the standard way to present feature importance to a human. It turns a model into a one-glance story: these few measurements are doing most of the work.

Global vs. local explanations

Everything so far has been a global explanation: a single ranking that summarizes the model's behavior across the whole dataset. "Flavanoids and color intensity drive this wine classifier" is a global statement.

There is a second flavor: local explanations, which explain one specific prediction. "For this particular patient, the high worst-radius value is what pushed the model toward 'malignant'" is a local statement. Global tells you how the model behaves on average; local tells you why it made the call it made for a single case — which is often exactly what a person affected by the decision wants to know.

Global and local answer different questions

A global explanation summarizes the model overall (one ranking of features). A local explanation justifies a single prediction (why this row got this output). Specialized tools such as SHAP and LIME focus on local explanations and go beyond this course, but it is worth knowing the distinction: "important in general" and "decisive for this case" are not the same thing.
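For a linear model, a simple local explanation needs no extra tooling: multiply each coefficient by that row's (standardized) feature value to get the feature's contribution to the score for that one case. The sketch below is only this linear decomposition, not SHAP or LIME, and the choice of row 0 is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
clf = LogisticRegression(max_iter=1000).fit(X_scaled, data.target)

row = X_scaled[0]                        # one specific tumor
contributions = clf.coef_[0] * row       # per-feature push on the log-odds, for this row only

# Contributions plus the intercept reproduce the model's raw score for this row.
score = contributions.sum() + clf.intercept_[0]

order = np.argsort(np.abs(contributions))[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]:<25} {contributions[i]:+.3f}")
```

A feature with a large global coefficient can still contribute almost nothing for a row whose value sits near the average, which is the gap between "important in general" and "decisive for this case".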

The misconception that matters most: importance is not causation

Here is the single most important sentence on this page. A feature being important to a model does not mean it causes the outcome. Importance is a statement about the model and the data it was trained on, not about the world.

A model can lean heavily on a feature for reasons that have nothing to do with cause and effect:

The feature may be a proxy for the true cause. Ice-cream sales might be a top predictor of drownings — not because ice cream causes drowning, but because hot weather drives both. A model happily uses the proxy.
The feature may leak the answer. If "number of reminder letters sent" is a top predictor of loan default, it may be that the bank sends letters because it already suspects default — the feature is a consequence, not a cause.
Causation may run the other way, or the feature and the outcome may share a hidden common cause, as with the ice cream above.

This is the same lesson as the old statistics adage "correlation is not causation," wearing machine-learning clothes. A high importance score means "the model found this feature useful for prediction given the data it saw." Whether intervening on that feature in the real world would change the outcome is a causal question that importance scores cannot answer; it takes an experiment, or careful causal reasoning, to know.

Do not read importance as a to-do list

The deadliest misuse of feature importance is treating it as advice for action: "color intensity is the most important feature, so let us change color intensity to change the outcome." Importance describes prediction, not intervention. A feature can dominate a model and yet be a useless lever in reality, because changing it would not change the cause it was merely standing in for. To learn what to do, you need causal evidence, not an importance ranking.

When NOT to over-trust importance

When features are strongly correlated. As we saw, every method distorts under correlation — splitting credit, hiding it, or shuffling it ineffectively. Inspect your correlations before reading too much into any single feature's rank.
When you have not scaled, for coefficients. Ranking raw coefficients across features on different scales compares millimeters to kilograms. It is meaningless without standardization.
When the model itself is poor. A model that does not generalize gives importance scores that explain its mistakes. Establish that the model works on held-out data first; only then is "why" a question worth asking.
When you need causal answers. Importance is the wrong tool for "what should we change?" Reach for an experiment instead.

Real-world applications

Interpretation runs alongside prediction across every serious domain. A credit team must, often by law, tell a rejected applicant which factors drove the decision — straight from model coefficients or importances. A hospital validating a diagnostic model checks that it leans on clinically sensible measurements, not on an artifact like which scanner produced the image. A churn team uses importance to brief the business on why customers leave, turning a model into a strategy. In each case the model's accuracy opened the door, but its interpretability is what let people walk through it.

Your turn

You will train a random forest on the wine dataset and pull out its most important features.

The data is loaded for you: X, y, and feature_names.
Fit a RandomForestClassifier(n_estimators=200, random_state=0) and store it in forest.
Read its feature_importances_ into a variable called importances.
Find the index of the single most important feature and store the feature's name (a string from feature_names) in top_feature. Hint: importances.argmax() gives the index.
Build a list top3 of the names of the three most important features, ordered from most to least important.

The hidden tests check that the forest is fitted, that importances has one value per feature and sums to about 1, that top_feature is the correct name, and that top3 has the right three names in order.

Check your understanding

Question (select one)

Why must you standardize features before comparing the raw coefficients of a linear or logistic regression as "importances"?

Because unstandardized features crash LogisticRegression

Because a coefficient's magnitude depends on its feature's units, so unscaled coefficients compare quantities measured on different scales

Because standardizing makes all coefficients exactly equal

Because coefficients are otherwise always negative

Question (select one)

A colleague adds a column of pure random noise (with many unique values) to the training data and is alarmed that the random forest's impurity-based feature_importances_ gives it a non-trivial score. What is going on?

The forest is broken and should be reinstalled

The noise column genuinely causes the target

Impurity-based importance is biased toward high-cardinality features, which offer many split points and can reduce training impurity by chance

Random forests cannot handle continuous features

Question (select one)

What does permutation importance actually measure for a given feature?

The coefficient the model assigns to that feature

How many times the feature appears in the training data

How much the model's score drops when that feature's values are randomly shuffled, breaking its link to the target

The correlation between that feature and every other feature

Question (select one)

Why is it better to compute permutation importance on a held-out set rather than on the training data?

Held-out data is always larger than training data

It is the only set permutation importance can run on

On the training set an overfit model can look dependent on features it merely memorized, while held-out data measures what actually helps it generalize

Training data has no labels to shuffle

Question (select one)

A churn model ranks "number of support tickets" as its most important feature. A manager concludes: "Let us reduce support tickets and customers will stop churning." Why is this reasoning flawed?

Because permutation importance is more accurate than impurity importance

Because the feature should have been scaled first

Because high importance means the feature predicts churn, not that it causes churn — tickets may be a symptom of an underlying problem, and suppressing them would not fix the cause

Because random forests cannot be used for churn prediction
