The Exploratory Workflow
The rhythm of working with data — load, peek, chart, question, repeat
By now you've met all the chart-building pieces. This page is about the rhythm of using them — the loop a real analyst follows when sitting down with a new dataset. The loop is short, and once you internalize it, you'll feel much more confident in front of an unknown dataset.
The loop
The loop is iterative: you make a chart, read it, and ask the next question it raises. After 5-15 iterations, you usually have a story worth sharing.
Step 1: Load and peek
Whether the data comes from a CSV, a database, or a built-in dataset, your first 30 seconds should always look the same:
Four calls (df.head(), df.shape, df.dtypes, df.describe()) answer most "what is this dataset?" questions. Make them a habit. Many bugs later in the analysis trace back to not noticing, in the first 30 seconds, that one column had the wrong dtype or one value was suspicious.
Step 2: Chart every numeric column
Before asking any specific question, just histogram every numeric column. This is your "vibe check" — you'll spot skewed distributions, weird spikes, and outliers immediately.
You'll see four histograms. In about 15 seconds you know the distribution of every numeric column.
Step 3: Ask a real question
Now pick a real question. For the tips dataset, a natural one:
"Do larger parties tip a higher percentage?"
The chart almost writes itself once the question is precise:
Read the chart: is there a pattern? Maybe smaller parties tip a bigger percentage on average. Interesting. That observation is the next question: "Why? Is it because lunch parties are smaller than dinner parties, and lunch tips are different?"
Step 4: Investigate surprising points
When the chart shows something you didn't expect, investigate before believing it. Hover over the outlier. Filter to those rows and look at them.
You found three rows with very high tip percentages, all on tiny bills. That's a real pattern: on a small bill, a diner with a fixed minimum in mind ("I always tip at least $5") ends up leaving a large percentage. Good: you've turned an outlier into an insight.
Step 5: Iterate until the story is done
Real exploratory analysis rarely ends with one chart. You'll cycle through many iterations, and your DataFrame will grow with derived columns, filtered slices, and reshaped views. That's normal. Use a Jupyter notebook (or this page's CodeBlocks) and keep the iteration visible — your future self will thank you.
Exploratory vs presentation
The charts you make during exploration are not the charts you publish. Exploratory charts are quick, ugly, and iterative. Presentation charts are slow, polished, and intentional.
A common mistake: showing your exploratory work in a meeting as if it were finished. The audience needs the polished version — with titles, labels, annotations, and just the charts that matter.
| Exploratory | Presentation |
|---|---|
| Many small charts | One or two important charts |
| Default labels | Hand-tuned titles and annotations |
| One per question | Each one polished |
| For you | For them |
| Throwaway | Lasts |
A heuristic: "what would a stranger see?"
Every time you finish a chart, look at it as if you'd never seen the data before. Can you tell what it's saying? Is the title helpful? Are the axes labeled? Is the message obvious?
If not, you have one more iteration to do.
Check your understanding
1. What should you do first when you receive a new dataset?
   a. Make the most beautiful chart you can.
   b. Start a machine-learning model.
   c. Peek at it with df.head(), df.shape, df.dtypes, and df.describe() to understand what you've got.
   d. Filter to a small subset and stop there.

2. Why is histogramming every numeric column a useful early step?
   a. It's required by Plotly.
   b. It encrypts the data.
   c. It gives a "vibe check" on the data: you'll spot skewed distributions, weird spikes, and outliers in about 15 seconds.
   d. It computes correlations.

3. What's the key difference between exploratory and presentation charts?
   a. Exploratory charts use Python, presentation charts use Excel.
   b. Exploratory charts are interactive, presentation charts are static.
   c. Exploratory charts are quick, many, and for you; presentation charts are polished, few, and for the audience.
   d. Exploratory charts use color, presentation charts don't.