Python for Analysis
The slice of Python you actually need to be productive in Pandas — and the slices you can safely skip until later.
You do not need to be a "Python programmer" to use Pandas. You need a small, well-chosen subset of Python — enough to read the code you write, enough to recognize errors, and enough to glue together the Pandas calls that do the real work.
This chapter is that subset. If you already know Python, skim. If you do not, read carefully — every concept here will appear in later chapters.
Variables and assignment
A variable is a name that points to a value. The = sign
assigns a value to a name.
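A minimal sketch of assignment in plain Python. The names and numbers here are made up purely for illustration:

```python
# The name on the left points to the value on the right.
price = 19.99
quantity = 3

# A name can be used anywhere its value could be used.
total = price * quantity

# Re-assigning makes the name point to a new value.
quantity = quantity + 1
```

Note that total keeps the value it was given when it was computed; changing quantity afterwards does not update it.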
In Pandas you will mostly assign DataFrames to variables:
df = pd.read_csv("orders.csv")
clean = df.dropna()
The names are short on purpose: they appear a lot. Common conventions:
- df: the current DataFrame.
- df_a, df_b: when you have multiple.
- s: a Series (one column).
- g: a GroupBy object.
- idx: an Index.
The basic data types
Pandas wraps these, but you will sometimes pass them around directly:
| Type | Example | Used in Pandas for |
|---|---|---|
| int | 42 | Counts, IDs, integer columns |
| float | 3.14 | Continuous measurements, NaN-able |
| str | "hello" | Text columns, labels, column names |
| bool | True/False | Filters, boolean columns |
| None | None | Missing values (often becomes NaN) |
| list | [1, 2, 3] | Multiple column names, multi-select |
| dict | {"a": 1} | Column-to-value mappings, renames |
| tuple | (1, 2) | Shape, fixed-size groups |
A few examples:
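One possible set of examples, one per row of the table. The variable names and values are illustrative assumptions, not taken from a real dataset:

```python
count = 42                      # int
ratio = 3.14                    # float
greeting = "hello"              # str
is_active = True                # bool
missing = None                  # None
columns = ["name", "age"]       # list
renames = {"nm": "name"}        # dict
shape = (100, 3)                # tuple

# type() reports what kind of value a name points to.
print(type(count).__name__)     # int
print(type(shape).__name__)     # tuple
```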
f-strings — formatted text
You will use f-strings constantly to assemble report-ready messages.
The format specifiers (:,.2f, :.1%, :,) are worth
memorizing because they show up everywhere in analyst code.
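A short sketch of those three specifiers in use. The report values below are invented for the example:

```python
name = "Q3 revenue"
revenue = 1234567.891
growth = 0.075
orders = 48210

# :,.2f  -> thousands separator, two decimal places
line1 = f"{name}: ${revenue:,.2f}"   # "Q3 revenue: $1,234,567.89"

# :.1%   -> multiply by 100, one decimal place, add a percent sign
line2 = f"Growth: {growth:.1%}"      # "Growth: 7.5%"

# :,     -> thousands separator only
line3 = f"Orders: {orders:,}"        # "Orders: 48,210"

print(line1)
print(line2)
print(line3)
```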
Comparisons and booleans
Pandas filtering is built entirely on boolean comparisons.
Pandas uses vectorized versions of these on whole columns —
df["age"] > 30 returns a column of booleans, one per row.
We will see this constantly.
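For a single value, the comparison operators look roughly like this (the age value is an arbitrary example):

```python
age = 34

is_adult = age >= 18        # True
is_senior = age > 65        # False
is_exact = age == 34        # True: == compares, = assigns
not_thirty = age != 30      # True

# Combine single-value conditions with and / or / not.
working_age = age >= 18 and age < 65   # True
```

On whole columns, Pandas uses the operators &, | and ~ in place of and, or and not, with each comparison wrapped in parentheses.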
Lists and indexing
Lists hold an ordered sequence of items. You index them with integers, starting at 0.
The slice syntax start:stop will reappear in Pandas's iloc
selector, with the same exclusive-end behavior.
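A small sketch of indexing and slicing, using a made-up list of names:

```python
names = ["Ana", "Ben", "Cho", "Dev"]

first = names[0]         # "Ana": indexing starts at 0
last = names[-1]         # "Dev": negative indexes count from the end
middle = names[1:3]      # ["Ben", "Cho"]: index 1 and 2, the stop index is excluded
first_two = names[:2]    # ["Ana", "Ben"]: an omitted start means "from the beginning"
count = len(names)       # 4
```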
Dictionaries
A dictionary maps keys to values. You will use them most often as small lookup tables and for column renames.
In Pandas: df["currency"].map(exchange) looks up the rate for
every row in one shot. We will see .map() later.
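A sketch of both uses in plain Python. The exchange rates and column names are invented for illustration and are not real data:

```python
# A small lookup table: currency code -> rate (illustrative numbers only).
exchange = {"USD": 1.0, "EUR": 1.08, "GBP": 1.27}

eur_rate = exchange["EUR"]        # look up a value by its key
jpy_rate = exchange.get("JPY")    # .get returns None for a missing key instead of raising KeyError
exchange["CAD"] = 0.74            # add (or overwrite) an entry

# A rename mapping: old column name -> new column name (hypothetical names).
renames = {"cust_id": "customer_id", "amt": "amount"}
```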
Functions
A function bundles up reusable logic. The classic shape:
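A minimal sketch of that shape. The function name years_to_retirement is a hypothetical choice for this example; only the retire_at=65 default comes from the text:

```python
def years_to_retirement(age, retire_at=65):
    """Return the number of years left until retirement (never negative)."""
    return max(retire_at - age, 0)


default_case = years_to_retirement(40)                 # uses retire_at=65 -> 25
custom_case = years_to_retirement(40, retire_at=70)    # overrides the default -> 30
already_retired = years_to_retirement(70)              # clamped at 0
```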
Two Python features that matter for Pandas:
- Default arguments (retire_at=65) make functions easier to call.
- Docstrings (the triple-quoted line) document intent. Get in the habit early.
You will write functions that operate on whole columns:
def grade(score):
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

df["letter"] = df["score"].apply(grade)
We will see .apply() in detail later.
Lambdas — small inline functions
A lambda is a one-line anonymous function. Useful for short
column transformations.
In Pandas: df["price"].apply(lambda p: p * 1.08) is a common
pattern.
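Outside Pandas, the same idea can be sketched with the built-ins map and sorted. The prices and words are example values, and the 1.08 multiplier simply mirrors the one in the text:

```python
prices = [10.0, 25.0, 4.5]

# The lambda is passed inline, without ever being given a name.
with_tax = list(map(lambda p: round(p * 1.08, 2), prices))

# The equivalent named function, for comparison.
def add_tax(p):
    return round(p * 1.08, 2)

with_tax_named = [add_tax(p) for p in prices]

# Lambdas are also common as sort keys.
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))   # ["fig", "pear", "banana"]
```

If the logic needs more than one expression, a named function with def is usually the clearer choice.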
Importing libraries
Every analytical script starts with imports. The two essentials:
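The alias pd is the one used earlier in this chapter. Treating NumPy as the second essential is an assumption here, based on common practice rather than on anything stated in the text:

```python
import pandas as pd   # tabular data: DataFrame and Series
import numpy as np    # assumed second essential: numeric arrays and NaN

# A tiny check that both imports work together.
s = pd.Series([1.0, np.nan, 3.0])
n_missing = int(s.isna().sum())   # counts the NaN entries
```

The short aliases are a convention, not a requirement, but nearly all example code you will read uses them.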
You will eventually add matplotlib.pyplot as plt,
plotly.express as px, and others as the chapters call for
them.
What you do not need to know yet
It is a long list. Don't worry about:
- Classes and self.
- Decorators.
- Generators and yield.
- Context managers and with.
- Type hints beyond very simple ones.
- Async / await.
You will pick these up gradually if you need them. Pandas is forgiving — the surface area you need to be productive is small.
Pandas is mostly chained calls
A typical line of analyst code looks like:
df.groupby("department")["salary"].mean().round(0).sort_values()
Reading left to right through the chain (take df, group by department, pull out salary, average it, round, sort) is exactly how you should read it. Each step transforms the result of the previous step.
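To make the steps concrete, here is a sketch that runs the chain on a tiny made-up DataFrame and then repeats it one step at a time. The departments and salaries are invented for the example:

```python
import pandas as pd

# Hypothetical data, just large enough to show grouping.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "Eng", "Eng", "Ops", "Ops", "Ops"],
    "salary": [50000, 60500, 90000, 100000, 45000, 45001, 45003],
})

# The chain as a single line.
result = df.groupby("department")["salary"].mean().round(0).sort_values()

# The same chain, one step per line, in reading order.
grouped = df.groupby("department")      # group by department
salaries = grouped["salary"]            # pull out salary
averages = salaries.mean()              # average it
rounded = averages.round(0)             # round
step_by_step = rounded.sort_values()    # sort ascending

print(result)
```

Both versions produce the same Series. The intermediate names are only there to show what each link in the chain hands to the next.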
Try a small workout
Write a function called take_home(salary, tax_rate) that returns the take-home pay (after tax). Then apply it to a list of three salaries with three different tax rates and store the results in a list called results.
For example, take_home(100000, 0.30) should return 70000.0.
Use it on:
- salary 100000, tax rate 0.30
- salary 80000, tax rate 0.22
- salary 50000, tax rate 0.18
Store the three take-home values in a list called results in that order.
Check your understanding
What does f"{rate:.1%}" print when rate = 0.075?
- 0.075
- 7.5
- 7.5%
- 75.0%
Why are list slices like names[1:3] "exclusive at the end"?
- It is a Python bug
- By convention; names[1:3] returns the items at index 1 and 2, so len(names[1:3]) == 3 - 1 == 2: the slice length equals the difference of the indices
- It is a NumPy convention only
- Slices always exclude the first item
Which of these Python features can you safely skip when starting with Pandas?
- f-strings
- Functions
- Lists and dictionaries
- Decorators and async/await