Python for Analysis

You do not need to be a "Python programmer" to use Pandas. You need a small, well-chosen subset of Python — enough to read the code you write, enough to recognize errors, and enough to glue together the Pandas calls that do the real work.

This chapter is that subset. If you already know Python, skim. If you do not, read carefully — every concept here will appear in later chapters.

Variables and assignment

A variable is a name that points to a value. The = sign assigns a value to a name.

In Pandas you will mostly assign DataFrames to variables:

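import pandas as pd

df = pd.read_csv("orders.csv")  # load the CSV file into a DataFrame
clean = df.dropna()             # new DataFrame without rows that contain missing values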

The names are short on purpose — they appear a lot. Common conventions:

df — the current DataFrame.
df_a, df_b — when you have multiple.
s — a Series (one column).
g — a GroupBy object.
idx — an Index.

The basic data types

Pandas wraps these, but you will sometimes pass them around directly:

Type	Example	Used in Pandas for
`int`	`42`	Counts, IDs, integer columns
`float`	`3.14`	Continuous measurements; can hold NaN
`str`	`"hello"`	Text columns, labels, column names
`bool`	`True`/`False`	Filters, boolean columns
`None`	`None`	Missing values (often becomes NaN)
`list`	`[1, 2, 3]`	Multiple column names, multi-select
`dict`	`{"a": 1}`	Column-to-value mappings, renames
`tuple`	`(1, 2)`	Shape, fixed-size groups

A few examples:
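The examples that belonged here appear to be missing from the text, so the sketch below stands in for them. It creates one value of each type from the table; every variable name is illustrative and not taken from the original.

```python
# One value per type from the table above; all names are illustrative.
order_count = 42                    # int
unit_price = 3.14                   # float
greeting = "hello"                  # str
is_active = True                    # bool
discount = None                     # None marks a missing value
columns = ["name", "age", "city"]   # list
mapping = {"a": 1}                  # dict
shape = (1, 2)                      # tuple

values = (order_count, unit_price, greeting, is_active,
          discount, columns, mapping, shape)

# type(...).__name__ gives the type's name as a string.
for value in values:
    print(type(value).__name__, value)
```

Checking a value's type this way is a quick first step whenever a later Pandas call complains about what it was given.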

f-strings — formatted text

You will use f-strings constantly to assemble report-ready messages.
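A minimal sketch of what such messages can look like. The variable names and numbers are made up for illustration.

```python
# Made-up figures for illustration.
revenue = 1234567.891
growth = 0.256
orders = 48210

# An f-string starts with f and evaluates whatever sits inside {braces}.
print(f"Revenue: ${revenue:,.2f}")   # Revenue: $1,234,567.89
print(f"Growth: {growth:.1%}")       # Growth: 25.6%
print(f"Orders: {orders:,}")         # Orders: 48,210
```

The part after the colon controls only how the value is displayed; the variable itself is unchanged.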

The format specifiers (:,.2f, :.1%, :,) are worth memorizing because they show up everywhere in analyst code.

Comparisons and booleans

Pandas filtering is built entirely on boolean comparisons.
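The comparison operators on single values can be sketched as follows; the variable names are illustrative.

```python
age = 42

print(age > 30)         # True
print(age == 42)        # True   (== compares, = assigns)
print(age != 42)        # False
print(18 <= age < 65)   # True   (comparisons can be chained)

# Booleans combine with and, or, not.
is_adult = age >= 18
is_senior = age >= 65
print(is_adult and not is_senior)   # True
```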

Pandas uses vectorized versions of these on whole columns — df["age"] > 30 returns a column of booleans, one per row. We will see this constantly.

Lists and indexing

Lists hold an ordered sequence of items. You index them with integers, starting at 0.
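A short sketch of indexing and slicing, using a made-up list called names:

```python
names = ["Ana", "Ben", "Cara", "Dev"]   # illustrative values

print(names[0])      # Ana   (first item)
print(names[-1])     # Dev   (negative indexes count from the end)
print(names[1:3])    # ['Ben', 'Cara']   (index 1 and 2; 3 is excluded)
print(names[:2])     # ['Ana', 'Ben']    (an omitted start means 0)
print(len(names))    # 4
```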

The slice syntax start:stop will reappear in Pandas's iloc selector, with the same exclusive-end behavior.

Dictionaries

A dictionary maps keys to values. You will use them most often as small lookup tables and for column renames.
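Both uses can be sketched with plain Python. The exchange rates and column names below are invented for illustration and are not real data.

```python
# Invented exchange rates to USD, for illustration only.
exchange = {"USD": 1.00, "EUR": 1.08, "GBP": 1.27}

print(exchange["EUR"])        # 1.08
print(exchange.get("JPY"))    # None (.get() avoids a KeyError for a missing key)

exchange["CAD"] = 0.73        # add a new entry, or overwrite an existing one
print(len(exchange))          # 4

# A rename mapping: old column name -> new column name (hypothetical names).
renames = {"cust_id": "customer_id", "amt": "amount"}
for old, new in renames.items():
    print(old, "->", new)
```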

In Pandas: df["currency"].map(exchange) looks up the rate for every row in one shot. We will see .map() later.

Functions

A function bundles up reusable logic. The classic shape:
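The original example is not shown in the text; the sketch below assumes a function along these lines. The name years_to_retirement is hypothetical, while the retire_at=65 default matches the one referred to in this section.

```python
def years_to_retirement(age, retire_at=65):
    """Return the number of years left until retirement."""
    return retire_at - age


print(years_to_retirement(40))                 # 25  (uses the default of 65)
print(years_to_retirement(40, retire_at=67))   # 27  (overrides the default)
```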

Two Python features that matter for Pandas:

Default arguments (retire_at=65) make functions easier to call.
Docstrings (the triple-quoted line) document intent. Get in the habit early.

You will write functions that take a single value and then apply them to every value in a column:

def grade(score):
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

df["letter"] = df["score"].apply(grade)

We will see .apply() in detail later.

Lambdas — small inline functions

A lambda is a one-line anonymous function. Useful for short column transformations.
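A small sketch in plain Python, with made-up prices and orders, showing a lambda passed to another function:

```python
prices = [10.0, 25.0, 100.0]   # made-up prices

# Apply an 8% markup to every price; round() tidies floating-point noise.
with_tax = list(map(lambda p: round(p * 1.08, 2), prices))
print(with_tax)   # [10.8, 27.0, 108.0]

# A lambda is also handy as a sort key: here, sort by the second element.
orders = [("A", 30), ("B", 10), ("C", 20)]
by_amount = sorted(orders, key=lambda o: o[1])
print(by_amount)  # [('B', 10), ('C', 20), ('A', 30)]
```

If the logic needs more than one expression, a named function defined with def is usually the clearer choice.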

In Pandas: df["price"].apply(lambda p: p * 1.08) is a common pattern.

Importing libraries

Every analytical script starts with imports. The two essentials:
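The import lines themselves are not shown in the text. Assuming the two essentials are pandas and NumPy under their conventional aliases pd and np, they would look like the commented lines below. The runnable part uses standard-library modules to show the same import ... as ... pattern without installing anything.

```python
# Conventional analysis imports (assumed; both libraries must be installed):
# import pandas as pd
# import numpy as np

# The same pattern with modules that ship with Python:
import math
import statistics as st   # "as" gives the module a shorter alias

print(math.sqrt(16))          # 4.0
print(st.mean([10, 20, 30]))  # 20
```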

You will eventually add matplotlib.pyplot as plt, plotly.express as px, and others as the chapters call for them.

What you do not need to know yet

It is a long list. Don't worry about:

Classes and self.
Decorators.
Generators and yield.
Context managers and with.
Type hints beyond very simple ones.
Async / await.

You will pick these up gradually if you need them. Pandas is forgiving — the surface area you need to be productive is small.

Pandas is mostly chained calls

A typical line of analyst code looks like:

df.groupby("department")["salary"].mean().round(0).sort_values()

Read the chain from left to right: take df, group by department, pull out salary, average it, round, sort. Each step transforms the result of the previous step.
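Chaining is not specific to Pandas. As a sketch, the same idea with plain string methods and a made-up label shows that a chain is shorthand for naming each intermediate result in the order the calls are written:

```python
raw = "  Sales Department  "   # made-up input

# One chained expression...
label = raw.strip().lower().replace(" ", "_")

# ...does the same work as naming every intermediate result.
step1 = raw.strip()               # "Sales Department"
step2 = step1.lower()             # "sales department"
step3 = step2.replace(" ", "_")   # "sales_department"

print(label)            # sales_department
print(label == step3)   # True
```

When a chain misbehaves, splitting it into named steps like this is a simple way to find the call that goes wrong.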

Try a small workout

Write a function called take_home(salary, tax_rate) that returns the take-home pay (after tax). Then apply it to a list of three salaries with three different tax rates and store the results in a list called results.

For example, take_home(100000, 0.30) should return 70000.0.

Use it on:

salary 100000, tax rate 0.30
salary 80000, tax rate 0.22
salary 50000, tax rate 0.18

Store the three take-home values in a list called results in that order.

Check your understanding

Question (select one)

What does f"{rate:.1%}" print when rate = 0.075?

0.075

7.5

7.5%

75.0%

Question (select one)

Why are list slices like names[1:3] "exclusive at the end"?

It is a Python bug

By convention; names[1:3] returns items at index 1 and 2, so that len(names[1:3]) == 3 - 1 == 2 — the slice length equals the difference of the indices

It is a NumPy convention only

Slices always exclude the first item

Question (select one)

Which of these Python features can you safely skip when starting with Pandas?

f-strings

Functions

Lists and dictionaries

Decorators and async/await
