Python for Analysis

The slice of Python you actually need to be productive in Pandas — and the slices you can safely skip until later.

You do not need to be a "Python programmer" to use Pandas. You need a small, well-chosen subset of Python — enough to read the code you write, enough to recognize errors, and enough to glue together the Pandas calls that do the real work.

This chapter is that subset. If you already know Python, skim. If you do not, read carefully — every concept here will appear in later chapters.

Variables and assignment

A variable is a name that points to a value. The = sign assigns a value to a name.
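
As a minimal sketch of assignment, using made-up values (the names price, quantity, and product are illustrative only):

```python
# Hypothetical values, chosen only for illustration.
price = 19.99          # a float
quantity = 3           # an int
product = "notebook"   # a str

# A name can be used on the right-hand side of a later assignment.
total = price * quantity

# Reassignment points the same name at a new value.
# Values already computed from the old one (total) do not change.
quantity = 4
new_total = price * quantity

print(product, total, new_total)
```

Note that total keeps the value computed from the old quantity. Assignment does not create a live link between names.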

In Pandas you will mostly assign DataFrames to variables:

import pandas as pd

df = pd.read_csv("orders.csv")   # load the file into a DataFrame
clean = df.dropna()              # drop rows that contain missing values

The names are short on purpose — they appear a lot. Common conventions:

  • df — the current DataFrame.
  • df_a, df_b — when you have multiple.
  • s — a Series (one column).
  • g — a GroupBy object.
  • idx — an Index.

The basic data types

Pandas wraps these, but you will sometimes pass them around directly:

Type    Example       Used in Pandas for
int     42            Counts, IDs, integer columns
float   3.14          Continuous measurements, NaN-able
str     "hello"       Text columns, labels, column names
bool    True/False    Filters, boolean columns
None    None          Missing values (often becomes NaN)
list    [1, 2, 3]     Multiple column names, multi-select
dict    {"a": 1}      Column-to-value mappings, renames
tuple   (1, 2)        Shape, fixed-size groups

A few examples:
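
A short sketch with one value per type. The variable names are hypothetical; the values mirror the Example column of the table:

```python
# One hypothetical variable per built-in type from the table.
order_count = 42                      # int
unit_price = 3.14                     # float
label = "hello"                       # str
is_active = True                      # bool
discount = None                       # None marks "no value"
columns = [1, 2, 3]                   # list
renames = {"a": 1}                    # dict
shape = (1, 2)                        # tuple

# type() reports what kind of value a name points to.
for value in (order_count, unit_price, label, is_active,
              discount, columns, renames, shape):
    print(type(value).__name__, value)
```

Calling type() like this is a quick way to check what you are holding when an error message mentions an unexpected type.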

f-strings — formatted text

You will use f-strings constantly to assemble report-ready messages.
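
A minimal sketch of the three most common format specifiers, using made-up numbers (revenue, growth, and orders are hypothetical names):

```python
# Hypothetical report figures.
revenue = 1234567.891
growth = 0.125
orders = 48213

# :,.2f  -> thousands separators, two decimal places
print(f"Revenue: ${revenue:,.2f}")   # Revenue: $1,234,567.89

# :.1%   -> multiply by 100, one decimal place, add a percent sign
print(f"Growth: {growth:.1%}")       # Growth: 12.5%

# :,     -> thousands separators on an integer
print(f"Orders: {orders:,}")         # Orders: 48,213
```

Everything after the colon inside the braces is the format specifier; the part before it can be any Python expression.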

The format specifiers (:,.2f, :.1%, :,) are worth memorizing because they show up everywhere in analyst code.

Comparisons and booleans

Pandas filtering is built entirely on boolean comparisons.
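
As a sketch on plain Python values (age and city are hypothetical), each comparison produces a single True or False:

```python
# Hypothetical values for one person.
age = 34
city = "Lisbon"

print(age > 30)          # True
print(age == 30)         # False: == tests equality, = assigns
print(age != 30)         # True
print(18 <= age < 65)    # True: comparisons can be chained

# Combine conditions with and, or, not.
is_target = age > 30 and city == "Lisbon"
print(is_target)                 # True
print(not is_target)             # False
print(age < 18 or age >= 65)     # False
```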

Pandas uses vectorized versions of these on whole columns — df["age"] > 30 returns a column of booleans, one per row. We will see this constantly.

Lists and indexing

Lists hold an ordered sequence of items. You index them with integers, starting at 0.
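
A minimal sketch of indexing and slicing, assuming a hypothetical list called names:

```python
# A hypothetical list of four names.
names = ["Ana", "Ben", "Cara", "Dev"]

print(names[0])      # Ana: the first item is at index 0
print(names[-1])     # Dev: negative indexes count from the end
print(names[1:3])    # ['Ben', 'Cara']: index 1 and 2, stop is excluded
print(names[:2])     # ['Ana', 'Ben']: omitted start means "from the beginning"
print(len(names))    # 4

# Lists are mutable: append adds an item at the end.
names.append("Eli")
print(names[-1])     # Eli
```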

The slice syntax start:stop will reappear in Pandas's iloc selector, with the same exclusive-end behavior.

Dictionaries

A dictionary maps keys to values. You will use them most often as small lookup tables and for column renames.
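
A short sketch of both uses. The exchange rates and column names below are invented for illustration, not real data:

```python
# A small lookup table: currency code -> hypothetical rate to USD.
exchange = {"USD": 1.0, "EUR": 1.08, "GBP": 1.27}

print(exchange["EUR"])            # 1.08: look up a value by key
print(exchange.get("JPY"))        # None: .get() avoids a KeyError
print(exchange.get("JPY", 0.0))   # 0.0: .get() with a fallback value

# Add or overwrite an entry by assigning to a key.
exchange["CAD"] = 0.74

# Loop over key/value pairs.
for currency, rate in exchange.items():
    print(currency, rate)

# A rename mapping: old column name -> new column name (hypothetical).
renames = {"cust_id": "customer_id", "amt": "amount"}
```

Asking for a missing key with square brackets raises a KeyError, which is why .get() is handy when a key may be absent.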

In Pandas: df["currency"].map(exchange) looks up the rate for every row in one shot. We will see .map() later.

Functions

A function bundles up reusable logic. The classic shape:
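
One possible example of that shape. The function name years_to_retirement is hypothetical; the retire_at=65 default is taken from this section's discussion of default arguments:

```python
def years_to_retirement(age, retire_at=65):
    """Return the number of years left until retirement."""
    return retire_at - age


# Call with only the required argument; retire_at falls back to 65.
print(years_to_retirement(40))                      # 25

# Override the default by position or by name.
print(years_to_retirement(40, 67))                  # 27
print(years_to_retirement(age=58, retire_at=60))    # 2
```

The pieces are the def line with its parameters, an optional docstring, the body, and a return statement that hands a value back to the caller.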

Two Python features that matter for Pandas:

  • Default arguments (retire_at=65) make functions easier to call.
  • Docstrings (the triple-quoted line) document intent. Get in the habit early.

You will write functions that take a single value and then apply them to every value in a column:

def grade(score):
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

df["letter"] = df["score"].apply(grade)

We will see .apply() in detail later.

Lambdas — small inline functions

A lambda is a one-line anonymous function. Useful for short column transformations.
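
A minimal sketch in plain Python. The prices, the 8% markup, and the people list are all hypothetical:

```python
# Hypothetical prices before tax.
prices = [10.0, 25.0, 40.0]

# lambda p: ... is a function with one parameter and one expression.
# round(..., 2) keeps the results tidy for display.
with_tax = list(map(lambda p: round(p * 1.08, 2), prices))
print(with_tax)    # [10.8, 27.0, 43.2]

# Lambdas are also common as the key argument when sorting.
people = [("Ana", 34), ("Ben", 28), ("Cara", 41)]
by_age = sorted(people, key=lambda person: person[1])
print(by_age)      # [('Ben', 28), ('Ana', 34), ('Cara', 41)]
```

If the logic needs more than one expression, or you want to reuse it, write a regular def function instead.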

In Pandas: df["price"].apply(lambda p: p * 1.08) is a common pattern.

Importing libraries

Every analytical script starts with imports. The two essentials:
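
Assuming the two essentials are pandas and NumPy (the pd alias is already used earlier in this chapter, and np is the conventional alias for NumPy), the top of a script would look like this:

```python
import pandas as pd   # DataFrames and Series
import numpy as np    # arrays and numeric helpers

# Printing the versions is a quick check that both libraries are installed.
print(pd.__version__)
print(np.__version__)
```

The aliases are conventions, not requirements, but nearly all example code you will find uses them.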

You will eventually add matplotlib.pyplot as plt, plotly.express as px, and others as the chapters call for them.

What you do not need to know yet

It is a long list. Don't worry about:

  • Classes and self.
  • Decorators.
  • Generators and yield.
  • Context managers and with.
  • Type hints beyond very simple ones.
  • Async / await.

You will pick these up gradually if you need them. Pandas is forgiving — the surface area you need to be productive is small.

Pandas is mostly chained calls

A typical line of analyst code looks like:

df.groupby("department")["salary"].mean().round(0).sort_values()

Read the chain left to right: take df, group by department, pull out salary, average it, round, sort. Each step transforms the result of the previous step.
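
To make the chain concrete, here is a sketch on a tiny made-up DataFrame (the departments and salaries are invented), with the same chain also written one step per line:

```python
import pandas as pd

# Hypothetical data: five employees in three departments.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "Eng", "Eng", "Ops"],
    "salary": [50000, 60000, 90000, 100000, 45000],
})

# The chain from the text, on one line.
result = df.groupby("department")["salary"].mean().round(0).sort_values()
print(result)

# The same chain, one step per line, each step feeding the next.
grouped = df.groupby("department")    # group rows by department
salaries = grouped["salary"]          # pull out the salary column
averages = salaries.mean()            # one average per department
rounded = averages.round(0)           # round to whole numbers
step_by_step = rounded.sort_values()  # sort ascending by value
```

Splitting a chain into named steps like this is a useful debugging habit: you can print any intermediate result to see where things go wrong.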

Try a small workout

Challenge: Compute take-home pay

Write a function called take_home(salary, tax_rate) that returns the take-home pay (after tax). Then apply it to a list of three salaries with three different tax rates and store the results in a list called results.

For example, take_home(100000, 0.30) should return 70000.0.

Use it on:

  • salary 100000, tax rate 0.30
  • salary 80000, tax rate 0.22
  • salary 50000, tax rate 0.18

Store the three take-home values in a list called results in that order.

Check your understanding

Question (select one)

What does f"{rate:.1%}" print when rate = 0.075?

  • 0.075
  • 7.5
  • 7.5%
  • 75.0%

Question (select one)

Why are list slices like names[1:3] "exclusive at the end"?

  • It is a Python bug
  • By convention; names[1:3] returns items at index 1 and 2, so that len(names[1:3]) == 3 - 1 == 2: the slice length equals the difference of the indices
  • It is a NumPy convention only
  • Slices always exclude the first item

Question (select one)

Which of these Python features can you safely skip when starting with Pandas?

  • f-strings
  • Functions
  • Lists and dictionaries
  • Decorators and async/await
