Spreadsheets vs Code

A side-by-side look at how the same analysis feels in Excel versus Pandas — and why analysts increasingly choose code even when the data is small.

You have already heard the high-level argument: spreadsheets are manual, code is reproducible. This chapter makes the comparison concrete with side-by-side scenarios so you can feel, in your bones, where each tool wins and where each one loses.

Two tools, one job

Pretend you are an analyst at a coffee company. Every Monday morning a CSV file lands in your inbox with last week's orders. You need to produce:

  • Total revenue.
  • Top-selling product.
  • Revenue by region.
  • A chart for the weekly team meeting.

Let us do this both ways and compare the feel.

The spreadsheet way

  1. Open Excel. Double-click the CSV.
  2. Excel asks how to import. Pick "general." Click through.
  3. Click the column headers to confirm types. Notice that the phone column got auto-converted to scientific notation (5.55E+11). Sigh, fix.
  4. Select the revenue column. Read the sum off the status bar. Copy it into a separate "Summary" sheet.
  5. Sort the table by product, eyeball the top one. Type it into the summary sheet.
  6. Sort by region. Use =SUMIF(B:B, "West", D:D) for each region. Copy values into summary sheet.
  7. Highlight summary cells, Insert > Chart, pick bar.
  8. Save as weekly_report_2024_03_18.xlsx. Email it.
  9. Next Monday: do it all again.

The code way

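import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("orders.csv")

# Compute each grouped total once and reuse it.
revenue_by_product = df.groupby("product")["revenue"].sum()
revenue_by_region = df.groupby("region")["revenue"].sum()

print("Total revenue:", df["revenue"].sum())
print("Top product:", revenue_by_product.idxmax())
print(revenue_by_region)

# Save the chart to a file so a plain script run still produces it (filename is illustrative).
revenue_by_region.plot.bar()
plt.savefig("revenue_by_region.png")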

Save the file as weekly_report.py. Next Monday: re-run.

What the comparison actually proves

The "code" version takes more upfront learning. There is no pretending otherwise. But once the learning is paid for, every repeated run is essentially free — and the script is itself a record of how the analysis was done.

That is the central trade-off:

Criterion | Spreadsheet | Code
First-time cost | Low | Higher
Cost of next run | Same as first run | Near zero
Self-documenting | No (only final state) | Yes (the source)
Audit trail | Weak | Strong (with git)
Handles 5 million rows | Painfully or not at all | Yes
Easy to share visually | Yes | Requires hosting
Easy for non-coders to edit | Yes | No

There is no winner in the abstract — only winners for a given problem.

The 'recurring' test

A useful heuristic: if you will do this analysis more than twice, write it in code. If it is genuinely a one-off (a meeting prep, a quick sanity check), reach for the spreadsheet. You will not regret using code for recurring work, and you will not regret using a spreadsheet for true one-offs.

Five common spreadsheet pains, in Pandas

1. Accidental edits

In Excel, every interaction modifies the file. There is no "diff" of what you changed; no git log of why. In Pandas, your script is the file. Diffs are line-by-line. Reviewers can ask "why did you switch from mean to median on line 42?"
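
To make the line-by-line diff concrete, here is a small sketch using Python's standard difflib module. The two script versions are invented for illustration; in practice a tool such as git diff would produce this kind of output for you.

```python
import difflib

# Two hypothetical versions of the same analysis script (invented for illustration).
old_script = """\
import pandas as pd
df = pd.read_csv("orders.csv")
typical_order = df["revenue"].mean()
print(typical_order)
"""

new_script = """\
import pandas as pd
df = pd.read_csv("orders.csv")
typical_order = df["revenue"].median()
print(typical_order)
"""

# unified_diff yields removed lines prefixed with "-" and added lines with "+".
diff_lines = list(
    difflib.unified_diff(
        old_script.splitlines(),
        new_script.splitlines(),
        fromfile="report.py (before)",
        tofile="report.py (after)",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

The only changed line is the one that swaps mean for median, which is exactly the kind of change a reviewer can question.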

2. Copy-paste from another sheet

In Excel, this is invisible after the fact. Was that column copied from Q4_sales.xlsx or Q4_sales_FINAL_v2.xlsx? In Pandas, the source filename or URL is right there in your pd.read_csv(...) call.
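
One way to keep that provenance attached to the data itself is to tag each row with the file it came from. The sketch below is only an illustration: the file names and columns are hypothetical, and io.StringIO stands in for real files so the snippet is self-contained.

```python
import io
import pandas as pd

# In-memory stand-ins for real files; names and contents are hypothetical.
sources = {
    "Q4_sales.csv": io.StringIO("region,revenue\nWest,100\nEast,80\n"),
    "Q4_sales_FINAL_v2.csv": io.StringIO("region,revenue\nWest,120\nEast,90\n"),
}

frames = []
for name, handle in sources.items():
    frame = pd.read_csv(handle)
    frame["source_file"] = name  # record where each row came from
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)
print(combined)
```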

3. The 65k-row wall and slowness

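The legacy .xls format capped a sheet at 65,536 rows. The newer .xlsx format raised the limit to 1,048,576 rows, but workbooks anywhere near that size become slow. Pandas can handle tens of millions of rows on a laptop with enough memory, and millions interactively.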
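
As a rough illustration of scale, the sketch below builds two million synthetic rows, more than a single .xlsx sheet can hold, and summarizes them by region. The data is randomly generated and the column names are assumptions; how fast it runs depends on your machine.

```python
import numpy as np
import pandas as pd

n_rows = 2_000_000  # above the 1,048,576-row limit of a single .xlsx sheet
rng = np.random.default_rng(seed=0)

# Synthetic orders; regions and revenue values are made up for illustration.
df = pd.DataFrame(
    {
        "region": rng.choice(["West", "East", "North", "South"], size=n_rows),
        "revenue": rng.uniform(1.0, 50.0, size=n_rows),
    }
)

revenue_by_region = df.groupby("region")["revenue"].sum()
print(revenue_by_region)
```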

4. Manually-tracked transformations

In a spreadsheet, "I filtered out test users, dropped Saturday, and converted everything to USD" lives in someone's head. In Pandas:

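A minimal sketch of those three steps might look like the following. The column names (is_test_user, order_date, amount, currency), the sample rows, and the exchange rates are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical raw data; column names and values are assumptions for illustration.
raw = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "is_test_user": [False, True, False, False],
        "order_date": pd.to_datetime(
            ["2024-03-15", "2024-03-15", "2024-03-16", "2024-03-18"]
        ),
        "amount": [10.0, 99.0, 20.0, 30.0],
        "currency": ["USD", "USD", "EUR", "EUR"],
    }
)

# Illustrative exchange rates to USD, not real market rates.
usd_per_unit = {"USD": 1.0, "EUR": 1.1}

# 1. Filter out test users.
not_test = raw[~raw["is_test_user"]]

# 2. Drop Saturday orders.
no_saturday = not_test[not_test["order_date"].dt.day_name() != "Saturday"]

# 3. Convert every amount to USD.
clean = no_saturday.assign(
    amount_usd=no_saturday["amount"] * no_saturday["currency"].map(usd_per_unit)
)
print(clean)
```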

Anyone reading this code knows exactly what you did. There is no oral history.

5. "Refresh the report"

In Excel, a refresh means an analyst manually re-doing the work, or building a fragile chain of Power Query / VBA / OFFSET formulas that one person on the team understands. In Pandas, refreshing is python report.py — or scheduling that command to run automatically every morning.
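
A minimal sketch of what such a report.py could look like follows. The function names build_report and main, the command-line argument, and the sample data are assumptions based on the coffee-company example, not a prescribed layout.

```python
import argparse
import io
import pandas as pd


def build_report(csv_source):
    """Return total revenue, top product, and revenue by region for one orders file."""
    df = pd.read_csv(csv_source)
    revenue_by_product = df.groupby("product")["revenue"].sum()
    return {
        "total_revenue": df["revenue"].sum(),
        "top_product": revenue_by_product.idxmax(),
        "revenue_by_region": df.groupby("region")["revenue"].sum(),
    }


def main(argv=None):
    """Command-line entry point: python report.py orders.csv"""
    parser = argparse.ArgumentParser(description="Weekly orders report")
    parser.add_argument("csv_path", help="path to this week's orders CSV")
    args = parser.parse_args(argv)
    for name, value in build_report(args.csv_path).items():
        print(f"{name}:\n{value}\n")


# Demonstration with in-memory sample data standing in for a real file.
sample = io.StringIO(
    "product,region,revenue\n"
    "Espresso,West,120\n"
    "Latte,East,200\n"
    "Espresso,East,150\n"
)
report = build_report(sample)
print(report["top_product"], report["total_revenue"])
```

In a real script you would call main() under an if __name__ == "__main__": guard, so that next week's refresh is the same command pointed at a new file.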

Where spreadsheets still win

It is important not to overstate the case. Pandas can be overkill for:

  • Quick exploratory totals on a tiny file. Excel is faster for "how much did I spend at the grocery store last month?"
  • Sharing a working model with a non-technical stakeholder. The CFO wants to play with the model — change assumptions, see results. A spreadsheet is the right vehicle.
  • Visual layout. Excel is also a layout tool — borders, shading, merged cells, embedded charts. For documents whose appearance matters, it is still hard to beat.

A mature analyst uses both. The pipeline often looks like: raw data → Pandas (clean, summarize, validate) → Excel (final presentation for humans).

A hybrid example

Pandas can write .xlsx files directly. So you can do the intelligence in code and the presentation in Excel — best of both worlds. (We will return to this in the Exporting Cleaned Data chapter.)

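As a hedged sketch of the hybrid approach, the example below computes two summaries in Pandas and writes them to a workbook with one sheet each. The sample data and sheet names are invented, an in-memory buffer stands in for a file on disk, and the snippet assumes an Excel writer engine such as openpyxl is installed.

```python
import io
import pandas as pd

# Hypothetical orders data; in practice this would come from pd.read_csv(...).
orders = pd.DataFrame(
    {
        "product": ["Espresso", "Latte", "Espresso", "Mocha"],
        "region": ["West", "East", "East", "West"],
        "revenue": [120.0, 200.0, 150.0, 90.0],
    }
)

# The analysis: summaries computed in code.
by_region = orders.groupby("region", as_index=False)["revenue"].sum()
by_product = orders.groupby("product", as_index=False)["revenue"].sum()

# The presentation: one workbook with a sheet per summary.
# The buffer stands in for a real path such as "weekly_report.xlsx".
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    by_region.to_excel(writer, sheet_name="By region", index=False)
    by_product.to_excel(writer, sheet_name="By product", index=False)

print(f"Workbook size: {len(buffer.getvalue())} bytes")
```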

In a real workflow you would write to a real .xlsx file and email it (or upload it to a shared drive). The analysis lives in Python so it is reproducible; the delivery lives in Excel so the audience can read it.

Conceptual comparison: manual vs reproducible

The reproducible pipeline (same inputs, same outputs, audit trail) is the whole game. Pandas is the most popular tool for building it in the Python world.

What to take away

  • This course is not anti-spreadsheet. It is pro-reproducibility.
  • The single biggest reason analysts move from spreadsheets to code is not speed; it is the ability to re-run an analysis reliably.
  • The second biggest reason is transparency: code is the record of the analysis. Spreadsheets are only the result.
  • The third is scale: code does not care whether the dataset has fifty rows or fifty million.

Check your understanding

Question (select one)

Which of these is the chapter's stated biggest practical reason analysts move from spreadsheets to code?

  • Code is shorter
  • Code is more fun
  • Code-based analyses are reproducible — re-running on next week's data gives the same results without manual effort
  • Code uses less memory

Question (select one)

In the chapter's "recurring test," what is the heuristic for choosing between a spreadsheet and Pandas?

  • If you will do this analysis more than twice, write it in code
  • If the dataset has more than 50 rows, use Pandas
  • If the analysis has more than 3 steps, use Excel
  • If you have meetings on Mondays, use Pandas

Question (select one)

Which of these is still a legitimate strength of spreadsheets versus Pandas?

  • Handling millions of rows
  • Reproducibility
  • Version control via git
  • Letting a non-technical stakeholder play with assumptions in a visual model
