Why Python Became the Language of Analytics
How a general-purpose scripting language quietly became the workhorse of modern data work
If you walked into a data team at any major company today and asked "what language do you use?", the most common answer — by a wide margin — would be Python. This was not always the case, and it was not even obvious twenty years ago. The story of how Python ended up there is worth understanding, because it explains why this course uses Python at all.
The competition
In the early 2000s, the analytics world ran on a handful of mutually suspicious languages and tools:
- SAS, SPSS, and Stata — proprietary statistical packages with enormous installed bases in academia, government, and pharma.
- R — a free, open-source statistical language designed by and for statisticians.
- MATLAB — dominant in engineering and physics.
- Excel + VBA — the lingua franca of business.
- SQL — the universal language for getting data, but not analyzing it deeply.
Python was nowhere on this list. In the early 2000s, Python was seen as a "scripting language": popular with web developers and system administrators but viewed with mild suspicion by serious analysts. Its numerical array packages were fragmented, its plotting library was brand new, and it had no DataFrame library and no notebook.
Three libraries that changed everything
Three libraries, plus a notebook, turned Python into an analytics platform between 2003 and 2014.
NumPy: arrays
NumPy (Numerical Python, 2005) gave Python a single, standard numerical array type, unifying the earlier Numeric and numarray packages. With it you can write a + b to add two arrays of a million numbers in a single line, and the addition runs in compiled code at close to C speed. Most of the analytics libraries that followed, pandas included, are built on top of NumPy.
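To make the a + b claim concrete, here is a minimal sketch, assuming NumPy is installed. The array contents are made up for illustration; the point is only that one expression replaces an explicit loop over a million elements.

```python
import numpy as np

# Two arrays of one million numbers each (made-up values for illustration).
a = np.arange(1_000_000, dtype=np.float64)   # 0.0, 1.0, 2.0, ...
b = np.ones(1_000_000, dtype=np.float64)     # 1.0, 1.0, 1.0, ...

# One expression adds every pair of elements; the loop itself runs in
# compiled code inside NumPy rather than in the Python interpreter.
total = a + b

# The equivalent pure-Python loop, shown on just the first five elements.
looped = [x + y for x, y in zip(a[:5], b[:5])]

print(total[:5])
print(total.shape)
```

The vectorized form is both shorter and much faster than the loop on large arrays, which is why later libraries chose to build on it.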
pandas: DataFrames
pandas (2008, by Wes McKinney) added the DataFrame: a labeled, spreadsheet-like table that lives in memory. If NumPy made Python numerical, pandas made Python tabular. A DataFrame looks and behaves a lot like a sheet in Excel, but it can hold millions of rows (as many as fit in memory), you can write code to manipulate it, and you can chain operations together cleanly.
DataFrames are the central object of the rest of this course. Plotly Express expects a DataFrame as input. Every chart on every page of this course starts with a pandas DataFrame.
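As a rough illustration of a labeled table and chained operations, here is a minimal sketch assuming pandas is installed. The sales table and its column names (region, units, unit_price) are hypothetical, invented only for this example.

```python
import pandas as pd

# A small made-up sales table; the column names are illustrative only.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South", "West"],
        "units": [10, 4, 7, 12, 5],
        "unit_price": [2.5, 3.0, 2.5, 3.0, 4.0],
    }
)

# Chain the steps: derive a column, aggregate by label, then sort.
revenue_by_region = (
    sales
    .assign(revenue=lambda df: df["units"] * df["unit_price"])
    .groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)

print(revenue_by_region)
```

Each step returns a new DataFrame, so the whole transformation reads top to bottom as a single expression.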
Matplotlib: plotting
Matplotlib (2003) was the first widely adopted Python plotting library, deliberately designed to feel like MATLAB's plotting commands so that scientists could switch over with little friction. It paved the way for later Python visualization libraries, including Plotly Express, and it is still excellent for static, publication-quality figures.
Jupyter: the notebook
In 2011, the IPython project (started by Fernando Pérez in 2001) released the IPython Notebook: a web-based environment where you could write Python and prose side by side, and see chart outputs appear inline. Renamed Jupyter in 2014, the notebook is now the default working environment for many data scientists, ML researchers, and quantitative analysts worldwide.
Why Python won
There were better statistical languages (R), better numerical languages (MATLAB, Julia), and better systems languages (C++, Rust). Why did Python — a deliberately unspecialized language — end up at the center?
Three reasons:
- Python is "good enough" at everything. You can write a data pipeline, train a machine-learning model, build a web app, and write a script to email yourself the results — all in one language. R analysts had to drop into Python or Bash for "the other stuff"; Python analysts rarely had to leave.
- Python reads like English. A line of pandas code is often readable to someone who has never seen pandas. This matters enormously when you're sharing analyses with non-programmers.
- The ecosystem became self-reinforcing. Once every machine-learning library, every cloud SDK, every bioinformatics tool, and every visualization library had a Python interface, not using Python became the costly choice.
Why Python is great for visualization, specifically
For visualization in particular, Python offers a sweet spot:
- DataFrames in, charts out. Plotly Express, Matplotlib, Seaborn, and Altair all consume pandas DataFrames. You spend almost no time converting data between the cleaning step and the plotting step.
- Code is the recipe. A Plotly Express chart is often a single line of Python. That line is the definitive, reproducible record of how the chart was made: six months later, you can re-run it on the same data and get the same chart.
- You can embed charts everywhere. Inside a Jupyter notebook, inside a static HTML report, inside an interactive web application, inside a slide deck. The same Python chart code works in all of them.
A tiny tour
Here is a hint of what a Python visualization workflow feels like. Don't worry about understanding every line yet — we will unpack all of this in detail soon.
That is the entire workflow: load data into a DataFrame, optionally manipulate it, call one Plotly Express function. The chart is interactive and shippable. This pattern will repeat on essentially every page from here on.
Check your understanding
Which Python library introduced the DataFrame — a labeled, spreadsheet-like table that has become the central object of Python data analysis?
NumPy
Matplotlib
pandas
Jupyter
Which of the following is the best reason that Python "won" the analytics ecosystem?
Python is faster than C++ for numerical code.
Python charts are prettier than R charts.
Python is "good enough" at everything from data pipelines to ML to web apps, so analysts rarely have to leave it.
Python is the only language with notebooks.
What is a Jupyter notebook?
A spreadsheet application.
A database management tool.
A web-based environment for writing code, prose, and seeing chart outputs inline — the default working environment for many data scientists.
A type of Python web framework.