Arrays and Tensors
How NumPy arrays represent vectors, matrices, and higher-rank tensors, and how to manipulate them without copies
The single data structure that makes scientific Python possible is
the NumPy ndarray — a contiguous block of memory plus a tiny
header describing how to interpret it as a multidimensional grid.
This page builds your mental model of that structure and shows the
operations you will reach for over and over in the rest of the
course.
What an ndarray actually is
A NumPy array has three pieces:
- A dtype (e.g. float64, int32, complex128), which says how many bytes each element takes and how to interpret them.
- A shape, a tuple like (3, 4) or (2, 3, 5), which says how to index the buffer as a multidimensional grid.
- A strides tuple, which says how many bytes to skip in the buffer to step one position along each axis.
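Many structural operations (slicing, transposing, and most reshapes) only change the header; the underlying bytes are not copied. That is why these operations are cheap no matter how large the array is. Arithmetic, by contrast, allocates a new array for its result.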
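A small sketch that inspects the three header fields and shows that a transpose only rewrites the header. The 3 × 4 float64 array is an arbitrary choice; the stride values in the comments assume that dtype (8 bytes per element) and NumPy's default C (row-major) layout.

```python
import numpy as np

# A 3 x 4 grid of float64 values, laid out row by row (C order)
A = np.arange(12, dtype=np.float64).reshape(3, 4)
print(A.dtype, A.shape, A.strides)   # float64 (3, 4) (32, 8)

# Transposing swaps shape and strides; no element is moved or copied
B = A.T
print(B.shape, B.strides)            # (4, 3) (8, 32)
print(np.shares_memory(A, B))        # True
```

Stepping along a row of A skips 8 bytes (one element); stepping down a column skips 32 bytes (one full row of four elements).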
A transpose B = A.T has different strides from A but shares its bytes.
Reads through B walk memory in a different order, which can be faster
or slower than reading through A depending on how well the access
pattern matches the layout. The numerical content is identical.
Creating arrays
The handful of constructors you will use 99% of the time:
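A sketch of commonly used NumPy constructors; the selection is illustrative rather than exhaustive.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])     # from a Python list; dtype inferred (float64)
z = np.zeros((2, 3))              # all zeros, float64 by default
o = np.ones(4, dtype=np.int32)    # all ones with an explicit dtype
f = np.full((2, 2), 7.5)          # every element set to one value
e = np.empty((3, 3))              # uninitialized memory: fill it before reading
r = np.arange(0, 10, 2)           # like range(): [0 2 4 6 8]
lin = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced points, both endpoints included
ident = np.eye(3)                 # 3 x 3 identity matrix
zl = np.zeros_like(z)             # same shape and dtype as an existing array
```

np.empty is the only one whose contents are arbitrary; it skips initialization, which is useful when every element will be overwritten anyway.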
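For random data, prefer the modern default_rng() interface: a seeded
Generator gives reproducible streams, and it keeps its state in an
explicit object instead of the hidden global state used by the legacy
np.random functions: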
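A minimal sketch of the Generator interface; the seeds and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # a Generator holding its own, explicit state

x = rng.standard_normal((3, 2))        # standard normal samples, shape (3, 2)
u = rng.uniform(-1.0, 1.0, size=5)     # uniform on [-1, 1)
k = rng.integers(0, 10, size=4)        # integers in [0, 10)

# The same seed reproduces the same stream, independent of any global state
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))  # True
```

Passing rng into functions that need randomness, rather than calling a global function, keeps each experiment's randomness self-contained.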
Indexing: views, fancy, and boolean
There are three indexing modes, and they behave very differently in terms of memory.
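Basic indexing (integers, slices such as 1:5 or ::2, Ellipsis (...),
and None) returns a view. A view shares memory with the original, so
modifying the view modifies the original. If every axis is indexed by
an integer, the result is a single scalar rather than a view.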
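A small sketch of basic indexing returning views; np.shares_memory is used to confirm that no copy was made.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # [[0 1 2 3] [4 5 6 7] [8 9 10 11]]

sub = A[1:, ::2]                  # slices: view of rows 1-2, columns 0 and 2
print(np.shares_memory(A, sub))   # True

sub[0, 0] = -1                    # writing through the view...
print(A[1, 0])                    # -1  ...changes the original

expanded = A[None, :, :]          # None inserts a length-1 axis; still a view
print(expanded.shape)             # (1, 3, 4)
```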
Fancy indexing (an array of integer indices) returns a copy.
Boolean indexing (a boolean mask of the same shape) also returns a copy.
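A sketch contrasting fancy and boolean indexing with the view case. The copy rule applies when you read with such an index; using the same index on the left of = writes into the original array in place.

```python
import numpy as np

A = np.arange(6)                    # [0 1 2 3 4 5]

picked = A[[0, 2, 4]]               # fancy indexing -> copy
picked[0] = 100
print(A[0])                         # 0: the original is untouched

masked = A[A % 2 == 1]              # boolean mask -> copy of the odd elements
print(masked)                       # [1 3 5]
print(np.shares_memory(A, masked))  # False

# Indexed assignment is different: it writes into A itself
A[A % 2 == 1] = 0
print(A)                            # [0 0 2 0 4 0]
```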
Knowing whether you have a view or a copy is the difference between an in-place algorithm and a silent bug.
Reshape, ravel, and the order question
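reshape reinterprets the existing bytes under a new shape and copies
only when the requested shape cannot be expressed with new strides over
the same buffer. ravel flattens to 1-D, returning a view when possible.
flatten always copies. All three take an order argument: "C"
(row-major, the default, last index varies fastest) or "F"
(column-major, first index varies fastest), which controls the order
in which elements are read and placed.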
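A sketch showing when reshape and ravel can stay views, when they must copy, and how order changes the element sequence. The small integer array is arbitrary.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)          # C-contiguous: [[0 1 2] [3 4 5]]

R = A.reshape(3, 2)                     # compatible with existing strides -> view
print(np.shares_memory(A, R))           # True

T = A.T                                 # shape (3, 2), not C-contiguous
flat_T = T.reshape(-1)                  # C-order flattening of T needs a copy
print(np.shares_memory(A, flat_T))      # False
print(flat_T)                           # [0 3 1 4 2 5]

# ravel returns a view when it can; flatten always copies
print(np.shares_memory(A, A.ravel()))   # True
print(np.shares_memory(A, A.flatten())) # False

# order: "C" reads row by row, "F" reads column by column
print(A.ravel(order="C"))               # [0 1 2 3 4 5]
print(A.ravel(order="F"))               # [0 3 1 4 2 5]
```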
Vector, matrix, and tensor arithmetic
All the standard arithmetic ops are element-wise. Matrix-style
products use @ (or np.matmul / np.dot).
The * versus @ distinction is among the most important pieces of
vocabulary in scientific Python: use * for element-wise arithmetic and
@ for linear-algebra composition (matrix-matrix and matrix-vector
products).
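A sketch contrasting the two operators on the same pair of 2 × 2 matrices.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
v = np.array([1, 1])

print(A * B)   # element-wise:   [[ 5 12] [21 32]]
print(A @ B)   # matrix product: [[19 22] [43 50]]
print(A @ v)   # matrix-vector:  [3 7]
```

Each entry of A @ B is a row-times-column dot product, for example 1·5 + 2·7 = 19, whereas A * B simply multiplies matching entries.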
Higher-rank tensors
A "tensor" in scientific computing usually means a NumPy array with three or more axes. Tensors show up wherever images (height × width × channels), video (frames × height × width × channels), time-series ensembles, and PDE solution states live.
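A sketch of working with a rank-4 image batch. The batch size, image size, and channels-last layout are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical batch of 8 RGB images, 32 x 32 pixels, channels last
images = np.zeros((8, 32, 32, 3), dtype=np.float32)
print(images.ndim, images.shape)                 # 4 (8, 32, 32, 3)

# Reduce over the two spatial axes: per-image, per-channel means
channel_means = images.mean(axis=(1, 2))
print(channel_means.shape)                       # (8, 3)

# Move the channel axis to position 1 (channels-first); this is a view
chw = np.moveaxis(images, -1, 1)
print(chw.shape, np.shares_memory(images, chw))  # (8, 3, 32, 32) True
```

Passing a tuple to axis reduces over several axes at once, and np.moveaxis only permutes strides, so converting between layouts costs nothing until a contiguous copy is actually needed.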
The general-purpose tool for tensor contractions is np.einsum,
which lets you write Einstein-summation notation directly.
einsum is slower to read than @ but enormously more
expressive: contractions, traces, outer products, transposes,
diagonal extractions all fit in the same notation.
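A sketch checking several einsum subscript strings against their dedicated NumPy equivalents; the random inputs are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
M = rng.normal(size=(4, 4))
u, v = rng.normal(size=3), rng.normal(size=5)

C = np.einsum("ik,kj->ij", A, B)     # contraction over k: same as A @ B
tr = np.einsum("ii->", M)            # repeated index, summed: trace of M
outer = np.einsum("i,j->ij", u, v)   # no shared index: outer product
At = np.einsum("ij->ji", A)          # reorder output indices: transpose
d = np.einsum("ii->i", M)            # repeated index, kept: diagonal

print(np.allclose(C, A @ B), np.isclose(tr, np.trace(M)))  # True True
```

The rule of thumb: an index that appears in the inputs but not after -> is summed over; the order of indices after -> sets the output axis order.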
A multi-file linear-algebra utility
A real scientific project organizes its kernels and its drivers
into separate modules. Here is a small example: a routine that
projects a batch of vectors onto the span of a set of basis
vectors, separated into a linalg.py helper module and a main.py
driver.
Two files, twelve lines of real code, a complete projection operator that runs at BLAS speed on millions of points.
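A minimal sketch of such a helper/driver pair, not the original listing. It assumes the basis vectors are stored as the columns of a (d, k) matrix with linearly independent columns, and orthonormalizes them with a reduced QR factorization before projecting. The function name project_onto_span is hypothetical, and both "files" are collapsed into one script so it runs as-is.

```python
import numpy as np

# --- would live in linalg.py (hypothetical helper module) ---
def project_onto_span(X, basis):
    """Orthogonally project each row of X (n, d) onto the span of the
    columns of basis (d, k); assumes the columns are linearly independent."""
    Q, _ = np.linalg.qr(basis)    # orthonormal columns spanning the same space, (d, k)
    return (X @ Q) @ Q.T          # coefficients (n, k), mapped back to R^d: (n, d)

# --- would live in main.py (hypothetical driver) ---
def main():
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(5, 2))   # two basis vectors in R^5, stored as columns
    X = rng.normal(size=(1000, 5))    # a batch of 1000 points
    P = project_onto_span(X, basis)
    # Residuals are orthogonal to every basis vector
    print(np.allclose((X - P) @ basis, 0.0))             # True
    # Projecting twice changes nothing (idempotence)
    print(np.allclose(project_onto_span(P, basis), P))   # True

if __name__ == "__main__":
    main()
```

Multiplying (X @ Q) @ Q.T in that order avoids forming the d × d projection matrix explicitly, so the cost stays proportional to n · d · k.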
Practice: build a small tensor
Implement weighted_centroid(points, weights) that takes:
- points of shape (n, d): $n$ points in $\mathbb{R}^d$
- weights of shape (n,): non-negative weights
and returns the weighted centroid
$$ c = \frac{\sum_i w_i \, p_i}{\sum_i w_i} \in \mathbb{R}^d $$
as a 1-D array of shape (d,). Do it without any explicit
Python loops — use broadcasting and a single reduction.
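To check your answer, here is a deliberately slow, loop-based reference (the name weighted_centroid_reference is hypothetical). Your vectorized weighted_centroid should agree with it under np.allclose. Like the formula, it assumes the weights sum to a positive number.

```python
import numpy as np

def weighted_centroid_reference(points, weights):
    """Loop-based reference for the weighted centroid (hypothetical helper)."""
    n, d = points.shape
    total = np.zeros(d)
    for i in range(n):
        total += weights[i] * points[i]   # accumulate w_i * p_i
    return total / weights.sum()          # assumes sum of weights > 0

pts = np.array([[0.0, 0.0],
                [2.0, 0.0],
                [0.0, 4.0]])
w = np.array([1.0, 1.0, 2.0])
c = weighted_centroid_reference(pts, w)   # centroid (0.5, 2.0)
```

Here $\sum_i w_i p_i = (2, 8)$ and $\sum_i w_i = 4$, giving $c = (0.5, 2.0)$.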
Check your understanding
What is the shape of np.einsum("ijk,kl->ijl", A, B) if A has shape (2, 3, 4) and B has shape (4, 5)?
(2, 3, 4)
(2, 3, 5)
(4, 5)
(2, 5)
Which of the following array operations on a 2-D NumPy array returns a copy (rather than a view)?
A[1:5, ::2]
A.T
A.reshape(-1) on a C-contiguous array
A[A > 0] (boolean indexing)