Why Machine Learning Exists

Machine learning is not a default — it is what you reach for when explicit rules break down. We explore the kinds of problems that defeat hand-coding, and build a clear decision process for when to use ML and when not to.

On the previous page we drew the line between traditional programming (you write the rules) and machine learning (the rules are learned from examples). That raises an obvious question: if writing rules is simpler, cheaper, and more predictable, why does machine learning exist at all? Why not always just write the rules?

The honest answer is that for a huge number of important problems, you cannot write the rules — not because you are not clever enough, but because the rules are too numerous, too subtle, or change too fast for any human to express in code. Machine learning exists to handle exactly those problems, and the skill of an experienced practitioner is largely knowing which problems those are.

The mindset for this page

Machine learning is a tool with real costs — data, complexity, the ever-present risk of being confidently wrong. A good engineer reaches for it deliberately, not reflexively. By the end of this page you should be able to look at a problem and make a reasoned call: rule, or model?

When rules break down

Let us look at the classic problems that defeated rule-writing and helped establish the field. In each one, smart people genuinely tried to hand-code a solution first, and the rules collapsed under their own weight.

Spam filtering

We met this on the last page. You start with sensible rules — block certain words, certain senders, certain patterns. But spam is adversarial: a human on the other end actively rewrites their messages to slip past whatever rule you wrote. "Free money" becomes "fr33 m0ney" becomes "f.r.e.e m·o·n·e·y." Every rule you add, they route around. The number of rules grows without bound and your filter is always one step behind. A model trained on fresh labeled examples, by contrast, can be retrained as the spam evolves, learning the new tricks from data instead of waiting for you to notice them.
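To make the arms race concrete, here is a minimal sketch of a keyword rule and one round of patching. The phrase list, the function names, and the extra spaced-out variant are illustrative assumptions, not a real filter.

```python
# Illustrative only: a toy keyword filter, not a production spam rule.
BLOCKED_PHRASES = ["free money"]

# Hypothetical patch table for digit-for-letter swaps.
LEET = str.maketrans({"3": "e", "0": "o", "1": "i", "4": "a", "5": "s"})


def naive_rule(message: str) -> bool:
    """Flag a message if it contains a blocked phrase verbatim."""
    text = message.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def patched_rule(message: str) -> bool:
    """Same rule after one round of patches: undo digit swaps, drop punctuation."""
    text = message.lower().translate(LEET)
    cleaned = "".join(ch for ch in text if ch.isalpha() or ch == " ")
    return any(phrase in cleaned for phrase in BLOCKED_PHRASES)


variants = ["Free money", "fr33 m0ney", "f.r.e.e m·o·n·e·y", "f r e e m o n e y"]
for message in variants:
    print(f"{message!r}: naive={naive_rule(message)}, patched={patched_rule(message)}")
```

The naive rule catches only the first variant. The patched rule catches the first three, and the spaced-out fourth variant already slips past it, so the next patch is waiting to be written.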

Recognizing handwriting and images

Try to write the rules for recognizing a handwritten digit. What is a "7"? A horizontal stroke and a diagonal one — except when it has a little cross through it, or the top is wavy, or it is slanted, or someone's "7" looks like your "1." Now multiply that by every person's handwriting on Earth. The rules are effectively infinite, and they contradict each other. Yet a child recognizes digits effortlessly, and a model trained on labeled examples of digits does too. The pattern is real and learnable — it is just not writable.

Recommendations

How would you even write a rule for what someone should watch next? "If they liked a comedy, recommend another comedy" is laughably crude. Real taste is a tangle of genre, mood, time of day, what their friends watch, what is new, and a thousand interactions you could never enumerate. Recommendation engines learn these patterns from the behavior of millions of users; no hand-coded taste profile could keep up.

Fraud detection

Fraud looks almost like legitimate activity — that is the whole point of fraud. The signal is a faint, shifting combination of dozens of factors: amount, location, timing, merchant, device, the sequence of recent transactions. Worse, fraudsters adapt the moment they understand your rules. The pattern is too subtle to specify and too dynamic to fix in place, which is precisely the territory where learning from data wins.

The common thread

Notice what these problems share. The relationship between input and output is real — these are not random — but it is too complex to write down, and often changing over time. That triad (real, complex, shifting) is the signature of a machine learning problem. When you see it, a model is worth considering. When you do not, a rule is usually better.

The key decision: prefer a simple rule when one works

Here is the single most useful heuristic on this entire page, and it runs opposite to the way machine learning is usually marketed:

Reach for the simplest thing that works. Use a rule when you can write one. Reach for machine learning only when the pattern is genuinely too complex to specify by hand.

This is not anti-machine-learning; it is pro-engineering. A simple rule you understand is easier to build, test, debug, explain, and trust than a model. Models bring real burdens — collecting and cleaning data, the risk of silent failure, the difficulty of explaining a decision, the maintenance of a thing that can quietly degrade as the world drifts. You take on those burdens willingly when the payoff justifies them, and not before.

A quick way to feel the difference: imagine explaining your solution to a new teammate. If you can say "we block these three sender domains" in one sentence, you have a rule, and you should keep it. If your explanation collapses into "well, it depends on the interaction of a dozen things," you have a machine learning problem.

Picture the decision as a flowchart with three gates between you and a model: is there no simple rule that works, is labeled data available, and are occasional mistakes tolerable? A "no" at any one of them sends you elsewhere. Only problems that clear all three are good fits. That filter alone will save you from most of the ways people misuse machine learning.
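The three gates can be written down as a tiny function. This is only a sketch of the decision process described on this page; the function name, the argument names, and the returned labels are assumptions chosen for illustration.

```python
def choose_tool(simple_rule_works: bool,
                labeled_data_available: bool,
                mistakes_tolerable: bool) -> str:
    """Walk the three gates in order and return a recommendation."""
    if simple_rule_works:
        return "rule"               # Gate 1: a simple rule exists, so keep it.
    if not labeled_data_available:
        return "gather data first"  # Gate 2: no examples, nothing to learn from.
    if not mistakes_tolerable:
        return "rule"               # Gate 3: exactness required, stay with explicit logic.
    return "model"                  # All three gates cleared.


print(choose_tool(True, True, True))    # a simple rule works
print(choose_tool(False, False, True))  # no rule, but no data either
print(choose_tool(False, True, True))   # clears every gate
```

The order of the checks matters: the rule question comes first, so a problem with plenty of data still gets a rule whenever a rule is enough.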

Complexity is a cost, not a virtue

A frequent mistake — especially when machine learning is exciting and new — is to use it because it is impressive, not because the problem demands it. A model where a rule would do is harder to maintain, harder to explain, more likely to fail silently, and slower to ship. Choosing the simpler correct solution is a mark of seniority, not a lack of ambition.

A concrete contrast: rule vs. model

Let us make the tradeoff tangible with a tiny example. Suppose we want to classify iris flowers, and we try the rule-first approach. With a quick look at the data, a single threshold on petal length perfectly separates one species (setosa) from the others. For that part of the problem, a one-line rule is not just adequate — it is better than a model: simpler, exact, and instantly explainable.
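A minimal sketch of that one-line rule, using nine rows from the classic iris dataset (the first three of each species) rather than all 150. The 2.5 cm cutoff is an assumption; the page only says a single threshold on petal length works, and any value in the gap between setosa and the other species would do.

```python
# Nine rows from the classic iris dataset (first three of each species):
# sepal length, sepal width, petal length, petal width (all in cm), species.
IRIS_SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]

PETAL_LENGTH_CUTOFF_CM = 2.5  # assumed cutoff, chosen inside the gap


def is_setosa(petal_length_cm: float) -> bool:
    """The whole 'classifier': one comparison on one measurement."""
    return petal_length_cm < PETAL_LENGTH_CUTOFF_CM


correct = sum(is_setosa(row[2]) == (row[4] == "setosa") for row in IRIS_SAMPLE)
print(f"{correct}/{len(IRIS_SAMPLE)} rows classified correctly")  # 9/9
```

There is nothing to train, nothing to monitor, and the explanation fits in a sentence: setosa petals are short.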

A simple rule nailed it, and you should be delighted — that is the rule-first principle paying off. But now try to separate the other two species, versicolor and virginica. Their measurements overlap heavily; no single clean threshold divides them. This is where rule-writing stalls and a learned model earns its place.

For the easy split, a rule won. For the hard split, the learned model pulls ahead by using all four measurements together in a way no tidy threshold could. That is the whole argument of this page in one example: rules for the simple, models for the complex. A mature solution often uses both — cheap rules where they suffice and a model only where the problem genuinely needs one.
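The same effect can be reproduced in miniature without any libraries. The sketch below does not use iris data. It uses a synthetic stand-in with two features, built so that the classes overlap on each feature alone but separate once the features are combined, and the learned model is a basic perceptron chosen for brevity. All names and numbers are assumptions for illustration.

```python
# Synthetic stand-in (NOT iris data): two classes that overlap on each
# feature alone but separate cleanly when both features are combined.
points = [(float(x), float(x + 1)) for x in range(10)] + \
         [(float(x), float(x - 1)) for x in range(10)]
labels = [1] * 10 + [0] * 10


def accuracy(predictions):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def best_threshold_accuracy():
    """Try every one-feature threshold rule, in both directions."""
    best = 0.0
    for feature in (0, 1):
        for cut in sorted({p[feature] + 0.5 for p in points}):
            above = [1 if p[feature] > cut else 0 for p in points]
            below = [1 - a for a in above]
            best = max(best, accuracy(above), accuracy(below))
    return best


def train_perceptron(max_epochs=1000):
    """Learn weights for w0*x0 + w1*x1 + b > 0 from the labeled points."""
    w0 = w1 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x0, x1), y in zip(points, labels):
            predicted = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            if predicted != y:
                step = y - predicted  # +1 or -1
                w0, w1, b = w0 + step * x0, w1 + step * x1, b + step
                mistakes += 1
        if mistakes == 0:  # every point classified correctly
            break
    return w0, w1, b


w0, w1, b = train_perceptron()
model_predictions = [1 if w0 * x0 + w1 * x1 + b > 0 else 0 for x0, x1 in points]
rule_accuracy = best_threshold_accuracy()
model_accuracy = accuracy(model_predictions)
print(f"best single threshold: {rule_accuracy:.2f}, learned model: {model_accuracy:.2f}")
```

On this toy data the best single threshold gets 60% of the points right, while the perceptron, which weighs both features together, separates them all. Real iris data is messier than this, so expect a smaller gap there.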

The lesson of the contrast

The question is never "rules or machine learning?" in the abstract. It is "which tool fits this specific sub-problem?" Strong practitioners reach for the cheapest tool that works and escalate to machine learning only where the pattern truly demands it.

Why rules explode while models scale

It is worth understanding why hand-written rules collapse on the hard problems, because the mechanism is the same every time: the number of rules needed grows faster than any human can write them.

Think about classifying images of animals. A first rule might be "four legs and fur → mammal." But then you need exceptions for animals photographed sitting, animals partly hidden, animals at odd angles, unusual breeds, poor lighting, and on and on. Each exception spawns sub-exceptions. The rule set does not grow linearly with the problem's difficulty — it grows combinatorially, because real-world variation multiplies. A human author hits a wall: they cannot enumerate cases fast enough, and the rules begin to contradict each other.
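A little arithmetic shows how quickly this gets out of hand. The sketch below treats each complication from the paragraph above as an independent yes/no factor, which is a simplifying assumption, and counts the distinct situations a rule set would have to cover.

```python
from itertools import product

# Hypothetical yes/no factors taken from the paragraph above; treating them
# as independent and binary is a simplifying assumption.
factors = ["sitting", "partly hidden", "odd angle", "unusual breed", "poor lighting"]

cases = list(product([False, True], repeat=len(factors)))
print(f"{len(factors)} factors -> {len(cases)} distinct situations")  # 2**5 = 32

# Each extra factor doubles the count.
for n in (5, 10, 20, 30):
    print(f"{n:>2} factors -> {2 ** n:,} situations")
```

Five factors already give 32 combinations, and thirty give over a billion. Nobody writes a billion rules by hand.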

A learned model sidesteps this entirely. It does not store a rule per case; it learns a general pattern from examples that automatically covers cases the author never explicitly considered. Show it ten thousand labeled animal photos and it generalizes to the ten-thousand-and-first — including poses and lighting no rule-writer anticipated. The model scales with data (which you can often gather in bulk) instead of with hand-authored rules (which you cannot).

The crossover point

For an easy problem, a few rules suffice and beat a model. As a problem grows more complex, the rules needed pile up until writing and maintaining them costs more than the value they provide. Somewhere in between is a crossover point: below it, rules win; above it, learning from data wins. Much of an engineer's judgment is sensing which side of that line a problem sits on.

The other half: models adapt, rules stand still

There is a second reason machine learning exists, distinct from complexity: the world changes, and a model can be retrained while a rule has to be rewritten by hand. Spam evolves, fraud tactics shift, customer tastes drift. A hand-written rule keeps doing exactly what it said until a human rewrites it. A model, by contrast, can be retrained on fresh data and absorb the new pattern without anyone re-deriving the logic.

We can watch this in miniature. Suppose the "world" changes between an old regime and a new one, so that the relationship between input and output shifts. A fixed rule, tuned to the old world, grows stale. A model retrained on new data tracks the change.

The fixed rule and the un-updated model both do poorly once the world shifts, but retraining the model on fresh data brings its error right back down. That is the adaptability machine learning buys you — at the price of having to notice the drift and do the retraining, which is exactly the maintenance cost we will weigh in a moment.
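Here is a minimal sketch of that experiment. The two regimes (y = 2x + 1, then y = -x + 10), the noise level, and the choice of a one-feature least-squares line as the "model" are all assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_data(slope, intercept, n=200):
    """Sample noisy (x, y) pairs from y = slope * x + intercept."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def line(slope, intercept):
    """Build a predictor for y = slope * x + intercept."""
    return lambda x: slope * x + intercept


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed regimes: the old world follows y = 2x + 1, the new one y = -x + 10.
old_x, old_y = make_data(2.0, 1.0)
new_train_x, new_train_y = make_data(-1.0, 10.0)
new_test_x, new_test_y = make_data(-1.0, 10.0)

predictors = {
    "fixed rule": line(2.0, 1.0),                               # hand-tuned to the old world
    "stale model": line(*fit_line(old_x, old_y)),               # trained once, never updated
    "retrained model": line(*fit_line(new_train_x, new_train_y)),  # refit on fresh data
}

errors = {name: mse(p, new_test_x, new_test_y) for name, p in predictors.items()}
for name, error in errors.items():
    print(f"{name:>15}: MSE on new-regime data = {error:.2f}")
```

Nothing about the retrained model's code changed between regimes. Only the data it was fitted on did, which is the whole point.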

Adaptability is a capability, not an automatic behavior

A model does not retrain itself. "Models adapt" really means "models can be retrained on new data, whereas a rule has to be rewritten by hand." In practice you still need to monitor for drift and trigger retraining. The advantage is that the mechanism for adapting — feed it new labeled examples — is built in, rather than requiring a human to re-derive the logic.

Question (select one)

A spam-detection team finds that their hand-written rules need constant manual updates as spammers invent new tricks, and the rule list has grown to thousands of entries that sometimes conflict. Which two properties of the problem most justify switching to machine learning?

The dataset is small and the answer is an exact formula

The problem is simple and never changes over time

The pattern is too complex to enumerate as rules (the rule list explodes), and it keeps changing — so a model that generalizes from data and can be retrained fits better

Machine learning is newer, so it must be the better choice here

A quick check

Question (select one)

What do spam filtering, handwriting recognition, and fraud detection have in common that makes them good fits for machine learning?

They all involve very small datasets

They can each be solved perfectly by a short list of if/else rules

In each, the input-to-output relationship is real but too complex (and often too changeable) to specify by hand, while labeled examples are available to learn from

They all have a single, exact mathematical formula as the answer

Counting the costs: what machine learning demands

Choosing machine learning is choosing to take on a set of ongoing costs. Knowing them up front keeps you from being surprised later.

You need data — usually a lot of it, and it must be labeled. A model learns from examples paired with answers. Gathering those examples and labeling them correctly is often the most expensive and time-consuming part of a project, far more than the modeling itself. No data, no model.

You accept that the model will sometimes be wrong. Even an excellent model makes mistakes. You are trading the exactness of a rule for the flexibility of a learned pattern. If a single wrong answer is catastrophic or legally unacceptable, that trade may not be worth it.

You take on maintenance. The world drifts. Customer behavior shifts, new fraud tactics appear, product catalogs change. As live data moves away from what the model was trained on (a phenomenon called drift), a model trained on last year's data slowly goes stale and must be monitored and retrained. A rule, by contrast, keeps doing exactly what it says until you change it.

You may sacrifice some interpretability. Some models are transparent; many are not. If you must justify every individual decision — to a regulator, a customer, or a court — an opaque model can be a liability. We have a whole page later on interpreting models, but it cannot make every model fully explainable.

The cost people forget: silent failure

A buggy rule usually fails loudly — it throws an error or returns an obvious nonsense value. A model can fail silently: it keeps returning confident, plausible-looking predictions that are quietly wrong because the world has drifted away from its training data. This is why honest evaluation and ongoing monitoring are not optional extras — they are the only way you will ever notice the model has gone bad.
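One common way to make a silent failure audible is to keep scoring the model against true labels as they arrive. The sketch below is a hypothetical monitor, not a standard library API: the class name, the window size, and the tolerance are assumptions, and it presumes that correct answers eventually become available for at least some predictions.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labeled predictions."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline    # accuracy measured at deployment time
        self.tolerance = tolerance  # how much decay we accept before alerting
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self) -> float:
        if not self.outcomes:
            raise ValueError("no labeled outcomes recorded yet")
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        """True once a full window's accuracy falls below baseline - tolerance."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.95, window=20)
for _ in range(20):
    monitor.record("spam", "spam")  # model and ground truth agree
print(monitor.needs_attention())    # False: the window is full and all correct
for _ in range(8):
    monitor.record("ham", "spam")   # the world has shifted; predictions now miss
print(monitor.rolling_accuracy(), monitor.needs_attention())  # 0.6 True
```

The model's own confidence never enters the check. Only agreement with reality does, which is why labeled feedback is worth paying for.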

When NOT to use machine learning

Let us collect the cases — already hinted at above — where you should deliberately not reach for a model. This list will save you more grief than any algorithm.

A simple rule already works. The golden case. If "orders over 10,000 units get a discount" solves it, write that. Do not model it.
You have no relevant labeled data and cannot get any. Learning requires examples. With none, machine learning is simply not on the table yet — gather data first.
Mistakes are unacceptable. For logic with a known-correct answer that must be exactly right every time (tax math, access control, anything safety-critical and deterministic), use rules. Models trade exactness for flexibility, and here you cannot afford that trade.
The relationship is genuinely arbitrary. If there is no real pattern linking inputs to outputs, no model can find one. Machine learning detects structure that exists; it does not invent structure that does not.
The cost of the system exceeds the value of the predictions. Building, serving, and maintaining a model is real work. For a low-stakes problem, a rough rule or even a manual process may be the wiser investment.

A grounding example of 'no pattern to find'

Imagine trying to predict the outcome of a fair coin flip from the time of day. There is no relationship — the coin does not care what time it is. A model trained on this will appear to find patterns in the training data (random noise always contains accidental-looking structure), but those patterns will not hold on new flips. This is a preview of overfitting, which the generalization chapters dissect: a model can always fit noise, so finding "signal" on the training set proves nothing on its own.
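The coin-flip thought experiment is easy to run. In this sketch the "model" is a deliberately over-flexible memorizer, an assumption chosen to make the effect obvious. It remembers the outcome seen at each second of the day and falls back to a fixed guess for times it has never seen.

```python
import random

rng = random.Random(0)  # fixed seed for repeatability
SECONDS_PER_DAY = 24 * 60 * 60


def flip_at_random_time():
    """Return (second of day, outcome); the outcome ignores the time."""
    return rng.randrange(SECONDS_PER_DAY), rng.choice("HT")


train = [flip_at_random_time() for _ in range(200)]
test = [flip_at_random_time() for _ in range(2000)]

# Memorize the last outcome seen at each second of the day.
memory = {}
for second, outcome in train:
    memory[second] = outcome


def predict(second):
    return memory.get(second, "H")  # unseen time: fall back to a fixed guess


def accuracy(flips):
    return sum(predict(second) == outcome for second, outcome in flips) / len(flips)


train_accuracy = accuracy(train)
test_accuracy = accuracy(test)
print(f"training accuracy: {train_accuracy:.2f}")
print(f"new-flip accuracy: {test_accuracy:.2f}")
```

Training accuracy comes out near perfect because the memorizer has simply stored the answers, while accuracy on new flips hovers around one half, which is what guessing gets you.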

Real-world applications, framed by this decision

With the rule-vs-model lens, you can see why machine learning took over certain domains and left others alone.

Email and security (spam, malware, intrusion, fraud): adversarial and ever-changing — rules cannot keep up, so learning from fresh data wins.
Perception (vision, speech, handwriting): patterns humans recognize effortlessly but cannot articulate as rules — ideal for learning from examples.
Personalization (recommendations, search ranking, ads): preferences too individual and numerous to hand-code — learned from behavior at scale.
Forecasting (demand, prices, risk, churn): outcomes shaped by many interacting, shifting factors — flexible models beat fixed formulas.

And, just as tellingly, the domains where rules still dominate: accounting, billing, access control, regulatory compliance — places where the logic is exact, must be correct, and must be explainable. Nobody trains a model to add up an invoice, and they are right not to.

Your turn

Below is a set of described scenarios. Your job is to decide, for each, the better tool — a hand-written rule or a learned model — using the decision process from this page. This is the judgment the whole page has been building toward.

For each scenario, decide whether the better first tool is a hand-written rule or a learned machine learning model, applying the decision process from this page (prefer a simple rule when one works; reach for ML when the pattern is too complex or changeable to specify by hand).

Fill in the dictionary choices so that each scenario key maps to the string "rule" or the string "model":

"flag_orders_over_10000" — flag any order whose total exceeds 10,000 units for manual review.
"detect_fraud" — spot fraudulent credit-card transactions from a faint, shifting combination of dozens of signals.
"recognize_handwritten_digits" — read handwritten digits from scanned images.
"compute_sales_tax" — compute sales tax from a known, fixed tax rate and the order total.
"recommend_next_video" — recommend the next video from a user's complex viewing history.

The hidden tests check each individual answer.
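The exercise refers to a dictionary named choices. A possible starting scaffold is sketched below; its exact shape is an assumption based on the description above, and the helper function is hypothetical. The answers are deliberately left blank.

```python
# Starter scaffold (assumed shape). Replace each None with "rule" or "model".
choices = {
    "flag_orders_over_10000": None,
    "detect_fraud": None,
    "recognize_handwritten_digits": None,
    "compute_sales_tax": None,
    "recommend_next_video": None,
}


def unanswered(answers: dict) -> list:
    """Return the scenario keys that are not yet "rule" or "model"."""
    return [key for key, value in answers.items() if value not in ("rule", "model")]


print("Still to decide:", unanswered(choices))
```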

Check your understanding

Question (select one)

According to the decision process on this page, what should you try first for a new problem?

The most powerful machine learning model available

A deep neural network, then simplify if it is too slow

The simplest thing that could work — a hand-written rule, if one can solve the problem

Whatever approach is currently most popular

Question (select one)

A team wants to compute shipping cost as a fixed table: a known rate per weight bracket and destination zone. Should they use machine learning?

Yes — any business logic benefits from a learned model

Yes — but only after collecting millions of past shipments

No — this is an exact, known formula; a lookup rule computes it perfectly, and a model would add cost, error, and complexity for no benefit

It cannot be decided without training a model first

Question (select one)

Which of the following is a genuine cost of choosing machine learning over a hand-written rule?

It is impossible to make any predictions until the model is 100% accurate

It requires no data at all, unlike rules

It needs labeled example data, it will sometimes be wrong, and it must be monitored and retrained as the world drifts

It always runs faster than the equivalent rule

Question (select one)

Why is "the model fails silently" considered an especially dangerous cost of machine learning?

Because models always crash loudly, making bugs obvious

Because silent failures only happen during training, never in production

Because a drifting model can keep returning confident, plausible-looking predictions that are quietly wrong, so the problem may go unnoticed without monitoring

Because silent failures are easy to ignore since they never affect users

Question (select one)

You try to predict a fair coin's outcome from the time of day, and your model scores well on the training data. What is the right conclusion?

The model has discovered a genuine pattern linking time of day to coin flips

Coins must be biased by the time of day after all

There is no real relationship; the model has merely fit accidental noise in the training data, and it will not generalize to new flips

The test set must be broken
