Nothing in ops is scarier than a system that’s fast, stable, and wrong. You can page on latency. You can page on crashes. But if your compute layer silently miscalculates and still returns a perfectly plausible number? That’s how you ship bad trades, bad risk, bad science, and bad invoices—at scale.
The Pentium FDIV bug is the canonical story: a tiny, boring defect in a floating-point division unit that turned Intel’s flagship CPU into a global case study in “correctness is a feature.” It wasn’t a meltdown of servers. It was worse: an error that could hide in plain sight.
What actually happened (and what FDIV means)
FDIV is the x86 floating-point division instruction. In 1994, researchers and users discovered that some Intel Pentium processors produced incorrect results for a small set of floating-point division operations. Not “a bit off in the last decimal,” but measurably wrong in ways that could matter for numerical software.
The bug lived in the hardware implementation of the division algorithm in the Pentium’s floating-point unit (FPU). Under specific operand patterns, the CPU used incorrect intermediate values, producing an incorrect quotient. Most divides were fine. Many people never hit it. That’s precisely why it became infamous: it was rare enough to slip through, yet real enough to break trust.
Reliability engineers obsess over failure modes. FDIV wasn’t a typical availability incident; it was a correctness incident. Those are harder. Your SLIs don’t scream. Your dashboards stay green. Your customers quietly lose money and then loudly lose faith.
One quote that should be printed above every production on-call desk: “Hope is not a strategy.”
— attributed to General Gordon R. Sullivan. In correctness land, “we’ve never seen it” is just hope with a nicer font.
Short joke #1: The FDIV bug proved you can have “high availability” and still be unavailable to the truth.
Why it mattered: silent errors beat loud outages
If a storage controller panics, you fail over. If a cluster node dies, you replace it. If a CPU returns the wrong answer with the right confidence? You may never know which rows are poisoned. You can’t “restart” your way out of bad math any more than you can fsck your way out of a forged ledger.
In operational terms, the FDIV incident forced a public conversation about:
- Hardware as part of your trusted computing base. Your threat model isn’t just attackers; it’s bugs.
- Correctness SLIs. If you don’t measure “right,” you’ll only measure “fast.”
- Recall economics. Replacing chips costs money. Not replacing them costs credibility—sometimes more.
- Communication. Engineering errors can be forgiven; evasive messaging usually isn’t.
Intel eventually offered replacements, but not before the story escaped into mainstream media. That part matters because it’s a classic corporate failure mode: treating a technical defect like a PR annoyance, instead of an integrity breach.
Fast facts and historical context (the stuff people forget)
- It surfaced in 1994, when Pentium was a premium brand and “floating point” was increasingly relevant beyond academia.
- The issue was deterministic for certain operand pairs: same inputs, same wrong output. That made it reproducible, not a cosmic ray ghost.
- The root cause was missing entries in a lookup table used by the division algorithm (details below). Not a rounding corner; a table defect.
- Most workloads never noticed because they didn’t do enough sensitive floating-point division to hit the bad cases.
- Spreadsheet and finance usage increased the blast radius because “business math” had moved onto desktops where Pentium was king.
- IEEE 754 wasn’t the villain; the implementation was. Standards can’t save you from bad silicon.
- Independent verification mattered: the issue became undeniable because people could reproduce it across machines.
- It influenced procurement culture—more buyers started asking about errata, stepping revisions, and validation, not just MHz.
- It prefigured modern “silent data corruption” thinking that now shows up in storage (checksums), memory (ECC), and distributed systems (end-to-end validation).
How the FDIV bug worked: table-driven division and missing entries
Hardware division is expensive in gates and latency, so CPUs avoid grinding it out one bit at a time. The Pentium’s FPU used a radix-4 SRT digit-recurrence algorithm: each iteration picks the next quotient digit from a lookup table indexed by leading bits of the divisor and the running partial remainder, then updates the remainder and moves on (think “guess a digit, correct as you go”).
Here’s the key failure mode: a handful of entries in that table (five, by the usual accounting) were effectively missing, returning zero where they should not have. For specific divisor/remainder patterns the FPU picked a wrong quotient digit, and the recurrence delivered a wrong quotient. Not wildly wrong, but wrong beyond acceptable floating-point error: in the worst published cases the error appears around the fourth or fifth significant decimal digit.
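It’s also worth seeing why this was so easy for ordinary users to demonstrate. The most widely circulated spot check fits in a few lines: on a correct IEEE-754 double implementation the remainder below is exactly zero, while on an affected Pentium it was famously reported as 256. A minimal sketch using the classic published operand pair (an illustration, not a fleet test):
cr0x@server:~$ python3 - <<'PY'
# Classic FDIV spot check: x - (x / y) * y should be exactly 0.0 when division is correct.
# On an affected Pentium, the flawed quotient famously made this come out as 256.
x, y = 4195835.0, 3145727.0
print(x - (x / y) * y)
PY
0.0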
From an SRE perspective, this is a nightmare class of bug because:
- It is input-dependent and therefore appears “random” in production, even though it’s deterministic.
- It can be data-dependent, meaning only certain datasets trigger it (a specific distribution of divisors).
- It is non-crashing. There’s no signal unless you validate outputs.
- It is platform-specific. Your staging environment might not match production stepping.
Engineers sometimes ask: why didn’t tests catch it? Because the space of floating-point inputs is astronomically large, and “typical” tests tend to focus on boundary values (0, 1, powers of two, denormals) rather than the weird middle where digit-selection tables bite you.
Hardware verification improved over time, but the deeper lesson remains: correctness needs independent checking, not just faith in a vendor, a standard, or a clean CI run.
Reproducing and detecting it like an SRE
You probably don’t have a 1994 Pentium lying around in a rack (if you do, please don’t connect it to anything). Still, the operational method matters: define a known-bad test, run it across fleet segments, compare results, and isolate by hardware signature.
A classic reproduction uses carefully chosen operands where the Pentium’s FDIV result diverges from correct division. The exact constants aren’t the point here; the point is building a cross-check harness that can detect “CPU A disagrees with CPU B” without requiring you to know the true answer in advance.
In production systems, that often looks like:
- Run computations twice with different implementations (hardware vs software, or two libraries).
- Compare results within an acceptable tolerance envelope.
- Escalate when mismatch rate crosses a threshold.
- Tag outputs with provenance so you can identify which hosts produced suspect results.
That’s the same philosophy you use in storage when you checksum and scrub: trust, but verify—and verify end-to-end.
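Here’s a minimal sketch of that harness idea in Python. The tolerance, sample count, and mismatch budget are illustrative placeholders, and the decimal module stands in for “a second, independent implementation”:
cr0x@server:~$ python3 - <<'PY'
# Cross-check sketch: compute the same quantity two ways, compare within a tolerance,
# tag the result with provenance, and escalate if the mismatch count exceeds a budget.
import decimal, platform, random
decimal.getcontext().prec = 50
TOL = decimal.Decimal("1e-12")   # illustrative tolerance envelope
MISMATCH_BUDGET = 0              # illustrative escalation threshold
provenance = {"host": platform.node(), "cpu": platform.processor()}
mismatches = 0
for _ in range(10000):
    a, b = random.uniform(1e-6, 1e6), random.uniform(1e-6, 1e6)
    primary = a / b                                       # implementation A: hardware float
    reference = decimal.Decimal(a) / decimal.Decimal(b)   # implementation B: high precision
    if abs(decimal.Decimal(primary) - reference) > TOL * abs(reference):
        mismatches += 1
print(provenance, "mismatches:", mismatches)
if mismatches > MISMATCH_BUDGET:
    raise SystemExit("escalate: cross-check mismatch budget exceeded")
PY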
Practical tasks: commands, outputs, and decisions (12+)
These are real operational tasks you can run on Linux fleets to reduce the risk of “silent wrong math,” identify heterogeneity, and build guardrails. Each task includes: command, sample output, what it means, and what decision you make.
Task 1: Identify CPU model and stepping across a host
cr0x@server:~$ lscpu | egrep 'Model name|Vendor ID|CPU family|Model:|Stepping:'
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
CPU family: 6
Model: 79
Stepping: 1
Output meaning: Model/stepping is how you correlate to known errata (including historical ones like FDIV, and modern ones like speculative execution mitigations).
Decision: If you run mixed steppings, plan for correctness validation by segment and pin high-integrity workloads to known-good steppings.
Task 2: Capture microcode version (because microcode sometimes mitigates issues)
cr0x@server:~$ dmesg | grep -i microcode | tail -n 3
[ 0.452112] microcode: microcode updated early to revision 0xb00003a, date = 2021-05-11
[ 0.452948] microcode: CPU0 sig=0x406f1, pf=0x1, revision=0xb00003a
[ 0.453015] microcode: Microcode Update Driver: v2.2.
Output meaning: Confirms whether you’re running updated microcode. FDIV was a hardware table defect and not “microcode-fixable” in practice, but many modern CPU issues are.
Decision: If microcode is stale, treat it like a security patch backlog. Roll it with your kernel update cadence.
Task 3: Inventory fleet CPU signatures at scale (example using SSH fanout)
cr0x@server:~$ for h in app01 app02 app03; do echo "== $h =="; ssh $h "lscpu | egrep 'Model name|Stepping'"; done
== app01 ==
Model name: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping: 1
== app02 ==
Model name: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping: 1
== app03 ==
Model name: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping: 2
Output meaning: app03 differs. That’s how rare correctness bugs become “only happens on Tuesdays.”
Decision: Quarantine outliers from sensitive workloads until validated.
Task 4: Check whether ECC memory is present and enabled
cr0x@server:~$ sudo dmidecode -t memory | egrep -i 'Error Correction Type|Total Width|Data Width' | head -n 6
Error Correction Type: Multi-bit ECC
Total Width: 72 bits
Data Width: 64 bits
Error Correction Type: Multi-bit ECC
Total Width: 72 bits
Data Width: 64 bits
Output meaning: ECC doesn’t prevent FDIV, but it prevents a huge class of “wrong math” due to memory bit flips.
Decision: If a fleet segment lacks ECC, don’t run high-integrity numerical workloads there. Period.
Task 5: Look for machine check events (hardware errors) in logs
cr0x@server:~$ journalctl -k | egrep -i 'mce|machine check|hardware error' | tail -n 5
Jan 21 08:11:02 server kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
Jan 21 08:11:02 server kernel: mce: [Hardware Error]: TSC 0 ADDR fef1c140 MISC d012000100000000
Jan 21 08:11:02 server kernel: mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1705824662 SOCKET 0 APIC 14 microcode b00003a
Output meaning: Indicates underlying hardware instability. Not FDIV-specific, but it tells you the platform can’t be trusted blindly.
Decision: If MCEs show up, schedule hardware replacement and move correctness-critical jobs away immediately.
Task 6: Run a quick floating-point self-check with software cross-compare (simple harness)
cr0x@server:~$ python3 - <<'PY'
import random, decimal
decimal.getcontext().prec = 80
def check(n=20000):
    bad = 0
    for _ in range(n):
        a = random.uniform(1e-100, 1e100)
        b = random.uniform(1e-100, 1e100)
        # hardware float division
        hf = a / b
        # high-precision decimal reference (str() gives the shortest round-trip repr,
        # which is fine at this tolerance)
        da = decimal.Decimal(str(a))
        db = decimal.Decimal(str(b))
        df = da / db
        # compare within a relative tolerance
        if hf != 0.0:
            rel = abs((decimal.Decimal(hf) - df) / df)
            if rel > decimal.Decimal("1e-12"):
                bad += 1
    return bad
print("mismatches:", check())
PY
mismatches: 0
Output meaning: “0 mismatches” doesn’t prove perfection; it reduces suspicion. If mismatches spike, you’ve got a correctness incident.
Decision: If mismatches > 0, widen investigation: same code on different hosts, check CPU signatures, validate compiler flags and math libraries.
Task 7: Identify compiler flags that may change numerical behavior
cr0x@server:~$ gcc -Q -O2 --help=optimizers | egrep 'fast-math|unsafe-math|finite-math-only|fp-contract' | head -n 6
-ffast-math [disabled]
-funsafe-math-optimizations [disabled]
-ffinite-math-only [disabled]
-ffp-contract [off]
Output meaning: Fast-math style flags can legally break IEEE expectations. That’s “optimization” with a correctness mortgage.
Decision: For finance/science/crypto/integrity workloads, ban these flags in release builds unless you’ve proven they’re safe.
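Python has no fast-math switch, but the hazard these flags introduce is easy to demonstrate: they license the compiler to reassociate floating-point operations, and floating-point addition is not associative. A two-line illustration (values chosen only to make the difference visible):
cr0x@server:~$ python3 - <<'PY'
# Floating-point addition is not associative; reassociation is exactly the kind of
# transformation fast-math-style flags permit, and it can move a result across a threshold.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)   # one grouping
print(a + (b + c))   # another grouping
PY
0.6000000000000001
0.6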
Task 8: Verify which libm / libc you are actually running
cr0x@server:~$ ldd --version | head -n 2
ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35
Copyright (C) 2022 Free Software Foundation, Inc.
Output meaning: Different libm implementations can differ in edge cases. If you’re debugging numerical drift, you need to know the exact runtime.
Decision: Standardize runtimes across compute fleets, or you’ll chase “bugs” that are just library differences.
Task 9: Confirm CPU flags relevant to floating-point behavior
cr0x@server:~$ grep -m1 -oE 'sse2|sse4_2|avx2|fma' /proc/cpuinfo | sort -u
avx2
fma
sse2
sse4_2
Output meaning: Instruction set choices affect numerics (FMA changes rounding behavior) and code paths in libraries.
Decision: If results differ between hosts, check whether different flags are causing different execution paths.
Task 10: Pin a workload to a specific CPU model/segment (operational containment)
cr0x@server:~$ taskset -c 0-3 ./risk_calc --portfolio P42
OK: computed VaR=1.873e6 in 2.14s (threads=4)
Output meaning: This pins execution to a subset of cores. Not a fix for FDIV, but a technique to isolate and reproduce issues on known cores/CPUs.
Decision: If only some cores/CPUs misbehave (rare, but possible with marginal hardware), containment buys you time.
Task 11: Detect heterogeneity in container/Kubernetes nodes (CPU model as a scheduling constraint)
cr0x@server:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE KERNEL-VERSION
node-a Ready worker 92d v1.28.3 10.0.1.11 Ubuntu 22.04.3 LTS 5.15.0-91-generic
node-b Ready worker 92d v1.28.3 10.0.1.12 Ubuntu 22.04.3 LTS 5.15.0-91-generic
node-c Ready worker 18d v1.28.3 10.0.1.13 Ubuntu 22.04.3 LTS 5.15.0-91-generic
Output meaning: Age differences often correlate with hardware differences. That’s where “same app, different answers” is born.
Decision: Add node labels based on CPU model/stepping and schedule correctness-critical workloads explicitly.
Task 12: Label nodes by CPU model to enforce placement
cr0x@server:~$ kubectl label node node-c cpu.intel.com/model=79
node/node-c labeled
Output meaning: A stable handle for scheduling rules.
Decision: Use node affinity to ensure consistency, especially when validating numerical regressions.
Task 13: Confirm application-level rounding policy is consistent (runtime sanity)
cr0x@server:~$ python3 - <<'PY'
import decimal
print("decimal rounding:", decimal.getcontext().rounding)
PY
decimal rounding: ROUND_HALF_EVEN
Output meaning: This checks Python’s decimal context (an application-level policy), not the hardware FP rounding mode. HALF_EVEN is common in financial contexts; inconsistent rounding rules across services can look like “hardware bugs.”
Decision: Standardize rounding policies in code and document them like an API contract.
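A quick illustration of why the policy matters (the amounts are made up): two services that agree on the arithmetic but disagree on rounding mode will still produce different numbers at the cent boundary.
cr0x@server:~$ python3 - <<'PY'
# Same values, different rounding policies, different cents.
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP
for x in ("2.125", "2.135"):
    d = Decimal(x)
    print(x,
          "half_even:", d.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN),
          "half_up:", d.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
PY
2.125 half_even: 2.12 half_up: 2.13
2.135 half_even: 2.14 half_up: 2.14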
Task 14: Detect if the kernel is reporting CPU vulnerabilities/mitigation state
cr0x@server:~$ grep -H . /sys/devices/system/cpu/vulnerabilities/* | head -n 5
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Retpolines; IBPB: conditional; IBRS_FW
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable
Output meaning: Shows the OS is aware of CPU-level issues. Different topic than FDIV, same meta-lesson: silicon has errata; you must manage it.
Decision: Track these states as inventory. If you can’t explain your mitigation state, you can’t explain your risk.
Fast diagnosis playbook
This is the “stop theorizing and get signal” sequence when someone says, “The numbers don’t add up,” and you suspect hardware, compiler, or library-level correctness issues. The goal is to localize the fault quickly: is it the inputs, the platform, or the implementation?
First: confirm it’s real (and bound the blast radius)
- Get a minimal reproducer. Same inputs, same function, same output mismatch. If you can’t reproduce, you can’t fix.
- Check whether the mismatch is deterministic. Deterministic points to implementation/platform. Non-deterministic points to race, uninitialized memory, or undefined behavior.
- Compare results across two machines. If one host differs, you have a hardware/software environment delta to chase.
Second: fingerprint the environment like you mean it
- CPU model/stepping and microcode. If different, assume that’s relevant until disproven.
- Compiler and flags. Hunt for fast-math, FMA differences, and undefined behavior sanitizer warnings.
- Math library versions. libm and BLAS backends are frequent culprits.
Third: isolate by method, not by vibes
- Run a high-precision reference. Use decimal/bigfloat or MPFR-based tools as an oracle for spot checks.
- Run a software fallback path (if available) and compare. Divergence implicates hardware or low-level codegen.
- Lock down rounding mode and denormals. Inconsistent FP environment yields “ghost” differences.
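A minimal illustration of the oracle idea for reductions, using math.fsum as a correctly rounded reference (a production harness would use MPFR or decimal and representative data, not a toy list):
cr0x@server:~$ python3 - <<'PY'
# Oracle-style spot check for a reduction: naive left-to-right summation accumulates
# order-dependent rounding error; math.fsum returns the correctly rounded sum.
import math
vals = [0.1] * 10
print("naive sum:", sum(vals))
print("fsum ref :", math.fsum(vals))
PY
naive sum: 0.9999999999999999
fsum ref : 1.0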
Short joke #2: If your incident postmortem includes “floating point is weird,” you didn’t find the cause—you found an excuse.
Three corporate mini-stories from the correctness trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-sized fintech ran daily risk calculations across a batch cluster. The job was “embarrassingly parallel”: split portfolios, compute metrics, merge results. For years it worked. Then they upgraded a subset of nodes—newer CPUs, same OS image, same container digest. They assumed “compute is compute.”
A week later, reconciliation started drifting. Not catastrophically. Think small discrepancies across millions of accounts, the kind that looks like rounding noise until it doesn’t. The on-call rotated through the usual suspects: timezone offsets, decimal formatting, a new data source. Nothing stuck.
The breakthrough came when someone reran the same portfolio slice on two different nodes and got different values in the 10th–12th decimal place, which was still enough to flip a threshold test downstream. The root cause wasn’t a modern FDIV-style hardware defect; it was a change in floating-point code paths: the newer CPUs enabled FMA, the older ones didn’t, and the BLAS backend made different choices accordingly.
The wrong assumption was simple: “If the container is the same, the math is the same.” Containers package userland, not silicon behavior. Their fix was operational: label nodes by CPU capabilities, schedule risk jobs onto a consistent class, and add a cross-run validation step that compared a sample of results to a high-precision reference.
The FDIV lesson fits: hardware is part of your API surface. Ignore it and you’ll debug your own confidence.
Mini-story 2: The optimization that backfired
A machine-learning platform team had a cost problem: inference was expensive, and product wanted lower p95 latency. An engineer flipped a compiler switch: aggressive floating-point optimizations, plus a BLAS rebuild tuned for the newest instruction set in the fleet.
Latency improved. Everyone celebrated. Then a model monitoring alert fired: drift in prediction distributions for a narrow subset of inputs. Not a crash. Not a spike. A quiet, persistent skew. The model hadn’t changed; the numeric behavior had.
The “optimization” altered associativity and rounding behavior enough to move certain borderline classifications across a threshold. Most predictions were identical. The ones that mattered—high-value edge cases—were not.
They rolled back the build flags, re-ran a representative input corpus, and instituted a rule: any numeric optimization requires an accuracy budget and a regression suite that includes threshold sensitivity tests, not just average error. They also wrote down which workloads could tolerate non-IEEE transformations and which could not.
FDIV didn’t happen because someone enabled a flag. But the corporate failure mode is the same: treating correctness like a negotiable quality attribute. It isn’t negotiable in the parts of your system that make commitments.
Mini-story 3: The boring but correct practice that saved the day
A research computing group ran long simulations on a shared cluster. They had a policy that annoyed everyone: each release had to produce a “numerical fingerprint” on a fixed seed dataset, stored with the build artifact metadata. Same commit, same seed, same environment class, same output hash range.
One month, a new hardware batch arrived. The jobs ran faster, but the fingerprint shifted. Not by much—just enough to exceed their tolerated envelope. No user had complained yet, because the outputs still looked plausible. But the policy caught it before publication-quality results were generated.
The cause turned out to be a subtle runtime change: different libm behavior combined with different default handling of denormal numbers. They weren’t facing a headline-worthy silicon defect, but the outcome could have been reputational damage and weeks of wasted compute.
The fix was procedural and boring: standardize the FP environment, pin critical runs to a validated node class, and keep the fingerprint gate. They also documented the acceptable numeric variance per model type. The lesson: boring gates are how you buy back sleep.
Common mistakes: symptoms → root cause → fix
These are the patterns that make correctness bugs linger for months. Treat them as failure signatures.
1) Symptom: “Only one customer sees it”
Root cause: Workload/data-dependent inputs trigger a rare operand pattern or a threshold edge case.
Fix: Capture the customer’s exact inputs and replay them across two environments. Add canary validation for that input class.
2) Symptom: “Staging can’t reproduce production”
Root cause: CPU stepping, microcode, or instruction set differences between environments. Containers don’t solve that.
Fix: Build environment classes (labels) and require staging to match production class for correctness tests.
3) Symptom: “It’s random; rerun fixes it”
Root cause: Data races, uninitialized memory, undefined behavior, or non-deterministic parallel reduction ordering.
Fix: Use sanitizers, run single-threaded reference mode, and enforce deterministic reductions where correctness matters.
4) Symptom: “The discrepancy is tiny, so it’s fine”
Root cause: Tiny numeric differences can flip comparisons, thresholds, or branching, creating large downstream effects.
Fix: Identify threshold boundaries and add hysteresis, epsilon comparisons, or higher-precision calculation for decision points (see the hysteresis sketch after this list).
5) Symptom: “It started after a performance improvement”
Root cause: fast-math, FMA enabling, vectorization differences, or BLAS backend changes.
Fix: Maintain two build profiles (strict vs fast). Gate fast profile with numeric regression tests and a documented error budget.
6) Symptom: “It happens only on one rack/region”
Root cause: Hardware batch differences, thermal issues causing marginal behavior, or different BIOS/microcode baselines.
Fix: Pull CPU/microcode inventory, check MCE logs, and quarantine suspect hardware. Don’t argue with physics.
7) Symptom: “Checksums pass, but analytics drift”
Root cause: Your data pipeline is intact; the compute is wrong. Storage integrity does not guarantee compute integrity.
Fix: Add end-to-end validation: recompute samples on a known-good reference implementation and compare.
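Here is a minimal sketch of the hysteresis idea from pattern 4 (the threshold and band width are illustrative; size the band from your measured numeric variance):
cr0x@server:~$ python3 - <<'PY'
# Hysteresis for threshold decisions: a last-bit wobble near the boundary should not
# flip the decision back and forth between runs or between hosts.
THRESHOLD = 1.0
BAND = 1e-9   # dead band; illustrative width
def decide(value, previous_decision):
    if value > THRESHOLD + BAND:
        return True
    if value < THRESHOLD - BAND:
        return False
    return previous_decision   # inside the band: keep the prior decision
print(decide(1.0 + 2e-9, previous_decision=False))   # clearly above: True
print(decide(1.0 + 2e-16, previous_decision=False))  # inside the band: stays False
PY
True
False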
Checklists / step-by-step plan
When you suspect a CPU-level correctness issue
- Freeze inputs: capture exact payloads, seeds, and config. No “close enough.”
- Reproduce on the same host twice. If it changes run-to-run, you’re likely chasing nondeterminism, not silicon.
- Reproduce on a different host with a different CPU signature. If one host class disagrees, you have a segmentation handle.
- Fingerprint the platform: CPU model/stepping, microcode, kernel, libc/libm, container digest, compiler version.
- Cross-check with a reference: higher precision or alternative implementation for a representative sample.
- Contain: pin sensitive workloads to a validated hardware class; drain suspect nodes.
- Communicate: correctness incidents need clear stakeholder messaging—what is impacted, what is not, and how you know.
- Remediate: replace/retire hardware if implicated, or standardize runtime/flags if software-induced.
- Prevent recurrence: add numeric fingerprint gates and environment class scheduling.
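A deliberately simple sketch of the fingerprint-gate idea (run_model, the seed, and the rounding level are hypothetical placeholders): run a fixed-seed input set, round outputs to the tolerated precision, hash them, and compare against the fingerprint stored with the last validated build.
cr0x@server:~$ python3 - <<'PY'
# Numerical fingerprint sketch: fixed seed in, rounded outputs hashed, hash compared
# against the value recorded with the last validated build artifact.
import hashlib, json, random
def run_model(seed):
    # stand-in for the real computation; deterministic for a given seed
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) * 3.14159 for _ in range(1000)]
def fingerprint(outputs, digits=9):
    # round to the tolerated precision so harmless last-bit noise does not trip the gate
    rounded = [round(x, digits) for x in outputs]
    return hashlib.sha256(json.dumps(rounded).encode()).hexdigest()
fp = fingerprint(run_model(seed=42))
print("fingerprint:", fp)
# In CI: fail the gate if fp differs from the stored fingerprint for this environment class.
PY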
What to standardize in a production compute fleet (if you like sleeping)
- CPU inventory tracked like a dependency: model, stepping, microcode.
- Golden node classes for correctness-critical jobs.
- Build profiles: strict IEEE profile for integrity workloads; fast profile only with explicit approval.
- Validation harnesses that can be run on demand: cross-compare, sample-based reference checks.
- Provenance metadata: tag results with host class, build ID, and library versions for forensic traceability.
FAQ
1) Was the Pentium FDIV bug a software bug or hardware bug?
Hardware. It was an implementation defect in the Pentium FPU division logic involving incorrect lookup table entries used during division.
2) How common were wrong results?
Rare for typical consumer workloads, more plausible in numerically heavy code that performs lots of floating-point division with varied operands. “Rare” is still unacceptable when the error is silent.
3) Could you fix FDIV with a software patch?
You could sometimes work around it in software by avoiding the specific hardware instruction patterns or using software-emulated division, but the real fix was replacing affected chips. The defect was in silicon.
4) Why didn’t normal testing catch it?
Because floating-point input space is enormous, and verification that relies on “typical cases” misses rare operand patterns. Also, performance pressure historically pushes designs toward approximations that are tricky to validate exhaustively.
5) What’s the modern equivalent risk?
Any silent data corruption source: marginal hardware (memory, CPU), undefined behavior in code, aggressive compiler math flags, inconsistent FP environments across heterogeneous fleets, and bugs in numerical libraries.
6) How do I protect a distributed system from silent math errors?
Use redundancy and validation: recompute samples on separate host classes, compare independent implementations, keep strict build profiles for integrity, and tag outputs with provenance for forensic traceability.
7) Do GPUs have “FDIV-class” problems too?
They can. GPUs and accelerators may use different math modes (fast approximations, fused operations, flush-to-zero behavior). If you need strict reproducibility, you must configure and test for it.
8) Is “fast-math” always wrong?
Not always. It’s a trade: you permit transformations that can violate IEEE expectations. It’s fine for some workloads (graphics, some ML inference), dangerous for others (finance, scientific publication, cryptography, audit trails).
9) What should incident response look like for correctness bugs?
Like a security incident: freeze evidence, bound impact, contain by environment class, communicate clearly, and only then optimize. Correctness is an integrity property, not a performance metric.
Practical next steps
The Pentium FDIV bug isn’t just retro computing gossip. It’s a durable operational lesson: correctness failures don’t announce themselves, and “rare” is not the same as “safe.” If you run production systems where numbers become decisions, you need a plan that treats compute integrity as a first-class reliability concern.
- Inventory your compute fleet by CPU model/stepping and microcode; track it like a dependency.
- Create validated node classes and schedule correctness-critical workloads onto them intentionally.
- Establish a strict math build profile and ban unsafe flags by default.
- Add a numerical fingerprint gate for key jobs: fixed seed inputs, tolerated envelope, provenance metadata.
- Build a correctness on-call runbook that starts with cross-host comparison and ends with containment, not speculation.
If you only optimize for speed, you will eventually ship fast nonsense. The FDIV bug made that lesson public. You don’t need a worldwide embarrassment to learn it privately.