Somewhere between “the CPUs are at 40%” and “the p99 is on fire,” SMT becomes the quiet suspect in the room.
If you run production systems, you’ve seen this movie: latency spikes that don’t line up with load, noisy neighbors that aren’t noisy enough to show up in dashboards, and a well-meaning optimization that turns into a slow-motion outage.
Simultaneous Multithreading (SMT) on AMD—often treated as “Intel’s Hyper-Threading but on AMD”—stopped being a checkbox years ago.
It’s now a real tuning lever: sometimes a free throughput bump, sometimes a tail-latency tax, sometimes a security/compliance argument, and always a scheduling problem wearing a performance hat.
What SMT really is (and what it is not)
SMT lets one physical core present multiple logical CPUs to the operating system. On most modern AMD server parts,
that’s 2 threads per core (SMT2). The scheduler sees two “CPUs.” The core sees one set of execution resources, shared
between two architectural states.
That last sentence is the operational crux: two threads share something. They do not magically become two cores.
In practice, they share front-end resources, execution units, caches at various levels, and—critically—contention paths
you only notice at p95/p99. SMT is a bet that when one thread is stalled (cache miss, branch mispredict, memory wait),
the other thread can use otherwise idle units.
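If you want to see that sharing on a real host, lscpu can print the logical-to-physical mapping directly (assuming a reasonably recent util-linux); two rows that report the same CORE value are SMT siblings of one physical core:
cr0x@server:~$ lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE
The exact numbering varies by platform and firmware, which is exactly why you read it instead of assuming it.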
SMT is not “free cores”
The OS will happily schedule two busy threads onto one physical core’s siblings if you let it.
Throughput might rise, latency might get worse, and the CPU graphs will smugly report “not saturated.”
This is how you end up with CPUs “only” 60% busy while your customers are refreshing the page like it’s 2009.
SMT is a scheduling contract you didn’t know you signed
Enabling SMT changes the topology the OS sees. It changes how you pin vCPUs, how you partition workloads, how you read
“CPU utilization,” and how you interpret perf counters. It can also change the risk profile for side-channel attacks
and the mitigations you have to carry.
One dry truth: if you don’t explicitly decide how SMT siblings are used, Linux will decide for you, and Linux optimizes
for throughput and fairness—sometimes at the expense of your tail latency.
How AMD made SMT competitive: from “cute” to “credible”
For a long time, “SMT” in the public mind meant Intel Hyper-Threading. AMD had multithreading stories before Zen, but
Zen (and EPYC in particular) is where AMD SMT became operationally relevant in the same conversations as Intel.
The important shift wasn’t marketing. It was architecture meeting platform reality: higher core counts, better per-core
performance, and a server ecosystem that suddenly took AMD seriously. When you deploy racks of EPYC, you stop treating
SMT as a footnote. You start treating it like a policy.
Why “real rival” happened
- Core counts forced the question. With many-core servers, you can choose between “more threads now” and “more isolation.”
- Schedulers matured. Linux topology awareness improved; admins learned to respect CPU sibling relationships.
- Cloud made it everyone’s problem. Multi-tenant contention and noisy-neighbor effects turn SMT into a business decision.
- Security got loud. SMT became part of threat models; some orgs disabled it by default after major speculative execution disclosures.
The net: AMD SMT isn’t a knockoff. It’s a meaningful knob with familiar tradeoffs—but the details of EPYC topology (CCD/CCX eras,
NUMA layouts, cache behavior) mean you can’t copy-paste Intel-era rules and expect happiness.
Facts & history you can use at a whiteboard
Concrete context points—useful for explaining why your “just enable SMT” plan needs a test plan attached.
- Intel shipped Hyper-Threading broadly in the early 2000s. It shaped the industry’s mental model of SMT for years.
- AMD’s Bulldozer-era “module” design wasn’t SMT. It shared some resources between two integer cores; it confused buyers and benchmark charts.
- Zen (2017) brought mainstream AMD SMT to servers. EPYC made SMT a default expectation in AMD datacenters.
- Early EPYC generations exposed complex topology. NUMA behavior and cache domains mattered more than “threads per core.”
- Linux schedulers got better at packing and spreading. Topology-aware scheduling reduced some classic HT/SMT pathologies, but not all.
- Speculative execution disclosures changed defaults. Post-2018, many enterprises revisited SMT policies due to cross-thread leakage risk.
- Some hyperscalers chose SMT-on for throughput, SMT-off for certain tiers. The “one size fits all” era ended.
- Performance gains from SMT are workload-specific. For many server workloads, SMT yields modest throughput gains; for others it’s negative under contention.
If you need a single mental model: SMT is best when one thread leaves bubbles in the pipeline and the other can fill them.
SMT is worst when both threads want the same hot resources at the same time.
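If you want to check how "bubbly" a workload actually is before betting on SMT, perf exposes generic stall counters; they are not available on every CPU or kernel (you may see <not supported>), in which case fall back to plain IPC as in Task 11 below:
cr0x@server:~$ sudo perf stat -a -e cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend -- sleep 10
A workload that spends a large share of cycles stalled is the kind SMT was invented for; one that retires close to its peak is the kind that will fight its sibling.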
Performance reality: where SMT helps, hurts, and lies to you
Where SMT usually helps
SMT tends to help when you have a lot of independent work that stalls frequently: request-heavy services with I/O waits,
mixed instruction streams, branchy code, cache misses, and enough parallelism to keep the machine busy.
Throughput-oriented systems (batch processing, some web tiers, background queue consumers) often see benefits.
Where SMT often hurts
SMT can hurt when you care about consistent per-request latency, or when the workload is already highly optimized and
saturates execution resources. The sibling thread becomes contention, not help.
Classic victims: low-latency trading-ish systems (even if you’re not trading), busy OLTP databases, and storage paths
where lock contention and CPU cache behavior matter.
The lie: “CPU utilization is low, so CPU isn’t the bottleneck”
With SMT, you can have a core with two siblings each at 50% “utilization” while the physical core is effectively saturated
on a critical resource. You see idle time because the measurement is per logical CPU, not per physical resource.
If your p99 worsens after enabling SMT, don’t argue with the graph. Argue with the scheduler and the counters.
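One signal that is not fooled by per-logical-CPU accounting is the kernel's CPU pressure stall information (PSI), available on most modern kernels built with CONFIG_PSI:
cr0x@server:~$ cat /proc/pressure/cpu
The "some" line reports the share of wall-clock time during which at least one runnable task was waiting for a CPU. If that climbs while CPU% still looks comfortable, believe PSI, not the utilization graph.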
Resource contention you should actually care about
- L1/L2 pressure and instruction front-end contention. Two hot threads fight over the same tiny fast things.
- Shared execution units. If both threads are heavy on the same unit types, SMT becomes self-harm.
- Cache and memory bandwidth. SMT can increase outstanding memory requests, which can help or saturate the fabric.
- Lock contention and spin loops. SMT can turn “a little spinning” into “two threads burning the same core.”
- Kernel overhead and interrupts. Bad IRQ placement plus SMT can inject jitter into your “dedicated” cores.
One quote worth keeping on your wall, because it applies to SMT decisions as much as it does to incident response:
Hope is not a strategy.
—General James N. Mattis
The operational version: measure, change one variable, and be suspicious of improvements that only show up in averages.
Joke #1: SMT is like adding a second driver to a forklift. Sometimes you move more pallets; sometimes you just argue louder in the same aisle.
Fast diagnosis playbook (find the bottleneck fast)
When a service slows down and SMT is involved, you need a deterministic triage order. Here’s the fastest path I’ve found
that avoids losing an afternoon to vibes.
1) Confirm topology and SMT state
- How many sockets? How many cores per socket? Is SMT enabled?
- Are you looking at “vCPU count” or “physical cores” in your mental math?
2) Identify whether you’re throughput-bound or latency-bound
- If p95/p99 is the pain: assume contention/jitter until proven otherwise.
- If queue backlogs and total throughput are the pain: SMT might help, but only if you’re not saturating memory/locks.
3) Check run queue and CPU pressure (not just CPU%)
- Run queue > number of physical cores: CPU scheduling pressure is real.
- High iowait doesn’t mean “disk is slow”; it means CPUs sat idle while tasks waited on I/O, which can also mean the CPU is underfed rather than overloaded.
4) Look for sibling contention
- Are two heavy threads co-scheduled on siblings? (A quick check is sketched right after this list.)
- Are IRQs hitting the same physical cores you pinned “latency critical” workloads to?
5) Use counters for confirmation
- Context switches, migrations, cache miss rates, stalled cycles.
- Compare SMT-on vs SMT-off on the same host class if you can.
6) Decide: policy change or pinning change?
- If you need predictable latency: prefer SMT off or strict sibling isolation.
- If you need throughput and can tolerate jitter: prefer SMT on with sane placement.
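For step 4, the quickest sanity check is to see which CPUs your busy threads are actually running on and compare those numbers against the sibling map from Task 3 below. A minimal sketch (the PID is the same hypothetical service used in the tasks that follow; PSR is the CPU each thread last ran on):
cr0x@server:~$ ps -L -o pid,tid,psr,pcpu,comm -p 24891
If two high-%CPU threads keep landing on CPUs that share a physical core, you have found your contention before opening a single dashboard.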
Practical tasks: commands, outputs, decisions (12+)
These are the tasks you run during a performance investigation or a platform readiness review. Each includes:
command, a plausible output snippet, what it means, and what you decide next.
Task 1: Check SMT enabled/disabled at the kernel interface
cr0x@server:~$ cat /sys/devices/system/cpu/smt/control
on
Meaning: Kernel is allowing SMT siblings online.
Decision: If you’re troubleshooting tail latency, keep this in mind; you may test off or sibling pinning.
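If the troubleshooting plan includes an SMT-off test, the same sysfs interface accepts writes at runtime; a sketch for a controlled experiment on one drained host (not something to run casually under load), reverted by writing on:
cr0x@server:~$ echo off | sudo tee /sys/devices/system/cpu/smt/control
cr0x@server:~$ cat /sys/devices/system/cpu/smt/active
0
The active file reports whether siblings are currently running; control also accepts forceoff if you want to prevent re-enabling until the next boot.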
Task 2: Verify how many threads per core you actually have
cr0x@server:~$ lscpu | egrep 'Socket|Core|Thread|CPU\(s\)'
CPU(s): 128
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Meaning: 64 physical cores, 128 logical CPUs. SMT is in play.
Decision: Any “CPU count” you use in capacity planning must be explicit: physical vs logical.
Task 3: Map SMT sibling pairs (who shares a core with whom)
cr0x@server:~$ for c in 0 1 2 3; do echo -n "cpu$c siblings: "; cat /sys/devices/system/cpu/cpu$c/topology/thread_siblings_list; done
cpu0 siblings: 0,64
cpu1 siblings: 1,65
cpu2 siblings: 2,66
cpu3 siblings: 3,67
Meaning: CPU0 shares a physical core with CPU64, etc.
Decision: When pinning, treat siblings as one shared resource. Put competing heavy threads on different cores, not just different logical CPUs.
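The four-CPU loop is fine for a spot check; for inventory, capture the whole map in one pass (on newer kernels the same information also appears as core_cpus_list):
cr0x@server:~$ grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
Store the result with the host’s hardware metadata so pinning decisions don’t depend on someone re-deriving the topology mid-incident.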
Task 4: Confirm NUMA layout (SMT decisions are topology decisions)
cr0x@server:~$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0-31,64-95
node 0 size: 257842 MB
node 0 free: 198442 MB
node 1 cpus: 32-63,96-127
node 1 size: 257881 MB
node 1 free: 201113 MB
Meaning: Two NUMA nodes. Note the CPU lists: each node holds 32 physical cores plus their SMT siblings (consistent with the 0/64 pairing from Task 3), so node boundaries are not a single contiguous range.
Decision: If a latency-sensitive app spans NUMA nodes unintentionally, fix NUMA placement before blaming SMT.
Task 5: Observe run queue and CPU pressure quickly
cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
71 0 0 398123312 22120 9132200 0 0 1 12 18420 29100 58 12 28 2 0
78 0 0 398100100 22120 9132240 0 0 0 0 18510 29802 64 10 24 2 0
86 0 0 397998012 22120 9132400 0 0 0 24 18732 30401 69 11 18 2 0
Meaning: r (runnable tasks) exceeds the 64 physical cores on this box and context switches are elevated; that is real scheduling pressure even though idle time is non-zero.
Decision: Investigate scheduling contention and pinning. High cs with SMT can amplify jitter.
Task 6: Check per-CPU utilization to spot overloaded siblings
cr0x@server:~$ mpstat -P 0,64 1 1 | grep Average
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %idle
Average: 0 78.20 0.00 9.10 0.20 0.00 0.40 0.00 12.10
Average: 64 72.10 0.00 8.90 0.20 0.00 0.30 0.00 18.50
Meaning: Both siblings of one core are hot. That physical core is likely contention-heavy.
Decision: Spread heavy worker threads across physical cores first; don’t pack them onto siblings.
Task 7: Identify context-switch and preemption churn (bad for cache locality and latency)
cr0x@server:~$ pidstat -w 1 3
Linux 6.5.0-25-generic (server) 01/10/2026 _x86_64_ (128 CPU)
12:10:01 UID PID cswch/s nvcswch/s Command
12:10:02 1001 24891 120.00 40.00 myservice
12:10:03 1001 24891 131.00 52.00 myservice
12:10:04 1001 24891 125.00 48.00 myservice
Meaning: Sustained involuntary context switches (nvcswch/s) mean the service is being preempted; voluntary ones just mean it blocks and yields. For actual cross-CPU migrations, confirm with perf stat -e cpu-migrations.
Decision: If this is latency-critical, consider CPU affinity, cgroup isolation, and avoiding SMT sibling sharing.
Task 8: Inspect CFS bandwidth throttling (a common “SMT made it worse” trap)
cr0x@server:~$ cat /sys/fs/cgroup/my.slice/cpu.stat
usage_usec 982331200
user_usec 811100000
system_usec 171231200
nr_periods 38122
nr_throttled 4920
throttled_usec 91822100
Meaning: The workload is being throttled under CFS quota. SMT can hide or exacerbate this by changing perceived capacity.
Decision: Fix quotas/requests first. SMT policy should not compensate for incorrect cgroup CPU limits.
Task 9: Check IRQ distribution (interrupts stealing your “dedicated” cores)
cr0x@server:~$ head -n 10 /proc/interrupts
CPU0 CPU1 CPU2 CPU3
24: 90122 1021 1120 1033 PCI-MSI 524288-edge nvme0q0
25: 88211 998 1099 1010 PCI-MSI 524289-edge nvme0q1
26: 87440 1002 1111 999 PCI-MSI 524290-edge nvme0q2
Meaning: CPU0 is handling most NVMe interrupts. If CPU0 is a sibling of a latency-critical worker, enjoy your jitter.
Decision: Rebalance IRQ affinity away from critical cores; don’t let SMT siblings share IRQ-heavy CPUs.
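To actually move an interrupt, the per-IRQ affinity files accept a CPU list; a sketch using IRQ 24 from the output above, with irqbalance stopped first so it doesn’t immediately undo the change. Caveat: many NVMe and NIC queue interrupts use kernel-managed affinity, in which case the write is rejected and you steer them via driver options or by reserving CPUs instead:
cr0x@server:~$ sudo systemctl stop irqbalance
cr0x@server:~$ echo 32-47 | sudo tee /proc/irq/24/smp_affinity_list
cr0x@server:~$ cat /proc/irq/24/effective_affinity_list
The 32-47 range is only an example; pick CPUs that are neither your latency cores nor their SMT siblings.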
Task 10: Verify kernel mitigations that interact with SMT/security posture
cr0x@server:~$ grep -E 'mitigations|nosmt|spectre|mds|l1tf' /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.5.0-25-generic root=/dev/mapper/vg0-root ro mitigations=auto
Meaning: Mitigations are enabled automatically; SMT is not explicitly disabled via nosmt.
Decision: Align with your security policy. Some orgs require SMT off; others accept mitigations and isolation.
Task 11: Compare throughput and stalls with perf counters
cr0x@server:~$ sudo perf stat -a -e cycles,instructions,cache-misses,context-switches -- sleep 10
Performance counter stats for 'system wide':
78,221,990,112 cycles
96,311,022,004 instructions # 1.23 insn per cycle
221,884,110 cache-misses
9,882,110 context-switches
10.004178002 seconds time elapsed
Meaning: IPC, cache misses, and context switches provide a sanity check. If IPC drops and cache misses rise after enabling SMT, you’re paying contention tax.
Decision: If p99 is bad and IPC/caches look worse with SMT on, test SMT off or enforce sibling isolation.
Task 12: Check if the kernel has offline’d sibling CPUs (partial SMT disable)
cr0x@server:~$ for c in 64 65 66 67; do echo -n "cpu$c online="; cat /sys/devices/system/cpu/cpu$c/online; done
cpu64 online=1
cpu65 online=1
cpu66 online=1
cpu67 online=1
Meaning: Siblings are online. If you see 0, SMT may be effectively reduced via CPU offlining.
Decision: Standardize: either manage SMT via BIOS/kernel options or via controlled CPU sets—don’t leave a half-configured state across a fleet.
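For a reversible single-host experiment, siblings can also be taken offline individually; a minimal sketch that offlines the siblings of cpu0-3 in this topology, undone by writing 1 back (or by rebooting to the BIOS/kernel-defined state):
cr0x@server:~$ for c in 64 65 66 67; do echo 0 | sudo tee /sys/devices/system/cpu/cpu$c/online; done
This is for experiments, not fleet policy; at scale, standardize on the BIOS setting or a kernel parameter as the Decision above says.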
Task 13: Pin a process away from SMT siblings (quick experiment)
cr0x@server:~$ sudo taskset -cp 4-31 24891
pid 24891's current affinity list: 0-127
pid 24891's new affinity list: 4-31
Meaning: The process is restricted to a subset of CPUs (ideally physical-core-aligned).
Decision: If latency improves, you don’t necessarily need SMT off—you need better placement.
Task 14: Confirm your CPU set doesn’t accidentally include sibling pairs
cr0x@server:~$ sudo cset shield --cpu=2-15 --kthread=on
cset: --> shielding CPUs: 2-15
cset: --> kthread shield set to: on
Meaning: You created a CPU shield (via cpuset) for isolation. But the shield covers logical CPUs 2-15 only; in this topology their SMT siblings (66-79) sit outside it, so unshielded tasks can still share those physical cores.
Decision: Use thread_siblings_list mapping to build cpusets that don’t put two heavy tasks on the same core.
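A small hygiene check, sketched for the 2-15 set above: print the (socket, core) pair for every CPU in the set and look for duplicates; the socket ID is included because core IDs repeat across sockets. Any duplicate means two logical CPUs in the set share a physical core:
cr0x@server:~$ for c in $(seq 2 15); do echo "$(cat /sys/devices/system/cpu/cpu$c/topology/physical_package_id):$(cat /sys/devices/system/cpu/cpu$c/topology/core_id)"; done | sort | uniq -d
No output means the set is physical-core-clean; any line printed is a sibling collision to resolve before calling the shield “isolation.”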
Joke #2: Turning SMT off to fix latency without measuring is like rebooting a router to fix DNS—sometimes it works, and you learn nothing.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-size SaaS company migrated a core API tier from older Intel servers to shiny AMD EPYC hosts. The goal was simple:
reduce cost per request. They kept the same Kubernetes CPU requests/limits and treated “vCPU” as “core,” because the old
world mostly behaved that way.
The rollout looked fine in staging. In production, p99 latency doubled under peak load. CPU graphs were boring: 55–65%,
nothing scary. The on-call team chased database queries, then network, then GC tuning. The service was “not CPU bound,”
so nobody touched CPU policy.
The mistake: they assumed two logical CPUs were equivalent to two physical cores for their workload. In reality, their
request handlers were already CPU-hot and branchy, and they’d placed multiple busy pods such that siblings were competing
constantly. Linux was being fair, and the workload was being punished.
The fix wasn’t dramatic. They mapped sibling pairs, adjusted CPU Manager policy for the latency tier, and ensured
“guaranteed” pods received whole cores (and not siblings shared with other heavy pods). They also stopped interpreting
“65% CPU” as “headroom.” After the change, p99 returned near baseline with slightly lower throughput than the SMT-packed
configuration—but it was predictable, which is what customers pay for.
The lesson: the incident wasn’t “AMD SMT is bad.” The incident was “we didn’t define what a CPU means in our platform.”
Once you’re wrong about units, every dashboard lies politely.
Mini-story 2: The optimization that backfired
A data platform team ran a mixed workload on large EPYC boxes: a streaming ingest service, a compression-heavy batch job,
and a storage gateway. Someone noticed the batch job only ran at night. They decided to “borrow” CPU capacity during the day
by enabling SMT and increasing worker counts on the ingest service. More threads, more throughput, right?
The first day looked great—average throughput improved and CPU utilization rose. The second day, customer-facing dashboards
started timing out intermittently. Not constantly. Just enough to be infuriating. The storage gateway had sporadic latency
spikes, the kind that don’t show up in averages but wreck anything with timeouts.
Root cause: the ingest service’s extra threads were landing on SMT siblings of the gateway’s critical workers. The gateway
had periodic CPU-intensive bursts (checksums, encryption, metadata churn). Under SMT contention, those bursts got longer,
and the queueing cascaded. The system didn’t fail loudly; it failed like a bureaucracy—slow, inconsistent, and impossible
to blame on one metric.
The rollback restored stability. Then they reintroduced the change properly: they isolated the gateway to whole physical
cores, moved ingest to other cores, and capped ingest workers based on memory bandwidth and lock contention rather than
“threads available.”
The lesson: throughput wins that steal from tail latency are not wins. SMT is a shared-hardware deal; if you don’t set
boundaries, it will set them for you, badly.
Mini-story 3: The boring but correct practice that saved the day
A financial services team ran two classes of compute nodes: general-purpose and latency-sensitive. Their policy looked
unglamorous: SMT on for general-purpose, SMT off (or sibling-isolated) for latency nodes. Every change required an A/B
test with a fixed workload replay, perf counters captured, and a p99 acceptance gate.
During a major OS upgrade, a kernel change adjusted scheduling behavior enough that some workloads began migrating more
aggressively between CPUs. The difference was subtle; most dashboards didn’t notice. But the latency acceptance tests did.
Before the change hit full production, the team flagged a p99 regression on the latency class.
They didn’t argue about theory. They compared perf stats, context switch rates, and migration counts between the old and
new kernels, then pinned the critical service more tightly and adjusted IRQ affinity. The upgrade proceeded without a
customer-visible incident.
The lesson: “boring gates” and repeatable tests beat cleverness. SMT interacts with schedulers, kernels, and microcode.
If you don’t have a harness, you don’t have control—you have hope.
Common mistakes: symptom → root cause → fix
1) Symptom: p99 latency spikes after enabling SMT, but CPU% looks fine
Root cause: Two hot threads co-scheduled on SMT siblings; logical CPU utilization hides physical contention.
Fix: Pin latency-critical workers to whole physical cores or isolate siblings; validate with sibling mapping and perf counters.
2) Symptom: “More vCPUs” in VMs didn’t increase throughput
Root cause: vCPUs landed on siblings or oversubscribed cores; the host is contention-bound (cache/memory/locks).
Fix: Use CPU pinning with awareness of siblings; reduce vCPU count to match physical cores for critical VMs; measure run queue and IPC.
3) Symptom: Kubernetes Guaranteed pods still show jitter
Root cause: Exclusive CPUs weren’t truly exclusive—siblings shared with other workloads, or IRQs landed on the same cores.
Fix: Use static CPU Manager with whole-core allocation and ensure IRQ affinity avoids those cores.
4) Symptom: Disabling SMT improved latency but tanked throughput too much
Root cause: You used SMT as a blanket policy rather than per-tier placement; some workloads benefit from SMT and some don’t.
Fix: Split node pools: SMT-on for throughput tiers, SMT-off or sibling-isolated for latency tiers. Don’t mix without explicit pinning.
5) Symptom: High context switches and migrations, unstable performance
Root cause: Scheduler churn amplified by too many runnable threads and no affinity constraints; SMT increases the scheduling surface area.
Fix: Reduce worker thread counts; pin or constrain CPU sets; check cgroup throttling; tune IRQ placement.
6) Symptom: Security team demands SMT off “everywhere”
Root cause: Policy reacting to cross-thread side-channel risk; engineering didn’t provide a risk-tiered alternative.
Fix: Define workload tiers and isolation boundaries; for multi-tenant or high-risk workloads, SMT off may be correct. For dedicated hosts, SMT on may be acceptable.
7) Symptom: Storage nodes show periodic latency cliffs
Root cause: Interrupt storms or kernel threads contending with app threads on the same physical cores; SMT siblings get hit too.
Fix: Audit /proc/interrupts, rebalance IRQs, reserve cores for IO paths, and avoid pinning app workers to IRQ-heavy siblings.
Checklists / step-by-step plan
Checklist A: Deciding whether SMT should be enabled on a node class
- Classify the workload. Throughput tier or latency tier? If you can’t answer, it’s latency tier by default.
- Measure baseline. Capture p50/p95/p99, error rates, run queue, IPC, cache misses.
- Change one thing. Toggle SMT or enforce sibling isolation—not both at once.
- Re-run the same load. Replay production traffic or use a stable synthetic workload.
- Gate on p99 and saturation signals. Averages are not allowed to approve platform policy.
- Document the meaning of “CPU.” In quotas, capacity models, and dashboards: logical vs physical.
Checklist B: Rolling out SMT policy changes safely
- Pick a single host class and a single region/cluster.
- Enable detailed host metrics: per-CPU, run queue, migrations, IRQ distribution.
- Canary with a small percentage and compare cohorts.
- Verify scheduling placement: siblings not double-booked for critical services.
- Validate security posture: mitigations, kernel cmdline, compliance sign-off.
- Rollback plan: BIOS setting or kernel parameter, and automation to enforce the desired state.
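If the policy (or the rollback) is “SMT off at boot,” the kernel parameter route avoids a BIOS round-trip; a sketch for GRUB-based distributions (paths and the regeneration command vary, e.g. grub2-mkconfig on some families; nosmt keeps siblings offline at boot):
cr0x@server:~$ grep GRUB_CMDLINE_LINUX /etc/default/grub
cr0x@server:~$ # append nosmt (or mitigations=auto,nosmt) to GRUB_CMDLINE_LINUX, then:
cr0x@server:~$ sudo update-grub && sudo reboot
Whichever mechanism you pick, have automation assert the expected state so a reimaged or replaced host doesn’t silently drift.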
Checklist C: If you keep SMT on, enforce sibling hygiene
- Map sibling pairs once per platform and keep it in inventory metadata.
- Pin critical workloads to physical cores, not arbitrary CPU IDs.
- Separate IRQ-heavy CPUs from latency cores.
- Limit noisy workloads by cgroups/quotas, and validate you’re not throttling your “important” pods.
- Test under contention, not just under average load.
FAQ
1) Is AMD SMT basically the same as Intel Hyper-Threading?
Conceptually yes: two hardware threads share one physical core’s execution resources. Operationally, you still need to
treat siblings as shared capacity. Differences show up in topology, cache/fabric behavior, and platform defaults.
2) Should I disable SMT on AMD EPYC for databases?
If you run latency-sensitive OLTP and p99 matters, test SMT off or strict sibling isolation. If you run throughput-heavy
analytical queries, SMT can help. Don’t decide by hearsay—decide by p99 and counters.
3) Why does CPU utilization look lower with SMT on?
Because utilization is reported per logical CPU. Two siblings at 50% can still mean the physical core is fully contested.
Use run queue, IPC, and latency metrics to interpret “headroom.”
4) Can I keep SMT on but get predictable latency?
Sometimes. The play is isolation: pin critical workloads to physical cores and keep noisy neighbors off sibling threads.
Also keep IRQs away from those cores. This is harder than “SMT off,” but it preserves throughput for other tiers.
5) What’s the fastest way to see SMT sibling pairs on Linux?
Read /sys/devices/system/cpu/cpu*/topology/thread_siblings_list. It’s authoritative for how the kernel sees sibling groups.
6) Does SMT increase security risk?
SMT can increase exposure to certain cross-thread side-channel attacks because two threads share core resources. Many orgs
handle this by disabling SMT for multi-tenant or high-risk workloads, while keeping it for dedicated hosts with mitigations.
7) In Kubernetes, why do “Guaranteed” pods still contend?
Because “Guaranteed” usually means you get dedicated logical CPUs, not necessarily dedicated physical cores.
If another workload lands on the sibling, you still share the core. Use static CPU Manager and topology-aware allocation.
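A sketch of the relevant kubelet settings (upstream names; full-pcpus-only requires a reasonably recent Kubernetes release, and where these land depends on how your distribution manages kubelet configuration):
--cpu-manager-policy=static
--cpu-manager-policy-options=full-pcpus-only=true
--reserved-cpus=0,64
With full-pcpus-only, exclusive allocations are rounded to whole physical cores, which is exactly the sibling hygiene discussed above; reserved-cpus keeps system daemons (here the hypothetical pair 0 and 64) off the cores you hand to pods.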
8) How do I decide between “SMT off” and “pinning/isolating”?
If you have the operational maturity to enforce pinning and monitor drift, isolation is often better for mixed fleets.
If you need simple, robust predictability for a dedicated tier, SMT off is a clean policy.
9) Why did enabling SMT increase context switches?
More logical CPUs can increase scheduling opportunities and migration patterns, especially if you also increased worker
counts. Often the fix is fewer threads, better affinity, and avoiding oversubscription.
Conclusion: what to do Monday morning
AMD SMT stopped being an “Intel-ish feature” the moment EPYC became a serious production default. Treat it like a platform
policy, not a BIOS trivia question. SMT can buy you throughput, but it will happily charge you in tail latency if you let
heavy threads pile onto siblings.
Next steps that actually move the needle:
- Inventory your fleet: SMT state, cores/sockets, NUMA layout. Make “physical vs logical CPU” explicit in dashboards.
- Pick one latency-critical service and run a controlled SMT-on vs SMT-off experiment with identical load.
- If SMT stays on, enforce sibling hygiene: pinning, CPU sets, and IRQ affinity that respects physical cores.
- Write the policy down: which node classes are SMT-on, which are SMT-off, and what acceptance gates decide changes.
If you take only one idea: SMT is not about winning benchmarks. It’s about choosing where you want contention to happen—and then enforcing that choice.