Resizable BAR / SAM: the small toggle that can boost big

You ship a new GPU into a workstation fleet, run benchmarks, and… nothing. Or worse: a handful of machines get faster, a few get weird stutters,
and one decides it only boots on Tuesdays. The ticket title says “graphics performance regression,” but the root cause is a BIOS toggle most people
treat like decorative trim.

Resizable BAR (and AMD’s marketing name for it, Smart Access Memory / SAM) is exactly that kind of toggle: small, easy to misconfigure, and capable
of moving real workloads. It’s also a great way to discover just how many layers exist between “app wants texture” and “bytes arrive at the GPU.”

What Resizable BAR/SAM actually changes

PCI Express devices expose memory-mapped regions described by BARs: Base Address Registers. They’re how the CPU (and OS) maps a device’s registers and
sometimes chunks of device memory into the system’s address space. Historically, GPUs exposed only a small “aperture” of their VRAM to the CPU at
any one time—commonly 256 MB—so the CPU could only directly map and access a window into VRAM. If the CPU needed to touch a different part of VRAM,
the window had to be remapped.

Resizable BAR is a PCIe capability that allows the system firmware/OS to negotiate a larger BAR size for the device—potentially mapping much more of
the GPU’s VRAM into the CPU’s address space at once. That can reduce remapping overhead and improve throughput for workloads that involve CPU-driven
uploads, streaming assets, and lots of small transfers. Sometimes it does nothing. Sometimes it matters.
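
If you want to see that window concretely, sysfs exposes each BAR's address range. A minimal sketch, assuming the GPU sits at PCI address 0000:01:00.0 (an illustrative address, not a given):

#!/usr/bin/env bash
# Sketch: print each memory BAR of a GPU and its size, read from sysfs.
# 0000:01:00.0 is an assumed address; find yours with: lspci -D | grep -E "VGA|3D"
GPU=0000:01:00.0
while read -r start end flags; do
  [ "$start" = "0x0000000000000000" ] && continue   # skip unused BAR slots
  size=$(( end - start + 1 ))
  printf '%s  size=%d MiB  flags=%s\n' "$start" $(( size / 1048576 )) "$flags"
done < "/sys/bus/pci/devices/$GPU/resource"

A legacy setup typically shows a prefetchable region of roughly 256 MiB; with Resizable BAR negotiated, that region grows to multiple GiB.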

AMD’s Smart Access Memory (SAM) is not a different technology; it’s AMD’s packaging of Resizable BAR + platform validation + a BIOS checkbox that
makes you feel like you unlocked something. On NVIDIA, it’s still Resizable BAR, typically enabled via BIOS plus driver/game profiles.

Here’s the practical mental model: Resizable BAR doesn’t make the GPU faster. It changes how efficiently the CPU can address and feed GPU memory.
If your bottleneck is shader throughput, ray tracing cores, or the GPU already saturating its own memory bandwidth, BAR size won’t save you.
If your bottleneck is “CPU is doing a lot of tiny VRAM interactions while streaming,” BAR size can move the needle.

The less-marketing explanation: window size and remaps

Without Resizable BAR, the CPU can see a small window of VRAM. Imagine a warehouse with a mail slot: you can pass boxes through, but only one at a
time and you keep swapping which shelf the slot points at. Resizable BAR makes the opening bigger, sometimes big enough to see the whole warehouse.
The handling overhead goes down. Whether that turns into FPS depends on whether “handling overhead” was actually your limiting factor.

One quote, because operations people need a spine

Hope is not a strategy. — traditional SRE saying (popularized by Google’s SRE book)

Facts & context: why this exists

You don’t get a BIOS toggle like this without a long trail of compromises. Here are concrete facts and historical breadcrumbs worth knowing
before you start flipping switches in production.

  1. BARs predate modern GPUs. They date back to conventional PCI, when “device memory” often meant register windows and small buffers, not 24 GB of VRAM.
  2. 256 MB became a de facto GPU aperture. For years, many platforms mapped a 256 MB prefetchable BAR for GPUs, which was “fine” when VRAM sizes and CPU↔GPU traffic patterns were different.
  3. Resizable BAR is a PCIe capability, not a vendor invention. The mechanism exists in PCIe, but widespread consumer enablement lagged behind because firmware, OS, and driver stacks all had to behave.
  4. “Above 4G Decoding” is about address space, not performance. It allows firmware to allocate PCIe MMIO space above the 4 GB boundary, which is essential when large BARs collide with limited 32-bit MMIO windows (see the quick check after this list).
  5. UEFI matters because the boot pipeline allocates resources. Legacy/CSM boot paths and older option ROM expectations often make large, flexible MMIO allocations painful or impossible.
  6. Server boards cared earlier than gamer boards. High-end platforms that routinely map huge MMIO regions (HBAs, NICs with SR-IOV, accelerators) pushed the ecosystem toward saner allocation behavior.
  7. Drivers gate behavior aggressively. NVIDIA historically enabled Resizable BAR per-game/profile because “works on paper” isn’t the same as “works on 500 engines with 500 quirks.”
  8. Virtualization complicates it. Passthrough, IOMMU groups, and firmware allocation inside VMs can prevent large BAR mappings even if the host supports them.
  9. It’s not just gaming. Some content creation and compute pipelines benefit when CPU-side staging and GPU-side memory management interact heavily.
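
On point 4 above: a quick way to see whether your GPU's MMIO actually landed above 4 GB is to read /proc/iomem. A minimal sketch, assuming the GPU is at 0000:01:00.0 (an illustrative address) and run with root, since /proc/iomem hides real addresses from unprivileged users:

GPU=0000:01:00.0
sudo grep "$GPU" /proc/iomem | while read -r range _; do
  start=0x${range%%-*}
  if [ "$(( start ))" -ge "$(( 1 << 32 ))" ]; then
    echo "$range : above 4 GiB (Above 4G Decoding doing its job)"
  else
    echo "$range : below 4 GiB (still inside the legacy 32-bit window)"
  fi
done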

When it helps (and when it won’t)

Where Resizable BAR tends to help

  • Asset streaming heavy workloads: open-world games, large scenes, frequent texture/geometry streaming.
  • CPU-limited render submission with lots of transfers: the CPU spends time coordinating uploads and resource transitions.
  • Some creation apps: large project timelines, high-res textures, frequent cache swaps—depending on the app’s architecture.
  • Benchmarks that mimic real streaming: not just “max shader load,” but “load/evict/load again.”

Where it usually won’t help

  • Pure GPU-bound scenarios (high resolution + heavy shading) where the GPU is already the limiting factor.
  • Workloads dominated by GPU local memory bandwidth rather than CPU↔GPU transactions.
  • Very old engines or apps that don’t stress the CPU’s VRAM mapping behavior.
  • Systems bottlenecked by PCIe lane layout (x8 vs x16, chipset uplinks) or by storage I/O feeding assets.

Dry reality check: it’s a knob, not a miracle

Resizable BAR can produce meaningful uplifts in some titles and workflows, but it’s not a guaranteed “free performance” button. The reason it became
a hype cycle is simple: it’s one of the few platform-level changes that can show measurable gains without buying a new GPU. That doesn’t make it
universally beneficial; it makes it tempting.

Joke #1: Resizable BAR is like giving your GPU a bigger straw—great if you were straw-limited, useless if you were drink-limited.

Hard requirements and compatibility tripwires

In production environments—yes, even “production gaming rigs” in corporate labs—you want deterministic behavior. Resizable BAR has prerequisites.
If any of them aren’t met, you’ll get partial enablement, no enablement, or enablement plus weird side effects.

Platform requirements (the usual suspects)

  • UEFI boot (CSM off in most cases).
  • Above 4G Decoding enabled in BIOS/UEFI.
  • Resizable BAR enabled in BIOS/UEFI.
  • GPU VBIOS support for Resizable BAR.
  • Motherboard firmware that allocates MMIO sanely (this is where “latest BIOS” stops being optional).
  • OS + driver support (Windows and modern Linux kernels generally support it, but drivers may still gate usage).

Tripwires you should actively look for

  • Mixed GPU environments: iGPU enabled + dGPU + additional PCIe devices can push MMIO allocation into edge cases.
  • Many PCIe devices: NICs, HBAs, NVMe add-in cards, capture cards—MMIO space gets crowded.
  • Virtualization/passthrough: the guest may not be able to map large BARs, or the hypervisor might clamp them.
  • Old option ROMs: legacy ROM expectations can conflict with big BAR mappings.
  • Suspicious “enabled” UI: some firmwares show the toggle but fail to actually allocate a larger BAR.

Fast diagnosis playbook (first/second/third)

If you’re debugging a performance complaint and someone mentions Resizable BAR, you need to answer three questions quickly:
(1) is it actually enabled, (2) is the workload actually sensitive, and (3) is something else the bottleneck.

First: verify it’s enabled end-to-end

  • Check BIOS settings: Above 4G Decoding, Resizable BAR, CSM/Legacy boot.
  • In the OS, confirm the GPU BAR size is larger than the legacy window (often > 256M).
  • Confirm the driver recognizes it (NVIDIA Control Panel on Windows; sysfs/lspci on Linux). A combined one-shot check is sketched below.
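
A minimal sketch of that end-to-end check, assuming a single discrete GPU (adjust the lspci match if you have more than one):

cr0x@server:~$ GPU=$(lspci -D -nn | awk '/VGA|3D/ {print $1; exit}')
cr0x@server:~$ test -d /sys/firmware/efi && echo UEFI || echo Legacy
cr0x@server:~$ sudo lspci -s "$GPU" -vv | grep -E "Resizable BAR|prefetchable"

If the last command shows no Resizable BAR capability or only a small prefetchable region, stop here and fix firmware/boot prerequisites first.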

Second: confirm you’re measuring the right bottleneck

  • GPU utilization, VRAM utilization, CPU utilization per core.
  • Frame time consistency (stutter vs average FPS).
  • PCIe link width/speed (x16 Gen4 vs x8 Gen3 can dominate outcomes).

Third: look for resource allocation conflicts

  • dmesg/Windows event logs for PCI resource allocation warnings.
  • IOMMU/ACS quirks if passthrough is involved.
  • Other devices consuming MMIO space (multiple NVMe AICs, SR-IOV NICs).

If you can’t prove it’s enabled, stop arguing about benchmarks. If you can prove it’s enabled but nothing changes, assume you’re not BAR-limited.
Move on.

Practical tasks with commands, outputs, and decisions

These are the tasks I actually run when someone says “Resizable BAR is on” or “SAM broke my box.” Each task includes a realistic command,
sample output, what the output means, and the decision it drives. Commands are Linux-focused because Linux exposes the truth without UI drama.
You can adapt the logic to Windows tooling if you live there.

Task 1: Identify the GPU PCI address

cr0x@server:~$ lspci -nn | grep -E "VGA|3D"
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)

Meaning: The GPU is at 01:00.0. You’ll use this address for all deeper PCIe inspection.
Decision: Lock onto the correct device; don’t guess when multiple GPUs exist.

Task 2: Check BAR sizes and whether they look “large”

cr0x@server:~$ sudo lspci -s 01:00.0 -vv | grep -E "Region 0|Region 2|prefetchable"
Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
Region 2: Memory at 4000000000 (64-bit, prefetchable) [size=16G]

Meaning: A 16G prefetchable BAR is a strong indicator Resizable BAR is active and mapping a large VRAM aperture.
Legacy behavior is often ~256M. Your exact region numbers can vary.
Decision: If you still see only a small prefetchable region, go back to firmware/boot mode prerequisites.

Task 3: Confirm the Resizable BAR capability exists

cr0x@server:~$ sudo lspci -s 01:00.0 -vv | grep -n "Resizable BAR" -A6
214:	Capabilities: [bb0] Resizable BAR
215:		Resizable BAR: BAR 2: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB

Meaning: The device advertises Resizable BAR capability and negotiated a 16GB mapping.
Decision: If the capability is absent, you likely need a GPU VBIOS update or the GPU simply doesn’t support it.

Task 4: Check boot mode (UEFI vs legacy) quickly

cr0x@server:~$ test -d /sys/firmware/efi && echo UEFI || echo Legacy
UEFI

Meaning: You’re booted via UEFI.
Decision: If you get “Legacy,” disable CSM/Legacy boot and convert or reinstall the OS for UEFI boot; otherwise large-BAR enablement often fails.

Task 5: Confirm kernel saw “Above 4G” style resource allocation without errors

cr0x@server:~$ dmesg -T | grep -iE "pci.*resource|BAR|mmio" | tail -n 8
[Mon Jan 20 09:41:05 2026] pci 0000:01:00.0: BAR 2: assigned [mem 0x4000000000-0x43ffffffff 64bit pref]
[Mon Jan 20 09:41:05 2026] pci 0000:00:01.0: PCI bridge to [bus 01]
[Mon Jan 20 09:41:05 2026] pci_bus 0000:01: resource 0 [mem 0x4000000000-0x43ffffffff 64bit pref]

Meaning: The kernel successfully assigned a large 64-bit prefetchable window.
Decision: If you see “not enough MMIO resources” or BAR assignment failures, expect instability or BAR fallback.

Task 6: Validate PCIe link width and speed (the silent killer)

cr0x@server:~$ sudo lspci -s 01:00.0 -vv | grep -E "LnkCap:|LnkSta:"
LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L0s L1
LnkSta:	Speed 16GT/s (ok), Width x16 (ok)

Meaning: You’re running at Gen4 x16 equivalent (16GT/s) and full width. Good.
Decision: If this shows x8 or Gen3, fix seating, slot choice, BIOS lane bifurcation, risers, or chipset routing before blaming BAR.

Task 7: Check IOMMU state (relevant for passthrough and some platforms)

cr0x@server:~$ dmesg -T | grep -iE "IOMMU|DMAR|AMD-Vi" | head
[Mon Jan 20 09:41:03 2026] AMD-Vi: IOMMU performance counters supported
[Mon Jan 20 09:41:03 2026] AMD-Vi: IOMMU enabled

Meaning: IOMMU is enabled. Good for isolation, sometimes bad for latency-sensitive paths if misconfigured.
Decision: If you’re doing GPU passthrough and large BAR isn’t sticking, you may need hypervisor settings that allow large MMIO.

Task 8: Inspect sysfs resource mapping for the GPU

cr0x@server:~$ sudo cat /sys/bus/pci/devices/0000:01:00.0/resource
0x00000000f6000000 0x00000000f6ffffff 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000004000000000 0x00000043ffffffff 0x000000000004220c

Meaning: The third line shows a huge mapped region (0x4000… to 0x43ff…), consistent with a large BAR.
Decision: If the prefetchable region is small or missing, your enablement isn’t real.

Task 9: Confirm the driver loaded and which GPU is active

cr0x@server:~$ lsmod | grep -E "^nvidia|^amdgpu" | head
nvidia_drm             86016  3
nvidia_modeset       1241088  7 nvidia_drm
nvidia              62738432  340 nvidia_modeset

Meaning: NVIDIA kernel modules are loaded.
Decision: If you’re on the wrong driver (nouveau vs nvidia, or a mismatched amdgpu stack), BAR behavior and performance will be unpredictable.

Task 10: Check VRAM size and basic runtime stats (sanity)

cr0x@server:~$ nvidia-smi --query-gpu=name,pci.bus_id,memory.total,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, memory.total [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3090, 00000000:01:00.0, 24576 MiB, 4, 16

Meaning: Confirms you’re querying the right GPU and that PCIe link is healthy.
Decision: If the link gen/width is low under load, check power management or BIOS “PCIe speed” settings.

Task 11: Measure CPU vs GPU bound behavior during a run (quick and dirty)

cr0x@server:~$ sudo apt-get -y install sysstat >/dev/null
cr0x@server:~$ mpstat -P ALL 1 5
Linux 6.5.0 (server) 	01/21/2026 	_x86_64_	(32 CPU)

09:52:10 AM  CPU   %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:52:11 AM  all   32.1  0.0  6.2   0.4    0.0  0.7    0.0    0.0    0.0   60.6
09:52:11 AM   7    97.0  0.0  2.0   0.0    0.0  0.0    0.0    0.0    0.0    1.0

Meaning: One core is pegged. That smells like a CPU submission bottleneck or a single-threaded portion of the workload.
Decision: If one or two cores are saturated while GPU utilization is low, Resizable BAR might help a bit—but you should also chase CPU-side causes.
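
To see the other half of the picture, watch whether the GPU is busy or starved while that core is pegged. A minimal sketch, assuming an NVIDIA GPU with nvidia-smi available (AMD users can reach for radeontop or similar):

cr0x@server:~$ nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory --format=csv -l 1

If GPU utilization sits low while one or two cores are saturated, you're looking at a CPU-side submission/transfer bottleneck; if the GPU is pinned near 100%, a bigger BAR is unlikely to be your fix.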

Task 12: Verify storage isn’t the actual “streaming” bottleneck

cr0x@server:~$ iostat -xm 1 3
Linux 6.5.0 (server) 	01/21/2026 	_x86_64_	(32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          28.10    0.00    5.92    6.33    0.00   59.65

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s w_await  aqu-sz  %util
nvme0n1         820.0  94200.0     0.0    0.00    7.10   114.88    12.0    980.0   2.10    5.90   94.0

Meaning: The NVMe device is at ~94% utilization with non-trivial await. Asset streaming could be storage-limited.
Decision: If storage is saturated, BAR changes won’t fix hitching; fix I/O (faster storage, better caching, less contention) first.

Task 13: Check memory pressure and swap usage (stutter can be memory)

cr0x@server:~$ grep -E "MemTotal|MemAvailable|SwapTotal|SwapFree" /proc/meminfo
MemTotal:       131840512 kB
MemAvailable:    18422528 kB
SwapTotal:      16777212 kB
SwapFree:        1024000 kB

Meaning: You’re low on available memory and swap is mostly used. That can cause nasty frame time spikes unrelated to BAR.
Decision: If memory pressure exists, fix that before you attribute improvements/regressions to BAR.

Task 14: Enumerate other PCIe devices competing for MMIO space

cr0x@server:~$ lspci | grep -E "Ethernet|Non-Volatile memory|SATA|RAID|Fibre|Infiniband"
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
05:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)

Meaning: Multiple devices may require MMIO windows; with large BAR, allocation can get tight.
Decision: If BAR enablement fails on “fully loaded” systems but works on minimal ones, consider reducing devices, changing slots, or updating firmware.

Task 15: Check if the kernel clamped BAR sizes (common with quirks)

cr0x@server:~$ dmesg -T | grep -iE "resiz|clamp|rebar" | tail -n 20
[Mon Jan 20 09:41:05 2026] pci 0000:01:00.0: BAR 2: resized to 16GB

Meaning: The kernel explicitly resized the BAR.
Decision: If you see “failed to resize” lines, you’re not getting the feature even if BIOS says “Enabled.”

Task 16: (Virtualization) Check QEMU/KVM for large BAR readiness on the host

cr0x@server:~$ sudo dmesg -T | grep -i vfio | tail -n 8
[Mon Jan 20 10:03:12 2026] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[Mon Jan 20 10:03:12 2026] vfio-pci 0000:01:00.0: BAR 2: can't reserve [mem 0x4000000000-0x43ffffffff 64bit pref]

Meaning: VFIO can’t reserve that large BAR window for passthrough as currently configured.
Decision: You need to adjust hypervisor/VM firmware settings (e.g., allow large MMIO, OVMF, guest address space) or accept no large BAR in the guest.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A media team rolled out new workstations for editors. Same GPU model as the pilot. Same driver package. Same OS image. The only difference was the
motherboard revision—quietly swapped by procurement because “equivalent.”

The pilot machines showed nice improvements in timeline scrubbing and preview consistency after enabling Resizable BAR. So the rollout playbook said:
“Enable ReBAR + Above 4G, validate with a benchmark, ship it.” That playbook assumed “Enabled in BIOS” meant “Enabled in reality.”

Within a week, support tickets piled up: intermittent black screens, random application crashes during export, and a couple of machines that started
hard-freezing under load. The benchmark used for validation didn’t trigger the issue; the actual editing workload did.

The root cause wasn’t mystical. On the “equivalent” boards, the firmware allocated PCIe resources differently when multiple NVMe drives and a 10GbE NIC
were installed. With large BAR enabled, the platform hit a resource edge case. The OS logs showed BAR reassignment attempts and occasional failures.
Some boots would fall back silently; other boots left the GPU driver in a bad mood.

Fix was boring: firmware update, a tighter slot population guide, and an explicit OS-side verification step (BAR size in lspci) as a gate.
The wrong assumption was thinking a UI toggle is a contract. It’s a suggestion.

Mini-story 2: The optimization that backfired

A small ML team ran GPU-accelerated inference on a shared workstation pool—because buying dedicated servers was “next quarter,” which is corporate
for “not happening soon.” They noticed some workloads had lower latency variance on one machine and asked what was different.

Someone discovered Resizable BAR was enabled on that box and decided to enable it everywhere. No change control. No staged rollout. Just a Friday
afternoon “performance improvement.” The kind of decision that creates weekend plans for other people.

Monday morning, a subset of nodes started throwing GPU resets under high concurrency. Not all. Only those with a particular capture card model
installed for a separate project. The resets looked like driver instability, so the first reaction was to roll drivers. That didn’t help.

Eventually, the pattern emerged: the systems with the capture card had tighter MMIO layout, and enabling large BAR pushed the platform into a
configuration that was technically legal but practically fragile. Under load, error recovery paths were messy. Disabling ReBAR stabilized everything.

The backfire wasn’t that Resizable BAR is “bad.” It was that the team optimized one dimension (possible throughput) and ignored platform resource
constraints and heterogeneity. A toggle applied fleet-wide without topology awareness isn’t optimization; it’s improv.

Mini-story 3: The boring but correct practice that saved the day

A product group ran a lab of mixed Windows and Linux machines for game engine testing. They had a policy: any BIOS-level performance toggle requires
a recorded before/after run on a representative workload, plus a machine-readable verification artifact stored with the test report.

It sounded bureaucratic until it wasn’t. A new BIOS release landed, and a batch of machines got updated. The lab’s nightly performance pipeline
flagged a small regression in frame time consistency on a subset of systems. Average FPS was fine. The “feel” was not.

Because the lab stored verification artifacts, it was trivial to see what changed: the BAR size dropped back to a small window after the BIOS update,
even though the BIOS UI still showed Resizable BAR enabled. The UEFI update had also flipped CSM behavior on some profiles.

The fix was fast: enforce UEFI-only boot, re-apply the correct firmware settings, and block that BIOS build until validated. No guesswork, no
“maybe it’s the driver.” Just evidence, then action.

The boring practice—treating firmware toggles as configuration drift risks—saved days of argument and kept the lab credible.

Common mistakes: symptoms → root cause → fix

1) “BIOS says enabled, but performance didn’t change”

  • Symptoms: No uplift; OS tools still show ~256MB-ish BAR; benchmarks unchanged.
  • Root cause: CSM/Legacy boot, missing Above 4G Decoding, or firmware failing to allocate large MMIO so BAR never negotiated.
  • Fix: Boot UEFI, disable CSM, enable Above 4G Decoding + ReBAR, update BIOS, then confirm with lspci -vv BAR size.

2) “Enabled ReBAR and now random black screens / driver resets”

  • Symptoms: Intermittent display drops, GPU reset logs, instability under load.
  • Root cause: MMIO/resource allocation edge case with other PCIe devices; sometimes aggravated by risers/bifurcation settings.
  • Fix: Update BIOS, simplify PCIe topology, move cards to different slots, test with minimal devices; if still unstable, disable ReBAR.

3) “It works until we add another NVMe card”

  • Symptoms: ReBAR stops negotiating or system fails POST after adding PCIe devices.
  • Root cause: MMIO space exhaustion or poor firmware allocation policy.
  • Fix: Ensure Above 4G Decoding, update firmware, reduce add-in cards, or choose boards with better PCIe resource handling.

4) “Stutter got worse even though average FPS improved”

  • Symptoms: Higher average FPS, but hitching spikes increased.
  • Root cause: You moved the bottleneck: now storage I/O, memory pressure, shader compilation, or CPU scheduling dominates frame times.
  • Fix: Measure storage utilization (iostat), RAM pressure (/proc/meminfo), CPU per-core saturation (mpstat), and address those.

5) “Passthrough VM doesn’t see ReBAR”

  • Symptoms: Host shows large BAR, guest shows small BAR or fails to start VM with resource errors.
  • Root cause: Guest firmware/VM config doesn’t allow large MMIO; VFIO can’t reserve big BAR; IOMMU constraints.
  • Fix: Use UEFI (OVMF) for the guest, configure large MMIO windows, ensure host can reserve resources; otherwise accept it won’t work in that topology.

6) “We enabled it fleet-wide and some machines lost boot display”

  • Symptoms: No video output at boot, or stuck at vendor splash.
  • Root cause: Firmware/option ROM compatibility issues, especially with older GPUs or mixed legacy settings.
  • Fix: Revert via BIOS reset, update GPU VBIOS, enforce UEFI-only, and roll out in canaries.

Joke #2: If your change plan is “flip the BIOS switch and vibe,” congratulations—you’ve invented Chaos Engineering for people who hate dashboards.

Checklists / step-by-step plan

Change plan for a single workstation (safe and fast)

  1. Capture baseline. Record driver version, BIOS version, and one representative benchmark or workload trace (average + 1% lows/frame times).
  2. Update firmware first. If you’re not on a reasonably current BIOS, don’t bother. Too many allocation bugs live there.
  3. Switch to UEFI-only. Disable CSM/Legacy boot. Verify Linux shows /sys/firmware/efi.
  4. Enable Above 4G Decoding. This is the “make room” setting.
  5. Enable Resizable BAR. Save, reboot.
  6. Verify in OS. Use lspci -vv and confirm large prefetchable BAR and Resizable BAR capability negotiated size.
  7. Re-run the same workload. Compare average throughput and tail latency/frame-time consistency.
  8. Decide: keep or revert. Keep if it improves the metric you care about without adding instability. Revert if it adds flakiness or doesn’t help.

Canary rollout plan for a fleet (what SREs actually do)

  1. Segment the fleet by hardware topology. Same motherboard revision matters. Same GPU VBIOS matters. Same add-in card population matters.
  2. Pick canaries per segment. Not one hero box. At least a few per topology.
  3. Define success metrics. Not “feels faster.” Pick measurable: frame time p95, compile time, export duration, crash rate.
  4. Automate verification. Collect lspci -vv snippets or sysfs resource lines as artifacts (a collection sketch follows this list).
  5. Roll forward with a rollback plan. Document how to revert BIOS settings remotely or via hands-on procedure.
  6. Watch for correlated failures. Especially with additional PCIe devices and after firmware updates.
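
For step 4, a minimal artifact-collection sketch, assuming one GPU per host and standard tooling (lspci, dmidecode); the file name and layout are illustrative, not a standard:

#!/usr/bin/env bash
# Sketch: capture a per-host Resizable BAR verification artifact for the test report.
set -euo pipefail
GPU=$(lspci -D | awk '/VGA|3D/ {print $1; exit}')
OUT="rebar-$(hostname)-$(date +%Y%m%d).txt"
{
  echo "host: $(hostname)"
  echo "bios: $(sudo dmidecode -s bios-version)"
  echo "boot: $(test -d /sys/firmware/efi && echo UEFI || echo Legacy)"
  echo "gpu:  $GPU"
  sudo lspci -s "$GPU" -vv | grep -E "Resizable BAR|prefetchable" || echo "no ReBAR lines found"
  cat "/sys/bus/pci/devices/$GPU/resource"
} > "$OUT"
echo "wrote $OUT"

Store the file next to the benchmark results; when a BIOS update silently drops the BAR back to a small window, the diff tells you immediately.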

Decision rule of thumb

  • Enable if you can verify large BAR in OS and your workload is streaming/CPU-transfer sensitive.
  • Don’t bother if you’re GPU-bound and stable; you’re likely chasing noise.
  • Disable if you see instability or if enabling it breaks a known-good PCIe topology.

FAQ

1) Is SAM different from Resizable BAR?

Not fundamentally. SAM is AMD’s branded implementation of Resizable BAR plus platform validation and defaults. The underlying mechanism is PCIe Resizable BAR.

2) Do I need Above 4G Decoding?

In most real systems, yes. Large BAR mappings consume MMIO space, and Above 4G Decoding lets firmware allocate that space above 4 GB where there’s room.

3) Why does my BIOS show “enabled” but Linux still shows a small BAR?

Common causes: you’re booting legacy/CSM, the firmware failed allocation due to other devices, or the GPU VBIOS/firmware combination didn’t negotiate it.
Trust lspci -vv over the checkbox.

4) Can Resizable BAR reduce stutter?

It can, in workloads where CPU↔GPU transfers and remapping overhead contribute to frame time spikes. But stutter is often storage, memory pressure,
shader compilation, or CPU scheduling. Measure before crediting BAR.

5) Does it help compute/ML workloads?

Sometimes, especially if the workflow frequently stages data from CPU memory to GPU memory in patterns that benefit from larger mappings.
Many ML pipelines are dominated by GPU compute and GPU memory bandwidth, where BAR size changes little.

6) Is it safe to enable in a workstation fleet?

Safe if you treat it like any firmware change: stage it, verify OS-level negotiation, and watch crash/black-screen rates. Unsafe if you flip it across
heterogeneous hardware and call it “the same.”

7) Why do some vendors enable it only for certain games?

Because the ecosystem is messy. Some engines and driver paths benefit; some can regress; some may trip corner cases. Profile gating is a pragmatic
risk-control tactic.

8) Can I use it with GPU passthrough to a VM?

It depends. The host may support it, but the guest needs enough MMIO space and a UEFI firmware setup that can map the large BAR. Many passthrough
setups require explicit “large MMIO” configuration.

9) What’s the single best proof it’s working on Linux?

A large prefetchable BAR size in lspci -vv (often multiple GB) plus the “Resizable BAR” capability showing a non-trivial current size.

10) If it doesn’t help, should I turn it off?

If you see no benefit and you value maximum predictability, yes—disable it and simplify. If it’s stable and you manage a mixed workload environment,
leaving it on is reasonable as long as you can verify it stays enabled across firmware updates.

Conclusion: practical next steps

Resizable BAR/SAM is one of those rare platform switches that can legitimately improve real workloads—when the workload is sensitive and the platform
allocates resources cleanly. It’s also a great way to expose shaky firmware, crowded PCIe topologies, and benchmarking habits that ignore tail latency.

  1. Verify, don’t assume. Confirm BAR size in the OS. If it’s not large there, it’s not enabled.
  2. Measure the right thing. Use frame times or p95/p99 latency, not just average throughput.
  3. Fix the obvious bottlenecks first. PCIe link width/speed, storage saturation, and memory pressure frequently dominate “BAR debates.”
  4. Roll out like an adult. Canary by hardware topology, track firmware drift, and keep a rollback plan.

If you want a one-line policy: enable Resizable BAR where you can prove it’s negotiated and it improves the metric your users actually feel; otherwise,
leave it off and enjoy your quieter incident queue.
