PSU Sizing for Servers — Stop Guessing, Start Measuring

The fastest way to embarrass yourself in a data center is to “know” a server is a 500W box because the spec sheet said so—right up until a firmware update
spins fans to jet-engine mode, the breaker clicks, and your “small change” turns into an outage ticket with your name on it.

Power is a production dependency. Treat it like one. If you can graph latency, you can graph watts. And if you can measure watts, you can stop buying PSUs
like you’re picking winter tires by vibe.

Why PSU sizing is an SRE problem, not a shopping problem

PSU sizing is often treated as a procurement checkbox: pick a wattage, tick “redundant,” ship it. In production systems, that approach fails for the same
reason “we’ll just add more nodes” fails: it ignores the hard limits that trip first.

The PSU is where your workload becomes physics. Workload spikes become current spikes. Firmware behavior becomes fan power. A new NIC becomes a few more watts
that you never budgeted. And the most humiliating part: when power goes sideways, the symptoms don’t always scream “power.”
You get flaky disks, surprise reboots, NIC resets, corrupted BMC sensors, or “random” kernel panics. Power problems cosplay as software problems.

You want three outcomes:

  • No outages caused by breaker trips, PSU overload, or brownouts.
  • No waste from massively oversized PSUs running at inefficient low load.
  • Fast recovery when a PSU fails, a feed drops, or a PDU lies to you.

If your current PSU sizing method is “add up TDPs and round up,” you’re running production on hope. Hope is not a power source.

Interesting facts and a little history (so you stop repeating old mistakes)

  1. Early PC power supplies focused on 5V-heavy loads. Modern servers are largely 12V-centric, and DC-DC conversion moved onto the board.
  2. ATX12V (early 2000s) pushed more power to 12V to feed CPUs, changing how “rails” and current limits mattered in real builds.
  3. 80 PLUS (mid-2000s) made efficiency a marketing and procurement line item, but its test points don’t cover your spiky workloads.
  4. Data centers shifted from “one big UPS” thinking to distributed UPS and intelligent PDUs, making measurement easier—if you actually use it.
  5. Redundant PSUs became standard not because they’re sexy, but because hot-swapping a PSU beats a 2am maintenance window.
  6. Modern CPUs and GPUs introduced aggressive boost behavior; instantaneous power can exceed “TDP” in ways procurement decks rarely mention.
  7. Fan curves changed the game: high static-pressure fans can draw meaningful power at full tilt, and firmware updates can alter that behavior overnight.
  8. Rack density exploded as virtualization and GPUs arrived; power and cooling became the first constraints, not rack units.
  9. Breaker coordination in facilities evolved, but breakers still act faster than your alerting sometimes—especially on inrush.

A sane mental model: average, peak, and the ugly seconds in between

PSU sizing mistakes come from using one number when you need at least three:

  • Steady-state average: what the server draws most of the time.
  • Sustained peak: what it draws during real work, not synthetic “TDP” math.
  • Transient/inrush: what it draws during boot, simultaneous drive spin-up, fan ramp, or GPU boost spikes.

Add a fourth if you do redundancy:
single-PSU mode—because in a 1+1 setup, you still need to survive on one PSU without collapsing.
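Those four numbers can live in a tiny data structure so nobody "averages away" the peak. A minimal sketch, with illustrative numbers and a hypothetical helper name; the 80% single-PSU loading margin is a planning convention, not a vendor figure:

```python
from dataclasses import dataclass

@dataclass
class PowerProfile:
    """Measured watts for one server model (all values taken at PDU input)."""
    steady_avg_w: float      # what it draws most of the time
    sustained_peak_w: float  # real work, not TDP math
    transient_peak_w: float  # boot inrush, spin-up, fan ramp, GPU boost

def min_psu_rating_w(p: PowerProfile, margin: float = 0.8) -> float:
    """In a 1+1 setup one PSU must carry the sustained peak alone.
    margin=0.8 means: never plan to load a single PSU past 80% of rating.
    Both the margin and the profile numbers are illustrative."""
    return p.sustained_peak_w / margin

profile = PowerProfile(steady_avg_w=310, sustained_peak_w=520, transient_peak_w=780)
print(round(min_psu_rating_w(profile)))  # 650
```

The point is not the arithmetic; it's that the sizing input is three measured numbers plus a stated margin, recorded where the next person can find them.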

What “PSU wattage” actually means

A “1200W” PSU is typically rated for a certain input voltage, temperature, airflow, and sometimes altitude. It also implies a maximum DC output under those
conditions. It does not mean your system can safely draw 1200W continuously in every rack, at any temperature, on any feed, while the dust bunnies
run an insulative hedge fund inside your chassis.

One quote you should keep on your wall

“Hope is not a strategy.” — Gen. Gordon R. Sullivan

Not an SRE, but the sentiment is painfully relevant. In power planning, “it probably won’t peak” is hope, dressed as engineering.

Joke #1: A server PSU is like a parachute: if you only discover it’s undersized when you need it, you’re about to have a bad day.

What to measure (and what spec sheets won’t tell you)

Spec sheets are written to sell hardware, not to keep your cluster alive. They’re still useful—but only as boundary conditions. Your job is to
measure reality in your environment, with your firmware, and your workload mix.

Measure at multiple layers

  • Wall / PDU input power: what you pay for and what trips breakers.
  • PSU output (rarely directly visible): what the system consumes in DC terms.
  • Component-level hints: CPU package power, GPU power, drive activity, fan RPM and PWM.

Know the limits that matter

  • Branch circuit (breaker rating; continuous load derating practices vary by region and code).
  • PDU and plug limits (C13/C14 vs C19/C20, cord gauge, per-outlet caps).
  • PSU per-unit rating at your input voltage (common gotcha: 120V vs 208/230V performance and available headroom).
  • Redundancy mode (load sharing vs active/standby; and whether the platform can survive one PSU at full load).

Don’t confuse these terms

  • TDP: a thermal design point, not a contractual power ceiling.
  • PL1/PL2 (and vendor equivalents): sustained vs boost power policy; firmware can change them.
  • Apparent power (VA) vs real power (W): UPS and PDUs can report either; power factor matters when you’re close to limits.
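The VA-vs-W distinction is one division, but it bites when power factor drifts below 1.0. A quick sketch with illustrative numbers:

```python
def apparent_power_va(real_w: float, power_factor: float) -> float:
    """Apparent power (VA) from real power (W). power_factor must be in (0, 1]."""
    if not 0 < power_factor <= 1:
        raise ValueError("power factor must be in (0, 1]")
    return real_w / power_factor

# 9 kW of real load at power factor 0.9 needs 10 kVA of UPS capacity,
# so a "10 kVA" UPS that looked comfortable in watts is already full:
print(round(apparent_power_va(9000, 0.9)))  # 10000
```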

Practical measurement tasks (commands, outputs, decisions)

You can’t “architect” your way around measurement. Below are concrete, runnable tasks. Each has: a command, what the output means, and the decision it supports.
Use them to build a power profile per server model and per workload class.

Task 1: Read BMC-reported instantaneous power (IPMI)

cr0x@server:~$ ipmitool sensor | egrep -i 'Power|Pwr Consumption|Watts'
Pwr Consumption   | 312        | Watts      | ok

Meaning: The BMC thinks the system is drawing ~312W right now (often input power, sometimes computed).
Decision: If this is far from your expectation, validate the BMC source against PDU metering before trusting it for capacity planning.

Task 2: Pull power history / min-max if the platform exposes it

cr0x@server:~$ ipmitool sdr elist | egrep -i 'Pwr|Power'
System Level      | 00h | ok  |  3.1 | Power Meter
System Level      | 01h | ok  |  3.2 | Power Max
System Level      | 02h | ok  |  3.3 | Power Min

Meaning: Some vendors expose max/min since boot or reset.
Decision: If max is close to PSU or circuit limits, don’t “average” it away. Plan for it, or cap it.

Task 3: Measure CPU package power limits and current draw (Intel RAPL via powercap)

cr0x@server:~$ sudo cat /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
225000000

Meaning: 225,000,000 µW = 225W long-term package limit (PL1-ish) for that RAPL domain.
Decision: If your PSU sizing assumed “CPU is 165W,” but firmware allows 225W sustained, update your budget or enforce a cap.

Task 4: Sample RAPL energy to estimate average CPU power over an interval

cr0x@server:~$ E1=$(cat /sys/class/powercap/intel-rapl:0/energy_uj); sleep 10; E2=$(cat /sys/class/powercap/intel-rapl:0/energy_uj); echo $(( (E2-E1)/10000000 ))
186

Meaning: Roughly 186W average for that package over 10 seconds (the energy delta in µJ, divided by the 10 s interval and by 10^6 µJ per joule, hence the 10,000,000 in the command).
Decision: Identify workloads with sustained high CPU power; they’re the ones that make “peak” not a rare event.
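If you automate this sampling, handle the counter wrap: `energy_uj` rolls over at the value in `max_energy_range_uj`. A sketch of the math only (the wrap constant below is an example; read the real one from sysfs on your host):

```python
def rapl_avg_watts(e1_uj: int, e2_uj: int, seconds: float,
                   max_energy_uj: int = 262_143_328_850) -> float:
    """Average package power from two energy_uj samples taken `seconds` apart.
    energy_uj is a wrapping counter; the wrap point comes from
    /sys/class/powercap/intel-rapl:0/max_energy_range_uj (example value here)."""
    delta = e2_uj - e1_uj
    if delta < 0:  # counter wrapped between the two samples
        delta += max_energy_uj + 1
    return delta / (seconds * 1_000_000)  # µJ -> J, then J per second = W

# Same numbers as the shell one-liner: 1.86e9 µJ over 10 s:
print(rapl_avg_watts(1_000_000_000, 2_860_000_000, 10.0))  # 186.0
```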

Task 5: Check GPU power draw and limits (NVIDIA)

cr0x@server:~$ nvidia-smi --query-gpu=name,power.draw,power.limit,clocks.sm --format=csv,noheader
NVIDIA A10, 126.54 W, 150.00 W, 1395 MHz

Meaning: Real-time GPU power is 126W, with a 150W limit.
Decision: For GPU servers, PSU sizing without real GPU power telemetry is cosplay. If multiple GPUs can hit limit together, budget that peak.

Task 6: Verify current CPU frequency and throttle status (quick sanity check)

cr0x@server:~$ lscpu | egrep -i 'Model name|CPU max MHz|CPU MHz'
Model name:          Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
CPU max MHz:         3200.0000
CPU MHz:             2001.102

Meaning: Current frequency isn’t boosted; at load it may spike and increase power.
Decision: If you see persistent low frequency under load, you might be power-limited or thermally limited—both tie back to PSU and cooling.

Task 7: Check for power-related events in kernel logs

cr0x@server:~$ sudo journalctl -k -b | egrep -i 'power|brown|thrott|vrm|PSU|over current|watchdog' | tail -n 20
kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 27: b200000000070005
kernel: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)

Meaning: Hardware/firmware reported throttling or errors that can be power delivery related (not always, but worth correlating).
Decision: If these correlate with spikes or reboots, stop debugging “random instability” and inspect power feeds, PSUs, and thermals.

Task 8: Check PSU status and redundancy mode via IPMI (if supported)

cr0x@server:~$ ipmitool sdr type 'Power Supply'
PS1 Status       | ok
PS2 Status       | ok
PS1 Input Power  | 165 Watts
PS2 Input Power  | 162 Watts

Meaning: Both PSUs are active and sharing load roughly equally.
Decision: If you expected 1+1 with one idle and one active, you may not be in the redundancy mode you think you are. Update your failure model.

Task 9: Measure wall power via a metered rack PDU (SNMP example)

cr0x@server:~$ snmpget -v2c -c public pdu01 1.3.6.1.4.1.318.1.1.26.6.3.1.7.1
SNMPv2-SMI::enterprises.318.1.1.26.6.3.1.7.1 = INTEGER: 356

Meaning: Vendor OID returns outlet power in watts (example: 356W). MIB semantics vary; confirm units once, then automate.
Decision: Use PDU readings as the ground truth for circuit planning and breaker risk. BMC readings are “nice-to-have,” not your accountant.
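When you wire PDU polling into a metrics pipeline, the fragile part is parsing, not SNMP. A sketch assuming net-snmp's default `snmpget` output format (the function name is hypothetical; units remain vendor-specific, so verify them once against the MIB):

```python
import re

def parse_snmp_integer(line: str) -> int:
    """Extract the INTEGER value from a net-snmp snmpget output line.
    Whether the number means watts, deci-amps, or tenths of a volt is
    vendor-specific; confirm against the MIB once, then automate."""
    m = re.search(r"=\s*INTEGER:\s*(-?\d+)", line)
    if m is None:
        raise ValueError(f"no INTEGER value in: {line!r}")
    return int(m.group(1))

sample = "SNMPv2-SMI::enterprises.318.1.1.26.6.3.1.7.1 = INTEGER: 356"
print(parse_snmp_integer(sample))  # 356
```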

Task 10: Check PDU outlet current to spot near-trip risk

cr0x@server:~$ snmpget -v2c -c public pdu01 1.3.6.1.4.1.318.1.1.26.6.3.1.5.1
SNMPv2-SMI::enterprises.318.1.1.26.6.3.1.5.1 = INTEGER: 28

Meaning: Outlet current (example: 2.8A if device uses deci-amps; again vendor-specific).
Decision: Convert to amps, compare against circuit and PDU per-outlet limits. If you’re close, don’t add “just one more node.”
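The conversion-and-compare step is worth scripting so nobody eyeballs it. A sketch assuming deci-amp readings and the common 80% continuous-load practice (both assumptions; your PDU's units and your local code may differ):

```python
def breaker_headroom_a(reading_deciamps: int, breaker_a: float,
                       continuous_fraction: float = 0.8) -> float:
    """Amps of headroom left on a branch circuit.
    Assumes the PDU reports deci-amps (vendor-specific; verify once) and a
    conservative 80% continuous-load budget (regional practice varies)."""
    amps = reading_deciamps / 10.0
    return breaker_a * continuous_fraction - amps

# 28 deci-amps (2.8 A) on a 16 A breaker budgeted at 80% continuous:
print(round(breaker_headroom_a(28, 16.0), 2))  # 10.0
```

If the headroom is smaller than one more server's measured peak in amps, "just one more node" is a no.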

Task 11: Correlate workload with power using a controlled load test (CPU)

cr0x@server:~$ sudo apt-get -y install stress-ng
...
cr0x@server:~$ stress-ng --cpu 32 --cpu-method matrixprod --timeout 60s --metrics-brief
stress-ng: info:  [8421] dispatching hogs: 32 cpu
stress-ng: metrc: [8421] cpu                60.00s  19184.12 bogo ops/s

Meaning: You’ve created a repeatable CPU load. Pair this with simultaneous PDU/BMC readings.
Decision: Build a per-platform “CPU-only peak” baseline. Then do the same with disk and network to see combined peaks.

Task 12: Disk activity and drive count matter—measure it (storage load)

cr0x@server:~$ iostat -dx 2 3
Device            r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz await  svctm  %util
nvme0n1         820.0   110.0   128.0    24.0     358     2.10   2.1   0.6   55.0

Meaning: Storage is doing real work. Drives and controllers draw more under load; HDDs also have spin-up spikes.
Decision: If your peak power coincides with rebuilds/resyncs, budget for “failure mode power,” not just happy-path power.

Task 13: Check for RAID/HBA controller battery/flash module charging events

cr0x@server:~$ sudo dmesg | egrep -i 'battery|cachevault|supercap|charging' | tail -n 20
megaraid_sas 0000:3b:00.0: CacheVault charging started

Meaning: Cache protection modules can draw extra power while charging after maintenance or long downtime.
Decision: If you’ve had cold starts or maintenance, expect a temporary power bump. Don’t treat it as “mystery watts.”

Task 14: Check fan RPM and PWM—fans are not free

cr0x@server:~$ sudo ipmitool sdr | egrep -i 'FAN|RPM' | head
FAN1            | 7800   | RPM  | ok
FAN2            | 8100   | RPM  | ok

Meaning: High RPM implies higher fan power draw and often indicates thermal stress or a firmware profile shift.
Decision: If you see sustained high fan speeds, investigate airflow and inlet temps; your power budget and PSU thermals are now worse.

Task 15: Validate PSU input voltage (because 120V vs 208V matters)

cr0x@server:~$ ipmitool sensor | egrep -i 'Inlet|VIN|AC|Voltage' | head
PS1 Inlet Volt   | 208        | Volts     | ok
PS2 Inlet Volt   | 208        | Volts     | ok

Meaning: The PSUs see 208V, which generally improves efficiency and reduces current for the same power.
Decision: If you’re on 120V and pushing density, consider moving to higher voltage feeds where feasible. It’s often the simplest capacity win.

Task 16: Quick-and-dirty inrush observation with PDU peak logging (if supported)

cr0x@server:~$ snmpget -v2c -c public pdu01 1.3.6.1.4.1.318.1.1.26.4.3.1.6.1
SNMPv2-SMI::enterprises.318.1.1.26.4.3.1.6.1 = INTEGER: 47

Meaning: A “peak current since last reset” style counter (example). Exact OID depends on vendor and model.
Decision: If peak current is far above steady state, stagger boots and avoid synchronized power-on after outages.

Redundant PSUs: 1+1 is not always 1+1

Redundant PSUs are sold as reliability. In practice, they’re reliability only if you size and feed them correctly.
Two PSUs do not guarantee you can run at full power after one fails. That depends on:

  • Per-PSU capacity vs system peak.
  • Load sharing behavior (active/active sharing or active/standby).
  • Power capping behavior when one PSU is gone (some systems automatically throttle; others just fall over).
  • Feed independence (two PSUs on the same PDU is not redundancy; it’s optimism with extra steps).

N+1 sizing in plain terms

If you have two 800W PSUs in a 1+1 configuration, the relevant question is:
Can the server run at peak on a single 800W PSU?
If your real measured peak is 780W at the wall and the PSU derates at high inlet temperatures, you are not “safe.” You are balanced on a thin technicality.
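That question is mechanical enough to encode. A sketch with placeholder derating and headroom figures; substitute your vendor's derating curve and your own margin policy:

```python
def survives_single_psu(measured_peak_w: float, psu_rating_w: float,
                        derate_fraction: float = 0.10,
                        headroom_fraction: float = 0.10) -> bool:
    """Can one PSU carry the measured wall peak after losing its partner?
    derate_fraction models high-inlet-temperature derating and
    headroom_fraction is planning margin; both values are illustrative."""
    usable_w = psu_rating_w * (1 - derate_fraction) * (1 - headroom_fraction)
    return measured_peak_w <= usable_w

# The 780 W peak on an 800 W PSU: 800 * 0.9 * 0.9 = 648 W usable.
print(survives_single_psu(780, 800))  # False
```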

Load balancing is not guaranteed

If two PSUs share poorly (firmware, mismatched PSUs, aging, or cabling), one PSU can run hotter and closer to limit. When it fails, the other takes a
step-load that can cause a second failure or a reboot. That’s the “redundant PSU double-tap,” and it’s as fun as it sounds.

Inrush current: the breaker doesn’t care about your spreadsheets

Inrush is the surge when you apply power—charging capacitors, starting fans, spinning disks, waking GPUs, and letting every regulator decide it’s time to
party. Your steady-state power budget can look fine while inrush trips the breaker during a mass reboot.

Facilities folk care about this because it’s the difference between “power restored” and “half the row stayed dark.” You should care because after a data
center event, everyone will be power-cycling at once, and your orchestrator might helpfully do the same thing at the same time.

Joke #2: Breakers are like on-call engineers: they tolerate a lot, but they remember exactly one last straw.

How to reduce inrush risk

  • Stagger boot (PDUs, out-of-band automation, or orchestration hooks).
  • Avoid synchronized fan ramp by updating firmware in controlled waves, not “entire fleet Friday.”
  • Know your HDD behavior: staggered spin-up settings on HBAs can save your breaker and your pride.
  • Measure peak current where possible: some metered PDUs and UPSes log peaks and inrush events.
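Boot staggering is easiest when the schedule is computed, not improvised. A minimal sketch that assigns each host a power-on delay; host names, batch size, and delay are placeholders, and you would feed the result to your out-of-band tooling (e.g. per-host `ipmitool chassis power on`):

```python
def stagger_schedule(hosts: list[str], batch_size: int = 4,
                     delay_s: int = 30) -> list[tuple[str, int]]:
    """Assign each host a power-on delay: batches of batch_size hosts,
    delay_s seconds apart. Purely illustrative scheduling policy."""
    return [(h, (i // batch_size) * delay_s) for i, h in enumerate(hosts)]

hosts = [f"node{n:02d}" for n in range(1, 10)]
for host, delay in stagger_schedule(hosts):
    print(host, delay)
# nodes 01-04 power on at t=0s, 05-08 at t=30s, 09 at t=60s
```

Tune batch size and delay against your measured inrush per host, not against a guess.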

Derating: temperature, altitude, dust, and why “1200W” is sometimes fantasy

PSU ratings assume specific conditions. Then you install the server in a rack with partial blanking, a hot aisle that is more “warm suggestion,” and dust
that turns heatsinks into felt.

Derating shows up as:

  • Lower available output at higher inlet temperature.
  • Higher fan power (which increases system draw and reduces efficiency).
  • Earlier thermal shutdown or protective throttling.

Temperature is a multiplier on risk

A server that’s “fine” at 18°C inlet can become fragile at 30°C inlet when one PSU fails and the remaining PSU runs hotter and louder.
That’s the worst time to learn that your redundancy assumption was based on a lab brochure.

Altitude is real (yes, really)

High altitude reduces air density, reducing cooling effectiveness. Many vendors specify derating above certain elevations. You can ignore that if you like,
but the PSU will not be persuaded by your confidence.

Efficiency curves: the watts you don’t draw still cost you

Efficiency is not a constant. It’s a curve. Most PSUs are happiest somewhere around 40–60% load, depending on design. At very low load, efficiency drops,
and you waste power as heat. At very high load, efficiency can also drop, and thermals get ugly.

Oversizing by default feels “safe,” but it can waste money in three ways:

  • Capex: bigger PSU models cost more.
  • Opex: lower efficiency at idle across a fleet is not cute on the power bill.
  • Cooling: wasted watts become heat your facility must remove.

What to do instead of blind oversizing

Measure your real peak and choose a PSU such that:

  • Your steady state sits in the efficient part of the curve.
  • Your sustained peak stays below a conservative threshold (especially under N+1 failure mode).
  • Your inrush does not trip branch circuits during fleet events.

This is less glamorous than buying the biggest unit available. It is also how you avoid spending your weekends in a cold aisle with a flashlight.
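The sweet-spot check itself is trivial once you have measurements. A sketch with illustrative numbers; it assumes even active/active load sharing, which is exactly the thing Task 8 tells you to verify rather than assume:

```python
def load_fraction(draw_w: float, psu_rating_w: float, n_active_psus: int = 2) -> float:
    """Per-PSU load fraction under active/active sharing.
    Assumes even sharing between PSUs, which should be verified, not assumed."""
    return draw_w / n_active_psus / psu_rating_w

def in_sweet_spot(frac: float, lo: float = 0.4, hi: float = 0.6) -> bool:
    """40-60% is a common efficient band; the real curve is model-specific."""
    return lo <= frac <= hi

# 310 W steady state shared across two 800 W PSUs:
f = load_fraction(310, 800)
print(round(f, 3), in_sweet_spot(f))  # 0.194 False -> deep in the inefficient zone
```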

Three corporate mini-stories from the land of “it should be fine”

Mini-story 1: The incident caused by a wrong assumption

A company rolled out a new batch of storage-heavy servers: lots of disks, dual controllers, and “redundant” PSUs. Procurement sized the PSUs by adding
CPU TDP, RAM, and “a bit for disks.” They also standardized on a 120V feed in a legacy room because it was “already there.”

Everything looked fine in steady state. The racks were quiet, the monitoring graphs were boring, and the rollout was declared a success. Then the facility
did a planned power maintenance. After power came back, the entire row tried to boot at once. Several breakers tripped immediately. A handful of racks came
up half-alive: some servers boot-looped, some dropped disks, and a couple of controllers came up degraded.

The postmortem was messy because the first wave of debugging went after software: kernel versions, boot order, RAID firmware. The giveaway was that failures
clustered by rack and PDU, not by OS build. Someone finally pulled the PDU peak current logs and compared them to the branch circuit rating.

The root cause wasn’t that the servers drew “too much power” on average. It was that inrush and simultaneous disk spin-up blew past the breaker’s tolerance
at 120V, where current is higher for the same wattage. The “redundant PSUs” didn’t help because redundancy doesn’t prevent a breaker from tripping.

The fix was boring: staggered boot sequencing, enabling staggered spin-up on the HBAs, and moving the highest-density racks to higher-voltage feeds where
available. They also started capturing peak power at the PDU as part of acceptance testing. The next maintenance event was uneventful—which is the best kind
of event.

Mini-story 2: The optimization that backfired

Another org got serious about power efficiency and decided to “right-size” aggressively. They noticed their servers idled around 120–160W and concluded that
smaller PSUs would improve efficiency at idle. They pushed a standard config that used lower-wattage PSUs across a new compute fleet.

The lab tests looked good. Idle efficiency improved slightly. Procurement loved the cost reduction. The fleet deployed into production, where the workload
was bursty—think batch analytics mixed with spiky API traffic. During bursts, CPU boost behavior pushed sustained power higher than anyone expected.
It was still “within spec” for a single PSU—until a PSU failed.

Under 1+1 redundancy, a single PSU now had to carry the full load. On paper, it could. In practice, under higher inlet temps and dustier conditions than
the lab, the remaining PSU ran hot. The platform firmware responded by throttling performance. The service didn’t go down, but latency went from “fine” to
“why is the queue melting.” SREs saw it as a software regression because nothing crashed. It just got slow and unpredictable.

The backfire wasn’t the idea of right-sizing. It was doing it based on idle tests and ignoring N+1 failure mode plus environmental derating. The eventual fix
was to bump PSU size one step, enforce power caps on the most bursty nodes, and stop using idle-only efficiency as the success metric. Efficiency matters.
Predictability matters more.

Mini-story 3: The boring but correct practice that saved the day

A team running a mixed fleet (some GPU boxes, some storage nodes, some plain compute) had a dull policy: every new hardware model had to pass a “power
characterization” checklist before it could join production. That meant measuring idle, 50th percentile load, sustained peak, and boot inrush—using the
same PDU model they used in production.

The checklist also required testing redundancy: pull one PSU under load and confirm the node stays stable, with power and thermals recorded.
It was not optional. It was not “when we have time.” It was as mandatory as RAID rebuild tests.

One day, a vendor delivered a “minor revision” of a server model. Same SKU family, same marketing sheet, different firmware and slightly different fan
behavior. The characterization caught that the new revision had a much sharper fan ramp under certain sensor thresholds, bumping peak draw enough to matter
when a PSU failed. The system stayed up, but the headroom vanished.

Because they had baseline data, this didn’t become an incident. They adjusted rack placement (lower density per circuit for that revision), tuned firmware
settings where allowed, and updated the power budget. A month later a real PSU failure happened in production during a heavy job. The node stayed online.
No customer impact. No heroics. Just the quiet satisfaction of boring engineering working exactly as promised.

Fast diagnosis playbook

When something smells like power—random reboots, correlated failures by rack, performance cliffs after a PSU failure—don’t wander. Check in this order.
The goal is to find the bottleneck in minutes, not after you’ve rewritten half the scheduler.

First: confirm what’s failing (node, rack, feed, or room)

  • Do failures cluster by rack/PDU or by hardware model?
  • Are events correlated with boot storms, maintenance, or temperature spikes?
  • Do you see breaker trips or UPS alarms?

Second: trust the PDU/UPS for input power, then cross-check BMC

  • Check rack PDU per-outlet watts/amps and any peak counters.
  • Compare to BMC reported system watts; large mismatches suggest bad sensors or different measurement points.
  • Validate input voltage. Low voltage means higher current for the same load, and less margin.

Third: test redundancy behavior under load

  • Under controlled conditions, pull one PSU and observe: does power jump? do fans ramp? does the host throttle?
  • Confirm each PSU is on separate feeds and that the feeds are truly independent.
  • If performance changes materially, treat it as a production risk, not a “nice-to-know.”

Fourth: isolate transient causes

  • Boot inrush and disk spin-up: look for breaker or PDU peak events during power restoration.
  • Firmware/fan curve changes: correlate with recent updates.
  • Workload bursts: correlate with CPU/GPU power telemetry and timing of the incidents.

Common mistakes (symptom → root cause → fix)

1) Random reboots under load

Symptom: Hosts reboot when batch jobs start or GPU utilization spikes.
Root cause: PSU overload or transient response issues; sometimes a single PSU is weak/aging and collapses on step-load.
Fix: Measure at PDU during the workload. Test with one PSU removed. Replace suspect PSU. If peaks are legitimate, raise PSU capacity or cap power.

2) Breaker trips after maintenance or power restoration

Symptom: Whole racks stay dark, breakers trip right when everything powers on.
Root cause: Inrush current plus synchronized boot; HDD spin-up and fan ramp amplify it, especially at 120V.
Fix: Stagger boot. Enable staggered spin-up. Reduce density per circuit, or move to higher voltage feeds.

3) “Redundant” PSU but one failure causes performance collapse

Symptom: No outage, but latency spikes and throughput tanks when one PSU dies.
Root cause: Single-PSU mode triggers power capping or thermal stress; remaining PSU runs hot and firmware throttles CPU/GPU.
Fix: Size so one PSU can handle sustained peak with margin at worst inlet temp. Test and document the failure-mode performance.

4) PDU shows high watts, BMC shows low watts (or vice versa)

Symptom: Two “authoritative” numbers disagree by 20–40%.
Root cause: Different measurement points (input vs computed), sensor calibration drift, or BMC firmware bugs.
Fix: Treat PDU/UPS input as the billing and breaker truth. Use BMC for relative trends. Calibrate once with a known-good meter if needed.

5) New firmware causes power budget overruns

Symptom: After BIOS/BMC update, rack power rises or fan power jumps; breakers or UPS alarms start appearing.
Root cause: Updated fan curves, higher boost power limits, or different default power profiles.
Fix: Re-characterize power after firmware changes. Lock power profiles. Roll updates in waves with PDU monitoring.

6) “We sized by TDP” and now everything is tight

Symptom: Nameplate math says you’re safe; real measurements say you’re not.
Root cause: TDP isn’t a cap; platform overhead, memory, drives, NICs, fans, and boost behavior were not included.
Fix: Build a measured power model per platform: idle, typical, sustained peak, inrush, and N+1 mode. Stop using TDP sums as a final answer.

Checklists / step-by-step plan

Step-by-step: how to size a PSU for a server model (without guessing)

  1. Establish the measurement source of truth. Use metered PDU/UPS input watts for capacity planning; use BMC for trends and redundancy status.
  2. Record inlet conditions. Note input voltage and approximate inlet temperature during tests. Power data without conditions is gossip.
  3. Measure four power points:
    • Idle (post-boot, services steady)
    • Typical workload (representative production mix)
    • Sustained peak (stress test that matches real bottlenecks)
    • Boot/inrush peak (cold boot, not warm reboot)
  4. Test N+1 behavior. Under load, remove one PSU and observe stability, throttling, and fan behavior. Record power and thermals.
  5. Apply margin intentionally. Add headroom for sensor error, environmental drift, aging, and “surprise firmware.” Avoid the reflex of doubling.
  6. Validate against the circuit. Convert watts to amps at your voltage, and ensure you’re not flirting with branch limits under peaks and inrush.
  7. Decide on PSU size and redundancy. Choose a PSU model where single-PSU mode remains stable at sustained peak, with real margin.
  8. Document the profile. Store measured values and test conditions in your hardware runbook so the next person doesn’t redo archaeology.
  9. Operationalize monitoring. Alert on unusual increases, but also on loss of redundancy and unexpected shifts in load sharing.
  10. Re-test after meaningful changes. BIOS/BMC updates, new NICs/HBAs, GPU model changes, or workload shifts all deserve a re-measure.

Checklist: rack power budgeting you can defend in a meeting

  • Per-rack: measured typical and measured peak, not just a sum of nameplates.
  • Per-circuit: amperage at actual voltage, with clear assumption about allowable continuous utilization.
  • Inrush plan: boot staggering procedure documented and tested.
  • Redundancy plan: PSUs on separate feeds; PDUs on separate upstreams where possible.
  • Acceptance tests: every new hardware model gets a power characterization run before rollout.

Checklist: “we are about to add GPUs” edition

  • Measure GPU power limit per card and confirm the platform’s total power envelope.
  • Confirm PCIe auxiliary power and riser limits; don’t assume the chassis wiring matches the GPU marketing.
  • Test combined CPU+GPU peak under real workloads (not only synthetic).
  • Confirm redundancy behavior when one PSU is removed during GPU load.
  • Validate rack circuit and PDU outlet type (C13 vs C19) and per-outlet caps.

FAQ

1) Can I size a PSU by adding up component TDP?

Use TDP sums only as a rough lower bound. Real systems exceed it via boost behavior, fan power, controller charging, and transient spikes. Measure at the PDU.

2) Should I always buy the highest-wattage PSU option?

No. Oversizing can waste money and reduce efficiency at low load. Buy for measured sustained peak plus margin, and ensure single-PSU mode is safe if you run redundant.

3) What margin should I add?

There’s no universal number. Add margin for measurement error, environmental changes, aging, and future add-ons. The right margin is the one that survives N+1
failure mode at worst inlet temperature without throttling or instability.

4) Which is more trustworthy: BMC watts or PDU watts?

For breaker and capacity planning: PDU/UPS input watts. For per-host trending and redundancy status: BMC is useful. If they disagree, investigate, but budget off the PDU.

5) Why does power draw jump after a BIOS/BMC update?

Firmware can change CPU power limits, fan curves, memory training behavior, and peripheral power management defaults. Treat firmware updates like a hardware change:
re-measure power.

6) How do redundant PSUs affect efficiency?

With load sharing, each PSU runs at a lower percentage load, which may move you off the sweet spot of the efficiency curve. With active/standby, one PSU may run
near the sweet spot but the other still consumes standby power. Measure, don’t assume.

7) What’s the deal with VA vs W when sizing UPS and circuits?

W is real power; VA is apparent power. UPSes and PDUs may quote either. If power factor isn’t near 1.0, VA can be significantly higher than W, and that can
become the limiting factor for UPS capacity even when watts look fine.

8) How do I prevent breaker trips during fleet restarts?

Stagger boots, enable staggered drive spin-up where applicable, and avoid orchestrated “all nodes up now” behavior. Confirm with PDU peak logs and do a controlled test.

9) Do I need to care about PSU rail limits anymore?

Less than in the old desktop multi-rail drama, but still yes in certain platforms. Server PSUs and backplanes usually abstract this, but high GPU density and
riser power distribution can expose hidden limits. If you see instability under GPU load, verify platform power distribution, not just total watts.

10) What’s a practical way to power cap servers to stay within limits?

Use vendor tools or firmware power profiles where possible, and validate with PDU measurements. For CPUs, RAPL-based limits can help, but confirm behavior under your
workload—some workloads trade latency for watts in unpleasant ways.

Next steps you can do this week

  • Pick one server model and create a power profile: idle, typical, sustained peak, boot/inrush, and N+1 test results.
  • Make the PDU your source of truth for input power and peaks; wire SNMP polling into your metrics pipeline.
  • Run a controlled redundancy test: under meaningful load, pull a PSU and watch for throttling, fan ramp, and power jumps.
  • Write a boot-stagger procedure and rehearse it. After an outage is not the time to discover your tooling can’t do sequencing.
  • Update your purchasing spec: require metered PSUs/BMC sensors where possible, and require vendors to state behavior under single-PSU mode.
  • Re-check after firmware updates. Treat “minor revision” hardware as new until you’ve measured it.

The goal isn’t to become a power engineer. It’s to stop treating watts like folklore. Measure, budget, test failure modes, and move on to the problems that are
actually interesting—like why your storage rebuild window is still too long.
