How to migrate from VMware ESXi to Proxmox VE (step-by-step): VMs, disks, VLANs, downtime

The hardest part of “moving off ESXi” isn’t copying disks. It’s discovering what you’ve been relying on without noticing:
a VLAN tag somewhere in a vSwitch, a snapshot chain nobody owns, a Windows VM that only boots because of a controller type you forgot existed.

This guide is for production operators. The kind who want a plan you can run at 2 a.m. with a change ticket open, not a bloggy optimism festival.
We’ll migrate VMs, storage, and networks to Proxmox VE with realistic downtime math, real commands, and the failure modes you’ll actually hit.

What you’re really migrating (it’s not just VMs)

An ESXi estate is a set of agreements you didn’t write down: which VLANs exist, which NICs are trunks, which datastores are “fast enough,”
which VMs are allowed to run old virtual hardware, and which ones have been quietly snapshotting since last quarter.
Proxmox will happily run your workloads, but it won’t preserve those agreements unless you deliberately rebuild them.

Think of the migration as four independent moves:

  • Compute semantics: CPU model exposure, NUMA behavior, disk controller type, firmware (BIOS vs UEFI), secure boot, timers.
  • Storage semantics: thin vs thick, snapshots, write ordering, queue depth, TRIM/discard, alignment, cache settings.
  • Network semantics: VLAN tags, MTU, LACP, MAC learning, promiscuous mode (rarely needed, often “accidentally enabled”).
  • Operations semantics: backups, restores, monitoring, maintenance windows, and how you roll back when a VM won’t boot.

Your goal is not “get it booting.” Your goal is “make it boring again.” Boring is a feature.

Facts and context that change decisions

These are the small, concrete bits of history and behavior that matter in planning. Ignore them and you’ll learn them the expensive way.

  1. VMDK is older than most of your tooling. VMware’s virtual disk formats evolved alongside VMFS and snapshot chains; long-lived VMs often carry legacy assumptions (like controller types) forward.
  2. QCOW2 wasn’t designed as a “performance format.” It’s flexible (snapshots, compression), but raw on ZFS or LVM-thin can be simpler and faster for many production loads.
  3. Virtio became the de facto standard for KVM performance because emulation is expensive. If you leave NIC and disk on e1000/IDE equivalents “for safety,” you pay for it forever.
  4. OVF/OVA were meant for portability, but real portability depends on drivers. Exporting an appliance doesn’t guarantee it boots cleanly on different virtual hardware.
  5. ESXi snapshots are not backups, and the file structure makes that obvious. A chain of delta disks behaves like a Jenga tower: stable until you bump it.
  6. Proxmox’s cluster model is simple on purpose. It’s designed so a small team can run it without buying a management plane; that simplicity becomes your responsibility in networking and storage design.
  7. VLAN-aware bridges in Linux are mature. This isn’t 2008. A Linux bridge with VLAN filtering can replace many vSwitch use cases cleanly—if you map the tags correctly.
  8. ZFS has a long memory. It will protect you from silent corruption, but it will also faithfully preserve your bad decisions (like a recordsize mismatch for databases) until you tune it.

One quote to keep you honest, and to justify the paranoia in your change plan:
Hope is not a strategy. — James R. Schlesinger

Preflight inventory: what to measure before you touch anything

If you can’t list your VLANs, datastore usage, and VM boot firmware, you’re not migrating—you’re gambling with extra steps.
Inventory work is not glamorous. Neither is an incident review.

What to inventory (minimum)

  • VM list: name, CPU, RAM, disks, NIC count, OS, criticality, owner, maintenance window, RPO/RTO.
  • Disk layout: which VMDKs are thin, snapshot chains, and the datastore they live on.
  • Firmware: BIOS vs EFI. Secure boot matters more than you think.
  • Network: port groups, VLAN IDs, trunk ports, MTU, LACP, uplinks.
  • Special devices: USB passthrough, serial ports, GPU, dongles, physical NIC passthrough.
  • Time dependencies: NTP sources, domain controllers, licensing servers.
  • Backups: what can be restored, how quickly, and whether anyone has tested it this year.

Joke #1: The only thing more permanent than a “temporary” firewall rule is a “temporary” VM snapshot.

Decide upfront: cold migration or partial live?

With ESXi to Proxmox, “live migration” isn’t the default path unless you introduce intermediate replication tooling.
For most shops, the reliable approach is cold migration per VM (shutdown, export/copy, import, boot).
Your downtime is dominated by disk copy and post-boot validation.

You can reduce downtime with:

  • Pre-seeding disks while the VM runs (copy VMDK to staging, then a final sync during downtime); see the sketch after this list.
  • Application-level replication (DB replication, file sync), then flip endpoints.
  • Parallelism (multiple VMs per window), but don’t saturate your storage and then act surprised.
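
A minimal sketch of that pre-seed pattern, assuming the source files are reachable from a host that has rsync on both ends (stock ESXi does not ship rsync, so teams often stage via an NFS-mounted datastore or an intermediate copy first); hostnames and paths are placeholders:

cr0x@server:~$ rsync -ah --info=progress2 stage01:/exports/app01/ /mnt/stage/ESXI-EXPORTS/app01/              # pass 1: VM still running
cr0x@server:~$ rsync -ah --info=progress2 --inplace stage01:/exports/app01/ /mnt/stage/ESXI-EXPORTS/app01/    # pass 2: inside the window, after a clean shutdown

The second pass only has to transfer blocks that changed since the first one (it still reads and checksums the files, so it is faster, not instant). The long copy happens while the VM is serving traffic; the downtime window pays only for the delta plus conversion and validation.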

Target architecture on Proxmox: storage, CPU types, network model

Proxmox gives you enough rope to build a great platform or an artisanal outage. Pick a baseline and standardize it.

Compute: choose compatibility first, then performance

For mixed hardware clusters, start with a conservative CPU type (x86-64-v2 or similar baseline) so VMs can move between nodes later.
If you only have one node per VM forever, sure, use the host CPU type for max performance. But be honest about your future.

  • Disk bus: VirtIO SCSI (single) or VirtIO Block; avoid IDE/SATA unless you’re bootstrapping an old OS.
  • NIC model: VirtIO; keep e1000 only for ancient installers and remove it afterwards.
  • Firmware: match the source VM (BIOS vs OVMF/UEFI). Don’t “modernize” during migration unless you like double-debugging.

Storage: pick a primary backend and stick to it

Common Proxmox storage backends in migration projects:

  • ZFS: strong integrity, snapshots, replication; great if you understand RAM needs and write amplification.
  • LVM-thin: simple, fast, familiar; fewer knobs; snapshots exist but behave differently than ZFS.
  • Ceph: powerful, but migrating to Proxmox is not the time to also learn distributed storage from scratch.
  • NFS/iSCSI: fine for shared storage; performance depends on network and server tuning.

Opinionated take: if you’re doing a first Proxmox migration and you’re not already running a storage team, use ZFS on mirror/RAIDZ
or LVM-thin on hardware RAID. Leave Ceph for when you can afford to get good at it.

Networking: embrace Linux bridges and be explicit

Proxmox uses Linux networking under the hood. That’s good news: it’s predictable, scriptable, and debuggable with standard tools.
Your migration hinges on recreating ESXi port groups as Linux bridge VLANs (or separate bridges), then attaching VM NICs with the right tags.

VLANs, bridges, bonds: mapping ESXi networking to Proxmox

ESXi hides some complexity behind “vSwitch” and “port group.” Linux shows you the gears. That’s a feature until you mis-tag traffic.

Translate the model

  • ESXi vSwitch uplinks → Linux bond (optional) or one physical NIC
  • Port group with VLAN ID → Proxmox bridge port with tag set, or VLAN-aware bridge + per-VM VLAN tag
  • Trunk/native VLAN behavior → Linux bridge VLAN filtering + switchport config must agree

Recommended pattern: one VLAN-aware bridge per host

Create vmbr0 attached to your trunk uplink (or bond), set it VLAN-aware, and tag per VM NIC.
This keeps the host config stable while you change per-VM tags during migration.
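
A minimal /etc/network/interfaces sketch of that pattern (interface names, the address, and the VLAN list are placeholders; Proxmox reads this with ifupdown2):

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address 10.20.30.11/24
        gateway 10.20.30.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 10 20 30

Keep bridge-vids honest: list only the VLANs your trunk actually carries, and make sure the switchport allowed list agrees with it.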

MTU and jumbo frames: decide with evidence

If you need MTU 9000 for storage (iSCSI/NFS/Ceph), set it end-to-end: switchports, physical NICs, bonds, bridges, and VM interfaces as needed.
A single link at 1500 creates a “works sometimes” nightmare. Those are the best kind of nightmares: intermittent.
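
If you do commit to jumbo frames, prove it before migration day. A hedged check, with the storage peer’s IP as a placeholder:

cr0x@server:~$ ip link set bond0 mtu 9000 && ip link set vmbr0 mtu 9000    # runtime test; persist with 'mtu 9000' in /etc/network/interfaces
cr0x@server:~$ ping -M do -s 8972 -c 3 10.20.30.40                         # 8972 bytes of payload + 28 bytes of headers = 9000; fails if any hop is still at 1500

If the large ping fails while a normal ping works, something in the path is still at 1500 and you have found your intermittent nightmare before it found you.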

Disk migration strategies (choose your pain)

There are three common ways to get a disk from ESXi to Proxmox. None is universally best; pick based on size, bandwidth, and how allergic you are to downtime.

Strategy A: export OVF/OVA, import into Proxmox

Best when: you want a packaged export and your VMs are straightforward.
Worst when: you have huge disks or want tight control over disk formats and controllers.

  • Pros: standardized container, easy to archive.
  • Cons: can be slow; import may not preserve every nuance; still need driver/controller validation.

Strategy B: copy VMDK and convert with qemu-img

Best when: you want control, you have shell access, you want to choose raw vs qcow2, and you care about performance.

  • Pros: transparent, scriptable, works without fancy tooling.
  • Cons: snapshot chains require care; conversion time can be significant; needs disk space planning.

Strategy C: block-level replication or storage-based move

Best when: you already have replication (database, file sync) or shared storage you can re-present.
This is usually not a “generic VM migration” story; it’s per-application.

Joke #2: The migration plan that says “we’ll just convert the disks” is like saying “we’ll just land the plane.” Technically correct, emotionally incomplete.

Checklists / step-by-step plan (downtime and cutover)

Phase 0: pick the pilot and define success

  • Choose 1–3 VMs that are representative but not business-critical (yet).
  • Define “success” in operational terms: boot, network reachability, app health checks, backup jobs running, monitoring green.
  • Define rollback: power off Proxmox VM, power on ESXi VM, revert DNS or VIP if needed.

Phase 1: build Proxmox host(s) with a boring baseline

  • Install Proxmox VE, update, and reboot into the expected kernel.
  • Configure storage with a single primary pool/volume group; avoid mixing formats for no reason.
  • Configure a VLAN-aware bridge and confirm trunking with the network team (or your future self).
  • Set NTP, DNS, and a management access pattern that doesn’t rely on a single jump host.

Phase 2: pre-seed data (optional but powerful)

  • Copy VM exports or VMDKs to a staging area while the VM is still running.
  • Measure throughput and estimate downtime: disk size / observed MB/s + conversion overhead (worked example after this list).
  • If downtime is too large, stop here and redesign (replication, better link, local staging, parallel copy).
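
To make that estimate concrete with numbers that show up later in this guide (a 300 GiB virtual disk, roughly 238 MB/s of sustained rsync throughput), the copy alone is about 21 minutes; shell arithmetic keeps everyone honest:

cr0x@server:~$ echo "$(( 300 * 1024 / 238 / 60 )) minutes of copy, before conversion, boot, and validation"
21 minutes of copy, before conversion, boot, and validation

Conversion on the same storage is often in the same ballpark as the copy, so a “quick 30-minute window” for this VM is already fiction.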

Phase 3: downtime window (per VM)

  1. Disable schedules/agents that will fight you (backup jobs, patching).
  2. Shut down the VM cleanly on ESXi.
  3. Final copy of disk/export files.
  4. Convert/import into Proxmox and attach with matching firmware/controller.
  5. Boot in an isolated network first (optional but recommended for critical apps).
  6. Validate: IP, routes, DNS, app health, data integrity, time sync, licensing, logs.
  7. Cut over: move VLAN/DNS/VIP, enable monitoring and backups.
  8. Leave ESXi VM powered off but intact until the next business day at minimum.

Phase 4: standardize after success

  • Convert NIC/disk to VirtIO if you booted with legacy models.
  • Install qemu-guest-agent and enable it for clean shutdowns and IP reporting.
  • Set Proxmox backup policy and test restore (don’t negotiate with physics later).
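
A hedged sketch of that backup-and-restore proof, assuming a backup storage named backups exists and VMID 9120 is free (both names and the archive path are placeholders):

cr0x@server:~$ vzdump 120 --storage backups --mode snapshot --compress zstd
cr0x@server:~$ qmrestore /mnt/pve/backups/dump/vzdump-qemu-120-<timestamp>.vma.zst 9120 --storage rpool

Boot the restored copy with its NIC disconnected or on an isolated VLAN, confirm it reaches a login prompt and the application starts, then delete it. Now “we can restore” is a measurement, not a mood.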

Taskbook: commands, outputs, and what you decide from them

You wanted practical. Here’s practical. Each task includes (1) command, (2) what output means, (3) what decision you make.
Adjust hostnames, VMIDs, storage names, and interfaces to your environment.

Task 1: confirm Proxmox node health and versions

cr0x@server:~$ pveversion -v
proxmox-ve: 8.2.2 (running kernel: 6.8.12-4-pve)
pve-manager: 8.2.4 (running version: 8.2.4/2e1f9d1a)
pve-kernel-6.8: 6.8.12-4
qemu-server: 8.2.2
libpve-storage-perl: 8.2.3

Meaning: Confirms kernel and QEMU stack. Migrations fail in weird ways on mismatched userland/kernel combinations.

Decision: If the node is behind, update now—before you import VMs and start debugging ghosts.

Task 2: check storage capacity and filesystem type on the target

cr0x@server:~$ pvesm status
Name             Type     Status           Total            Used       Available        %
local             dir     active         98.00 GiB       12.34 GiB       80.12 GiB   12.59%
rpool             zfspool active          3.64 TiB        1.02 TiB        2.62 TiB   28.02%

Meaning: Shows storages available to Proxmox and free space. Imports require headroom (conversion can double space temporarily).

Decision: If available space is less than 1.3–1.5× the largest disk you’ll convert, stop and add space or change strategy (raw import, stream conversion, or staging elsewhere).

Task 3: verify ZFS pool health before trusting it

cr0x@server:~$ zpool status -x
all pools are healthy

Meaning: ZFS is not currently reporting errors or degraded vdevs.

Decision: If you see DEGRADED/FAULTED, you do not import production VMs yet. Fix the pool first or you’re importing into a slow-motion incident.

Task 4: baseline network interface and bridge state

cr0x@server:~$ ip -br link
lo               UNKNOWN        00:00:00:00:00:00
eno1             UP             3c:fd:fe:aa:bb:01
bond0            UP             3c:fd:fe:aa:bb:01
vmbr0            UP             3c:fd:fe:aa:bb:01

Meaning: Confirms link state. If your bridge or uplink isn’t UP, your “import succeeded” is about to become “why can’t it ping.”

Decision: Fix physical/bond/bridge issues now; don’t start with a broken network and hope it heals.

Task 5: confirm bond mode and LACP negotiation (if using a bond)

cr0x@server:~$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.8.12-4-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
LACP rate: fast
Aggregator selection policy (ad_select): stable

Slave Interface: eno1
MII Status: up
Actor Churn State: none

Slave Interface: eno2
MII Status: up
Actor Churn State: none

Meaning: Confirms 802.3ad and link health. Churn states and “down” indicate switch mismatch or cabling issues.

Decision: If LACP isn’t stable, use active-backup temporarily and schedule a network fix. Migration day is not when you debug switchport channels.

Task 6: verify VLAN filtering and allowed VLANs on the Linux bridge

cr0x@server:~$ bridge vlan show
port              vlan-id
bond0             1 PVID Egress Untagged
bond0             10
bond0             20
bond0             30
vmbr0             1 PVID Egress Untagged

Meaning: Shows which VLANs are allowed/filtered on the bridge ports. If VLAN 20 is missing, VMs tagged 20 will be silent.

Decision: If VLANs aren’t present, fix bridge VLAN configuration and switch trunk configuration before cutover. This is a binary success condition.

Task 7: baseline throughput for copy planning (don’t guess)

cr0x@server:~$ iperf3 -c 10.20.30.40 -t 10
Connecting to host 10.20.30.40, port 5201
[  5] local 10.20.30.11 port 50422 connected to 10.20.30.40 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  9.42 GBytes  8.09 Gbits/sec                  sender
[  5]   0.00-10.00  sec  9.41 GBytes  8.08 Gbits/sec                  receiver
iperf Done.

Meaning: You have ~8 Gbit/s of usable throughput in this test path.

Decision: Use this to estimate copy time. If your VM has a 2 TB disk, you’re not doing a “quick window” over this link without pre-seeding or redesign.

Task 8: copy an exported VM bundle to Proxmox (staging)

cr0x@server:~$ rsync -ah --info=progress2 /mnt/stage/ESXI-EXPORTS/app01/ /var/lib/vz/import/app01/
         12.34G  12%   215.78MB/s    0:00:46
        101.22G 100%   238.11MB/s    0:07:14 (xfr#12, to-chk=0/14)

Meaning: Confirms actual sustained transfer rate and completion.

Decision: If throughput is far below expectation, find the bottleneck before you schedule downtime (disk IO, single thread, encryption overhead, network path).

Task 9: convert a VMDK to raw (often best on ZFS/LVM-thin)

cr0x@server:~$ qemu-img info /var/lib/vz/import/app01/app01-disk1.vmdk
image: /var/lib/vz/import/app01/app01-disk1.vmdk
file format: vmdk
virtual size: 300G (322122547200 bytes)
disk size: 128G
cluster_size: 65536

Meaning: Shows virtual size (guest-visible) vs actual used blocks (thin-provisioned reality).

Decision: If the disk is thin but full of old blocks, consider trimming inside the guest before final export. It can cut copy time dramatically.
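
A hedged example of that in-guest trim, run on the source VM before the final copy (Linux shown; Windows’ rough equivalent is Optimize-Volume -ReTrim, and the trim only shrinks anything if the virtual disk and datastore actually honor unmap):

cr0x@app01:~$ sudo fstrim -av    # reports how much each mounted filesystem released; run it, then do the final export/copy

Even when the datastore ignores the unmap, zeroing free space inside the guest before export can still pay off at the conversion step, because qemu-img convert writes detected zero runs sparsely.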

cr0x@server:~$ qemu-img convert -p -O raw /var/lib/vz/import/app01/app01-disk1.vmdk /rpool/vmdata/app01-disk-0.raw
    (0.99/100%)

Meaning: Conversion is CPU and IO heavy; percent indicates progress.

Decision: If conversion time is unacceptable, evaluate importing as qcow2 temporarily, or pre-convert during business hours on staging storage, then move the result.

Task 10: create the VM shell in Proxmox with matching firmware

cr0x@server:~$ qm create 120 --name app01 --memory 8192 --cores 4 --sockets 1 --net0 virtio,bridge=vmbr0,tag=20 --scsihw virtio-scsi-single --ostype l26 --bios ovmf
create VM 120: success

Meaning: Creates VMID 120 with OVMF (UEFI) firmware and a VLAN-tagged virtio NIC.

Decision: Match BIOS/OVMF to the source VM. If you guess wrong, you’ll get a “no boot device” that looks like a disk problem but isn’t.

Task 11: import the disk into Proxmox-managed storage

cr0x@server:~$ qm importdisk 120 /rpool/vmdata/app01-disk-0.raw rpool
importing disk '/rpool/vmdata/app01-disk-0.raw' to VM 120 ...
Formatting '/rpool/data/vm-120-disk-0.raw', fmt=raw size=322122547200
Successfully imported disk as 'unused0:rpool:vm-120-disk-0'

Meaning: Disk is now in Proxmox storage and attached as an unused device.

Decision: Next you must attach it as the correct boot device/controller. “Unused” means Proxmox imported it but you didn’t wire it in.

Task 12: attach the imported disk and set boot order

cr0x@server:~$ qm set 120 --scsi0 rpool:vm-120-disk-0 --boot order=scsi0 --efidisk0 rpool:1,format=raw
update VM 120: -scsi0 rpool:vm-120-disk-0 -boot order=scsi0 -efidisk0 rpool:1,format=raw

Meaning: VM now has its primary disk on SCSI (VirtIO SCSI) and an EFI disk for OVMF variables.

Decision: If the source was BIOS (legacy), do not add efidisk0; instead keep --bios seabios and set boot accordingly.
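
The BIOS variant, for completeness (same VMID as above):

cr0x@server:~$ qm set 120 --bios seabios --boot order=scsi0
cr0x@server:~$ qm set 120 --delete efidisk0    # only if you added one earlier by mistake

Nothing else about the disk attachment changes; the firmware choice and the EFI vars disk are the only UEFI-specific pieces.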

Task 13: first boot with console and immediate log triage

cr0x@server:~$ qm start 120
start VM 120: success
cr0x@server:~$ qm status 120
status: running

Meaning: The VM process is up. Watch the first boot on the noVNC console in the web UI (qm terminal only attaches to a serial port, which a freshly imported VM usually doesn’t have). Early hangs are where you see bootloader and driver issues.

Decision: If the VM kernel panics or Windows BSODs, don’t randomly change five settings. Identify controller/firmware mismatch first.

Task 14: verify the VM sees its NIC and gets an IP (via guest agent)

cr0x@server:~$ qm guest cmd 120 network-get-interfaces
{
  "return": [
    {
      "name": "eth0",
      "ip-addresses": [
        {
          "ip-address": "10.20.20.51",
          "ip-address-type": "ipv4",
          "prefix": 24
        }
      ],
      "hardware-address": "BC:24:11:22:33:44"
    }
  ]
}

Meaning: Guest agent is running and reports network interfaces. This is gold during cutovers.

Decision: If guest agent isn’t responding, you may still proceed, but you lose visibility. Install/enable the agent as part of stabilization.
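
A hedged sketch of closing that gap, host side then guest side (Debian/Ubuntu package shown; Windows gets the agent from the virtio-win installer):

cr0x@server:~$ qm set 120 --agent enabled=1    # the virtio serial device appears after the next full stop/start, not after a reboot from inside the guest
cr0x@app01:~$ sudo apt install qemu-guest-agent && sudo systemctl enable --now qemu-guest-agent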

Task 15: check host IO latency during conversion/import

cr0x@server:~$ iostat -xz 1 3
Linux 6.8.12-4-pve (server)  12/28/2025  _x86_64_ (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.40    0.00    6.10   18.20    0.00   63.30

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz aqu-sz  %util
nvme0n1         20.00   880.00     0.00   0.00   2.10    44.00  420.00  51200.00    12.00   2.78  18.50   121.90   7.90  92.00

Meaning: High %iowait and high w_await indicate storage pressure. %util near 100% suggests saturation.

Decision: If latency spikes during business hours, throttle conversions, move imports off peak, or stage on faster storage. Don’t “power through” and then explain the company-wide slowness.

Task 16: validate VLAN tagging from inside a VM (quick sanity)

cr0x@app01:~$ ping -c 2 10.20.20.1
PING 10.20.20.1 (10.20.20.1) 56(84) bytes of data.
64 bytes from 10.20.20.1: icmp_seq=1 ttl=64 time=0.410 ms
64 bytes from 10.20.20.1: icmp_seq=2 ttl=64 time=0.392 ms

--- 10.20.20.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.392/0.401/0.410/0.009 ms

Meaning: The VM reaches its gateway on the expected subnet; VLAN and bridge tagging likely correct.

Decision: If it can’t ping the gateway, don’t debug the application. Fix L2/L3 first: VLAN tag, trunk allowed list, bridge, or IP config.

Windows and Linux guest gotchas (drivers, boot, qemu-guest-agent)

Controller types are not cosmetic

Many ESXi Windows VMs booted for years on LSI Logic SAS or VMware Paravirtual controllers.
On Proxmox, you’ll likely want VirtIO SCSI for performance. But if Windows doesn’t have the VirtIO driver installed, it won’t boot after you switch.

Practical approach:

  • First boot using a compatible controller (SATA can be a temporary crutch).
  • Install VirtIO drivers in Windows.
  • Then switch disk bus to VirtIO SCSI and confirm boot.
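
A hedged sketch of that sequence from the Proxmox side, assuming the virtio-win ISO is already uploaded as local:iso/virtio-win.iso and the VM is 120 (names are placeholders):

cr0x@server:~$ qm set 120 --ide2 local:iso/virtio-win.iso,media=cdrom    # install the storage and network drivers from this inside Windows
cr0x@server:~$ qm set 120 --scsi1 rpool:1                                # throwaway 1 GiB VirtIO SCSI disk so Windows binds the driver before the boot disk moves

Once a reboot proves the VirtIO storage driver loads, detach the boot disk from its temporary SATA slot, reattach it as scsi0, fix the boot order, and remove the throwaway disk and the ISO.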

UEFI vs BIOS mismatch looks like a disk failure

If the source VM used EFI and you boot it under SeaBIOS (or vice versa), you often get “no boot device.”
People then re-import disks, re-convert, and waste a day. Match firmware first.
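
You can usually read both answers from the source VM’s .vmx before importing anything, if you copied the VM directory rather than only an OVF (path reuses the staging example above):

cr0x@server:~$ grep -iE 'firmware|virtualDev' /var/lib/vz/import/app01/app01.vmx

A line like firmware = "efi" means build the Proxmox VM with OVMF; no firmware entry usually means legacy BIOS. The scsi0.virtualDev value (pvscsi, lsisas1068, lsilogic) tells you which controller the guest OS expects on its first boot.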

Time sync: avoid dueling clocks

ESXi had VMware Tools time sync in many environments. Proxmox uses QEMU guest agent and standard OS time services.
Pick one authoritative method (usually NTP/chrony in the guest) and disable the rest.

Trim/discard and thin provisioning reality

If you migrate thin-provisioned disks and then store them on thin storage again, you’ve created a “thin-on-thin” stack.
It can be fine, but you must monitor actual physical usage and ensure discard/TRIM is supported end-to-end.
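
The Proxmox side of “end-to-end” is the virtual disk advertising discard to the guest; a hedged example for the VM built earlier:

cr0x@server:~$ qm set 120 --scsi0 rpool:vm-120-disk-0,discard=on,ssd=1    # ssd=1 presents the disk as non-rotational, which nudges guests to trim

With that set, a periodic fstrim inside the guest (or the OS’s own trim timer) is what actually returns freed blocks to the thin layers underneath.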

Fast diagnosis playbook (find the bottleneck in minutes)

When a migration is “slow” or a VM is “laggy,” don’t debate feelings. Take measurements in this order. It’s optimized for speed and signal.

First: is the network the limiter?

  • Run iperf3 between source-side staging and Proxmox.
  • Check for a single saturated link, duplex mismatch, or a hidden 1G hop.
  • Confirm MTU consistency if you expect jumbo frames.

Second: is the target storage saturated or high-latency?

  • Use iostat -xz 1 on Proxmox during conversion/import.
  • Look for %util near 100% and await climbing into tens of milliseconds for SSD/NVMe (or hundreds for spinning rust).
  • Check ZFS: pool health, ARC pressure, and whether you’re accidentally syncing everything.

Third: is CPU the limiter (often during conversion)?

  • Use top or pidstat to see if qemu-img is pegging a core.
  • If CPU-bound, parallelizing conversions can help—but only if storage can handle it.

Fourth: is the guest misconfigured (boot and drivers)?

  • Check firmware and disk bus first (OVMF vs SeaBIOS; VirtIO vs SATA).
  • Then NIC model (VirtIO) and VLAN tags.
  • Only then chase application-level issues.

Common mistakes: symptom → root cause → fix

1) VM boots to “no boot device”

Symptom: VM starts, then drops into firmware shell or boot manager with no disk.

Root cause: Firmware mismatch (BIOS vs UEFI) or wrong boot order, or disk attached to a different bus than the OS expects.

Fix: Match firmware to the source VM; set boot order to the correct disk; attach disk on SATA temporarily if needed to boot, then install VirtIO and switch.

2) VM has network link but can’t reach gateway

Symptom: Interface is up; IP is set; ping to gateway fails.

Root cause: Wrong VLAN tag (or missing VLAN on trunk); bridge not VLAN-aware; switchport not trunking the VLAN.

Fix: Validate VLAN tag per VM NIC; confirm bridge VLAN filtering; confirm switch allowed VLAN list. Prove L2 with ARP and ping to gateway.
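
Proving L2 from inside the guest takes seconds and removes most of the guessing (gateway address reuses the earlier examples):

cr0x@app01:~$ ping -c 1 10.20.20.1 >/dev/null 2>&1; ip neigh | grep 10.20.20.1

An entry ending in FAILED or INCOMPLETE means ARP never resolved: the frame is not reaching the gateway’s VLAN, so the problem is tags and trunks, not the application or the firewall.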

3) Migration copy is “fast at first, then crawls”

Symptom: Copy starts at hundreds of MB/s, then drops to tens.

Root cause: ZFS pool hitting SLOG-less sync writes (if forced), ARC pressure, or destination disk becoming latency-bound due to random writes during conversion.

Fix: Stage on fast local storage; convert off-peak; consider raw instead of qcow2; monitor iostat and ZFS stats; avoid forcing sync unless required.

4) Windows VM bluescreens after switching to VirtIO

Symptom: Boot failure/BSOD immediately after changing disk controller or NIC model.

Root cause: Missing VirtIO drivers or wrong storage controller type.

Fix: Boot using SATA or a compatible controller; install VirtIO drivers; then switch and reboot. Keep one known-good rollback point.

5) VMs work but are mysteriously slower than on ESXi

Symptom: Higher latency, lower throughput, intermittent stalls.

Root cause: Emulated devices (e1000/IDE), wrong caching mode, host power settings, or storage recordsize mismatch (ZFS).

Fix: Use VirtIO for NIC and disk; verify cache settings; ensure CPU governor/power management is sane; tune ZFS dataset properties where appropriate.

6) Backups silently don’t capture the new VMs

Symptom: Backup jobs run, but the migrated VM isn’t included or fails.

Root cause: VM stored on a storage backend not included in backup config, or snapshots not supported/disabled.

Fix: Reconcile storage IDs, backup schedules, and snapshot capabilities. Test restore immediately after the first successful cutover.

Three corporate mini-stories from the trenches

Mini-story 1: the incident caused by a wrong assumption (VLAN “defaults”)

A mid-sized SaaS company planned an ESXi to Proxmox migration with what looked like a clean network story:
“All VM networks are tagged VLANs, trunks everywhere, no native VLAN dependence.” That sentence should have been followed by evidence, but it wasn’t.

During the first production cutover, the VM came up, responded on its application port from some places, and was completely unreachable from others.
The on-call team saw link lights, correct IP config, and a clean boot. They did the classic thing: blamed the firewall.
A few rule toggles later, the blast radius grew, because now they were changing more variables while lacking the one variable that mattered.

The root cause was brutally simple: the ESXi port group had VLAN ID set, but the upstream switchport also had a “native VLAN” that carried management traffic,
and some legacy systems relied on untagged frames reaching the VM through a chain of intermediate devices.
On Proxmox, the bridge was VLAN-aware but configured with a strict allowed list that dropped untagged traffic.

The fix wasn’t to “make Proxmox behave like ESXi” by turning off filtering. The fix was to document which traffic was supposed to be tagged,
eliminate accidental reliance on native VLAN behavior, and then explicitly configure untagged VLAN handling where it was truly required.
After that, the next migrations were boring. The first one was not.

Mini-story 2: the optimization that backfired (thin-on-thin and surprise capacity)

A corporate IT team was proud of their storage efficiency. On ESXi they used thin-provisioned VMDKs on a shared array, and it “worked fine.”
When moving to Proxmox with ZFS, they kept the pattern: VMDK → qcow2 (thin) → ZFS (copy-on-write). Thin everywhere. Efficient everywhere.

For a few weeks, it looked like a win. Migration time was acceptable, snapshots were easy, and backups were lighter than expected.
Then a quarterly batch job ran: a workload that wrote and rewrote large files in place.
Copy-on-write plus fragmentation plus metadata churn did what it always does when provoked—latency climbed and stayed there.

The team reacted by adding more snapshots (“so we can roll back”). That increased metadata and space pressure.
Then they tried compression on already-compressed data. CPU went up, latency got worse, and the business users discovered new ways to describe slowness.

The eventual fix was boring and effective: move hot database disks to raw on a dedicated dataset with sane recordsize,
minimize snapshot frequency on those volumes, and keep thin provisioning where it actually helps (VM templates, dev boxes, low-churn servers).
Efficiency isn’t free; it’s a trade. The bill arrives as latency.

Mini-story 3: the boring but correct practice that saved the day (restore test and rollback discipline)

A healthcare org had a migration plan that made some engineers roll their eyes: “Every migrated VM must pass a restore test.”
Not “we have backups.” Not “backups are green.” An actual restore, boot, and validation on an isolated network.
It sounded slow. It was slow. It also saved them.

One VM—an old vendor appliance with a fragile bootloader—imported cleanly but failed on first reboot after a configuration change.
No one had touched the disk format; it just didn’t come back. The vendor blamed virtualization. Virtualization blamed the vendor.
Time did what time always does during an outage: it accelerated.

Because they had already tested restores on Proxmox, they didn’t have to invent a recovery procedure while stressed.
They restored the last known-good backup to a new VMID, attached the NICs with the correct VLAN tags, and the service returned.
The original VM stayed down for forensics without blocking patient-facing operations.

The postmortem wasn’t glamorous. The takeaway was: restore tests aren’t pessimism, they’re rehearsal.
In regulated environments, “we can restore” is a capability, not a belief.

FAQ

1) Can I migrate ESXi VMs to Proxmox with near-zero downtime?

Sometimes, but not generically. For true near-zero downtime, use application replication (DB replication, file sync, clustering) and cut over endpoints.
Disk-conversion-based VM moves are usually cold migrations unless you introduce specialized replication tooling.

2) Should I use qcow2 or raw on Proxmox?

If you’re on ZFS or LVM-thin, raw is often the simplest performance baseline. qcow2 is useful when you want qcow2 features (like internal snapshots),
but it can add overhead and complexity. Pick one standard per storage class and don’t mix casually.

3) What’s the cleanest way to handle VLANs on Proxmox?

A VLAN-aware bridge (vmbr0) on the trunk uplink, then set the VLAN tag per VM NIC. It scales, it’s explicit, and it mirrors how port groups work conceptually.

4) Do I need a Proxmox cluster to start?

No. A single node is fine for a pilot. But if you expect to live-migrate VMs between nodes later, plan CPU compatibility, shared storage (or replication),
and consistent networking from the start.

5) My Windows VM won’t boot after import. What do I check first?

Firmware (UEFI vs BIOS), then disk controller/bus. If you switched to VirtIO before installing drivers, revert to a compatible controller, boot, install drivers, switch again.

6) How do I estimate downtime per VM?

Measure transfer throughput (iperf and real rsync), then compute: disk bytes / sustained bytes per second + conversion time + boot/validation.
Add buffer for surprises. If you’re trying to squeeze a 2 TB move into a 30-minute window, the math will publicly disagree with you.

7) Can I keep the same IP and MAC addresses?

You can keep IPs if the VLAN/subnet remains the same. MAC addresses can be set manually in Proxmox if you must,
but do it only when a license or policy depends on it; otherwise let Proxmox assign and update dependencies.

8) What about snapshots during migration?

Collapse or remove unnecessary ESXi snapshots before exporting/copying. Snapshot chains slow exports and increase risk.
On Proxmox, establish a new snapshot/backup policy after stabilization; don’t import old snapshot baggage unless you have a strong reason.
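
A hedged sketch of doing that cleanup on the ESXi side with the stock vim-cmd tool (the numeric Vmid comes from the first command; 42 here is a placeholder):

[root@esxi01:~] vim-cmd vmsvc/getallvms | grep app01
[root@esxi01:~] vim-cmd vmsvc/snapshot.get 42          # shows the snapshot tree, if one exists
[root@esxi01:~] vim-cmd vmsvc/snapshot.removeall 42    # consolidates deltas into the base disks; slow, and it needs datastore headroom

Run the consolidation well before the migration window, not during it.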

9) Is it safe to migrate and also “upgrade” VM hardware (UEFI, VirtIO, new NIC model)?

It’s safe when you can afford double-debugging. For critical systems, migrate first with minimal change, prove stability, then modernize in a second change window.

10) What’s the minimum validation after cutover?

Boot, correct IP/subnet/gateway/DNS, time sync, app health checks, logs clean enough to see new errors, monitoring, and a successful backup job (or at least a manual backup run).

Conclusion: practical next steps

Do the migration like an operator, not a gambler. Build a boring Proxmox baseline, inventory the ESXi estate with intent,
and treat VLANs and firmware/controller choices as first-class migration data—not “details we’ll fix after it boots.”

Next steps that pay off immediately:

  • Pick a pilot VM and run the full cold migration path end-to-end, including backup and restore test.
  • Standardize one network pattern (VLAN-aware bridge) and one storage pattern (raw on ZFS or LVM-thin) for most VMs.
  • Write a per-VM cutover sheet: firmware, disk bus, VLAN tag, IP plan, owner validation steps, rollback steps.
  • Schedule migrations in batches that match measured throughput and storage latency, not wishful calendar math.

When you’re done, the best compliment you can receive is silence: no tickets, no “it feels slower,” no mystery VLAN ghosts.
Just workloads doing their job, on a platform you can explain on a whiteboard without apologizing.
