ESXi to Proxmox Storage Migration: Moving VMFS Datastores to ZFS, NFS, or iSCSI with Minimal Downtime

Storage migration is where virtualization projects go to die: not because the tools are bad, but because the assumptions are. “The network is fine.” “These disks are fast.” “We can copy it overnight.” Then you hit the wall at 2 a.m. with a half-copied VMDK and a database that refuses to boot anywhere except the old VMware host.

If you’re moving from ESXi/VMFS to Proxmox and you want minimal downtime, you need two things: (1) a storage target that behaves predictably under VM workloads, and (2) a migration method that doesn’t require heroics. This guide is the practical playbook: what to choose (ZFS vs NFS vs iSCSI), what to measure, what to copy, and what to do when performance goes sideways.

Facts and context that change decisions

Here are a few concrete facts and bits of history that matter when you’re planning VMFS-to-Proxmox storage migrations. Not trivia. Decision fuel.

  1. VMFS was designed for concurrent access by multiple ESXi hosts, with on-disk locking primitives that assume VMware’s stack. Linux can read VMFS, but it’s not a first-class citizen in the Proxmox world.
  2. VMDK is not “one format.” There are monolithic sparse, thick, split, and stream-optimized variants. The export path (OVF/OVA) can quietly change the disk format you end up copying.
  3. ZFS started at Sun Microsystems with an “end-to-end checksumming” philosophy. That matters for VM images because silent corruption is a real failure mode, not a scary story.
  4. NFS has been around since the 1980s and has had decades to become boring. In production systems, boring is a feature you can page less about.
  5. iSCSI is block storage over IP—which means your VM host takes responsibility for the filesystem (like ext4, xfs, or ZFS). That makes performance tuning and failure domains very different from NFS.
  6. VMware’s change block tracking (CBT) can make incremental copies fast in VMware land, but it’s not a native Proxmox concept. If you were relying on it indirectly (backup products), plan for a behavioral shift.
  7. Thin provisioning is a policy, not a law of physics. Overcommit is easy; reclaiming space later is where the drama starts, especially across format conversions.
  8. 4K sector alignment used to be optional. It isn’t anymore. Misalignment can cost you double writes and “why is this SSD slower than rust?” moments during migrations.

Joke #1: Storage migrations are like moving house: the boxes multiply when you’re not looking, and somehow you end up carrying a printer you haven’t used since 2017.

Choose your landing zone: ZFS vs NFS vs iSCSI

ZFS on Proxmox (local or shared via replication)

When to pick it: You want strong data integrity, snapshots, simple operational control, and you can keep VM disks local to a Proxmox node (or you’re okay with replication-based HA rather than shared storage).

Operational reality: ZFS is not “set it and forget it.” It’s “set it and monitor it.” ZFS will give you excellent behavior when you respect its needs: enough RAM, sane recordsize/volblocksize decisions, and not pretending a RAIDZ1 of consumer SSDs is enterprise storage.

Minimal downtime angle: ZFS snapshots and zfs send/zfs receive are your friends for large transfers and repeatable cutovers. If you can stage a dataset and do a final incremental send during a short outage window, you get predictable downtime.
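
A minimal sketch of that staged-send pattern, assuming a staging dataset rpool/vmdata on the source node and a second Proxmox node pve-02 with a pool named tank (all names are placeholders):

cr0x@server:~$ sudo zfs snapshot rpool/vmdata@stage1
cr0x@server:~$ sudo zfs send rpool/vmdata@stage1 | ssh root@pve-02 zfs receive -F tank/vmdata
# later, inside the cutover window, send only what changed since stage1
cr0x@server:~$ sudo zfs snapshot rpool/vmdata@cutover
cr0x@server:~$ sudo zfs send -i rpool/vmdata@stage1 rpool/vmdata@cutover | ssh root@pve-02 zfs receive tank/vmdata

The incremental send is usually tiny compared to the full one, which is what keeps the outage window short.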

Avoid it if: You must have shared storage semantics for live migration across nodes and you don’t want to run a replication schedule with failover logic. Proxmox can do replication, but it’s not identical to “shared datastore.”

NFS (shared storage without the iSCSI personality)

When to pick it: You want shared storage that’s easy to reason about, you have a solid NAS or Linux NFS server, and you value operational simplicity and portability.

Operational reality: NFS performance is highly dependent on the server, network, and mount options. The upside: when something breaks, the tooling and mental models are common. The downside: if your network has microbursts, you’ll find out.

Minimal downtime angle: Shared NFS lets you place VM disks centrally so you can move compute separately. For cutovers, you still need to copy the data, but once the disks live on NFS, host-level evacuation is much simpler.

Avoid it if: Your only network is a congested, shared LAN with unknown buffering and you cannot isolate storage traffic. NFS over a shaky network feels like a database on a trampoline.

iSCSI (block storage with sharp edges)

When to pick it: You already have a SAN, you need block-level semantics, you want multipath, and you have the discipline to manage it. iSCSI is fine—if you treat it like a system, not a checkbox.

Operational reality: iSCSI gives you failure modes that look like “storage is slow” but are actually “one path is flapping and multipath is doing interpretive dance.” Your monitoring needs to include path health and latency distribution, not just throughput.

Minimal downtime angle: If your iSCSI LUN is already the shared backing store, you can migrate compute more freely. The data migration still exists (VMFS to whatever filesystem you use on the LUN), but you gain flexibility afterward.

Avoid it if: You don’t have redundant networking, you can’t do multipath properly, or you need your team to sleep. iSCSI without standards is a ticket generator.

Target architectures that actually work

Architecture A: Single-node Proxmox with local ZFS

Best for small to medium environments, labs, branch sites, or teams migrating in phases. Put VMs on a local ZFS pool. Use snapshots for rollback. Use ZFS replication to a second node for disaster recovery or “poor-man’s HA.”

Key decisions: mirror vs RAIDZ, SLOG or not, compression, ashift, and how you’ll handle backups (Proxmox Backup Server or other). The pool layout is a one-way door.

Architecture B: Proxmox cluster with shared NFS

This is the “make it boring” option. Shared NFS storage for VM disks + separate Proxmox cluster networking for corosync + dedicated backup path. Live migration becomes a compute move, not a storage move.

Key decisions: NFS server design (ZFS-backed NAS is common), network isolation (VLANs or physically separate NICs), and mount options that match your workload.

Architecture C: Proxmox cluster with iSCSI + multipath + LVM-thin or ZFS

You can do iSCSI LUNs with LVM-thin, or iSCSI LUNs consumed by ZFS as a vdev (not my favorite unless you know exactly why you’re doing it). Most teams do LVM-thin on iSCSI for block storage semantics.

Key decisions: multipath policy, queue depths, ALUA settings, and how you’ll monitor per-path latency. Also: who owns the SAN configuration and how changes are controlled.
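
For reference, a minimal /etc/multipath.conf sketch; treat the blacklist entry and any device-specific tuning as assumptions to replace with your array vendor's recommended settings:

defaults {
    user_friendly_names yes
    find_multipaths     yes
}
blacklist {
    devnode "^sda$"   # example: keep the local boot disk out of multipath; adjust per host
}

After edits, reload with multipath -r and re-check multipath -ll; if the active/enabled path layout changes unexpectedly, stop and find out why before putting VMs on the LUN.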

Joke #2: iSCSI is like a relationship with great chemistry and terrible communication—when it’s good, it’s brilliant; when it’s bad, it’s 3 a.m. Wireshark.

Preflight: inventory, constraints, and downtime math

Inventory the VMs like you mean it

Your migration plan is only as good as your inventory. “About 40 VMs” is not inventory. Inventory includes: disk sizes (provisioned and used), OS types, boot modes (BIOS vs UEFI), critical services, RPO/RTO expectations, and whether the guest supports clean shutdown.

Measure change rate, not just size

If a VM has 2 TB of disk but changes 2 GB/day, you can stage most data ahead of time and keep cutover short. If it changes 400 GB/day, you’re doing a cold cut or you’re building a replication strategy at the application layer.
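
One low-tech way to measure change rate on the ESXi side, assuming you can take a VM snapshot (via the vSphere UI or vim-cmd): create the snapshot, wait a representative day, and check how large the delta files have grown:

cr0x@server:~$ ssh root@esxi-01 "ls -lh /vmfs/volumes/vmfs01/db01/ | grep -Ei 'delta|sesparse'"

The combined size of the -delta/-sesparse files approximates daily write churn. Delete the snapshot afterwards so it doesn't keep growing under you.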

Decide what “minimal downtime” means

Minimal downtime is not zero downtime. It’s the smallest outage window that’s honest: time to quiesce the application, final delta copy, boot on Proxmox, validate, and roll back if needed.

Pick a rollback story before you start

If you can’t roll back, you’re not migrating; you’re gambling. Rollback usually means: keep the ESXi VM intact, do not delete VMFS data, and plan DNS/load balancer reversals. If you must reuse IPs, stage a plan for ARP cache refreshes and firewall state resets.

Hands-on tasks with commands (and what to decide from the output)

These are the real checks I run before, during, and after a storage migration. Each task includes a command, a plausible output snippet, what it means, and the decision you make from it.

Task 1: Confirm Proxmox node health and kernel basics

cr0x@server:~$ pveversion -v
pve-manager/8.2.2/1a2b3c4d (running kernel: 6.8.12-4-pve)

What it means: You’re on a modern Proxmox release with a current pve-kernel. Storage drivers and ZFS behavior are kernel-sensitive.

Decision: If you’re on an old kernel, fix that before migrating. Don’t combine “new storage + old kernel + midnight cutover.”

Task 2: Verify CPU virtualization and IOMMU (for futureproofing)

cr0x@server:~$ lscpu | egrep -i 'Virtualization|Model name'
Model name:                           Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz
Virtualization:                       VT-x

What it means: Hardware virtualization is present. Not directly storage, but it affects migration success when you need to run the same guest configs.

Decision: If virtualization is missing/disabled, stop and fix BIOS settings. Storage migration won’t matter if nothing boots.

Task 3: Check NIC link, speed, and errors (storage over network depends on this)

cr0x@server:~$ ip -s link show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    9876543210  1234567      0      12       0       0
    TX:  bytes packets errors dropped carrier collsns
    8765432109  1122334      0       0       0       0

What it means: Errors are zero; a few drops can be normal under burst, but “12 dropped” is a smell you should understand.

Decision: If errors increment during testing, fix cabling/switching before blaming NFS/iSCSI/ZFS.

Task 4: Validate jumbo frames end-to-end (only if you insist on them)

cr0x@server:~$ ping -M do -s 8972 -c 3 10.10.20.10
PING 10.10.20.10 (10.10.20.10) 8972(9000) bytes of data.
8980 bytes from 10.10.20.10: icmp_seq=1 ttl=64 time=0.612 ms
8980 bytes from 10.10.20.10: icmp_seq=2 ttl=64 time=0.590 ms
8980 bytes from 10.10.20.10: icmp_seq=3 ttl=64 time=0.605 ms

--- 10.10.20.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2040ms

What it means: MTU 9000 works across the path. If this fails, you don’t have jumbo frames; you have fragmentation and sadness.

Decision: If it fails, either fix MTU consistently everywhere or revert to 1500. Mixed MTU is worse than not having jumbo.

Task 5: Create a ZFS pool with correct ashift (example: mirrored SSDs)

cr0x@server:~$ sudo zpool create -o ashift=12 -o autotrim=on rpool mirror /dev/disk/by-id/nvme-SAMSUNG_MZVLB1T0HBLR-00000_S4X7NX0N123456 /dev/disk/by-id/nvme-SAMSUNG_MZVLB1T0HBLR-00000_S4X7NX0N789012
cr0x@server:~$ sudo zpool status rpool
  pool: rpool
 state: ONLINE
config:

        NAME                                                                STATE     READ WRITE CKSUM
        rpool                                                               ONLINE       0     0     0
          mirror-0                                                          ONLINE       0     0     0
            nvme-SAMSUNG_MZVLB1T0HBLR-00000_S4X7NX0N123456                  ONLINE       0     0     0
            nvme-SAMSUNG_MZVLB1T0HBLR-00000_S4X7NX0N789012                  ONLINE       0     0     0

errors: No known data errors

What it means: Pool is healthy; ashift=12 is a sane default for 4K sector devices (including many SSDs that lie about sectors).

Decision: If you already created a pool with the wrong ashift, rebuild it. There is no magical “fix later.”

Task 6: Tune the ZFS dataset for VM images (compression + atime)

cr0x@server:~$ sudo zfs create -o compression=zstd -o atime=off rpool/vmdata
cr0x@server:~$ sudo zfs get -o name,property,value compression,atime rpool/vmdata
NAME         PROPERTY     VALUE
rpool/vmdata compression  zstd
rpool/vmdata atime        off

What it means: Compression is on (usually a win for VM images), atime is off (less metadata churn). Note that volblocksize is a zvol property set at creation time; for file-based images on a dataset, recordsize is the analogous knob.

Decision: If your workload is already CPU-bound, compression might hurt. Otherwise, it often improves both space and performance.

Task 7: Benchmark storage latency the boring way (fio, sync writes)

cr0x@server:~$ sudo fio --name=sync4k --directory=/rpool/vmdata --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --size=2G --direct=1 --fsync=1
sync4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.33
sync4k: (groupid=0, jobs=1): err= 0: pid=22110: Mon Dec 23 14:03:41 2025
  write: IOPS=4200, BW=16.4MiB/s (17.2MB/s)(2048MiB/124800msec)
    clat (usec): min=120, max=24000, avg=230.50, stdev=410.20

What it means: 4K sync latency average ~230 µs is decent for mirrored NVMe. The max tail (24 ms) is what will hurt databases and log-heavy services.

Decision: If tail latency is ugly, stop. Fix device firmware, check write cache settings, or reconsider pool layout before importing production VMs.

Task 8: Enumerate VMs on ESXi over SSH (the first step of exporting disks to a staging server)

cr0x@server:~$ ssh root@esxi-01 "vim-cmd vmsvc/getallvms | head"
Vmid   Name                 File                                 Guest OS           Version   Annotation
12     app01                [vmfs01] app01/app01.vmx              ubuntu64Guest      vmx-19
13     db01                 [vmfs01] db01/db01.vmx                centos7_64Guest    vmx-19

What it means: You can enumerate VMs and locate their VMX paths. From there, you can locate VMDKs inside the VM directory.

Decision: If you can’t SSH to ESXi due to policy, you’ll need to use an export method (OVF/OVA) or backup product restore. Plan for format changes and time.

Task 9: Confirm VMDK type and size on ESXi before copying

cr0x@server:~$ ssh root@esxi-01 "ls -lh /vmfs/volumes/vmfs01/app01 | egrep 'vmdk|vmx'"
-rw-------    1 root     root         8.0K Dec 20 10:12 app01.vmx
-rw-------    1 root     root         1.2K Dec 20 10:12 app01.vmdk
-rw-------    1 root     root        150G Dec 23 03:20 app01-flat.vmdk

What it means: This VM uses a descriptor + flat extent (common for thick-ish formats). Copy both. The descriptor is tiny but required.

Decision: If you only copy the big -flat.vmdk and forget the descriptor, you’ll spend time rebuilding metadata. Don’t do that to yourself.

Task 10: Copy VMDKs efficiently (rsync over SSH, with resume)

cr0x@server:~$ rsync -avP --inplace -e ssh root@esxi-01:/vmfs/volumes/vmfs01/app01/ /srv/stage/app01/
sending incremental file list
app01.vmdk
          1,248 100%    1.19KB/s    0:00:01 (xfr#1, to-chk=1/3)
app01-flat.vmdk
  12,884,901,888  8%   112.34MB/s    0:19:42

What it means: You’re moving the extent at ~112 MB/s. The --inplace option writes directly into the destination file, so a retried transfer of a huge extent can resume against the existing data instead of rewriting a temp copy. Note that stock ESXi does not ship rsync; you may need to place a static rsync binary on the host or fall back to scp for the bulk copy.

Decision: If throughput is far below what the network/storage should do, stop and diagnose (see fast playbook). Don’t “wait it out” and miss your window.

Task 11: Convert VMDK to raw or qcow2 on the Proxmox side

cr0x@server:~$ qemu-img info /srv/stage/app01/app01.vmdk
image: /srv/stage/app01/app01.vmdk
file format: vmdk
virtual size: 150 GiB (161061273600 bytes)
disk size: 150 GiB

cr0x@server:~$ qemu-img convert -p -f vmdk -O raw /srv/stage/app01/app01.vmdk /rpool/vmdata/app01-disk-0.raw
    (100.00/100%)

What it means: You validated the source format and converted to raw, which is simple and performant on ZFS (ZVOL is also common; raw file on dataset is workable if tuned).

Decision: If you need snapshots at the storage layer and thin provisioning behavior, qcow2 can help—but it adds overhead. For performance-sensitive VMs on ZFS, raw or ZVOL is typically the move.
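
If you want a ZVOL instead of a raw file, here's a hedged sketch of the manual path; the dataset path, VMID-style naming, and volblocksize are assumptions, and qm importdisk against a zfspool storage can create the volume for you instead:

cr0x@server:~$ sudo zfs create -V 150G -o volblocksize=16k rpool/vmdata/vm-120-disk-0
cr0x@server:~$ sudo qemu-img convert -p -f vmdk -O raw /srv/stage/app01/app01.vmdk /dev/zvol/rpool/vmdata/vm-120-disk-0

Writing straight to the zvol block device skips the intermediate raw file and the extra copy that comes with it.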

Task 12: Import disk into a Proxmox VM and verify boot config (BIOS/UEFI)

cr0x@server:~$ qm create 120 --name app01 --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
create VM 120: success

cr0x@server:~$ qm importdisk 120 /rpool/vmdata/app01-disk-0.raw rpool
importing disk 'app01-disk-0.raw' to VM 120 ...
transferred 0.0 B of 150.0 GiB (0.00%)
transferred 150.0 GiB of 150.0 GiB (100.00%)
Successfully imported disk as 'unused0: rpool:vm-120-disk-0'

cr0x@server:~$ qm set 120 --scsihw virtio-scsi-pci --scsi0 rpool:vm-120-disk-0 --boot order=scsi0
update VM 120: success

What it means: Disk is attached and set as boot device. If the guest was UEFI on ESXi, you also need OVMF and an EFI disk.

Decision: If the VM doesn’t boot, check firmware (SeaBIOS vs OVMF), disk controller type (LSI vs virtio), and whether the OS has virtio drivers.
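
If the guest was UEFI on ESXi, a short sketch of the extra steps for the same VM (storage name and efitype are assumptions; adjust to your setup):

cr0x@server:~$ qm set 120 --bios ovmf --machine q35
cr0x@server:~$ qm set 120 --efidisk0 rpool:1,efitype=4m

Without the EFI disk, OVMF will still start but loses its NVRAM (and therefore boot entries) across restarts.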

Task 13: Configure NFS storage on Proxmox and validate mount behavior

cr0x@server:~$ sudo pvesm add nfs nfs-vmstore --server 10.10.20.10 --export /export/vmstore --content images,iso,backup --options vers=4.1
create storage 'nfs-vmstore'

cr0x@server:~$ pvesm status
Name         Type     Status     Total      Used      Available   %
local        dir      active     98.00GB    12.44GB    85.56GB   12.69%
nfs-vmstore  nfs      active     10.00TB     4.20TB     5.80TB   42.00%

What it means: Storage is mounted and visible. If it shows inactive, you have network/export/options problems.

Decision: If NFS is active but performance is bad, check NFS server threads, sync settings, and network QoS before blaming Proxmox.

Task 14: Configure iSCSI target and check session/login

cr0x@server:~$ sudo apt-get update -qq
cr0x@server:~$ sudo apt-get install -y -qq open-iscsi multipath-tools

cr0x@server:~$ sudo iscsiadm -m discovery -t sendtargets -p 10.10.30.20
10.10.30.20:3260,1 iqn.2001-04.com.example:san.lun01

cr0x@server:~$ sudo iscsiadm -m node -T iqn.2001-04.com.example:san.lun01 -p 10.10.30.20 --login
Logging in to [iface: default, target: iqn.2001-04.com.example:san.lun01, portal: 10.10.30.20,3260]
Login to [iface: default, target: iqn.2001-04.com.example:san.lun01, portal: 10.10.30.20,3260] successful.

What it means: Discovery works and you’re logged in. Next step is multipath sanity and filesystem/LVM creation.

Decision: If discovery is empty, you have target ACLs, VLAN routing, or CHAP mismatches. Fix those before touching Proxmox storage config.

Task 15: Validate multipath and path health (iSCSI)

cr0x@server:~$ sudo multipath -ll
mpatha (36001405abcde1234567890fedcba9876) dm-2 DGC,VRAID
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 3:0:0:10 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 4:0:0:10 sdc 8:32 active ready running

What it means: Two paths exist; one is active/optimized (prio 50). This is what you want with ALUA-capable arrays.

Decision: If you only see one path, don’t proceed. Fix the redundant networking now; later you’ll be doing it under load.

Task 16: Check ZFS health and detect silent trouble early

cr0x@server:~$ sudo zpool iostat -v rpool 2 3
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
rpool                        320G   612G    120    450  12.3M  55.1M
  mirror-0                   320G   612G    120    450  12.3M  55.1M
    nvme-SAMSUNG...123456      -      -     60    225  6.1M   27.5M
    nvme-SAMSUNG...789012      -      -     60    225  6.2M   27.6M
--------------------------  -----  -----  -----  -----  -----  -----

What it means: Balanced I/O across the mirror and sane throughput. If one device shows odd latency spikes, you’ll see it in zpool iostat patterns.

Decision: If a single vdev is lagging, replace it before the migration. Don’t “monitor it” into a failure during cutover week.

Migration methods: cold, warm-ish, and “don’t do that”

Method 1: Cold migration (shutdown VM, copy disks, boot on Proxmox)

This is the default because it’s predictable. You shut down the VM on ESXi, copy the final disk state, convert/import, boot on Proxmox, validate services, and update networking/DNS.

Downtime profile: Potentially high, but bounded and understandable. You can test the import process ahead of time with a clone or a snapshot export.

Where it wins: Databases with strict consistency, legacy OSes, or when you don’t trust guest-level replication.

Method 2: Staged copy + short cutover (copy most data first, then final sync)

If you can keep the VM running while you stage a bulk copy (or you can stage from a snapshot), you can shrink downtime to “final delta + boot.” Common approach:

  • Stage a full copy of VMDK(s) to a transfer host.
  • Schedule a cutover window.
  • Shut down the VM, run a final rsync/copy (small delta if the change rate is low), convert/import, boot.

Downtime profile: Often excellent for low-change-rate VMs. The key is not lying to yourself about change rate.

Method 3: Application-level replication (min downtime, more moving parts)

For big databases or high-write systems, copying disk images is often the wrong abstraction. Replicate at the application layer: database replication, file replication, or service-specific sync, then cut over clients.

Downtime profile: Can be very small, but operationally heavier. Also demands that your application has a clean failover plan.

Method 4: “Mount VMFS and copy from Linux” (works, but don’t romanticize it)

You can mount VMFS volumes from Linux using tools like vmfs-fuse, then copy out the VMDKs. It’s useful in a pinch, especially if ESXi is dead but the LUNs are alive. But it’s not the cleanest primary plan: access is typically read-only, performance varies, and you’re adding another abstraction layer during a stressful time.
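
If you do end up here, a sketch on a Debian-based rescue host (the device path is an assumption; vmfs-tools provides vmfs-fuse for VMFS5, vmfs6-tools provides vmfs6-fuse for VMFS6):

cr0x@server:~$ sudo apt-get install -y vmfs6-tools
cr0x@server:~$ sudo mkdir -p /mnt/vmfs
cr0x@server:~$ sudo vmfs6-fuse /dev/sdd1 /mnt/vmfs
cr0x@server:~$ ls /mnt/vmfs

Treat the mount as read-only salvage: copy the VMDKs out, verify sizes and checksums where you can, then unmount.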

Methods to avoid unless you really know why

  • “We’ll just convert directly from the ESXi datastore live.” If the datastore is busy, you’re competing with production I/O while trying to copy it. That’s how you invent your own incident.
  • “We’ll use qcow2 everywhere because it’s flexible.” Flexibility is not free. qcow2 can be fine, but it’s extra CPU and metadata overhead. Use it where it has a purpose (snapshots, thin), not as a religion.
  • “We’ll run RAIDZ with random-write VMs and hope.” RAIDZ can be great for sequential workloads and capacity. VM random-write patterns will show you why mirrors exist.

Checklists / step-by-step plan

Checklist A: Decide the target storage (10-minute decision)

  1. If you need shared storage for live migration today, prefer NFS (boring) or iSCSI (sharp) depending on your environment.
  2. If you prioritize data integrity + predictable ops and can accept local disks + replication, pick ZFS local.
  3. If you already own a SAN and have multipath competence, iSCSI is fine. If not, don’t learn it during migration week.

Checklist B: Pre-migration staging (do this before you touch a production VM)

  1. Build Proxmox storage (ZFS/NFS/iSCSI) and run fio with sync writes.
  2. Validate network MTU consistency and error counters.
  3. Create at least one test VM on Proxmox; ensure backups work.
  4. Pick one non-critical ESXi VM; do a full migration rehearsal and document the exact steps and timings.
  5. Define rollback: ESXi VM remains powered off but intact until validation passes; DNS changes tracked; firewall/NAT rules staged.

Checklist C: Cutover plan (per VM)

  1. Confirm last good backup and restore test (yes, actually test).
  2. Notify stakeholders with an outage window that includes validation time.
  3. Shut down the guest cleanly (stop the application first for databases).
  4. Run final disk copy / final delta copy.
  5. Convert/import disk to Proxmox storage format.
  6. Set VM firmware/controller to match guest requirements.
  7. Boot in isolated VLAN or disconnected network first if you need to avoid IP conflicts.
  8. Validate service health (systemd, logs, application checks).
  9. Switch traffic (DNS/LB) and monitor.
  10. Keep rollback option for a defined period; then retire old VM artifacts.

Checklist D: Post-migration hardening (because future-you has to operate this)

  1. Enable Proxmox backups with retention and periodic restore tests.
  2. Set ZFS scrub schedule and alerting, or NFS/iSCSI health checks with latency alerts.
  3. Document storage topology: pools, datasets, exports, LUN IDs, multipath WWIDs.
  4. Baseline VM disk latency from inside the guest (so you have “normal” numbers).

Fast diagnosis playbook: find the bottleneck fast

When migration copies are slow or Proxmox VMs feel sluggish after cutover, don’t guess. Triage like an SRE: isolate whether you’re compute-bound, network-bound, or storage-bound. This is the shortest path to truth.

First: is it the network?

  • Check interface errors/drops on the Proxmox host and the storage server/switch ports.
  • Run an MTU ping test (only if using jumbo frames).
  • Measure raw throughput with iperf3 between Proxmox and NFS/iSCSI endpoints.

Interpretation: If iperf3 is weak or unstable, storage tuning won’t save you. Fix network first.
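
A quick baseline, assuming the NFS/iSCSI endpoint at 10.10.20.10 can run an iperf3 server:

# on the storage server
iperf3 -s
# on the Proxmox node: a single stream, then a few parallel streams
iperf3 -c 10.10.20.10 -t 30
iperf3 -c 10.10.20.10 -t 30 -P 4

If one stream is slow but four streams saturate the link, you're looking at a per-connection limit (offloads, window sizes, a middlebox), not raw bandwidth.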

Second: is it the storage device latency (tail latency)?

  • Run fio with --fsync=1 and small blocks (4k/8k) to mimic worst-case VM writes.
  • On ZFS, use zpool iostat to see if one device is misbehaving.
  • On iSCSI, check multipath status and per-path latency if available from array tooling.

Interpretation: Average IOPS can look fine while tail latency destroys databases. Trust the tails.

Third: is it contention or misconfiguration on the host?

  • Look for CPU steal / host CPU saturation during conversion.
  • Check that your VM disk uses virtio-scsi and not an emulated controller unless required.
  • Confirm you’re not accidentally on a slow storage backend (like local dir on spinning disk).

Interpretation: If the host is busy converting multiple disks in parallel, you can make storage look bad. Throttle your own ambition.
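
Two quick checks that catch most of this, reusing VMID 120 from the earlier tasks (iostat comes from the sysstat package):

cr0x@server:~$ qm config 120 | grep -E 'scsihw|scsi0|bios'
cr0x@server:~$ iostat -x 2 3

The first confirms the controller and backing storage the VM actually uses; the second shows whether the host's devices are saturated (watch %util and await) while conversions run.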

Common mistakes: symptom → root cause → fix

1) Symptom: Copy speed is stuck at 20–40 MB/s on a 10Gb network

Root cause: Single TCP stream bottleneck, poor NIC offload settings, or you’re reading from a busy VMFS datastore under load.

Fix: Validate with iperf3. If network is fine, stage copy from a snapshot or after-hours when datastore is quiet. Consider parallelization carefully (two streams can help; twenty can burn the source array).

2) Symptom: VM boots on Proxmox but disk is “read-only filesystem”

Root cause: Filesystem inconsistency due to dirty shutdown or incomplete copy; sometimes also driver/controller mismatch leading to timeouts.

Fix: Do clean shutdown on ESXi. Re-copy after shutdown. Run fsck (Linux) or chkdsk (Windows) during validation in an isolated network.

3) Symptom: VM won’t boot; drops into EFI shell or “no bootable device”

Root cause: Firmware mismatch (UEFI vs BIOS), missing EFI disk, wrong boot order, or bootloader tied to controller type.

Fix: Match firmware: SeaBIOS for BIOS installs, OVMF for UEFI. Add EFI disk for OVMF. Ensure disk attached as scsi0/virtio where the OS expects it.

4) Symptom: NFS storage is active but VM I/O is spiky and latency jumps

Root cause: NFS server sync settings, an overloaded NAS CPU, or network microbursts and drops. Sometimes it’s a single bad NIC or switch port.

Fix: Check NIC error counters and switch port stats. Validate NFS version (4.1 often behaves better) and server-side thread counts. Isolate storage traffic if you can.
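
Two hedged checks, assuming a Linux-based NFS server at 10.10.20.10: confirm the negotiated version and effective mount options on the Proxmox client, then the nfsd thread count on the server:

cr0x@server:~$ nfsstat -m
cr0x@server:~$ ssh root@10.10.20.10 "cat /proc/fs/nfsd/threads"

If the client negotiated an older NFS version than you intended, fix the export/mount options before touching anything else.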

5) Symptom: iSCSI LUN performance is fine until a path fails, then everything stalls

Root cause: Multipath misconfiguration (queue_if_no_path without sane timeouts), ALUA not respected, or asymmetric paths not handled.

Fix: Fix multipath policy and timeouts. Validate ALUA priorities. Test a controlled path failure before production cutover.

6) Symptom: ZFS pool looks healthy but VM writes are slow

Root cause: RAIDZ under a random-write workload, insufficient RAM (ARC misses), or a SLOG misconception (adding a slow SLOG device can hurt).

Fix: Mirrors for VM-heavy pools. Don’t buy a random cheap SSD for SLOG; if you don’t need synchronous semantics, leave it out. Measure with fio.

7) Symptom: Storage space “disappears” after migration

Root cause: Thick disks created from thin sources, qcow2 preallocation choices, or stale snapshots on the new backend.

Fix: Decide thin vs thick intentionally. Track snapshots and prune. On ZFS, watch referenced vs used; on LVM-thin, watch data+metadata usage.

Three corporate-world mini-stories (pain included)

Mini-story 1: The incident caused by a wrong assumption

They had a neat plan: export VMs from ESXi, import into Proxmox, point everything at the new NFS store. The team did a pilot with two small Linux VMs, it looked fine, and the PowerPoint practically wrote itself.

The wrong assumption was simple: “If NFS mounts, performance will be fine.” Nobody measured tail latency. Nobody checked switch buffers. The storage VLAN shared uplinks with backup traffic, and backups ran whenever the backup team felt spiritually moved to click “run now.”

Cutover night, a busy Windows file server VM landed on Proxmox. Users came in Monday morning and complained that Explorer was “freezing.” The VM wasn’t down; it was worse. It was alive and suffering. On the host, NFS I/O had periodic stalls; inside the guest, SMB timeouts stacked up like bad decisions.

The team chased ghosts: virtio drivers, Windows updates, antivirus, even DNS. Eventually someone graphed NFS latency and noticed the spikes aligned with backup traffic on the same uplinks.

The fix was boring: dedicate bandwidth for storage, enforce QoS, and schedule backups with the kind of discipline usually reserved for payroll. The lesson was sharper: “It mounted” is not a performance test. It’s barely a greeting.

Mini-story 2: The optimization that backfired

A different shop wanted speed. They chose ZFS for local VM storage and read a handful of confident internet posts about SLOG devices. So they added a “fast” consumer NVMe as SLOG, because it was available and because the phrase “separate intent log” sounds like a cheat code.

The environment had a mix of workloads. Some VMs were databases doing synchronous writes. Others were general app servers. For a week, things looked fine. Then the consumer NVMe started throwing intermittent latency spikes under sustained sync load. Not full failure. Worse: half-failure.

ZFS did what ZFS does: it preserved correctness. Performance, however, dropped off a cliff during those spikes. VMs developed periodic I/O pauses. The team interpreted this as “ZFS is slow” and started turning knobs: recordsize changes, compression off, ARC tweaks. They turned it into a week-long tuning festival.

The real issue was the “optimization”: the SLOG device wasn’t power-loss protected and had inconsistent latency under sync workloads. Removing the SLOG and reverting to mirrored enterprise SSDs restored stable behavior. Later, they added a proper SLOG device with power-loss protection only after measuring that sync workloads actually benefitted.

Optimization isn’t about adding parts. It’s about reducing uncertainty. When you add a component with unpredictable latency, you’re optimizing for blame assignment.

Mini-story 3: The boring but correct practice that saved the day

One finance-adjacent company had to move a set of old but critical VMs: a licensing server, an internal app, and a database that everyone pretended wasn’t important until it broke. They had a strict change window and executives who heard “virtual” and assumed it meant “instant.”

The team’s practice was painfully unglamorous: every VM migration got a rehearsed runbook, a timing sheet, and a rollback drill. They also kept ESXi VMs intact for a fixed “confidence period” after cutover. No one was allowed to reclaim space early, no matter how tempting the empty datastore looked.

During the real cutover, the database VM imported fine but failed to boot due to a firmware mismatch: ESXi had it in UEFI mode; the initial Proxmox VM was created with SeaBIOS. That’s a simple fix, but it’s not a fun fix when your outage clock is ticking.

Because they rehearsed, they recognized the failure mode quickly. They switched the VM to OVMF, added an EFI disk, corrected the boot order, and booted cleanly. Total delay: minutes, not hours.

The real save came later: a downstream app had a hard-coded IP for the database, and the network team’s NAT rule didn’t apply in the new segment. Rollback wasn’t theoretical. They rolled back traffic, fixed the NAT rule, then re-cut. Boring practice—runbooks and rollback discipline—beat cleverness. Every time.

One reliability quote (operations-approved)

Paraphrased idea (attributed to Gene Kim): “Improving reliability often means improving the flow of work and feedback, not just adding more controls.”

FAQ

1) Can Proxmox read VMFS directly?

Not in a supported, first-class way for production writes. You can sometimes read VMFS volumes from Linux with helper tooling and copy data out. Treat it as a recovery technique, not your main migration pipeline.

2) Should I convert VMDK to qcow2 or raw?

For performance-sensitive VMs, raw (or ZVOL-backed disks) is usually the safer bet. qcow2 is useful when you need features like easy snapshots on file-based storage or thin provisioning behavior, but it adds overhead.

3) What’s the best option for live migration in Proxmox?

Shared storage (commonly NFS, or iSCSI with a shared block backend) makes live migration straightforward. Local ZFS can still work with replication-based approaches, but it’s not the same “shared datastore” model.

4) How do I keep downtime minimal for large VMs?

Either stage most of the data ahead of time and do a short final sync during the cutover, or replicate at the application layer (database replication, file replication). Disk-image copying alone doesn’t beat physics for high-change-rate workloads.

5) Do I need virtio drivers?

Linux generally handles virtio well. Windows often needs virtio drivers depending on the install and controller type. If you switch controllers (e.g., from LSI to virtio-scsi), plan driver availability before cutover.

6) Mirrors or RAIDZ for ZFS VM storage?

Mirrors for VM-heavy random I/O. RAIDZ is great for capacity and sequential patterns, but it can punish random writes and tail latency—exactly what VMs generate.

7) Is a SLOG device required for ZFS?

No. SLOG only matters for synchronous writes when you’re using sync=standard and workloads actually issue sync. A bad SLOG can make performance worse. Use one only after measuring and choose hardware with power-loss protection.

8) NFS or iSCSI for shared storage?

NFS is typically easier operationally and easier to debug. iSCSI can be excellent with proper multipath and SAN discipline, but it comes with more configuration and more interesting failure modes.

9) How do I validate performance after migration?

Measure from both sides: host-level (fio, zpool iostat, multipath status) and guest-level (application latency, filesystem latency, database commit times). Users experience latency, not throughput.

10) What’s the safest rollback strategy?

Keep the original ESXi VM powered off but intact until you’ve validated functionality and performance on Proxmox, and you’ve passed a defined observation window. Rollback should be a procedure, not a vibe.

Conclusion: next steps you can execute

Here’s the practical path that keeps you out of the “storage migration incident” club:

  1. Pick the landing zone: ZFS local (integrity + simplicity), NFS (shared + boring), or iSCSI (shared + sharp).
  2. Benchmark what matters: sync 4K latency and tail behavior, not just big sequential throughput.
  3. Rehearse one VM end-to-end: copy, convert, import, boot, validate, rollback. Time it.
  4. Stage data early for low-change-rate VMs; use application replication for high-change-rate systems.
  5. Cut over with rollback intact, and don’t delete VMFS data until you’ve survived reality.

If you do nothing else: measure, rehearse, and keep rollback boring and available. That’s what “minimal downtime” looks like in production—less drama, more proof.
