Dovecot: maildir vs mdbox — pick storage that won’t haunt you later

November 29, 2025 • February 3, 2026

You don’t notice your mailbox format when things are quiet. You notice it when the CEO’s iPhone says “Cannot Get Mail,” your disks are 70% idle, and yet every IMAP login feels like it’s negotiating with a filing cabinet full of confetti.

Mail storage is one of those infrastructure decisions that stays boring—until it becomes the only thing anyone wants to talk about. Let’s keep it boring, on purpose.

The decision that actually matters

“Maildir vs mdbox” sounds like a format debate. It’s not. It’s an operational philosophy debate:

Maildir bets on filesystem semantics: each message is its own file; atomic renames are your friend; corruption tends to be localized; and you get a lot of inodes.
mdbox bets on Dovecot-managed aggregation: messages live in bigger container files with Dovecot metadata; you reduce inode pressure; operations can be faster under certain IO patterns; and when you mess up, you can mess up larger.

If you’re running a small server with a sane filesystem and you want straightforward debugging and easy partial recovery, Maildir is the default that ages well. If you’re operating at scale where inode counts, directory scanning, and small-file overhead are killing you, mdbox can be the right pain—but only if you’re disciplined about backups, maintenance, and operational tooling.

One quote worth keeping on a sticky note, because it applies directly to mailbox formats: “Hope is not a strategy.” — Gene Kranz.

Maildir and mdbox in one screen

Maildir: what it is

Maildir stores each message as a separate file in a directory structure—typically cur/, new/, and tmp/ per mailbox. Flags are often encoded in the filename. Delivery and moves rely on atomic rename behavior.

Operational vibe: When something breaks, you can often open a directory and see the messages. You can recover one user without feeling like you’re defusing a bomb.
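
To make “flags are encoded in the filename” concrete, here is a minimal Python sketch that pulls the flags and the size hint out of a Maildir filename. The helper name parse_maildir_name is hypothetical. The “:2,” info separator and the single-letter flags follow the common Maildir convention, and the “,S=” size field is the style seen in Dovecot-written names, so treat those details as assumptions to check against your own store.

```python
# Sketch: decode the flags and size hint carried in a Maildir filename.
# parse_maildir_name is a hypothetical helper, not a Dovecot API.

FLAG_NAMES = {
    "D": "draft", "F": "flagged", "P": "passed",
    "R": "replied", "S": "seen", "T": "trashed",
}

def parse_maildir_name(filename):
    """Return (unique_part, size_or_None, sorted flag names)."""
    base, sep, info = filename.partition(":2,")
    flags = sorted(FLAG_NAMES[c] for c in info if c in FLAG_NAMES) if sep else []
    size = None
    # Extra fields such as ",S=53248" are appended to the base name with commas.
    unique, *fields = base.split(",")
    for field in fields:
        if field.startswith("S=") and field[2:].isdigit():
            size = int(field[2:])
    return unique, size, flags

print(parse_maildir_name("1735891023.M1234P23188.server,S=53248:2,S"))
```

Because the flags live in the name, marking a message as read is a rename, which is part of why Maildir workloads lean so hard on filesystem metadata.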

mdbox: what it is

mdbox stores messages inside “box” files managed by Dovecot (with accompanying index and map metadata). Think of it as Dovecot owning more of the storage layer: fewer files, more structure, and more dependence on Dovecot’s consistency guarantees.

Operational vibe: When it’s fast, it’s nice. When you need to repair, you want your tools ready and your backups verified.

What you should choose (opinionated)

Choose Maildir if: you’re a small-to-medium operation, you value simple recovery, you have decent SSDs, you use snapshotting/backups, and you want predictable failure domains.
Choose mdbox if: you have lots of users, lots of messages, inode pressure is real, directory listing overhead hurts, or you need far fewer files on disk so that backups and filesystem checks stay manageable, and you’re willing to operationalize Dovecot maintenance and run consistent backup/restore drills.
Avoid “it doesn’t matter” as a decision. It matters the day you need to restore one mailbox at 3 a.m. and your restore process is basically “restore everything and pray.”

Joke #1: Mail storage decisions are like tattoos: they seem fun until you try to remove them during a quarterly outage review.

Facts and history that explain today’s tradeoffs

Some context makes the tradeoffs feel less arbitrary. Here are concrete points that show why these formats exist and why they behave the way they do:

Maildir was designed to avoid mbox locking problems. Traditional mbox stores a whole mailbox in one file; concurrent access historically caused locking pain and corruption risk.
Maildir’s “atomic rename” trick depends on filesystem guarantees. The tmp→new/cur rename pattern relies on atomicity within the same filesystem.
IMAP exploded mailbox “metadata” needs. Indexing, flags, and UID tracking became performance-critical; Dovecot’s index files are a response to that reality.
Small-file overhead became a bigger deal as mail retention grew. Millions of tiny files stress inodes, directory lookups, and backup tools; this pressure is one reason aggregated formats exist.
Dovecot introduced “box” formats to reduce filesystem churn. mdbox and similar designs shift work from the filesystem into Dovecot-managed structures.
Filesystem evolution matters. Ext4, XFS, ZFS, and btrfs handle directories and metadata differently; the same format can be “fine” on one and painful on another.
Copy-on-write snapshots changed backup expectations. With ZFS/btrfs snapshots, “consistent point-in-time” is easier—but only if your indexing and locking model behaves well under snapshots.
Email clients got more aggressive. Mobile clients do frequent syncs; server-side search/FTS became expected; mailbox formats that amplify metadata IO can feel worse today than they did in 2008.
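
The tmp→new rename trick from the list above can be sketched in a few lines of Python. This is an illustration of the pattern, not how Dovecot or any particular MDA implements it: deliver() is a hypothetical helper, the filename scheme is simplified, and real delivery agents also handle name collisions and quota.

```python
# Sketch: Maildir-style delivery: write under tmp/, then rename into new/.
# deliver() is a hypothetical helper; real MDAs also handle name collisions.
import os
import socket
import tempfile
import time

def deliver(maildir, body):
    """Write body to tmp/, fsync it, then atomically rename it into new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    name = f"{int(time.time())}.P{os.getpid()}.{host}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "wb") as fh:
        fh.write(body)
        fh.flush()
        os.fsync(fh.fileno())  # make the data durable before it becomes visible
    # rename() is atomic only because tmp/ and new/ share one filesystem.
    os.rename(tmp_path, new_path)
    return new_path

# Demo in a throwaway directory: the message ends up in new/, tmp/ is empty.
with tempfile.TemporaryDirectory() as root:
    path = deliver(root, b"Subject: test\n\nhello\n")
    print(os.path.basename(os.path.dirname(path)), os.listdir(os.path.join(root, "tmp")))
```

Readers never see a half-written message because nothing appears in new/ until the rename completes. Put tmp/ and new/ on different filesystems and that guarantee is gone.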

How each format fails in production

Maildir failure modes

1) “Too many files” becomes a real outage. You hit inode exhaustion, backups slow to a crawl, or directory scans become your latency floor. Maildir doesn’t politely warn you; it just becomes slow and then suddenly impossible.

2) Partial corruption is survivable—but not free. A few message files can be corrupted by disk issues or broken transfers. Usually you can salvage the rest. But if your index files go weird, clients see missing or duplicate messages until you rebuild indexes.

3) Backups lie if you don’t snapshot. File-by-file backup while delivery is ongoing can capture inconsistent states (messages in tmp/, partial renames). It can still work, but you need to understand what “consistent” means for maildir.

mdbox failure modes

1) Metadata consistency becomes your life. mdbox leans on Dovecot metadata (indexes, map files). If those are out of sync or corrupted, the mailbox can look empty or scrambled even if the underlying box files exist.

2) Larger blast radius per file. A corrupted container file can affect more messages. Dovecot tooling can repair in many cases, but the “single message file” isolation of maildir is not the default here.

3) Restore complexity goes up. Restoring one mailbox can be straightforward if you have per-user directories and good tooling. It can also be a mess if you did “one giant volume and hope.” Design matters.

Joke #2: The best mailbox format is the one you can restore while your coffee is still drinkable.

Performance model: what you’re really paying for

The hidden tax in Maildir: metadata and directory operations

Maildir performance is dominated by filesystem metadata: creating, renaming, stat’ing, listing directories, and updating timestamps. If you have SSDs and a filesystem that handles directories well, maildir can be very fast. If you have spinning disks or overloaded metadata paths, maildir can feel like it’s doing everything except serving email.

When users have hundreds of thousands of messages in a single folder, maildir can degrade sharply because the server ends up doing a lot of directory operations just to answer “what’s new?” Dovecot indexes help, but the underlying file count still haunts you in backups, fsck time, and inode usage.

The hidden tax in mdbox: Dovecot-managed structures and repair workflow

mdbox tends to reduce file count pressure, which can reduce directory and inode overhead. But you’re paying in a different currency: you need to trust and maintain Dovecot’s metadata structures. That means you care about index integrity, map file health, and how your backup/restore interacts with those files.

On busy systems, mdbox can be friendlier to the filesystem, but it can also amplify the consequences of “clever” tuning or unsafe backup practices.

Latency vs throughput: pick what your users feel

Most mail outages aren’t “the server can’t handle throughput.” They’re “login is slow,” “opening INBOX is slow,” “search is slow,” “flag updates are slow.” That’s latency. Latency comes from storage round-trips and metadata contention.

Rule of thumb: If your pain is metadata-heavy (file counts, directory scans, backup crawling), mdbox starts to look better. If your pain is repair and recovery simplicity, maildir is hard to beat.

Backups, restores, and why “it’s just files” is a trap

Maildir backups: deceptively simple

Maildir looks like “just files,” which makes people relax. Don’t. Live maildir has transient states (tmp/), renames, and index updates. If you back it up without snapshots, you can capture a mailbox mid-flight.

What works well:

Filesystem snapshots (ZFS, btrfs, LVM thin snapshots) with backup reading from the snapshot.
Backups that preserve permissions, ownership, and timestamps (mail delivery and Dovecot are sensitive to these).
Regular index rebuild practices during restore tests.
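
One cheap signal that a copy was taken mid-flight is files still sitting in tmp/ directories. The Python sketch below walks a restored tree and lists them. find_tmp_leftovers is a hypothetical helper, and a leftover tmp/ file is only a hint (stale temporary files can exist on a healthy server too), so use it to prompt a closer look, not as proof of a bad backup.

```python
# Sketch: find files left in Maildir tmp/ directories of a restored copy.
# find_tmp_leftovers is a hypothetical helper for restore drills.
import os
import tempfile

def find_tmp_leftovers(root):
    """Return sorted paths of files sitting inside any tmp/ directory."""
    leftovers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) == "tmp":
            leftovers.extend(os.path.join(dirpath, n) for n in filenames)
    return sorted(leftovers)

# Demo on a throwaway tree that mimics a copy taken mid-delivery.
with tempfile.TemporaryDirectory() as root:
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, "Maildir", sub))
    open(os.path.join(root, "Maildir", "tmp", "half-written"), "w").close()
    open(os.path.join(root, "Maildir", "cur", "done"), "w").close()
    print([os.path.relpath(p, root) for p in find_tmp_leftovers(root)])
```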

mdbox backups: you need consistency, not just copies

mdbox requires you to capture both the box files and the metadata/index files in a consistent point-in-time. Snapshot-based backups are the sane baseline. If you rely on file-by-file copying without snapshots, you risk capturing mismatched states—box file says one thing, index says another, map says a third.

Restores: plan for “one user, one folder, one message”

The restore that matters isn’t “restore the entire server.” It’s “restore one mailbox folder for one executive because a client synced and deleted everything.” That restore should be a runbook, not a heroic improvisation.

If you can’t restore a single mailbox without restoring the world, you didn’t pick a storage format; you picked a future incident.

Replication/HA reality check

Mailbox format doesn’t replace replication strategy. It just changes the failure modes and operational ergonomics.

Dovecot replication and format choice

Dovecot can replicate mailboxes at the application layer. That can smooth over some filesystem differences. But replication doesn’t fix:

Bad capacity planning (inode exhaustion still happens, just on two machines).
Slow storage (now you have slow storage plus replication overhead).
Unsafe backup practices (replication will happily replicate deletions and some forms of corruption).

Snapshots are not replication, replication is not backup

Snapshots give point-in-time recovery; replication gives availability. You want both if the mail system matters. If you can only afford one, pick backups that you’ve tested. Availability without recovery is just a faster way to stay down.

Practical tasks: commands, outputs, and what you decide

These are the checks I actually run when someone says “IMAP is slow,” “messages disappeared,” or “disk is fine, but mail is dying.” Each task includes what the output means and the decision you make.

1) Confirm the current mailbox format

cr0x@server:~$ doveconf -n | grep -E '^(mail_location|mail_attachment_dir|mail_plugins|namespace|mail_fsync)'
mail_location = maildir:~/Maildir
mail_plugins = $mail_plugins quota
mail_fsync = optimized

Meaning: This server uses maildir under each user’s home directory. If you expected mdbox, your “performance assumptions” are already wrong.

Decision: Keep troubleshooting aligned with the format: inode pressure and directory ops matter more with maildir; metadata consistency and map/index integrity matter more with mdbox.

2) Check Dovecot version (features and bugs matter)

cr0x@server:~$ dovecot --version
2.3.19.1 (9b53102964)

Meaning: You’re on a modern 2.3.x. Behavior differs across major/minor releases, especially around index handling and fsync defaults.

Decision: If you’re on something ancient, consider upgrading before doing “clever” tuning. Old mail storage bugs are not charming.

3) Measure inode usage (Maildir’s silent killer)

cr0x@server:~$ df -ih /var/vmail
Filesystem      Inodes IUsed   IFree IUse% Mounted on
/dev/sdb1         50M   41M     9M   83% /var/vmail

Meaning: 83% inode usage. That’s not “fine.” That’s “one bad import away from downtime.”

Decision: If inode usage trends upward fast, either (a) move to a filesystem with more inodes / different allocation strategy, (b) enforce retention, (c) move high-volume users to mdbox, or (d) redesign foldering/archival.
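
If you want the same number in a monitoring script rather than by eyeballing df, a minimal Python sketch using os.statvfs looks like this. The function names and the 70% warning threshold are assumptions for illustration; pick a threshold that matches your own growth rate.

```python
# Sketch: compute inode usage for a mount point, like the IUse% column of df -i.
# Function names and the 70% default threshold are illustrative assumptions.
import os

def inode_usage_percent(total, free):
    """Percentage of inodes in use; 0.0 if the filesystem reports no inode table."""
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total

def check_path(path, warn_at=70.0):
    """Return (percent_used, should_warn) for the filesystem holding path."""
    st = os.statvfs(path)  # f_files = total inodes, f_ffree = free inodes
    pct = inode_usage_percent(st.f_files, st.f_ffree)
    return pct, pct >= warn_at

if hasattr(os, "statvfs"):  # Unix-only call
    print(check_path("/"))
```

On the server you would point check_path at the mail volume (for example /var/vmail) and feed the percentage into whatever alerting you already run.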

4) Count message files in a hot mailbox folder

cr0x@server:~$ find /var/vmail/acme.example/jane/Maildir/cur -type f | wc -l
287641

Meaning: 287k files in one directory. Many filesystems handle this poorly under churn, even if reads are cached.

Decision: Consider folder partitioning (year-based archives), client policy changes, or moving that user to mdbox if the operational cost is recurring.
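
Running find by hand works for one user. To sweep a whole mail tree for oversized folders, something like this Python sketch can help. oversized_folders is a hypothetical helper, and the 200,000 default is just a starting point, not a Dovecot limit.

```python
# Sketch: report Maildir cur/ and new/ directories holding too many files.
# oversized_folders and its default limit are illustrative assumptions.
import os
import tempfile

def oversized_folders(maildir_root, limit=200_000):
    """Return {directory: file_count} for cur/ and new/ dirs above limit."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(maildir_root):
        if os.path.basename(dirpath) in ("cur", "new") and len(filenames) > limit:
            hits[dirpath] = len(filenames)
    return hits

# Demo with a tiny limit so the throwaway tree triggers a hit.
with tempfile.TemporaryDirectory() as root:
    cur = os.path.join(root, "jane", "Maildir", "cur")
    os.makedirs(cur)
    for i in range(5):
        open(os.path.join(cur, f"msg{i}"), "w").close()
    print(len(oversized_folders(root, limit=3)))
```

Walking millions of files is itself metadata-heavy IO, so run a sweep like this off-peak or against a snapshot.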

5) Identify whether Dovecot is spending time in IO wait

cr0x@server:~$ iostat -x 1 3
Linux 6.1.0-18-amd64 (server) 	01/03/2026 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.21    0.00    1.10   24.50    0.00   71.19

Device            r/s     w/s   rkB/s   wkB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util
nvme0n1         85.00  220.00  7400.0 14800.0     95.0     3.20   10.50     5.10    12.60   0.35  10.70

Meaning: CPU iowait is high (24.5%). Storage isn’t saturated (%util ~10%), but latency (await) isn’t great. That often points to sync-heavy patterns or metadata contention rather than raw throughput limits.

Decision: Check fsync settings, Dovecot processes doing sync storms, and filesystem mount options. For maildir, metadata write patterns can cause this even on SSD.
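
The reasoning in that reading (high iowait, low %util, mediocre await) can be written down as a tiny classifier. This Python sketch is only a way to make the logic explicit. The function name and all three thresholds are assumptions, not values from sysstat or Dovecot, and you should tune them to your hardware.

```python
# Sketch: turn iostat-style numbers into a rough diagnosis.
# classify() and its thresholds are illustrative assumptions.

def classify(iowait_pct, util_pct, await_ms, *,
             iowait_high=10.0, util_high=80.0, await_high=5.0):
    """Return 'throughput-bound', 'latency-bound', or 'storage-ok'."""
    if util_pct >= util_high:
        return "throughput-bound"   # the device itself is saturated
    if iowait_pct >= iowait_high or await_ms >= await_high:
        return "latency-bound"      # waiting per operation, not on bandwidth
    return "storage-ok"

# Numbers from the iostat sample above: 24.5% iowait, ~10.7% util, 10.5 ms await.
print(classify(24.50, 10.70, 10.50))
```

A latency-bound result is the case where mailbox format and sync behavior matter most, because the fix is fewer or cheaper operations rather than a faster pipe.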

6) See which processes are causing IO pressure

cr0x@server:~$ pidstat -d 1 5
Linux 6.1.0-18-amd64 (server) 	01/03/2026 	_x86_64_	(8 CPU)

# Time        UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
12:10:01     1001     23142      0.00   9200.00      0.00  dovecot
12:10:01     1001     23188      0.00   5100.00      0.00  dovecot
12:10:01        0     11422      0.00   1400.00      0.00  rsync

Meaning: Dovecot is writing heavily (likely index updates, flag changes, deliveries). There’s also an rsync running—classic “backup competing with live IO.”

Decision: If you don’t have snapshots, stop file-walking backups during peak. Move to snapshot-based backup or schedule rsync off-hours, or throttle it.

7) Check Dovecot service health and concurrency

cr0x@server:~$ doveadm service status
auth: client connections: 12, server connections: 12
imap: client connections: 380, server connections: 380
lmtp: client connections: 0, server connections: 0
indexer-worker: client connections: 8, server connections: 8

Meaning: IMAP has 380 active connections. Indexer workers are active. If you’re under-provisioned on indexers, searches and mailbox opens can drag.

Decision: Tune process limits responsibly. If indexers are pegged, increase workers or fix the cause (e.g., constant index rebuild due to permission issues or broken caches).

8) Measure mailbox open and status latency (from Dovecot’s perspective)

cr0x@server:~$ doveadm -v mailbox status -u jane@example.com 'messages recent uidnext unseen' INBOX
INBOX messages=142003 recent=0 uidnext=412887 unseen=12

Meaning: This command should return fast. If it stalls, you have IO latency, index issues, or locking contention.

Decision: If slow: check for index corruption, expensive filesystem scans, or blocked storage. For maildir with huge INBOX, encourage archiving.
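
“Should return fast” is easier to act on with a number. Here is a minimal Python sketch that times any command. time_command is a hypothetical helper, and the demo deliberately times the Python interpreter itself so it runs on a machine without Dovecot.

```python
# Sketch: measure wall-clock time of a command, e.g. a doveadm status query.
# time_command is a hypothetical helper; the demo avoids needing Dovecot.
import subprocess
import sys
import time

def time_command(argv, timeout=30.0):
    """Run argv and return (elapsed_seconds, return_code)."""
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return time.monotonic() - start, proc.returncode

# On a mail server you might pass:
#   ["doveadm", "mailbox", "status", "-u", "jane@example.com", "messages", "INBOX"]
elapsed, rc = time_command([sys.executable, "-c", "pass"])
print(f"rc={rc} elapsed={elapsed:.3f}s")
```

Run it against one hot user and one normal user and compare. The difference tells you more than either number alone.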

9) Identify and rebuild broken index files safely

cr0x@server:~$ doveadm -Dv index -u jane@example.com INBOX
doveadm(jane@example.com): Debug: Loading modules from directory: /usr/lib/dovecot/modules
doveadm(jane@example.com): Debug: Added plugin: quota
doveadm(jane@example.com): Debug: Finished indexing INBOX

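Meaning: doveadm index brings a mailbox’s index and cache files up to date; the debug output confirms module load and that indexing completed. It is the gentle option. If the indexes are actually damaged rather than merely stale, doveadm force-resync is the heavier repair tool.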

Decision: If this fixes missing messages in clients, you had index inconsistency rather than message loss. Add periodic index maintenance or fix the root cause (permissions, disk errors, forced resets).

10) Look for filesystem-level errors (the boring truth serum)

cr0x@server:~$ dmesg -T | grep -Ei 'ext4|xfs|btrfs|zfs|nvme|i/o error|reset|abort' | tail -n 10
[Sat Jan  3 11:58:41 2026] nvme nvme0: I/O 123 QID 6 timeout, completion polled
[Sat Jan  3 11:58:41 2026] nvme nvme0: resetting controller
[Sat Jan  3 11:58:43 2026] EXT4-fs warning (device sdb1): ext4_dx_add_entry: Directory index full, reach max htree level

Meaning: NVMe timeouts and an ext4 directory indexing warning. That’s not a Dovecot problem. That’s storage and filesystem behavior under stress.

Decision: Fix hardware/firmware, check NVMe health, and consider filesystem tuning or moving mail storage to a filesystem better suited for huge directories.

11) Confirm actual disk health (before you blame the mail format)

cr0x@server:~$ smartctl -a /dev/nvme0 | grep -Ei 'critical warning|temperature:|media and data integrity|error information log'
Critical Warning:                   0x00
Temperature:                       41 Celsius
Media and Data Integrity Errors:    0
Error Information Log Entries:      2

Meaning: No media errors, but there are error log entries. Combined with NVMe resets, you may have intermittent controller/firmware issues.

Decision: Schedule maintenance: firmware updates, controller checks, and consider redundancy. Mail formats won’t save you from flaky hardware.

12) Check directory and file distribution (spot pathological layouts)

cr0x@server:~$ du -sh /var/vmail/acme.example/jane/Maildir
96G	/var/vmail/acme.example/jane/Maildir

Meaning: 96 GB for one user. Large is fine, but large plus lots of small files changes everything.

Decision: If a handful of users dominate storage and performance, treat them specially: separate storage tier, mdbox, or dedicated volume.

13) Check filesystem mount options (latency killers hide here)

cr0x@server:~$ findmnt -no TARGET,SOURCE,FSTYPE,OPTIONS /var/vmail
/var/vmail /dev/sdb1 ext4 rw,relatime,errors=remount-ro

Meaning: Standard options. If you see sync or extremely aggressive journaling settings, you might have self-inflicted latency.

Decision: Avoid random mount-option cargo culting. Make changes only with measured latency improvements and a rollback plan.

14) For mdbox: locate key metadata and validate basic integrity signs

cr0x@server:~$ ls -la /var/vmail/acme.example/jane/mdbox/ | head
total 64
drwx------  5 vmail vmail 4096 Jan  3 12:01 .
drwx------ 12 vmail vmail 4096 Jan  3 12:01 ..
-rw-------  1 vmail vmail 8192 Jan  3 11:59 dovecot.index
-rw-------  1 vmail vmail 4096 Jan  3 11:59 dovecot.index.log
-rw-------  1 vmail vmail 2048 Jan  3 12:00 dovecot.map.index
-rw-------  1 vmail vmail 4096 Jan  3 11:58 storage

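Meaning: Index and map files should be present and non-empty. On a stock mdbox layout, dovecot.map.index and the m.* container files live under storage/, and per-folder indexes live under mailboxes/<folder>/dbox-Mails/, so on your system the listing may be split across those directories rather than flat as shown here. Missing or zero-sized files during normal operation can indicate corruption or permission problems.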

Decision: If these are missing or unreadable, fix permissions/ownership first; if corruption is suspected, move to snapshot restore or Dovecot repair workflows.

15) See which mailbox files are open and hot (a proxy for contention)

cr0x@server:~$ lsof +D /var/vmail/acme.example/jane/Maildir 2>/dev/null | head -n 10
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
dovecot 23142 vmail   15r  REG  8,17     12456 918273 /var/vmail/acme.example/jane/Maildir/dovecot.index
dovecot 23142 vmail   16u  REG  8,17     40960 918274 /var/vmail/acme.example/jane/Maildir/dovecot.index.log
dovecot 23188 vmail   18r  REG  8,17     53248 918275 /var/vmail/acme.example/jane/Maildir/cur/1735891023.M1234P23188.server,S=53248:2,S

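Meaning: lsof lists open files, not locks, but it does show which files are hot. If many processes hold the same index/log files open, you may have a workload that constantly churns flags or forces index rewrites. Note that +D walks the whole directory tree, so it can be slow on a very large Maildir.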

Decision: If contention is consistent, evaluate index settings, storage latency, and client behavior (e.g., clients that re-sync everything constantly).

Fast diagnosis playbook

This is the order that finds bottlenecks quickly without turning into a week-long archaeology project.

First: prove whether it’s storage latency, CPU, or Dovecot concurrency

IO wait and latency: iostat -x 1 3 and pidstat -d 1 5. High iowait or high await points to storage or sync patterns.
CPU saturation: top or pidstat -u 1 5. If CPU is pegged, you’re not choosing between maildir and mdbox—you’re choosing between scaling and rewriting.
Connection pressure: doveadm service status. If IMAP connections spike and processes are starved, fix process limits and client behavior.

Second: determine whether the problem is “filesystem metadata” or “Dovecot metadata”

Maildir suspects: inode usage (df -ih), huge folder file counts (find ... | wc -l), ext4 directory warnings in dmesg.
mdbox suspects: missing/invalid dovecot.map.index and index log churn, slow status queries despite reasonable storage metrics.

Third: validate if the issue is localized or systemic

Run doveadm mailbox status on one “hot” user and one “normal” user.
If only a few users are slow, treat them as special cases (archival, mailbox split, different format, different volume).
If everyone is slow, suspect storage, global indexers, backups, or a recent change in mount/options/kernel/firmware.

Fourth: choose the least risky corrective action

Rebuild indexes (safe and reversible).
Stop competing IO (backups, antivirus scans, aggressive log shipping).
Fix filesystem/hardware errors before tuning Dovecot.
Only then consider migration between formats.

Common mistakes: symptoms → root cause → fix

1) “IMAP login is slow, but disk utilization is low”

Symptoms: Users report delays opening folders; monitoring shows low %util on disks.

Root cause: High latency per operation (metadata IO, sync writes, directory lookups). Low utilization doesn’t mean low latency.

Fix: Measure await with iostat -x. If latency is high, reduce sync-heavy behaviors, move backups off the live FS, and consider mdbox if inode/directory overhead dominates.

2) “Backups are consistent because we use rsync nightly”

Symptoms: Restores produce weird mailboxes: missing recent messages, duplicates, or client resync storms.

Root cause: File-by-file copying captured maildir/mdbox mid-update; indexes and messages aren’t from the same point in time.

Fix: Snapshot-first backups. Restore from snapshot. Rebuild indexes post-restore using doveadm index or by removing stale index files carefully.

3) “We’ll just put everything in one massive INBOX”

Symptoms: One or two users are always slow; backup windows blow out; filesystem warnings appear.

Root cause: Huge folders amplify listing and metadata costs (maildir) or indexing overhead (any format).

Fix: Enforce archiving and foldering policies. Consider server-side Sieve rules. Split hot mailboxes across storage tiers.

4) “We migrated formats and didn’t plan the index transition”

Symptoms: After migration, clients see missing mail until they resync; server load spikes.

Root cause: Indexes weren’t rebuilt cleanly or were restored inconsistently; clients trigger heavy sync behavior.

Fix: Post-migration index rebuild, staged client reconnect, and controlled concurrency. Communicate expected resync behavior to helpdesk.

5) “We tuned fsync away because performance”

Symptoms: Performance improved… until a crash or power event; then users lose flag updates, recent deliveries, or see corrupted state.

Root cause: Unsafe durability settings. Mail storage generates a steady stream of small metadata writes; losing even a few seconds of them can create confusing inconsistencies.

Fix: Keep durability sane. If you want speed, buy better storage or redesign; don’t gamble with integrity unless you can tolerate the loss.

6) “Antivirus scans the entire mail store every hour”

Symptoms: Periodic latency spikes; lots of cache misses; IO spikes; users complain in waves.

Root cause: Maildir’s many files punish full-tree scans; even mdbox suffers if scans thrash caches.

Fix: Scan at ingestion (LMTP/SMTP pipeline) or use targeted scanning. Exclude indexes and transient directories from broad scans where appropriate.

7) “We assumed the filesystem doesn’t matter”

Symptoms: Same configuration behaves differently across hosts; upgrades “randomly” change performance.

Root cause: Directory indexing, allocator behavior, and journaling differ per filesystem and kernel version.

Fix: Standardize filesystem choice and mount options for mail volumes. Benchmark mailbox operations, not just sequential throughput.

Three corporate mini-stories (anonymized, plausible, and instructive)

Mini-story 1: The outage caused by a wrong assumption

They ran a mid-sized corporate mail platform. Nothing exotic: Dovecot, Postfix, a VM cluster, and a “temporary” storage volume that became permanent. A new team member asked what mailbox format they used. The answer was confident and wrong: “It’s Maildir, so it’s just files. Backups are easy.”

The backup job was a nightly file-walk copy. No snapshots. The job ran while deliveries were happening, and occasionally during peak mobile sync time. It “worked” in the sense that it produced a pile of files. It also captured transient states: messages in tmp, renames half-complete, index logs mid-update.

Months later, a storage failure forced a restore. The restore completed quickly—management applauded. Then the helpdesk queue turned into a denial-of-service attack. Users saw missing messages, duplicated threads, and folders that looked empty until they clicked around and waited. Some clients “fixed” it by re-downloading everything, which made the server work harder, which made the clients retry more.

The root cause wasn’t Maildir. It was the assumption that file-level backup equals consistent backup. They moved to snapshot-based backups and added a restore drill that included index rebuild and staged client reconnection. The next restore was boring. Everyone hated it less. That’s how you know it worked.

Mini-story 2: The optimization that backfired

A different company had serious inode pressure. They went mdbox to reduce file counts and cut backup scanning time. Good instinct. Then they decided to squeeze extra performance by tweaking durability: reducing sync behavior and pushing write caching harder. It looked great in benchmarks. Their graphs got prettier.

Then they had a power event in one rack. Not a disaster, just a few minutes of chaos. Systems rebooted. Most services recovered. Mail did not recover cleanly. Users could log in, but new mail appeared inconsistently, and flags behaved like suggestions. Some mail existed in box files but didn’t show up in IMAP views until indexes were repaired. Some repairs worked; others required restores of metadata from snapshots.

The post-incident review wasn’t fun. The “optimization” reduced the cost of each write by trading away the reliability properties they were implicitly relying on. With aggregated storage and metadata structures, small inconsistencies can cascade into weird user-visible states that are hard to explain and harder to support.

They rolled back the risky tuning, invested in proper power protection for storage, and standardized on a tested snapshot+replication approach. Performance was slightly worse than the benchmark fantasy. Availability was better than the outage reality. Choose reality.

Mini-story 3: The boring but correct practice that saved the day

A regulated enterprise ran Dovecot at scale. Their mail store was big enough that “restore everything” was not a plan; it was a resignation letter. They used Maildir for most users and mdbox for a subset of high-volume accounts. The key wasn’t the formats. It was discipline.

They practiced restores quarterly. Not “we tested backups.” Actual restores: pick a random mailbox, restore it to an isolated host, rebuild indexes, validate IMAP access, and verify message counts and recent deliveries. They also had a policy: snapshot every few minutes, keep short retention locally, replicate snapshots off-host, and test the restore path end-to-end.

When a storage controller started glitching, they saw it early in dmesg and SMART. Before it became data loss, they failed over the mail service to the replica and kept serving users. Then they restored a few affected mailboxes from the last known-good snapshot and validated them with Dovecot tools before reintroducing them.

No heroics. No mystery. Just the kind of boring operational hygiene that seems expensive right up until it’s the cheapest thing you ever did.

Checklists / step-by-step plan

Choosing a format: a decision checklist

How many messages per user? If many users exceed 200k messages in a folder, Maildir will punish your filesystem unless you manage foldering.
Are you inode-constrained? If IUse% in df -ih is above 70% and still climbing, treat it as a scaling problem, not a warning label.
Do you have snapshot-based backups? If no, fix that before changing formats. Otherwise you’ll just change the shape of your backup inconsistency.
Do you need easy partial recovery? Maildir tends to be friendlier for surgical recovery. mdbox can be fine, but you need practiced tooling and process.
What’s your filesystem? Benchmark mailbox operations on the actual filesystem and kernel you’ll run. Mail workloads are metadata-heavy and weird.
Do you have staff time for maintenance? If not, choose the path with the simplest day-2 operations: Maildir plus good snapshotting.

Migration plan: Maildir → mdbox (safe-ish sequence)

Inventory users and identify hot mailboxes (large folders, heavy churn).
Implement snapshot-based backups and perform a restore drill before migration.
Set up a staging server with the target Dovecot version and configuration.
Migrate a small pilot group first. Monitor IMAP latency, index rebuild time, and user-visible issues.
Schedule migrations off-peak. Throttle concurrency. Don’t migrate everyone at once unless you enjoy living dangerously.
After each batch: rebuild indexes, validate mailbox counts, and watch for client resync storms.
Keep rollback capability via snapshots and a clear cutover marker.
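
For the “validate mailbox counts” step, the comparison itself is simple once you have per-folder message counts from before and after the batch. The Python sketch below assumes you have already collected those counts into dictionaries (for example from doveadm mailbox status output). diff_counts is a hypothetical helper, and the sample numbers are made up.

```python
# Sketch: compare per-folder message counts before and after a migration batch.
# diff_counts is a hypothetical helper; the sample counts are made up.

def diff_counts(before, after):
    """Return {folder: (before, after)} for folders whose counts differ.

    A folder missing on one side shows up with None on that side.
    """
    folders = sorted(set(before) | set(after))
    return {
        f: (before.get(f), after.get(f))
        for f in folders
        if before.get(f) != after.get(f)
    }

before = {"INBOX": 142003, "Sent": 9120, "Archive/2024": 51230}
after = {"INBOX": 142003, "Sent": 9118}
print(diff_counts(before, after))
```

An empty result means the batch is a candidate for sign-off. Anything else goes back to the snapshot before users reconnect.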

Operational baseline checklist (regardless of format)

Snapshot-based backups with tested restores.
Monitoring for inode usage, directory growth, and mail volume growth.
Monitoring for storage latency (await) and kernel storage errors.
Defined procedures for index rebuild and mailbox repair.
Client behavior controls where possible (aggressive sync patterns can DoS you).

FAQ

1) Is Maildir always safer than mdbox?

No. Maildir often has a smaller blast radius for single-file corruption, but it can fail spectacularly via inode exhaustion and metadata overhead. “Safer” depends on what you’re likely to screw up.

2) Is mdbox always faster?

No. mdbox can reduce small-file overhead, but if your bottleneck is index churn, sync settings, or slow storage latency, it won’t magically fix that. It can also add complexity during repairs and restores.

3) What’s the biggest reason Maildir systems become slow over time?

File count growth plus metadata operations. The system doesn’t slow down linearly; it slows down when directories become huge, backups start crawling, and inode usage approaches the cliff.

4) What’s the biggest reason mdbox systems become painful?

Operational dependence on Dovecot-managed metadata and the need for consistent backups. If you don’t snapshot, you can restore a mailbox that exists but doesn’t “make sense” to Dovecot until repaired.

5) Should I store mail on NFS?

Only if you understand your NFS server/client locking semantics, latency characteristics, and failure behavior under load. Mail storage is metadata-heavy and sensitive to latency spikes. Many “mysterious” mail issues are just network storage being itself.

6) Can I mix formats on the same system?

Yes, and sometimes it’s the pragmatic answer: keep most users on Maildir and move high-volume, inode-heavy accounts to mdbox. Just make sure your operational tooling covers both.

7) Do snapshots replace Dovecot replication?

No. Snapshots help you roll back and restore. Replication helps you stay available. They solve different problems and fail differently.

8) How do I know if it’s index corruption or real message loss?

Compare filesystem reality to IMAP view. If messages exist on disk (maildir files or mdbox storage) but don’t appear, rebuild indexes. If they don’t exist on disk, it’s loss and you need restores.
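
For Maildir, that comparison can be sketched in Python: count the files in cur/ and new/, then compare with the message count the server reports. Both function names are hypothetical, the IMAP-side count is something you supply yourself, and the sketch does not apply to mdbox, where messages are not one-per-file.

```python
# Sketch: compare on-disk Maildir message files with the count IMAP reports.
# Both helpers are hypothetical; imap_view is a number you obtain separately.
import os
import tempfile

def maildir_message_count(folder):
    """Count regular files in the cur/ and new/ subdirectories of folder."""
    total = 0
    for sub in ("cur", "new"):
        path = os.path.join(folder, sub)
        if os.path.isdir(path):
            with os.scandir(path) as entries:
                total += sum(1 for e in entries if e.is_file())
    return total

def diagnose(on_disk, imap_view):
    """Map the two counts to the next action."""
    if on_disk > imap_view:
        return "messages on disk but not visible: rebuild indexes"
    if on_disk < imap_view:
        return "messages missing on disk: plan a restore"
    return "counts agree"

# Demo: three files on disk while the server reports only one message.
with tempfile.TemporaryDirectory() as folder:
    os.makedirs(os.path.join(folder, "cur"))
    os.makedirs(os.path.join(folder, "new"))
    for name in ("a", "b"):
        open(os.path.join(folder, "cur", name), "w").close()
    open(os.path.join(folder, "new", "c"), "w").close()
    print(diagnose(maildir_message_count(folder), imap_view=1))
```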

9) What’s the “minimum viable” maintenance for a healthy Dovecot storage layer?

Snapshot-based backups, periodic restore drills, monitoring inode usage and storage latency, and a runbook for index rebuild/repair. Everything else is optimization.

Next steps you can do this week

Run the basics: doveconf -n, df -ih, iostat -x, and doveadm mailbox status on a hot user. Write down what’s actually true.
Verify backup consistency: If you’re not snapshotting, treat your backups as “best effort copies,” not restores you can bet your job on.
Do one restore drill: Pick one mailbox, restore it to an isolated environment, rebuild indexes, validate IMAP. Time it. Document it.
Identify the growth cliff: Inodes, huge folders, and storage latency are the big three. Put alerts on them.
Decide with intent: If your pain is inode and metadata overhead, mdbox might be the move. If your pain is recoverability and operational simplicity, stick with Maildir and scale the filesystem and backup approach properly.
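
To put a date on the growth cliff, a straight-line projection is usually enough to start a conversation. The Python sketch below assumes linear growth between two inode readings. days_until_full is a hypothetical helper, and the earlier reading of 38M is an invented sample, not a measurement from this article.

```python
# Sketch: linear projection of when a filesystem runs out of inodes.
# days_until_full is a hypothetical helper; the earlier reading is invented.

def days_until_full(total, used_then, used_now, days_between):
    """Days left at the observed growth rate, or None if usage is not growing."""
    growth_per_day = (used_now - used_then) / days_between
    if growth_per_day <= 0:
        return None
    return (total - used_now) / growth_per_day

# 50M inodes total, 41M used today, 38M used 30 days ago (sample values).
print(days_until_full(50_000_000, 38_000_000, 41_000_000, 30))  # 90.0
```

Real growth is rarely linear (one bad import can eat weeks of headroom), so alert well before the projected date.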

Pick the format that matches your failure budget and your restore reality. Your future self is going to be the one on-call. Don’t prank them.