Proxmox Ceph “mon down/out of quorum”: restoring monitors and quorum without panic

When Ceph monitors go “down” or “out of quorum” on a Proxmox cluster, the UI turns into a crime scene: red everywhere, VMs hesitate, and everyone suddenly remembers they “always meant to document the Ceph layout.”

Quorum loss is rarely mystical. It’s usually one of four boring failures wearing a scary mask: network reachability, time drift, disk/database damage, or a bad monmap. This guide is how to unmask it fast, fix it safely, and avoid making it worse.

How Ceph monitor quorum actually fails (mental model)

Ceph monitors (MONs) are not “just another daemon.” They’re the cluster’s source of truth: the monmap (who the monitors are), the osdmap (which OSDs exist and where), auth state (CephX), and a pile of consensus state that must agree across a majority.

Monitors form a quorum using a consensus algorithm (Ceph has used Paxos-like machinery historically and has evolved over releases). In plain terms: monitors must be able to talk to each other reliably and agree on the latest committed maps. If they can’t, they’d rather stop being authoritative than lie.

What “down” vs “out of quorum” usually means

  • mon down: that monitor process isn’t reachable or isn’t running (or is wedged so hard it might as well be off).
  • out of quorum: the process is up, but it’s not part of the majority that currently agrees on state. It might be partitioned, time-skewed, have an old monmap, or have a damaged store.
  • no quorum: the cluster is effectively blind. OSDs may keep serving existing IO for a bit, but state changes and health reporting become limited and risky. Treat this as “stop improvising.”

The golden rule

Never “fix quorum” by deleting things until you know which monitor has the freshest truth. If you wipe the wrong monitor, you can turn a recoverable outage into a reconstruction project.

Here’s the operational mindset I want you to adopt: monitors are the control plane; OSDs are the data plane. You can survive a bruised data plane with redundancy. You cannot survive a control plane that is both wrong and confident.

One quote worth keeping on the wall: Hope is not a strategy. — General Gordon R. Sullivan. In ops, that’s not motivation; it’s a reminder to verify before you change.

Joke #1: A Ceph monitor out of quorum is like a manager out of the meeting—still sending emails, but nobody’s taking them seriously.

Fast diagnosis playbook (first/second/third)

This is the “stop the bleeding” sequence. Don’t start by rebuilding monitors. Start by proving the failure mode.

First: Is there any quorum at all?

If at least one monitor is in quorum, you have an anchor. If none are, your job is to identify the most authoritative surviving monitor store and bring it back, not to create a new reality.
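
If you can get any answer at all, ceph quorum_status tells you who is in and who leads. A minimal look, trimmed with head (field names per recent releases; the full JSON also includes the monmap):

cr0x@server:~$ ceph quorum_status --format json-pretty | head -n 11
{
    "election_epoch": 42,
    "quorum": [
        0,
        1
    ],
    "quorum_names": [
        "mon1",
        "mon2"
    ],
    "quorum_leader_name": "mon1",

If quorum_names lists at least a majority of your monitors, work from that anchor. If the command hangs like ceph -s does, assume no quorum and keep reading.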

Second: Is this a network or time problem masquerading as Ceph?

Network partitions and time drift are the top two “everything looks broken” causes. Check them before you touch mon data.

Third: Is the monitor store healthy (disk space, filesystem, corruption)?

Monitors are sensitive to disk-full events and filesystem trouble. A monitor can be “running” but effectively dead because it can’t commit.

Fourth: Is the monmap/FSID wrong or stale on a node?

A common Proxmox/Ceph footgun: a node has leftover config from an old cluster or got reinstalled, and now it’s speaking confidently with the wrong identity.
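
A 30-second way to spot that situation on a Proxmox node: the Ceph CLI reads /etc/ceph/ceph.conf, which pveceph normally maintains as a symlink into the cluster filesystem. If a reinstall or manual edit replaced it with a local file, the node may be reading a different reality than its peers (paths here are the Proxmox defaults):

cr0x@server:~$ ls -l /etc/ceph/ceph.conf
lrwxrwxrwx 1 root root 18 Nov  2 12:12 /etc/ceph/ceph.conf -> /etc/pve/ceph.conf

A regular file here instead of that symlink is a red flag: compare its fsid against /etc/pve/ceph.conf before trusting anything the node claims.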

Fifth: Only now consider rebuild/recreate

Rebuilding a monitor is fine when you have quorum and you’re replacing a member. It’s dangerous when you don’t have quorum and you’re guessing.

Interesting facts and context (why monitors are touchy)

  1. Ceph’s name is short for “cephalopod” (think octopus), not a mythological sea monster. It’s a cute name for a system that will absolutely bite you for sloppy ops.
  2. Ceph monitors used a Paxos-based consensus mechanism for years; the design goal is strong consistency for maps, not “always up at any cost.”
  3. The “odd number of MONs” rule isn’t superstition. Majority consensus means 3 MONs tolerate 1 failure; 4 MONs still tolerate 1 failure; 5 tolerate 2.
  4. Time drift is a first-class failure in distributed systems. Monitors can refuse participation if clock skew breaks lease/timeout assumptions.
  5. CephX auth state is stored and served by monitors; a monitor with broken keyrings can look like “random permission errors” across the cluster.
  6. Monitors store critical maps on local disk; they are not “stateless.” Treat a monitor store like a tiny database with consensus replication.
  7. Proxmox integration makes deployment easy, but it also makes it easy to forget Ceph’s own lifecycle tools exist (and matter).
  8. Historically, “mon store corruption” has often been downstream of disk full, power loss, or filesystem bugs—rarely “Ceph just decided to forget.”

Practical tasks: commands, expected output, and decisions (12+)

These are real tasks I run during an incident. Each has: command, what the output means, and what decision you take next. Run them on a monitor node unless stated otherwise.

Task 1: Check current quorum view (from any node with ceph.conf + key)

cr0x@server:~$ ceph -s
  cluster:
    id:     0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
    health: HEALTH_WARN
            1/3 mons down, quorum mon1,mon2
  services:
    mon: 3 daemons, quorum mon1,mon2 (age 5m), out of quorum: mon3
    mgr: mon1(active), standbys: mon2
    osd: 12 osds: 12 up, 12 in
  data:
    pools:   4 pools, 256 pgs
    objects: 1.2M objects, 4.6 TiB
    usage:   13 TiB used, 40 TiB / 53 TiB avail
    pgs:     256 active+clean

Meaning: You have quorum (mon1, mon2). mon3 is the patient, not the surgeon.

Decision: Repair/rejoin mon3 using the healthy quorum. Avoid “force” operations.

Task 2: If ceph -s hangs, test direct MON connectivity

cr0x@server:~$ timeout 5 ceph -s; echo $?
124

Meaning: Exit code 124 = timeout. This is often “no quorum” or client can’t reach monitors.

Decision: Move to network/time checks and local mon logs. Don’t start changing Ceph maps from a host that can’t even talk to MONs.
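
You can still interrogate a monitor locally through its admin socket, which works even with no quorum at all. A hedged example, run on the MON host itself (the socket lives under /var/run/ceph/ and the daemon must at least be running):

cr0x@server:~$ ceph daemon mon.mon1 mon_status | grep '"state"'
    "state": "electing",

A monitor stuck in “probing” or “electing” forever is telling you it cannot reach enough peers (or their clocks disagree); “leader” or “peon” means that monitor believes it is in quorum. The full mon_status output also shows which peers it can actually see.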

Task 3: Identify the monitor endpoints from config

cr0x@server:~$ grep -E '^\s*(fsid|mon_host|public_network|cluster_network)' /etc/pve/ceph.conf
fsid = 0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
mon_host = 10.10.10.11 10.10.10.12 10.10.10.13
public_network = 10.10.10.0/24
cluster_network = 10.10.20.0/24

Meaning: You have the intended IPs. If a monitor’s IP changed, it will “exist” but never be found.

Decision: Confirm those IPs are correct on each node. If not, fix networking/DNS and only then touch Ceph.

Task 4: Check MON systemd state (is it actually running?)

cr0x@server:~$ systemctl status ceph-mon@mon3 --no-pager
● ceph-mon@mon3.service - Ceph monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2025-12-26 09:11:32 UTC; 2min 12s ago
    Process: 18231 ExecStart=/usr/bin/ceph-mon -f --cluster ceph --id mon3 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)

Meaning: It’s not a quorum mystery; it’s a service failure on mon3.

Decision: Read logs, check disk space, check mon store, then restart.

Task 5: Read recent MON logs for the real reason

cr0x@server:~$ journalctl -u ceph-mon@mon3 -n 80 --no-pager
Dec 26 09:11:31 mon3 ceph-mon[18231]: error opening mon data directory at '/var/lib/ceph/mon/ceph-mon3': (28) No space left on device
Dec 26 09:11:31 mon3 systemd[1]: ceph-mon@mon3.service: Main process exited, code=exited, status=1/FAILURE
Dec 26 09:11:32 mon3 systemd[1]: ceph-mon@mon3.service: Failed with result 'exit-code'.

Meaning: Disk full. Classic. Predictable. Fixable.

Decision: Free space safely, then run a filesystem sanity check if warranted, then restart the MON.

Task 6: Verify filesystem capacity and inode pressure

cr0x@server:~$ df -h /var/lib/ceph/mon/ceph-mon3
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        50G   50G     0 100% /
cr0x@server:~$ df -i /var/lib/ceph/mon/ceph-mon3
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sda2       3276800 220000 3056800    7% /

Meaning: Capacity is the problem, not inodes.

Decision: Remove the right junk (logs, old kernels, crash dumps). Don’t delete monitor store files in anger.
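
A low-risk cleanup sketch that does not go anywhere near /var/lib/ceph (adjust to whatever is actually eating the space on your node):

cr0x@server:~$ du -xh --max-depth=1 /var | sort -h | tail -n 5
cr0x@server:~$ journalctl --vacuum-size=200M
cr0x@server:~$ apt-get clean
cr0x@server:~$ df -h /var/lib/ceph/mon/ceph-mon3

The first command finds the biggest directories on this filesystem, the next two reclaim journal and apt cache space, and the last one confirms you actually freed something before you restart the MON.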

Task 7: Check time sync (quorum hates time travelers)

cr0x@server:~$ timedatectl
               Local time: Fri 2025-12-26 09:14:10 UTC
           Universal time: Fri 2025-12-26 09:14:10 UTC
                 RTC time: Fri 2025-12-26 09:14:09
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

Meaning: This node is synced.

Decision: Run this on each MON. If any show “synchronized: no” or large drift, fix NTP/chrony before retrying quorum.
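
If a node shows “synchronized: no”, go one level deeper into the time daemon. Current Proxmox releases default to chrony; a quick look (substitute your NTP tooling if you run something else):

cr0x@server:~$ chronyc tracking | grep -E 'Reference ID|System time|Leap status'
cr0x@server:~$ chronyc sources -v

You want a real reference, a small “System time” offset, and “Leap status: Normal.” If chrony can’t reach any source, fix that (often a firewall or DNS issue) before blaming Ceph elections.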

Task 8: Verify MON ports are reachable between monitors

cr0x@server:~$ nc -vz 10.10.10.11 3300
Connection to 10.10.10.11 3300 port [tcp/*] succeeded!
cr0x@server:~$ nc -vz 10.10.10.12 6789
Connection to 10.10.10.12 6789 port [tcp/*] succeeded!

Meaning: Modern Ceph uses 3300 (msgr2); older compatibility uses 6789 (msgr1). If these fail, quorum isn’t coming back until networking/firewalls are fixed.

Decision: If blocked: fix firewall rules, VLANs, MTU, routing. If open: proceed to mon store checks.
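
One specific trap worth testing explicitly is MTU: the nc probes above are tiny, so they succeed even when large Ceph messages are silently dropped. A hedged check assuming a 9000-byte MTU on the Ceph public network (8972 = 9000 minus 28 bytes of IP+ICMP headers; use 1472 for a standard 1500 MTU):

cr0x@server:~$ ping -c 3 -M do -s 8972 10.10.10.11

If this fails while a plain ping succeeds, you have a path MTU problem between these hosts, not a Ceph problem.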

Task 9: Confirm the MON identity directory exists and permissions are sane

cr0x@server:~$ ls -ld /var/lib/ceph/mon/ceph-mon3
drwxr-x--- 8 ceph ceph 4096 Dec 26 09:02 /var/lib/ceph/mon/ceph-mon3

Meaning: Directory exists and owned by ceph.

Decision: If ownership is wrong (common after manual restores), fix it before restart: monitor won’t self-heal permission errors.
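
If the ownership is wrong, the repair is mundane; the path matches this article’s example monitor, so adjust it to yours:

cr0x@server:~$ chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon3
cr0x@server:~$ ls -ld /var/lib/ceph/mon/ceph-mon3

Re-check with ls before restarting the service, so you’re fixing the cause rather than just retrying the symptom.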

Task 10: See who the cluster thinks the monitors are (mon dump)

cr0x@server:~$ ceph mon dump
epoch 21
fsid 0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
last_changed 2025-12-26T08:58:07.123456+0000
created 2025-11-02T12:12:01.000000+0000
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.10.10.11:3300/0,v1:10.10.10.11:6789/0] mon.mon1
1: [v2:10.10.10.12:3300/0,v1:10.10.10.12:6789/0] mon.mon2
2: [v2:10.10.10.13:3300/0,v1:10.10.10.13:6789/0] mon.mon3

Meaning: mon3 is still in the monmap. Good. You want it to rejoin, not to be re-added from scratch unless necessary.

Decision: If an IP/hostname is wrong here, you’ll need to fix addressing (or, in last resort, remove/re-add the monitor properly).
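
If you do end up in the “remove and re-add” last resort, use the platform’s own tooling rather than hand-editing maps. A hedged sketch for current Proxmox releases, valid only while quorum exists (monitor names are this article’s examples):

cr0x@server:~$ pveceph mon destroy mon3
cr0x@server:~$ pveceph mon create

Run the create step on the node that should host the new monitor, and confirm the result with ceph mon dump before moving on.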

Task 11: Check monitor store health hints (fsck is for filesystems; this is for MONs)

cr0x@server:~$ ceph-mon -i mon3 --extract-monmap /tmp/monmap-mon3
cr0x@server:~$ monmaptool --print /tmp/monmap-mon3 | head -n 5
monmaptool: monmap file /tmp/monmap-mon3
epoch 21
fsid 0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
last_changed 2025-12-26T08:58:07.123456+0000
created 2025-11-02T12:12:01.000000+0000

Meaning: The daemon must be stopped for this (here it was already down). If the monmap extracts and prints cleanly, the local store is at least readable.

Decision: If the store reads fine, restart the service. If extraction fails with store errors, stop and choose a rebuild path rather than repeatedly restarting.

Task 12: Restart the MON and watch it join quorum

cr0x@server:~$ systemctl restart ceph-mon@mon3
cr0x@server:~$ ceph -s
  cluster:
    id:     0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 10s)
    mgr: mon1(active), standbys: mon2
    osd: 12 osds: 12 up, 12 in

Meaning: Back in quorum. You’re done with the urgent part.

Decision: Now do the follow-up: why did disk fill? Add alerting and housekeeping so this doesn’t recur on a Friday night.

Task 13: If no quorum, inspect local mon’s view without contacting the cluster

cr0x@server:~$ systemctl stop ceph-mon@mon1
cr0x@server:~$ ceph-mon -i mon1 --extract-monmap /tmp/monmap-mon1
cr0x@server:~$ monmaptool --print /tmp/monmap-mon1 | head -n 5
monmaptool: monmap file /tmp/monmap-mon1
epoch 21
fsid 0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
last_changed 2025-12-26T08:58:07.123456+0000
created 2025-11-02T12:12:01.000000+0000

Meaning: You can read the monmap straight out of the local store without any cluster connectivity (stop the daemon first; it holds the store lock). That helps prove which store is intact.

Decision: Compare across nodes; the freshest epoch and a consistent FSID are your clues for “which monitor store to trust.” Start the monitor again once you’ve looked.

Task 14: Confirm FSID matches across Proxmox ceph.conf and mon store

cr0x@server:~$ ceph fsid
0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f

Meaning: If this returns a different FSID than /etc/pve/ceph.conf, you’re mixing clusters or configs.

Decision: Stop and reconcile config drift before “fixing” anything else.
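
A quick scripted comparison, assuming the fsid line in /etc/pve/ceph.conf looks like the one from Task 3:

cr0x@server:~$ [ "$(ceph fsid)" = "$(grep fsid /etc/pve/ceph.conf | awk '{print $NF}')" ] && echo "fsid matches" || echo "FSID MISMATCH: reconcile configs first"
fsid matches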

Task 15: Validate that the monitor is listening on expected addresses

cr0x@server:~$ ss -lntp | egrep ':(3300|6789)\s'
LISTEN 0      4096         10.10.10.13:3300       0.0.0.0:*    users:(("ceph-mon",pid=19012,fd=20))
LISTEN 0      4096         10.10.10.13:6789       0.0.0.0:*    users:(("ceph-mon",pid=19012,fd=21))

Meaning: It’s bound correctly. If it’s listening only on 127.0.0.1 or the wrong interface, you’ve found your culprit.

Decision: Fix host networking and/or Ceph public_network settings; restart.

Task 16: Check for firewall policy blocking intra-cluster traffic

cr0x@server:~$ pve-firewall status
Status: enabled/running
cr0x@server:~$ iptables -S | head
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT ACCEPT
-N PVEFW-Drop
-N PVEFW-Input

Meaning: You have a default DROP posture. That’s fine, but only if Ceph ports are allowed between MONs.

Decision: If this changed recently, inspect rulesets and ensure TCP 3300 and (if needed) 6789 are permitted on the Ceph public network.
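
For reference, an explicit allow in the cluster-wide ruleset (/etc/pve/firewall/cluster.fw) can look like the sketch below. It assumes the example 10.10.10.0/24 public network; adapt it to wherever your rules actually live (cluster, host, or security-group level):

[RULES]
IN ACCEPT -source 10.10.10.0/24 -p tcp -dport 3300 # Ceph msgr2
IN ACCEPT -source 10.10.10.0/24 -p tcp -dport 6789 # Ceph msgr1 (legacy)

After editing, check what actually gets generated with pve-firewall compile and re-test the ports with nc from another monitor node.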

Task 17: Proxmox Ceph tooling sanity check (don’t fight your platform)

cr0x@server:~$ pveceph status
  cluster:
    id:     0b3d3e6f-6c1a-4b2c-9c11-0c1e9d9e8a2f
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 12m)
    mgr: mon1(active), standbys: mon2
    osd: 12 osds: 12 up, 12 in

Meaning: On current Proxmox releases this mirrors ceph -s, but it proves the Proxmox-managed config (/etc/pve/ceph.conf) points at the cluster you think it does, and that Proxmox’s view matches Ceph’s.

Decision: If these disagree, suspect config drift between /etc/pve, local /etc/ceph, and what your CLI is actually using.

Recovery paths: from “one mon down” to “no quorum”

Scenario A: One monitor down, quorum still exists (best case)

This is the “normal” failure. You have enough monitors to maintain majority. Your goal is to return the missing monitor to service or replace it cleanly.

  • Fix the local cause (disk full, service crashed, node rebooted into wrong kernel, NIC flapped).
  • Validate time sync and network ports.
  • Start the monitor and watch it rejoin.

Be strict: if a monitor store is corrupt, do not keep restarting it hoping it “eventually works.” That’s how you get flapping quorum and chronic instability.

Scenario B: Two monitors down in a 3-mon cluster (no quorum)

Now you’re in “restore majority” mode. With 3 MONs, you need 2. The safest recovery usually looks like this:

  1. Pick the monitor node most likely intact (powered, stable disk, no recent reinstall).
  2. Get that monitor to start cleanly and validate its store.
  3. Bring up one additional monitor to reach quorum.
  4. Once quorum exists, rebuild any remaining bad monitor from the healthy quorum.

If all you do is bounce services randomly, you can end up with each monitor believing in a different world. Consensus systems do not reward interpretive dance.

Scenario C: All monitors “up” but none in quorum (quorum flapping or split brain)

This is usually network partition or time skew. Sometimes it’s an MTU mismatch on the Ceph public network: small probes succeed, large messages fail, and everything looks “mostly reachable.”

Prioritize:

  • Clock sync check across all MONs (a quick cross-check is sketched after this list).
  • Bidirectional connectivity on 3300/6789 between all MONs.
  • Interface/IP correctness (no duplicate IPs, no stale ARP entries, no VRRP failover surprises).
  • Firewall change review (Proxmox firewall or upstream).
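
The clock cross-check from the first bullet can be as blunt as printing each monitor’s idea of “now” side by side (assumes root ssh to this article’s example hosts):

cr0x@server:~$ for h in mon1 mon2 mon3; do printf '%s: ' "$h"; ssh root@"$h" date +%s.%N; done
mon1: 1766742901.412398112
mon2: 1766742901.498213557
mon3: 1766742884.120094321

Sub-second differences are fine; mon3 here is roughly 17 seconds behind, which is more than enough to wreck elections and leases.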

Scenario D: Monitor store corruption (the unpleasant one)

If you have quorum, rebuilding a single monitor is routine: stop it, move aside its store, recreate from quorum, start it. If you do not have quorum, corruption recovery is about finding the least-bad surviving store and reconstituting quorum from it.

In practice, you will do one of the following:

  • Rebuild one monitor from an existing quorum (safe).
  • Recreate a new monitor and join it to quorum (safe if quorum exists).
  • Recover quorum by starting the best surviving monitor store and adding others (risky but sometimes necessary).

Joke #2: A corrupt monitor store is like a corrupted spreadsheet—someone will suggest “just retype it,” and that someone should not have production access.

Checklists / step-by-step plan (do this, not vibes)

Checklist 1: When you still have quorum (repair a single MON)

  1. Confirm quorum exists with ceph -s. If it hangs, switch to local diagnostics.
  2. On the failed MON node: systemctl status ceph-mon@<id> and journalctl -u ....
  3. Fix root cause: disk full, permissions, NIC down, time unsynced, firewall rule.
  4. Sanity-check the store before restarting: with the daemon stopped, extract the monmap (ceph-mon -i <id> --extract-monmap <file>); if that fails with store errors, don’t flap the service.
  5. Restart MON and confirm it’s listening (ss -lntp).
  6. Confirm rejoin in ceph -s and ceph quorum_status.
  7. Do follow-up work: monitor disk usage trends, add alerting, prevent recurrence.

Checklist 2: When you have no quorum (restore majority safely)

  1. Stop changing things cluster-wide. Your goal is to find one or two good monitor stores.
  2. Pick a candidate “best monitor” node (most stable storage, least touched, likely still has intact mon store).
  3. On each MON node: check disk space (df -h), time sync (timedatectl), and recent errors (journalctl).
  4. Confirm FSID consistency between /etc/pve/ceph.conf and the local store’s monmap (ceph-mon -i <id> --extract-monmap, then monmaptool --print).
  5. Fix network partitions first (ports 3300/6789). If monitors can’t talk, no amount of rebuild will help.
  6. Start one monitor cleanly. Watch logs. You’re looking for “forming quorum” behavior, not just “active (running).”
  7. Bring up a second monitor to reach majority. Two MONs out of three is quorum.
  8. Once quorum exists: rebuild any remaining monitor properly from quorum rather than trying to resurrect corrupted state.

Checklist 3: Rebuilding a monitor when quorum exists (controlled replacement)

I’m intentionally not giving you a one-command “nuke and pave” here. Rebuilds are safe when you do them deliberately and verify at each step.

  1. Verify quorum and health: ceph -s, ceph mon dump.
  2. Stop the target MON: systemctl stop ceph-mon@monX.
  3. Move aside the old store: rename the mon directory instead of deleting it. That gives you a rollback lever.
  4. Recreate the monitor store from the cluster’s current maps using the appropriate tooling for your Ceph/Proxmox version.
  5. Start the MON and verify join via ceph -s and ceph quorum_status.

If your organization expects runbooks, write this down with your actual monitor IDs and paths. Nothing ages worse than “monX” during a real outage.

Common mistakes: symptom → root cause → fix

1) Symptom: “mon down” after a routine reboot

Root cause: Monitor data directory missing/moved, permissions changed, or the node booted with a different hostname/ID mapping than expected.

Fix: Validate /var/lib/ceph/mon/ceph-<id> exists and is owned by ceph:ceph. Confirm the monitor ID matches the service unit name. Restart and verify ports listening.

2) Symptom: “out of quorum” but service is active

Root cause: Network partition, firewall blocking 3300/6789, MTU mismatch, or asymmetric routing between MONs.

Fix: Test connectivity MON-to-MON using nc -vz on 3300 and 6789. Check switch/VLAN/MTU, then firewall rules. Don’t rebuild monitors for a network problem.

3) Symptom: ceph commands hang or time out from some nodes only

Root cause: Client nodes pointing at stale monitor addresses in config, or DNS returns different IPs. Sometimes it’s a dual-stack mess (IPv6 half-configured).

Fix: Confirm mon_host is correct in the config the node actually uses. Standardize on explicit IPs if your DNS isn’t disciplined.

4) Symptom: monitors flap in and out of quorum every few minutes

Root cause: Time drift, overloaded host, intermittent packet loss, or disk latency causing slow commits.

Fix: Check timedatectl across MONs. Inspect dmesg for storage errors. Look for CPU steal or IO wait. Fix the platform, not the symptom.
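
Two quick looks that usually settle “platform problem or Ceph problem” (iostat comes from the sysstat package; use your monitoring instead if you have it):

cr0x@server:~$ dmesg -T | grep -iE 'i/o error|reset|timeout' | tail -n 5
cr0x@server:~$ iostat -x 1 3

Kernel-level I/O errors or resets mean the disk under the monitor is the story. In iostat, sustained high %util and climbing r_await/w_await on the MON’s device explain “slow commits” better than any Ceph log line.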

5) Symptom: after “fixing” a monitor, Ceph shows the wrong FSID or weird auth errors

Root cause: Mixed clusters: stale /etc/ceph/ceph.conf, wrong keyring, or a node reinstalled and auto-generated a new cluster config in a different path.

Fix: Verify FSID from /etc/pve/ceph.conf and ceph fsid match. Ensure your CLI uses the intended config and keyring. Remove stale configs that shadow Proxmox-managed ones.

6) Symptom: monitor won’t start, log says “No space left on device”

Root cause: Root filesystem full (often logs, crash dumps, or backups stored locally).

Fix: Free space safely. Then restart. If the monitor store was impacted, run a consistency check if available and watch for repeated errors.

7) Symptom: MON starts but immediately exits with store-related errors

Root cause: Monitor database corruption, sometimes after power loss or disk errors.

Fix: If quorum exists, rebuild that monitor from quorum and keep the old store as evidence. If no quorum, identify the most intact store among monitors and restore quorum from it.

Three corporate mini-stories from the monitor trenches

Mini-story 1: The incident caused by a wrong assumption

They had a tidy three-node Proxmox cluster: three monitors, a bunch of OSDs, and a belief that “monitors are lightweight, so we can put them wherever.” One monitor lived on a node that also ran a loud CI runner and a metrics stack that wrote like it was paid by the inode.

A kernel update required a reboot window. The reboot went fine. Then, ten minutes later, Ceph health went sideways: one monitor down, another “slow ops,” and the third out of quorum. The on-call assumed it was a Ceph bug because “nothing changed except a reboot.” That assumption wasted the first hour.

Reality: the reboot restarted the CI runner, which immediately started saturating disk on that node. The monitor’s commit latency spiked, elections started timing out, and quorum became a revolving door. Nothing was “wrong” with Ceph. The platform was violating the monitor’s need for boring, predictable IO.

The fix wasn’t exotic. They moved the CI runner off the monitor host, pinned monitor IO to faster storage, and set alerting on root filesystem usage and disk latency. The postmortem’s key lesson was embarrassingly simple: monitors are “lightweight” only if you don’t bully their disks.

Mini-story 2: The optimization that backfired

A different shop wanted to tighten security. They enabled a strict firewall posture on Proxmox and congratulated themselves for “locking down the cluster.” It worked in the sense that nothing obvious broke immediately. The monitors stayed in quorum for weeks.

Then a network maintenance event caused a monitor to reboot. When it came back, it couldn’t rejoin quorum. The service was running, logs looked vaguely networky, and Ceph health screamed about a monitor being out. The team tried the standard playbook: restart, reinitialize, even moved the monitor’s data directory around. Still out.

The backfiring optimization was subtle: the firewall rules allowed established connections but were missing explicit allows for Ceph’s monitor ports on the Ceph network. In steady state, things worked because sessions were already open. After a reboot, the monitor had to establish new connections. It couldn’t. Quorum didn’t fail instantly; it failed when the “optimization” met a real-world state change.

The eventual fix was straightforward: explicit, audited firewall rules permitting MON-to-MON traffic on the correct interfaces, plus a change review item: “If you touch firewalling, validate Ceph ports between monitors.” They kept the security posture, but they stopped relying on lucky connection persistence.

Mini-story 3: The boring but correct practice that saved the day

This team did something deeply unsexy: they kept a small “cluster facts” file in their internal repo. It listed the Ceph FSID, monitor IDs, monitor IPs, the intended public/cluster networks, and the location of monitor data directories. It also listed what “normal” looks like for ceph -s and which host is the usual active manager.

When a monitor host suffered a storage failure and reboot-looped, the on-call didn’t have to guess which monitors existed or which IPs they should have. They verified quorum, confirmed the dead monitor’s identity, and immediately knew they still had majority. No drama. They treated it like replacing a failed RAID member: isolate, replace, reintroduce.

The second-order win: during the replacement, they avoided the most common human error—accidentally reusing an old ceph.conf from a different environment. Their “cluster facts” file included the FSID. They checked it before starting. It matched. They proceeded.

It wasn’t heroic. It was dull. And it saved them from the kind of outage where everyone learns the Ceph CLI by reading log files like tea leaves.

FAQ

1) What’s the minimum number of monitors I should run?

Three. Always three for real clusters. One is a lab. Two is a bad compromise: you can’t tolerate a failure without losing quorum.

2) Why does Ceph insist on an odd number of monitors?

Because majority consensus. With 3, you can lose 1. With 5, you can lose 2. With 4, you still can only lose 1, so you paid for extra complexity without more fault tolerance.

3) Can I run monitors on the same nodes as OSDs?

Yes, commonly. But keep monitor storage predictable: avoid filling root disks, avoid noisy neighbors, and don’t put MON stores on flaky consumer SSDs and then act surprised.

4) “ceph -s” hangs. Does that automatically mean no quorum?

Not automatically, but it’s a strong hint. It can also mean your node can’t reach any monitors due to firewall/DNS/routing. Test TCP 3300/6789 to each monitor IP from the node where it hangs.

5) Is it safe to delete a monitor data directory to “reset” it?

Only when you have quorum and you’re deliberately rebuilding that specific monitor. Even then, move it aside first. Deleting the wrong store during a no-quorum event is how you manufacture a longer outage.

6) How do I know whether it’s time drift versus network issues?

Time drift produces a special kind of chaos: intermittent elections, “out of quorum” with otherwise good connectivity, and logs complaining about timeouts in ways that feel random. Check timedatectl across MONs and fix NTP/chrony first. Network issues show up cleanly with failed port checks and consistent reachability problems.

7) Does Proxmox change how Ceph monitor recovery works?

The fundamentals are the same. Proxmox gives you convenience tooling and stores Ceph config in /etc/pve, which is great until you accidentally use a stale local /etc/ceph config. Your recovery still depends on quorum, network, time, and correct monmap state.

8) What should I alert on to avoid monitor quorum incidents?

Disk usage on monitor hosts (especially root), time sync state, packet loss/latency between monitors, and service health of ceph-mon@*. Also alert on “mon down” and “quorum changed” events; they’re early smoke.

9) If I lost two monitors in a 3-mon cluster, can I “force” quorum with one?

You can sometimes coerce a cluster in emergency modes depending on version and tooling, but it’s dangerous and context-dependent. Operationally: restore a second monitor instead. Majority consensus exists to stop you from committing to the wrong truth.

10) Why does disk-full break monitors so hard?

Because monitors must persist state updates to be authoritative. If they can’t write, they can’t safely participate. That’s not Ceph being fragile; that’s Ceph refusing to lie.

Conclusion and next steps

“mon down/out of quorum” feels like a Ceph-specific disaster, but most recoveries are platform work: fix disk pressure, fix time, fix network reachability, then let monitors do their job. If quorum exists, you repair or rebuild a monitor from the known-good majority. If quorum doesn’t exist, you stop improvising and focus on restoring a majority using the most intact monitor stores.

Next steps that pay rent:

  • Put monitor hosts on boring, reliable storage and keep root filesystems from filling up.
  • Standardize and audit firewall rules for MON ports across the Ceph network.
  • Make time sync non-negotiable: alert when any monitor isn’t synchronized.
  • Write down your cluster facts (FSID, monitor IDs, IPs, networks). It’s not glamorous. It’s effective.