You run umount /mnt/data and Linux answers with the timeless shrug: target is busy. Now you’re stuck between doing the right thing (finding the holder) and doing the fast thing (rebooting and pretending it was “planned maintenance”).
This is a field guide for Debian 13: how to identify the exact process, file, directory, namespace, or kernel consumer keeping a mount pinned, using lsof, fuser, and a few /proc tricks that work when the “friendly” tools lie.
What “device busy” actually means (and what it doesn’t)
When umount fails with “device busy,” the kernel is refusing to detach a mount because something still references it. That “something” could be:
- A process with an open file descriptor under the mount.
- A process with its current working directory (cwd) somewhere in that tree.
- A process with its root directory (chroot or pivot_root) inside it.
- A bind mount, overlay, or mount propagation keeping the subtree pinned.
- A kernel consumer (swapfile, loop device, dm-crypt target, NFS client state).
- A different mount namespace (container, systemd service) still using it.
What it usually is not: “the device is slow” or “the filesystem is corrupted.” Those can cause hangs, I/O errors, or timeouts, but “busy” is about references, not health.
There are two different pain modes:
- Immediate “busy”: the kernel knows there are active references and refuses clean unmount.
- Unmount hangs: you ran umount and it blocks. That’s often network filesystems (NFS) or stuck I/O, not just an open file.
Fixing this reliably means you stop guessing and start asking the kernel: “Who holds this mount?” The fastest path is still lsof and fuser, but you need to know their blind spots and how to go around them.
Fast diagnosis playbook (first/second/third checks)
When you’re on-call, you don’t have time for interpretive dance with umount. Do this.
1) First check: is anything holding files or cwd under that mount?
- Run findmnt to confirm you’re targeting the right mount.
- Run fuser -vm for a fast list of PIDs and access types.
- Run lsof +f -- /mountpoint when you need filenames and users.
2) Second check: is this actually a mount stack / bind / propagation issue?
- Look for submounts: findmnt -R /mountpoint.
- Look for bind mounts: findmnt -o TARGET,SOURCE,FSTYPE,OPTIONS and check for bind, rbind, shared.
- If you unmount the parent while a child is mounted, the parent will be “busy” for a good reason.
3) Third check: is the holder outside your namespace (containers/systemd) or in-kernel?
- Check for container namespaces: look at /proc/&lt;pid&gt;/mountinfo and lsns.
- Check swapfiles and loop devices: swapon --show, losetup -a.
- Check dm/LVM stacking: lsblk -o NAME,TYPE,MOUNTPOINTS,PKNAME and dmsetup ls --tree.
If you only memorize one thing: don’t jump to umount -l until you’ve identified the holder. Lazy unmount is a tool, not a lifestyle choice.
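The three checks above fit in one read-only helper. This is a sketch, not a product: the function names triage_mount and run_if are mine, the tools are the standard ones (findmnt and swapon from util-linux, fuser from psmisc, losetup), and each call is guarded because not every minimal install ships all of them.

```shell
#!/bin/sh
# Sketch of the triage playbook. run_if skips tools that are not installed
# and never lets a failed probe abort the script; nothing here modifies state.
run_if() { command -v "$1" >/dev/null 2>&1 && "$@" || true; }

triage_mount() {
    mnt=${1:?usage: triage_mount MOUNTPOINT}
    run_if findmnt --target "$mnt"   # 1) are you targeting the right mount?
    run_if fuser -vm "$mnt"          # 1) userland holders (fuser prints to stderr)
    run_if findmnt -R "$mnt"         # 2) submounts / binds pinning the parent
    run_if swapon --show             # 3) kernel consumers that never show as open files
    run_if losetup -a
    return 0
}
```

Run it before deciding anything (triage_mount /mnt/data); it deliberately does not unmount or kill anything.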
The core workflow: lsof → fuser → /proc → namespaces
Here’s the workflow I use on Debian systems when “device busy” hits. It scales from “developer left a shell there” to “container runtime pinned a bind mount in another namespace.”
Start with the mount truth: findmnt
mount output is fine, but findmnt is more structured and less misleading. You want: target, source, fstype, and options, and you want to see submounts.
Then ask for holders: fuser and lsof
fuser is blunt and fast: “these PIDs touch this mount.” It can show whether the process has cwd there, is executing a binary from there, or simply has open files.
lsof is slower but richer: it shows file paths, fd numbers, and whether the process is holding a deleted file (a classic).
When tools don’t agree: /proc is the source of truth
Sometimes lsof misses things due to permission issues, namespaces, or an absurd number of open files. Then you go to /proc and read the process links directly (cwd, root, fd), or the kernel mount table (/proc/self/mountinfo).
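If you want the raw record, here is a minimal sketch that pulls a mount’s source and fstype straight out of /proc/self/mountinfo. The function name mount_entry is mine; the field layout is the documented one (optional fields of variable count, then a “-” separator, then fstype and source):

```shell
#!/bin/sh
# Sketch: read the kernel's own mount table instead of trusting wrappers.
# mountinfo fields: ID parent major:minor root mountpoint options [optional...] - fstype source superopts
mount_entry() {
    t=${1:?usage: mount_entry MOUNTPOINT}
    awk -v t="$t" '$5 == t {
        # Optional fields vary in number, so locate the "-" separator first.
        for (i = 7; i <= NF; i++)
            if ($i == "-") { print $5, $(i+2), $(i+1); exit }
    }' /proc/self/mountinfo
}
```

mount_entry /mnt/data prints “mountpoint source fstype”, or nothing at all if the path is not a mount point in this namespace, which is itself an answer.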
Finally: namespace awareness
If the mount is held inside a container or another mount namespace, you can stare at host-level lsof and still “see nothing.” Use lsns and nsenter to look from the holder’s perspective.
One dry truth: Linux is perfectly happy to let two different worlds see two different mount tables. That’s not a bug. That’s Tuesday.
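Even without lsns, the host can enumerate mount namespaces by reading /proc links. A sketch under stated assumptions: the name list_mnt_ns is mine, and unreadable processes are silently skipped, so run it as root for full coverage.

```shell
#!/bin/sh
# Sketch: group PIDs by mount namespace using /proc/<pid>/ns/mnt symlinks.
list_mnt_ns() {
    for p in /proc/[0-9]*; do
        ns=$(readlink "$p/ns/mnt" 2>/dev/null) || continue
        printf '%s %s\n' "$ns" "${p#/proc/}"
    done | sort | awk '{
        count[$1]++
        if (!($1 in first)) first[$1] = $2   # one example PID per namespace
    } END { for (n in count) print n, "nprocs=" count[n], "example_pid=" first[n] }'
}
```

If the output shows more than one mnt:[...] value, host-level lsof is not telling you the whole story.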
Practical tasks (commands, output meaning, decisions)
These are real tasks I run in production, in roughly the order I reach for them. Each includes: the command, what the output means, and the decision you make next.
Task 1: Confirm the exact mount and its source
cr0x@server:~$ findmnt /mnt/data
TARGET SOURCE FSTYPE OPTIONS
/mnt/data /dev/sdb1 ext4 rw,relatime
What it means: You’re unmounting /mnt/data, backed by /dev/sdb1. If SOURCE is something like server:/export (NFS) or /dev/mapper/vg-lv (LVM), you already know which special cases might apply.
Decision: If this isn’t the mount you intended (common with trailing slashes or nested mounts), stop and re-target. Otherwise continue.
Task 2: Check for submounts that make the parent “busy”
cr0x@server:~$ findmnt -R /mnt/data
TARGET SOURCE FSTYPE OPTIONS
/mnt/data /dev/sdb1 ext4 rw,relatime
/mnt/data/cache tmpfs tmpfs rw,nosuid,nodev
/mnt/data/containers overlay overlay rw,relatime,lowerdir=...
What it means: You can’t unmount /mnt/data while /mnt/data/cache or /mnt/data/containers is still mounted. Parent mounts don’t magically evict children.
Decision: Unmount children first (deepest path first), or use umount -R with intent and care.
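The “deepest first” ordering can be computed straight from the kernel table. A hedged sketch (the function name is mine; findmnt -R is the friendlier equivalent):

```shell
#!/bin/sh
# Sketch: list a mount and its submounts, deepest path first, from /proc/self/mountinfo.
# Longer paths sort first, which is exactly the order umount wants them in.
submounts_deepest_first() {
    t=${1:?usage: submounts_deepest_first MOUNTPOINT}
    awk -v t="$t" '$5 == t || index($5, t "/") == 1 { print length($5), $5 }' \
        /proc/self/mountinfo | sort -rn | cut -d' ' -f2-
}
```

For /mnt/data this prints /mnt/data/containers before /mnt/data; feed that order to umount, one path at a time.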
Task 3: Quick “who touches this mount” with fuser
cr0x@server:~$ sudo fuser -vm /mnt/data
USER PID ACCESS COMMAND
/mnt/data: root 1321 ..c.. bash
www-data 1874 ..c.. nginx
postgres 2440 ..c.. postgres
What it means: These processes have something under that mount as cwd or an open file. In fuser’s ACCESS column, c means current directory, e a running executable, f an open file, r a root directory, and m an mmapped file or shared library. That’s usually enough to explain “busy.”
Decision: Decide whether to stop the service cleanly, move the shell, or kill only the specific holder PID(s).
Task 4: Get filenames and file descriptors with lsof
cr0x@server:~$ sudo lsof +f -- /mnt/data | head -n 8
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 1874 www-data 10u REG 8,17 4096 212 /mnt/data/www/access.log
postgres 2440 postgres 12u REG 8,17 1048576 877 /mnt/data/pg_wal/0000000100000000000000A1
What it means: Now you have the concrete paths. This is the difference between “something is using it” and “nginx is still logging there.”
Decision: If it’s a service, stop it. If it’s a one-off process, kill it. If it’s a log file, rotate/reload properly.
Task 5: Catch the classic “shell sitting in the directory”
cr0x@server:~$ sudo ls -l /proc/1321/cwd
lrwxrwxrwx 1 root root 0 Dec 29 11:05 /proc/1321/cwd -> /mnt/data
What it means: PID 1321’s working directory is inside the mount. That alone pins the mount.
Decision: If it’s your shell, cd /. If it’s someone else’s session, coordinate or terminate it.
Task 6: Find processes whose root is inside the mount (chroot/pivot_root)
cr0x@server:~$ sudo ls -l /proc/2890/root
lrwxrwxrwx 1 root root 0 Dec 29 11:06 /proc/2890/root -> /mnt/data/chroots/buildenv
What it means: That process is rooted inside the mount. This is common with build systems, recovery tools, and some container setups.
Decision: Stop that process cleanly if possible. Killing it may leave children; be ready to chase process groups.
Task 7: When lsof is slow or incomplete, scan /proc for cwd/root offenders
cr0x@server:~$ sudo bash -lc 'for p in /proc/[0-9]*; do
pid=${p#/proc/}
cwd=$(readlink -f "$p/cwd" 2>/dev/null || true)
root=$(readlink -f "$p/root" 2>/dev/null || true)
if [[ "$cwd" == /mnt/data* || "$root" == /mnt/data* ]]; then
printf "%s cwd=%s root=%s\n" "$pid" "$cwd" "$root"
fi
done | head'
1321 cwd=/mnt/data root=/
2890 cwd=/ root=/mnt/data/chroots/buildenv
What it means: You’ve quickly enumerated the obvious “pins” without relying on lsof’s file table walk.
Decision: Investigate each PID with ps and decide whether to stop/kill or relocate.
Task 8: Confirm what the process actually is (and whether it’s expected)
cr0x@server:~$ ps -p 1874 -o pid,user,comm,args
PID USER COMMAND COMMAND
1874 www-data nginx nginx: worker process
What it means: No mystery: it’s nginx. “Busy” is legitimate.
Decision: Stop/reload nginx, or repoint logs/content off the mount before unmounting.
Task 9: Check if systemd created or manages the mount (and can re-mount it)
cr0x@server:~$ systemctl status mnt-data.mount
● mnt-data.mount - /mnt/data
Loaded: loaded (/etc/fstab; generated)
Active: active (mounted) since Mon 2025-12-29 10:02:10 UTC; 1h 4min ago
Where: /mnt/data
What: /dev/sdb1
What it means: systemd knows this mount. If it’s an automount pair, it may be re-mounted as soon as something touches the path.
Decision: If you’re trying to keep it unmounted, stop/disable the relevant .automount or remove the fstab entry before proceeding.
Task 10: Detect systemd automount that re-triggers “busy” immediately
cr0x@server:~$ systemctl status mnt-data.automount
● mnt-data.automount - Automount /mnt/data
Loaded: loaded (/etc/fstab; generated)
Active: active (waiting) since Mon 2025-12-29 10:01:59 UTC; 1h 4min ago
Where: /mnt/data
What it means: Even after you unmount, the next access to /mnt/data may re-mount it. Also, some “probe” processes (indexers, monitoring, shell completion) can keep poking it.
Decision: For maintenance windows, explicitly stop the automount unit: systemctl stop mnt-data.automount, then unmount.
Task 11: Confirm whether a mount is shared/slave (propagation can surprise you)
cr0x@server:~$ findmnt -o TARGET,PROPAGATION /mnt/data
TARGET PROPAGATION
/mnt/data shared
What it means: Mount propagation is in play. A “shared” mount can propagate submounts across namespaces. This is common with container runtimes.
Decision: If you’re diagnosing a container host, assume another namespace may hold it. Move to namespace inspection.
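findmnt reads the PROPAGATION column from the optional fields of /proc/self/mountinfo; a sketch that does the same directly (function name is mine; shared:N, master:N, and unbindable sit between the mount options and the “-” separator):

```shell
#!/bin/sh
# Sketch: print a mount's propagation flags straight from mountinfo.
propagation_of() {
    t=${1:?usage: propagation_of MOUNTPOINT}
    awk -v t="$t" '$5 == t {
        out = ""
        for (i = 7; i <= NF && $i != "-"; i++)   # optional fields end at "-"
            out = out (out == "" ? "" : " ") $i
        print (out == "" ? "private" : out)
        exit
    }' /proc/self/mountinfo
}
```

propagation_of /mnt/data printing shared:1 is your cue to start checking other namespaces.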
Task 12: Identify mount namespaces and find the likely holder
cr0x@server:~$ sudo lsns -t mnt | head
NS TYPE NPROCS PID USER COMMAND
4026531840 mnt 168 1 root /sbin/init
4026532501 mnt 15 6123 root /usr/bin/containerd
4026532710 mnt 3 9012 root /usr/sbin/cron
What it means: You have multiple mount namespaces. If a mount is busy in one namespace, it may not show up how you expect in another.
Decision: Check the likely namespace owners (containerd, dockerd, podman, systemd services) and inspect from within that namespace.
Task 13: Enter the mount namespace of a suspected PID and run findmnt there
cr0x@server:~$ sudo nsenter -t 6123 -m -- findmnt /mnt/data
TARGET SOURCE FSTYPE OPTIONS
/mnt/data /dev/sdb1 ext4 rw,relatime
What it means: The mount exists inside containerd’s namespace too. There may be a container bind-mounting it, keeping it alive.
Decision: Use nsenter to run fuser/lsof inside that namespace, or stop the responsible container/service.
Task 14: When a deleted-but-open file pins space (and sometimes pins expectations)
cr0x@server:~$ sudo lsof +L1 -- /mnt/data | head
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
java 7777 app 25u REG 8,17 5242880 0 901 /mnt/data/app/logs/app.log (deleted)
What it means: The file was deleted but is still open. This doesn’t always block unmount by itself, but it’s often a clue: a process is still actively using the filesystem.
Decision: Restart/reload that process or force it to close logs (logrotate with proper postrotate). Don’t just delete files and hope.
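You can find the same deleted-but-open holders without lsof by reading fd symlinks, a minimal sketch (root sees everyone; an unprivileged user only sees their own processes):

```shell
#!/bin/sh
# Sketch: list processes holding deleted-but-open files via /proc/<pid>/fd links.
# The kernel appends " (deleted)" to the link target of an unlinked file.
deleted_fds() {
    for fd in /proc/[0-9]*/fd/*; do
        link=$(readlink "$fd" 2>/dev/null) || continue   # fd may vanish mid-scan
        case $link in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$link" ;;
        esac
    done
}
```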
Task 15: Check swapfiles (a kernel-level “busy” you can’t lsof away)
cr0x@server:~$ sudo swapon --show
NAME TYPE SIZE USED PRIO
/mnt/data/swapfile file 8G 2G -2
What it means: Swap is on that filesystem. The kernel will not let you unmount the filesystem while it’s providing swap.
Decision: swapoff /mnt/data/swapfile (and ensure there’s enough RAM or alternative swap), then reattempt unmount.
Task 16: Find loop devices backed by files on the mount
cr0x@server:~$ sudo losetup -a
/dev/loop2: [0801]:123456 (/mnt/data/images/debian.qcow2)
What it means: A loop device is using a file on the mount. That’s a kernel reference; unmount will fail until the loop is detached.
Decision: Stop whatever uses the loop device (VM tooling, mount of the loop), then losetup -d /dev/loop2.
Task 17: See device-mapper/LVM stacking that keeps a block device busy
cr0x@server:~$ lsblk -o NAME,TYPE,SIZE,MOUNTPOINTS,PKNAME
sdb disk 500G
└─sdb1 part 500G /mnt/data
What it means: Simple case: direct partition. If you see dm-* nodes or LVM logical volumes, “busy” might be due to holders above or below the filesystem layer.
Decision: If LVM/dm-crypt is involved, verify you’re unmounting the filesystem first, then deactivating LVs/closing luks only after it’s unmounted.
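When lsblk isn’t enough, sysfs exposes the same stacking: every block device lists the dm/md devices built on top of it in a holders/ directory. A sketch (prints nothing on a flat layout, which is itself an answer):

```shell
#!/bin/sh
# Sketch: show "lower device <- upper device" links via /sys/block/*/holders.
block_holders() {
    for h in /sys/block/*/holders/* /sys/block/*/*/holders/*; do
        [ -e "$h" ] || continue              # glob may match nothing at all
        lower=${h%/holders/*}
        printf '%s <- %s\n' "${lower##*/}" "${h##*/}"
    done
    return 0
}
```

If sdb1 still has a holder here after unmounting, that holder (dm-crypt mapping, LV, md member) is what keeps the block device busy.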
Task 18: Check open file descriptors directly (when you distrust lsof)
cr0x@server:~$ sudo ls -l /proc/1874/fd | head
total 0
lr-x------ 1 www-data www-data 64 Dec 29 11:10 0 -> /dev/null
l-wx------ 1 www-data www-data 64 Dec 29 11:10 10 -> /mnt/data/www/access.log
lr-x------ 1 www-data www-data 64 Dec 29 11:10 11 -> /mnt/data/www/index.html
What it means: Concrete proof: nginx has file descriptors to that mount. No debate.
Decision: Stop/reload nginx, or change config so it stops using those paths.
Task 19: Attempt a clean unmount with context
cr0x@server:~$ sudo umount /mnt/data
umount: /mnt/data: target is busy.
What it means: Still pinned. Now you don’t flail; you iterate based on holder evidence.
Decision: Resolve holders (stop services, detach loops, swapoff, unmount submounts) and retry.
Task 20: Use a lazy unmount when you’ve decided it’s acceptable
cr0x@server:~$ sudo umount -l /mnt/data
What it means: The mount is detached from the namespace now, but actual cleanup is deferred until references go away. Processes may continue to access the filesystem via existing file descriptors.
Decision: Only do this when you can tolerate “zombie access” and you’re sure the underlying device won’t disappear under active I/O. If you’re about to detach storage, lazy unmount can turn “maintenance” into “postmortem.”
Task 21: Force unmount (last resort, mostly for network mounts)
cr0x@server:~$ sudo umount -f /mnt/data
umount: /mnt/data: umount failed: Operation not permitted
What it means: On Linux, -f is meaningful primarily for certain network filesystems and specific situations; for local ext4/xfs, it often won’t behave like you hope.
Decision: Don’t treat -f as a universal crowbar. Fix the holder, or use lazy unmount with eyes open.
Task 22: Verify it’s really gone (and not re-mounted by automount)
cr0x@server:~$ findmnt /mnt/data
What it means: No output: it’s not mounted in your current namespace.
Decision: If it returns immediately after unmount, something is touching the path and triggering automount. Stop the automount unit or fix the consumer.
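A tiny check worth putting at the end of a maintenance script, again reading the kernel table directly so an automount re-trigger shows up immediately (the function name is mine):

```shell
#!/bin/sh
# Sketch: exit 0 if TARGET is currently a mount point in this namespace, else 1.
is_mounted() {
    awk -v t="${1:?usage: is_mounted MOUNTPOINT}" \
        '$5 == t { found = 1 } END { exit !found }' /proc/self/mountinfo
}
```

Usage: sleep a few seconds after umount, then `if is_mounted /mnt/data; then echo "re-mounted: suspect automount"; fi`.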
Joke 1: “Device busy” is the kernel’s way of saying it has plans tonight and you’re not on the calendar.
Special cases: systemd, NFS, containers, bind mounts, loop/LVM/dm-crypt
systemd mounts and automounts: your filesystem has a manager now
On Debian 13, systemd is usually in the loop even if you never asked it to be. Entries in /etc/fstab become generated units. That’s good for consistent boot behavior, but it can surprise you during maintenance:
- Automount makes unmounting feel flaky: you unmount, then a random process stats the directory, and it mounts again.
- Unit dependencies can keep mounts active because a service Requires= them.
When you need it to stay down, stop the automount first, and consider masking the mount temporarily if a misbehaving service keeps reviving it.
NFS and “umount hangs” versus “device busy”
NFS is a different kind of pain. You might see “device busy,” but more commonly umount just waits… and waits… because the kernel is trying to flush or resolve outstanding operations.
Workflow adjustments for NFS:
- If umount hangs, check network reachability and server health first. “Busy” might be secondary.
- Use umount -l more often for NFS when a server is gone and you need to recover the client, but understand the trade.
- Stale file handles and hard mounts can make ordinary process listings misleading.
Containers: the holder is in a different mount namespace
Modern container stacks love bind mounts and overlayfs. They also love separate mount namespaces. This is how you get the classic scenario:
- Host: “Nothing is using /mnt/data.”
- Kernel: “I disagree.”
- Reality: a container has a bind mount of /mnt/data inside its namespace, and some process there is holding a file.
The fix is boring: identify the namespace, enter it, find the holder, stop the container/service. Trying to outsmart namespaces with host-only tools is how you burn an hour.
Bind mounts and mount propagation: the invisible glue
Bind mounts can create “busy” situations that look irrational if you only look at the obvious target. The mount tree matters. So does propagation:
- A bind mount of a directory inside /mnt/data to somewhere else means processes can hold the filesystem without ever touching /mnt/data paths directly.
- Shared mounts can propagate submounts and keep references alive across namespaces.
Use findmnt -R to visualize the subtree. Use findmnt -o TARGET,SOURCE,OPTIONS,PROPAGATION to see why it behaves like a hydra.
Loop devices, swapfiles, dm-crypt, LVM: kernel consumers don’t show up as “a file open”
If you’re dealing with images, encrypted volumes, or “temporary” swapfiles, you may have kernel-level references:
- Swapfile prevents unmount until swapoff.
- Loop device backed by a file prevents unmount until detached.
- Device-mapper stacks can keep block devices busy even after the filesystem unmounts, which matters if you’re trying to remove the underlying device.
Common mistakes: symptom → root cause → fix
These are the patterns I see repeated in tickets, chat rooms, and incident timelines. The symptoms are familiar; the root causes are usually mundane.
1) Symptom: “umount says busy but lsof shows nothing”
Root cause: Another mount namespace holds it; or you ran lsof without sufficient privileges; or the “busy” reference is a kernel consumer (swap/loop) rather than a userland open file.
Fix: Run sudo. Check namespaces with lsns -t mnt and inspect using nsenter -t PID -m. Check swapon --show and losetup -a.
2) Symptom: Parent mount won’t unmount, but you “stopped all apps”
Root cause: There’s a submount somewhere under the parent (tmpfs, overlay, bind mount, container mount).
Fix: findmnt -R /mnt/data, unmount deepest children first. If you’re doing a teardown, consider umount -R but understand it’s a recursive action with consequences.
3) Symptom: You unmounted successfully, but it mounts again immediately
Root cause: systemd automount, autofs, or a service that mounts on-demand. Sometimes it’s also your shell or a monitoring agent touching the directory.
Fix: Stop the automount unit (systemctl stop mnt-data.automount). Confirm with systemctl status. Temporarily mask if necessary for a maintenance window. Move probes off the path.
4) Symptom: “device busy” only when trying to detach the block device (not when unmounting)
Root cause: You may have unmounted the filesystem, but the block device is still held by dm-crypt, LVM, multipath, MD RAID, or loop devices.
Fix: Use lsblk to map holders; check dmsetup ls --tree; close luks mappings or deactivate LVs after the filesystem is cleanly unmounted.
5) Symptom: “umount hangs” rather than failing quickly
Root cause: Usually network filesystem problems (NFS server unreachable, hard mount), or blocked I/O flushing. Not the same problem as “busy.”
Fix: Check dmesg -T for stuck NFS or I/O errors. Verify network health. Consider lazy unmount for NFS as a recovery tactic if you accept the trade-offs.
6) Symptom: You killed the PID, but “busy” persists
Root cause: You killed the wrong process (thread leader vs worker), or children inherited the cwd, or a service restarted instantly via systemd, or a different namespace holder remains.
Fix: Re-run fuser -vm. Inspect process trees (ps --ppid or pstree if installed). Stop the systemd unit rather than whack-a-mole PIDs.
7) Symptom: “busy” after a chroot-based maintenance task
Root cause: The chroot shell or some helper process kept root inside the mount.
Fix: Find offenders via /proc/*/root checks; exit the chroot; kill any lingering processes.
8) Symptom: You used umount -l, storage got unplugged, apps exploded later
Root cause: Lazy unmount allowed continued access through existing FDs. When the underlying device vanished, those FDs started erroring in exciting ways.
Fix: Don’t lazy-unmount a filesystem you’re about to remove unless you’re also terminating the holders or you’re in a controlled shutdown path.
Checklists / step-by-step plan (safe unstick)
This is the “don’t improvise under pressure” plan. Use it for local filesystems, and adapt for NFS and container hosts.
Checklist A: Standard local filesystem unmount (ext4/xfs/btrfs)
- Confirm the mount and submounts: findmnt /mnt/data, then findmnt -R /mnt/data.
Decision: If there are submounts, schedule unmount order (deepest first).
- Find userland holders: sudo fuser -vm /mnt/data, then sudo lsof +f -- /mnt/data.
Decision: Stop services cleanly; don’t kill blindly.
- Check cwd/root pins: scan /proc/*/cwd and /proc/*/root for paths under the mount.
Decision: Move shells, exit chroots, stop build jobs.
- Check kernel consumers: swapon --show, losetup -a.
Decision: swapoff, detach loops, then retry.
- Unmount children, then parent: sudo umount /mnt/data/cache (and the other children), then sudo umount /mnt/data.
Decision: If still busy, re-run fuser; something is still alive.
- Only then consider lazy unmount: sudo umount -l /mnt/data.
Decision: If underlying storage will be removed, don’t leave holders running.
Checklist B: Container host unmount (namespaces expected)
- Identify mount propagation and bind mounts: findmnt -o TARGET,SOURCE,OPTIONS,PROPAGATION /mnt/data, then findmnt -R /mnt/data.
Decision: If shared, assume other namespaces participate.
- Find namespace owners: sudo lsns -t mnt.
Decision: Target container runtime PID(s) for nsenter.
- Inspect from inside the suspected namespace: sudo nsenter -t <pid> -m -- fuser -vm /mnt/data.
Decision: Stop the specific container or runtime workload holding the mount.
- Stop orchestrated services cleanly:
Decision: Stop via systemd or orchestrator, not SIGKILL roulette.
Checklist C: NFS client unmount (when timeouts and hangs show up)
- Check whether this is “busy” or “hung”: does umount fail immediately, or block?
Decision: If it blocks, investigate server/network health first.
- Find processes: sudo fuser -vm /mnt/nfs.
Decision: Stop holders; for stuck clients, consider lazy unmount.
- Recovery tactic: sudo umount -l /mnt/nfs.
Decision: Use when server is gone and you need to restore local operability; clean up later.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
The team was doing a routine storage migration on a Debian fleet. The plan was clean: drain traffic, stop the app, unmount the old filesystem, flip to the new one, start the app. They had rehearsed it on staging and it was boring. That should have been the clue that production would find a new way to be interesting.
On the first host, umount returned “target is busy.” The on-call assumed it was “some leftover process,” ran lsof on the mount, saw nothing, and decided it must be safe to lazy unmount. The mount disappeared, the script continued, and the physical volume was detached a few minutes later.
About ten minutes after that, the app started throwing sporadic I/O errors. Not on every request. Not consistently. Only on background jobs that used a long-lived worker process. The main request path looked fine, because those processes restarted more often and “forgot” old file descriptors.
The wrong assumption was subtle: “if lsof shows nothing, nothing is using it.” In reality, the holder was in a different mount namespace created by a helper service that launched workers in an isolated context. Host-level lsof wasn’t seeing the full picture they thought they were looking at.
The fix was straightforward once they stopped guessing: identify mount namespaces with lsns, nsenter into the worker’s namespace, run fuser, stop the right unit cleanly, then unmount normally. The postmortem action item wasn’t “don’t use lazy unmount.” It was “don’t use lazy unmount until you’ve proven who’s holding it, including other namespaces.”
Mini-story 2: The optimization that backfired
A different org had a habit: when maintenance windows were tight, they’d preemptively use automount in /etc/fstab for bulky datasets. The idea was reasonable: don’t mount huge volumes at boot; mount on demand; save boot time; reduce “blast radius.” They even had monitoring around mount latency.
Then a security scanning agent landed. It crawled filesystem trees looking for interesting file types. It didn’t know that “touching a path” could mount a dataset, and it didn’t care. Suddenly, datasets were being auto-mounted during maintenance prep, right after the team unmounted them “successfully.”
The unmount step started flapping: unmount worked, a second later it was mounted again, and umount returned busy because the scanner now had files open on it. The team responded the way humans respond when annoyed: kill the scanner process. It restarted. More killing. More restarting. Eventually someone got clever and masked the automount unit.
The optimization had backfired because automount turns “a directory exists” into “a device should be present.” That’s convenient until you want an unmounted state to persist. The boring fix was to explicitly stop/disable the automount during maintenance and to exclude those paths from scanners. The cultural fix was better: treat automount as a feature that needs operational controls, not a default for everything that’s large and inconvenient.
Joke 2: Automount is great until it becomes a very enthusiastic intern who keeps plugging things back in because “it looked unplugged.”
Mini-story 3: The boring but correct practice that saved the day
There was a storage-heavy service with strict change control. The SRE team had a playbook that required: identify holders with two independent methods, stop services by unit name (not PID), and verify mounts are gone with findmnt in the correct namespace. Everyone complained it was too slow. It felt like bureaucracy disguised as rigor.
During a hardware failure, they needed to evacuate data and detach a failing disk. The filesystem wouldn’t unmount. The junior on-call started to panic and reached for umount -l, because that’s what they had seen on the internet. The lead asked for the playbook steps. No debates. Just run them.
fuser pointed to a backup agent process. lsof showed it was writing to a temporary file on the mount. Stopping the agent via systemd didn’t stop it, because it was spawned by a timer unit that immediately restarted a new instance. The playbook’s second method caught that: systemctl list-timers and unit status showed the culprit.
They stopped the timer, stopped the service, verified no holders, unmounted cleanly, then detached the disk with no further surprise. Nobody cheered. That’s the point. In incident response, the win condition is “boring,” not “clever.”
Interesting facts and historical context
- 1) “EBUSY” is old Unix DNA. The “device or resource busy” error goes back decades; it’s a kernel-level refusal to detach something still referenced.
- 2) Linux mount namespaces changed the meaning of “who is using it.” Since namespaces, a mount can be busy “somewhere else” even if your shell can’t see it.
- 3) lsof isn’t a kernel syscall; it’s an investigator. It walks process tables and file descriptors to infer open files, which means permissions and scale matter.
- 4) fuser predates containers but still wins on speed. It’s often faster for “give me PIDs,” because it asks the kernel differently than lsof’s full-path mapping.
- 5) Lazy unmount exists because reality is messy. umount -l was created for cases where you need to detach a path even if references remain, commonly in recovery.
- 6) “Busy” is not only about files. A process holding cwd/root in the mount doesn’t need any open files; directory references are enough.
- 7) Swapfiles are a classic footgun. They look like “just a file” until you learn the kernel treats them as a paging device and refuses unmount.
- 8) Loop devices turn a file into a block device. That’s why loop-backed images keep filesystems busy even when no userland process shows a file open in the obvious way.
- 9) systemd generating mount units from fstab changed operational expectations. Mounts are now “managed objects” with dependencies, timeouts, and automount behavior.
FAQ
Q1: What’s the fastest single command to find what blocks umount?
A: Start with sudo fuser -vm /mnt/data. It’s fast, readable, and tells you which PIDs are involved. Follow up with lsof for filenames.
Q2: Why does lsof sometimes take forever on big systems?
A: It can be proportional to “processes × file descriptors,” and it often resolves paths. On hosts with lots of containers and thousands of FDs per process, it’s doing real work. Use fuser first, or targeted lsof by PID once you have suspects.
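A cheap pre-filter before a full lsof run: count fds per PID straight from /proc and only inspect the heavy hitters with a targeted lsof -p. Both helper names here are mine:

```shell
#!/bin/sh
# Sketch: rank processes by open fd count so targeted "lsof -p PID" comes later.
fd_count() { ls "/proc/${1:?pid}/fd" 2>/dev/null | wc -l; }

top_fd_holders() {
    for p in /proc/[0-9]*; do
        pid=${p#/proc/}
        printf '%s %s\n' "$(fd_count "$pid")" "$pid"   # "count pid" lines
    done | sort -rn | head -n "${1:-10}"
}
```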
Q3: Why does umount -f not work for local filesystems?
A: On Linux, “force” unmount is mostly meaningful for certain network filesystems or special cases. For local ext4/xfs, the kernel won’t just tear it away if it’s busy, because that creates consistency problems.
Q4: Is umount -l safe?
A: It can be safe in controlled scenarios: you need the mount point detached and you know the underlying storage remains accessible until holders exit. It’s unsafe if you plan to remove the storage while holders are still running.
Q5: How do I know if a container is holding my mount?
A: Check mount namespaces with sudo lsns -t mnt, identify likely runtime PIDs, then nsenter -t PID -m and run findmnt/fuser there. If you find holders, stop the container/service cleanly.
Q6: Can a “current directory” really block unmount even with no files open?
A: Yes. A process’s cwd is a reference to an inode in that filesystem. The kernel won’t detach a filesystem that still has active references.
Q7: Why does it re-mount right after I unmount?
A: Automount (systemd or autofs) or a service touching the path. Disable/stop the automount unit during maintenance, and figure out who is probing the directory.
Q8: What if fuser shows PIDs but killing them doesn’t help?
A: You may be killing workers while a supervisor restarts them (systemd, cron, timers), or the holder is in another namespace. Stop the unit, not the symptom, and re-check namespaces.
Q9: What’s a reliable way to prove it’s unmounted?
A: Use findmnt /mnt/data (no output means unmounted in that namespace). Also check findmnt -R /mnt/data for lingering submounts. If namespaces are involved, verify inside the relevant namespace too.
Q10: Any operational philosophy here, beyond commands?
A: Yes: identify the holder before choosing the unstick method. As systems thinker John Allspaw has argued (paraphrasing here), resilience comes from understanding how work is actually done, not how you hoped it was done.
Next steps you should actually do
When the page quiets down and you’re no longer staring at “device busy,” do the small fixes that prevent the next round of pain:
- Encode the workflow in a runbook: findmnt → findmnt -R → fuser → lsof → namespaces → kernel consumers.
- Stop killing PIDs as a first response. Prefer stopping the owning systemd unit, container, or timer. Processes grow back.
- Audit for swapfiles and loop-backed images under removable mounts. If you must use them, document them and create a teardown procedure.
- For automounts, define a maintenance mode: a known command sequence that stops automount and suppresses “helpful” probes.
- Train your team on namespaces. It’s not advanced anymore; it’s normal Linux. “I can’t see it” is not the same as “it’s not there.”