The symptom is boring. “TASK ERROR: startup for container ‘101’ failed.” The impact is not. You’re on-call, a service is down, and the error looks like a ransom note written by three subsystems that don’t like each other: LXC, cgroups, and AppArmor.
If you treat those messages as noise, you’ll thrash—toggling “nesting,” rebooting nodes, and praying to the kernel. If you learn how to read them, you’ll usually fix the issue in minutes, and you’ll know whether you’re dealing with a host-level control plane failure (cgroups) or a policy failure (AppArmor) or a plain old filesystem/ID map faceplant.
A mental model: what must succeed for an LXC to start
LXC startup on Proxmox is a chain of dependencies. When it fails, the error is usually accurate—it’s just not contextual. You need the chain in your head so you can locate the broken link.
The chain, in plain terms
- Proxmox calls LXC via lxc-start (wrapped by pve-container tooling), creating a container process tree.
- Kernel namespaces get set up: mount, PID, network, user (for unprivileged), etc.
- cgroups get created and controllers attached (CPU, memory, pids, io). If cgroups are mis-mounted, missing controllers, or incompatible with config expectations, startup stops.
- Security policy gets applied: AppArmor profile loaded/applied; seccomp filters possibly attached.
- Root filesystem is mounted: bind mounts (like /dev, /proc, /sys), storage volume attached, optional mounts from mp0, etc.
- Init inside the container starts (often systemd). If systemd can’t mount cgroups or gets denied, it may exit immediately—making it look like “container won’t start.”
The practical takeaway: cgroups errors are often host-level (node config, kernel boot flags, mount layout), while AppArmor errors are policy-level (profile denies a specific action, commonly after enabling nesting, FUSE, CIFS, or privileged device access).
One quote worth keeping on your monitor: “Hope is not a strategy.”
It’s a paraphrase often attributed to operations leaders; either way, the point stands: stop guessing and follow the evidence trail.
Short joke #1: If your container fails to start because of cgroups, congratulations—you’ve discovered a problem that can’t be fixed with more YAML.
Fast diagnosis playbook (check first/second/third)
This is the “I have five minutes” path. It’s not comprehensive; it’s designed to find the bottleneck quickly and decide whether you can fix in-place or need to drain the node.
First: confirm it’s not a simple config or storage faceplant
- Check task log details from Proxmox (it usually includes the first hard failure).
- Check container config for suspicious mounts, nesting, and unprivileged settings.
- Confirm the rootfs volume exists and is mountable on the host.
Second: decide whether it smells like cgroups or AppArmor
- If you see strings like cgroup, cgroup2, controllers, cgroupfs, or failed to mount /sys/fs/cgroup: go cgroups-first.
- If you see apparmor="DENIED", profile=, operation=, audit: go AppArmor-first.
- If it says permission denied but without AppArmor audit lines, suspect ID maps, ownership on mountpoints, or storage permissions.
Third: isolate whether it’s one container or the node
- Start a known-good tiny container. If that also fails, it’s the node.
- Check whether other containers are starting; if not, stop poking individual configs and fix the host substrate.
- Look for recent changes: kernel upgrade, Proxmox upgrade, toggled nesting, changed AppArmor mode, changed cgroups mode, new mountpoints.
Decisive rule: if it’s a node-level cgroups mount/controller issue, you don’t “fix” it per-container. You fix the node or evacuate workloads. Anything else is busywork with better keyboard sounds.
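If you don’t keep a tiny smoke-test container around, it’s worth creating one now. A minimal sketch, assuming a free VMID 9000 and a Debian template already downloaded to local storage (both the VMID and the template filename are placeholders):
cr0x@server:~$ pct create 9000 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname smoketest --memory 256 --rootfs local-lvm:2 --unprivileged 1
cr0x@server:~$ pct start 9000 && pct status 9000
If this trivial container fails with the same cgroup error as your real workload, you are fixing the node, not the container.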
How to read cgroups and AppArmor errors without guessing
Where the truth is written
Proxmox’s UI shows a summarized error. The actionable details are usually in one of these places:
- Task log (what the UI shows, but sometimes truncated)
- journalctl on the host (systemd and kernel messages)
- /var/log/syslog on Debian-based hosts
- Kernel ring buffer (dmesg)
- LXC’s own logging (higher verbosity via debug start)
- Audit logs for AppArmor denials (often visible via journal)
Reading cgroups errors: the keywords matter
cgroups errors tend to fall into a few buckets:
- Mount problems: LXC can’t mount or access /sys/fs/cgroup (common with cgroup v2 expectations vs host setup).
- Controller availability: required controllers (like memory or pids) aren’t available or not delegated properly.
- Permission/ownership problems: delegation issues when using systemd-managed cgroups and unprivileged containers.
- Mixed hierarchy confusion: v1 vs v2 vs hybrid mismatches, sometimes after upgrades.
When you see an error like Failed to mount "cgroup" or cgroup2: No such file or directory, don’t interpret it as “the container’s broken.” Interpret it as: the host’s cgroup filesystem layout doesn’t match what LXC expects for that container’s configuration and that host’s init system.
Reading AppArmor errors: the denial line is a mini incident report
An AppArmor denial line includes: timestamp, profile, operation, requested resource, and sometimes the “name” (path). It’s usually enough to decide if you should:
- Adjust container features (nesting, keyctl, fuse, mount types)
- Fix a file path permission/labeling issue on the host
- Switch AppArmor profile mode (complain vs enforce) temporarily while you collect proof
- Stop doing the risky thing (my preferred option in production)
AppArmor is not “randomly breaking containers.” It’s doing exactly what you asked: block actions that aren’t in policy. The annoying part is that we often didn’t realize we asked.
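To make the “mini incident report” concrete, here is an illustrative denial line (values invented for orientation) broken into the fields you actually read:
apparmor="DENIED" operation="mount" profile="lxc-container-default-cgns" name="/sys/kernel/security/" pid=18290 comm="(mount)"
- operation: what was attempted (here, a mount)
- profile: which policy made the decision (here, the stock Proxmox LXC profile with cgroup namespacing)
- name: the path or resource the operation targeted
- pid / comm: which process inside the container tried it
Read it as a sentence: this process, confined by this profile, tried this operation on this path, and policy said no.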
Practical tasks: commands, expected output, and decisions
These are the moves I actually use under pressure. Each one includes the decision you should make from the output. Do them in order if you’re lost; cherry-pick if you already have a strong lead.
Task 1: Pull the exact failure from Proxmox task output (CLI)
cr0x@server:~$ pct start 101
starting container 101
lxc-start: 101: conf.c: run_buffer: 312 Script exited with status 1
lxc-start: 101: start.c: __lxc_start: 2107 Failed to initialize container "101"
TASK ERROR: startup for container '101' failed
What it means: LXC itself failed; Proxmox is just reporting. The message is generic; you need deeper logs.
Decision: Immediately go to journal logs and AppArmor/cgroups evidence (next tasks). Don’t edit random config yet.
Task 2: Get container config and look for “sharp edges”
cr0x@server:~$ cat /etc/pve/lxc/101.conf
arch: amd64
cores: 2
hostname: app-101
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,ip=dhcp,type=veth
ostype: debian
rootfs: local-lvm:vm-101-disk-0,size=8G
unprivileged: 1
features: nesting=1,keyctl=1
mp0: /srv/shared,mp=/mnt/shared,backup=0
What it means: This container is unprivileged, with nesting+keyctl, and a host bind mount (mp0). Those three are common sources of AppArmor and permission errors.
Decision: If you recently enabled nesting or added mp0, suspect AppArmor or ownership/ID mapping first.
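To test causality without hand-editing the file, a sketch using pct set (assumes container 101 and that briefly dropping the mount and features is acceptable for the workload):
cr0x@server:~$ pct set 101 --delete mp0
cr0x@server:~$ pct set 101 --features nesting=0,keyctl=0
cr0x@server:~$ pct start 101
Re-add the mount and features the same way once you have confirmed or cleared them as the trigger; one change per start attempt, or you learn nothing.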
Task 3: Check host cgroup mode (v1 vs v2)
cr0x@server:~$ stat -fc %T /sys/fs/cgroup
cgroup2fs
What it means: Host is using cgroups v2 unified hierarchy.
Decision: If container or LXC expects legacy v1 mounts (rare on modern PVE, but can happen with older configs), plan to align. Also check controller availability (next task).
Task 4: Verify required cgroup controllers are available and enabled
cr0x@server:~$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
What it means: These are controllers supported by the kernel and available in the root cgroup. If memory or pids is missing, containers that set memory/pids limits can fail or behave badly.
Decision: Missing controllers means a host/kernel configuration issue. Stop debugging per-container and fix the node boot/kernel config.
Task 5: Check whether systemd has delegated controllers appropriately
cr0x@server:~$ cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids
What it means: These controllers are enabled for child cgroups at the top of the unified (v2) tree. Delegation works level by level: a controller that appears in cgroup.controllers but is never enabled in subtree_control along the path down to the container’s cgroup can’t be used there, so containers that set the matching limits may fail to start or silently lose enforcement.
Decision: If you see oddities (missing expected controllers) after an upgrade, suspect systemd+cgroups interaction. Check recent package changes and consider a controlled reboot after fixing boot parameters.
Task 6: Inspect mounts for cgroup weirdness (hybrid layouts, missing mount)
cr0x@server:~$ mount | grep -E 'cgroup|cgroup2'
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
What it means: Clean v2 mount. If you see multiple v1 cgroup mounts plus cgroup2, you might be in hybrid mode.
Decision: If LXC complains about mounting cgroup2 but you’re in v1 or hybrid, align the host (boot params) with what PVE expects for your version.
Task 7: Pull the relevant journal slice for the container start attempt
cr0x@server:~$ journalctl -u pve-container@101.service -n 200 --no-pager
Dec 26 10:18:01 pve1 systemd[1]: Starting PVE LXC Container: 101...
Dec 26 10:18:01 pve1 lxc-start[18277]: lxc-start: 101: cgfsng.c: cgroup_init: 846 Failed to create cgroup /lxc/101
Dec 26 10:18:01 pve1 lxc-start[18277]: lxc-start: 101: cgfsng.c: cgroup_init: 850 No such file or directory
Dec 26 10:18:01 pve1 systemd[1]: pve-container@101.service: Main process exited, code=exited, status=1/FAILURE
Dec 26 10:18:01 pve1 systemd[1]: pve-container@101.service: Failed with result 'exit-code'.
What it means: LXC couldn’t create the cgroup path it wanted. That’s usually cgroup filesystem layout, delegation, or a stale/incorrect cgroup root configuration.
Decision: Stay in cgroups lane. Look for missing /sys/fs/cgroup permissions, delegation issues, or broken systemd cgroup tree.
Task 8: Check AppArmor denials around the failure time
cr0x@server:~$ journalctl -k -g 'apparmor="DENIED"' -n 50 --no-pager
Dec 26 10:18:01 pve1 kernel: audit: type=1400 audit(1735208281.214:915): apparmor="DENIED" operation="mount" profile="lxc-container-default-cgns" name="/sys/kernel/security" pid=18290 comm="(mount)" flags="rw, nosuid, nodev, noexec"
What it means: The container attempted a mount operation blocked by the applied profile. That can prevent systemd inside the container from booting.
Decision: Decide whether the mount is necessary. If it’s coming from nesting/systemd expectations, fix container features or adjust policy in a controlled way. Don’t just disable AppArmor globally.
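If the DENIED search comes back empty but you still suspect policy, widen the net slightly (the time window is illustrative):
cr0x@server:~$ journalctl -k --since "10 minutes ago" --no-pager | grep -E 'apparmor|audit'
Note that if auditd is installed and running, denials may be routed to /var/log/audit/audit.log instead of the journal; check there before concluding AppArmor is innocent.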
Task 9: Verify AppArmor is enabled and profiles loaded
cr0x@server:~$ aa-status
apparmor module is loaded.
60 profiles are loaded.
58 profiles are in enforce mode.
2 profiles are in complain mode.
0 processes are unconfined but have a profile defined.
What it means: AppArmor is active. If it says “module is not loaded,” then AppArmor isn’t your blocker—your “DENIED” lines won’t exist.
Decision: If denials exist and AppArmor is enforcing, treat those denials as first-class causes, not background noise.
Task 10: Run LXC with debug logging (surgical, noisy, useful)
cr0x@server:~$ lxc-start -n 101 -F -l DEBUG -o /tmp/lxc-101.debug.log
lxc-start: 101: start.c: lxc_init: 846 Container "101" initialized
lxc-start: 101: cgfsng.c: cgroup_init: 846 Failed to create cgroup /lxc/101
lxc-start: 101: start.c: __lxc_start: 2107 Failed to initialize container "101"
What it means: Confirms failure occurs during cgroup init, not filesystem mount of rootfs or network setup.
Decision: Stop chasing AppArmor if debug logs stop before security profile application. Fix cgroups first.
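The debug log is verbose; a quick way to find where the sequence died (the path matches the command above):
cr0x@server:~$ grep -nE 'ERROR|DENIED|cgroup|apparmor' /tmp/lxc-101.debug.log | head -n 20
The last few lines before the first error tell you which phase was in progress: cgroup init, rootfs mount, network setup, or security profile application.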
Task 11: Validate the container’s rootfs volume exists and is accessible
cr0x@server:~$ pvesm status
Name Type Status Total Used Available
local dir active 100.00G 42.10G 57.90G
local-lvm lvmthin active 60.00G 25.00G 35.00G
What it means: Storage backend is active. If it’s inactive, you can get misleading “permission denied” or mount failures that look like AppArmor.
Decision: If storage is down, fix storage first. Containers can’t mount imaginary disks.
Task 12: Confirm the rootfs LV actually exists (LVM-thin example)
cr0x@server:~$ lvs -a -o lv_name,vg_name,lv_size,lv_attr | grep vm-101-disk-0
vm-101-disk-0 pve 8.00g Vwi-a-tz--
What it means: The volume exists. If it doesn’t, the container can’t start.
Decision: Missing volume means a restore/backup problem or storage misreference in 101.conf. Fix config or recreate volume.
Task 13: Check bind mount source paths and permissions on the host
cr0x@server:~$ ls -ld /srv/shared
drwxr-x--- 2 root root 4096 Dec 25 09:12 /srv/shared
What it means: For an unprivileged container, a bind mount owned by root:root with restrictive permissions can break startup if LXC tries to set up mount ownership or the container init expects access.
Decision: Either adjust ownership to match the container’s mapped IDs (see ID map task), or mount read-only, or stop bind-mounting host paths into unprivileged containers unless you’ve planned ownership.
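If you choose to align ownership, a sketch assuming the default Proxmox mapping (container UID 0 = host UID 100000) and an application user with UID 1000 inside the container:
cr0x@server:~$ chown -R 101000:101000 /srv/shared
cr0x@server:~$ ls -ldn /srv/shared
drwxr-x--- 2 101000 101000 4096 Dec 25 09:12 /srv/shared
The arithmetic is just offset plus container UID; confirm the actual offset in /etc/subuid (Task 14) before chowning anything recursively.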
Task 14: Inspect ID mapping for an unprivileged container
cr0x@server:~$ grep -E '^(root|lxc)' /etc/subuid /etc/subgid
/etc/subuid:root:100000:65536
/etc/subgid:root:100000:65536
What it means: Host has subordinate UID/GID ranges for root, which unprivileged containers need. If missing, unprivileged containers often fail with unhelpful permission errors.
Decision: If ranges are missing or too small for your idmap usage, fix /etc/subuid and /etc/subgid, then restart relevant services or reboot.
Task 15: Confirm kernel supports required namespaces and cgroup features
cr0x@server:~$ zgrep -E 'CONFIG_CGROUPS=|CONFIG_USER_NS=|CONFIG_CGROUP_BPF=|CONFIG_CGROUP_FREEZER=' /boot/config-$(uname -r)
CONFIG_CGROUPS=y
CONFIG_USER_NS=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_FREEZER is not set
What it means: Core features are present. A missing CONFIG_USER_NS would kill unprivileged containers outright. Freezer being absent is usually not fatal on modern setups but can affect certain behaviors/tools.
Decision: If key configs are missing, you’re on the wrong kernel or a custom build. Use the supported Proxmox kernel series and stop improvising.
Task 16: Check whether the container’s systemd is failing due to cgroup mounting
cr0x@server:~$ pct start 101 --debug
lxc-start: 101: conf.c: run_buffer: 312 Script exited with status 1
lxc-start: 101: conf.c: run_buffer: 313 Script exited with status 1: mount: /sys/fs/cgroup: permission denied
What it means: The failure is at mount time of cgroups inside the container. That can be AppArmor denial, missing delegation, or LXC config mismatch for cgroup version.
Decision: Correlate with AppArmor denials (Task 8). If no denials, check cgroup delegation/permissions and container features.
cgroups failure modes that stop containers cold
1) cgroups v2 unified hierarchy mismatched with expectations
cgroups v2 is the modern way: one hierarchy, consistent semantics, improved delegation. But reality is messy: older container images, older LXC versions, or hand-me-down configs can assume v1 mountpoints (like separate /sys/fs/cgroup/memory).
In Proxmox, the host is typically systemd-managed and increasingly v2. If a container start sequence tries to mount v1 controllers or expects v1 paths, you’ll see mount errors or “No such file or directory” in cgroup init.
What to do: Align. Prefer running the Proxmox-supported kernel and LXC stack as a set. Avoid forcing cgroup legacy modes unless you have a concrete compatibility requirement and a rollback plan.
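A quick way to spot forced legacy/hybrid mode on a host (no output means the default unified v2 layout; the output below shows the drifted case):
cr0x@server:~$ grep -o 'systemd.unified_cgroup_hierarchy=[^ ]*' /proc/cmdline
systemd.unified_cgroup_hierarchy=0
A value of 0 means someone pinned this node to the legacy/hybrid layout, which is exactly the kind of drift that makes one node behave differently from its peers.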
2) Controllers available but not delegated (systemd boundary problem)
Even when the controllers exist, the process creating the container must have permission to create sub-cgroups and enable controllers. With cgroups v2, enabling controllers is explicit and can be blocked by parent cgroup configuration.
Symptoms look like “Failed to create cgroup” or “Operation not permitted,” sometimes only for some containers (those that set limits, or those started through different units).
What to do: Treat it as a node config issue. Validate systemd’s delegation and the cgroup tree. If you recently changed how containers are started (custom units, wrappers), roll that back first.
3) Read-only or broken /sys/fs/cgroup inside container
Some container images are designed for Docker-style environments and assume they can manage cgroups or mount certain pseudo-filesystems. LXC containers often run with tighter boundaries. If systemd inside the container wants to mount and it’s denied, it exits fast. The container “starts” and “stops” instantly, which looks like a Proxmox problem.
What to do: Decide whether systemd inside that container should be the PID 1 at all. If it must be, adjust container features carefully. Otherwise, run a simpler init for that workload.
4) Kernel regressions and “helpful” boot parameters
Sometimes the node was upgraded and cgroups behavior changed. Sometimes someone added boot parameters for another problem and accidentally broke containers. cgroups is sensitive to kernel cmdline flags and systemd defaults.
My operational stance: keep nodes consistent. If you need to “special-case” one node, you’re creating a future incident with a delayed fuse.
AppArmor failure modes: denials, profiles, and the “it worked yesterday” trap
AppArmor basics you actually need
AppArmor is Mandatory Access Control by profile: it limits what a process can do even if Unix permissions say “allowed.” LXC on Proxmox uses AppArmor profiles to restrict containers. When something in the container attempts a blocked operation (mounting, accessing kernel interfaces, using FUSE, etc.), you get a denial in the kernel audit log.
1) Nesting enables behavior that AppArmor blocks by default
Nesting is the feature that makes people brave: “Let’s run Docker inside this container.” It expands the syscall and mount behavior inside the container, which collides with conservative policies. Suddenly the container wants to mount overlayfs, access /sys/kernel/security, use keyrings, or manipulate cgroups in ways that AppArmor says “no.”
What to do: If you need Docker/Kubernetes inside something, use a VM unless you have a disciplined reason not to. Nested containers are neat until they become your incident response hobby.
2) Bind mounts and host paths: policy + ownership double trouble
Bind mounts (mp0, mp1, etc.) are the fastest way to make an LXC container useful, and also the fastest way to break it. AppArmor may restrict certain mount operations, and unprivileged containers may not be able to access the mounted files due to UID/GID mapping. These failures can present as either AppArmor denials or generic permission errors.
What to do: Treat bind mounts as an interface contract. Decide on ownership and access patterns up front. Document them in the container config. Don’t “just mount /srv” and hope.
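A slightly more defensive version of the mount from Task 2, assuming the data only needs to be read inside the container:
mp0: /srv/shared,mp=/mnt/shared,ro=1,backup=0
Read-only sidesteps most ownership questions. If the container must write, plan the UID mapping deliberately (Tasks 13 and 14) instead of loosening permissions until it happens to work.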
3) “Disable AppArmor” is the wrong reflex
Yes, turning off AppArmor can make the container start. It can also make the host’s security posture do a dramatic interpretive dance. Your goal is not “green status.” Your goal is correct control.
Better approach: Use AppArmor denials to identify exactly what operation is blocked, then decide whether the operation is necessary and safe. Sometimes the correct fix is to remove the feature that triggered the behavior. Sometimes it’s to change how the workload runs. Occasionally it’s to adjust the profile—carefully, minimally, and consistently across nodes.
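A hedged sketch of the “adjust the profile, minimally” path. The profile name and the CIFS rule are illustrative; base yours on the stock profile your containers already use and add only the operation that was actually denied:
/etc/apparmor.d/lxc/lxc-default-with-cifs:
profile lxc-container-default-with-cifs flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>
  mount fstype=cifs,
}
Then point only the affected container at it (add lxc.apparmor.profile: lxc-container-default-with-cifs to /etc/pve/lxc/101.conf), reload AppArmor (systemctl reload apparmor), and restart the container. One extra rule, one container, and the same file deployed to every node; anything broader deserves a design discussion, not a hotfix.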
Short joke #2: Disabling AppArmor to fix a startup error is like removing the smoke alarm because it’s loud.
Three corporate mini-stories from the trenches
Mini-story 1: An incident caused by a wrong assumption
A team migrated a batch processing service from VMs to LXC to “save overhead.” Their assumption was simple: containers are just lightweight VMs. They weren’t wrong in spirit, but they were wrong where it counts—kernel control planes are shared, and systemd inside containers is picky about cgroups.
After a routine OS update on the Proxmox node, a subset of containers stopped starting. The logs said “Failed to create cgroup” and “permission denied” on /sys/fs/cgroup. The team focused on the containers: rebuilt images, rolled back application changes, even restored from backup. Nothing changed. The node kept refusing to provide the cgroup layout the containers expected.
The real problem was a boot-time cgroups mode mismatch introduced during maintenance. The node had drifted into a hybrid setup that didn’t match the rest of the cluster. Most containers started because they didn’t set strict resource limits; the batch workers did. The assumption “if one container starts, the node is fine” was false. Some workloads are more cgroup-sensitive than others.
The fix was boring: align kernel parameters and systemd behavior with the rest of the fleet, then reboot in a maintenance window. The lesson stuck: in container land, “works for one” is not evidence of correctness; it’s evidence of incomplete coverage.
Mini-story 2: An optimization that backfired
A platform team wanted faster provisioning. They standardized on unprivileged containers for security (good) and added a shared host directory mounted into dozens of LXCs for caching artifacts (questionable). The cache directory was owned by root on the host, but inside each container it needed to be writable by an app user.
To “solve” permissions, someone enabled nesting and keyctl broadly, then tweaked a few containers to run helper scripts on boot to chown directories. It worked in a test environment. In production, AppArmor started blocking mount-related operations triggered by systemd units and helper scripts. Containers would flap: start, attempt mounts, get denied, exit. Monitoring saw them as “unstable” rather than “blocked by policy.”
The backfire wasn’t just the denials. The bigger issue was the operational model: dozens of containers depended on a shared host path with implicit ownership translation and runtime mutation. Every change to that directory’s permissions became a cluster-wide event. Debugging became archaeology.
The eventual fix was to stop trying to make bind mounts behave like a distributed filesystem. They moved cache artifacts to a proper service and kept bind mounts for simple read-only data. Startup became boring again. That was the win: boring is a feature.
Mini-story 3: A boring but correct practice that saved the day
An infrastructure group ran Proxmox nodes like cattle, not pets—at least as much as you can in virtualization. They had two habits that looked tedious: they kept a “known-good tiny container” template for smoke tests, and they recorded node-level drift (kernel version, boot flags, AppArmor status, cgroup mode) after every change.
One morning, after a security update wave, a node started rejecting new container starts with cgroup errors. Instead of spelunking a production workload’s config, the on-call started the tiny smoke-test container. It failed the same way. That single datapoint collapsed the search space: this wasn’t about application images or mountpoints; it was the node substrate.
They compared the node’s drift record against a healthy peer. The only difference was a subtle boot parameter change introduced during a previous troubleshooting session, left behind like a banana peel. They reverted it, rebooted the node in a controlled manner, and workloads recovered.
No heroics. No “temporary” hacks that became permanent. The correct practice wasn’t glamorous—it was consistency and a repeatable test. In operations, glamour is usually a sign you’re about to do something you’ll regret.
Common mistakes: symptom → root cause → fix
1) Symptom: “Failed to create cgroup /lxc/101”
Root cause: cgroup filesystem layout or delegation mismatch (often v2 controller enabling or missing cgroup mount).
Fix: Verify /sys/fs/cgroup is mounted correctly, controllers present, and systemd delegation sane. Align node configuration; don’t patch per-container.
2) Symptom: “mount: /sys/fs/cgroup: permission denied” during start
Root cause: AppArmor denial on mount, or container attempting mounts not allowed under its profile, sometimes triggered by nesting.
Fix: Check kernel audit denials. Remove or limit nesting/keyctl unless required. If required, use a VM or a carefully scoped profile adjustment.
3) Symptom: Container starts then immediately stops; UI shows no clear error
Root cause: PID 1 inside container (often systemd) fails early due to cgroups or security policy.
Fix: Use debug start and journal logs. Look for systemd errors about cgroups. Decide whether systemd is appropriate; if yes, fix cgroup/AppArmor constraints.
4) Symptom: Works as privileged, fails as unprivileged
Root cause: ID mapping/subuid/subgid issues, bind mount ownership mismatch, or host paths inaccessible under mapped IDs.
Fix: Validate /etc/subuid and /etc/subgid. Fix mount path ownership to match mapped range. Avoid using privileged as a “fix”; it’s a different security model.
5) Symptom: Only containers with resource limits fail
Root cause: Missing controllers (memory/pids) or controller enablement issues in v2.
Fix: Confirm controller presence and systemd delegation. Fix node configuration; keep nodes uniform.
6) Symptom: AppArmor denials mention /sys/kernel/security or mount operations
Root cause: Container trying to access kernel security interfaces or mount sensitive pseudo-filesystems—often nesting or special workloads.
Fix: Re-evaluate why the workload needs that. Prefer VM for nested runtimes. If you must, scope policy changes tightly and test across upgrades.
7) Symptom: “No such file or directory” for cgroup paths
Root cause: Expecting v1 paths on v2 system, or stale LXC config referencing old mountpoints.
Fix: Remove legacy cgroup mount hacks from container config. Ensure host and LXC versions are compatible and consistent across cluster.
8) Symptom: Adding mp0 made the container stop starting
Root cause: Host path missing, wrong permissions, or AppArmor mount restrictions.
Fix: Confirm the host source path exists and permissions align with unprivileged mapping. Try temporarily removing the mount to confirm causality; then redesign access properly.
Checklists / step-by-step plan
Checklist A: Five-minute triage (single container won’t start)
- Run pct start <id> from CLI to get the raw error.
- Inspect /etc/pve/lxc/<id>.conf for: unprivileged, features, mp*, custom lxc.* lines.
- Check journalctl -u pve-container@<id>.service -n 200 for the first concrete failure.
- Search for AppArmor denials around the timestamp: journalctl -k -g 'apparmor="DENIED"'.
- If the error is cgroup-ish, confirm cgroup mode and controllers (stat -fc %T /sys/fs/cgroup, cat /sys/fs/cgroup/cgroup.controllers).
- Temporarily remove the newest risky change (a new bind mount, nesting) to confirm causality.
Checklist B: Node-level failure (multiple containers won’t start)
- Start a known-good tiny container. If it fails similarly, treat as node problem.
- Verify cgroup mount and mode: mount | grep cgroup and stat -fc %T /sys/fs/cgroup.
- Verify controller set: cat /sys/fs/cgroup/cgroup.controllers.
- Check for obvious kernel/audit noise: dmesg -T | tail -n 200.
- Confirm AppArmor module is loaded and enforcing: aa-status.
- Compare node packages/kernel version to a healthy node (same cluster): uname -r, pveversion -v.
- Roll back drift (boot flags, custom units) before doing “creative fixes.”
- If you can’t restore quickly, evacuate workloads and repair in maintenance. Production likes decisive moves.
Checklist C: When you must change something (safe change discipline)
- Make one change at a time: remove nesting, remove a mount, revert a boot flag—never all at once.
- Capture before/after evidence: journal slice, denial lines, and cgroup controller list.
- Test with two containers: one “simple” and one “complex” (resource limits + bind mount + systemd).
- Propagate the fix consistently to all nodes; inconsistency is an incident generator.
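A minimal evidence-capture sketch for the before/after step (paths and the time window are illustrative):
cr0x@server:~$ journalctl -k --since "15 minutes ago" -g 'apparmor|cgroup' --no-pager > /root/ct101-before.log
cr0x@server:~$ cat /sys/fs/cgroup/cgroup.controllers >> /root/ct101-before.log
cr0x@server:~$ pct start 101 2>&1 | tee -a /root/ct101-before.log
Repeat into a second file after the change; the diff between the two files is the artifact you attach to the change record.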
Interesting facts and historical context (the parts people forget)
- cgroups started in the late 2000s as a kernel mechanism to account and limit resources per process group—long before “containers” became a marketing term.
- LXC predates Docker’s popularity; it’s one of the older “system container” approaches, closer to lightweight VMs than app containers.
- cgroups v2 is not just v1 with a new name; it changes semantics (like unified hierarchy and controller enablement), which is why mismatches hurt.
- systemd became a central actor in Linux cgroups management; even if you never asked for it, it now owns much of the cgroup tree on most distros.
- AppArmor is pathname-based (unlike SELinux’s label-based model), which makes denials more readable but can make policy brittle across path changes.
- Unprivileged containers rely on user namespaces; the host maps container “root” to a non-root UID range, which is great until you bind-mount host paths without planning ownership.
- The “nesting” feature is a trade: it enables workloads that want deeper kernel-like behavior, but it reduces the isolation guarantees you probably wanted from LXC.
- Many startup failures are really PID 1 failures: systemd inside the container exits because it can’t mount cgroups or gets denied, and LXC reports “container failed.”
- Hybrid cgroups modes existed for compatibility, but they’re operationally awkward; when a fleet drifts between modes, you get “it works on node A” failures.
FAQ
1) How do I tell if it’s cgroups or AppArmor in under a minute?
Search the kernel log for denials: journalctl -k -g 'apparmor="DENIED"'. If you get relevant hits at the failure time, it’s policy. If logs show cgroup creation/mount failures without denials, it’s substrate (cgroups/systemd).
2) The UI shows “startup failed,” but the container worked yesterday. What changed?
Usually: a kernel update, systemd update, Proxmox update, or a container config change (nesting, bind mounts, resource limits). Start by comparing pveversion -v and uname -r with a healthy node.
3) Is switching the container to privileged an acceptable fix?
It’s a diagnostic tool, not a fix. If privileged starts and unprivileged fails, you’ve learned it’s likely ID mapping, bind mount ownership, or user namespace constraints. Fix those and go back to unprivileged unless you have a formal risk sign-off.
4) Do I need nesting to run Docker inside an LXC?
Often yes, and that’s the problem. If you’re serious about running Docker/Kubernetes reliably, prefer a VM. LXC nesting can be made to work, but it’s operationally fragile and frequently interacts badly with AppArmor and cgroups expectations.
5) What does “Failed to mount /sys/fs/cgroup” inside the container usually mean?
Either the container is trying to mount cgroups but is not allowed (AppArmor) or the host’s cgroup configuration/delegation doesn’t support what the container’s init expects. Correlate with AppArmor denials first, then validate host cgroup mode and controllers.
6) Why do only some containers fail after a node upgrade?
Because not all containers exercise the same kernel interfaces. Containers with memory/pids limits, systemd, nesting, or complex mountpoints touch cgroups and AppArmor more aggressively.
7) Can AppArmor denials be safely ignored if the container still starts?
No. A denial you “get away with” today can become a hard failure after a package update changes behavior. Treat denials as technical debt with an interest rate.
8) My bind mount works on a privileged container but not unprivileged. Why?
Unprivileged containers map UIDs/GIDs; “root” inside is not root on the host. Host path ownership and permissions must match the mapped range, or the container won’t be able to access it. Fix ownership or redesign the mount strategy.
9) Should I disable AppArmor to get through an incident?
Only as a last-resort containment move and only if you understand the blast radius. Prefer removing the triggering feature (like nesting) or fixing the specific denied behavior. Disabling AppArmor is a policy change, not a “restart.”
10) What’s the safest way to test changes?
Use a tiny known-good container template, apply one change, test start/stop, then test a “complex” container (systemd + limits + mount). Consistency across nodes matters more than hero debugging on one box.
Conclusion: practical next steps
When an LXC won’t start on Proxmox and the logs mention cgroups or AppArmor, the system is telling you exactly what failed—just not in a friendly order. Your job is to classify the failure: substrate (cgroups/systemd/kernel) vs policy (AppArmor) vs ownership/storage (bind mounts, idmaps).
Next steps I’d actually take:
- Grab evidence: pct start output, journalctl -u pve-container@ID, and any AppArmor denials.
- If multiple containers fail, stop debugging individual configs and validate host cgroups mode/controllers immediately.
- If AppArmor is denying mounts or kernel security access, remove the triggering feature first (nesting/bind mount), then reintroduce carefully—or move the workload to a VM.
- Write down what changed (kernel, boot flags, container features). Drift is how “one weird node” becomes a recurring outage.
Fix the real layer. Keep nodes uniform. And when you’re tempted to disable a safety system to “get it running,” take a breath and read the denial line again. It’s usually right.