openSUSE Leap 15.6 Install: The ‘Stable Linux’ Setup You’ll Keep for Years

Most Linux installs fail the same way: not during the installer, but six months later, when a quiet update collides with an ambitious storage layout and your “I’ll remember what I did” notes are… missing.

Leap 15.6 is the antidote for people who want their machines to behave like appliances: predictable, boring, and recoverable. If you set it up right, you’ll stop treating your OS like a science fair project and start treating it like infrastructure.

Why Leap 15.6 is the “keep it for years” distro

Leap is the release you pick when you value your weekends. It’s not trying to be the newest. It’s trying to be correct, consistent, and supportable. The key idea: you get a distribution that tracks a stable base from SUSE Linux Enterprise, with a community wrapper that keeps it approachable.

For production-ish workloads—home lab services, small business servers, developer workstations that must also run Zoom and not melt—Leap behaves like a long-term tenant. It pays rent on time. It doesn’t throw parties.

What makes Leap particularly install-and-forget is the combination of:

  • YaST for system configuration that remains readable years later, including storage and network.
  • Btrfs + Snapper integration that turns many “I guess we reinstall” incidents into “roll back and keep moving.”
  • zypper patch workflows that align well with controlled change management.
  • AppArmor as a pragmatic hardening layer that’s actually used and maintained.

One opinion up front: if you want “latest everything,” this is not your distro. If you want “I can explain every change to an auditor,” you’re home.

Interesting facts and context (the stuff that explains the culture)

  1. openSUSE has two personalities by design: Tumbleweed (rolling) and Leap (stable). That split is intentional, not a compromise.
  2. YaST predates most modern Linux installers: it’s been a defining SUSE trait for decades, and the storage module remains one of the best “see everything in one place” tools.
  3. Leap’s base is tied to SUSE’s enterprise work: the conservatism is inherited. That’s why it feels slower—and why it breaks less.
  4. Btrfs became a default on openSUSE earlier than most distros: not because it was fashionable, but because snapshots plus rollbacks reduce support load.
  5. Snapper’s “pre/post” snapshot model is change-management encoded in tooling: it turns package operations into auditable system states.
  6. zypper’s patch concept is a first-class workflow: it’s not just “update everything”; it’s “apply a set of vetted fixes.”
  7. AppArmor is historically a SUSE strength: different philosophy than SELinux, more profile-centric, generally easier to operationalize for mixed teams.
  8. Wicked exists because enterprise networking is messy: NetworkManager is great for laptops; servers often want declarative, boring behavior.
  9. openSUSE has long treated packaging and build pipelines as core competency: the culture expects reproducibility, not improvisation.

Decisions that matter before you boot the installer

Pick the right install target: workstation, server, or “one box that does everything”

Be honest. If this machine is a server, do not install a desktop “just in case.” GUIs are fine, but GUI stacks increase churn: more packages, more updates, more moving parts.

  • Server: minimal system, SSH, firewalld, NTP, your services.
  • Workstation: KDE Plasma is a strong default on Leap. GNOME is fine too. Pick one.
  • Single box lab: still install minimal + one lightweight UI if needed. Avoid installing “every desktop.” That’s how you end up debugging two audio stacks on a server.

Filesystem choice: Btrfs where it helps, XFS where it’s boring

Leap’s default root on Btrfs is a feature, not a stunt. Snapshots and rollbacks save you when updates go sideways. But don’t put everything on Btrfs just because it’s there.

My default:

  • / on Btrfs with Snapper enabled.
  • /home on XFS for a workstation (simple, fast, low drama) or on Btrfs if you value snapshotting user data and accept the operational overhead.
  • /var/lib for databases on XFS (or ext4) unless you have a clear reason to use Btrfs and you understand copy-on-write behavior and tuning.
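
Once the system is installed, a quick check confirms the layout actually matches this plan. A minimal sketch; the mountpoints are the ones from the list above, so adjust to your own:

cr0x@server:~$ findmnt -no FSTYPE /
cr0x@server:~$ findmnt -no FSTYPE /home
cr0x@server:~$ findmnt -no FSTYPE /var/lib

If /var/lib is not a separate mount, the last command prints nothing, which is itself an answer.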

Joke #1: Btrfs snapshots are like seatbelts—you only notice them when something tries to ruin your day.

Partitioning and boot: UEFI, GPT, and “don’t get creative”

Use UEFI if your hardware supports it. Use GPT. Create an EFI System Partition (ESP) of at least 512 MiB. Don’t share ESPs across too many OS installs unless you enjoy archaeology.

If you plan disk encryption, decide now: full-disk encryption for laptops, selective encryption for servers (often just data volumes). If you need unattended reboot after power loss, keep root unencrypted or use TPM-based unlocking with a plan you’ve tested.

Networking stack: Wicked for servers, NetworkManager for laptops

Wicked is great when you want “interface comes up the same way every time.” NetworkManager is great when you roam between Wi‑Fi networks. Don’t mix them without a reason.

Repository discipline: fewer repos, fewer mysteries

The fastest way to turn a stable distro into a fragile one is to add random repositories because a blog said so. Start with official repos. Add one extra only when you can explain why it exists and how you’ll remove it later.
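
If you do add a repository, add it in a way you can undo. A minimal sketch; the URL and alias are placeholders, not a recommendation:

cr0x@server:~$ sudo zypper ar -f https://example.com/some/repo my-extra-repo
cr0x@server:~$ sudo zypper lr -u
cr0x@server:~$ sudo zypper rr my-extra-repo

The last command is the removal path you promised yourself. Write it down next to the reason you added the repo.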

Installation walkthrough with opinionated defaults

1) Boot media and firmware settings

Use the official installer ISO (offline installer is fine for servers; network installer is fine if your network is reliable). In firmware settings:

  • Enable UEFI mode.
  • Disable “RAID” mode if it’s actually fake RAID and you want Linux mdadm.
  • Leave Secure Boot enabled unless you have a concrete reason to disable it (custom kernel modules, niche drivers).

2) Install pattern selection: keep it minimal

On servers: start with a minimal install and add packages intentionally. On workstations: KDE Plasma is a practical default on Leap. It tends to behave and it’s well integrated.

3) Storage proposal: accept the spirit, edit the details

YaST will propose a layout. It’s often decent, but don’t rubber-stamp it. Your job is to make future operations easy:

  • ESP: 512 MiB–1 GiB (FAT32, mounted at /boot/efi).
  • Root: Btrfs, with snapshots enabled.
  • Swap: size depends on hibernation needs and RAM. For servers, small swap is fine; for laptops with hibernation, at least match RAM.
  • Data: separate partition/LV for /var/lib (containers, databases) if you’ll run them.

4) Users and SSH: set policy now

Create a normal user. If this is a server, avoid enabling password SSH logins. Plan for key-based authentication. If you must allow passwords temporarily, set a date to remove that access.
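
A minimal sketch of the key-based setup, run from your client machine; the prompt, user, and hostname are examples:

you@laptop:~$ ssh-keygen -t ed25519 -C "admin key for leap server"
you@laptop:~$ ssh-copy-id cr0x@server.example.lan
you@laptop:~$ ssh cr0x@server.example.lan

Confirm that key login works before you disable password authentication, not after.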

5) Time sync and hostname: boring details that bite later

Set the correct timezone. Enable NTP. Pick a hostname that won’t embarrass you in logs. “server2-final” is not a plan.

6) First reboot: verify boot and snapshot tooling

After install completes, don’t rush into customizing. First confirm boot is clean, networking is stable, and snapshots are working. Then proceed.

Storage plan: Btrfs, XFS, LVM, RAID, and what I’d deploy

Btrfs on /: treat it as an OS safety system

Leap’s Btrfs root with Snapper is designed around one idea: most failures happen during change. If you can roll back the OS state quickly, you reduce downtime and panic.

What Btrfs root is good at:

  • Snapshots before/after package operations.
  • Fast rollback after a bad update or misconfiguration.
  • Reasonable performance for OS files.

What Btrfs root is not automatically good at:

  • High-write databases without tuning and understanding of copy-on-write.
  • “Set and forget” when you never check snapshot retention and free space.

XFS for write-heavy data

XFS is the boring grown-up filesystem: scalable, mature, predictable under heavy write load. If you’re hosting databases, VM images, or container layers with heavy churn, XFS is usually the safer default.

LVM: optional, but helpful for humans

LVM isn’t required, but it’s often worth it on servers because it gives you flexible growth and clear separation between OS and data. YaST makes it manageable, and it’s a tool your future self will recognize.

Software RAID: mdadm for Linux-native RAID

If you need RAID1/10 for availability, Linux mdadm is solid and transparent. Avoid “fake RAID” unless you have a strong reason, because it tends to complicate recovery.
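
For reference, a two-disk mirror with mdadm looks roughly like this; the device names are examples, and YaST can build the same array during install if you prefer:

cr0x@server:~$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
cr0x@server:~$ cat /proc/mdstat

Watch /proc/mdstat until the initial sync finishes before you trust the array with data.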

If you’re considering ZFS: it’s a great filesystem, but it’s not integrated into Leap’s default installer path the way Btrfs is. If you want “keep it for years” and minimal weirdness, stick with the native defaults unless you’re prepared to own the integration.

Snapshot retention: don’t let snapshots become a disk-full event

Snapshots are not backups. They live on the same disk. They protect you from bad changes, not from dead hardware. Still, they can save you from yourself—provided you manage retention and monitor free space.

First boot tasks that prevent future regret (commands included)

Below are practical tasks you can run right after installation. Each one includes: a command, what the output means, and the decision you make from it.

Task 1: Confirm the OS version

cr0x@server:~$ cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.6"
ID="opensuse-leap"
PRETTY_NAME="openSUSE Leap 15.6"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.6"

What it means: You are actually on Leap 15.6, not the wrong ISO or a partially upgraded system.

Decision: If it’s not 15.6, stop and fix your base before doing anything else. “I’ll upgrade later” is how you inherit chaos.

Task 2: Check boot mode (UEFI vs legacy)

cr0x@server:~$ test -d /sys/firmware/efi && echo UEFI || echo BIOS
UEFI

What it means: UEFI is in use. Good for modern systems and predictable boot management.

Decision: If you expected UEFI and see BIOS, revisit firmware settings before you build on the install.

Task 3: Verify filesystems and mount options

cr0x@server:~$ findmnt -no SOURCE,FSTYPE,OPTIONS /
/dev/nvme0n1p3 btrfs rw,relatime,ssd,space_cache=v2,subvolid=256,subvol=/@/.snapshots/1/snapshot

What it means: Root is Btrfs and you’re on a snapshot subvolume (common with Snapper integration).

Decision: If root is not Btrfs and you wanted Snapper rollback, fix it now rather than after you’ve installed half a world of packages.

Task 4: Confirm Snapper is working and has a baseline snapshot

cr0x@server:~$ sudo snapper list
 # | Type   | Pre # | Date                     | User | Cleanup | Description | Userdata
---+--------+-------+--------------------------+------+---------+-------------+---------
0  | single |       |                          | root |         | current     | 
1  | single |       | 2026-02-05 09:18:24      | root |         | first root filesystem | 

What it means: Snapper sees at least one snapshot. The system can roll back OS state when needed.

Decision: If Snapper isn’t configured, decide whether you want it. On Leap, I usually do for /.
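
While you’re here, take a manual baseline snapshot with a description you’ll recognize in a year; the description text is just an example:

cr0x@server:~$ sudo snapper create -d "post-install baseline"
cr0x@server:~$ sudo snapper list | tail -n 2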

Task 5: Inspect block devices and confirm you’re on the disk you think you are

cr0x@server:~$ lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINTS
NAME        SIZE TYPE FSTYPE MOUNTPOINTS
nvme0n1   953.9G disk        
├─nvme0n1p1   1G part vfat   /boot/efi
├─nvme0n1p2  32G part swap   [SWAP]
└─nvme0n1p3 920G part btrfs  / /.snapshots

What it means: Partitioning matches expectations: ESP, swap, and Btrfs root.

Decision: If you see your data disk accidentally used for root, stop and correct it before you store anything valuable.

Task 6: Check free space the way Btrfs sees it

cr0x@server:~$ sudo btrfs filesystem usage /
Overall:
    Device size:                 920.00GiB
    Device allocated:             60.00GiB
    Device unallocated:          860.00GiB
    Used:                         18.50GiB
    Free (estimated):            900.00GiB      (min: 900.00GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00

What it means: Plenty of headroom; allocations are sane.

Decision: If metadata is near full while “free space” looks fine, you need to rebalance or adjust snapshot retention. Btrfs can fail in surprising ways when metadata fills.
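
If metadata is tight while data has headroom, prune old snapshots first, then consider a filtered balance. A cautious sketch; the usage thresholds are conservative examples:

cr0x@server:~$ sudo btrfs balance start -dusage=50 -musage=50 /
cr0x@server:~$ sudo btrfs filesystem usage /

Re-run the usage check afterwards to confirm the unallocated pool grew.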

Task 7: Validate repositories (and catch “random repo” drift early)

cr0x@server:~$ sudo zypper lr -u
#  | Alias                       | Name                        | Enabled | GPG Check | Refresh | URI
---+-----------------------------+-----------------------------+---------+-----------+---------+---------------------------
1  | repo-oss                    | Main Repository (OSS)       | Yes     | (r ) Yes  | Yes     | https://download...
2  | repo-non-oss                | Main Repository (NON-OSS)   | Yes     | (r ) Yes  | Yes     | https://download...
3  | repo-update                 | Update Repository (OSS)     | Yes     | (r ) Yes  | Yes     | https://download...

What it means: You have a clean baseline: main repos plus updates.

Decision: If you see a pile of third-party repos on day one, ask who added them and why. Reduce now. Stability is a diet, not a vitamin.

Task 8: Apply security patches (patch-focused, not “YOLO upgrade”)

cr0x@server:~$ sudo zypper patch
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 3 patches are going to be installed:
  openSUSE-SLE-15.6-2026-1234 security  openssl security update
  openSUSE-SLE-15.6-2026-2345 security  openssh security update
  openSUSE-SLE-15.6-2026-3456 recommended  systemd bugfix update

3 patches to install.
Overall download size: 4.2 MiB. Already cached: 0 B. After the operation, additional 12.0 MiB will be used.
Continue? [y/n/v/...? shows all options] (y): y

What it means: You’re applying curated patches. This tends to be safer than full upgrades when you want controlled change.

Decision: On servers, prefer zypper patch routinely; schedule full package upgrades when you can test and reboot.
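
For a scheduled window, preview first and apply non-interactively; the security-only filter is optional and shown as an example:

cr0x@server:~$ sudo zypper lp
cr0x@server:~$ sudo zypper --non-interactive patch --category security
cr0x@server:~$ sudo zypper ps -s

zypper ps -s lists services still running old binaries after patching, which tells you what to restart (or whether a reboot is the honest answer).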

Task 9: Confirm services and boot health

cr0x@server:~$ systemctl --failed
0 loaded units listed.

What it means: No systemd units are currently failed.

Decision: If you have failed units, fix them now while the system is fresh and the diff from “working” is small.

Task 10: Check the last boot logs for warnings that matter

cr0x@server:~$ sudo journalctl -b -p warning --no-pager | tail -n 8
Feb 05 09:22:11 server kernel: ACPI: \_SB.PCI0... AE_NOT_FOUND
Feb 05 09:22:12 server systemd[1]: Found device /dev/disk/by-uuid/...
Feb 05 09:22:14 server wicked[915]: eth0: link detected
Feb 05 09:22:15 server chronyd[1022]: System clock was stepped by 0.423 seconds

What it means: Warnings can be firmware noise or real problems. Chrony stepping time on first boot is normal.

Decision: If you see repeated disk I/O errors, network flaps, or filesystem warnings, investigate before you deploy services.

Task 11: Confirm time sync (because certificates and clusters hate time travel)

cr0x@server:~$ timedatectl status
               Local time: Thu 2026-02-05 09:25:21 UTC
           Universal time: Thu 2026-02-05 09:25:21 UTC
                 RTC time: Thu 2026-02-05 09:25:21
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

What it means: Clock is synchronized and NTP is active.

Decision: If not synchronized, fix time before you debug TLS, Kerberos, or “random” auth failures.
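
If the clock is not synchronized, enabling NTP and checking the sources usually settles it; chrony is the default client on a stock Leap install, so chronyc should be present:

cr0x@server:~$ sudo timedatectl set-ntp true
cr0x@server:~$ sudo chronyc sources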

Task 12: Confirm network management stack (Wicked vs NetworkManager)

cr0x@server:~$ systemctl is-active wicked NetworkManager
active
inactive

What it means: Wicked is managing the network and NetworkManager is not. That’s a sane server stance.

Decision: If both are active, pick one. Dual control is how you get “it works until reboot.”

Task 13: Confirm firewall status and default zone

cr0x@server:~$ sudo firewall-cmd --state
running
cr0x@server:~$ sudo firewall-cmd --get-default-zone
public

What it means: firewalld is active; default zone is public (typically restrictive).

Decision: Keep it running. Open only what you need. “I’ll disable the firewall while testing” has a long résumé of regret.

Task 14: Check CPU, memory pressure, and swap posture

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       1.2Gi        27Gi       120Mi       2.8Gi        29Gi
Swap:           32Gi          0B        32Gi

What it means: Plenty of RAM free; swap unused (normal on idle system).

Decision: If swap is heavily used under normal load, you likely have memory pressure or a mis-sized workload.

Task 15: Check disk health quickly (NVMe example)

cr0x@server:~$ sudo smartctl -a /dev/nvme0n1 | sed -n '1,18p'
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.4.0-150600.23-default] (local build)
=== START OF INFORMATION SECTION ===
Model Number:                       ACME NVMe 1TB
Serial Number:                      ABCD123456
Firmware Version:                   1.0.3
PCI Vendor/Subsystem ID:            0x1234
IEEE OUI Identifier:                0xdeadbe
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            54,321,098,752 [54.3 GB]

What it means: SMART data is readable; model and firmware are identified.

Decision: If SMART can’t read or shows critical warnings, don’t trust the disk with anything important.

Task 16: Confirm AppArmor is enabled

cr0x@server:~$ sudo aa-status | sed -n '1,10p'
apparmor module is loaded.
30 profiles are loaded.
28 profiles are in enforce mode.
2 profiles are in complain mode.
0 profiles are in kill mode.

What it means: AppArmor is active, and most profiles are enforcing.

Decision: Keep it enforcing unless you have a specific compatibility issue. If you must loosen it, do it per-service, not globally.

Task 17: Verify SSH policy basics

cr0x@server:~$ sudo sshd -T | egrep '^(permitrootlogin|passwordauthentication|pubkeyauthentication)'
permitrootlogin no
passwordauthentication no
pubkeyauthentication yes

What it means: Root login is blocked; password SSH is disabled; keys are enabled.

Decision: This is the baseline you want for servers. If you need temporary password access, set a reminder to remove it the same day.

Update strategy: zypper, patches, and when to be conservative

Leap rewards predictable change windows. Don’t do rolling “upgrade everything whenever” if you want stability. Instead:

  • Routine cadence: apply security and recommended patches with zypper patch.
  • Controlled upgrades: schedule zypper update for broader package refreshes when you can test and reboot; reserve zypper dup for actual release upgrades, not routine maintenance.
  • Snapshot-aware workflow: ensure Snapper creates pre/post snapshots for package operations. If something breaks, rollback is a plan, not a prayer.

Rolling back after a bad change

When an update breaks boot or a service, the recovery path should be procedural. On a Btrfs+Snapper root, you can often rollback.

cr0x@server:~$ sudo snapper list | tail -n 5
98  | pre    |       | 2026-02-05 10:12:01 | root | number | zypp(zypper patch) | 
99  | post   | 98    | 2026-02-05 10:13:12 | root | number | zypp(zypper patch) | 
100 | single |       | 2026-02-05 10:20:44 | root | number | before-nginx-config | 

What it means: You have a pre/post pair around patching and a manual “before config” snapshot.

Decision: If the system broke after patch #99, you know what to test or roll back to. You’re not guessing.
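
The rollback itself, continuing the hypothetical snapshot numbers above, is one command plus a reboot; snapper creates a writable copy of the snapshot you name and makes it the default for the next boot:

cr0x@server:~$ sudo snapper rollback 98
cr0x@server:~$ sudo systemctl reboot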

One guiding principle, from reliability engineering: Hope is not a strategy. — Gene Kranz

Security baseline: firewall, SSH, AppArmor, and logging

Firewall: open ports only when a service is ready

On servers, keep the default zone restrictive. Add services explicitly. Example: allow SSH and (only if needed) HTTP/HTTPS.

cr0x@server:~$ sudo firewall-cmd --permanent --add-service=ssh
success
cr0x@server:~$ sudo firewall-cmd --reload
success
cr0x@server:~$ sudo firewall-cmd --list-services
ssh

What it means: SSH is allowed; nothing else is implicitly exposed.

Decision: Don’t open ports for “future services.” Open ports when the service is installed, configured, and monitored.

SSH: keys, no root, minimal attack surface

Set PasswordAuthentication no, PermitRootLogin no. Keep a break-glass path: console access, or out-of-band management if this is real production.
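
A sketch of that policy as a drop-in file, assuming your sshd_config includes the sshd_config.d directory (recent Leap defaults do; verify the effective values with sshd -T either way). The file name is an example:

cr0x@server:~$ printf 'PasswordAuthentication no\nPermitRootLogin no\n' | sudo tee /etc/ssh/sshd_config.d/50-hardening.conf
cr0x@server:~$ sudo sshd -t
cr0x@server:~$ sudo systemctl restart sshd

Run sshd -t before restarting; a typo here is how you lock yourself out of your own break-glass path.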

Logging: you can’t debug what you didn’t record

Systemd’s journal is powerful, but don’t treat it like infinite storage. If this is a server, decide where logs live and how long you keep them. If you ship logs centrally, start early—retroactive log collection is a myth.
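
If you keep logs local, cap them explicitly rather than hoping. In /etc/systemd/journald.conf, set Storage=persistent and a size limit such as SystemMaxUse=1G (the value is an example), then:

cr0x@server:~$ sudo systemctl restart systemd-journald
cr0x@server:~$ journalctl --disk-usage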

AppArmor: enforce, don’t disable

AppArmor is not just “security theater” on Leap; it’s integrated into how services are expected to run. If you run into a profile violation, use complain mode tactically for that service, learn what it needs, then re-enforce.
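
The per-service loop, using nginx as a hypothetical example (aa-complain and aa-enforce come from apparmor-utils):

cr0x@server:~$ sudo aa-complain /usr/sbin/nginx
cr0x@server:~$ sudo journalctl -k -g apparmor | tail -n 20
cr0x@server:~$ sudo aa-enforce /usr/sbin/nginx

Complain mode logs what the profile would have blocked without breaking the service; fix the profile, then re-enforce.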

Three corporate mini-stories (what actually goes wrong)

Incident: the wrong assumption (“snapshots are backups”)

A mid-sized company ran a few internal services on a Leap-based VM host. They loved Btrfs snapshots. They had weekly snapshots and felt safe. The admin team even practiced rolling back a broken package update once. Confidence went up.

Then a storage array had a controller issue. Not a full meltdown—just enough corruption and timeouts that the hypervisor marked the datastore unhealthy. VMs were paused, then some disks came back, then went away again. It was the worst kind of failure: intermittent and noisy.

The team’s first instinct was “roll back snapshots.” That worked for one VM… briefly. Then the underlying storage faults reappeared. Snapshots, of course, live on the same storage. They protect against bad changes, not against bad hardware or a bad day in the SAN.

The long night was spent restoring from an actual backup system that they did have—just not as recently tested as it should’ve been. Their postmortem was blunt: the assumption wasn’t malicious, it was just lazy semantics. Snapshot is not backup. Same disk means same failure domain.

What changed afterward was boring and correct: monthly restore tests, and a dashboard widget that shows backup freshness alongside snapshot counts. The team stopped arguing over words and started measuring recovery.

Optimization that backfired: “let’s tune for speed”

A different org had a Leap 15.x fleet running containerized services. Someone noticed disk usage and I/O spikes during peak hours and decided the system was “wasting time” on copy-on-write behavior for container layers.

They made a fast change: they moved container storage onto the Btrfs root and started experimenting with mount options and aggressive cleanup policies. The system felt faster under synthetic tests. Everyone was pleased. A ticket was closed with the word “optimized.”

Two weeks later, the first real incident: a node rebooted after patching and came up with filesystem complaints and missing container layers. Not always—just sometimes. It turned out the cleanup policy, combined with snapshot retention and churn, created metadata pressure and fragmentation patterns that made recovery unpredictable.

They rolled back to a simpler design: keep Btrfs for the OS, put container and database write-heavy paths on XFS, and keep snapshot retention sensible. Performance returned to “slightly less exciting,” and availability returned to “boringly consistent.” The team also learned a lesson that should be stitched on a pillow: optimization without an exit strategy is just experimentation in production.

Boring practice that saved the day: change windows + pre-flight checks

A small finance-adjacent shop ran Leap servers for internal apps. They were not flashy. They had a monthly patch window, a checklist, and a habit of taking a manual snapshot with a human-readable label before any meaningful change.

One month, a routine patch introduced a subtle regression for a niche driver used by their scanning workflow. The service didn’t fail hard. It slowed, then hung under load. Users reported “it’s glitchy” rather than “it’s down.” Those are the tickets that rot your day.

Because they had a checklist, the on-call didn’t start debugging the scanner vendor’s app. They followed the script: confirm the regression correlates with the patch, compare logs between the pre/post snapshot, and rollback the OS snapshot on one canary server.

The canary was instantly healthy. They rolled back the remaining servers, pinned the offending package version, and filed a vendor ticket with logs and exact package versions. No drama. No heroics. Just a team that treated operations like a process, not a vibe.

Fast diagnosis playbook (find the bottleneck fast)

When a Leap system feels slow or “broken,” you don’t start by reinstalling packages. You start by locating the constraint. Here’s the order that consistently finds truth quickly.

First: is it a systemic failure or one service?

cr0x@server:~$ uptime
 09:41:03 up  1:12,  2 users,  load average: 6.21, 5.98, 5.10

Interpretation: High load might mean CPU saturation, runnable queue buildup, or blocked I/O (load includes uninterruptible sleep).

Decision: If load is high, move to CPU/memory/I/O triage. If load is normal, focus on the specific service and its dependencies.

Second: CPU vs memory vs I/O (pick the right war)

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 2780000  42000 820000    0    0    12    25  220  410  8  2 89  1  0
 7  3      0  120000  43000 790000    0    0  8800  9200 1200 3500 10  5 40 45  0

Interpretation: The b column (blocked processes) and high wa (I/O wait) point toward storage bottlenecks.

Decision: If wa is high, stop blaming the app. Look at disk latency, filesystem, and underlying hardware.

Third: find the hottest process, then verify it’s the cause

cr0x@server:~$ top -b -n 1 | head -n 15
top - 09:42:55 up  1:14,  2 users,  load average: 6.80, 6.05, 5.20
Tasks: 214 total,   3 running, 211 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.0 us,  4.0 sy,  0.0 ni, 40.0 id, 45.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32000.0 total,   118.0 free,  1250.0 used,  30632.0 buff/cache
MiB Swap:  32768.0 total, 32768.0 free,     0.0 used.  30100.0 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3221 postgres  20   0 3248000 220000  12000 D  12.0   0.7   0:34.11 postgres

Interpretation: Process in state D (uninterruptible sleep) plus high I/O wait strongly suggests storage stalls.

Decision: Move to storage checks. Don’t restart services blindly; you’ll just add recovery load to a struggling disk.

Fourth: storage latency and errors (the quiet killers)

cr0x@server:~$ iostat -xz 1 3
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         10.70    0.00    4.10   44.90    0.00   40.30

Device            r/s     w/s   rkB/s   wkB/s  rrqm/s  wrqm/s  %util  r_await  w_await
nvme0n1          85.0   120.0  9800.0 11000.0     0.0     5.0   98.0    12.4    35.8

Interpretation: %util near 100 and high await indicate the disk is saturated or unhealthy, or you’re hitting a queueing issue.

Decision: If latency is high, check SMART, dmesg/journal for I/O errors, and workload patterns. Consider moving write-heavy paths off Btrfs root.

Fifth: network, but only after you exonerate disk and CPU

cr0x@server:~$ ss -s
Total: 278
TCP:   41 (estab 25, closed 8, orphaned 0, timewait 6)

Transport Total     IP        IPv6
RAW       0         0         0
UDP       7         5         2
TCP       33        22        11
INET      40        27        13
FRAG      0         0         0

Interpretation: Gives you a quick view of socket load; not a deep analysis but good for “are we drowning in connections?”

Decision: If connection counts look abnormal, investigate the service and upstream load balancers, then check NIC errors and DNS.

Common mistakes: symptoms → root cause → fix

1) “After updates, the system won’t boot”

Symptoms: Boot drops to emergency shell, or service stack won’t start after patching.

Root cause: Kernel/initramfs mismatch, bad driver update, or a misconfigured filesystem mount introduced during change.

Fix: Use Snapper rollback (if on Btrfs root) to revert to pre-update snapshot; then re-apply patches with a canary approach.

2) “Disk is full but df shows space” (Btrfs edition)

Symptoms: Writes fail; services crash; df -h looks fine.

Root cause: Btrfs metadata exhaustion or snapshot retention consuming allocated space.

Fix: Check btrfs filesystem usage /, prune Snapper snapshots, and consider rebalancing. Also set sane snapshot cleanup policies.
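
Retention lives in /etc/snapper/configs/root; the limits below are example values, and stale snapshots can be deleted by number (the range shown is hypothetical):

NUMBER_LIMIT="10"
NUMBER_LIMIT_IMPORTANT="6"
TIMELINE_LIMIT_DAILY="5"

cr0x@server:~$ sudo snapper delete 40-55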

3) “Networking works until reboot”

Symptoms: Manual ip addr add works; after reboot it’s gone; sometimes Wi‑Fi disappears.

Root cause: Competing network managers (Wicked and NetworkManager), or config edited in one tool but managed by another.

Fix: Choose one manager. On servers: disable NetworkManager. On laptops: disable Wicked. Keep configs in the appropriate place.
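
On a server that ended up with both, a sketch of the switch; migrate any connection definitions into /etc/sysconfig/network/ifcfg-* files first, or you’ll be finishing this from the console:

cr0x@server:~$ sudo systemctl disable --now NetworkManager
cr0x@server:~$ sudo systemctl enable --now wicked
cr0x@server:~$ systemctl is-active wicked NetworkManager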

4) “SSH suddenly rejects logins”

Symptoms: Keys that used to work stop working after hardening or updates.

Root cause: Permissions/ownership on ~/.ssh or authorized_keys too open; or user home on a filesystem mounted with unexpected options.

Fix: Fix permissions and confirm sshd config. Validate with sshd -T and journal logs.
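
The usual permission repair, assuming a standard home directory layout:

cr0x@server:~$ chmod 700 ~/.ssh
cr0x@server:~$ chmod 600 ~/.ssh/authorized_keys
cr0x@server:~$ ls -ld ~/.ssh ~/.ssh/authorized_keys

Ownership must match the login user; with StrictModes (the default), sshd refuses keys whose directory permissions are too open.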

5) “AppArmor broke my service”

Symptoms: Service starts then fails, with denied operations in logs.

Root cause: AppArmor profile blocks a new path, socket, or capability introduced by config changes.

Fix: Put that profile into complain mode temporarily, adjust profile, then return to enforce. Do not disable AppArmor globally.

6) “zypper conflicts and dependency hell”

Symptoms: zypper wants to vendor-change packages or remove half the system.

Root cause: Too many repositories, mismatched priorities, or mixing Leap and incompatible third-party repos.

Fix: Remove or disable non-essential repos, align vendor, and keep repository set minimal. If you must use extras, document ownership and rollback plan.

Checklists / step-by-step plan

Install checklist (30 minutes, no heroics)

  1. Firmware: UEFI enabled, correct disk mode selected, Secure Boot policy decided.
  2. Install target: server minimal vs workstation desktop chosen consciously.
  3. Storage: ESP ≥ 512 MiB, root on Btrfs, decide /home and /var/lib placement.
  4. Networking manager: Wicked for server, NetworkManager for laptop.
  5. Users: create non-root admin; plan SSH keys.
  6. Time: enable NTP, set timezone.
  7. Reboot and verify: boot clean, repositories sane, Snapper operational.

Post-install hardening checklist (first day)

  1. Patch: run zypper patch; reboot if kernel/systemd updates landed.
  2. SSH: disable password auth; disable root login; verify with sshd -T.
  3. Firewall: allow only required services; confirm zone and active rules.
  4. Monitoring: at least collect journalctl, disk SMART health, and filesystem usage.
  5. Snapshots: confirm retention policy and ensure free space margin.
  6. Backups: implement real backups; test restore.

Storage sanity checklist (before running databases/containers)

  1. Confirm where write-heavy data will live (prefer XFS for churn-heavy paths).
  2. Ensure you have separate filesystems/volumes if you need blast-radius control.
  3. Validate disk health with SMART and review kernel logs for I/O errors.
  4. Decide on RAID level and failure domain; test a degraded scenario if possible.

Joke #2: If you don’t test your restores, your backup system is just a very expensive place to store optimism.

FAQ

Should I choose Leap 15.6 or Tumbleweed?

If the machine must be predictable and low-maintenance, pick Leap. If you need newest kernels, drivers, and developer stacks and can tolerate churn, pick Tumbleweed.

Is Btrfs safe as a root filesystem?

Yes, especially on Leap where it’s a default and integrated with Snapper. Use it as an OS safety net. For heavy-write data, consider XFS on separate volumes.

Are snapshots the same as backups?

No. Snapshots are same-disk point-in-time views. Backups must live on separate storage (and ideally a separate fault domain) and be restore-tested.

Should I use Wicked or NetworkManager?

Servers: Wicked. Laptops/desktops with Wi‑Fi roaming: NetworkManager. Pick one to avoid conflicting behavior.

What’s the safest update command on Leap?

zypper patch for routine security/recommended fixes. Schedule broader updates deliberately, and reboot when core components change.

How do I roll back after a bad update?

If root is Btrfs with Snapper, use snapshots: identify the pre-change snapshot and roll back. If you’re not using Snapper, your rollback is “restore from backup or rebuild,” which is slower and riskier.

Do I need to disable AppArmor to run containers or web services?

Usually no. If a profile blocks your service, adjust that profile or set it to complain while you diagnose. Disabling AppArmor globally is a blunt instrument.

What’s the best filesystem for a database on Leap?

Commonly XFS (or ext4) on a dedicated volume. If you use Btrfs, do it knowingly and test performance and failure behavior under your workload.

How do I keep my repository setup stable?

Keep repos minimal, prefer official sources, and avoid mixing incompatible repos. If you add a repo, document why, what it provides, and how to remove it cleanly.

What’s the quickest way to diagnose “the server is slow”?

Check load and I/O wait, then disk latency and SMART, then memory pressure, then network. Most “mystery slowness” is storage or a single misbehaving service.

Conclusion: practical next steps

If you want a Linux install you can keep for years, build it like you’ll have to explain it later—because you will. Use Leap 15.6’s strengths: YaST for clarity, Btrfs snapshots for rollback, zypper patches for controlled change, and AppArmor for pragmatic hardening.

Next steps that pay off immediately:

  1. Run through the post-install tasks above and fix any surprises while the system is still “new.”
  2. Decide where write-heavy data lives, and move it off the OS filesystem if needed.
  3. Set a patch cadence and do the first reboot on your schedule, not during an incident.
  4. Implement real backups and perform one restore test this week. Not later.
  5. Write down your repo list, storage layout, and network manager choice in a single ops note you can find in a year.

That’s the stable setup: not perfect, not flashy, just reliably survivable.
