Batch Rename Files Safely: The Script That Won’t Trash Names


Batch renames are the kind of “small change” that shows up on your calendar like a harmless dentist appointment and leaves like a root canal. It’s never just renaming. It’s collisions, weird characters, case-folding, cross-filesystem moves, apps that cache paths, and that one directory where someone stored “final_FINAL_v7 (use this).docx”.

If you’re going to rename hundreds or millions of files, you need a plan that assumes failure. Not because you’re careless—because filesystems, human naming habits, and corporate timelines are all adversarial. This is the practical, production-minded approach: dry runs that mean something, two-phase renames to avoid collisions, audit logs for rollback, and performance checks so you don’t turn your NAS into a space heater.

Non-negotiables for safe renames

1) A rename is metadata—until it isn’t

On a single filesystem, rename() is usually “just metadata”: fast, atomic, and not copying file contents. That’s the good news. The bad news is that your tool might not be doing a pure rename. If your “rename” crosses filesystem boundaries—say you move files from one mount to another—it can degrade into copy+delete. That changes timing, failure behavior, permissions, and your recovery options.

Decision: treat every bulk rename as a change management event. You plan for atomicity, but you verify you’re actually getting it.
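A cheap guard is to compare device IDs before moving anything: if the source and any staging location report different devices, your "rename" will silently become copy+delete. A minimal sketch, assuming GNU stat and hypothetical paths (the full script later in this article renames in place, so it sidesteps the problem entirely):

# Hypothetical paths; substitute your own. stat -c %d prints the device number.
src_dir="/data/projects/clientA"
staging_dir="/data/projects/clientA/.staging"
if [[ "$(stat -c %d -- "$src_dir")" != "$(stat -c %d -- "$staging_dir")" ]]; then
  echo "Different devices: this 'rename' would become copy+delete. Aborting." >&2
  exit 1
fi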

2) Dry-run means “predict exactly what will happen”

A dry-run that prints a vague list is not a dry-run. Your dry-run should produce:

  • a one-to-one mapping of source path → destination path
  • a collision report (including case-fold collisions)
  • a rollback plan (a reverse mapping you can replay)
  • a stable ordering (so reruns match)

Dry-run output should be something you can diff, review, and sign off. If you can’t explain why one specific file will become one specific new name, you’re not dry-running. You’re hoping.
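Diffable in practice means a second planning run can be compared mechanically against the one you reviewed. A minimal sketch, assuming two hypothetical plan files in the TSV layout used later in this article (source and destination in the first two columns):

cr0x@server:~$ diff <(cut -f1,2 /tmp/rename.plan.reviewed.tsv) <(cut -f1,2 /tmp/rename.plan.rerun.tsv)

No output means every source still maps to the same destination; any output means the plan drifted and needs another review before anyone types apply.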

3) Two-phase rename prevents self-collisions

The classic faceplant: renaming a.txt to A.txt on a case-insensitive filesystem, or renaming multiple files into the same normalized form (spaces to underscores, lowercasing, stripping punctuation). The fix is boring and correct: rename everything to temporary unique names first, then to final names.
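In miniature, for a single case-only rename, the two-phase dance looks like this (hypothetical names; the script later in this article automates it for the whole tree):

cr0x@server:~$ mv -- a.txt .rename_tmp.a.txt
cr0x@server:~$ mv -- .rename_tmp.a.txt A.txt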

4) Logs are not optional—rollback needs receipts

You want an append-only log with the exact mv operations performed, in order, plus any skips and errors. In incident land, the difference between “we can roll back” and “we think we can roll back” is an audit file you can replay.

5) “Works on my laptop” is not a filesystem strategy

Filesystem semantics vary: ext4, XFS, ZFS, SMB shares, NFS mounts, object-backed FUSE mounts—each has quirks. Some are case-insensitive, some have odd limits, some are slow at directory operations when huge. Your rename plan must include a quick read of what you’re standing on.

One quote to keep on your desk: “Hope is not a strategy.” (Variously attributed; universally true in operations.)

Interesting facts and small history

  1. POSIX rename atomicity: On a single filesystem, rename() is designed to be atomic: you don’t get a half-renamed filename. That’s why it’s the backbone of safe file updates.
  2. Windows drove case-insensitive expectations: Case-insensitive filesystems became common in desktop land, and they still surprise engineers when code assumes Report.csv and report.csv are distinct.
  3. Early Unix tools assumed “no spaces”: A lot of classic shell patterns were born in an era when filenames with spaces were considered user error. Today they’re Tuesday.
  4. SMB and NFS can add latency to metadata: A rename is metadata, but remote metadata can be slow and chatty, especially over loaded links or with strict consistency.
  5. Directory size matters: Many filesystems handle huge directories well now, but operations like listing and stat’ing millions of entries still become the bottleneck during planning and verification.
  6. Unicode normalization is a real trap: Some platforms normalize Unicode differently, so a “visually identical” name can be a different byte sequence. Bulk renames can accidentally “dedupe” by collision.
  7. Hard links complicate expectations: Renaming a hard-linked file changes the directory entry name, not the inode. If you expected “two files,” you might actually have one inode with two names.
  8. Snapshots changed the rename game: With ZFS and similar systems, taking a snapshot before the rename makes rollback “restore names” more feasible—sometimes by just rolling back the dataset.

What actually goes wrong (and why)

Collision: two old names become one new name

Sanitizing names is where collisions breed. Convert spaces to underscores, lowercase everything, strip punctuation, and suddenly:

  • ACME - Q4.csv
  • ACME_Q4.csv
  • acme q4.csv

…all want to become acme_q4.csv. If your script plays “last write wins,” you just lost data without ever writing to a file: the earlier files are clobbered, and the survivor now sits under a name that used to mean something else. That’s worse than a loud failure: silent corruption of meaning.

Case-only changes on case-insensitive mounts

On a case-insensitive filesystem, renaming foo to Foo may be treated as “no-op” or may require a dance (rename to temp, then rename to desired). Remote shares make this even more fun.

Cross-filesystem “rename” becomes copy+delete

If you use tools that move files between mounts while “renaming,” you’re not doing atomic metadata updates anymore. You’re copying data, consuming bandwidth, and creating a failure mode where you have partial copies and missing originals.

Permissions and ownership drift

A pure rename keeps inode metadata. A copy+delete doesn’t. Suddenly the “same file” has a new owner, different ACLs, and a different SELinux context. This is how you get an outage caused by a rename.

Apps don’t like their paths changing

Some apps store absolute paths in databases, config files, or caches. Renaming files underneath them is like moving someone’s desk while they’re in a meeting. It’s hilarious until you’re the one presenting.

Joke #1: Renaming files in production is like reorganizing the kitchen at 2 a.m.—you’ll remember where nothing is, and everyone will find out at breakfast.

Practical tasks: commands, outputs, decisions

Below are real operational tasks you run before you rename, during the run, and after you think you’re done. Each task includes a command, sample output, and the decision you make from it.

Task 1: Confirm where the data lives (don’t cross filesystems by accident)

cr0x@server:~$ df -Th /data/projects
Filesystem     Type  Size  Used Avail Use% Mounted on
tank/projects  zfs   8.0T  5.1T  2.9T  64% /data/projects

What it means: You’re on ZFS dataset tank/projects. If you rename within this mount, it’s metadata-only and snapshot-friendly.

Decision: Ensure your script does not move files to a different mount path. Keep source and destination under the same filesystem.

Task 2: Check mount options and whether it’s a network share

cr0x@server:~$ mount | grep -E ' /data/projects | type (nfs|cifs)'
tank/projects on /data/projects type zfs (rw,xattr,noacl)

What it means: Local ZFS, not NFS/SMB. Rename latency should be stable.

Decision: If you see NFS/CIFS, schedule the rename for off-peak and expect metadata round trips.

Task 3: Count files and directories (scope the blast radius)

cr0x@server:~$ find /data/projects/clientA -type f | wc -l
284913

What it means: ~285k files. Planning steps that do a full stat() on every file will take time.

Decision: Use efficient scans; avoid per-file subprocesses if you can. Consider batching or parallelism carefully (more on that later).

Task 4: Identify “problem characters” in filenames (spaces, newlines, control chars)

cr0x@server:~$ find /data/projects/clientA -type f -name $'*\n*' -print | head
/data/projects/clientA/inbox/weird
name.txt

What it means: You have at least one filename containing a newline. That’s not theoretical; it’s right there.

Decision: Your pipeline must be NUL-delimited (-print0 + read -d '') and your logs must be unambiguous (escape or NUL-safe formats).
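The NUL-safe skeleton for any per-file step looks like the sketch below; nothing between find and the loop body is newline-delimited, so names like the one above can’t split into two records (the printf is a placeholder for your real work):

find /data/projects/clientA -type f -print0 |
while IFS= read -r -d '' f; do
  # "$f" holds the full path, newline and all; do the real per-file work here
  printf 'would process: %q\n' "$f"
done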

Task 5: Check for leading dashes (tools interpret them as options)

cr0x@server:~$ find /data/projects/clientA -type f -name '-*' | head
/data/projects/clientA/inbox/-final.pdf

What it means: At least one file begins with -.

Decision: Always use mv -- "$src" "$dst" so filenames are never parsed as flags.

Task 6: Detect case-fold collisions that will bite you on SMB or macOS

cr0x@server:~$ find /data/projects/clientA -maxdepth 2 -type f -printf '%f\n' | awk '{print tolower($0)}' | sort | uniq -d | head
readme.txt

What it means: There are at least two files whose names differ only by case somewhere in that shallow scan.

Decision: If the destination environment is case-insensitive (or might be), you need a normalization policy and collision resolution (suffixes, hashing, or preserving original case).

Task 7: Find duplicate basenames after your intended normalization (preview collisions)

cr0x@server:~$ find /data/projects/clientA -type f -printf '%p\n' | \
awk -F/ '{
  base=$NF
  norm=tolower(base)
  gsub(/[^a-z0-9._-]+/,"_",norm)
  print norm
}' | sort | uniq -c | awk '$1>1{print $0}' | head
2 acme_q4.csv
3 invoice_2024_01.pdf

What it means: Your proposed normalization will create collisions for at least two names.

Decision: Decide collision policy now: skip, suffix with __DUP2, include parent directory, or append a short hash.

Task 8: Take a snapshot (if available) before changing names

cr0x@server:~$ zfs snapshot tank/projects@pre_rename_clientA
cr0x@server:~$ zfs list -t snapshot -o name,creation | grep pre_rename_clientA
tank/projects@pre_rename_clientA  Thu Feb  5 10:12 2026

What it means: You now have a point-in-time snapshot of the dataset.

Decision: If the rename goes sideways, you can roll back the dataset (big hammer) or use snapshot browsing to restore names.
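For reference, the big hammer itself is a single command; be aware it discards everything written to the dataset after the snapshot, not just your renames, and needs -r if newer snapshots exist:

cr0x@server:~$ zfs rollback tank/projects@pre_rename_clientA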

Task 9: Measure baseline metadata performance (so you can spot bottlenecks)

cr0x@server:~$ /usr/bin/time -f 'elapsed=%E cpu=%P' bash -c 'find /data/projects/clientA -maxdepth 2 -type f -print0 | xargs -0 -n 1000 stat >/dev/null'
elapsed=0:08.41 cpu=92%

What it means: Stat’ing a sample is fast and CPU-heavy (good sign: local metadata, not network-limited).

Decision: If elapsed time is huge and CPU is low, you’re I/O or network bound. Plan smaller batches and off-peak execution.

Task 10: Check inode exhaustion (yes, still happens)

cr0x@server:~$ df -i /data/projects
Filesystem     Inodes   IUsed   IFree IUse% Mounted on
tank/projects       -       -       -     - /data/projects

What it means: ZFS allocates inodes dynamically, so it doesn’t report a fixed inode budget the way ext4 does; there’s no classic inode count to exhaust here.

Decision: On ext4/XFS you would verify inode availability. On ZFS, focus on space and metadata performance instead.

Task 11: Build a deterministic file list (stable ordering for repeatability)

cr0x@server:~$ cd /data/projects/clientA
cr0x@server:~$ find . -type f -print0 | sort -z > /tmp/clientA.files.zlist
cr0x@server:~$ python3 - <<'PY'
import os
p="/tmp/clientA.files.zlist"
print("bytes", os.path.getsize(p))
PY
bytes 19844712

What it means: You now have a stable NUL-delimited list you can reuse for dry-run and execution.

Decision: Freeze scope. If new files arrive during the rename window, handle them in a second pass—don’t chase a moving target.
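Computing that second pass later is cheap: regenerate the list and compare it to the frozen one. A sketch assuming GNU coreutils new enough for comm --zero-terminated (the tr at the end is only to make the result readable):

cr0x@server:~$ cd /data/projects/clientA
cr0x@server:~$ find . -type f -print0 | sort -z > /tmp/clientA.files.now.zlist
cr0x@server:~$ comm -z -13 /tmp/clientA.files.zlist /tmp/clientA.files.now.zlist | tr '\0' '\n' | head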

Task 12: Run a real dry-run mapping (source → dest) and inspect it

cr0x@server:~$ bash safe_rename.sh --plan --root /data/projects/clientA --rule 'lower,sanitize' --out /tmp/rename.plan.tsv
Plan written: /tmp/rename.plan.tsv
Rows: 284913
Collisions: 17
Skips: 9211
Refusing to proceed with collisions under policy=error.

What it means: Your plan found 17 collisions. Good: it didn’t pretend everything was fine.

Decision: Stop and resolve collisions before executing. Either adjust rules or choose a collision policy.

Task 13: Verify collisions are truly problematic (spot-check)

cr0x@server:~$ awk -F'\t' '$3=="COLLISION"{print $0}' /tmp/rename.plan.tsv | head -3
/data/projects/clientA/inbox/ACME_Q4.csv	/data/projects/clientA/inbox/acme_q4.csv	COLLISION	with:/data/projects/clientA/inbox/acme q4.csv
/data/projects/clientA/finance/Invoice 2024_01.pdf	/data/projects/clientA/finance/invoice_2024_01.pdf	COLLISION	with:/data/projects/clientA/finance/Invoice_2024_01.pdf
/data/projects/clientA/finance/INVOICE 2024 01.PDF	/data/projects/clientA/finance/invoice_2024_01.pdf	COLLISION	with:/data/projects/clientA/finance/Invoice 2024_01.pdf

What it means: Two different original names converge. This is not a false positive.

Decision: Choose: append hash suffixes, preserve one naming variant, or segregate duplicates into a quarantine directory.

Task 14: Execute a small canary subset first

cr0x@server:~$ head -z -n 2000 /tmp/clientA.files.zlist > /tmp/clientA.canary.zlist
cr0x@server:~$ bash safe_rename.sh --apply --filelist /tmp/clientA.canary.zlist --root /data/projects/clientA --rule 'lower,sanitize' --log /tmp/rename.canary.log
Applied.
Renamed: 1983
Unchanged: 17
Errors: 0

What it means: A canary run succeeded. Some files were unchanged (already normalized or excluded).

Decision: Validate downstream apps and permissions now, before touching the other 282k files.

Task 15: Validate that the rename didn’t change file contents (sampling)

cr0x@server:~$ awk -F'\t' '$2=="RENAMED"{print $3"\t"$4}' /tmp/rename.canary.log | head -5
/data/projects/clientA/inbox/Report 1.txt	/data/projects/clientA/inbox/report_1.txt
/data/projects/clientA/inbox/Spec (Draft).pdf	/data/projects/clientA/inbox/spec_draft_.pdf
/data/projects/clientA/inbox/notes!.md	/data/projects/clientA/inbox/notes_.md
/data/projects/clientA/inbox/Q4#plan.xlsx	/data/projects/clientA/inbox/q4_plan.xlsx
/data/projects/clientA/inbox/Team Photo.JPG	/data/projects/clientA/inbox/team_photo.jpg
cr0x@server:~$ src="/data/projects/clientA/inbox/Report 1.txt"
cr0x@server:~$ dst="/data/projects/clientA/inbox/report_1.txt"
cr0x@server:~$ sha256sum "$dst"
8c1b5e9b6bbfe19c1f0f3f51c2d7e1ce0b41b2dcf2a44d26e7e4a6e93c39d9d0  /data/projects/clientA/inbox/report_1.txt

What it means: You can hash the renamed file. For pure rename, contents should match, but you’re sampling to catch accidental copy/transcode workflows.

Decision: If hashes differ for “renamed” files, you’re not doing renames—you’re doing something else. Stop.

Task 16: Monitor rename progress and error rate live

cr0x@server:~$ tail -f /tmp/rename.full.log
2026-02-05T10:22:11Z	RENAMED	/data/projects/clientA/inbox/Team Photo.JPG	/data/projects/clientA/inbox/team_photo.jpg
2026-02-05T10:22:11Z	UNCHANGED	/data/projects/clientA/inbox/readme.txt	/data/projects/clientA/inbox/readme.txt
2026-02-05T10:22:11Z	ERROR	/data/projects/clientA/legal/Contract?.pdf	Permission denied

What it means: You’re seeing one permission failure.

Decision: Don’t keep plowing forward if errors cluster. Decide whether to skip and report, or fix permissions and retry.
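Whether errors cluster is answerable from the log itself: strip the filename off each ERROR path and count by directory (a sketch against the log format the script below writes):

cr0x@server:~$ awk -F'\t' '$2=="ERROR"{p=$3; sub(/\/[^\/]*$/,"",p); print p}' /tmp/rename.full.log | sort | uniq -c | sort -rn | head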

The script: dry-run first, two-phase rename, rollback log

This script is built for the messy world: spaces, leading dashes, collisions, and filesystems that don’t share your optimism. One caveat: the plan and log files are line-oriented TSV, so names containing tabs or newlines (the ones Task 4 flags) should be quarantined and handled by hand first. It supports:

  • Plan mode: produces a TSV plan with collision detection
  • Apply mode: performs a two-phase rename to avoid collisions
  • Rollback: uses the log to reverse the operation
  • Stable scope: optional NUL-delimited file list so you rename exactly what you reviewed

Opinion: if you’re renaming more than a few hundred files, do not “just run a one-liner.” One-liners are great until you need to explain to Audit why revenue_final.xlsx vanished. Use a script with logs.

cr0x@server:~$ cat safe_rename.sh
#!/usr/bin/env bash
set -euo pipefail

usage() {
  cat <<'USAGE'
safe_rename.sh --plan|--apply|--rollback
  --root PATH                 Root directory containing files to rename
  --rule RULES                Comma-separated rules: lower,sanitize
  --out PLAN.tsv              (plan) output plan TSV
  --log LOG.tsv               (apply/rollback) operation log
  --filelist FILE.zlist       Optional NUL-delimited, sorted file list (relative or absolute)
  --collision-policy POLICY   one of: error, suffix-hash
USAGE
}

mode=""
root=""
rules=""
out=""
log=""
filelist=""
collision_policy="error"

while [[ $# -gt 0 ]]; do
  case "$1" in
    --plan|--apply|--rollback) mode="${1#--}"; shift ;;
    --root) root="$2"; shift 2 ;;
    --rule) rules="$2"; shift 2 ;;
    --out) out="$2"; shift 2 ;;
    --log) log="$2"; shift 2 ;;
    --filelist) filelist="$2"; shift 2 ;;
    --collision-policy) collision_policy="$2"; shift 2 ;;
    -h|--help) usage; exit 0 ;;
    *) echo "Unknown arg: $1" >&2; usage; exit 2 ;;
  esac
done

[[ -n "$mode" ]] || { echo "Missing mode" >&2; usage; exit 2; }
[[ "$mode" == "rollback" || -n "$root" ]] || { echo "Missing --root" >&2; exit 2; }
[[ -n "$rules" || "$mode" == "rollback" ]] || { echo "Missing --rule" >&2; exit 2; }

ts() { date -u +"%Y-%m-%dT%H:%M:%SZ"; }

norm_name() {
  local name="$1"
  local outname="$name"

  IFS=',' read -r -a rr <<< "$rules"
  for r in "${rr[@]}"; do
    case "$r" in
      lower) outname="$(printf '%s' "$outname" | tr '[:upper:]' '[:lower:]')" ;;
      sanitize)
        outname="$(printf '%s' "$outname" | sed -E 's/[^a-zA-Z0-9._-]+/_/g; s/^_+//; s/_+$//')"
        ;;
      *) echo "Unknown rule: $r" >&2; exit 2 ;;
    esac
  done
  printf '%s' "$outname"
}

sha8() {
  # short stable suffix for collision avoidance
  printf '%s' "$1" | sha256sum | awk '{print substr($1,1,8)}'
}

emit_file_list() {
  # Output NUL-delimited absolute file paths
  if [[ -n "$filelist" ]]; then
    # filelist may contain relative entries; anchor under root if relative
    while IFS= read -r -d '' p; do
      if [[ "$p" = /* ]]; then
        printf '%s\0' "$p"
      else
        printf '%s\0' "$root/$p"
      fi
    done < "$filelist"
  else
    find "$root" -type f -print0
  fi
}

plan() {
  [[ -n "$out" ]] || { echo "Missing --out" >&2; exit 2; }
  : > "$out"

  declare -A seen=()
  local collisions=0 rows=0 skips=0

  while IFS= read -r -d '' src; do
    rows=$((rows+1))
    local dir base dstbase dst status extra
    dir="$(dirname -- "$src")"
    base="$(basename -- "$src")"
    dstbase="$(norm_name "$base")"
    dst="$dir/$dstbase"

    if [[ "$src" == "$dst" ]]; then
      status="UNCHANGED"; extra=""
      skips=$((skips+1))
    else
      # collision check: destination path already targeted by another source
      if [[ -n "${seen[$dst]:-}" ]]; then
        status="COLLISION"; extra="with:${seen[$dst]}"
        collisions=$((collisions+1))
      else
        status="OK"; extra=""
      fi
      seen[$dst]="$src"
    fi

    printf '%s\t%s\t%s\t%s\n' "$src" "$dst" "$status" "$extra" >> "$out"
  done < <(emit_file_list)

  echo "Plan written: $out"
  echo "Rows: $rows"
  echo "Collisions: $collisions"
  echo "Skips: $skips"

  if [[ "$collisions" -gt 0 && "$collision_policy" == "error" ]]; then
    echo "Refusing to proceed with collisions under policy=error." >&2
  fi
}

apply() {
  [[ -n "$log" ]] || { echo "Missing --log" >&2; exit 2; }

  local planfile
  planfile="$(mktemp /tmp/rename.plan.XXXXXX.tsv)"
  out="$planfile"
  plan >/tmp/rename.plan.summary

  if grep -q $'\tCOLLISION\t' "$planfile"; then
    if [[ "$collision_policy" == "error" ]]; then
      echo "Collisions detected; aborting. See: $planfile" >&2
      exit 3
    fi
  fi

  : > "$log"

  # Phase 1: rename to temporary unique names in same directory
  # We only rename items that will change.
  while IFS=$'\t' read -r src dst status extra; do
    [[ "$status" == "OK" ]] || continue

    local dir base tmp
    dir="$(dirname -- "$src")"
    base="$(basename -- "$src")"
    tmp="$dir/.rename_tmp.$(sha8 "$src").$base"

    # Ensure temp name doesn't exist
    if [[ -e "$tmp" ]]; then
      echo -e "$(ts)\tERROR\t$src\t$temp_exists" >> "$log"
      echo "Temp already exists: $tmp" >&2
      exit 4
    fi

    mv -- "$src" "$tmp"
    printf '%s\tPHASE1\t%s\t%s\t%s\n' "$(ts)" "$src" "$tmp" "$dst" >> "$log"
  done < "$planfile"

  # Phase 2: rename temp to final, with optional collision suffixing
  while IFS=$'\t' read -r src dst status extra; do
    if [[ "$status" == "UNCHANGED" ]]; then
      printf '%s\tUNCHANGED\t%s\t%s\n' "$(ts)" "$src" "$dst" >> "$log"
      continue
    fi
    [[ "$status" == "OK" || "$status" == "COLLISION" ]] || continue

    local dir base tmp final
    dir="$(dirname -- "$src")"
    base="$(basename -- "$src")"
    tmp="$dir/.rename_tmp.$(sha8 "$src").$base"
    final="$dst"

    if [[ "$status" == "COLLISION" && "$collision_policy" == "suffix-hash" ]]; then
      # append hash before extension
      local b ext h
      h="$(sha8 "$src")"
      b="$(basename -- "$dst")"
      ext=""
      if [[ "$b" == *.* && "$b" != .* ]]; then
        ext=".${b##*.}"
        b="${b%.*}"
      fi
      final="$dir/${b}__${h}${ext}"
    elif [[ "$status" == "COLLISION" ]]; then
      printf '%s\tERROR\t%s\t%s\n' "$(ts)" "$src" "collision" >> "$log"
      continue
    fi

    if [[ -e "$final" ]]; then
      printf '%s\tERROR\t%s\t%s\n' "$(ts)" "$src" "dest_exists:$final" >> "$log"
      continue
    fi

    mv -- "$tmp" "$final"
    printf '%s\tRENAMED\t%s\t%s\n' "$(ts)" "$src" "$final" >> "$log"
  done < "$planfile"

  echo "Applied."
  echo "Renamed: $(awk -F'\t' '$2=="RENAMED"{c++} END{print c+0}' "$log")"
  echo "Unchanged: $(awk -F'\t' '$2=="UNCHANGED"{c++} END{print c+0}' "$log")"
  echo "Errors: $(awk -F'\t' '$2=="ERROR"{c++} END{print c+0}' "$log")"
}

rollback() {
  [[ -n "$log" ]] || { echo "Missing --log" >&2; exit 2; }
  # Roll back in reverse order; only RENAMED and PHASE1 entries matter.
  tac "$log" | while IFS=$'\t' read -r t action a b; do
    case "$action" in
      RENAMED)
        # a=src(original), b=final; move final back to original name
        if [[ -e "$b" ]]; then
          mv -- "$b" "$a"
          printf '%s\tROLLED_BACK\t%s\t%s\n' "$(ts)" "$b" "$a"
        else
          printf '%s\tROLLBACK_MISSING\t%s\t%s\n' "$(ts)" "$b" "$a"
        fi
        ;;
      PHASE1)
        # a=original, b=temp, c=planned dst; if the temp still exists (crash between
        # phases), put the original name back. Completed renames are handled above.
        if [[ -e "$b" && ! -e "$a" ]]; then
          mv -- "$b" "$a"
          printf '%s\tROLLED_BACK\t%s\t%s\n' "$(ts)" "$b" "$a"
        fi
        ;;
      *) : ;;
    esac
  done
}

case "$mode" in
  plan) plan ;;
  apply) apply ;;
  rollback) rollback ;;
  *) echo "Bad mode: $mode" >&2; exit 2 ;;
esac

How to run it safely (the way adults do)

1) Generate a file list (optional but recommended). 2) Plan. 3) Review collisions. 4) Canary apply. 5) Full apply with logging. 6) Verify.

cr0x@server:~$ cd /data/projects/clientA
cr0x@server:~$ find . -type f -print0 | sort -z > /tmp/clientA.files.zlist
cr0x@server:~$ bash safe_rename.sh --plan --root /data/projects/clientA --filelist /tmp/clientA.files.zlist --rule 'lower,sanitize' --out /tmp/clientA.plan.tsv
Plan written: /tmp/clientA.plan.tsv
Rows: 284913
Collisions: 17
Skips: 9211
Refusing to proceed with collisions under policy=error.

Now you make a choice. If collisions are unacceptable, fix naming rules or handle them with a policy. If collisions are expected and you need deterministic uniqueness, use --collision-policy suffix-hash. That produces filenames that are stable and explainable.
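Stable and explainable because the suffix is derived from the original path, not from randomness. Anyone can recompute it later; this is the same transform the script’s sha8 helper applies (hypothetical path shown):

cr0x@server:~$ printf '%s' '/data/projects/clientA/inbox/ACME_Q4.csv' | sha256sum | awk '{print substr($1,1,8)}'

The same input path always yields the same eight hex characters, so a rerun of the plan produces identical final names.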

cr0x@server:~$ bash safe_rename.sh --apply --root /data/projects/clientA --filelist /tmp/clientA.files.zlist --rule 'lower,sanitize' --collision-policy suffix-hash --log /tmp/clientA.rename.log
Applied.
Renamed: 275702
Unchanged: 9211
Errors: 0

Joke #2: A bulk rename without a log is like a magic trick without an audience—nobody knows what happened, including you.

Fast diagnosis playbook

When a batch rename is slow or flaky, people love to argue about “the network” or “ZFS being weird.” Don’t argue. Measure three things and you’ll usually find the bottleneck in ten minutes.

First: Is it local metadata or remote metadata?

  • Check mount type (mount, df -Th). If it’s NFS/SMB/FUSE, expect metadata latency.
  • Symptom of remote metadata: CPU low, elapsed time high, progress in bursts.
cr0x@server:~$ /usr/bin/time -f 'elapsed=%E cpu=%P' bash -c 'for i in {1..2000}; do stat /data/projects/clientA/inbox/readme.txt >/dev/null; done'
elapsed=0:00.36 cpu=99%

Decision: If this is seconds-to-minutes with low CPU, you’re latency-bound. Reduce round trips (batch operations, avoid per-file external commands) and schedule off-peak.

Second: Is the directory huge and listing is the bottleneck?

cr0x@server:~$ /usr/bin/time -f 'elapsed=%E' ls -U /data/projects/clientA/inbox >/dev/null
elapsed=0:02.91

Decision: If listing a single directory takes seconds, renaming thousands inside it will hurt. Break work by subdirectories, or consider reorganizing before renaming.

Third: Are you accidentally copying file contents?

cr0x@server:~$ awk '$2=="/data/projects"{print}' /proc/mounts
tank/projects /data/projects zfs rw,xattr,noacl 0 0

Decision: Ensure source and destination paths remain within /data/projects. If your tool is “moving” into another mount, stop and redesign.
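A quicker check than reading /proc/mounts by hand: ask GNU stat for the mount point of both endpoints and compare (findmnt --target works too):

cr0x@server:~$ stat -c %m /data/projects/clientA /data/projects/clientA/inbox
/data/projects
/data/projects

Two identical mount points means a rename stays a rename.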

Bonus: spot filesystem-level contention quickly

cr0x@server:~$ iostat -xz 1 3
Linux 6.6.0 (server)  02/05/2026  _x86_64_ (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.15    0.00    6.42    1.11    0.00   80.32

Device            r/s     rkB/s   rrqm/s  %rrqm  r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm  w_await wareq-sz  aqu-sz  %util
nvme0n1          2.10     88.3     0.00   0.00    1.42    42.0     55.10   1400.0    10.00  15.36    4.10    25.4    0.25  12.4

Decision: If %util is pinned and await grows, you’re storage-bound. Slow down concurrency, and avoid parallel rename storms.

Common mistakes: symptom → root cause → fix

1) Symptom: “Some files disappeared”

Root cause: Collision overwrite. Your script renamed multiple sources to one destination and overwrote or skipped silently.

Fix: Two-phase rename plus explicit collision policy. Default should be “error and stop,” not “shrug and continue.” Use --collision-policy suffix-hash only when stakeholders accept uglier names.

2) Symptom: “Rename was fast in dev, slow in prod”

Root cause: Prod is on NFS/SMB or has a loaded metadata server. Dev was local SSD.

Fix: Measure metadata latency. Reduce per-file subprocesses. Run in off-hours. If you must, parallelize cautiously (a few workers), not “one per core.”

3) Symptom: “We changed only case, but nothing happened”

Root cause: Case-insensitive filesystem treats foo and Foo as same entry, so rename becomes no-op or errors.

Fix: Two-step rename: foo → temporary name → Foo. The script’s two-phase approach effectively does this.

4) Symptom: “Half the run failed with Permission denied”

Root cause: Some directories/files have stricter permissions, ACLs, or immutable flags. Or you’re renaming under a directory you can read but not write.

Fix: Preflight with write checks on directories, not just files. On Linux, renaming needs write and execute permission on the containing directory; the file’s own mode doesn’t matter.

cr0x@server:~$ namei -l /data/projects/clientA/legal/Contract?.pdf
f: /data/projects/clientA/legal/Contract?.pdf
drwxr-xr-x root root /
drwxr-xr-x root root data
drwxr-xr-x root root projects
drwxr-x--- team legal clientA
dr-xr-x--- team legal legal
-rw-r----- team legal Contract?.pdf

Decision: If the target directory isn’t writable (dr-x), your rename will fail. Fix directory perms or exclude that subtree.
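You can find every directory in scope that will produce this failure before running anything; GNU find’s -writable test checks effective access for the current user (a sketch):

cr0x@server:~$ find /data/projects/clientA -type d ! -writable -print | head
/data/projects/clientA/legal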

5) Symptom: “Rollback didn’t restore everything”

Root cause: No complete mapping, or you edited the log, or the run had mixed outcomes (some PHASE1 temps left behind due to crash).

Fix: Treat logs as append-only. Store them off-host if you’re paranoid. After any error, inventory temp names (.rename_tmp.) and resolve them before rerunning.

cr0x@server:~$ find /data/projects/clientA -name '.rename_tmp.*' | head
/data/projects/clientA/inbox/.rename_tmp.1a2b3c4d.Report 1.txt

Decision: If temps exist, stop doing “new runs.” Cleanly finish Phase 2 or roll back.

6) Symptom: “App can’t find files after rename”

Root cause: The app stored absolute paths; you changed them without updating references.

Fix: Either update references (DB/config) in the same change window, or keep compatibility via symlinks (careful: symlinks don’t help on all platforms and can confuse backup tools).
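If you choose the symlink route, the rename log already contains the mapping you need. A sketch that creates old-name → new-name symlinks from the RENAMED lines (try it on the canary scope first, and only where the old path is actually free):

awk -F'\t' '$2=="RENAMED"{print $3 "\t" $4}' /tmp/clientA.rename.log |
while IFS=$'\t' read -r old new; do
  # The old path should be vacant after the rename; leave it alone if something reappeared there.
  [[ -e "$old" || -L "$old" ]] || ln -s -- "$new" "$old"
done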

Checklists / step-by-step plan

Preflight checklist (don’t skip this unless you enjoy surprises)

  1. Confirm filesystem and mount type (df -Th, mount). Decide if metadata is local or remote.
  2. Confirm scope: count files, identify giant directories.
  3. Scan for pathological filenames: newlines, leading dashes, odd Unicode. Decide if you will sanitize or preserve.
  4. Define naming rules: lowercasing? whitespace? punctuation? extension handling? Decide what “correct” is.
  5. Collision policy: default to stop-on-collision; use suffix-hash only when needed.
  6. Backups/snapshots: take a snapshot if you can; otherwise ensure backup tooling will still track renames.
  7. Freeze intake: stop new files arriving, or commit to a second pass.

Execution plan (the part you can put in a change ticket)

  1. Create a stable file list (NUL-delimited, sorted).
  2. Generate plan TSV; review collision report.
  3. Run a canary apply on 1–2k files from representative directories.
  4. Validate a downstream consumer (search index, build system, app) with the canary results.
  5. Apply full rename with logging enabled; tail the log.
  6. After completion: scan for temp leftovers, error lines, and unexpected name patterns.
  7. Communicate: publish the rename log location and the normalization rules used.

Post-rename verification checklist

  1. Errors: awk count of ERROR entries should be zero or explained.
  2. Temp files: no .rename_tmp.* should remain.
  3. Collision outcomes: if suffix-hash used, ensure stakeholders accept the format.
  4. App sanity: key workflows still resolve paths.
  5. Backup sanity: next backup run doesn’t treat everything as “new unrelated data.”
cr0x@server:~$ awk -F'\t' '$2=="ERROR"{c++} END{print c+0}' /tmp/clientA.rename.log
0

Decision: If non-zero, decide whether to fix and rerun for just those paths (preferred) or roll back.

cr0x@server:~$ find /data/projects/clientA -name '.rename_tmp.*' | wc -l
0

Decision: If non-zero, you have an interrupted run or manual interference. Resolve before declaring victory.

Three corporate mini-stories from the rename trenches

Incident #1: The wrong assumption (“rename is always atomic”)

The request looked harmless: “Normalize file names in the export directory to lowercase.” The directory lived under /exports, and the engineer assumed it was a normal local mount. They wrote a quick script that moved files into a staging directory (also under /exports), renamed them, and moved them back. It worked fine in test.

In production, /exports was an autofs-managed NFS mount, and the staging directory was actually a different mount point on a different backend. The “move” silently became copy+delete. Mid-run, the network got cranky. The copy phase slowed; the delete phase still happened for some files; the script retried a few times, producing a garden of partial files and missing originals.

Now the fun part: downstream jobs saw “new” files with new names, plus missing expected ones. Their reconciliation logic was path-based, not content-based, so it flagged a pile of false discrepancies. Finance teams joined the chat. Nobody enjoyed it.

The fix wasn’t fancy. They re-ran the operation as a pure in-place rename within a single filesystem, after confirming mount boundaries with df -Th. They also added a hard “refuse to operate if destination is on a different device” check. This is the kind of guardrail you don’t appreciate until you need it.

Incident #2: The optimization that backfired (“parallelize it”)

A different team had millions of small image files to rename. An engineer saw a wall-clock estimate that made their eye twitch and did what engineers do: launched parallel jobs. Not just a few—dozens of workers, each hammering mv and stat in tight loops.

The storage was a shared NAS with a metadata server that was already busy. Renames are metadata operations, and metadata operations can be the most serialized thing in your fleet. The parallel run turned a manageable queue into a thundering herd. Latency spiked for unrelated workloads. Builds slowed. Someone’s CI pipeline timed out and retried, making the overall situation worse.

The kicker: total completion time barely improved. The workload wasn’t CPU-bound; it was metadata-bound. More workers just meant more waiting in line while also annoying everyone else in the building.

They recovered by dialing concurrency down to a small fixed number, batching operations, and doing a canary to measure throughput before committing. They also moved the job window to off-peak and coordinated with the storage team. The “optimization” wasn’t parallelism; it was not fighting the storage system’s bottlenecks.

Incident #3: The boring practice that saved the day (snapshots + plan review)

A compliance-driven org needed to sanitize filenames before archiving: remove spaces, strip special characters, keep extensions. The team had the good sense to treat it like an operations change, not a shell party trick. They took a snapshot, generated a plan file, and had a second engineer review the collision report.

During review, they found a pattern: a subset of files used punctuation to encode meaning (think “A/B test” style naming), and sanitization would collapse distinct categories. Not data loss—worse, mislabeling. The names would still exist, but the semantics would not.

They adjusted the rule set to preserve certain characters and added suffix-hash only for the handful of remaining collisions. Then they ran a canary, validated that an internal indexing job still matched expected patterns, and proceeded with full execution. The archive job later completed without re-ingesting everything because the storage system recognized the inodes; only names changed.

Nothing about it was glamorous. No one got a trophy. But it avoided the kind of slow-motion catastrophe that ends with a conference room and a timeline slide deck.

FAQ

1) Can I just use the rename command?

Sometimes. But there are multiple rename implementations (Perl-based vs util-linux style), different flags, and wildly different behavior. For production work, I prefer a script that logs every action and does collision detection explicitly.

2) Why not use find ... -exec mv?

You can, but it’s easy to create per-file subprocess overhead and almost impossible to do a meaningful dry-run with collision detection. Also, you’ll hate yourself when a newline filename breaks your log parsing.

3) Will renaming change file contents or timestamps?

A pure rename on the same filesystem doesn’t change file contents. It typically doesn’t modify mtime either, but ctime (inode change time) will change because metadata changed. If you see content changes, you likely did copy+delete by crossing filesystems.

4) What about hard links—will I duplicate data?

Renaming doesn’t duplicate data. But hard links mean multiple names point to the same inode. If you rename one name, the other names still exist. If you expected “one file,” you might find “another file” still sitting there under an older name.

5) How do I handle collisions safely?

Default to stopping and fixing your rules. If business requirements demand “no human review,” use deterministic uniqueness (suffix-hash) and document the format. Avoid random suffixes; you want repeatable results.

6) How do I roll back?

Best: filesystem snapshot rollback (when acceptable). Otherwise: replay a reverse mapping from the log. That’s why the script’s RENAMED log lines record the original source path and the final destination path. If you don’t have a full log, rollback becomes archaeology.

7) Should I run this as root?

Only if you must. Root makes permission problems “go away” and replaces them with accountability problems. Prefer running as the service account that owns the data, with explicit access grants where needed.

8) How do I keep apps working after renames?

Either update references (DB/config) as part of the change, or use a compatibility layer like symlinks. Symlinks can break tooling and security assumptions, so test them. For some workflows, you’re better off updating the app than papering over paths.

9) Is lowercasing filenames always a good idea?

No. It’s convenient, but it can destroy meaning (product codes, human conventions) and can introduce collisions. Lowercase only when you have a reason and a collision strategy.

10) How do I handle Unicode safely?

At minimum, treat names as bytes in your pipeline and avoid tools that assume “printable text.” If you must normalize Unicode, do it with a deliberate library approach and collision detection. “Looks the same” is not a guarantee.
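A deliberate library approach can be a one-pass check for names that normalize to the same string. Bash alone can’t do this well, so the sketch below shells out to python3 in the same spirit as the earlier list-size check; it assumes python3 is available, only reports, and never renames:

cr0x@server:~$ find /data/projects/clientA -type f -print0 | python3 -c '
import sys, unicodedata, collections
groups = collections.defaultdict(list)
for raw in sys.stdin.buffer.read().split(b"\0"):
    if not raw:
        continue
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        print("NOT UTF-8, review by hand:", raw)
        continue
    groups[unicodedata.normalize("NFC", name)].append(name)
for names in groups.values():
    if len(names) > 1:
        print("NORMALIZATION COLLISION:", [ascii(n) for n in names])
'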

Next steps you should actually do

  1. Pick the policy: decide your normalization rules and your collision behavior. Write it down in the ticket.
  2. Freeze scope: generate a sorted NUL-delimited file list so you rename exactly what you reviewed.
  3. Plan first: generate the TSV mapping and review collisions like you mean it.
  4. Snapshot if you can: ZFS snapshot, filesystem snapshot, or at least confirm backups are healthy.
  5. Canary run: 1–2k files from representative directories, then validate downstream consumers.
  6. Full run with log: tail it, count errors, and scan for leftover temp names.
  7. Communicate changes: publish the rule set and log path. People will ask “where did my file go” for weeks.

If you remember one thing: a safe batch rename is not about clever string transforms. It’s about proving that every source path maps to one destination path, without collisions, with rollback, and with the filesystem doing what you think it’s doing.
