Docker Compose Profiles: Dev/Prod Stacks Without Duplicate YAML

You know the scene: a repo with docker-compose.yml, docker-compose.dev.yml, docker-compose.prod.yml,
and one more file someone made “temporarily” during an incident. Then a year passes. Now your “dev” stack accidentally enables the prod-grade
reverse proxy, or prod quietly runs the debug image because the override chain is haunted.

Compose profiles are the grown-up answer: one Compose file, multiple stacks, predictable behavior. Less YAML archaeology, fewer “works on my laptop”
surprises, and far fewer Friday-night reversions.

What Compose profiles are (and what they aren’t)

A Compose profile is a label you attach to a service so that the service only starts when that profile is enabled.
It’s basically conditional inclusion. The Compose file remains one coherent model; profiles decide which parts are active for a run.

Here’s the core mental model (a minimal YAML sketch follows the list):

  • Without profiles: running docker compose up starts every service in the file (subject to dependencies).
  • With profiles: services tagged with profiles are excluded unless that profile is enabled via
    --profile or COMPOSE_PROFILES.
  • Default services: services with no profiles: entry behave like “always on”.
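
A minimal sketch of those mechanics, using the same images as the reference compose file later in this article:

services:
  app:
    image: ghcr.io/acme/demo-app:1.8.2   # no profiles key: part of every run
  adminer:
    image: adminer:4
    profiles: ["dev"]                    # created only when the dev profile is enabled

# Enable per run:           docker compose --profile dev up -d
# Or via the environment:   COMPOSE_PROFILES=dev docker compose up -d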

What profiles are not: a full templating system, a secrets manager, or a replacement for a proper deployment tool. They won’t stop you
from doing something reckless; they just make it harder to do it accidentally.

Opinionated guidance: treat profiles as feature gates for runtime topology. Use them to add/remove sidecars, tooling,
dev-only dependencies, and operational helpers. Don’t use them to paper over fundamentally different production architectures. If prod runs
on Kubernetes and dev runs on Compose, fine—profiles still help in dev and in local prod-like validation. But don’t pretend Compose profiles
magically make dev identical to prod. They make it disciplined.

One quote to keep your head straight during the next “just ship it” debate:
Hope is not a strategy. — General Gordon R. Sullivan

Joke #1: If your dev and prod Compose files diverge long enough, they’ll eventually file separate tax returns.

Facts and history: why profiles exist

Profiles feel obvious now, but they’re a response to years of messy reality. Some context helps you understand the sharp edges.

8 concrete facts that matter in practice

  1. Compose started life as Fig (2013–2014 era): it was designed for local multi-container apps, not enterprise deployment.
    Profiles are a later concession to how people actually used it.
  2. Override files became the default workaround: docker-compose.override.yml was a convenience feature,
    and it accidentally trained teams to fork configuration endlessly.
  3. Profiles arrived to reduce YAML sprawl: they let one file represent multiple shapes without a pile of overrides.
  4. Compose V2 shifted into the Docker CLI: docker compose (space) replaced docker-compose (dash)
    for most modern installs. Profiles are much more consistently supported there.
  5. Profiles are resolved client-side: the Compose CLI decides what to create. The Engine isn’t aware of your intent.
    That means the “source of truth” is the Compose config you actually ran.
  6. Profiles interact with dependencies in non-obvious ways: a service with a profile can be pulled in because another service
    depends on it (depending on how you start things). You need to test your startup paths.
  7. Multi-environment drift is an availability problem: duplicate YAML files don’t just waste time—they create unknown unknowns
    that show up during incidents.
  8. Profiles pair well with “operational tooling” containers: backup jobs, migration runners, log shippers, and admin UIs can be
    opt-in without infecting your default stack.

Design principles: how to structure a single-file dev/prod stack

A single Compose file can be clean or cursed. Profiles don’t save you if you design for chaos. Design for predictability instead.

1) Separate “always-on” from “contextual” services

Put your app, its database, and whatever is required to boot into the default set (no profile).
Put developer luxuries (live reload, admin UIs, fake SMTP, local S3, debug shells) behind dev.
Put production-only infra choices (real TLS edge, WAF-ish reverse proxy rules, log forwarders) behind prod or ops.

2) Keep ports boring, stable, and intentional

In dev, you probably publish ports to the host. In prod, you often don’t; you attach to a network and let a reverse proxy handle ingress.
Use profiles to avoid “prod accidentally binds to 0.0.0.0:5432” incidents.
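
A sketch of the pattern, assuming the app listens on 8080 inside the container: the default service publishes nothing, and only the dev-profile
service binds a host port (to loopback, so nothing else on the network can reach it):

services:
  app:
    image: ghcr.io/acme/demo-app:1.8.2
    expose:
      - "8080"                  # reachable on the Compose network only
  app-dev:
    profiles: ["dev"]
    image: ghcr.io/acme/demo-app:1.8.2
    ports:
      - "127.0.0.1:8080:8080"   # published, but only on the host's loopback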

3) Prefer named volumes; make persistence explicit

Storage is where “dev/prod differences” become data loss. Named volumes are fine for local, but prod should use mounted paths or a managed volume
driver and clearly defined backup/restore workflows.
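
One common way to make prod persistence explicit is to pin the named volume to a host path through the local driver. A sketch, assuming
/srv/pg-data already exists and is covered by a backup job outside Compose:

volumes:
  db_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/pg-data   # pre-created host path; back it up outside Compose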

4) Treat environment variables as API, not as a junk drawer

Use .env files, but don’t let them become a second configuration language. Use explicit defaults, document required variables,
and validate them in your entrypoint if the app is yours.
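
If you own the app image, a few lines in the entrypoint turn a mystery misconfiguration into an immediate, readable failure. A minimal sketch;
the variable names match the reference compose file below, the script itself is illustrative:

#!/bin/sh
# entrypoint.sh (illustrative): fail fast when required configuration is missing
set -eu
: "${DATABASE_URL:?DATABASE_URL is required}"
: "${REDIS_URL:?REDIS_URL is required}"
export APP_ENV="${APP_ENV:-dev}"   # one explicit, documented default
exec "$@"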

5) Compose is not an orchestrator; don’t cosplay

Compose can restart containers, do healthchecks, and define dependencies. It is not scheduling across nodes, doing progressive rollout, or managing
secrets at scale. Use it as a reliable “stack runner”. If you need more, graduate—don’t bolt on a pile of scripts until you reinvent a worse Kubernetes.

Joke #2: “Just one more override file” is how you summon YAML poltergeists.

A reference Compose file using profiles (dev/prod/ops)

This is a realistic baseline: a web app, a Postgres database, a cache, and optional helpers. The goal isn’t to be fancy.
The goal is to be hard to misuse.

cr0x@server:~$ cat compose.yml
services:
  app:
    image: ghcr.io/acme/demo-app:1.8.2
    environment:
      APP_ENV: ${APP_ENV:-dev}
      DATABASE_URL: postgres://app:${POSTGRES_PASSWORD:-devpass}@db:5432/app
      REDIS_URL: redis://redis:6379/0
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    networks: [backend]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 10s
      timeout: 2s
      retries: 12

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-devpass}
    volumes:
      - db_data:/var/lib/postgresql/data
    networks: [backend]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 2s
      retries: 20

  redis:
    image: redis:7
    command: ["redis-server", "--save", "", "--appendonly", "no"]
    networks: [backend]

  # Dev-only: bind ports, live reload, friendly tools
  app-dev:
    profiles: ["dev"]
    image: ghcr.io/acme/demo-app:1.8.2
    environment:
      APP_ENV: dev
      LOG_LEVEL: debug
      DATABASE_URL: postgres://app:${POSTGRES_PASSWORD:-devpass}@db:5432/app
      REDIS_URL: redis://redis:6379/0
    command: ["./run-dev.sh"]
    volumes:
      - ./src:/app/src:delegated
    ports:
      - "8080:8080"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    networks: [backend]

  mailhog:
    profiles: ["dev"]
    image: mailhog/mailhog:v1.0.1
    ports:
      - "8025:8025"
    networks: [backend]

  adminer:
    profiles: ["dev"]
    image: adminer:4
    ports:
      - "8081:8080"
    networks: [backend]

  # Prod-ish: reverse proxy and tighter exposure
  edge:
    profiles: ["prod"]
    image: nginx:1.27
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
    ports:
      - "80:80"
    depends_on:
      app:
        condition: service_healthy
    networks: [frontend, backend]

  # Ops-only: migrations and backups
  migrate:
    profiles: ["ops"]
    image: ghcr.io/acme/demo-app:1.8.2
    command: ["./migrate.sh"]
    environment:
      APP_ENV: ${APP_ENV:-prod}
      DATABASE_URL: postgres://app:${POSTGRES_PASSWORD}@db:5432/app
    depends_on:
      db:
        condition: service_healthy
    networks: [backend]

  pg-backup:
    profiles: ["ops"]
    image: postgres:16
    environment:
      PGPASSWORD: ${POSTGRES_PASSWORD}
    entrypoint: ["/bin/sh", "-lc"]
    command: >
      pg_dump -h db -U app -d app
      | gzip -c
      > /backup/app-$(date +%F_%H%M%S).sql.gz
    volumes:
      - ./backup:/backup
    depends_on:
      db:
        condition: service_healthy
    networks: [backend]

networks:
  frontend: {}
  backend: {}

volumes:
  db_data: {}

What this structure buys you

  • Default is safe: app, db, redis run with no host port exposure by default.
  • Dev is ergonomic: enable dev to get live-reload app, mail testing, and Adminer.
  • Prod is controlled: enable prod to add an edge proxy; still no random dev ports.
  • Ops is explicit: migrations and backups are not “always running”; they’re invoked intentionally.

Note the deliberate duplication: app and app-dev are separate services. That’s not laziness.
It’s a safety boundary. The dev service binds ports and mounts source code; the prod-ish service does neither.
You can share an image tag while separating runtime behavior.
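
If the repeated keys bother you, Compose supports extension fields (x-*) plus YAML anchors, so app and app-dev can share a common block
while the behavioral differences stay explicit. A sketch; note that YAML merge keys are shallow, so nested maps such as environment are
replaced rather than combined:

x-app-base: &app-base
  image: ghcr.io/acme/demo-app:1.8.2
  networks: [backend]
  depends_on:
    db:
      condition: service_healthy
    redis:
      condition: service_started

services:
  app:
    <<: *app-base
    # environment stays per-service: merge keys do not combine nested maps
  app-dev:
    <<: *app-base
    profiles: ["dev"]
    command: ["./run-dev.sh"]
    ports:
      - "8080:8080"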

Practical tasks: 12+ real commands, outputs, and decisions

Below are concrete operational moves you’ll actually use. Each has: a command, what typical output means, and what decision you make next.
Run them in the repo root where compose.yml lives.

Task 1: Verify your Compose supports profiles (and which version you’re running)

cr0x@server:~$ docker compose version
Docker Compose version v2.27.0

Meaning: Compose V2 is installed. Profiles are supported.
If you see “command not found” or an ancient v1 binary, expect inconsistent behavior.

Decision: Standardize on docker compose across your team/CI. Mixed v1/v2 is how you get “but it worked yesterday” tickets.

Task 2: Render the effective config for a profile (catch surprises before you start containers)

cr0x@server:~$ docker compose -f compose.yml --profile dev config
services:
  adminer:
    image: adminer:4
    networks:
      backend: null
    ports:
    - mode: ingress
      target: 8080
      published: "8081"
      protocol: tcp
  app:
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      APP_ENV: dev
      DATABASE_URL: postgres://app:devpass@db:5432/app
      REDIS_URL: redis://redis:6379/0
    healthcheck:
      test:
      - CMD
      - curl
      - -fsS
      - http://localhost:8080/healthz
    image: ghcr.io/acme/demo-app:1.8.2
    networks:
      backend: null
  app-dev:
    command:
    - ./run-dev.sh
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      APP_ENV: dev
      DATABASE_URL: postgres://app:devpass@db:5432/app
      LOG_LEVEL: debug
      REDIS_URL: redis://redis:6379/0
    image: ghcr.io/acme/demo-app:1.8.2
    networks:
      backend: null
    ports:
    - mode: ingress
      target: 8080
      published: "8080"
      protocol: tcp
    volumes:
    - type: bind
      source: /home/cr0x/src
      target: /app/src
      bind:
        create_host_path: true
  db:
    environment:
      POSTGRES_DB: app
      POSTGRES_PASSWORD: devpass
      POSTGRES_USER: app
    healthcheck:
      test:
      - CMD-SHELL
      - pg_isready -U app -d app
    image: postgres:16
    networks:
      backend: null
    volumes:
    - type: volume
      source: db_data
      target: /var/lib/postgresql/data
  mailhog:
    image: mailhog/mailhog:v1.0.1
    networks:
      backend: null
    ports:
    - mode: ingress
      target: 8025
      published: "8025"
      protocol: tcp
  redis:
    command:
    - redis-server
    - --save
    - ""
    - --appendonly
    - "no"
    image: redis:7
    networks:
      backend: null
networks:
  backend: {}
volumes:
  db_data: {}

Meaning: You can see exactly which services appear under dev. This is your sanity check for accidental port exposure,
missing env vars, or wrong images.

Decision: If the rendered config includes anything you didn’t intend, fix the Compose file before you run it. Don’t debug a configuration typo at runtime.

Task 3: Start the safe default stack (no profiles enabled)

cr0x@server:~$ docker compose -f compose.yml up -d
[+] Running 5/5
 ✔ Network server_backend  Created
 ✔ Volume "server_db_data" Created
 ✔ Container server-db-1   Started
 ✔ Container server-redis-1 Started
 ✔ Container server-app-1  Started

Meaning: Only default services started. No dev tools, no edge proxy.

Decision: Use this as your baseline for CI smoke tests and “prod-like local” runs. The more boring it is, the better it behaves during incidents.

Task 4: Start the dev experience explicitly

cr0x@server:~$ docker compose -f compose.yml --profile dev up -d
[+] Running 3/3
 ✔ Container server-mailhog-1 Started
 ✔ Container server-adminer-1 Started
 ✔ Container server-app-dev-1 Started

Meaning: Compose added only the dev-profile services; the default services were already running.

Decision: Make “dev is opt-in” a team rule. If someone wants debug ports in prod, they should have to say it out loud with a profile flag.

Task 5: Prove which services a profile set will run (useful in CI logs)

cr0x@server:~$ COMPOSE_PROFILES=prod docker compose -f compose.yml config --services
app
db
edge
redis

Meaning: With prod enabled, the resolved model contains exactly these services and nothing else. This is a small trick that prevents big misunderstandings.

Decision: In CI, echo COMPOSE_PROFILES and this service list at the top of the job. You’re buying future-you a shorter incident.

Task 6: List containers for the project and spot profile services

cr0x@server:~$ docker compose -f compose.yml ps
NAME              IMAGE                         COMMAND                  SERVICE    STATUS          PORTS
server-adminer-1   adminer:4                     "entrypoint.sh php …"   adminer    running         0.0.0.0:8081->8080/tcp
server-app-1       ghcr.io/acme/demo-app:1.8.2   "./start.sh"            app        running (healthy)
server-app-dev-1   ghcr.io/acme/demo-app:1.8.2   "./run-dev.sh"          app-dev    running         0.0.0.0:8080->8080/tcp
server-db-1        postgres:16                   "docker-entrypoint…"    db         running (healthy) 5432/tcp
server-mailhog-1   mailhog/mailhog:v1.0.1        "MailHog"               mailhog    running         0.0.0.0:8025->8025/tcp
server-redis-1     redis:7                       "docker-entrypoint…"    redis      running         6379/tcp

Meaning: You can see which services are running and which ports are published. The PORTS column is your “what did we expose?” audit.

Decision: If you see ports published in environments where they shouldn’t be, stop and fix the file. Don’t normalize accidental exposure.

Task 7: Confirm why a service won’t start (dependency and healthcheck reality check)

cr0x@server:~$ docker inspect --format '{{.State.Health.Status}}' server-app-1
unhealthy
cr0x@server:~$ docker inspect --format '{{(index .State.Health.Log 0).Output}}' server-app-1
curl: (7) Failed to connect to localhost port 8080: Connection refused

Meaning: The healthcheck is failing. Either the app isn’t listening, it’s listening on a different port, or it’s crashing before bind.

Decision: Check docker compose logs app for startup errors, then docker exec into the container to validate the listening port.
Don’t touch the DB yet; most app healthcheck failures are app config, not storage.

Task 8: Inspect effective environment variables (find the “wrong .env” problem fast)

cr0x@server:~$ docker compose -f compose.yml exec -T app env | egrep 'APP_ENV|DATABASE_URL|REDIS_URL'
APP_ENV=dev
DATABASE_URL=postgres://app:devpass@db:5432/app
REDIS_URL=redis://redis:6379/0

Meaning: The container sees the values you think it sees. If the password is missing or empty, your .env isn’t loaded or the variable name is wrong.

Decision: If env vars are wrong, fix the caller side (your shell export, your CI secret injection, or the Compose file). Don’t “hotfix” by editing containers.

Task 9: Identify image drift between dev and prod profile services

cr0x@server:~$ docker compose -f compose.yml images
CONTAINER         REPOSITORY                   TAG     IMAGE ID       SIZE
server-app-1      ghcr.io/acme/demo-app        1.8.2   7a1d0f2c9a33   212MB
server-app-dev-1  ghcr.io/acme/demo-app        1.8.2   7a1d0f2c9a33   212MB
server-db-1       postgres                     16      5e2c6e1e12b8   435MB
server-redis-1    redis                        7       1c90a3f8e3a4   118MB

Meaning: Both app services use the same image ID. That’s good: your dev behavior differs by command/volumes/ports, not by untracked code.

Decision: If image IDs differ unexpectedly, decide whether that’s intentional. If it’s not, unify tags or stop pretending the environments are comparable.

Task 10: Prove which services are actually part of a profile (useful during refactors)

cr0x@server:~$ docker compose -f compose.yml --profile "*" config --services
adminer
app
app-dev
db
edge
mailhog
migrate
pg-backup
redis

Meaning: With every profile enabled via --profile "*", this lists all services in the file, including profile-gated ones. Now you can cross-check ownership and remove dead weight.

Decision: If nobody can explain why a service exists, delete it or move it behind an ops profile and require explicit invocation.

Task 11: Start prod profile locally without dev exposure

cr0x@server:~$ COMPOSE_PROFILES=prod docker compose -f compose.yml up -d
[+] Running 1/1
 ✔ Container server-edge-1  Started

Meaning: Only the edge service was added; default services were already present.

Decision: Use this to validate nginx config changes with the same app/db you use elsewhere, without bringing in dev-only tools.

Task 12: Run one-off ops jobs without leaving zombie containers

cr0x@server:~$ COMPOSE_PROFILES=ops docker compose -f compose.yml run --rm migrate
Running migrations...
Migrations complete.

Meaning: The migration container ran and was removed. No long-running service, no surprise restarts.

Decision: Keep “ops actions” as run --rm jobs. If your migrations run as a permanent service, you’re creating a self-inflicted pager.

Task 13: Take a backup with the ops profile and validate the file exists

cr0x@server:~$ COMPOSE_PROFILES=ops docker compose -f compose.yml run --rm pg-backup
cr0x@server:~$ ls -lh backup | tail -n 2
total 38M
-rw-r--r-- 1 cr0x cr0x  38M Jan  3 01:12 app-2026-01-03_011230.sql.gz

Meaning: The backup landed on the host filesystem. That’s the difference between “we have backups” and “we have a comforting story”.

Decision: If the file isn’t there, don’t proceed with risky changes. Fix mounts/permissions first. Backups that don’t restore are just performance art.

Task 14: Detect port collisions before blaming Docker

cr0x@server:~$ ss -ltnp | egrep ':8080|:8081|:8025' || true
LISTEN 0      4096         0.0.0.0:8080      0.0.0.0:*    users:(("docker-proxy",pid=22419,fd=4))
LISTEN 0      4096         0.0.0.0:8081      0.0.0.0:*    users:(("docker-proxy",pid=22455,fd=4))
LISTEN 0      4096         0.0.0.0:8025      0.0.0.0:*    users:(("docker-proxy",pid=22501,fd=4))

Meaning: The host ports are already bound by Docker proxy processes. If your next up fails with “port is already allocated”, this is why.

Decision: Either stop the competing stack or change published ports. Don’t “solve” it by running everything privileged and hoping.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized SaaS team kept two Compose files: one for dev, one for “prod-like”. The assumption was polite and deadly:
“They’re basically the same; prod-like just adds nginx.” Nobody re-verified that statement after the tenth small change.

A new engineer added a Redis container to the dev file only, because the app had a feature flag and “prod doesn’t use it yet”.
Weeks later, prod started enabling the flag in a canary. The prod-like Compose stack used in CI didn’t have Redis.
CI passed because the relevant tests were skipped when Redis wasn’t detected.

Then came a deployment where the feature flag rolled wider than intended. The app’s fallback behavior was to retry Redis connections
aggressively. CPU shot up, request latency followed, and a couple of nodes started getting killed by the kernel OOM killer. Not all of them,
just enough to create a rolling brownout that looked like “network flakiness”.

The fix wasn’t heroic. They collapsed back to one Compose file and used profiles: Redis became default in the stack used for CI, and a new
profile gated “experimental” dependencies. That forced a conscious decision: if the app might use Redis in prod, Redis must exist in the prod-like model.

Lesson: assumptions about environment parity are like milk. They expire quietly, then ruin your day loudly.

Mini-story 2: The optimization that backfired

A large enterprise platform team tried to “optimize developer experience” by using profiles to swap entire images:
a tiny debug image for dev and a hardened image for prod. On paper, it reduced local build time and made the prod image stricter.
In practice, they created a forked universe.

The dev image had extra packages: curl, netcat, Python, and a few CA bundles that “just made things work”.
The prod image was slim: fewer libs, fewer tools, less attack surface. Respectable goals.
But the app had a hidden dependency on system CA certificates due to a third-party SDK doing TLS calls.

Dev never saw the bug because the debug image had the right CA chain. Prod did: TLS handshakes failed intermittently depending on which endpoint
the SDK hit, and the failures were wrapped in opaque exceptions. The incident dragged on because engineers kept reproducing in dev, where it worked.

They kept profiles, but changed the rule: profiles may change commands, mounts, and ports, but not the base OS composition of the runtime
image without a formal test that runs the prod image in dev workflows. They also added a “prod-image” profile that forces the prod image locally.

Lesson: optimizing for speed by changing the runtime substrate is the fastest way to buy slow incidents.

Mini-story 3: The boring but correct practice that saved the day

An internal payments team ran Compose for local dev and for a small on-prem “lab” environment used for partner integrations.
Their practice was unsexy: every change to Compose had to include an updated docker compose config output artifact in CI logs for each profile.
Not stored forever, just attached to the job summary.

One morning, a change landed that moved a port mapping from a dev-only service into a default service. It wasn’t malicious;
it was a copy/paste error while refactoring. The service happened to be a database admin UI. You know where this is going.

The lab environment had a strict firewall, so it wasn’t internet-exposed. But it was accessible to a large corporate network,
which is its own kind of wilderness. The team caught the mistake before deploying because the CI artifact for the default profile
suddenly showed a published port that hadn’t been there the day before.

They reverted, then reintroduced the change correctly behind the dev profile. No incident, no shame spiral, no “we’ll fix it later”.
Just a small, boring guardrail doing its job.

Lesson: printing the effective config is the ops equivalent of washing your hands. It’s not glamorous, and it prevents infections.

Fast diagnosis playbook: what to check first/second/third

When a Compose stack “doesn’t work”, the fastest path is to stop guessing what Compose did and inspect what it actually did.
Profiles add one more dimension to confusion, so your triage needs to be crisp.

First: confirm the intended profile set and rendered config

  • Run docker compose --profile X config and scan for:

    • unexpected published ports
    • missing services you assumed were there (cache, message broker, reverse proxy)
    • env var defaults you forgot were defaults
  • If config output surprises you, stop. Fix configuration before chasing runtime symptoms.

Second: check container state and health, not just “running”

  • Run docker compose ps. Look for (healthy) and restart loops.
  • A service can be “Up” and still be dead inside. Healthchecks are your cheap lie detector.

Third: determine whether you have a dependency failure or an app failure

  • If DB is unhealthy: check storage, permissions, and volume mounts.
  • If DB is healthy but app is unhealthy: check app logs and env vars.
  • If everything is healthy but requests fail: check networking, published ports, and reverse proxy config (especially if prod profile adds an edge).

Bonus: isolate by removing profiles

If dev profile introduces breakage, run the default stack alone. If the default stack works, the regression is in dev-only services,
mounts, or port conflicts. Profiles make this isolation trivial—if you keep your defaults clean.

Common mistakes: symptoms → root cause → fix

Mistake 1: “Why is my dev tool running in prod?”

Symptom: Admin UI, MailHog, or debug endpoints appear in environments where they don’t belong.

Root cause: Service lacks profiles: ["dev"], or the environment sets COMPOSE_PROFILES=dev globally.

Fix: Add profiles to the service, and audit CI/hosts for leaked COMPOSE_PROFILES. In prod scripts, set COMPOSE_PROFILES=prod explicitly.
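
A sketch of what “explicit in prod scripts” can look like; the script name and host path are illustrative:

#!/bin/sh
# deploy-prod.sh (illustrative): pin the profile set instead of inheriting it
set -eu
cd /srv/acme-demo                                  # wherever compose.yml lives on this host
export COMPOSE_PROFILES=prod
docker compose -f compose.yml config > /tmp/effective-config.yml   # audit trail for this run
docker compose -f compose.yml up -d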

Mistake 2: “Enabling a profile didn’t start anything”

Symptom: docker compose --profile ops up shows no new containers, or only defaults start.

Root cause: The services are defined with a different profile name than you passed (typo), or you expected run-style jobs to appear under up.

Fix: Use docker compose config --services and inspect profiles sections. For one-off jobs, use docker compose run --rm SERVICE.

Mistake 3: “The app can’t connect to the database in dev, but prod works”

Symptom: Connection refused/timeouts only in dev profile.

Root cause: Dev service uses a different DATABASE_URL, or you accidentally pointed it at localhost instead of the service name db.

Fix: In containers, use service DNS names on the Compose network: db:5432. Confirm with docker compose exec app env.

Mistake 4: “Port is already allocated” appears randomly

Symptom: Starting dev profile fails with a port binding error.

Root cause: Another stack already binds the port, or you started two profiles that both publish the same host port (common with app and app-dev if both publish 8080).

Fix: Only publish ports in one of the services (typically the dev one). Verify collisions with ss -ltnp.

Mistake 5: “depends_on didn’t wait; the app started too early”

Symptom: App starts before DB is ready, causing crash loops.

Root cause: You used depends_on without health conditions, or the DB healthcheck is missing/incorrect.

Fix: Add healthchecks and use condition: service_healthy. Also make the app resilient with retries; Compose isn’t your reliability layer.
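
The working pattern, condensed from the reference file earlier in this article:

services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 2s
      retries: 20
  app:
    depends_on:
      db:
        condition: service_healthy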

Mistake 6: “We thought profile services weren’t created, but they were”

Symptom: A profile-gated service exists as a container/network artifact, even when the profile wasn’t enabled.

Root cause: You previously ran with that profile enabled and the resources stick around until removed, or your automation runs docker compose up with a COMPOSE_PROFILES value leaking in from the environment.

Fix: Use docker compose down (and optionally -v in dev only). Treat “what’s currently running” as state, not intent.

Mistake 7: “Our backups succeeded but restores failed”

Symptom: Backup job runs without errors; restore later fails or produces empty data.

Root cause: Backup container wrote to a path inside the container that wasn’t mounted, or permissions prevented writing to the host.

Fix: Store backups on a host-mounted path. After backup, verify file presence and size with ls -lh. Periodically test restore.
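
A restore drill can be as small as this sketch; the scratch database name is illustrative, and the dump is the one produced in Task 13:

cr0x@server:~$ docker compose -f compose.yml exec -T db createdb -U app app_restore_test
cr0x@server:~$ gunzip -c backup/app-2026-01-03_011230.sql.gz | docker compose -f compose.yml exec -T db psql -q -U app -d app_restore_test
cr0x@server:~$ docker compose -f compose.yml exec -T db psql -U app -d app_restore_test -c '\dt'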

Checklists / step-by-step plan

Step-by-step: migrate from multiple Compose files to one file with profiles

  1. Inventory services across files. List services and note differences (ports, volumes, image tags, commands).
  2. Define profiles that match decisions, not people.
    Use names like dev, prod, ops, debug. Avoid alice or newthing.
  3. Pick the “safe default” stack. No dev tools, no published ports except what’s required for basic function (often none).
  4. Move dev-only services behind dev. MailHog, Adminer, fake S3, local tracing UIs, etc.
  5. Split services when runtime behavior differs materially.
    If dev needs bind mounts and different command: create app-dev rather than trying to toggle everything with env vars.
  6. Keep image identity stable where possible. Prefer same image for app and app-dev; change command/mounts/ports.
  7. Render configs in CI for each profile. Save docker compose config outputs in build logs.
  8. Document “how to run” commands. Make them copy/pasteable; people will copy/paste them anyway.
  9. Test three paths: default only, --profile dev, --profile prod (or prod-like).
  10. Kill the old files. Don’t keep them “just in case”. That’s how drift returns.

Operational checklist: before you declare a profile strategy “done”

  • The default (no-profile) stack starts and is functional without published DB ports.
  • docker compose config output is stable and reviewed for each profile.
  • Dev profile does not change base images without an explicit test plan.
  • Ops tasks use run --rm and write output to host-mounted paths.
  • Port mappings are unique across services that might run together.
  • Healthchecks exist for stateful dependencies (DB) and the app.
  • Secrets are not committed, and prod invocations set profiles explicitly.

CI plan: minimal but effective

  1. Render config for default + dev + prod and store in logs (see the sketch after this list).
  2. Start default stack, run smoke tests, tear down.
  3. Start dev stack (or subset), run unit/integration tests, tear down.
  4. Run ops migrations as a one-off job in a disposable environment.
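
Step 1 of this plan fits in a few lines of shell; the output file names are illustrative and the profile names match this article’s compose.yml:

#!/bin/sh
# ci-render-configs.sh (illustrative): keep the effective config per profile set as a CI artifact
set -eu
docker compose -f compose.yml config > rendered-default.yml
for p in dev prod ops; do
  docker compose -f compose.yml --profile "$p" config > "rendered-$p.yml"
done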

FAQ

1) Should I use profiles or override files?

Use profiles for topology changes (which services exist) and for “dev tools are optional”.
Use override files sparingly for machine-local tweaks (like a developer’s custom port), and only if you can tolerate drift.
If you must choose one: profiles are easier to reason about and easier to audit.

2) Can a service belong to multiple profiles?

Yes. You can set profiles: ["dev", "ops"] for a service that’s useful in both contexts.
Be careful: multi-profile membership can become a logic puzzle during incidents.
Keep it rare and justified.

3) What happens if I run docker compose up with no profile specified?

Services without a profiles key are started. Services with a profiles key are ignored.
That’s why your default services must be safe and minimal.

4) Can enabling a profile accidentally start extra services via dependencies?

It can, depending on how you start things and how your dependencies are declared. Your job is to test startup paths:
starting “just the app”, starting the full stack, and starting profile services.
Assume humans will run weird commands during incidents.
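
One concrete path worth knowing: recent Compose releases activate a service’s profile automatically when you name that service explicitly on the command line, so a one-off like the migration job can start without any profile flag. Verify the behavior on your version before relying on it:

cr0x@server:~$ docker compose -f compose.yml run --rm migrate
Running migrations...
Migrations complete.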

5) Do profiles affect networks and volumes?

Profiles gate services. Networks and volumes are typically created as needed by the services that reference them.
If a volume is only referenced by a profiled service, it won’t be created unless that profile is active.

6) How do I prevent dev ports from being exposed when someone runs the wrong profile?

Make the default profile safe, and make prod invocations explicit. In scripts, set COMPOSE_PROFILES=prod
rather than relying on whatever environment variables happen to exist.
Also avoid publishing ports in default services unless you truly need them.

7) How should I handle migrations with profiles?

Put migrations in an ops profile as a one-off job and run them with docker compose run --rm migrate.
Don’t make migrations a long-running service. If it restarts, you’ll eventually migrate twice. That’s not an upgrade plan.

8) Are profiles suitable for “prod on a single VM” deployments?

Yes, with discipline. Profiles help you keep ops tools out of the baseline and prevent accidental exposure.
But don’t confuse “works on one VM” with “is an orchestrated production platform”.
Add monitoring, backups, and explicit rollback procedures. Compose won’t invent them.

9) What’s the cleanest way to switch between dev and prod behavior for the same app?

Prefer separate services (like app and app-dev) when the differences are meaningful (bind mounts, commands, ports).
Keep them on the same image tag when possible. Separate behavior, shared artifact.

10) Should I keep a debug profile?

Yes, if you use it responsibly. A debug profile for ephemeral tooling (tcpdump container, shell container, profiling agent)
can reduce mean time to understand. Just don’t let it become “prod with training wheels always on”.

Conclusion: practical next steps

Compose profiles are the simplest way to stop duplicating YAML while still running different stacks for different contexts.
They don’t eliminate complexity; they make it visible and controllable. That’s the point.

Do this next, in order

  1. Pick a safe default stack with no dev tools and minimal host port exposure.
  2. Add dev, prod, and ops profiles to gate what’s optional, risky, or one-off.
  3. Make docker compose config output part of CI logs for each profile. Treat it as an audit trail.
  4. Convert migrations/backup utilities into run --rm jobs behind ops.
  5. Delete your extra Compose files once the single-file approach is validated. Drift loves sentimental attachment.

When you’re on call, you want fewer moving parts and fewer undocumented branches in behavior. Profiles give you that—if you keep your defaults clean
and your profiles intentional. Run less magic. Ship more predictability.
