The pain point: you have one Linux box, one public IP, and one app that can’t go down. “Just redeploy” turns into a five-minute brownout, a rollback that isn’t, and an executive Slack thread that reads like a crime scene report.
Blue/green on a single host is not glamorous. It’s not Kubernetes. It’s not a service mesh. It’s a small, disciplined set of moves that give you two versions running side by side, and a cutover you can reverse without prayer.
What blue/green means on one host (and what it doesn’t)
Blue/green deployment is running two complete versions of your service in parallel: the currently serving one (“blue”) and the candidate (“green”). You validate green while blue continues to serve traffic. Then you switch traffic to green. If anything smells off, you switch back to blue.
On a single host, the constraints are blunt:
- You have one kernel, one NIC, one disk subsystem. Your “redundancy” is mostly procedural.
- You can’t assume a bad deploy can’t hurt the host (runaway logs, memory leaks, disk fill).
- Your switch has to happen at L7 (reverse proxy) or via local port reassignment. DNS is too slow, and “just change the port in the client” is comedy.
So the goal is not “high availability” in the classical sense. The goal is safe change: shorter downtime windows, predictable rollback, and fewer “we don’t know what happened” postmortems.
Here’s the opinionated take: if you’re on one host, do blue/green with a reverse proxy container (Nginx or HAProxy), keep your application containers dumb, and keep state changes explicit. If you try to get clever with iptables magic and ad-hoc scripts, you’ll succeed right up until you don’t.
One joke, because we deserve it: deploying on a single host is like wearing two seatbelts in one car. It helps, but it doesn’t turn the car into a plane.
Interesting facts and short history (so you stop repeating old mistakes)
- Blue/green predates containers. The pattern comes from release engineering long before Docker—two identical environments and a router flip was the simplest “zero downtime” story available.
- “Atomic deploy” used to mean symlink flips. Many pre-container systems deployed to versioned directories and flipped a symlink for instant rollback. Reverse-proxy upstream switching is the spiritual successor.
- Docker’s early networking was rougher than your memory admits. The default bridge evolved; practices like “just publish a port and call it done” came from a time when fewer people did serious multi-service routing on one host.
- Health checks weren’t always first-class. Docker Compose only gradually normalized explicit health checks; older stacks used fragile “sleep 10” scripts and hoped the app was ready.
- Nginx has been a release workhorse for two decades. Its reload semantics (graceful config reload without dropping connections) made it a natural traffic switch even before “cloud-native” was a phrase.
- HAProxy popularized explicit backend health and circuit behavior. Many SRE teams learned that “upstream is listening” is not the same as “service is healthy.”
- Single-host deployments are still normal. Plenty of profitable internal apps and edge services run on single machines because the business values simplicity over theoretical resilience.
- The “pets vs cattle” metaphor never applied to your finance database. Even in container land, stateful services stay special. Your deployment strategy must acknowledge that.
The simplest design that actually works
We’re building four moving parts:
- proxy: a reverse proxy container that owns the public ports (80/443). It routes to “blue” or “green” upstream.
- app-blue: current production app container.
- app-green: candidate app container.
- optional sidecars: migrations runner, one-shot smoke tests, or a tiny “whoami” endpoint to validate routing.
The rules that keep this sane:
- Only the proxy binds public ports. Blue and green stay on an internal Docker network with no published ports. That avoids accidental exposure and port conflicts.
- Keep the proxy config switchable and reloadable. A one-file “active upstream” choice is easier than templating fifty lines mid-incident.
- Gate cutover on a real health check. If your app doesn’t have a /health endpoint, add one. You can ship features later; you cannot ship without knowing if it’s alive.
- Make rollback a first-class command. If rollback requires “remembering what we changed,” you don’t have rollback. You have improv theater.
- State changes are the hard part. Blue/green works great for stateless app code. For database schema changes, you need compatibility strategy or a deliberate maintenance step.
A paraphrased idea from John Allspaw: “In operations, failure is normal; resilience comes from preparing for it, not pretending it won’t happen.”
Host layout: ports, networks, volumes, and the one thing you must not share
Ports
Public:
- proxy:80 is published to the host (and proxy:443 once you terminate TLS there; the blueprint below sticks to plain HTTP to keep it short).
Internal:
- app-blue listens on 8080 inside the container.
- app-green listens on 8080 inside the container.
Nginx routes to app-blue:8080 or app-green:8080 on a shared internal network.
Networks
Create a dedicated network, e.g. bg-net. Don’t reuse the default network if you like your future self.
Volumes
Here’s the single-host gotcha: shared writeable storage between blue and green can corrupt your day.
- OK to share: read-only assets, trusted config, TLS certs, and maybe a cache if it’s safe to lose.
- Be careful sharing: uploads directories. Two app versions might write different formats, permissions, or paths.
- Do not share blindly: SQLite files, embedded databases, or anything where two processes can write concurrently without coordination.
If your app writes to a local disk directory, prefer one of these:
- Externalize state (object storage, database, etc.).
- Versioned state directories per color, then a controlled migration step and a controlled switch.
- A “shared but compatible” schema plan where both versions can run against the same DB and tolerate mixed traffic during cutover.
Logging
On a single host, log growth is a deployment killer. Use Docker’s log rotation options. Otherwise your “zero downtime deploy” becomes “no disk, no service.”
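The per-service logging options in the Compose file below handle this, but if you want every container on the host covered by default, a daemon-level setting is one way to do it. A minimal sketch, assuming the json-file driver is acceptable; restart the Docker daemon after editing, and note the default only applies to containers created afterwards:
cr0x@server:~$ cat /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}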
A working Docker Compose blueprint
This is intentionally boring. Boring is good. You can tweak it later once it’s reliable.
cr0x@server:~$ cat docker-compose.yml
services:
  proxy:
    image: nginx:1.25-alpine
    container_name: bg-proxy
    ports:
      - "80:80"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./nginx/snippets:/etc/nginx/snippets:ro
    depends_on:
      app-blue:
        condition: service_healthy
      app-green:
        condition: service_healthy
    networks:
      - bg-net
  app-blue:
    image: myapp:blue
    container_name: app-blue
    environment:
      - APP_COLOR=blue
    expose:
      - "8080"
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health | grep -q ok"]
      interval: 5s
      timeout: 2s
      retries: 10
      start_period: 10s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
    networks:
      - bg-net
  app-green:
    image: myapp:green
    container_name: app-green
    environment:
      - APP_COLOR=green
    expose:
      - "8080"
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health | grep -q ok"]
      interval: 5s
      timeout: 2s
      retries: 10
      start_period: 10s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
    networks:
      - bg-net
networks:
  bg-net:
    name: bg-net
    external: true
The proxy configuration decides which color receives traffic. Keep that decision in one small file.
cr0x@server:~$ mkdir -p nginx/conf.d nginx/snippets
...output...
cr0x@server:~$ cat nginx/snippets/upstream-active.conf
set $upstream app-blue;
cr0x@server:~$ cat nginx/conf.d/default.conf
server {
    listen 80;

    # Docker's embedded DNS. Required because proxy_pass uses a variable,
    # so the upstream name is resolved at request time, not at config load.
    resolver 127.0.0.11 valid=10s;

    location /healthz {
        return 200 "proxy ok\n";
    }

    location / {
        include /etc/nginx/snippets/upstream-active.conf;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Request-Id $request_id;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://$upstream:8080;
    }
}
This pattern uses a variable upstream target. It’s easy to change, easy to diff, and reloads cleanly. The one catch: when proxy_pass uses a variable, Nginx resolves the name at request time instead of at startup, which is why the server block declares resolver 127.0.0.11 (Docker’s embedded DNS).
Cutover mechanics: atomic, observable, reversible
Deploy green without touching traffic
Build/pull the new image as myapp:green. Start it alongside blue. Validate health and behavior through the proxy (or directly on the internal network via an exec curl).
Switch traffic by changing one file and reloading Nginx
Cutover should be:
- Atomic enough: one change, one reload.
- Observable: you can watch which upstream is serving.
- Reversible: same mechanism in reverse.
Nginx reload is graceful: it loads new config and lets old workers finish active connections. That doesn’t prevent all edge cases (long-lived streams), but it’s the least-bad option on one host.
Roll back by switching the file back and reloading
The rollback procedure must be mechanically identical to cutover. If rollback is “different,” it will fail at 2 a.m. when your brain is mostly coffee.
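One way to enforce that symmetry is to wrap the file edit, config test, reload, and verification in a single script, so deploy and rollback are literally the same command with a different argument. A minimal sketch, assuming the file layout used in this article; the script name is illustrative:
cr0x@server:~$ cat switch.sh
#!/usr/bin/env bash
# Point the proxy at blue or green: rewrite the snippet, test config, reload, verify.
set -eu

color="${1:?usage: $0 blue|green}"
case "$color" in blue|green) ;; *) echo "usage: $0 blue|green" >&2; exit 1 ;; esac

snippet="nginx/snippets/upstream-active.conf"

# Rewrite the whole one-line file instead of sed-ing it in place.
printf 'set $upstream app-%s;\n' "$color" > "$snippet"

# Never reload a config that fails the syntax test.
docker exec bg-proxy nginx -t
docker exec bg-proxy nginx -s reload

# Verify through the front door, not by assumption.
curl -sS localhost/ | head -n 1
Cutover is ./switch.sh green; rollback is ./switch.sh blue. Same mechanism, opposite direction.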
Second and last joke: the most reliable monitoring system is still “a customer,” but it has terrible alert routing and no maintenance window support.
Practical tasks (commands, outputs, decisions)
These are the moves you’ll actually do. Each task includes: command, sample output, what it means, and what decision you make from it.
Task 1: Confirm Docker and Compose versions (capability check)
cr0x@server:~$ docker version --format '{{.Server.Version}}'
26.1.3
What it means: you’re on a modern Docker engine. That matters for networking stability and health check behavior.
Decision: if this is ancient (think 19.x era), budget time for an upgrade before you blame your app for weirdness.
cr0x@server:~$ docker compose version
Docker Compose version v2.29.1
What it means: Compose v2 is available; you can rely on current CLI behavior.
Decision: standardize on docker compose (plugin) rather than the old docker-compose binary to reduce “works on my laptop” variance.
Task 2: Create the network once (reduce surprise)
cr0x@server:~$ docker network create bg-net
8d4b7b7f0d3e5e2e0f6f5f5a8d1c0d3a9c2b1f2e3a4b5c6d7e8f9a0b1c2d3e4
What it means: the network exists; containers can resolve each other by service name.
Decision: if your org has multiple stacks on one host, naming networks explicitly avoids collisions and “why is this container seeing that database?” moments.
Task 3: Bring up blue and green (but don’t cut over yet)
cr0x@server:~$ docker compose up -d
[+] Running 3/3
✔ Container app-blue Started
✔ Container app-green Started
✔ Container bg-proxy Started
What it means: everything started. This is not the same as “everything is healthy.”
Decision: immediately verify health status next; don’t celebrate based on “Started.”
Task 4: Check container health (gate cutover)
cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Status}}'
NAMES STATUS
bg-proxy Up 25 seconds
app-blue Up 25 seconds (healthy)
app-green Up 25 seconds (healthy)
What it means: both app versions pass their health checks. Proxy is running.
Decision: if green is (unhealthy), stop. Inspect logs and fix before you switch a single packet.
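If green shows (unhealthy) and you want to know why before digging through app logs, Docker keeps the recent health-probe results on the container. The output below is illustrative:
cr0x@server:~$ docker inspect --format '{{json .State.Health}}' app-green
{"Status":"unhealthy","FailingStreak":4,"Log":[{"Start":"2026-01-03T09:20:01Z","End":"2026-01-03T09:20:03Z","ExitCode":1,"Output":"wget: can't connect to remote host: Connection refused\n"}]}
The last few probe results usually tell you whether the app never opened the port, opened it late, or answered something other than ok.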
Task 5: Verify which color is serving through the proxy (ground truth)
cr0x@server:~$ curl -sS localhost/ | head -n 1
hello from blue
What it means: your current Nginx snippet points to app-blue.
Decision: if it’s not what you expect, fix your proxy config before you proceed. Don’t debug “deployment” when it’s just routing.
Task 6: Validate green directly from the proxy container (bypass external noise)
cr0x@server:~$ docker exec -it bg-proxy sh -lc "wget -qO- http://app-green:8080/ | head -n 1"
hello from green
What it means: Nginx can resolve and reach green over the internal network.
Decision: if name resolution fails, you have a network mismatch (wrong network, wrong service name, or containers not attached).
Task 7: Run Nginx config test before reload (avoid self-inflicted outages)
cr0x@server:~$ docker exec -it bg-proxy nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
What it means: reload won’t brick the proxy with a syntax error.
Decision: if this fails, do not reload. Fix config first. A broken proxy is instant downtime.
Task 8: Switch to green (one-file change + reload)
cr0x@server:~$ sed -i 's/set \$upstream app-blue;/set $upstream app-green;/' nginx/snippets/upstream-active.conf
...output...
cr0x@server:~$ docker exec -it bg-proxy nginx -s reload
2026/01/03 09:22:48 [notice] 1#1: signal process started
What it means: Nginx accepted the reload signal and started new workers with the new routing target.
Decision: immediately verify traffic is now hitting green, and watch error rates for a few minutes.
Task 9: Confirm cutover worked (and didn’t half-work)
cr0x@server:~$ curl -sS localhost/ | head -n 1
hello from green
What it means: requests now route to green.
Decision: if you see mixed results, you might have multiple proxy instances, a load balancer in front, or caching that masks reality. Validate end-to-end.
Task 10: Watch logs for 60 seconds after cutover (cheap canary)
cr0x@server:~$ docker logs -f --since=60s app-green
INFO listening on :8080
INFO GET / 200 3ms
INFO GET /api/orders 200 21ms
What it means: green is serving real traffic and returning 200s.
Decision: if you see 500s, timeouts, or DB errors: roll back now, then debug with a calm heart.
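If watching a scrolling log feels too informal, a rough count of 5xx lines over the same window makes a decent gate. A sketch that assumes access-log-style lines like the ones above; adjust the pattern to your app’s log format:
cr0x@server:~$ docker logs --since=60s app-green 2>&1 | grep -cE ' 5[0-9]{2} '
0
Anything above your normal background rate is a rollback conversation, not a debugging session.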
Task 11: Roll back to blue (same mechanism)
cr0x@server:~$ sed -i 's/set \$upstream app-green;/set $upstream app-blue;/' nginx/snippets/upstream-active.conf
...output...
cr0x@server:~$ docker exec -it bg-proxy nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
cr0x@server:~$ docker exec -it bg-proxy nginx -s reload
2026/01/03 09:24:10 [notice] 1#1: signal process started
What it means: rollback is complete as far as routing goes.
Decision: if you can’t roll back in under a minute, your process isn’t blue/green—it’s “blue/green-ish.” Fix the process, not the heroics.
Task 12: Inspect resource pressure (single-host reality check)
cr0x@server:~$ docker stats --no-stream
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
app-blue 0.10% 120MiB / 8GiB 1.46% 15MB / 18MB 5MB / 1MB 23
app-green 2.40% 620MiB / 8GiB 7.56% 120MB / 95MB 60MB / 4MB 45
bg-proxy 0.05% 20MiB / 8GiB 0.24% 135MB / 130MB 2MB / 1MB 5
What it means: green is heavier. That might be fine, or it might be the beginning of the end.
Decision: if memory jumps dramatically, check for leaks, caching changes, or missing limits. On one host, “just add replicas” is not a plan.
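Limits live in the same Compose file, or in an override file that docker compose merges automatically. A minimal sketch; the numbers are illustrative, so size them from the docker stats output you just collected:
cr0x@server:~$ cat docker-compose.override.yml
services:
  app-blue:
    mem_limit: 1g
    cpus: 1.5
  app-green:
    mem_limit: 1g
    cpus: 1.5
A container that hits its memory limit gets killed in isolation, which is ugly but far better than the kernel’s OOM killer picking a victim for the whole host.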
Task 13: Verify port bindings (avoid accidental exposure and collisions)
cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Ports}}'
NAMES PORTS
bg-proxy 0.0.0.0:80->80/tcp
app-blue 8080/tcp
app-green 8080/tcp
What it means: only proxy binds host port 80. Blue/green are internal only.
Decision: if you see 0.0.0.0:8080->8080/tcp on an app container, fix it. You’ve created a second entry point that bypasses your switch.
Task 14: Debug DNS and network attachment (when proxy “can’t reach upstream”)
cr0x@server:~$ docker network inspect bg-net --format '{{json .Containers}}'
{"a1b2c3d4":{"Name":"app-blue","IPv4Address":"172.20.0.2/16"},"b2c3d4e5":{"Name":"app-green","IPv4Address":"172.20.0.3/16"},"c3d4e5f6":{"Name":"bg-proxy","IPv4Address":"172.20.0.4/16"}}
What it means: all three containers are on the same network. Names should resolve.
Decision: if proxy isn’t listed, attach it to the network or fix the Compose network stanza.
Task 15: Confirm image digests (ensure you’re running what you think you’re running)
cr0x@server:~$ docker image inspect myapp:green --format '{{.Id}}'
sha256:7f0c1e9a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8
What it means: the green tag points to a specific image ID.
Decision: if your CI retags images, rely on digests in your release notes. “Green” should not silently mutate during an incident.
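To capture the registry digest rather than the local image ID, assuming the image was pushed to or pulled from a registry (locally built, never-pushed images have no repo digest). Output is illustrative:
cr0x@server:~$ docker images --digests myapp
REPOSITORY   TAG     DIGEST                                                                    IMAGE ID       CREATED        SIZE
myapp        green   sha256:3c1e5f7a9b2d4c6e8f0a1b3c5d7e9f0a2b4c6d8e0f1a3b5c7d9e1f2a3b4c5d6e   7f0c1e9a1b2c   2 hours ago    212MB
myapp        blue    sha256:9d4a1c3e5b7d9f0a2c4e6a8c0e1f3a5b7c9d1e3f5a7b9c1d3e5f7a9b1c3d5e7f   2b3c4d5e6f70   9 days ago     208MB
Paste the digest into the release notes or change ticket; tags move, digests don’t.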
Task 16: Spot disk pressure from Docker quickly (before it becomes downtime)
cr0x@server:~$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 42 8 18.4GB 9.1GB (49%)
Containers 16 3 1.2GB 900MB (75%)
Local Volumes 12 7 6.8GB 1.1GB (16%)
Build Cache 31 0 3.5GB 3.5GB
What it means: you have reclaimable space, especially build cache.
Decision: if disk is tight, prune build cache in a controlled window. If you prune blindly during an incident, you might delete a rollback image you still need.
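The safer first move is pruning build cache only, which never touches images, containers, or volumes, so it cannot take a rollback image with it. A sketch of the controlled version:
cr0x@server:~$ docker builder prune --force
...output...
It reports the space it reclaimed when it finishes. Save the blanket docker system prune for a planned window, after you’ve confirmed the last known-good image is pinned somewhere you control.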
Three corporate mini-stories from the trenches
Incident caused by a wrong assumption: “Listening equals healthy”
A mid-size company ran a single-host internal API for billing integrations. The app was containerized, the host was stable, and releases were “fast.” They decided to implement blue/green by starting green on a different port and switching Nginx upstream.
The health check was a TCP connect. If the port accepted a connection, green was declared good. That felt reasonable: the process is up, right?
Then a release added a startup migration that warmed caches by pulling a chunk of data from the database. The process opened the port early (framework default), but it wasn’t actually ready to serve real requests. Under load, it returned 503s and timeouts while still “healthy.” Nginx happily sent production traffic into a half-awake service.
The outage didn’t last long because rollback was quick. The damage was that everyone lost confidence in the deployment process and started pushing changes only during office hours. That’s not reliability; that’s fear scheduling.
They fixed it by changing the health endpoint to validate downstream dependencies: DB connectivity, migrations complete, and a trivial query. The endpoint stayed fast and cheap. Cutover became boring again.
Optimization that backfired: “Let’s share everything to save disk”
A different shop ran a content service with user uploads stored on local disk. To “optimize,” they mounted the same uploads volume into both blue and green so they didn’t have to sync anything and could roll back instantly.
It worked for months, until a release changed how thumbnails were generated and stored. Green started writing new files with different naming and a slightly different permissions mask. Blue, still running, continued to serve old pages that referenced thumbnails that were now being overwritten or replaced mid-flight.
The symptom was weird: not a full outage, but intermittent broken images and occasional 403s. Support tickets arrived first, then engineering. The team chased cache headers, CDN behavior, and even browser bugs. The root cause was mundane: two versions writing to the same directory without a compatibility contract.
The fix was even more mundane: versioned directories for generated artifacts, and a background job to backfill. Shared uploads remained, but derived outputs became versioned and eventually migrated. The “disk optimization” saved a few gigabytes and cost them a week of debugging.
Boring but correct practice that saved the day: “Keep the old image and the rollback path”
A regulated enterprise ran a single host in a controlled network segment for a legacy integration gateway. No autoscaling, no magic—just change management and a pager.
They implemented blue/green with a proxy container and two app containers, but they also enforced one boring rule: never prune images during business hours, and always keep the last known-good image pinned by digest in a local registry mirror. Nobody loved this rule. It felt like clutter.
One afternoon, an upstream library published a broken minor release that passed unit tests but triggered a runtime crash only under a specific TLS handshake. The green container started, appeared healthy for a short time, then crashed under real client traffic.
Rollback worked instantly. No drama. But the real save was this: the team could redeploy the known-good image even after CI had moved tags forward, because the digest was preserved locally. They didn’t have to “rebuild last week’s commit” under pressure.
Nothing heroic happened, which is exactly the point. The postmortem was short, technical, and not about feelings.
Fast diagnosis playbook
When a blue/green deploy on a single host goes sideways, the bottleneck is usually one of: routing, health, resources, or state. Check in this order; it’s optimized for “stop the bleeding” first.
1) Routing: is traffic going where you think it’s going?
- Check proxy is alive and serving /healthz.
- Check which upstream is active in the snippet file.
- Check Nginx config test passes and reload happened.
cr0x@server:~$ curl -sS -i localhost/healthz | head -n 1
HTTP/1.1 200 OK
Decision: if proxy health fails, stop debugging the app. Fix proxy/container/host firewall first.
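To check the active upstream as the proxy container actually sees it (which also catches a stale or misconfigured bind mount), read the snippet from inside the container rather than trusting the copy on the host:
cr0x@server:~$ docker exec bg-proxy cat /etc/nginx/snippets/upstream-active.conf
set $upstream app-green;
Decision: if this disagrees with the file on the host, your mount or path is wrong; fix that before touching anything else.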
2) Health: is green actually healthy, or just “running”?
cr0x@server:~$ docker inspect -f '{{.State.Health.Status}}' app-green
healthy
Decision: if unhealthy, don’t cut over. If already cut over, roll back and debug green offline.
3) Resources: are you starving the host?
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 7.7Gi 6.9Gi 210Mi 120Mi 610Mi 430Mi
Swap: 0B 0B 0B
Decision: if available memory is low and green is heavier, you’re on track for the OOM killer. Roll back or add limits and capacity.
4) State: did you change something irreversible?
If the app uses a database, check schema migrations and compatibility. If you changed data formats, rollback might not restore behavior.
cr0x@server:~$ docker logs --since=10m app-green | tail -n 20
INFO migration complete
INFO connected to db
Decision: if migrations ran and are not backward compatible, rolling back app code may not fix the issue. Your “deployment” is now a data incident.
Common mistakes: symptoms → root cause → fix
1) Symptom: Nginx shows “502 Bad Gateway” after cutover
- Root cause: proxy can’t reach green (wrong network, container name mismatch, green crashed, or app listening on different port).
- Fix: verify network attachment and service name; confirm green is healthy; confirm app listens on 8080.
cr0x@server:~$ docker exec -it bg-proxy sh -lc "wget -S -qO- http://app-green:8080/health"
HTTP/1.1 200 OK
...
ok
2) Symptom: Green is “healthy” but users see errors
- Root cause: health check is too shallow (port open, not dependency-ready), or it doesn’t represent real request paths.
- Fix: make health check validate critical dependencies and a representative lightweight request. Keep it fast.
3) Symptom: Cutover works, then performance collapses 5–15 minutes later
- Root cause: memory leak, cache growth, connection pool change, or log amplification causing disk I/O pressure.
- Fix: check docker stats, host memory, and log volume; set container memory limits; rotate logs; investigate connection pools.
4) Symptom: Rollback doesn’t fix the problem
- Root cause: green performed an irreversible state change (schema migration, data rewrite), or the proxy is still routing to green due to stale config or multiple proxies.
- Fix: confirm routing; if state changed, use a forward-fix or restore from backup/snapshot. You can’t “roll back” time with YAML.
5) Symptom: Both blue and green are serving traffic unexpectedly
- Root cause: app containers published host ports, bypassing proxy, or an external load balancer targets both.
- Fix: remove host port publishing from app containers; ensure only proxy is exposed; validate external routing targets.
6) Symptom: Deploys intermittently fail with “bind: address already in use”
- Root cause: you tried to bind app containers directly to host ports for both colors.
- Fix: stop binding app ports to host. Bind only proxy to host ports, and route internally.
7) Symptom: Nginx reload causes a brief drop in long-lived connections
- Root cause: websockets/streams and proxy settings not tuned; reload is graceful but not magical.
- Fix: confirm keepalive settings; for truly long-lived connections, consider HAProxy with explicit backend behavior, or accept a short maintenance window.
8) Symptom: Disk fills during deploy
- Root cause: log growth, image sprawl, build cache accumulation.
- Fix: set Docker log rotation; prune with intent; keep rollback images pinned; monitor disk.
Checklists / step-by-step plan
Pre-flight checklist (before you touch production traffic)
- Proxy is the only service publishing 80/443 on the host.
- Blue and green run on the same internal Docker network.
- Health checks exist and validate dependencies (not just “port open”).
- Logs have rotation (max-size, max-file).
- Rollback path is tested: switch back to blue and reload proxy.
- State changes are planned: migrations are backward compatible or separated.
- You know how to verify “which color is live” using a real request.
Step-by-step deploy plan (single-host blue/green)
- Build/pull green image and label it clearly.
- Start green alongside blue (no public ports on green).
- Wait for health: green must be healthy.
- Smoke test green from inside the proxy container (network and response).
- Run Nginx config test before reload.
- Switch upstream file to green.
- Reload proxy and confirm live traffic goes to green.
- Observe logs, error rate, latency, and host resources for a few minutes.
- Keep blue running until you’re confident (your rollback window).
- After acceptance, stop blue and keep its image for at least one release cycle.
Rollback checklist (when green disappoints you)
- Switch upstream file back to blue.
- Test Nginx config.
- Reload Nginx.
- Verify live traffic hits blue.
- Capture green logs and metrics for debugging.
- Decide whether to keep green running for investigation or stop it to free resources.
FAQ
1) Can I do blue/green without a reverse proxy?
You can, but you’ll reinvent a reverse proxy using host ports, iptables, or DNS hacks. On a single host, an explicit proxy is the least-bad switch.
2) Why not just use Docker’s port publishing and swap ports?
Because you can’t bind two containers to the same host port, and the swap is not atomic unless you add more machinery. Also, you’ll end up exposing both versions at some point.
3) Is Nginx reload truly zero downtime?
For typical HTTP requests, it’s close. Nginx reload is graceful, but long-lived connections and weird clients can still see blips. Design with that in mind.
4) Should blue and green share the same database?
Usually yes, but only if your schema migration strategy supports compatibility during overlap. If green requires a breaking migration, you need a planned data step or downtime window.
5) What’s the safest way to handle schema migrations?
Prefer backward-compatible migrations: add columns, avoid destructive changes, and deploy code that can handle both old and new shapes. Run destructive cleanup later.
6) How long should I keep blue running after cutover?
Long enough to catch the failures you actually see in your environment: usually minutes to hours. If failures often show up “next morning,” keep blue available longer—resource permitting.
7) Can I canary a small percentage of traffic on a single host?
Yes, but it’s more complexity. HAProxy makes weighted backends straightforward. With Nginx, you can do it, but your config and observability need to be stronger.
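For the Nginx route, the usual shape is a weighted upstream block instead of the single-variable snippet. A minimal sketch (the file name is illustrative, and treat this as a separate routing mode, not a drop-in addition to the config above):
cr0x@server:~$ cat nginx/conf.d/canary-upstream.conf
# Roughly one request in ten goes to green; tune the weights to taste.
upstream app_canary {
    server app-blue:8080 weight=9;
    server app-green:8080 weight=1;
}
Point proxy_pass at http://app_canary instead of the variable, and remember that with an upstream block the names are resolved at load time, so both containers must exist when Nginx starts or reloads.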
8) What about TLS on the proxy?
Terminate TLS at the proxy. Keep certs on the host as read-only mounts. Don’t terminate TLS in both app containers unless you enjoy debugging.
9) Do I need container resource limits for blue/green?
On a single host, yes. Without limits, two versions running together can starve the host and make rollback pointless because everything is slow.
10) How do I prove which version served a request?
Return a version string in a response header or in a debug endpoint. Log it. When someone says “green is broken,” you need evidence, not vibes.
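With the variable-upstream setup from earlier, the proxy already knows which color it picked, so it can stamp every response without touching application code. A small sketch; the header name is arbitrary and the curl output is illustrative:
cr0x@server:~$ cat nginx/snippets/upstream-header.conf
# Included inside the location block, after the active-upstream snippet.
add_header X-Served-By $upstream always;
cr0x@server:~$ curl -sS -i localhost/ | grep -i x-served-by
X-Served-By: app-green
That, plus an app-level version string in a header or debug endpoint, turns “green is broken” into something you can actually verify.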
Conclusion: practical next steps
Single-host blue/green is not about perfection. It’s about eliminating the dumbest failures: broken deploys, slow rollbacks, and “we don’t know what’s live.” If you do only three things, do these:
- Put a reverse proxy in front and make it the only public entry point.
- Make health checks real, and gate cutover on them.
- Make cutover and rollback the same operation in opposite directions.
Then get disciplined about the unsexy parts: log rotation, keeping rollback images, and treating state changes as separate events. Your future incidents will still happen. They’ll just be shorter, cleaner, and less theatrical.