Docker “connection refused” between services: fix networks, not symptoms

“Connection refused” is a wonderfully blunt message. It doesn’t mean your app is “a bit slow today.”
It means something tried to open a TCP connection and the other end said “nope” immediately.
No handshake. No waiting. Just a closed door.

In Docker stacks, that door gets slammed for predictable reasons: you’re dialing the wrong address, you’re on the wrong network,
you’re hitting the wrong port, the server isn’t listening where you think it is, or something in the middle is filtering traffic.
This piece is about proving which one, fast—and fixing the network model instead of sprinkling retries like holy water.

What “connection refused” actually means (and what it doesn’t)

“Connection refused” is TCP’s way of saying: the destination IP is reachable, but nothing is listening on that port
(or something actively rejected the connection with a TCP RST). That’s a very different failure mode from:

  • Timeout: packets vanish, routing is broken, firewall is dropping, or the service is wedged and not responding.
  • Name resolution failure: you’re not even getting an IP address for the target name.
  • Connection reset by peer: you connected, then the peer closed the connection mid-flight.

In Docker, “refused” often means you connected to the wrong place successfully. That sounds contradictory until you realize
how often developers accidentally aim at localhost, or at the host-published port from inside the same network,
or at a container IP that changed since last Tuesday.

Here’s the rule you can tape to your monitor: inside a container, “localhost” means the container.
If you’re connecting from one container to another, “localhost” is almost always wrong unless you purposely run both processes
in the same container (which is its own lifestyle choice).
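In Compose terms, that rule looks like this (a minimal sketch; the service and image names are illustrative):

```yaml
services:
  api:
    image: example/api:latest
    environment:
      DB_HOST: db           # correct: the Compose service name
      DB_PORT: "5432"       # the container port, not a published one
      # DB_HOST: localhost  # wrong: resolves to the api container itself
  db:
    image: postgres:16
```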

A Docker networking mental model you can operate under pressure

Docker networking is not “hard.” It’s just layered. People get in trouble when they guess which layer is at fault.
We’re not guessing. We’re going to prove it.

Layer 1: Process and socket

Something must be listening on a port. If your service listens on 127.0.0.1 inside its container, other containers can’t reach it.
It needs to bind to 0.0.0.0 (or the container interface address).

Layer 2: Container network namespace

Each container has its own network namespace: its own interfaces, routes, and loopback. Containers can be attached to one or more networks.
Docker creates a veth pair to connect the container namespace to a bridge (for bridge networks) or to an overlay (for Swarm).

Layer 3: Docker networks (bridge/overlay/macvlan)

The default “bridge” network is not the same as a user-defined bridge network. User-defined networks provide DNS-based service discovery.
Compose leans on that. If you fall back to the default bridge and start hardcoding IPs, you’re writing future incidents.

Layer 4: Service discovery (Docker DNS)

On user-defined networks, Docker runs an embedded DNS server. Containers commonly see it as 127.0.0.11 in /etc/resolv.conf.
Compose service names resolve to container IPs on that network. If name resolution is wrong, everything downstream becomes chaos.

Layer 5: Host port publishing and NAT

ports: in Compose publishes container ports to the host. That is for traffic coming from outside Docker (your laptop, the host, other machines).
Inside the Docker network, containers should normally talk to each other on the container port via the service name.

If you find yourself connecting from container A to host.docker.internal:5432 to reach container B, pause and ask:
“Why am I leaving the Docker network and re-entering through NAT?” Sometimes you need it. Most of the time you don’t.

One quote, because the ops world has receipts

Paraphrased idea from Werner Vogels (reliability/architecture): “Everything fails; design assuming failure and recover by automation.”
That applies here: stop hoping the network will behave; design to observe and verify it.

Fast diagnosis playbook (first/second/third)

When production is burning, you don’t need a philosophy degree. You need a sequence that collapses the search space.
This playbook is built for “service A can’t connect to service B” with “connection refused.”

First: confirm you’re dialing the right target from the caller container

  • From inside container A: resolve the name you’re using.
  • Confirm the IP is on a network shared with container B.
  • Attempt a TCP connect to the intended port.

If name resolution fails or points somewhere unexpected, stop. Fix DNS/network membership first.

Second: confirm the server container is actually listening on the right interface and port

  • Inside container B: list listening sockets.
  • Check the service binds to 0.0.0.0, not 127.0.0.1.
  • Check app logs for “started” versus “crashed and restarting.”

Third: inspect Docker network plumbing and host filtering

  • Inspect network attachments and container IPs.
  • Check iptables/nftables rules if the host is involved (published ports, or traffic crossing namespaces).
  • Check for accidental network isolation (multiple Compose projects, multiple networks, wrong aliases).

Joke #1: If your fix is “add sleep 10,” you didn’t solve networking—you just negotiated with time, and time always invoices you later.

Hands-on tasks: commands, expected output, and decisions

Below are practical tasks you can run on a Linux host with Docker and Compose. Each task includes a command, realistic output,
what that output means, and the decision you make. This is the part you copy into your incident channel.

Task 1: Identify the failing connection details (from logs)

cr0x@server:~$ docker logs --tail=50 api
2026-01-03T09:12:41Z ERROR db: dial tcp 127.0.0.1:5432: connect: connection refused
2026-01-03T09:12:41Z INFO  retrying in 1s

Meaning: The API container is trying to reach Postgres on 127.0.0.1 inside its own network namespace.
Unless Postgres runs in the same container, that’s wrong.

Decision: Change the DB host to the Compose service name (for example db) and use the container port (5432).

Task 2: Confirm container state isn’t lying to you

cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
NAMES     STATUS                        PORTS
api       Up 2 minutes                  0.0.0.0:8080->8080/tcp
db        Restarting (1) 5 seconds ago  5432/tcp

Meaning: The DB is restarting. “Connection refused” might be real (nothing listening), not a routing problem.

Decision: Stop tuning the client. Fix the server container crash loop first: inspect DB logs and config.

Task 3: Inspect the server logs for bind/listen issues

cr0x@server:~$ docker logs --tail=80 db
2026-01-03 09:12:32.177 UTC [1] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2026-01-03 09:12:32.177 UTC [1] LOG:  listening on IPv6 address "::1", port 5432
2026-01-03 09:12:32.180 UTC [1] LOG:  database system is ready to accept connections

Meaning: Postgres is only listening on loopback inside its container. Other containers will get “refused.”

Decision: Set Postgres to listen on 0.0.0.0 (or the container interface) and adjust pg_hba.conf accordingly.
In containers, “secure by binding to localhost” is security theater with collateral damage.
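As a sketch of that decision in Compose (the official postgres image already defaults to listen_addresses='*', so an override like this matters mainly for custom images or mounted config files):

```yaml
services:
  db:
    image: postgres:16
    # Listen on all container interfaces; restrict access with
    # pg_hba.conf and credentials, not with loopback binding.
    command: ["postgres", "-c", "listen_addresses=*"]
```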

Task 4: Prove name resolution inside the caller container

cr0x@server:~$ docker exec -it api getent hosts db
172.21.0.3      db

Meaning: Docker DNS can resolve db to an IP. That’s a prerequisite for service discovery.

Decision: If this fails, you’re on the wrong network or using the wrong name. Fix Compose networks/aliases, not the application.

Task 5: Test the TCP port from the caller container (fast signal)

cr0x@server:~$ docker exec -it api bash -lc 'nc -vz -w2 db 5432; echo exit=$?'
nc: connect to db (172.21.0.3) port 5432 (tcp) failed: Connection refused
exit=1

Meaning: You reached the container IP, but the port refused. That’s almost always “not listening” or “listening on loopback only.”

Decision: Investigate the server container’s listening sockets and bind address (next task).

Task 6: Check what the server is listening on (inside server container)

cr0x@server:~$ docker exec -it db bash -lc 'ss -lntp | head -n 20'
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      4096   127.0.0.1:5432      0.0.0.0:*     users:(("postgres",pid=1,fd=6))

Meaning: Postgres is bound to 127.0.0.1:5432 only. Containers on the same network cannot connect.

Decision: Reconfigure service to bind to 0.0.0.0 (or container IP) and restart.

Task 7: Confirm both containers share the same Docker network

cr0x@server:~$ docker inspect -f '{{json .NetworkSettings.Networks}}' api
{"appnet":{"IPAMConfig":null,"Links":null,"Aliases":["api","api-1"],"NetworkID":"c0b3f6f2e2c1","EndpointID":"8d6d...","Gateway":"172.21.0.1","IPAddress":"172.21.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:15:00:02","DriverOpts":null}}
cr0x@server:~$ docker inspect -f '{{json .NetworkSettings.Networks}}' db
{"appnet":{"IPAMConfig":null,"Links":null,"Aliases":["db","db-1"],"NetworkID":"c0b3f6f2e2c1","EndpointID":"a21c...","Gateway":"172.21.0.1","IPAddress":"172.21.0.3","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:15:00:03","DriverOpts":null}}

Meaning: Both are on appnet with the same NetworkID. So routing between them should be fine.

Decision: Focus on the server listening/bind config and app readiness, not on multi-network weirdness.

Task 8: Inspect the network object for surprises (subnet, containers, options)

cr0x@server:~$ docker network inspect appnet --format '{{json .IPAM.Config}} {{json .Containers}}'
[{"Subnet":"172.21.0.0/16","Gateway":"172.21.0.1"}] {"1c2f...":{"Name":"api","IPv4Address":"172.21.0.2/16"},"7aa9...":{"Name":"db","IPv4Address":"172.21.0.3/16"}}

Meaning: Confirmed subnet and membership. If the DB isn’t listed, it’s not on the network you think it is.

Decision: If membership is wrong: fix Compose to attach services to the same user-defined network and redeploy.

Task 9: Detect the “published port confusion” problem

cr0x@server:~$ docker port db
5432/tcp -> 0.0.0.0:15432

Meaning: The DB container exposes 5432 internally, published as 15432 on the host.
Other containers should still use db:5432, not db:15432 and not localhost:15432.

Decision: If application config points to 15432 from inside Docker, correct it. Published ports are for outside clients.
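A sketch of the correct split, with illustrative names:

```yaml
services:
  db:
    image: postgres:16
    ports:
      - "15432:5432"   # host 15432 -> container 5432, external clients only
  api:
    image: example/api:latest
    environment:
      DB_HOST: db      # internal traffic: service name on the shared network
      DB_PORT: "5432"  # and the container port -- never 15432
```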

Task 10: Validate routing from inside the caller container

cr0x@server:~$ docker exec -it api ip route
default via 172.21.0.1 dev eth0
172.21.0.0/16 dev eth0 proto kernel scope link src 172.21.0.2

Meaning: The container has a route to the subnet where db lives. If the route is missing, you attached the wrong network.

Decision: Missing route means wrong network attachment. Fix Compose networks, don’t hack /etc/hosts.

Task 11: Validate the caller’s DNS config points at Docker DNS

cr0x@server:~$ docker exec -it api cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

Meaning: Docker’s embedded DNS is in use. If you see corporate DNS servers only, name resolution to service names may fail.

Decision: If Docker DNS isn’t being used, check for overridden DNS settings in daemon config or Compose.

Task 12: Prove that the service is reachable when binding is fixed

cr0x@server:~$ docker exec -it db bash -lc 'grep -E "^(listen_addresses|port)" -n /var/lib/postgresql/data/postgresql.conf | head'
60:listen_addresses = '*'
64:port = 5432
cr0x@server:~$ docker restart db
db
cr0x@server:~$ docker exec -it api bash -lc 'nc -vz -w2 db 5432; echo exit=$?'
Connection to db (172.21.0.3) 5432 port [tcp/postgresql] succeeded!
exit=0

Meaning: We’ve converted “refused” into “succeeded” by fixing the listener.

Decision: Lock in the configuration change, add a readiness check, and remove any client-side “retry until the heat death of the universe.”

Task 13: Catch the “depends_on means ready” misconception

cr0x@server:~$ docker compose ps
NAME           IMAGE            COMMAND                  SERVICE   STATUS          PORTS
stack-api-1    api:latest       "/app/api"               api       Up 20 seconds   0.0.0.0:8080->8080/tcp
stack-db-1     postgres:16      "docker-entrypoint..."   db        Up 22 seconds   5432/tcp

Meaning: “Up” is not “ready.” Postgres might still be running init scripts or replaying WAL. Clients can see refused during early boot.

Decision: Add a healthcheck to DB and gate API startup on DB readiness (or implement robust connection retry with bounded backoff).
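A sketch of that gating in Compose (pg_isready ships in the official postgres image; user and service names are illustrative):

```yaml
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
  api:
    image: example/api:latest
    depends_on:
      db:
        condition: service_healthy   # wait for "healthy", not just "started"
```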

Task 14: Verify health status (when you add healthchecks)

cr0x@server:~$ docker inspect -f '{{.State.Health.Status}}' stack-db-1
healthy

Meaning: Healthchecks give you a reliable readiness signal. You can use it for orchestration decisions and alerting.

Decision: If unhealthy: stop blaming the network and fix DB initialization, credentials, disk, or config.

Task 15: Spot host firewall/NAT issues for published ports (host involved)

cr0x@server:~$ sudo iptables -t nat -S DOCKER | head
-N DOCKER
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.21.0.2:8080

Meaning: Docker injects rules to forward host ports to container IPs. If these rules are missing, published ports won’t work.

Decision: If rules are missing or your environment uses nftables policies that override Docker: align firewall management with Docker,
or use a different networking mode intentionally. Don’t “just flush iptables” in production unless you like surprise audits.

Task 16: Detect accidental multiple Compose projects on separate networks

cr0x@server:~$ docker network ls --format 'table {{.Name}}\t{{.Driver}}\t{{.Scope}}' | grep -E 'stack|appnet'
NAME                DRIVER    SCOPE
stack_default        bridge    local
billing_default      bridge    local

Meaning: Two projects, two default networks. A container on stack_default cannot reach services on billing_default by name.

Decision: Attach both stacks to a shared user-defined network (explicitly), or run them as one Compose project if they’re coupled.
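The explicit-attachment option can be sketched like this (network and service names are illustrative; create the network once on the host with docker network create shared-net):

```yaml
# In each project's compose file:
services:
  api:
    image: example/api:latest
    networks:
      - shared-net
networks:
  shared-net:
    external: true   # pre-created, shared across Compose projects
```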

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption (“localhost is the database”)

A product team migrated a monolith into “microservices” over a quarter. It wasn’t a religious conversion; it was a budget line item.
The first step was running the API and Postgres in Docker Compose locally, then promoting that setup into a shared dev environment.

In the monolith days, the database lived on the same VM. So the config everywhere said DB_HOST=localhost.
During the migration, someone put Postgres into a container but kept the old variable, thinking Docker would “map it.”
Docker did map it—straight into the wrong place.

The symptom was immediate: the API containers threw connect: connection refused. The first response was to increase retries,
because the team had recently been burned by cold starts in Kubernetes. Retries went from 3 to 30, and the logs became an expensive novel.
It still failed, because you can’t eventually-consistent your way into a socket that doesn’t exist.

The break came when someone ran getent hosts inside the container and noticed the service name db resolved fine.
The API just wasn’t using it. One config change later—DB_HOST=db—the incident was over.
The lesson wasn’t “use service names.” The lesson was: assumptions are technical debt with a due date.

Mini-story 2: The optimization that backfired (published ports for “performance”)

Another org had a Compose-based integration environment. A senior engineer (competent, genuinely) decided to “simplify networking”
by having services talk to each other via host-published ports. The logic sounded clean:
“Everything points at the host IP, we use one firewall policy, and it’ll be easier to debug.”

It worked until it didn’t. Under load, they started seeing intermittent “connection refused” from service A to service B.
The refusal clustered during deployments, but also randomly during busy hours. That’s the kind of behavior that makes people blame the cloud provider,
the kernel, and sometimes astrology.

The actual problem was self-inflicted complexity. Internal traffic took a detour: container A → host NAT → docker-proxy/iptables → container B.
During container restarts and IP churn, the timing windows widened. Some connections landed on a port mapping that briefly pointed nowhere.
Also, internal calls were now pinned to host-specific addresses, making horizontal scaling harder and failovers nastier.

They reverted to direct service-to-service traffic over the user-defined Docker network, using service names and container ports.
For debugging and ingress they kept published ports, but internal calls stayed internal. Their “optimization” was really a detour through more moving parts.

Mini-story 3: The boring practice that saved the day (healthchecks and explicit networks)

A platform team ran a modest Compose stack for internal tools: API, queue, Postgres, and a worker. Nothing fancy.
What was fancy was that they treated it like production anyway: explicit user-defined networks, healthchecks, and predictable service names.
No magic defaults. No “it works on my laptop” energy.

One Friday, the host rebooted after routine kernel patching. Services came back up, but the API started erroring immediately.
The on-call saw “connection refused” and prepared for a long evening. Then they checked DB health: starting.
Postgres was replaying WAL after an unclean shutdown—normal, but it takes time.

Because healthchecks were in place, the API container didn’t stampede the DB with connection storms.
It waited. The logs were boring. Alerts were meaningful. Ten minutes later, everything was healthy and nobody wrote a panicked status update.

Boring is an achievement. “Works after reboot” is not a default property; it’s something you engineer.

Common mistakes: symptom → root cause → fix

These are the recurring patterns behind “connection refused” in Dockerized systems. Each one includes a specific corrective action.
If your team keeps repeating one, turn it into a lint rule or a review checklist item.

1) “API can’t reach DB, but DB is running” → DB binds to loopback → bind to 0.0.0.0

  • Symptom: connect: connection refused from other containers; ss shows 127.0.0.1:5432.
  • Root cause: Service listens only on loopback inside the container.
  • Fix: Configure bind/listen address to 0.0.0.0 (or container IP), plus proper auth rules (e.g., Postgres pg_hba.conf).

2) “Works on host, fails in container” → using localhost → use service name

  • Symptom: App config uses localhost or 127.0.0.1 for another service.
  • Root cause: Misunderstanding of network namespaces.
  • Fix: Use Compose service name (e.g., redis, db) and container port.

3) “Connection refused only during startup” → readiness isn’t guaranteed → add healthchecks/backoff

  • Symptom: First few seconds/minutes after deploy: refused; later: OK.
  • Root cause: Client starts before server is listening (or before it’s ready to accept connections).
  • Fix: Healthchecks and dependency gating; or client retry with bounded exponential backoff and jitter.
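The client-side half of that fix can be sketched in plain bash (a sketch, not a library: wait_for_tcp is a name invented here, and it relies on bash's /dev/tcp redirection):

```shell
#!/usr/bin/env bash
# Sketch: bounded retry with exponential backoff and jitter.
# wait_for_tcp HOST PORT [MAX_ATTEMPTS] -> returns 0 once the port
# accepts a TCP connection, 1 after the attempt budget is spent.
wait_for_tcp() {
  local host=$1 port=$2 max=${3:-6} delay=1 attempt
  for ((attempt = 1; attempt <= max; attempt++)); do
    # bash's /dev/tcp opens a real TCP connection; "refused" fails fast
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0   # the subshell closed the socket on exit
    fi
    sleep "$((delay + RANDOM % 2))"  # base delay plus up to 1s of jitter
    delay=$((delay * 2))             # exponential, but bounded by max
  done
  return 1
}
```

Call it from an entrypoint before starting the client process, e.g. wait_for_tcp db 5432 6. The bound matters: if the budget is exhausted, fail loudly instead of looping forever.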

4) “Can ping, but TCP refused” → network is fine, port isn’t → stop debugging L3

  • Symptom: IP reachable, ARP/routing fine, but nc fails with refused.
  • Root cause: Service down, wrong port, wrong bind address, or app-level crash.
  • Fix: Check ss -lntp and service logs; verify port mapping and config.

5) “Service name doesn’t resolve” → wrong network/project → attach to same user-defined network

  • Symptom: getent hosts db fails inside a container.
  • Root cause: Containers are on different networks or different Compose projects without a shared network.
  • Fix: Declare an explicit shared network in Compose and attach both services to it.

6) “Connecting to published port from another container” → NAT detour → use container port on network

  • Symptom: Container calls host:15432 or service:15432 because 15432 is published.
  • Root cause: Confusion between ingress ports and internal ports.
  • Fix: For internal traffic: db:5432. Publish ports only for external clients.

7) “Intermittent refuses after redeploy” → stale IP assumptions → stop using container IPs

  • Symptom: Hardcoded IP works until a restart, then refused/timeout.
  • Root cause: Container IPs change; your app config doesn’t.
  • Fix: Use service discovery (names), not container IP addresses.

8) “Published port dead from outside” → firewall/nftables conflict → align host filtering with Docker

  • Symptom: Host port mapped, container listens, but outside clients get refused.
  • Root cause: Host firewall rules override Docker’s DNAT/forwarding, or Docker’s rules never installed cleanly.
  • Fix: Fix firewall policy to allow forwarding; ensure Docker chain integration; avoid managing iptables in two competing systems.

Joke #2: Docker networking isn’t haunted; it just looks that way when you skip the part where you verify which universe your packets are in.

Checklists / step-by-step plan

Step-by-step: from “refused” to a root cause in 10 minutes

  1. Identify the exact target from the client log: host, port, protocol. If it’s localhost, assume it’s wrong until proven otherwise.
  2. Check container health/state: is the server restarting or unhealthy?
  3. From the client container: resolve the server name and capture the IP.
  4. From the client container: attempt TCP connect with nc to the server name and port.
  5. From the server container: confirm a listener exists with ss -lntp.
  6. Confirm bind address: if it’s 127.0.0.1, fix to 0.0.0.0 (and tighten auth properly).
  7. Confirm network membership: both containers on same user-defined network; inspect network object.
  8. Eliminate port confusion: internal calls use container port; external calls use published port.
  9. Only then inspect host firewall/NAT rules, if the host is part of the path.
  10. After fix: add healthchecks/readiness, remove cargo-cult sleeps, and document the networking contract.
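The sequence above can be wrapped into one runbook command (a sketch: refused_runbook is a name invented here, and the tools it assumes inside the containers — getent, nc, ss — are assumptions about your images):

```shell
#!/usr/bin/env bash
# Sketch of a runbook helper: runs the resolve/connect/listen/inspect
# sequence in order against two named containers.
refused_runbook() {
  if [ "$#" -ne 4 ]; then
    echo "usage: refused_runbook CLIENT_CONTAINER SERVER_CONTAINER SERVER_NAME PORT" >&2
    return 2
  fi
  local client=$1 server=$2 name=$3 port=$4
  echo "== 1. resolve ${name} from ${client}"
  docker exec "$client" getent hosts "$name"
  echo "== 2. TCP connect to ${name}:${port} from ${client}"
  docker exec "$client" nc -vz -w2 "$name" "$port"
  echo "== 3. listening sockets inside ${server}"
  docker exec "$server" ss -lntp
  echo "== 4. network membership of both containers"
  docker inspect -f '{{json .NetworkSettings.Networks}}' "$client" "$server"
}
```

Run it as refused_runbook api db db 5432 and paste the output straight into the incident channel.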

Deployment checklist: prevent “refused” before it happens

  • Use a user-defined network; do not rely on the default bridge.
  • Use service names for inter-service traffic; never hardcode container IPs.
  • Bind network services to 0.0.0.0 inside containers unless you have a specific reason not to.
  • Publish ports only for ingress; don’t route internal traffic through the host.
  • Add healthchecks for stateful services (DB, cache, queue); consume health status in orchestration decisions.
  • Implement bounded retry with backoff and jitter for clients; treat it as resilience, not a bandage.
  • Keep firewall policy coherent with Docker; avoid dueling rule managers.
  • Make Compose networks explicit and named when multiple projects must talk.

Interesting facts and historical context (useful, not trivia)

  • Fact 1: Early Docker setups relied heavily on Linux bridges and iptables NAT; published ports are still implemented via DNAT rules on many systems.
  • Fact 2: The default bridge network historically behaved differently from user-defined bridges, especially around automatic DNS/service discovery.
  • Fact 3: Docker’s embedded DNS commonly appears as 127.0.0.11 inside containers on user-defined networks—a detail that’s diagnostic gold.
  • Fact 4: “Connection refused” is typically an immediate TCP RST, meaning the network path to the IP worked; the endpoint rejected the port.
  • Fact 5: Compose service names became a de facto service discovery mechanism for dev/test long before many teams adopted “real” discovery systems.
  • Fact 6: Container IPs are intentionally ephemeral; stable addressing is provided through naming and discovery, not by pinning IPs.
  • Fact 7: Some distros moved from iptables to nftables; mismatched firewall tooling can produce confusing Docker networking failures if chains aren’t integrated properly.
  • Fact 8: depends_on in Compose was never a readiness guarantee; it is ordering. Treat “ready” as an application-level property.
  • Fact 9: Binding to 127.0.0.1 inside a container is a classic footgun because it silently blocks all external container traffic while looking “secure.”

FAQ

1) Why do I get “connection refused” instead of a timeout?

Refused usually means you reached the destination IP and the kernel responded with a reset because nothing is listening on that port
(or a firewall actively rejected the connection). Timeouts are more about dropped packets and broken paths.

2) If both services are in Compose, why can’t they talk automatically?

They can, but only if they share a network and you use service names. Problems happen when services land on different networks,
different Compose projects, or the client is configured to use localhost or a published host port.

3) Should I use container IP addresses for performance?

No. Container IPs change. Name resolution is not your bottleneck; your next outage is.
Use service names and let Docker DNS handle the mapping.

4) What’s the difference between expose and ports in Compose?

expose documents internal ports (and mattered for legacy container links), but it does not publish to the host.
ports publishes ports to the host interface (usually via NAT). Container-to-container traffic doesn’t need published ports.

5) Is depends_on enough to prevent startup connection failures?

No. It starts containers in order; it doesn’t guarantee the dependency is ready to accept connections.
Use healthchecks and/or client retry with sane backoff.

6) Why does the server listen on 127.0.0.1 inside the container?

Many services default to loopback for “safety.” In containers, that often blocks the only traffic you actually want: other containers.
Bind to 0.0.0.0 and secure with authentication/ACLs rather than hiding behind loopback.

7) Can firewalls cause “connection refused” in Docker?

Yes. Reject rules can generate RST/ICMP responses that look like refusal. More commonly, firewalls cause timeouts by dropping packets.
If your path involves published ports or cross-host routing, validate host firewall integration with Docker.

8) Should containers call each other via the host published port?

Usually no. It adds NAT, extra failure modes, and host coupling. Use service-to-service networking via the shared Docker network.
Publish ports for clients outside Docker.

9) Why does it work locally but fail in CI or on a shared server?

Local setups often have fewer networks, fewer Compose projects, and less firewall policy. In CI/shared environments,
network names collide, services start in different orders, and firewall baselines can differ. Make networks explicit and add healthchecks.

10) What if DNS resolves correctly but I still get refused?

Then DNS isn’t your problem. Refused points to listening sockets, bind addresses, wrong ports, or server process crashes.
Run ss -lntp inside the server container and verify the port is listening on 0.0.0.0.

Conclusion: practical next steps

“Connection refused” is not a mystery; it’s a diagnosis begging to be finished. Your job is to stop treating it like a weather event.
Prove the target, prove name resolution, prove network membership, prove the listener, and only then argue about firewalls.

Next steps that pay rent:

  • Audit every inter-service config for localhost, host IPs, and published ports used internally. Replace with service names and container ports.
  • Add healthchecks for stateful services and make your clients handle startup gracefully with bounded backoff.
  • Make Compose networks explicit, especially when multiple projects must communicate.
  • Standardize a short runbook: getent, nc, ss, docker network inspect. Make it muscle memory.