I’m trying to get Docker Swarm to run inside Docker Compose, with containers filling the role of swarm nodes. For educational purposes, I’d like to be able to simulate a distributed swarm via docker-compose.
My approach:
./swarm_compose/
|- docker-compose.yml
|- shared/
|- entrypoint/
|- |- worker.sh
|- |- master.sh
- docker-compose.yml defines two services (master, worker) and a network (swarm_net).
- shared/ is a shared volume used by the master and worker containers to share the join token. It is also used as a directory to write log files during initialization, for easy access and debugging.
- entrypoint/master.sh is used to initialize the swarm and write the join token to the shared volume.
- entrypoint/worker.sh is used to join the swarm with the token written to the shared volume.
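Everything the scripts produce ends up under ./shared/ on the host, so the state of an attempt can be inspected without exec-ing into any container, roughly like this (file names as per the scripts below):
# From the host, after docker-compose up:
cat ./shared/swarm_init.log   # output of `docker swarm init` on the master
cat ./shared/swarm_token      # worker join token written by master.sh
cat ./shared/swarm_join.log   # output of `docker swarm join` on a worker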
The worker services depend_on the master providing a successful healthcheck, and the master’s healthcheck depends on the existence of the join token file written to shared storage. Furthermore, I use the docker:dind image and mount the docker.sock to enable docker-in-docker for the swarm.
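To sanity-check that chain, the master’s health status and the swarm state reported by the docker CLI inside it can be queried roughly like this (assuming the container_name swarm_master from the compose file below; the templates use standard docker inspect / docker info fields):
# Health status that gates the workers' depends_on condition
docker inspect --format '{{.State.Health.Status}}' swarm_master
# Swarm state as seen through the CLI inside the master container
docker exec swarm_master docker info --format '{{.Name}}: {{.Swarm.LocalNodeState}}'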
The files:
./entrypoint/master.sh:
#!/bin/sh
# Initialize Docker Swarm and save initialization logs
docker swarm init --advertise-addr ${MASTER_ADDR} > /shared/swarm_init.log 2>&1
# Save join token to shared directory
docker swarm join-token -q worker > /shared/swarm_token
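(Side note: without -q, the same command prints the full join command, which is handy when debugging by hand; -q writes only the bare token that worker.sh expects.)
docker swarm join-token worker
# Example output (token elided):
# To add a worker to this swarm, run the following command:
#   docker swarm join --token SWMTKN-1-... 192.168.0.10:2377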
./entrypoint/worker.sh:
#!/bin/sh
# Join the Swarm and log the output
docker swarm join --token $(cat /shared/swarm_token) ${MASTER_ADDR}:2377 > /shared/swarm_join.log 2>&1
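(I rely on depends_on for ordering, but a more defensive variant of worker.sh that waits for the token itself would look roughly like this sketch:)
#!/bin/sh
# Sketch: poll for the join token (up to ~60s) before attempting to join
tries=0
while [ ! -s /shared/swarm_token ] && [ "$tries" -lt 30 ]; do
    sleep 2
    tries=$((tries + 1))
done
docker swarm join --token "$(cat /shared/swarm_token)" "${MASTER_ADDR}:2377" > /shared/swarm_join.log 2>&1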
./docker-compose.yml:
version: '3.8'

x-config:
  cidr: &cidr_range "192.168.0.0/24"
  master: &master_address "192.168.0.10"

services:
  master:
    image: docker:dind
    container_name: swarm_master
    networks:
      swarm_net:
        ipv4_address: *master_address
    ports:
      - "2377:2377"
    expose:
      - "7946"
      - "4789/udp"
    volumes:
      - ./entrypoint/master.sh:/tmp/entrypoint.sh
      - /var/run/docker.sock:/var/run/docker.sock
      - ./shared:/shared
    environment:
      MASTER_ADDR: *master_address
    privileged: true
    entrypoint: /tmp/entrypoint.sh
    healthcheck:
      test: test -f /shared/swarm_token
      interval: 60s
      retries: 5
      start_period: 20s
      timeout: 10s

  worker:
    image: docker:dind
    networks:
      swarm_net:
    expose:
      - "7946"
      - "4789/udp"
    volumes:
      - ./entrypoint/worker.sh:/tmp/entrypoint.sh
      - /var/run/docker.sock:/var/run/docker.sock
      - ./shared:/shared
    environment:
      MASTER_ADDR: *master_address
    privileged: true
    entrypoint: /tmp/entrypoint.sh
    depends_on:
      master:
        condition: service_healthy
    deploy:
      replicas: 2

networks:
  swarm_net:
    ipam:
      driver: default
      config:
        - subnet: *cidr_range
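Once the swarm comes up, my expectation is to verify the simulated cluster from the host roughly like this (swarm_master being the container_name above):
docker-compose up -d
# List the nodes the manager sees; ideally one manager plus two workers
docker exec swarm_master docker node ls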
The Error:
Launching this fails as follows:
/swarm_compose$ docker-compose up
Starting swarm_master ... done
ERROR: for worker Container "2bdbe8056ab8" is unhealthy.
ERROR: Encountered errors while bringing up the project.
I proceeded to check the logs, which were written to /shared/swarm_init.log.
Logs from the manager init:
Error response from daemon: must specify a listening address because the address to advertise is not recognized as a system address, and a system's IP address to use could not be uniquely identified
This looks related to https://github.com/moby/moby/issues/25514. It seems the swarm manager has trouble initializing with the given IP, and this failure cascades to the worker containers.
I attempted the workaround found in the GitHub thread by updating the master’s entrypoint script as follows:
docker swarm init --advertise-addr ${MASTER_ADDR} --listen-addr 0.0.0.0
However, I receive the same error in the log file.
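To rule out a compose-side addressing mismatch, the address the master container actually receives can be checked with a standard inspect template; 192.168.0.10 is what the compose file assigns:
docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' swarm_master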
Edit: For what it’s worth (perhaps little), I did manage to successfully initialize just the manager by using network_mode: host, like so:
version: '3.8'

x-config:
  master: &master_address "127.0.0.1"

services:
  master:
    image: docker:dind
    container_name: swarm_master
    network_mode: host
    volumes:
      - ./entrypoint/master.sh:/tmp/entrypoint.sh
      - /var/run/docker.sock:/var/run/docker.sock
      - ./shared:/shared
    environment:
      MASTER_ADDR: *master_address
    privileged: true
    entrypoint: /tmp/entrypoint.sh
    healthcheck:
      test: test -f /shared/swarm_token
      interval: 60s
      retries: 5
      start_period: 20s
      timeout: 10s
This approach makes it impossible to isolate the workers and the master, though, and the workers understandably fail to initialize with:
Error response from daemon: This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.
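(Following the hint in that error message, a reset between attempts would look roughly like this:)
# Leave any existing swarm, then clean up the compose project and shared artifacts
docker swarm leave --force
docker-compose down
rm -f ./shared/swarm_token ./shared/*.log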