I’m trying to get Docker Swarm to run inside Docker Compose, with containers filling the role of swarm nodes. For educational purposes, I’d like to be able to simulate a distributed swarm via docker-compose.
My approach:
./swarm_compose/
|- docker-compose.yml
|- shared/
|- entrypoint/
|- |- worker.sh
|- |- master.sh
- docker-compose.yml defines two services (master, worker) and a network (swarm_net).
- shared/ is a shared volume used by the master and worker containers to share the join token. It is also used as a directory to write log files during initialization, for easy access and debugging.
- entrypoint/master.sh is used to initialize the swarm and write the join token to the shared volume.
- entrypoint/worker.sh is used to join the swarm with the token written to the shared volume.
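Everything the scripts produce ends up under ./shared/ on the host, so the state of an attempt can be inspected without exec-ing into any container, roughly like this (file names as per the scripts below):
# From the host, after docker-compose up:
cat ./shared/swarm_init.log   # output of `docker swarm init` on the master
cat ./shared/swarm_token      # worker join token written by master.sh
cat ./shared/swarm_join.log   # output of `docker swarm join` on a worker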
The worker services depend_on the master providing a successful healthcheck, and the master’s healthcheck depends on the existence of the join token file written to shared storage. Furthermore, I use the docker:dind image and mount the docker.sock to enable docker-in-docker for the swarm.
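To sanity-check that chain, the master’s health status and the swarm state reported by the docker CLI inside it can be queried roughly like this (assuming the container_name swarm_master from the compose file below; the templates use standard docker inspect / docker info fields):
# Health status that gates the workers' depends_on condition
docker inspect --format '{{.State.Health.Status}}' swarm_master
# Swarm state as seen through the CLI inside the master container
docker exec swarm_master docker info --format '{{.Name}}: {{.Swarm.LocalNodeState}}'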
The files:
./entrypoint/master.sh:
#!/bin/sh
# Initialize Docker Swarm and save initialization logs
docker swarm init --advertise-addr ${MASTER_ADDR} > /shared/swarm_init.log 2>&1
# Save join token to shared directory
docker swarm join-token -q worker > /shared/swarm_token
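(Side note: without -q, the same command prints the full join command, which is handy when debugging by hand; -q writes only the bare token that worker.sh expects.)
docker swarm join-token worker
# Example output (token elided):
# To add a worker to this swarm, run the following command:
#   docker swarm join --token SWMTKN-1-... 192.168.0.10:2377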
./entrypoint/worker.sh:
#!/bin/sh
# Join the Swarm and log the output
docker swarm join --token $(cat /shared/swarm_token) ${MASTER_ADDR}:2377 > /shared/swarm_join.log 2>&1
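(I rely on depends_on for ordering, but a more defensive variant of worker.sh that waits for the token itself would look roughly like this sketch:)
#!/bin/sh
# Sketch: poll for the join token (up to ~60s) before attempting to join
tries=0
while [ ! -s /shared/swarm_token ] && [ "$tries" -lt 30 ]; do
    sleep 2
    tries=$((tries + 1))
done
docker swarm join --token "$(cat /shared/swarm_token)" "${MASTER_ADDR}:2377" > /shared/swarm_join.log 2>&1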
./docker-compose.yml:
version: '3.8'

x-config:
  cidr: &cidr_range "192.168.0.0/24"
  master: &master_address "192.168.0.10"

services:
  master:
    image: docker:dind
    container_name: swarm_master
    networks:
      swarm_net:
        ipv4_address: *master_address
    ports:
      - "2377:2377"
    expose:
      - "7946"
      - "4789/udp"
    volumes:
      - ./entrypoint/master.sh:/tmp/entrypoint.sh
      - /var/run/docker.sock:/var/run/docker.sock
      - ./shared:/shared
    environment:
      MASTER_ADDR: *master_address
    privileged: true
    entrypoint: /tmp/entrypoint.sh
    healthcheck:
      test: test -f /shared/swarm_token
      interval: 60s
      retries: 5
      start_period: 20s
      timeout: 10s

  worker:
    image: docker:dind
    networks:
      swarm_net:
    expose:
      - "7946"
      - "4789/udp"
    volumes:
      - ./entrypoint/worker.sh:/tmp/entrypoint.sh
      - /var/run/docker.sock:/var/run/docker.sock
      - ./shared:/shared
    environment:
      MASTER_ADDR: *master_address
    privileged: true
    entrypoint: /tmp/entrypoint.sh
    depends_on:
      master:
        condition: service_healthy
    deploy:
      replicas: 2

networks:
  swarm_net:
    ipam:
      driver: default
      config:
        - subnet: *cidr_range
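Once the swarm comes up, my expectation is to verify the simulated cluster from the host roughly like this (swarm_master being the container_name above):
docker-compose up -d
# List the nodes the manager sees; ideally one manager plus two workers
docker exec swarm_master docker node ls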
The Error:
Launching this fails as follows:
/swarm_compose$ docker-compose up
Starting swarm_master ... done
ERROR: for worker Container "2bdbe8056ab8" is unhealthy.
ERROR: Encountered errors while bringing up the project.
I proceeded to check the logs, which were written to /shared/swarm_init.log.
Logs from the manager init:
Error response from daemon: must specify a listening address because the address to advertise is not recognized as a system address, and a system's IP address to use could not be uniquely identified
This looks related to https://github.com/moby/moby/issues/25514. It seems the swarm manager has trouble initializing with the given IP, and this failure cascades to the worker containers.
I attempted the workaround found in the GitHub thread by updating the master’s entrypoint script as follows:
docker swarm init --advertise-addr ${MASTER_ADDR} --listen-addr 0.0.0.0
However, I receive the same error in the log file.
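To rule out a compose-side addressing mismatch, the address the master container actually receives can be checked with a standard inspect template; 192.168.0.10 is what the compose file assigns:
docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' swarm_master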
Edit: For what it’s worth (perhaps little), I did manage to successfully initialize just the manager by using network_mode: host, like so:
version: '3.8'

x-config:
  master: &master_address "127.0.0.1"

services:
  master:
    image: docker:dind
    container_name: swarm_master
    network_mode: host
    volumes:
      - ./entrypoint/master.sh:/tmp/entrypoint.sh
      - /var/run/docker.sock:/var/run/docker.sock
      - ./shared:/shared
    environment:
      MASTER_ADDR: *master_address
    privileged: true
    entrypoint: /tmp/entrypoint.sh
    healthcheck:
      test: test -f /shared/swarm_token
      interval: 60s
      retries: 5
      start_period: 20s
      timeout: 10s
This approach makes it impossible to isolate the workers and the master, though, and the workers understandably fail to initialize with:
Error response from daemon: This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.
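(Following the hint in that error message, a reset between attempts would look roughly like this:)
# Leave any existing swarm, then clean up the compose project and shared artifacts
docker swarm leave --force
docker-compose down
rm -f ./shared/swarm_token ./shared/*.log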