I’m experimenting with Docker images that have 20 to 70 GiB in a single layer (yes, the software/compiler is that fat). To improve Docker’s download and decompression efficiency, I’m splitting this fat layer into multiple layers of 3 to 6 GiB each. I wrote a script that writes one list file per layer, containing one file or directory entry per line.
Finally, in a multi-stage Docker build, xargs and cp are used to copy the files in parallel into the target layer.
RUN --mount=type=bind,from=monolithic,source=/opt/layers,target=/context \
    --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install \
    if [[ -f /context/layer_1.list ]]; then \
      cd /Install; \
      xargs --no-run-if-empty --arg-file=/context/layer_1.list --max-procs=4 --max-args=25 -- \
        cp --parents -r -t /opt/Xilinx/Vivado/${VIVADO_VERSION}; \
    fi
While most list files have only a few lines, some have more than 100 entries (the 3rd column below is the number of entries per file):
#12 19.18 expected size: 48128.0 MiB
#12 19.18 min layer count: 16
#12 19.18 min. image size: 1002.7 MiB
#12 19.18 max. image size: 3008.0 MiB
#12 19.18 max. item size: 802.1 MiB
#12 19.18 Collecting data (this may take several minutes) ...
#12 19.18
#12 19.18 Time: 14.095 s
#12 19.18 Total files: 1029
#12 19.18 Total dirs: 3361
#12 19.18 Time: 0.021 s
#12 19.18 Total size: 52584.5 MiB (4331)
#12 19.18 Sort Time: 0.001 s
#12 19.18 Docker Layers: 25
#12 19.18 24 1307.3 MiB 3285 /context/layer_24.list
#12 19.18 23 1535.6 MiB 505 /context/layer_23.list
#12 19.18 22 1594.5 MiB 184 /context/layer_22.list
#12 19.18 21 1660.9 MiB 87 /context/layer_21.list
#12 19.18 20 1726.3 MiB 51 /context/layer_20.list
#12 19.18 19 1781.5 MiB 40 /context/layer_19.list
#12 19.18 18 1842.1 MiB 28 /context/layer_18.list
#12 19.18 17 1907.2 MiB 19 /context/layer_17.list
#12 19.18 16 1964.1 MiB 16 /context/layer_16.list
#12 19.18 15 1924.3 MiB 13 /context/layer_15.list
#12 19.18 14 2083.4 MiB 11 /context/layer_14.list
#12 19.18 13 2118.0 MiB 10 /context/layer_13.list
#12 19.18 12 2104.5 MiB 9 /context/layer_12.list
#12 19.18 11 2255.0 MiB 9 /context/layer_11.list
#12 19.18 10 2354.7 MiB 9 /context/layer_10.list
#12 19.18 9 2233.3 MiB 8 /context/layer_9.list
#12 19.18 8 2372.7 MiB 8 /context/layer_8.list
#12 19.18 7 2507.2 MiB 8 /context/layer_7.list
#12 19.18 6 2445.2 MiB 7 /context/layer_6.list
#12 19.18 5 2415.4 MiB 6 /context/layer_5.list
#12 19.18 4 2735.2 MiB 6 /context/layer_4.list
#12 19.18 3 2456.1 MiB 4 /context/layer_3.list
#12 19.18 2 2658.5 MiB 4 /context/layer_2.list
#12 19.18 1 2513.4 MiB 3 /context/layer_1.list
#12 19.18 0 2088.1 MiB 1 /context/layer_0.list
Because one list file has more than 3,000 entries, xargs is used with --max-args=25 so that the maximum command-line length is not exceeded; otherwise a single command could easily contain more than 30,000 characters. In addition, --max-procs=4 is used to make use of a 24-core CI server with NVMe storage, so up to four cp processes run in parallel. In total, more than 100k files are copied into more than 20 layers.
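The batching that --max-args performs can be checked with a small, self-contained experiment (seq and echo stand in for the real list file and cp):

```shell
# 100 arguments split into batches of 25: xargs invokes `echo` 4 times,
# and each invocation prints one line, so `wc -l` reports 4.
seq 1 100 | xargs --max-args=25 echo | wc -l
```

With --max-procs=4 added, those four invocations may also run concurrently, which is exactly the situation in the build above.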
cp is used with --parents and -r, so the source’s directory structure is recreated under the target and each entry can be either a single file or a subdirectory.
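What --parents does can be seen in isolation; the paths below are made up for the illustration:

```shell
# Build a throwaway source tree and copy one entry with --parents:
# the relative path 'a/b' is recreated under the target directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/a/b" "$tmp/dest"
echo hello > "$tmp/src/a/b/file.txt"
cd "$tmp/src"
cp --parents -r a/b/file.txt -t "$tmp/dest"
# The copy now lives at $tmp/dest/a/b/file.txt, directories included.
```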
When running this, I get this error message from cp:
#36 3.400 cp: cannot make directory '/opt/Xilinx/Vivado/2023.2/data/deca/models_dir/VersalDefault': File exists
My suspicion is that xargs --max-procs is not compatible with cp --parents, but I can’t find any evidence for this in the documentation or online (Google, StackOverflow, …).
I suspect that, because the entries share parent directories and --parents recreates them, one cp was faster than another cp running in parallel and created the directory first, causing this error message.
If so, should this be reported as a problem/bug in cp?
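For completeness, one workaround I’m considering (an assumption on my side, not a confirmed fix): pre-create all shared parent directories serially with mkdir -p before starting the parallel cp processes, so cp never has to create a directory that another cp might be creating at the same time. A self-contained sketch, with throwaway paths standing in for /context/layer_1.list and the Vivado target:

```shell
# Throwaway fixture: two entries that share the parent dir 'common'.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/common/a" "$tmp/src/common/b" "$tmp/target"
echo x > "$tmp/src/common/a/f1"
echo y > "$tmp/src/common/b/f2"
printf '%s\n' common/a common/b > "$tmp/layer.list"
cd "$tmp/src"
# Pass 1 (serial): pre-create every parent directory. mkdir -p is
# idempotent, so no "File exists" failures are possible here.
while IFS= read -r entry; do
  mkdir -p "$tmp/target/$(dirname "$entry")"
done < "$tmp/layer.list"
# Pass 2 (parallel): cp only copies files and each entry's own
# subtree; shared parents already exist, so nothing races.
xargs --no-run-if-empty --arg-file="$tmp/layer.list" \
      --max-procs=4 --max-args=25 -- \
      cp --parents -r -t "$tmp/target"
```

The entries themselves are distinct paths, so the recursive copies touch disjoint subtrees; only the shared parent components were contended, and the serial first pass removes exactly that contention.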