Recordings composition with ffmpeg

I am trying to compose Twilio recordings based on this article.

I have video1.mkv and audio1.mka (for the first participant of the video call) and video2.mkv and audio2.mka (for the second participant). The total call duration is 54 sec: video1 and audio1 are 54 sec long, while video2 and audio2 are 27 sec long. The second participant joins the call after 26.9 seconds.

When I use this command:

ffmpeg -i video1.mkv -i video2.mkv -acodec libopus -i audio1.mka -acodec libopus -i audio2.mka -y -filter_complex "[0]scale=512:-2,setsar=1:1,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.189[b0],[b0][vs0]concat[r0c0];   [1]scale=512:-2,setsar=1:1,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=26.932[b1],[b1][vs1]concat[r0c1];[r0c0][r0c1]hstack=inputs=2;[2]aresample=async=1[a0];[3]aresample=async=1,adelay=26901|26901[a1];[a0][a1]amix=inputs=2" -map  -map  -acodec libopus -vcodec libvpx output.webm

My output file has a duration of 1 min 14 sec (but it should be 54 sec).
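For reference, this is roughly how I arrive at the 54 sec figure. It is only a sketch: the durations are the rounded values quoted above, the offsets are the black-padding durations from the filter graph, and the variable names are made up for illustration.

```shell
# Rounded track durations in seconds, as quoted in the question (assumed, not measured here).
v1_dur=54
v2_dur=27
# Offsets in seconds: the durations of the black "color" sources prepended to each video.
v1_off=0.189
v2_off=26.932

# Expected output length = the later of the two end times (offset + duration).
expected=$(awk -v d1="$v1_dur" -v o1="$v1_off" -v d2="$v2_dur" -v o2="$v2_off" \
  'BEGIN { e1 = d1 + o1; e2 = d2 + o2; printf "%.3f", (e1 > e2 ? e1 : e2) }')

echo "expected output duration: ${expected}s"
```

With these numbers both tracks end at about 54 sec, so an output of 1 min 14 sec is roughly 20 sec longer than either padded track should be.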

When I use only the audio:

ffmpeg -i audio1.mka -i audio2.mka -filter_complex "[0]aresample=async=1[a0];[1]aresample=async=1,adelay=26901|26901[a1];[a0][a1]amix=inputs=2" -map  -acodec libopus -strict -2 output_audio.webm

then the output has the correct duration.

What is wrong with my first command?