ffmpeg concat filter outputs a different timebase, so xfade cannot be used

I’m trying to join several videos (removed for testing purposes) and images using xfade. In some cases I don’t want any transition, so for those joins I use the concat filter instead of the xfade filter.
This is the command:

ffmpeg \
-loop 1 -t 3 -framerate 30 -i f44096fb9e5ecff463aeedc56c5f4795 \
-loop 1 -t 3 -framerate 30 -i 3a5ec4f224614a17ff4c7c77fe853233 \
-loop 1 -t 3 -framerate 30 -i 3a5ec4f224614a17ff4c7c77fe853233 \
-filter_complex "
[0:v]settb=AVTB,setpts=PTS-STARTPTS,fps=30,scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:trunc((ow-iw)/2):trunc((oh-ih)/2):black,setsar=1[scale0];
[1:v]settb=AVTB,setpts=PTS-STARTPTS,fps=30,scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:trunc((ow-iw)/2):trunc((oh-ih)/2):black,setsar=1[scale1];
[2:v]settb=AVTB,setpts=PTS-STARTPTS,fps=30,scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:trunc((ow-iw)/2):trunc((oh-ih)/2):black,setsar=1[scale2];
[scale0][scale1]concat=n=2:v=1:a=0[vconcat1];
[vconcat1][scale2]xfade=transition=wipeleft:duration=1:offset=5,setpts=PTS-STARTPTS[xfade2]" \
-map "[xfade2]" -f mp4 -q:v 0 -r 30 -vcodec libx264 -pix_fmt yuv420p -shortest output.mp4

However I get the following error:

[Parsed_xfade_19 @ 0x6703a80] First input link main timebase (1/1000000) do not match the corresponding second input link xfade timebase (1/30)
[Parsed_xfade_19 @ 0x6703a80] Failed to configure output pad on Parsed_xfade_19
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #2:0

Why am I getting this? The inputs all run at 30 fps and, as far as I can tell, should have the default timebase after the earlier filters in each chain. Why is the concat filter outputting a timebase of 1/1000000, and where is that specified? How can I change it?
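For reference, here is a minimal sketch of one possible workaround, not a confirmed fix. It assumes (consistent with the error above) that the fps filter leaves its output at timebase 1/30 while concat emits 1/1000000, so appending fps=30 after concat should give both xfade inputs the same timebase. The helper names build_graph and join_clips are hypothetical and exist only for this illustration; the per-input chain is copied unchanged from the command above.

```shell
#!/bin/sh
# Sketch only: make both xfade inputs end with the same timebase.
# Assumption: fps=30 sets its output timebase to 1/30, so adding it after
# concat should make [vconcat1] match [scale2], whose chain also contains fps=30.

# Per-input chain, unchanged from the original command.
chain='settb=AVTB,setpts=PTS-STARTPTS,fps=30,scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:trunc((ow-iw)/2):trunc((oh-ih)/2):black,setsar=1'

# Hypothetical helper: prints the filtergraph as a single string.
build_graph() {
  printf '%s' \
    "[0:v]${chain}[scale0];" \
    "[1:v]${chain}[scale1];" \
    "[2:v]${chain}[scale2];" \
    "[scale0][scale1]concat=n=2:v=1:a=0,fps=30[vconcat1];" \
    "[vconcat1][scale2]xfade=transition=wipeleft:duration=1:offset=5,setpts=PTS-STARTPTS[xfade2]"
}

# Hypothetical helper: $1..$3 are the three input images, $4 is the output file.
join_clips() {
  ffmpeg \
    -loop 1 -t 3 -framerate 30 -i "$1" \
    -loop 1 -t 3 -framerate 30 -i "$2" \
    -loop 1 -t 3 -framerate 30 -i "$3" \
    -filter_complex "$(build_graph)" \
    -map "[xfade2]" -r 30 -vcodec libx264 -pix_fmt yuv420p "$4"
}
```

It would be called as `join_clips first second third output.mp4`, with the three image paths substituted. Adding settb=AVTB at the very end of every chain, and again after concat, might be an alternative way to get matching timebases, but I have not verified either variant.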

I also get this warning, but I think it’s unrelated:

[swscaler @ 0x6963e80] deprecated pixel format used, make sure you did set range correctly

By the way, is there anything other than the concat filter that I could use so that videos which don’t need xfade are not re-encoded? I cannot use the null filter because it takes only one input, not two.

Thanks!