Why is DNS Resolution failing when default gateway is changed to backup circuit?

We have a strange problem that I hope someone can help with. We have a building that has a fiber circuit and a backup Comcast circuit (coax).

The system is set up like this:

Fiber NID (cat6 cable to switch)    Comcast (cat6 cable to switch)
|
unmanaged switch
|
Ubuntu Linux box as router (cat6 cable from WAN port to switch)
|
LAN port to fiber OLT
|
4 fiber GPON ports going to many ONTs

We wrote a bash script that checks every minute whether the main gateway (fiber circuit) is up. If it is not, the script changes the default gateway to the Comcast router at 10.1.10.1. Once the main gateway comes back up, it switches back automatically.
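For reference, the logic of that script looks roughly like the sketch below. This is a simplified illustration, not our exact script: PRIMARY_GW is a placeholder from the documentation address range (our real gateway is redacted), and the function names and the --apply switch are made up for this example.

```shell
#!/usr/bin/env bash
# Simplified failover check, meant to be run from cron once a minute.
# PRIMARY_GW is a placeholder; BACKUP_GW and WAN_IF match the setup above.
PRIMARY_GW="${PRIMARY_GW:-203.0.113.1}"
BACKUP_GW="${BACKUP_GW:-10.1.10.1}"
WAN_IF="${WAN_IF:-enp3s0}"

# Succeeds if the primary gateway answers a ping on the WAN interface.
primary_is_up() {
    ping -c 3 -W 2 -I "$WAN_IF" "$PRIMARY_GW" >/dev/null 2>&1
}

# Prints the gateway that should be the default, given "up" or "down".
pick_gateway() {
    if [ "$1" = "up" ]; then
        echo "$PRIMARY_GW"
    else
        echo "$BACKUP_GW"
    fi
}

# Replaces the default route only when it differs from the wanted gateway.
apply_gateway() {
    want="$1"
    current="$(ip -4 route show default | awk '{print $3; exit}')"
    if [ "$current" != "$want" ]; then
        ip route replace default via "$want" dev "$WAN_IF"
    fi
}

# Routing is only touched when the script is called with --apply.
if [ "${1:-}" = "--apply" ]; then
    if primary_is_up; then state=up; else state=down; fi
    apply_gateway "$(pick_gateway "$state")"
fi
```

Note that only the default route changes on failover; nothing in a script like this touches /etc/resolv.conf.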

We have this same setup in many buildings and it works perfectly. In a few buildings, however, once the default gateway switches to the backup Comcast circuit, DNS won’t resolve at all most of the time; occasionally it resolves after about 4 seconds, by which point most browsers have already timed out.
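To put numbers on the delay, something like the following sketch can time one lookup against each of our resolvers from the Linux box. It assumes dig is installed; the function names and the example.com test name are placeholders for illustration.

```shell
# Extracts the "Query time" value (in ms) from dig output on stdin.
query_ms() {
    awk '/Query time:/ {print $4}'
}

# Sends one query to each resolver with a 5 second timeout and prints
# "<resolver> <ms>", or "<resolver> timeout" when no answer comes back.
probe_resolvers() {
    name="${1:-example.com}"
    for ns in 8.8.8.8 8.8.4.4 1.1.1.1; do
        ms="$(dig +tries=1 +time=5 "@$ns" "$name" | query_ms)"
        echo "$ns ${ms:-timeout}"
    done
}
```

Running probe_resolvers once on the fiber circuit and once on the backup circuit gives a side-by-side comparison of the two paths.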

The strange thing is that if we disconnect the cable that runs from the OLT to the LAN port on the Linux router, DNS resolves just fine, both from the command line on the Linux box and from a laptop plugged into its LAN port.

If we plug a laptop into an RJ45 port on the OLT, it strangely gets an IPv6 address (a Comcast one, even if we unplug our Comcast router first). From here, if we disconnect the fiber GPON ports, which removes the residents from the equation, the backup Comcast circuit resolves properly, everything works, and we don’t get an IPv6 address (we only use IPv4 on the LAN side). Plug the fiber GPON ports for residents back in and DNS instantly stops resolving.
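Since an IPv6 address normally arrives via router advertisements, one way to see who is sending them is to capture ICMPv6 type 134 on the LAN port and count packets per source MAC. The sketch below assumes tcpdump and timeout are available and that enp2s0 is the LAN interface; the function names are made up for this example.

```shell
# Counts router advertisements per source MAC from `tcpdump -e` lines on stdin.
# With -e, the second field of each line is the source MAC address.
count_ra_sources() {
    awk '/router advertisement/ {c[$2]++} END {for (m in c) print c[m], m}'
}

# Captures for 60 seconds on the LAN port (needs root).
# ICMPv6 type 134 is a router advertisement.
watch_ras() {
    timeout 60 tcpdump -l -n -e -i "${1:-enp2s0}" 'icmp6 and ip6[40] == 134' 2>/dev/null \
        | count_ra_sources
}
```

Each output line is a packet count followed by the MAC address that sent the advertisements, which can then be matched against vendor prefixes.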

The OLT is set with port isolation so the residents can’t “see” each other. We tried setting up ACLs to block all IPv6 traffic as well as all private IPv4 traffic not on the 172.16.0.0/16 subnet, but that didn’t work.

/etc/resolv.conf

nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 1.1.1.1

The /etc/resolv.conf is the same whether on Backup circuit or main circuit.

/etc/network/interfaces

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback


#The LAN (right port) network interface
allow-hotplug enp2s0
iface enp2s0 inet static
    address 172.16.1.1
    netmask 255.255.0.0

######## END LAN SETUP ####################################################

############ MAIN WAN SETUP ################################################
#WAN
allow-hotplug enp3s0
iface enp3s0 inet static
    address xxx.xxx.xxx.xxx
    netmask 255.255.255.252
    gateway xxx.xxx.xxx.xxx
    dns-nameservers 8.8.8.8 8.8.4.4 1.1.1.1

########### END MAIN WAN SETUP
#WAN backup circuit
allow-hotplug enp3s0
iface enp3s0 inet dhcp
    dns-nameservers 8.8.8.8 8.8.4.4 1.1.1.1

####### END BACKUP WAN SETUP FAILOVER ###################################

When we did a tcpdump, we saw multiple “Comcast” modems/routers on the LAN side (we looked up the MAC addresses). It seems that some residents have Comcast Internet and have plugged our ONT into their Comcast router, and it is leaking into the system. If I set a static IP of 10.0.0.xx and plug into an ONT RJ45 port, I can ping and browse to someone’s Comcast modem/router. Once we set up the ACLs this was blocked, but we still have the same problem.
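One thing that may be worth ruling out, and this is only a guess, is whether any of those devices answers ARP for an address we depend on, such as the backup gateway 10.1.10.1. A rough check, assuming arping is installed and using made-up function names:

```shell
# Lists the distinct MAC addresses found in arping output on stdin,
# lowercased so the same address is not counted twice.
distinct_macs() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | tr 'A-F' 'a-f' | sort -u
}

# Asks who answers for an address ($1) on a given interface ($2). Needs root.
who_answers() {
    arping -c 3 -I "$2" "$1" | distinct_macs
}
```

For example, who_answers 10.1.10.1 enp2s0 should print nothing if no device on the LAN side claims the backup gateway's address, and who_answers 10.1.10.1 enp3s0 should print exactly one MAC.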

To summarize, it seems that something coming from one or more residents’ ONTs (possibly residents’ Comcast modems/routers) is interfering with DNS resolution, but only when our system has switched the default gateway to our Comcast backup router. To reiterate, this is only happening at two out of many buildings with the same setup. At the other buildings we don’t see any Comcast modems/routers that residents have hooked into our system by mistake.

Any ideas?