In a previous entry, I described my thoughts on using VXLAN to replace MPLS for L2VPNs across a network. After reading through it, I decided to go back and re-do the configuration examples, making things look a bit more like a backbone with several hops between the customer routers. This entry will outline those changes and configurations. It doesn’t change my thoughts on VXLAN vs MPLS.
Changing the Lab Network
Recall my lab network diagram, which looks like this:
In the previous blog entry, I changed SW1 and Router 1 enough so that both the Agg router and SV1 could EBGP-peer with one another via a VXLAN tunnel. It dawned on me that I’d done it wrong, and that I was focusing on the wrong section of my little lab network. After thinking it through, I decided that SV1 and SV2 would be the customer routers. In fact, the work would go a little like the diagram below:
Once these steps were completed, the network would look more like this:
1. BGP Between Agg and Router 1
This wasn’t difficult; I just had to return the Agg interface and the Router 1 interface back to their original configs from the EVPN blog entry. A quick edit to both of their respective /etc/frr/frr.conf files as well, and they were once again peered.
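The original configs live in the EVPN blog entry, but the shape of the frr.conf change is just a plain EBGP interface peering on each side. A hypothetical sketch of the Agg side follows; the AS numbers, the interface name swp1, and the peer-group name "fabric" are assumptions for illustration, not the lab's actual values:

```
router bgp 65100
 neighbor fabric peer-group
 neighbor fabric remote-as external
 !
 ! swp1 is assumed to be the link toward Router 1
 neighbor swp1 interface peer-group fabric
```

Router 1 would carry the mirror image of this stanza, and the two form an unnumbered EBGP session over the link-local addresses on the interface.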
2. Break BGP Between Spines and Leafs
The idea here was to stretch the network out a little. With that, I disabled BGP between Router 1 and Switch 2, and between Router 2 and Switch 1. Once that was accomplished, each switch only had one upstream router connected to it.
3. Convert SV Interfaces to L3 Sub-Interfaces
As stated previously, I promoted my two server images to routers. Remember that these are just running Cumulus OS, so it’s an easy config change to make. The configuration on the new CUST1-R[1,2] devices and the upstream SW[1,2] will be outlined here. It involved creating two L3 sub-interfaces on each physical link, IP addressing them accordingly, and getting BGP running across each sub-int.
On the Switch
I’ll go through the config on just SW1; the configuration on the other switch is identical, with differences in IP addressing. I used IP addressing instead of un-numbered because of the assumption that the customer routers aren’t running Cumulus. First, in /etc/network/interfaces, I had to create 2 L3 sub-ints out of swp6, and then add one of them into VXLAN:
auto swp6
iface swp6

auto swp6.100
iface swp6.100

auto swp6.200
iface swp6.200
    address 10.4.0.0/31

auto bridge
iface bridge
    bridge-ports swp6.100 vni10100
    bridge-vids 100
    bridge-vlan-aware yes
As you can see, swp6 now has two sub-interfaces: swp6.100 and swp6.200. The latter has an IP address on it, and will be the one I use to EBGP-peer with the downstream customer router. The .100 interface will be added to the bridge interface, which will enable VXLAN to do its thing with it.
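The vni10100 interface referenced in the bridge stanza was created back in the EVPN lab; for context, its definition would look roughly like the following. The local tunnel IP, assumed here to be SW1's loopback, is a placeholder:

```
auto vni10100
iface vni10100
    vxlan-id 10100
    vxlan-local-tunnelip 10.100.0.1
    bridge-access 100
```

With bridge-vlan-aware set, bridge-access 100 maps VLAN 100 (and therefore swp6.100's traffic) into VNI 10100, and the EVPN control plane handles remote VTEP discovery.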
The changes to /etc/frr/frr.conf aren’t really massive; I just needed to create a new peer group and put swp6.200 in as an interface peer:
router bgp 65201
 !
 neighbor customer peer-group
 neighbor customer remote-as external
 !
 neighbor swp6.200 interface peer-group customer
On the Customer Router
For the interfaces file on cust1-r1, I simply had to create 2 new L3 sub-interfaces on swp4:
auto swp4
iface swp4

auto swp4.100
iface swp4.100
    address 172.18.0.0/31

auto swp4.200
iface swp4.200
    address 10.4.0.1/31
You can see that the IP address on swp4.200 matches up with the IP on the upstream switch. Once networking is restarted, we can ping the switch’s interface:
root@cust1-r1:/home/jvp# ping 10.4.0.0
PING 10.4.0.0 (10.4.0.0) 56(84) bytes of data.
64 bytes from 10.4.0.0: icmp_seq=1 ttl=64 time=0.352 ms
64 bytes from 10.4.0.0: icmp_seq=2 ttl=64 time=0.301 ms
64 bytes from 10.4.0.0: icmp_seq=3 ttl=64 time=0.242 ms
64 bytes from 10.4.0.0: icmp_seq=4 ttl=64 time=0.377 ms
^C
--- 10.4.0.0 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.242/0.318/0.377/0.051 ms
root@cust1-r1:/home/jvp# arp -an 10.4.0.0
? (10.4.0.0) at 08:00:27:c1:57:1a [ether] on swp4.200
As for the BGP configuration, it was also pretty simple. But I had to set up 2 peers: one directly to the upstream switch (leaf), and the other to the second customer router via the VXLAN tunnel:
router bgp 65230
 bgp router-id 10.100.0.20
 redistribute connected
 neighbor leaf peer-group
 neighbor leaf remote-as external
 neighbor over-vxlan peer-group
 neighbor over-vxlan remote-as external
 !
 neighbor swp4.100 interface peer-group over-vxlan
 neighbor swp4.200 interface peer-group leaf
But after restarting the frr service, I only had one EBGP peer; the interfaces and BGP configurations hadn’t yet been changed on Switch 2 and cust1-r2. Either way, cust1-r1 still had a default route:
root@cust1-r1:/home/jvp# net show route 0.0.0.0

RIB entry for 0.0.0.0
=====================
Routing entry for 0.0.0.0/0
  Known via "bgp", distance 20, metric 0, best
  Last update 02:23:30 ago
  * 10.4.0.0, via swp4.200

FIB entry for 0.0.0.0
=====================
default via 10.4.0.0 dev swp4.200 proto bgp metric 20
SW2 and Cust1-R2
Once the same changes were made to Switch 2 and cust1-r2, everything clicked into place. Again from cust1-r1:
root@cust1-r1:/home/jvp# ping 172.18.0.1
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=1.42 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=1.38 ms
64 bytes from 172.18.0.1: icmp_seq=3 ttl=64 time=1.41 ms
64 bytes from 172.18.0.1: icmp_seq=4 ttl=64 time=1.23 ms
64 bytes from 172.18.0.1: icmp_seq=5 ttl=64 time=1.47 ms
--- 172.18.0.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 1.235/1.386/1.472/0.090 ms
root@cust1-r1:/home/jvp# arp -an 172.18.0.1
? (172.18.0.1) at 08:00:27:36:f2:4e [ether] on swp4.100
I can ping the corresponding interface on the second customer router, and I have L2 knowledge of it, as shown via the arp -an command. All of that is via the VXLAN tunnel provided by the other 4 devices. But what about BGP?
root@cust1-r1:/home/jvp# net show bgp sum

show bgp ipv4 unicast summary
=============================
BGP router identifier 10.100.0.20, local AS number 65230 vrf-id 0
BGP table version 17
RIB entries 27, using 4104 bytes of memory
Peers 2, using 39 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor             V    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
sw1(swp4.200)        4 65201    2975    2975        0    0    0 02:28:09           12
cust1-r2(swp4.100)   4 65231    2977    2974        0    0    0 02:28:07           12

Total number of neighbors 2
Absolutely. The second peer is shown as cust1-r2, with its associated ASN. And am I hearing prefix announcements from it? Well, the router’s loopback would be a good test:
root@cust1-r1:/home/jvp# net show route 10.100.0.21

RIB entry for 10.100.0.21
=========================
Routing entry for 10.100.0.21/32
  Known via "bgp", distance 20, metric 0, best
  Last update 02:29:58 ago
  * 172.18.0.1, via swp4.100

FIB entry for 10.100.0.21
=========================
10.100.0.21 via 172.18.0.1 dev swp4.100 proto bgp metric 20
Yep. I haven’t written any prefix lists or route maps to limit what BGP routes are sent across the VXLAN tunnel vs what the routers are sending and receiving via the direct uplink. In a real scenario, I’d probably want to do that. But with the lab and limited number of prefixes in play, it doesn’t really matter. Given the AS hops, the customer routers will always choose the switch uplink for their default. So unless something really odd happens with my lab configuration here, I’ll never connect to cust1-r1 via cust1-r2. It’ll always be direct. Likewise, neither router will choose the other to transit.
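If I were to do that filtering, it might look something like the following on cust1-r1. This is a hedged sketch, not the lab's actual config; the prefix-list contents are placeholders, and the idea is simply to announce only this router's loopback toward the VXLAN-side peer:

```
ip prefix-list VXLAN-OUT seq 5 permit 10.100.0.20/32
!
route-map TO-VXLAN permit 10
 match ip address prefix-list VXLAN-OUT
!
router bgp 65230
 address-family ipv4 unicast
  neighbor over-vxlan route-map TO-VXLAN out
```

Applying the route-map to the over-vxlan peer-group keeps the tunnel session from ever offering itself as transit, regardless of what path selection would otherwise do.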
Doing It Without EVPN
Again, I’m not going to bother going through the configuration changes here. I did make use of my existing EVPN control plane to make this lab work, and yes: I understand it’s less likely that you’ll have an EVPN control plane across your company’s backbone. Hopefully it’s easy to see how setting up a unicast VXLAN tunnel between SW1 and SW2 would provide identical connectivity to the two customer routers. I’ll leave that as an exercise for the reader.
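As a starting point for that exercise, here’s a hedged sketch of what the static tunnel might look like in /etc/network/interfaces on SW1. The loopback addresses (10.100.0.1 for SW1, 10.100.0.2 for SW2) are assumptions; the key difference from the EVPN setup is the static vxlan-remoteip pointing at the far-end VTEP instead of a BGP-learned flood list:

```
auto vni10100
iface vni10100
    vxlan-id 10100
    vxlan-local-tunnelip 10.100.0.1
    vxlan-remoteip 10.100.0.2
    bridge-access 100
```

SW2 would mirror this with the two loopback addresses swapped, and no EVPN address-family configuration would be needed in frr.conf at all.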