Network Architecture

VXLAN Instead of MPLS – Part 2


In a previous entry, I described my thoughts on using VXLAN to replace MPLS for L2VPNs across a network.  After reading through it, I decided to go back and redo the configuration examples so the lab looks more like a backbone, with several hops between the customer routers.  This entry outlines those changes and configurations.  It doesn’t change my thoughts on VXLAN vs MPLS.

Changing the Lab Network

Recall my lab network diagram, which looks like this:

In the previous blog entry, I changed SW1 and Router 1 enough so that the Agg router and SV1 could EBGP-peer with one another via a VXLAN tunnel.  It dawned on me that I did it wrong, and that I was focusing on the wrong section of my little lab network here.  After thinking through it, I decided that SV1 and SV2 would be the customer routers.  In fact, the work would go a little like the diagram below:

Once these steps were completed, the network would look more like this:


1. BGP Between Agg and Router 1

This wasn’t difficult; I just had to return the Agg interface and the Router 1 interface to their original configs from the EVPN blog entry.  A quick edit to each router’s /etc/frr/frr.conf file as well, and they were once again peered.

2. Break BGP Between Spines and Leafs

The idea here was to stretch the network out a little.  With that, I disabled BGP between Router 1 and Switch 2, and between Router 2 and Switch 1.  Once that was accomplished, each switch only had one upstream router connected to it.

3. Convert SV Interfaces to L3 Sub-Interfaces

As stated previously, I promoted my two server images to routers.  Remember that these are just running Cumulus OS, so it’s an easy config change to make.  The configuration on the new CUST1-R[1,2] devices and the upstream SW[1,2] will be outlined here.  It involved creating two L3 sub-interfaces on each physical link, IP addressing them accordingly, and getting BGP running across each sub-int.

On the Switch

I’ll go through the config on just SW1; the configuration on the other switch is identical apart from the IP addressing.  I used IP addressing instead of unnumbered because of the assumption that the customer routers aren’t running Cumulus.  First, in /etc/network/interfaces, I had to create 2 L3 sub-ints out of swp6, and then add one of them to the bridge that VXLAN uses:

auto swp6
iface swp6

auto swp6.100
iface swp6.100

auto swp6.200
iface swp6.200

auto bridge
iface bridge
    bridge-ports swp6.100 vni10100
    bridge-vids 100
    bridge-vlan-aware yes

As you can see, swp6 now has two sub-interfaces: swp6.100 and swp6.200.  The latter has an IP address on it, and will be the one I use to EBGP-peer with the downstream customer router.  The .100 interface is added to the bridge interface, which enables VXLAN to do its thing with it.

The changes to /etc/frr/frr.conf aren’t really massive; I just needed to create a new peer group and put swp6.200 in as an interface peer:

router bgp 65201
    neighbor customer peer-group
    neighbor customer remote-as external
    neighbor swp6.200 interface peer-group customer

On the Customer Router

For the interfaces file on cust1-r1, I simply had to create 2 new L3 sub-interfaces on swp4:

auto swp4
iface swp4

auto swp4.100
iface swp4.100

auto swp4.200
iface swp4.200

You can see that the IP address on swp4.200 matches up with the IP on the upstream switch.  Once networking is restarted, we can ping the switch’s interface:

root@cust1-r1:/home/jvp# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.352 ms
64 bytes from icmp_seq=2 ttl=64 time=0.301 ms
64 bytes from icmp_seq=3 ttl=64 time=0.242 ms
64 bytes from icmp_seq=4 ttl=64 time=0.377 ms
--- ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.242/0.318/0.377/0.051 ms
root@cust1-r1:/home/jvp# arp -an
? ( at 08:00:27:c1:57:1a [ether] on swp4.200

As for the BGP configuration, it was also pretty simple.  But I had to set up 2 peers: one directly to the upstream switch (leaf), and the other to the second customer router via the VXLAN tunnel:

router bgp 65230
	bgp router-id
	redistribute connected

	neighbor leaf peer-group
	neighbor leaf remote-as external

	neighbor over-vxlan peer-group
	neighbor over-vxlan remote-as external
	neighbor swp4.100 interface peer-group over-vxlan
	neighbor swp4.200 interface peer-group leaf
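As an aside: interface-named neighbors are an FRR convenience.  On a customer router that doesn’t run FRR, the peers would generally be named by address instead.  A minimal sketch of what the leaf session might look like in that case, written in FRR syntax only for comparison; the 192.0.2.1 address is a placeholder from the documentation range, not an address from my lab:

```text
! 192.0.2.1 is a made-up placeholder for the switch side of swp4.200
router bgp 65230
    neighbor leaf peer-group
    neighbor leaf remote-as external
    neighbor 192.0.2.1 peer-group leaf
```

The peer group and its settings stay the same; only the way the neighbor is identified changes.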

But after restarting the frr service, I only had 1 EBGP peer.  That’s because the interfaces and BGP configurations hadn’t been changed on Switch 2 and cust1-r2.  Either way, cust1-r1 still had a default route:

root@cust1-r1:/home/jvp# net show route
RIB entry for
Routing entry for
  Known via "bgp", distance 20, metric 0, best
  Last update 02:23:30 ago
  *, via swp4.200

FIB entry for
default via dev swp4.200  proto bgp  metric 20

SW2 and Cust1-R2

Once the same changes were made to Switch 2 and cust1-r2, everything clicked into place.  Again from cust1-r1:

root@cust1-r1:/home/jvp# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=1.42 ms
64 bytes from icmp_seq=2 ttl=64 time=1.38 ms
64 bytes from icmp_seq=3 ttl=64 time=1.41 ms
64 bytes from icmp_seq=4 ttl=64 time=1.23 ms
64 bytes from icmp_seq=5 ttl=64 time=1.47 ms

--- ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 1.235/1.386/1.472/0.090 ms
root@cust1-r1:/home/jvp# arp -an
? ( at 08:00:27:36:f2:4e [ether] on swp4.100

I can ping the corresponding interface on the second customer router, and I have L2 knowledge of it, as shown via the arp -an command.  All of that is via the VXLAN tunnel provided by the other 4 devices.  But what about BGP?

root@cust1-r1:/home/jvp# net show bgp sum

show bgp ipv4 unicast summary
BGP router identifier, local AS number 65230 vrf-id 0
BGP table version 17
RIB entries 27, using 4104 bytes of memory
Peers 2, using 39 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbor           V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
sw1(swp4.200)      4      65201    2975    2975        0    0    0 02:28:09           12
cust1-r2(swp4.100) 4      65231    2977    2974        0    0    0 02:28:07           12

Total number of neighbors 2

Absolutely.  The second peer is shown as cust1-r2, with its associated ASN.  And am I hearing prefix announcements from it?  Well, the router’s loopback would be a good test:

root@cust1-r1:/home/jvp# net show route
RIB entry for
Routing entry for
  Known via "bgp", distance 20, metric 0, best
  Last update 02:29:58 ago
  *, via swp4.100

FIB entry for
========================= via dev swp4.100  proto bgp  metric 20

Yep.  I haven’t written any prefix lists or route maps to limit which BGP routes are sent across the VXLAN tunnel versus what the routers send and receive via the direct uplink.  In a real scenario, I’d probably want to do that.  But with a lab and a limited number of prefixes in play, it doesn’t really matter.  A default route learned through the other customer router carries extra AS hops, so the customer routers will always choose the switch uplink for their default.  So unless something really odd happens with my lab configuration here, I’ll never connect to cust1-r1 via cust1-r2.  It’ll always be direct.  Likewise, neither router will choose the other to transit.
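For a rough idea of what that filtering could look like on cust1-r1, here is a minimal frr.conf sketch.  The prefix-list name, the route-map name, and the 198.51.100.0/24 range are all made up for illustration; only the AS number and the over-vxlan peer group come from the configuration shown earlier.

```text
! Placeholder names and prefixes; none of these come from the lab
ip prefix-list VXLAN-ONLY seq 10 permit 198.51.100.0/24 le 32
!
route-map VXLAN-ONLY permit 10
    match ip address prefix-list VXLAN-ONLY
!
router bgp 65230
    address-family ipv4 unicast
        neighbor over-vxlan route-map VXLAN-ONLY out
        neighbor over-vxlan route-map VXLAN-ONLY in
    exit-address-family
```

Anything that doesn’t match the prefix list falls through to the route map’s implicit deny, so only those prefixes would be sent or accepted on the session that rides the tunnel.  The leaf peer group is left untouched.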

Doing It Without EVPN

Again, I’m not going to bother going through the configuration changes here.  I did make use of my existing EVPN control plane to make this lab work, and yes: I do understand that you’re less likely to have an EVPN control plane across your company’s backbone.  Hopefully it’s pretty simple to see how setting up a unicast VXLAN tunnel between SW1 and SW2 would provide identical connectivity to the two customer routers.  An exercise left to the reader.
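For anyone who wants a starting point for that exercise, a static tunnel on SW1 might look something like the sketch below.  I haven’t run this in the lab.  The VNI number is assumed from the vni10100 interface name, the VLAN from the bridge-vids line, and both tunnel addresses are documentation-range placeholders standing in for the SW1 and SW2 loopbacks.

```text
# Placeholder tunnel endpoints; substitute the real SW1 and SW2 loopbacks
auto vni10100
iface vni10100
    vxlan-id 10100
    vxlan-local-tunnelip 192.0.2.11
    vxlan-remoteip 192.0.2.12
    bridge-access 100
```

SW2 would carry the mirror image, with the local and remote addresses swapped.  Without EVPN distributing MAC addresses, the VNI has to learn them from the data plane, so a stanza that currently has bridge-learning turned off for EVPN would presumably need it turned back on.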

