In the previous MLAG Deep Dive blog posts we discussed the innards of a standalone MLAG cluster. Now let’s see what happens when we connect such a cluster to a VXLAN fabric. We’ll use our standard MLAG topology and add a VXLAN transport underlay to it, with another switch (Sx) connected to the other end of the underlay network.
A few notes before we get to the gory details:
- We still need the peer link between the MLAG cluster members. Replacing the peer link with a virtual link over the VXLAN fabric is another interesting topic that we’ll deal with some other time.
- Connecting MLAG clusters in a traditional bridging fabric is boring. Every MLAG cluster looks like a dual-homed host to adjacent clusters.
- Problems similar to the ones described in this blog post apply to other transport technologies (TRILL, SPBM) or proprietary fabric solutions, but we won’t discuss them because those technologies aren’t exactly mainstream anymore.
Dynamic MAC Learning Ruins the Day
In a typical VXLAN-based fabric, we’d assign a unique VXLAN tunnel endpoint (VTEP) IP address to every leaf switch. That approach doesn’t work for MLAG clusters if we want to rely on dynamic MAC learning instead of using the EVPN control plane.
Imagine host A having multiple TCP or UDP sessions with host Z:
- If host A uses the usual 5-tuple hashing algorithm, it will probably use both uplinks (toward S1 and S2) to send Ethernet frames to Z.
- Ethernet frames with the same source MAC address (A) will be encapsulated into VXLAN frames on S1 and S2 and sent toward Sx.
- Sx might receive frames from the same source MAC address coming from two different source VTEPs (S1 and S2). That would totally confuse Sx if it uses dynamic MAC learning to build MAC-to-VTEP mapping tables.
Conclusion: S1 and S2 must use the same source IP address when encapsulating the traffic coming from a dual-attached host.
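The MAC flapping described above can be illustrated with a toy model of data-plane learning on Sx. This is only a sketch: the `MacTable` class, the `simulate` helper, and all IP addresses are made up for illustration and do not represent any vendor implementation.

```python
class MacTable:
    """Toy model of dynamic MAC-to-VTEP learning on the remote switch (Sx)."""

    def __init__(self):
        self.entries = {}  # MAC address -> source VTEP IP it was last seen behind
        self.moves = 0     # how many times a known MAC changed its VTEP

    def learn(self, mac, src_vtep):
        previous = self.entries.get(mac)
        if previous is not None and previous != src_vtep:
            self.moves += 1
        self.entries[mac] = src_vtep


def simulate(vtep_of_switch, uplink_sequence):
    """Feed Sx frames from host A arriving via the given sequence of switches."""
    table = MacTable()
    for switch in uplink_sequence:
        table.learn("A", vtep_of_switch[switch])
    return table.moves


# Host A hashes its sessions across both uplinks (hypothetical sequence).
uplinks = ["S1", "S2"] * 4

# Hypothetical addressing: unique VTEP per switch vs. one shared anycast VTEP.
moves_unique = simulate({"S1": "10.0.0.1", "S2": "10.0.0.2"}, uplinks)
moves_anycast = simulate({"S1": "10.0.0.12", "S2": "10.0.0.12"}, uplinks)

print("MAC moves with unique VTEPs:", moves_unique)
print("MAC moves with anycast VTEP:", moves_anycast)
```

With unique VTEP addresses the entry for A moves every time the ingress switch changes; with a shared address Sx never sees a move.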
Aside: The requirement to use an anycast VTEP IP address complicates the device configuration. The VTEP IP address is usually attached to a loopback interface, and IP addresses configured on loopback interfaces are commonly used to set the OSPF or BGP router ID.
Having two routers with the same router ID in a network running a link-state routing protocol produces interesting results. To avoid that pitfall, vendors usually recommend configuring two loopback interfaces, and using a static IGP router ID to make sure the automatic process doesn’t select the VTEP loopback interface as the source of the router ID.
Back to VXLAN. The use of an anycast VTEP IP address has an interesting side effect: load balancing works out of the box. S1 and S2 are advertising the same IP address with the same cost, and assuming the VXLAN transport network uses a leaf-and-spine topology, traffic from host Z to host A uses both switches (based on the source UDP port in the VXLAN packets).
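A rough sketch of why the source UDP port matters: the ingress VTEP derives the outer source port from the inner flow (RFC 7348 recommends a hash of inner headers mapped into the 49152-65535 range), and a spine switch hashes the outer headers to choose between the equal-cost paths toward the anycast address. The hash function, the addresses, and the helper names below are assumptions made for illustration; real ASICs use their own hashing.

```python
import hashlib

VXLAN_PORT = 4789  # IANA-assigned VXLAN destination UDP port


def _hash(*fields):
    """Deterministic stand-in for a hardware hash function (assumption)."""
    data = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def vxlan_source_port(inner_flow):
    """Map the inner 5-tuple into the 49152-65535 range for the outer source port."""
    return 49152 + _hash(*inner_flow) % 16384


def ecmp_next_hop(outer_src, outer_dst, src_port, next_hops):
    """Pick one of the equal-cost next hops from the outer headers."""
    index = _hash(outer_src, outer_dst, src_port, VXLAN_PORT) % len(next_hops)
    return next_hops[index]


SX_VTEP = "10.0.0.99"       # hypothetical VTEP address of Sx
ANYCAST_VTEP = "10.0.0.12"  # hypothetical anycast VTEP shared by S1 and S2

# 64 hypothetical TCP sessions from Z to A, differing only in the source port.
flows = [("Z", "A", "tcp", 20000 + i, 443) for i in range(64)]

chosen = [
    ecmp_next_hop(SX_VTEP, ANYCAST_VTEP, vxlan_source_port(f), ["S1", "S2"])
    for f in flows
]
print("flows via S1:", chosen.count("S1"), "flows via S2:", chosen.count("S2"))
```

Every packet of a given session gets the same outer source port, so a session stays on one egress switch while different sessions spread across S1 and S2.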
Orphan Hosts
Orphan hosts are (as always) a source of pain. Imagine host X (attached only to S1) communicating with host Z:
- The traffic from X to Z will be encapsulated in VXLAN packets with the anycast VTEP source IP address.
- Sx will use the anycast VTEP IP address to send the traffic from Z to X, and approximately half of those packets will end up on the wrong egress switch (S2).
- S2 will have to use the peer link to send the packets received over the VXLAN fabric to X.
There’s a simple solution to this conundrum: use a different source VTEP IP address when encapsulating Ethernet packets received from orphan hosts. To do that, your ASIC has to support:
- Multiple destination VTEP addresses per switch: one for dual-attached hosts, another one for orphan hosts.
- Selecting the source VTEP address based on the ingress interface.
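The two requirements above boil down to a small decision made per ingress port. The following sketch assumes a hypothetical `MlagMember` model with made-up interface names and addresses; it says nothing about what any particular ASIC can actually do.

```python
from dataclasses import dataclass, field

ANYCAST_VTEP = "10.0.0.12"  # hypothetical address shared by S1 and S2


@dataclass
class MlagMember:
    name: str
    unique_vtep: str                              # used for orphan hosts
    mlag_ports: set = field(default_factory=set)  # ports with dual-attached hosts

    def source_vtep(self, ingress_port):
        """Select the outer source IP based on the ingress interface."""
        if ingress_port in self.mlag_ports:
            return ANYCAST_VTEP
        return self.unique_vtep

    def local_vteps(self):
        """Destination VTEP addresses this switch must accept and decapsulate."""
        return {ANYCAST_VTEP, self.unique_vtep}


s1 = MlagMember("S1", "10.0.0.1", {"po1"})
s2 = MlagMember("S2", "10.0.0.2", {"po1"})

# What Sx would learn from frames sent by A (via po1) and X (via eth5 on S1).
remote_mac_table = {
    "A": s1.source_vtep("po1"),
    "X": s1.source_vtep("eth5"),
}
print(remote_mac_table)
```

In this model Sx sends traffic for X straight to the unique address of S1, so return packets never land on S2 and never have to cross the peer link.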
Looking at various VXLAN implementations, it seems like the above requirements aren’t exactly a walk in the park. Obviously we don’t know what data center switch ASICs can do (thanks a million, Broadcom, NVIDIA and friends), and people who could answer that question are not allowed to, but if you could say something without violating an NDA signed in blood, or send me an anonymous hint, you’d be most welcome.
Finally, this is the perfect moment for EVPN pundits to tell me how all the problems I just described get solved with EVPN multihoming. That’s not exactly true, and we’ll discuss the nuances in the next blog post in this series.
Flooding Issues
Now that we know how unicast forwarding works in MLAG clusters connected to a VXLAN fabric, let’s see how complex the flooding of BUM packets is.
When the VXLAN fabric uses multicast-based BUM flooding, all egress devices listening to the VNI IP multicast address receive all the flooded traffic, and the MLAG cluster members might have a fun time deciding who should forward the flooded packets to the multi-homed hosts.
The easiest solution to this challenge is to use a single MLAG cluster member as a dedicated flooder and register the VNI IP multicast address only on that node. The Ethernet frames received from the VXLAN fabric would be flooded to all ports on the dedicated flooder, including the peer link, and the other member(s) of the MLAG cluster would use the exact same procedures they used in a standalone MLAG cluster.
What about VXLAN fabrics using ingress replication? That’s an even easier one. All MLAG cluster members advertise the same anycast VTEP IP address, and that address is used in the ingress replication lists on all other VTEPs.
When an ingress VTEP sends a replicated BUM packet to the anycast VTEP IP address, it’s received by a random MLAG cluster member. That switch can treat the flooded packet as if it were coming from an orphan host: flood it to all other ports and the peer link. The standard MLAG flooding procedures take care of further flooding.
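To see why it doesn’t matter which cluster member receives the replicated packet, here is a minimal model of the procedure. It assumes the usual MLAG rule that a frame arriving over the peer link is flooded only to orphan ports, never to MLAG ports that are up on both members and never back into the fabric. The port layout, including orphan host Y on S2, is hypothetical.

```python
# Hypothetical port layout: A is dual-attached, X and Y are orphan hosts.
PORTS = {
    "S1": {"A": "mlag", "X": "orphan"},
    "S2": {"A": "mlag", "Y": "orphan"},
}


def flood_from_fabric(receiver):
    """Return (switch, host) deliveries for one BUM packet received by `receiver`."""
    deliveries = []

    # The receiving member floods to all its local ports, as it would for
    # a frame coming from an orphan host.
    for host in PORTS[receiver]:
        deliveries.append((receiver, host))

    # The copy sent over the peer link is flooded only to orphan ports on
    # the other member (assumed standard MLAG behavior, all links up).
    peer = "S2" if receiver == "S1" else "S1"
    for host, kind in PORTS[peer].items():
        if kind == "orphan":
            deliveries.append((peer, host))

    return deliveries


for member in ("S1", "S2"):
    print(member, "received the packet:", flood_from_fabric(member))
```

Whichever member the underlay picks, every host gets exactly one copy; only the switch delivering the copy to A changes.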
More Information
Watch the VXLAN Deep Dive webinar if you want to know more about VXLAN, and the EVPN Multihoming section of the EVPN Deep Dive webinar if you want to skip ahead and learn more about EVPN-based MLAG clusters.
Both webinars are available with Standard ipSpace.net Subscription.
Revision History
- 2022-09-28
- Added flooding considerations