Comments on The Open Fabric: Regarding scale-out network virtualization in the enterprise data center

Hey Greg, Thanks for your comments. What Micros...

2013-11-19T07:54:26.092-08:00

Hey Greg, thanks for your comments.

What Microsoft is doing with their overlay extends to many thousands of hypervisors (millions of VMs), with a custom smart controller to manage the overlay. The simple approach that I outline does not require a "central" controller (so it is safe from controller bugs) and fully supports robust "in fabric" multicast replication. Since the approach is modular it is very scalable but, as I mention, it is limited to tens of thousands of VMs per stub fabric, and host addresses are not mobile beyond the stub fabric instance. For environments that do not depend on out-of-fabric address mobility, approaches such as this work very well at limited cost, risk and complexity.

Btw, E-VPN and IPVPN reduce the need for large RAM and FIB by using RT-Constrain. Additionally, some vendors optimize further by reactively pushing forwarding vectors down into the FIB only when a vector is required (similar to an OpenFlow controller in reactive mode).
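To make the RT-Constrain point concrete, here is a minimal sketch of the filtering idea: a peer signals which route targets it imports, and the sender only advertises VPN routes that carry at least one of them, so the peer never has to hold the rest. The names (`VpnRoute`, `advertise`) and the sample prefixes and route targets are made up for illustration. This is not a BGP implementation, just the set intersection at the heart of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    """Hypothetical VPN route: a prefix plus the route targets attached to it."""
    prefix: str
    route_targets: frozenset


def advertise(routes, peer_interest):
    """Return only the routes carrying at least one RT the peer asked for."""
    return [r for r in routes if r.route_targets & peer_interest]


# Illustrative routes held by a route reflector (values are assumptions).
routes = [
    VpnRoute("10.1.0.0/24", frozenset({"65000:100"})),
    VpnRoute("10.2.0.0/24", frozenset({"65000:200"})),
    VpnRoute("10.3.0.0/24", frozenset({"65000:100", "65000:300"})),
]

# This edge device only hosts members of the VPN that imports 65000:100.
pe1_interest = frozenset({"65000:100"})
pe1_routes = advertise(routes, pe1_interest)

for route in pe1_routes:
    print(route.prefix)
```

An edge device that hosts no members of a given VPN never receives that VPN's routes, which is where the RAM and FIB savings come from.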

The model you describe doesn't work very well ...

2013-11-18T11:39:40.533-08:00

The model you describe doesn't work very well because of memory limitations in the hardware devices. The solution that Microsoft deployed (which drove the design that Petr put together in draft-lapukhov-bgp-routing-large-dc) is intended to support a massive-scale overlay network using NVGRE.

The problem with E-VPN, MPLS and BGP is that forwarding tables in hardware devices have strict finite limits, and the cost of raising those limits increases exponentially.

For this reason, all the vendors are using software overlays, including Cisco and Juniper.

The second reason is that network virtualization using overlays creates a system that can consume cheaper commodity devices.

So, no, scaling as you propose doesn't seem to be the way forward. It's possible that hardware solutions will arrive in the years ahead but there are no plans in networking to use them in the next 5 years at least.

Hey Kris, sorry about this late response and thank...

2013-09-26T19:55:58.691-07:00

Hey Kris, sorry about this late response and thanks for your comments. When I say "centralized camp", I'm referring to solutions where the network has complete dependence on a centralized system (including controller clusters with strong consistency requirements) that compiles low-level match-action rules for the network elements. In the case of RR and some of the other centralized route server models, they either do simple route reflection (i.e. they are not the source of truth) or use stateless route policy to modify attributes of route update messages.

E-VPN would make an elegant solution. Generally wh...

2013-08-25T15:53:39.726-07:00

E-VPN would make an elegant solution. Generally what comes out of a standards WG will be more flexible and extensible. But it does take longer to get there, and it may accumulate stuff you don't want along the way.

Centralizing things isn't automatically bad. A BGP RR is centralized. A route-server in an IX is centralized. A TX Matrix is a centralized RE.

The "centralized camp" in this case might be thinking: I know the address and VPN membership of all my VMs. This only changes when I create, move, or destroy a VM. What's going to be the simplest and easiest to troubleshoot approach to keep that state synchronized across my VM hosts?

There are numerous answers that smart software engineers can come up with, and some of them are also very elegant. The big difference between the approaches, as I see it, is in how closely VM orchestration and VM networking are coupled. In the E-VPN scenario nothing stops you from having multiple orchestration systems, right down to manual configuration and bare metal, all interworking. Of course this can be done with the centralised approaches too, just by building a BGP process into the system to talk with other systems.