Monday, June 24, 2013

Regarding scale-out network virtualization in the enterprise data center

There's been quite a lot of discussion regarding the benefits of scale-out network virtualization.  In this blog I present some additional thoughts to ponder regarding the value of network virtualization in the enterprise DC.  As with any technology option, the question that enterprise network operators need to ask themselves regarding scale-out network virtualization is whether it is the right solution to the problems they need to address.

To know whether scale-out network virtualization in the enterprise DC is the answer to the problem, we need to understand the problem in a holistic sense.  Let's set aside our desire to recreate the networks of the past (VLANs, subnets, etc.) in a new virtual space, and with an open mind ask ourselves some basic questions.

Clearly at a high level, enterprises wish to reduce costs and increase business agility.  To reduce costs it's pretty obvious that enterprises need to maximize asset utilization.   But what specific changes should enterprise IT bring about to maximize asset utilization and bring about safe business agility?  This question ought to be answered in the context of the full sum of changes to the IT model necessary to gain all the benefits of the scale-out cloud.

Should agility come in the form of PaaS or stop short at IaaS?  Should the individual machine matter?  Should a service be tightly coupled with an instance of a machine, or rather should the service exist as data and application that is independent of a machine (physical or virtual)?

In a scale-out cloud, the platform [software] infrastructure directs the spin up and down of application instances relative to demand.  The platform infrastructure also spins up application capacity when capacity is lost due to physical failures.  Furthermore, the platform software infrastructure ensures that services are only provided to authorized applications and users and secured as required by data policy.  VLANs, subnets and IP addresses don't matter to scale-out cloud applications.  Clearly network virtualization isn't a requirement for a well designed scale-out PaaS cloud.  (Multi-tenant IaaS clouds do have a very real need for scale-out network virtualization.)

So why does scale-out network virtualization matter to the "single-tenant" enterprise?  Here are two reasons why enterprises might believe they need it, and two reasons why I think maybe they don't need it for those reasons.

Network virtualization for VM migration.

The problem in enterprises is that a dynamic platform layer such as I describe above isn't quite achievable yet because, unlike the Googles of the world, enterprises generally purchase most of their software from third parties and have legacy software that does not conform to any common platform controls.  Many of the applications that enterprises use maintain complex state in memory that, if lost, can be disruptive to critical business services.  Hence, the closest an enterprise can get these days to cloud status is IaaS -- i.e. take your physical machines and turn them into virtual machines.  Given this dependence on third party applications, dynamic bursting and similar truly cloudy capabilities aren't universally applicable in the back end of an enterprise DC.

The popularity of vmotion in the enterprise is testament to the current need to preserve the running state of applications.  VM migration is primarily used for two reasons -- (1) to improve overall performance by non-disruptively redistributing running VMs to even out loads and (2) to non-disruptively move virtual machines away from physical equipment that is scheduled for maintenance.  This is different from scale-out cloud applications where virtual machines would not be moved, but rather new service instances spun up and others spun down to address both cases.

We all know that for vmotion to work, the migrated VM needs to retain the same IP and MAC address as its prior self.  Clearly, if VM migration were limited to only a subset of the available compute assets, this would lead to underutilization and hence higher costs.  If a VM should be able to migrate to any available compute node (assuming again that retaining IP and MAC is a requirement), the requirement would appear to be scale-out network virtualization.

Network virtualization for maintaining traditional security constructs.

As I mentioned before, a scale-out PaaS cloud enforces application security beginning with a trusted registration process.  Some platforms require the registration of schemas that applications are then forced to conform to when communicating with another application.  This isn't a practical option for consumers of off-the-shelf applications.  But clearly, not enforcing some measure of expected behavior between endpoints isn't a very safe bet either.

The classical approach to network security has been to create subnets and place firewalls at the top of them.  A driving reason for this design is that implementing security at the end-station wasn't considered very secure since an application vulnerability could allow malware on a compromised system to override local security.  This drove the need for security to be imposed at the nearest point in the network that was less likely to be compromised rather than at the end station.

When traditional firewall based security is coupled with traditional LANs, a virtual machine is limited to only the compute nodes that are physically connected to that LAN, and so we end up with underutilization of the available compute assets that are on other LANs.  However, if rather than traditional LANs we instead use scale-out LAN virtualization, then the firewall (physical or virtual) can be anywhere, and the VMs that the firewall secures can be on any compute node.  Nice.

So it seems we need scale-out network virtualization for vmotion and security...

Actually we don't -- not if we don't have a need for non-IP protocols.  Contrary to what some folks might believe, VM migration doesn't require that a VM remain in the same subnet -- it requires that a VM retain its IP and MAC address, which is easily done using host-specific routes (IPv4 /32 or IPv6 /128 routes).  Using host-specific routing, a VM migration would require that the new hypervisor advertise the VM's host route (initially with a less preferred metric) and the original hypervisor withdraw it when the VM is suspended.
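The host-route migration sequence described above can be sketched with a toy routing table.  All names here are hypothetical and this is an illustration of the mechanism, not a BGP implementation:

```python
# Toy RIB: each hypervisor advertises a /32 for the VMs it hosts, and the
# fabric prefers the advertisement with the best (lowest) metric.
class HostRouteRIB:
    def __init__(self):
        self.routes = {}  # prefix -> {next_hop: metric}

    def advertise(self, prefix, next_hop, metric):
        self.routes.setdefault(prefix, {})[next_hop] = metric

    def withdraw(self, prefix, next_hop):
        self.routes.get(prefix, {}).pop(next_hop, None)

    def best_next_hop(self, prefix):
        candidates = self.routes.get(prefix, {})
        return min(candidates, key=candidates.get) if candidates else None

rib = HostRouteRIB()
# VM 10.1.2.3 runs on hypervisor-A; its /32 is advertised fabric-wide.
rib.advertise("10.1.2.3/32", "hypervisor-A", metric=10)
# Migration begins: hypervisor-B advertises with a less preferred metric,
# so traffic keeps flowing to A while the VM's state is copied.
rib.advertise("10.1.2.3/32", "hypervisor-B", metric=20)
assert rib.best_next_hop("10.1.2.3/32") == "hypervisor-A"
# The VM is suspended on A; A withdraws, and traffic shifts to B.
rib.withdraw("10.1.2.3/32", "hypervisor-A")
assert rib.best_next_hop("10.1.2.3/32") == "hypervisor-B"
```

The key property is that the subnet never matters: only the /32 moves, so the VM can land on any hypervisor in the fabric.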

So now that we don't need a scale-out virtual LAN for vmotion, that leaves the matter of the firewall.  The ideal place to implement security is at the north and south perimeters of your network.  As I mentioned earlier, security inside the host (the true south) can be defeated, hence the subnet firewall (the compromise).  But with the advent of the hypervisor, there is now a trusted security enforcement point at the edge.  We can now implement firewall security right at the vNIC of the VM (the "south" perimeter).  When coupled with perimeter security at the points where communication lines connect your DC to the outside world (the "north" perimeter), you don't need scale-out virtual LANs to support traditional subnet firewalls either.  It's debatable whether additional firewall security is required at intermediate points between these two secured perimeters -- my view is that it is not, unless you have a low degree of confidence in your security operations.  Intermediate security comes at a cost: reduced bandwidth efficiency, increased complexity and higher expense, to name a few.

The use of host-specific routing combined with firewall security applied at the vNIC is evident in LAN virtualization solutions that support distributed integrated routing and bridging capability (aka IRB or distributed inter-subnet routing).  The only way to perform fully distributed shortest-path routing with free and flexible VM placement, is to use host-based routing.  The dilemma then is where to impose firewall security -- at the vNIC of course!!

Although we don't absolutely need network virtualization for either VM migration or to support traditional subnet firewalls, there is one really good problem that overlay-based networking helps with, and that is scaling.  Switches built with merchant silicon and other lower-priced hardware don't support large hardware forwarding tables.  This means that your core network fabric might not have enough room in its hardware forwarding tables to support a very large number of VMs.  Overlays solve this issue by only requiring the core network fabric to support about as many forwarding entries as there are switches in the fabric (assuming one access subnet per leaf switch).  However, even in this case, per my reasoning in the prior three paragraphs, a single-tenant enterprise's overlay should only need to support a single tenant instance, and hence would be used for dealing with the scaling limitations of hardware and not for network virtualization.
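A back-of-the-envelope comparison makes the table-pressure point concrete.  The numbers below are illustrative assumptions, not measurements:

```python
# Core fabric forwarding-table pressure, with and without an overlay.
num_leaf_switches = 100
vms_per_leaf = 400
total_vms = num_leaf_switches * vms_per_leaf  # 40,000 VMs

# Without an overlay, the core may need a forwarding entry per VM host route.
core_entries_no_overlay = total_vms

# With an overlay (one access subnet per leaf switch), the core only needs
# to reach each leaf's tunnel endpoint -- roughly one entry per switch.
core_entries_overlay = num_leaf_switches

assert core_entries_no_overlay == 40_000
assert core_entries_overlay == 100
```

A 400x reduction in core table pressure is exactly the kind of relief that lets cheaper merchant-silicon switches serve in the fabric core.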

Building a network-virtualization-free enterprise IaaS cloud.

There are probably a couple of ways to achieve vmotion and segmentation without network virtualization and with very little bridging.  Below is one way to do this using only IP.  The following approach does not leverage overlays, and so each fabric can only support as many hosts as the size of the hardware switch L3 forwarding table.

(1) Build an IP-based core network fabric using switches that have adequate memory and compute power to process fabric-local host routes.  The switch L3 forwarding table size reflects the number of VMs you intend to support in a single instance of this fabric design.  Host routes should only be carried in BGP.  You can use the classic BGP-IGP design, or for a BGP-only fabric you might consider draft-lapukhov-bgp-routing-large-dc.  Assign the fabric a prefix that is large enough to provide addresses to all the VMs you expect your fabric to support.  This fabric prefix is advertised out of the fabric, and a default route is advertised into it, for inter-fabric and other external communication.
(2) Put a high performance virtual router in your hypervisor image that will get a network-facing IP via DHCP and is scripted to automatically BGP-peer with its default gateway, which will be the ToR switch.  The ToR switch should be configured to accept incoming BGP connections from any vrouter that is on its local subnet.  The vrouter will advertise host routes of local VMs via BGP, and for outbound forwarding will use its default route to the ToR.
(3) To bring up a VM on a hypervisor, your CMS should create an unnumbered interface on the vrouter, attach the vNIC of the VM to it and create a host route to the VM which should be advertised via BGP.  The reverse should happen when the VM is removed.  This concludes the forwarding aspect of the design.
(4) This next step handles the firewall aspect of the design.  Use a bump-in-the-wire firewall like Juniper's vGW to perform targeted class-based security at the vNIC level.  If you prefer to apply ACLs on the VM facing port of the vrouter, then you should carve out prefixes for different roles from the fabric's assigned address space to make writing the ACLs a bit easier.
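Steps (2) and (3) can be sketched as a toy object model.  The class and method names below are assumptions for illustration, not a real CMS or vrouter API:

```python
# Sketch of the vrouter's role in the design: attach a VM's vNIC to an
# unnumbered interface and advertise/withdraw its host route via BGP.
class VRouter:
    def __init__(self, tor_peer):
        self.tor_peer = tor_peer   # BGP peer (the ToR switch, per step 2)
        self.interfaces = {}       # vm_name -> unnumbered interface
        self.advertised = set()    # host routes announced via BGP

    def bring_up_vm(self, vm_name, vm_ip):
        # Step 3: unnumbered interface, vNIC attached, host route advertised.
        self.interfaces[vm_name] = f"unnumbered-{vm_name}"
        self.advertised.add(f"{vm_ip}/32")

    def remove_vm(self, vm_name, vm_ip):
        # The reverse happens when the VM is removed.
        self.interfaces.pop(vm_name, None)
        self.advertised.discard(f"{vm_ip}/32")

vr = VRouter(tor_peer="tor-1")
vr.bring_up_vm("web-01", "10.1.0.17")
assert "10.1.0.17/32" in vr.advertised
vr.remove_vm("web-01", "10.1.0.17")
assert "10.1.0.17/32" not in vr.advertised
```

In a real deployment the advertise/withdraw calls would translate into BGP UPDATE messages toward the ToR; everything else is bookkeeping the CMS drives.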

Newer hardware switches support 64K and higher L3 forwarding entries and also come with more than enough compute and memory to handle the task, so it's reasonable to achieve upward of 32K VMs per fabric.  Further scaling is achieved by having multiple of these fabrics (each with its own dedicated maskable address block) interconnected via an interconnect fabric; however, VM migration should be limited to a single fabric.  But if you prefer to go with the overlay approach to achieve greater scaling, replace BGP to the ToR with MP-BGP to two fabric route reflectors for VPNv4 route exchange.  When provisioning a VM-facing interface on the vrouter you'll need to place it into a VRF and import/export a route target.
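The per-fabric address carving can be sketched with Python's ipaddress module.  The prefix sizes here are illustrative assumptions chosen to line up with the ~32K VMs-per-fabric figure above:

```python
import ipaddress

# Carve dedicated, maskable per-fabric blocks out of a DC-wide block, so
# the interconnect fabric only needs one summary route per fabric.
dc_block = ipaddress.ip_network("10.0.0.0/12")

# A /17 provides 32,766 usable host addresses -- roughly the ~32K VMs per
# fabric that 64K-entry L3 tables comfortably support.
fabric_prefixes = list(dc_block.subnets(new_prefix=17))

assert fabric_prefixes[0] == ipaddress.ip_network("10.0.0.0/17")
assert len(fabric_prefixes) == 32  # 2**(17-12) fabrics from this block
```

Because each fabric's block is maskable, the interconnect fabric carries one /17 summary per fabric rather than thousands of host routes, which is what keeps the design scalable across fabrics.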

I've left out some details for the network engineers among you to exercise your creativity (and avoid making this blog as long as my first one) -- why should Openstack hackers have all the fun? :)

Btw, native multicast should work just fine using this approach. Best of all, you can't be locked in to a single vendor.

If you believe you need scale-out overlay network virtualization, consider using one that is based on an open standard such as IPVPN or E-VPN.  The latter does not require MPLS, as some might believe, and supports efficient inter-subnet routing via this draft, which I believe will eventually be accepted as a standard.  Both support native multicast, and both are or will eventually be supported by at least three vendors with full interoperability.  I'm hopeful that my friends in the centralized camp will some day see the value of also using and contributing to open control-plane and management-plane standards.

Sunday, June 9, 2013

Angry SDN hipsters.

Some folks seem to get a little too hung up on one philosophy or another -- too blind to see good in any other form except the notions that have evolved in their mind.  I'm hoping I'm not one of them.  I do have opinions, but ones I believe are rational.

The counter culture of networking waves the SDN banner.  That acronym seems to belong to them.  They don't know what it stands for yet, but one thing they seem to be sure of is that nothing good can come by allowing networking innovations to evolve or even to exist in their birthplace.

The way I see evolving the network fabric is through improving on the best of the past.  Every profession I know from medicine, finance, law, mathematics, physics, you name it -- all of them are building their tomorrow on a mountain of past knowledge and experience.  So I'm sure my feeling the same about the network doesn't make me outdated, just maybe not a fashionable SDN hipster.

Some angry SDN hipsters say that the core network needs to be dumbed down.  They must have had a "bad childhood," technically speaking.  One too many Cisco 6500's stuffed with firewalls, load balancers and other things that didn't belong there.  Maybe even a few with a computer or two crammed into them.  I'm not sure I can feel sorry for you if that was your experience.  Maybe you didn't realize that was a bad idea until it was too late.  Maybe you were too naive and didn't know how to strike the right balance in your core network.  Whatever it was, I can assure you that your experience isn't universal, and neither are your opinions about how tomorrow should or shouldn't be.

Those who couldn't figure out how to manage complexity yesterday won't be able to figure it out tomorrow.  Tomorrow will come and soon become yesterday and they'll still be out there searching.  Endlessly.  Never realizing that the problem wasn't so much the network, it was them and the next big company that they put their trust in.

I had a great experience building great networks.  I stayed away from companies that didn't give me what I needed to get the job done right.  The network was a heck of a lot easier to manage than computers in my day, and the technology has kept pace in almost every aspect.  You see, Amazon and Google aren't the only ones that can build great infrastructure.  And some of us don't need help from VMware, thank you.

So mister angry SDN hipster, do us all a favor and don't keep proposing to throw the baby out with the bath water.  We know your pain and see your vision too, but ours might not be so narrow.

Monday, June 3, 2013

Straight talk about the enterprise data center network.

Building a mission critical system requires the careful selection of constituent technologies and solutions based on their ability to support the goals of the overall system.   We do this regardless of whether we subscribe to one approach or another of building such a system.

It is well known that the network technologies commonly used today to build a data center network have not adequately met the needs of applications and operators.  However, what is being drowned out in the media storm around "SDN" is that many of the current day challenges in the data center network can be addressed within an existing and familiar toolkit.  The vision of SDN should be beyond where today’s data center networks could have been yesterday.

In this "treatise" I highlight what I believe has been lacking in today's data center core network toolkit, as well as address certain misconceptions.  I'll begin by listing key aspects of a robust network, followed by perspectives on each.

A robust network should be evolved in all of the following aspects:
  1. Modularity - freedom to choose component solutions based on factors such as bandwidth, latency, ports, cost, serviceability, etc.  This generally requires the network to be solution- and vendor-neutral, as no single solution or vendor satisfies all requirements.  Management and control-plane applications are not excluded from this requirement.
  2. Automation - promotes the definition of robust network services, automated instantiation of these network services, full cycle management of network services and of other physical and logical network entities, and api-based integration into larger service offerings.
  3. Operations - functional simplicity and transparency (not a complicated black box), ease of finding engineering and operations talent and ease of building or buying robust software to transparently operate it.
  4. Flexibility - any port (physical or virtual) should support any network service (“tenancy”, service policy, etc).  This property implies that the network can support multiple coexisting services while still meeting end user performance and other experience expectations.
  5. Scalability - adding capacity (bandwidth, ports, etc) to the network should be trivial and achievable without incremental risk. 
  6. Availability - through survivability, rapid self-healing and unsusceptibility.
  7. Connectivity - in the form of a simple, robust and consistent way to federate network fabrics and internetwork. 
  8. Cost -- since inflated costs inhibit innovation.
This blog post is about the location-based connective fabric over which higher-layer network services and applications operate.  I might talk about "chaining in" conversation-based network services (such as stateful firewalling, traffic monitoring, load-balancing, etc) another time.

On modularity.

Real modularity frees you to select components for your system that best meet its requirements, without the constraint of unnecessary lock-in.  In today's network, the parts that can make or break this form of modularity are generally the control-plane and data-plane protocols.

Network protocols are like language.  Proprietary protocols affect networking the way language silos affect civilization: they hinder a connected world from forming.  It took the global acceptance of English to enable globalization -- for the world to be more accessible and opportunities not constrained by language.

The alphabet soup of proprietary and half-way control-plane protocols we’ve had forced on us has resulted in mind-bending complexity and has become a drag on the overall advancement of the network.  Each vendor claims their proprietary protocol is better than the other guys, but we all know that proprietary protocols are primarily a tool to lock customers out of choice. 

Based on evidence, it’s reasonable to believe that robust open protocols and consistent robust implementations of these would address the modularity requirement.  We can see this success in the data-plane protocols of TCP/IP and basic Ethernet.

Many folks see the world of network standards as a pile of RFCs related to BGP, IGPs, and other network information exchange protocols.  What they don’t see is that many RFCs actually describe network applications (sound familiar?).  For example, the IETF RFC 4364 ("BGP/MPLS IP Virtual Private Networks") describes the procedures and data exchange required to implement a network application used for creating IP-based virtual private networks.  It describes how to do this in a distributed way using MPBGP for NLRI exchange (control-plane) and MPLS for data plane -- in other words it does not attempt to reinvent what it can readily use.  Likewise there are RFCs that describe other applications such as Ethernet VPNs, pseudo-wire service, traffic engineering, etc.   

Openflow extends standardization to the API of the data plane elements, but it is still only a single chapter in the modularity story.  Putting a proprietary network application on top of Openflow is damaging to Openflow's goal of openness since a system is only as open as the least open part of it.  

The modularity of any closed system eventually breaks down.

On automation.

Achieving good network automation has been more of a challenge for some operators than for others.  If you haven't crossed this mountain then it's hard, but if you're over the top already then it's a lot easier.  Based on my experience the challenges with automating today’s networks are concentrated in a couple of places.
  1. Low-level configuration schema rather than service-oriented schema.  This is fairly obvious when you look at configuration schema of common network operating systems.  To pull together a network service such as an IPVPN, many operators need a good "cookbook" that provides a "recipe" which describes how to combine a little bit of these statements with a little bit of those seemingly unrelated statements, and that also hopefully warns you of hidden dangers in often undocumented default settings. 
  2. Different configuration languages to describe the same service and service parameters.  The problem also extends to the presentation of status and performance information.  There are very large multi-vendor networks today that seamlessly support L2 and L3 VPNs, QoS, and all that other good stuff across the network -- this is made possible by the use of common control and data plane protocols.  However it is quite a challenge to build a provisioning library that has to speak in multiple tongues (ex: IOS, JunOS, etc), and keep up with changes in schema (or presentation).  This problem highlights the need for not only common data plane and control plane languages, but also for common management plane languages.
  3. Inability to independently manage one instance of a service from another.  An ideal implementation would allow configuration data related to one service instance to be tagged so that services can be easily retired or modified without affecting other services that may depend on the same underlying resources.  Even where Openflow is involved, Openflow rules that are common to different service instances need to be managed in a manner that prevents the removal of these common rules when an instance of a service that shares that common rule is removed -- with Openflow, this forces the need for a central all-knowing controller to compile the vectors for the entire fabric and communicate the diffs to data plane elements.
  4. Lack of robust messaging toolkit to capture and forward meaningful event data.  Significant efficiencies can be achieved when network devices capture events and deliver them over a reliable messaging service to a provisioning or monitoring system (syslog does not qualify).  For example [simplistically] if a switch could capture LLDP information regarding a neighbor and send that to a provisioning station along with information about itself then the provisioning system could autonomically configure the points of service without the need for an operator to drive the provisioning.
  5. Inability to atomically commit all the statements related to the creation, modification, deletion and rollback of a service instance as a single transaction.  Some vendors have done a better job than others in this area.
  6. Excessively long commit times (on platforms where commits are supported).  Commits should take milliseconds and not 10’s of seconds to be considered API quality.
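The shared-rule problem in item 3 above can be handled with simple reference counting.  Here is a minimal sketch -- a hypothetical data structure, not any real controller's API:

```python
from collections import defaultdict

# Reference-count Openflow rules shared across service instances, so a
# shared rule is only removed from the switch when the last service
# instance that uses it is retired.
class RuleStore:
    def __init__(self):
        self.refcount = defaultdict(int)  # rule -> services using it

    def add_service(self, service_rules):
        # Rules not yet installed are the only ones to push to the switch.
        to_install = [r for r in service_rules if self.refcount[r] == 0]
        for r in service_rules:
            self.refcount[r] += 1
        return to_install

    def remove_service(self, service_rules):
        removed = []
        for r in service_rules:
            self.refcount[r] -= 1
            if self.refcount[r] == 0:
                del self.refcount[r]
                removed.append(r)  # now safe to delete from the switch
        return removed

store = RuleStore()
store.add_service({"match-arp", "svc-a-flow"})
store.add_service({"match-arp", "svc-b-flow"})
# Retiring service A must not remove the shared "match-arp" rule.
assert store.remove_service({"match-arp", "svc-a-flow"}) == ["svc-a-flow"]
```

This is exactly the bookkeeping that pushes designs toward a central all-knowing controller: someone has to know which rules are shared before deleting anything.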
Interestingly, the existence of these issues fuels the belief among some that Openflow will address them by effectively getting rid of them.  Openflow is to today's CLI as Assembly Language is to Python.  With all the proprietary extensions tacked on to the side of most implementations of Openflow, the situation is hardly any better than crummy old CLI, except now you need a couple of great software developers with academic levels of network knowledge and experience (a rare breed) to engineer a network.

On the other hand, for the sake of automation, buying a proprietary centralized network application that uses Openflow to speak to data plane elements isn't necessarily an ideal choice either.  These proprietary network applications may support a northbound interface based on a common service schema (management plane) and issue Openflow-based directives to data plane elements, but implement proprietary procedures internally -- a black box.  

On operations.

Relying on hidden procedures that are at the heart of the connectivity of your enterprise isn't, in my opinion, a good thing.  Today's network professionals believe that control-plane and data-plane transparency are essential to running a mission critical network since that transparency is essential to identifying, containing and resolving issues.  The management plane, on the other hand, is considered important for the rapid enablement of services but, in most cases, is less of a concern in relation to business continuity.  Some perspectives on SDN espouse fixing of issues at the management plane at the expense of disrupting and obscuring the control plane.  Indeed some new SDN products don’t even use Openflow.

Vendors and others that believe that enterprises are better off not understanding the glue that keeps their system together are mistaken.  Banks need bankers, justice needs lawyers, the army needs generals and mission critical networks need network professionals.  One could define APIs to drive any of these areas and have programmers write the software, but lack of domain expertise begs failure.  

When I first read the TRILL specification I was pretty baffled.  The network industry had already created most of the basic building blocks needed to put STP-based protocols out of their miserable existence and yet it needed to invent another protocol that would bring us only half way to where we needed the DC network to be.  The first thing that crossed my mind when I came out of the sleep induced by reading this document was regarding the challenge of managing all the different kinds of wheels created by endless reinvention.  Trying to holistically run a system with wheels of different sizes and shapes all trying to do effectively the same thing yet in different ways, is mind numbing and counter-productive.  My reaction was to do something about this -- hence E-VPN, which enables scalable Ethernet DCVPN, based on an existing proven wheel.

Domain expertise is built on transparency, and success is proportional to how non-repetitive, minimal, structured and technically sound your choice of technologies are -- and you'll be better off if your technologists can take the reins when things are heading south.

On flexibility.

Anyone who has built a large data center network in the past 15 years is familiar with the tradeoffs imposed by spanning tree protocols and Ethernet flooding.  The tradeoff for a network that optimized for physical space (i.e. any server anywhere) was a larger fault domain while the tradeoff for networks that optimized for high availability was restrictions on the placement of systems (or a spaghetti of patches).  When the DC started to get filled up things got ugly.  Things were not much better with regard to multi-tenancy at the IP layer of the data center network either.

As I mentioned before, some of the same vendors in the DC network space had already created the building blocks for constructing scalable and flexible networks for service providers.  But they kept these technologies out of the hands of DC network operators -- the DC business made profits on volume while the WAN business made its dollars on premiums for feature-rich equipment.  If network vendors introduced lower cost DC equipment with the same features as the WAN equipment they risked harming their premium WAN business.  Often the vendor teams that engineered DC equipment were different from those that engineered WAN equipment and they did not always work well together, if at all.

WAN technology such as MPLS was advertised as being too complex.  Having built both large WAN and DC networks I'll admit that I've had a harder time building a good DC network than building a very large and flexible MPLS WAN.  The DC network often “got me”, while the WAN technology was far more predictable.  But instead of giving us flexible DCVPN based on robust, scalable and established technologies we instead were given proprietary flavors of TRILL for the DC.  The DC network was essentially turf protected with walls made of substandard proprietary protocols.  The good news is that all that is changing -- our vendors have known for quite some time that WAN technology is indeed good for the DC and artificial lines need not be drawn between the two.

A flexible DC network allows any service to be enabled on any network port (physical or virtual).  One could opt to achieve this flexibility using network software running in hypervisors under the control of centralized SDN controllers.  This model might be fine for some environments where compute virtualization is pervasive and if good techniques to reduce risk are employed.  Most enterprise DC environments on the other hand will continue to have “bare metal” servers which will be networked over physical ports.  "Physicalization" remains strong and certainly PaaS in an enterprise DC does not necessarily require a conventional hypervisor.  Some environments may even need to facilitate the extension of a virtual network to other local or remote devices where it will not be possible to impose a hypervisor.  Ideally the DC network would not need to be partitioned into parts that are interconnected by specialized gateway choke points.  The goal of reducing network complexity and increasing flexibility can't be achieved without eliminating gateways and other service nodes where they only get in the way.

On scalability.

The glue that holds together a network is the control-plane of the network -- its job is to efficiently and survivably distribute information about the position and context of destinations on a network and determine the best paths via which to reach them.  The larger a network, the more the details of the control plane matter.  As I alluded to before, spanning tree protocols showed up on the stage with seams already unraveled (this is why smart service providers avoid them like the plague).

The choice of control plane matters significantly in the scaling of a network that needs to provide seamless end-to-end services.  A good network control plane combined with its proper implementation enables the efficient and reliable distribution of forwarding information from the largest and most expensive equipment on the network to the smallest and cheapest ones.

A good control plane takes a divide and conquer approach to scaling.  As an example of this, given a set of possible paths to a destination, BGP speakers only advertise to their peers the best paths they have chosen based on their position in the network.  This approach avoids the need for the sender to transmit more data than necessary and for the receiver to have to store and process more of it.  Another scaling technique used by some BGP-based network applications is to have network nodes explicitly subscribe to the routing information that is relevant to them.  Scaling features are indeed available in good standards-based distributed control planes and not unique to proprietary centralized ones.
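The best-path-only behavior can be illustrated with a toy selection function.  Real BGP best-path selection has many more tie-breaking steps; shortest AS path here is a deliberate simplification:

```python
# A speaker knows several paths to a destination but advertises only its
# chosen best one, so peers store and process a fraction of the total state.
def best_path(paths):
    # Simplified decision: prefer the shortest AS path.
    return min(paths, key=lambda p: len(p["as_path"]))

known_paths = [
    {"next_hop": "192.0.2.1", "as_path": [65001, 65002, 65003]},
    {"next_hop": "192.0.2.2", "as_path": [65001, 65004]},
]

advertised = best_path(known_paths)
assert advertised["next_hop"] == "192.0.2.2"  # peers see one path, not two
```

Multiply this across thousands of prefixes and speakers and the divide-and-conquer payoff is clear: each node holds and shares only what its position in the network requires.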

One could argue that a central controller can do a better job since it has full visibility into all the information in the system.   However, a central controller can only support a domain of limited size -- more or less what a single instance of an IGP can support.  The benefits are therefore tempered when building a seamless network of significant scale, since doing so necessitates multiple controller domains that share summarized information with each other.  As you step back, you begin to see why [scalable and open] network protocols matter.
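To make the summarization point concrete, here is a minimal sketch using only Python's standard `ipaddress` module.  The domain and prefix values are invented for illustration; the point is that each controller domain shares a collapsed view, not its full internal state.

```python
# Each controller domain collapses its internal routes before sharing
# them, so neighboring domains carry far less state than the domain holds.
import ipaddress

def summarize(prefixes):
    """Collapse a domain's internal prefixes into the smallest set of
    covering networks to advertise to neighboring domains."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

internal = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
summary = summarize(internal)
# The four contiguous /24s collapse into a single /22 for inter-domain sharing.
```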

In addition to the network control plane, open standards have allowed further scaling in other ways, such as by enabling one service domain to be cleanly layered over another service provider domain without the customer and the service provider having to know the details of the layers above or below.

There are other properties that tend to be inherent in scalable network technology and in its implementations.  Bandwidth scaling, for example, (1) avoids the need for traffic to move back and forth across the network as it makes its way between two endpoints, (2) avoids multiple copies of the same packet on any single link of the network (whether intra- or inter-subnet), and (3) makes it possible for different traffic types to share the same physical wires fairly with minimal state and complexity.  Scalable network technology is also fairly easy to understand and debug (but I said that already).

On availability.

In today's networks, each data-plane node is co-resident with its own personal control-plane entity within the same sheet metal packaging.  However, the communication between that control-plane entity and the coupled data-plane entity is opaque.  Some operators reduce the size of each box to reduce the unknowns related to the failure of any single box.   The complexity inside the box doesn't matter so much, since the operator is able to bound the impact of the box's failure.  What he can see -- the control-plane dialog between boxes -- he understands.  Now imagine that box is the size of your DC network, and the "inside" is still opaque.

In my experience, the biggest risk to high availability is software.  The quality of software tends to vary as the companies that produce it go through shifts and changes.  Many operators have been on the receiving end of this dynamic lately.  Even software that is intended to improve availability tends to be a factor in reducing it.  In order to minimize the chance of getting hit hard by a software bug, many operators deploy new software into their network in phases (after first testing it in their lab, hopefully).  Any approach that requires an operator to cross his fingers and upgrade large chunks of the network -- or, god forbid, the whole thing -- is probably not suitable for a mission critical system.  In my opinion, it is foolish to trust any vendor's software to be bug free.  Building a mission critical system involves reducing software fault domains as much as it does reducing other single points of failure.
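The phased-deployment discipline above can be sketched as a simple loop.  This is a hypothetical illustration -- the node names and callbacks are stand-ins, not a real orchestration API:

```python
# Upgrade a small batch of nodes, verify health, and stop immediately on
# a failure, so a software bug hits a small fault domain rather than the
# whole network.

def phased_upgrade(nodes, upgrade, healthy, batch_size=2):
    """Upgrade `nodes` in batches; abort on the first unhealthy batch.
    Returns (successfully upgraded nodes, failed batch or None)."""
    done = []
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for node in batch:
            upgrade(node)
        if not all(healthy(node) for node in batch):
            return done, batch        # stop: don't spread the bad build
        done.extend(batch)
    return done, None

upgraded, failed = phased_upgrade(
    ["leaf1", "leaf2", "leaf3", "leaf4"],
    upgrade=lambda node: None,             # stand-in for the real push
    healthy=lambda node: node != "leaf3",  # simulate a bug on leaf3
)
# upgraded == ["leaf1", "leaf2"]; failed == ["leaf3", "leaf4"]
```

The fault domain here is the batch size: only the canary batch runs the bad build when the health check trips.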

The most resilient networks would have two separate network "rails", with hosts that straddle both rails and host software that knows how to dynamically shift away from problems on either network.   For the highest availability, the software running each rail would be different from the other, so that the two rails are not subject to the same bugs.
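The dual-rail idea can be sketched as follows -- an illustrative toy that assumes a host-side probe function; none of these names come from a real product:

```python
# A host probes both rails and directs traffic onto whichever rail
# currently passes its health check, preferring the primary.

def pick_rail(rails, probe):
    """Return the first rail (in preference order) that passes `probe`,
    or None if every rail is down."""
    for rail in rails:
        if probe(rail):
            return rail
    return None

status = {"rail-A": False, "rail-B": True}   # simulate a problem on rail-A
active = pick_rail(["rail-A", "rail-B"], lambda r: status[r])
# Traffic shifts to rail-B until rail-A recovers.
```

Because the decision lives in host software rather than in either rail's control plane, a bug in one rail's software cannot take out the failover logic along with the rail.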

On connectivity.

In order to scale out a network and reduce risks, we would most likely divide the network into smaller control-plane domains (let's call them fabrics) and then interconnect these fabrics.  There are two main ways we could go about federating these fabrics to create a fabric of fabrics.

Gateways  -- We might choose to interconnect the fabrics using gateways, but depending on the gateway technology, gateways tend to damage the transparency between two communicating endpoints and/or create artificial choke points and burdensome complexity in the network.   Gateways may be the way to go in public clouds, where tenants need an entry point into their virtual networks, but in large enterprise data centers fabric domains will more often be created for HA purposes, not for creating virtual network silos with high data-plane walls.

Seamless -- In the ideal world, we should be able to create virtual networks that are not constrained to a single fabric.  In order to achieve this, distributed controllers need to federate with each other using a common paradigm (let's say L2 VPN) and ideally using a common protocol language.  This seems to take us back to the need for a solid distributed control plane protocol (we can't seem to get away from it!).

If seamless connectivity is not to be limited to a single fabric, then a good protocol will be the connectivity maker.  That raises the question: why is a good protocol not good enough for the fabric itself?  Hmm...

On cost.

Health care in the United States is among the most expensive in the world, but if I were in need of serious medical attention I wouldn't want to be anywhere else.  If health care were free, innovation would probably cease.  Once patents expire and generics hit the market, access to life-saving medication becomes commoditized and more accessible.  Similarly, when Broadcom brought Trident to the market and Intel came with Alta, building good data center networks became accessible to folks who weren't in big-margin businesses.  However, the commodity silicon vendors of the world aren't at the head of the innovation curve.  Innovation isn't cheap, but it does need to be fair to both the innovator and the consumer.

The truth about network hardware cost is that it depends on what you need and which point on the adoption curve you are at.  The other truth about cost is that customers have a role in driving down costs through the choices we make.  As an example of this -- if I believe that routers should have 100GE coherent DWDM interface cards for DCI, then I put my dollars on the first vendor that brings a good implementation to market, and I resist alternatives since settling for them defeats my goal of making the market.  There may be a higher price to being a first mover, but once a competitive market is established prices will fall and we're all the better for it.  Competition flourishes when sellers know they can be easily displaced -- hence, again, standards.  The alternative is to spend your dollars on run-of-the-mill technology and cry about it.

Fortunately, most data centers don't require technology as fancy as many vendors might have you believe (unless you love Fibre Channel).  The problem is that no single vendor seems to have the right combination of technology to make an ideal network fabric.  The network equipment we buy has a ton of features we probably didn't need (like DCB) yet pay dearly for, and not enough of the stuff we could really have used.  In future blogs I hope to outline a small set of building-block technologies that I believe would enable simple, cheap and scalable data center fabrics.  It just might be possible for DC network operators to have their cake and eat it too.

In conclusion.

One of the greatest values of the SDN “gold rush” to the enterprise DC network is not necessarily the change of hands to a brand new way of networking, but how it has shone a light on the network's current dysfunction.  SDN is giving network vendors a "run for their money" that will result in them closing long-standing gaps and bringing into the data center the great standards-based technologies that they've unnecessarily kept out of it.

There is more to be done beyond using the fear of SDN to ensure that network vendors are making the best decisions on behalf of the rest of us -- decisions based on customer needs versus vendor politics and pure self-interest.  The way standards bodies work today is driven more by the self-interest of each of its members than by what would be best for customers.  On the other hand, it was technology buyers that drove the creation of standards bodies, because of how much worse things were prior to their existence.  Costs can become unconstrained without competition, and once a closed solution is established, extracting it can be a long, costly and perilous journey.

Good standards bring structure to the picture, each one laying one more foundation on which other systems can be efficiently built.  When the control and data planes are standards-based and properly layered, an Internet emerges.  When the management plane is standardized, it will reach more people faster.

New and innovative network control applications indeed have value and spur out-of-the-box thinking that is good for the industry.  On the other hand, that conversation should not be a reason for holding back progress along proven lines.  Fixing the problems with today's data center networks should not require giving the boot to all of thirty years of packet network evolution.

The market is agitating for network vendors to bring us solutions that work for us, not just for them.  The message is loud and clear, and the vendors that are listening will be successful.  Vendors and customers that buy too much into the hype will lose big time.  Vendors that don't respond to the challenge at hand will also find themselves heading toward the exit.

As much as it's been a bit tiring to search for gems of truth amidst the noise, it's also an exciting time for the data center network.

My advice to buyers and implementors?  Choose wisely.