In 2012, my friend Parth Shah (no relation, but we are brothers) was trying to teach me about “the enterprise.” I mean, I had some idea about it, particularly on the data side, but we started composing this post for non-technical people to better understand. After much work, we’re proud to finally release our co-authored version of that today — see below. The spirit of this document is to be living (so please suggest any changes or additions), and is geared as a basic primer for anyone interested in enterprise IT but who isn’t natively familiar with that landscape. In other words, if you are already an expert in enterprise IT, this won’t be of use to you.
by Parth Shah and Semil Shah
For people interested in technical entrepreneurship, the phrase “the enterprise” can feel foreign, especially for those more steeped in consumer products and services and/or those who do not possess technical backgrounds. Yet, we have found that many people have a desire to learn more about enterprise-level technologies, and to that effect, we have collected and organized this work to be a continually improving resource in lay terms for people to begin learning.
THE INFRASTRUCTURE LAYER
This section covers datacenter storage, networking, and compute hardware and software. Traditional storage players like EMC and NetApp face two major threats to conventional storage models: distributed storage and flash.
Traditional enterprise storage is centralized: multiple servers connect to one large storage device (usually a disk array) through a specialized network (usually a storage-area network, or SAN). This is a very reliable setup, with features like backup, fault tolerance, deduplication, and snapshots built right into the storage device. The shift in this market, however, is toward a Google-style model where commodity (read: cheap) storage devices are directly attached to servers instead of being centralized. This shift is driven by the next generation of Internet-style enterprise apps that are “scale-out” in nature. The opportunity here is to make distributed storage as reliable and feature-rich as traditional, centralized enterprise-grade storage while offering the cost savings afforded by cheaper commodity hardware. [Reference: (1) see Nutanix’s article on evolution of datacenters; (2) see Nutanix CEO Dheeraj Pandey with Semil on “In The Studio”; (3) see Nutanix investor and Lightspeed Partner Bipul Sinha with Semil on “In The Studio”; and (4) see Nebula CEO Chris Kemp with Semil on “In The Studio.”]
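To make the reliability point concrete, here is a minimal sketch of one way a distributed storage layer might keep several copies of each block on different commodity servers. The function name, the server names, and the replica count of three are all assumptions made for illustration; this is not how Nutanix or any specific product places data.

```python
import hashlib
from typing import List

def place_replicas(block_id: str, nodes: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct nodes for a block, starting at a hashed offset.

    Hypothetical helper for illustration only; assumes `nodes` has no duplicates.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    # Hash the block id so that different blocks start on different servers.
    start = int(hashlib.sha256(block_id.encode()).hexdigest(), 16) % len(nodes)
    # Walk the node list from that offset, wrapping around at the end.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

nodes = ["server-1", "server-2", "server-3", "server-4", "server-5"]
placement = place_replicas("block-42", nodes)
print("block-42 is stored on:", placement)
```

Because the three copies always land on three different servers, losing any one cheap server still leaves at least two copies of every block. That is the basic idea behind getting enterprise-grade reliability out of commodity hardware.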
Traditional storage is also facing a threat from startups focused on pure flash storage, such as Nexenta, Pure Storage, etc. Flash-based storage devices are orders of magnitude faster than spinning disks. Most of the current innovation in and around flash is in trying to overcome other constraints such as cost, life cycle, form factor, etc. Just as with distributed storage, there is an opportunity to provide enterprise-grade features like deduplication, snapshotting, backup, data recovery, etc. for pure flash-based storage devices. The shift to flash poses a challenge in that most of the technology behind these features was written for devices with spinning disks. [Pure Storage creates the software to manage flash storage and provides enterprise features similar to those of NetApp and EMC. Nutanix is well positioned to combine the power of distributed storage and flash. Other flash device vendors like Fusion-io, Violin Memory, Samsung, and Hitachi are also in a good position.]
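For readers wondering what a feature like deduplication actually does, the sketch below shows the core idea under simplifying assumptions: data is cut into fixed-size chunks, each chunk is identified by a hash, and identical chunks are stored only once. The function names and the tiny 4-byte chunk size are hypothetical choices for readability; real storage products use far more sophisticated schemes.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep each unique chunk only once."""
    store = {}   # chunk hash -> chunk bytes (one entry per unique chunk)
    recipe = []  # ordered chunk hashes needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store the chunk only if it is new
        recipe.append(digest)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild the original data by looking up each chunk in order."""
    return b"".join(store[digest] for digest in recipe)

data = b"ABCDABCDABCDWXYZ"
store, recipe = dedup_store(data)
print(len(recipe), "chunks referenced,", len(store), "chunks actually stored")
```

The data is described by four chunk references, but only two distinct chunks need to be kept. On expensive flash, saving space this way matters even more than it does on cheap spinning disks.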
The networking industry is currently dominated by the likes of Cisco and Juniper, among others. The biggest shift shaking up this industry right now is software-defined networking, or SDN. Traditional datacenter networking provided by a company like Cisco, for instance, carries too much vendor lock-in, along with expensive devices, proprietary networking protocols, and limited flexibility in deploying applications. SDN is being pioneered by companies like Nicira (acquired by VMware) and Big Switch Networks. With SDN, companies can use commodity network switches manufactured by any vendor that supports an open protocol like OpenFlow. One can virtualize the network, pool the resources, and create virtual wires on demand, based on the application’s needs. This yields flexibility, agility, and lower capital and operating expenses for deploying and managing applications. (Those using IaaS don’t have a network to manage, but rather configurations to adjust. However, once they move to a hybrid cloud model, a new challenge emerges: managing load balancing and deciding how traffic is split between private and public clouds.)
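The central idea of SDN is that software (a “controller”) decides how traffic should flow, and simple switches just follow the rules they are given. The toy model below illustrates that split. It is only loosely inspired by OpenFlow-style match/action rules; the class names, field names, and action strings are invented for this example and are not the OpenFlow protocol or any vendor’s API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FlowRule:
    priority: int           # higher-priority rules are checked first
    match: Dict[str, str]   # header field -> required value; absent fields match anything
    action: str             # e.g. "forward:port2" or "drop" (made-up action names)

@dataclass
class Switch:
    rules: List[FlowRule] = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        """Called by the controller software to program the switch."""
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def handle(self, packet: Dict[str, str]) -> str:
        """Apply the first (highest-priority) rule whose fields all match."""
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # No rule matched: ask the controller what to do with this traffic.
        return "send_to_controller"

switch = Switch()
switch.install(FlowRule(10, {"dst_ip": "10.0.0.5"}, "forward:port2"))
switch.install(FlowRule(20, {"dst_ip": "10.0.0.5", "src_ip": "10.0.0.9"}, "drop"))

print(switch.handle({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.5"}))
print(switch.handle({"src_ip": "10.0.0.9", "dst_ip": "10.0.0.5"}))
```

Creating a “virtual wire” in this picture is just a matter of installing a new rule from software, with no one touching the physical device.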
Intel rules the datacenter: roughly 90% of the world’s workloads run on Intel processors. AMD completely lost the game in the server marketplace to Intel. Recently, there has been increasing interest in using ARM processors in the datacenter. ARM, the company, licenses its CPU architecture to manufacturers. ARM processors are typically extremely low-power and, hence, suitable for mobile devices. The majority of modern mobile devices in the world (smartphones, tablets, etc.) run on ARM. Apple’s A-series CPUs, Qualcomm’s Snapdragon, NVIDIA’s Tegra, etc. are all based on the ARM architecture. [It helps to understand the fundamental difference between Intel’s x86 architecture and ARM. Intel uses a CISC (complex instruction set computer) architecture: complex hardware that supports a rich set of complex operations and executes them very quickly, entirely in hardware. Because the hardware is complex, it consumes more power and dissipates more heat (requiring cooling fans). ARM, on the other hand, uses a RISC (reduced instruction set computer) architecture: simpler hardware that supports basic operations and relies on software to combine them into more complex operations, which is slower than doing them directly in hardware. RISC consumes less power and is therefore suitable for mobile devices.]
Memory is dominated by companies like Samsung and Hitachi. It is not a very interesting space because it is all based on pricing and density. All vendors are trying to make their chips smaller and smaller, and this drives all prices down. It is a race to the bottom.
THE CLOUD LAYER
The cloud layer abstracts away these infrastructure resources and delivers them on-demand.
Gartner defines “Cloud IaaS” as a standardized, highly automated offering, where compute resources, complemented by storage and networking capabilities, are owned by a service provider and offered to the customer on demand. Cloud computing enables the delivery and consumption of computing as a utility. You pay as you go and move from CapEx to OpEx. Cloud services enable enterprises to compete better in changing markets, be more agile, and optimize resource utilization.
Infrastructure as a Service (IaaS)
In IaaS, the infrastructure is available as a service. Current examples of IaaS companies are Amazon (with AWS), Google (Compute Engine), Nebula, and Rackspace, among others. Currently, these players mostly serve small companies but can eventually grow to cover the enterprise market. As a result, the IaaS landscape will likely consolidate to a small handful of big players.
One of the interesting aspects of IaaS is that it doesn’t matter where the infrastructure is coming from. In the public cloud, the customer doesn’t own the hardware; a company like Amazon owns it and provides cloud-based services. In a private cloud environment, companies like Nebula or VMware provide solutions. In a hybrid cloud, a company owns some hardware but outsources some as well. This hybrid state is where most innovation is happening today: most enterprises run on their private cloud most days but, during busy times, can burst into the public cloud to add capacity when needed.
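The “bursting” behavior described above can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: demand and capacity are measured in an abstract unit (say, servers needed), and the function name and numbers are hypothetical. Real hybrid setups also have to deal with data placement, latency, and cost.

```python
from typing import List, Tuple

def split_load(demand: int, private_capacity: int) -> Tuple[int, int]:
    """Return (units served privately, units burst to the public cloud)."""
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    private = min(demand, private_capacity)  # fill owned hardware first
    public = demand - private                # overflow goes to the public cloud
    return private, public

# Hypothetical demand over four time periods, with 100 units of owned capacity.
demand_by_period: List[int] = [40, 60, 120, 90]
plan = [split_load(d, private_capacity=100) for d in demand_by_period]
for d, (private, public) in zip(demand_by_period, plan):
    print(f"demand={d}: private={private}, public={public}")
```

Only the third period exceeds what the company owns, so only then does it rent extra capacity. The rest of the time, the hardware it already paid for does all the work.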
The growth of hybrid cloud models is potentially threatening to AWS. As servers shift from public to private, companies will adapt to a hybrid model. This carries initial investment costs for hardware, storage, and network setup (capital expenditures), plus operating expenditures to manage them, and that assumes things are architected correctly and traffic is stable.
The “cloud” is about “how” one does compute plus storage (an operational model) rather than “where” it’s done — it’s not location-specific. Infrastructure can be anywhere, but the key is how it is managed. The cloud is getting less and less about technology, and more about process, policies, and orchestration. This trend provides opportunities for hybrid packaged solutions (like Nebula) and management policies for sensitive and/or regulated data (e.g. financial, health, security, etc.).
Platform as a Service (PaaS)
Companies such as Heroku, AppFog, Nitrous.IO, and others offer platform-as-a-service, or PaaS. This market will likely have many small players, and it will be hard for them not to run into Amazon as it expands AWS offerings up from the infrastructure layer. The value of PaaS solutions is that they take away the pain of setting up and managing production environments: setting up software environments on top of infrastructure, finding the right plugins, managing security patches, and so forth.
A classic PaaS example is Heroku, a service that essentially takes care of all the busy work of setup and maintenance and frees up developers to write and deploy code. Developers can push using Git to make their apps live on Heroku, and this agility means teams don’t need systems and/or database experts. Heroku runs on AWS, so customers enjoy the goodness of IaaS already baked in and the developer never has to touch it. The problem is that if and when Amazon experiences an outage, startups that are addicted to these setups are stuck, which perhaps rationalizes the need for Nebula-like client-side solutions once a company reaches a certain scale. [Examples: Dropbox, Netflix, Ngmoco. Zynga is doing a “hybrid cloud model” with ZCloud, started in early 2011 and considered an innovative operating model.]
Software as a Service (SaaS)
Everyone knows SaaS. Briefly, we’ll define it as the pure software and application layer, the place where real scale and innovation will occur. An example of SaaS could be software like Asana, which may (as it grows) provide tiers of service that would empower it to charge for the right to use it. Here’s a brief matrix of some of these companies, to share examples:
BIG DATA
“Big Data” is an overused, often misused term. We define “big data” as data that cannot be processed using traditional analytical techniques and instead requires parallel algorithms designed specifically to operate on data that is usually stored in a distributed fashion. The definition of what constitutes “big data” will change as computing power increases and the price of storage falls. Today it is measured in terabytes and petabytes, but in the future terabytes might not be considered big data. The reason this is such an exciting space is that the markets for all the industries it can affect are huge. The applications of big data are enormous, including but not limited to analytics, visualization, business intelligence, reporting, recommendation systems, information discovery, etc. Most people think of consumer data, but consider the life sciences, oil and gas discovery, and so forth. For example, a Boeing 787 reportedly generates several terabytes of telemetry data on a typical transatlantic flight.
The rise of big data can be attributed mainly to two factors: (a) denser and cheaper storage devices, and (b) the emergence of open-source big data processing frameworks (Hadoop et al.).
MapReduce, introduced by Google, is one of the most foundational programming models used to process large datasets. Yahoo embraced the MapReduce paradigm and became the main early backer of Hadoop, the open-source framework for running parallel algorithms on large datasets. The rapid adoption of and contribution to Hadoop by companies like Facebook, Twitter, Amazon, etc. played a huge role in making big data processing popular.
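For readers who have never seen the MapReduce model, the classic illustration is counting words. The sketch below runs on one machine in plain Python and uses no Hadoop APIs; the function names are our own. It only shows the shape of the idea: a “map” step that emits key/value pairs, a “shuffle” step that groups them by key, and a “reduce” step that combines each group. In a real cluster, each step is spread across many servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)

def word_count(documents: List[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

counts = word_count(["big data is big", "data is distributed"])
print(counts)
```

The map and reduce functions never need to see the whole dataset at once, which is what lets a framework like Hadoop hand different documents and different words to different machines.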
Although Hadoop remains massively popular at consumer internet companies, adoption in the enterprise has been slower due to concerns like complexity, security, support, and the lack of talent to operate Hadoop infrastructure. A whole new category of startups has emerged to make it easier for enterprises to adopt Hadoop. For example, Cloudera and Hortonworks are doing something very similar to what Red Hat did for Linux. These companies offer “enterprise-ready” distributions of Hadoop software, help enterprises deploy them in their cloud, and then provide ongoing customer support. With enterprise distributions of Hadoop, enterprises are able to make the transition much faster. However, they still have to own and operate the infrastructure to run Hadoop. Hadoop-as-a-Service startups aim to solve this problem by offering big data processing services on demand. Amazon has been a pioneer in this space with its Elastic MapReduce offering, a web service that enables businesses to easily and cost-effectively process vast amounts of data. It uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3. Cetas (acquired by VMware) provides advanced, real-time Hadoop analytics and can be deployed on-premise or in the cloud.
VIRTUALIZATION
In layman’s terms, virtualization is a way of creating abstract virtual resources from real physical resources. IBM technically invented virtualization decades ago; however, VMware is usually given the credit for creating an industry around it. The enabling technology behind virtualization is a smart, mini-kernel-like piece of software commonly referred to as the “hypervisor.” The hypervisor runs directly on top of bare-metal hardware and provides abstractions like virtual CPUs, virtual memory, virtual disks, virtual networks, etc. to the upper layers. The hypervisor enables a single server in your datacenter to run multiple virtual machines (VMs), each in its own container with virtual resources like CPU, memory, and storage. Each of these VMs can run a full-fledged operating system (OS) like Linux or Windows. The OS is usually unaware of the fact that it’s running on a virtual computer and not a real one.
In a typical datacenter, with tens to thousands of servers, resources are often not fully utilized. With virtualization, a single server could be hosting tens or even hundreds of VMs, which dramatically improves utilization and thereby has a huge impact on CapEx. When there is resource contention on a server due to high demand for resources from the VMs, the hypervisor acts as an intermediary and allocates resources to each VM according to its fair share. Virtualization is the key enabler for Infrastructure-as-a-Service products. Key features like elasticity, auto-scaling, multi-tenancy, and efficient resource utilization are impossible to deliver without virtualization.
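What “fair share” means under contention can be illustrated with a small allocator. The sketch below is one common textbook approach (weighted max-min fairness by repeated filling), chosen by us for illustration; it is not a description of how VMware’s or any other hypervisor actually schedules resources. The function name, VM names, and numbers are hypothetical.

```python
from typing import Dict

def fair_share(capacity: float, demands: Dict[str, float],
               shares: Dict[str, float]) -> Dict[str, float]:
    """Divide `capacity` among VMs in proportion to their shares,
    never giving a VM more than it asks for."""
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm, demand in demands.items() if demand > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_shares = sum(shares[vm] for vm in active)
        satisfied = set()
        handed_out = 0.0
        for vm in active:
            offer = remaining * shares[vm] / total_shares  # proportional slice
            grant = min(offer, demands[vm] - alloc[vm])    # cap at what the VM needs
            alloc[vm] += grant
            handed_out += grant
            if alloc[vm] >= demands[vm] - 1e-9:
                satisfied.add(vm)
        remaining -= handed_out
        if not satisfied:
            break  # every active VM took its full slice; nothing is left over
        active -= satisfied  # redistribute the leftover among the hungry VMs
    return alloc

# Three VMs with equal shares compete for 10 units of CPU.
allocation = fair_share(10.0, {"vm-a": 2, "vm-b": 8, "vm-c": 8},
                        {"vm-a": 1, "vm-b": 1, "vm-c": 1})
print({vm: round(amount, 2) for vm, amount in allocation.items()})
```

The lightly loaded VM gets everything it asks for, and the two busy VMs split what is left evenly. Giving one VM a larger share value would tilt that split in its favor.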
[Reference: see Tintri CEO Kieran Harty with Semil on “In The Studio.”]
Recommended blogs on enterprise IT: