Tuesday, July 05, 2016

Orchestration Sets The Beat For Agile IT


Agile IT has been widely heralded (and equally widely decried) as a way to align the pace of change in IT with the pace of change in the broader business.

At its core, Agile IT makes a very basic point: if in-house IT cannot keep pace with the business, there is a rapidly growing number of cloud and SaaS providers for whom that is not a problem.

In short, IT must innovate around time to market or die a death of a thousand credit card cuts as individual developers outsource the IT they need to Amazon and other public cloud providers.

But what is the core activity that drives the shift to agility? Most often this shift is characterized as a vast migration to cloud. This is true, but it misses the key driver of IT agility: orchestration.

How Orchestration Drives The Cloud
IT automation deals with performing a particular task, such as setting up a single compute node. IT orchestration manages the execution of multiple, interdependent tasks. For example, an orchestration workflow manages dependencies such as the need to install a database before installing an app.
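
The dependency handling described above can be sketched as a topological sort. This is a minimal illustration, not any particular orchestration tool's behavior: the task names are hypothetical, and it assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks
# that must finish before it can start.
DEPENDENCIES = {
    "provision_node": set(),
    "install_database": {"provision_node"},
    "install_app": {"install_database"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order(DEPENDENCIES):
        print("run", task)
```

Single-node automation corresponds to any one of these tasks; orchestration is the part that decides the order and refuses to run if the dependencies form a cycle (graphlib raises CycleError in that case).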

A cloud without effective orchestration is more like a demo - if it works at all, it is likely to break the first time the need arises to update the apps, data or infrastructure supporting that cloud. Key areas of focus for cloud orchestration include:
  • Reference architecture - any orchestration solution must start with a clear understanding of how the pieces fit together, particularly to guarantee reliability at a specific scale and set of workloads. 
  • OS patching - far from being a settled capability, OS patching remains more of a dark art than a science. While patching itself is straightforward, for every 100 servers patched, several will not reboot properly, causing cascading faults across the cloud. Next-gen computing companies like CoreOS are offering some innovative approaches to solving the OS patching problem.
  • Infrastructure lifecycle management - OpenStack upgrades are notoriously challenging and are one of the reasons companies like Mirantis have achieved such success in helping customers build and manage large OpenStack installations.
  • Application lifecycle management - a cloud is only as valuable as the applications running on it. Orchestration and DevOps are needed at the application, infrastructure and OS level.
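
One common way to limit the blast radius of failed reboots during patching is to patch in small batches and stop when too many nodes fail. The sketch below is only an illustration of that idea under stated assumptions: patch_and_reboot is a hypothetical callback that returns True when a server comes back healthy, and the batch size and failure threshold are arbitrary.

```python
def patch_in_batches(servers, patch_and_reboot, batch_size=10, max_failures=0):
    """Patch servers batch by batch; stop once failures exceed max_failures.

    patch_and_reboot is a hypothetical callable: it takes a server name and
    returns True if the server rebooted cleanly, False otherwise.
    Returns (patched, failed, remaining).
    """
    patched, failed = [], []
    for start in range(0, len(servers), batch_size):
        for server in servers[start:start + batch_size]:
            if patch_and_reboot(server):
                patched.append(server)
            else:
                failed.append(server)
        # Check the failure budget only after the whole batch has finished.
        if len(failed) > max_failures:
            break
    # Everything after the last attempted server is still unpatched.
    remaining = servers[len(patched) + len(failed):]
    return patched, failed, remaining


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(30)]
    # Simulate one node that never comes back after its reboot.
    ok, bad, todo = patch_in_batches(nodes, lambda s: s != "node12")
    print(len(ok), "patched;", bad, "failed;", len(todo), "not attempted")
```

Stopping after the failing batch leaves the untouched servers running the old, known-good image while the failure is investigated.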
How to Get To Agile IT
Here are a few hints to simplify the task of building an orchestration-first cloud:
  1. Usage drives design - Understand intended usage (workloads and scale) before you design the architecture. There is no such thing as a “one size fits all” cloud - the cloud architecture is dictated by its intended use.
  2. Don’t skimp on designing and testing the reference architecture - understand what happens to network, storage and compute at scale running realistic workloads. Work through failure scenarios (think chaos monkey) and ensure that HA and DR work under real-world conditions.
  3. Don’t just automate, orchestrate - while small, pilot clouds can be managed with manual processes and single-node automation, large, production-quality clouds require a significant investment in multi-node orchestration and change management.
  4. Address organizational and business process disruptions up front - understand the impact of cloud on individual IT roles/responsibilities, career paths and opportunities for advancement.



Monday, June 20, 2016

How Not To Build A Cloud

Thomas Bittman of Gartner wrote a blog post last year about why OpenStack projects fail. In that article, he outlined three factors which together cause 60% of OpenStack projects to fall short of expectations:
  • Wrong people (31% of failures): a successful cloud needs commitment from both the operations team and “anchor” tenants.
  • Wrong processes (19% of failures): a successful cloud automates across silos in the software development lifecycle, not just within silos.
  • Wrong metrics (10% of failures): a successful cloud focuses on top line transformation by accelerating delivery of innovative applications and services, not merely on squeezing bottom line costs. 


Wrong people
“Agile clouds need agile processes - and people are your biggest supporters,
or your biggest roadblocks.” - Thomas Bittman
Many OpenStack projects start as technology pilots with part-time technical staff. If there is not a single champion responsible for the success of an OpenStack cloud initiative as their full-time job, the chance of failure is high. There are two critical roles that govern cloud success:
  • Cloud operations champion: this champion is not just responsible for building and operating the cloud (supplying cloud capacity); they are equally responsible for on-boarding developers and workloads onto the cloud (building cloud demand). Their job is to work closely with developer tenants to make sure that the developer on-boarding process is smooth and that key developer tools are available in the cloud application catalog.
  • Cloud anchor tenant: developers are overwhelmingly the most important early adopters of private cloud. Accelerating the software development lifecycle through DevOps automation is by far the highest value of private cloud. Therefore the most important validation for a private cloud is to on-board a key set of developers and show the impact of accelerating the development and go live process for their applications. Having an anchor tenant committed to using the cloud is a key prerequisite for achieving success.

Wrong processes
“Is this really cloud? Or just virtualization? And what about
the stuff running inside the VMs?” - Thomas Bittman
Many OpenStack projects start with very limited goals around provisioning generic VMs or delivering relatively limited development services. This effectively automates just a silo within the software development lifecycle. Business value comes from being able to automate not just within but also across the silos of the software development lifecycle.
  • Beware of automating silos: for many IT organizations, the tragedy of virtualization has been that developers can provision a VM within 20 minutes, but getting a fully configured development environment takes over 6 weeks.
  • Aim to automate the entire go live process: the ultimate goal of a private cloud should be to accelerate the delivery of applications and features by automating the entire process from code check-in to go live. This level of automation is also the only way a traditional enterprise can compete with “born in the cloud” SaaS businesses.

Wrong metrics
“Not putting the right metrics in place - usually, this is focusing
on cost-savings, not agility.” - Thomas Bittman

Private cloud has often been sold as a natural extension of virtualization - as such, customers often justified their OpenStack investments based on IT cost savings. While cost savings are one value of a successful cloud, enabling business agility is the core value delivered by OpenStack.

OpenStack projects should measure business value not just for the cloud overall but for each tenant. In particular, they should focus on two tenant metrics:
  • Uptime dashboard: public clouds have long delivered detailed uptime metrics. Private clouds must do the same if they are to build trust with tenants and create a business case to justify additional cloud investments.
  • Value dashboard: private cloud value is primarily driven by its ability to accelerate the software development lifecycle. McKinsey has documented that DevOps automation can accelerate the go live process by 80%, which in turn can deliver top line revenue growth, for example by enabling greater innovation in customer facing apps. Tracking continuous integration deployments is a proxy for the overall acceleration enabled by private cloud.
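
The two tenant metrics above can be computed from very simple inputs. The sketch below is a rough illustration only: it assumes uptime is measured as the share of minutes without an outage, and it assumes acceleration is approximated by the ratio of deployments per period now versus a pre-cloud baseline. Both definitions, and the function names, are assumptions made for this example.

```python
def uptime_percent(total_minutes, downtime_minutes):
    """Share of the measurement window in which the cloud was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - downtime_minutes) / total_minutes * 100.0


def acceleration_factor(baseline_deploys, current_deploys):
    """Deployments per period now, relative to the pre-cloud baseline.

    Assumption: deployment count is used as a proxy for how much faster
    the tenant ships, as suggested in the text.
    """
    if baseline_deploys <= 0:
        raise ValueError("baseline_deploys must be positive")
    return current_deploys / baseline_deploys


if __name__ == "__main__":
    # Hypothetical tenant: a 30-day month with 40 minutes of outage,
    # and 12 deployments this month versus 6 before on-boarding.
    month_minutes = 30 * 24 * 60
    print(round(uptime_percent(month_minutes, 40), 3), "% uptime")
    print(acceleration_factor(6, 12), "x acceleration")
```

Reporting these per tenant, rather than only for the cloud as a whole, keeps the numbers tied to the teams whose delivery speed the cloud is meant to improve.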

Planning for OpenStack Success
The antidote for OpenStack project failure is to build a business case for private cloud that addresses people, process and metric issues. This business case should lay out a phased approach for rolling out the private cloud.

The starting point is identifying a full-time cloud champion and teaming them with an anchor tenant who will use the cloud and provide input on how to deliver value by accelerating delivery of new applications and features. The next step is to define a phased set of investments, each with clear success metrics that govern timing for subsequent investment:
  • Phase 1: stand up cloud and on-board anchor tenant. Success metric: 99% uptime, 1.5X software development acceleration. Once these metrics are achieved, the company should invest in phase 2 of their rollout.
  • Phase 2: on-board additional tenants. Success metric: 99.9% uptime, 2.0X software development acceleration.
  • Phase 3: automate go live process from code check-in to production. Success metric: 99.99% uptime, 4.0X software development acceleration.
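
To make the phase gates concrete, the sketch below encodes the three success metrics from the list above and converts each uptime target into an allowed downtime per year. The thresholds come from the phases as written; the class and function names, and the choice of a 365-day year, are assumptions made for this illustration.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


@dataclass(frozen=True)
class PhaseGate:
    name: str
    min_uptime_pct: float
    min_acceleration: float


# Thresholds taken from the three phases described in the text.
GATES = [
    PhaseGate("Phase 1", 99.0, 1.5),
    PhaseGate("Phase 2", 99.9, 2.0),
    PhaseGate("Phase 3", 99.99, 4.0),
]


def downtime_budget_hours(uptime_pct):
    """Hours of downtime per year that an uptime target allows."""
    return (100.0 - uptime_pct) / 100.0 * HOURS_PER_YEAR


def highest_phase_met(uptime_pct, acceleration):
    """Return the name of the last consecutive gate passed, or None."""
    met = None
    for gate in GATES:
        if uptime_pct >= gate.min_uptime_pct and acceleration >= gate.min_acceleration:
            met = gate.name
        else:
            break  # later phases build on earlier ones
    return met


if __name__ == "__main__":
    for gate in GATES:
        hours = downtime_budget_hours(gate.min_uptime_pct)
        print(f"{gate.name}: about {hours:.2f} hours of downtime per year")
    print(highest_phase_met(99.95, 2.5))
```

Each extra "nine" cuts the downtime budget by a factor of ten, from roughly 87.6 hours per year at 99% to under one hour at 99.99%, which is why the later phases depend on the orchestration investments made earlier.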


An ideal approach for a company looking to make a strategic investment in private cloud is to conduct a short pilot in an OpenStack lab that allows them to validate the business case. This kind of pilot can also allow the cloud champion and “anchor” tenant to work together on clarifying requirements for successfully on-boarding an initial application to the private cloud.