Thursday, September 12, 2013

Engineering Management - Shaolin Style


A friend of mine just got a well-deserved promotion from code horse to manager. Here are my quick thoughts on making that transition.

The basic idea is that when you are given a little more responsibility, your words and actions carry more weight. For that reason, it is important to be careful about throwing that weight around.

Your job is no longer to optimize your output, but to optimize the output of your group. Don't be the genius with a thousand helpers!

In particular, here is some advice to ease into a new engineering manager role:

  • Listen more. There is an expression about argumentative people - "they don't listen, they just reload." Since your words carry more weight, make sure you really understand other people's point of view before you offer your own. Once you wade in with guns blazing, other engineers will be less likely to confront you.
  • Code less. The tradeoff for more human communication is less computer communication. The time you spend helping make other people effective comes directly out of your average daily KLOC. Remember, you are making the team's total output better at the expense of your own output - this will smart a bit at first!
  • Start team building.
  • Stop architecting. If your vote counts for more than other engineers' votes by dint of your hierarchical position, you can win architecture arguments just by yelling louder. To build a real engineering team, you have to separate the team leadership position from the tech leadership position. If you are the team leader, you just can't be the tech leader as well.

The net of it all is to use more influence, less telling; more carrot, less stick; you get the picture!

Monday, May 20, 2013

Health Care Transparency Requires Open Data


Transparent pricing and quality data is the foundation of the US economy, yet is entirely lacking in our Health Care industry. New players like Castlight have raised over $130 million to provide greater transparency, but only to selected customers who pay for that data.


I believe making health care pricing information freely available (like Wikipedia for health care data) will help reduce these inequities in our health care system. 

Last week's release of Medicare provider charge data from hospitals across the US pointed the way forward - making pricing data publicly available to everyone. Because the government pays in a unique way, this data is only a starting point - what is needed is a public data set showing what employers and individuals pay for these same services.

Several years ago, I had a personal experience that ignited a passion to drive change in US healthcare. While our family was living in Paris, my son was diagnosed with a benign brain tumor. We went through a series of medical procedures in France and then repeated them on our return to San Francisco.

Because our insurance only covered major medical procedures, we had to pay these bills personally. We found that medical costs in the US averaged seven to ten times higher than what we had paid in Paris.

A good first step would be to analyze claims data from 3-5 large US employers to create a dataset showing the prices employers paid for the most common procedures across providers (including the top 100 most frequently billed discharges published by Medicare). This analysis would help employers verify the health care prices they are paying.

Making this information available on a publicly available web site could unlock a wave of innovation in the world of health care, much as open source communities have transformed the software world.


Monday, March 18, 2013

Hadoop Will Not Mow Your Lawn


"The best minds of my generation are thinking about how to make people click ads." Jeff Hammerbacher, ex-Facebook Architect

It turns out that when you have a lot of "best minds" working on the same problem, you come up with some pretty interesting technology - no matter how inane that problem may be.

The technology that those "best minds" at Yahoo came up with to target ads to users is called Hadoop. 

Hadoop is a powerful technology and, like most new IT solutions, is being touted as being able to solve a vast number of technical ills. When companies discover that Hadoop will not in fact cure male pattern balding, they will fall into the inevitable trough of disillusionment.

Here are some thoughts about what Hadoop can and cannot do:

1. RDBMSs are for business data, Hadoop is for web data

Almost all traditional business data fits well into the relational model, including data about customers (CRM), products (ERP) and employees (HR). This data should continue to live in relational databases, where it is much easier to manage and access than in Hadoop.

Almost all web data fits well into the Hadoop model, including log files, email and social media. This data would be almost impossible to store in a relational database, not just because of the volume, but because of the inherently nested quality of the data (threaded email conversations, web site directory structures, social media graphs).
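To make the "nested" point a bit more concrete, here is a small sketch in plain Python (not Hadoop code). The thread contents and field names are invented for illustration. It keeps a threaded email conversation as one nested document, then flattens it into the parent-pointer rows a relational table would want.

```python
# Hypothetical threaded email conversation, stored as one nested document.
thread = {
    "id": 1, "subject": "Launch plan",
    "replies": [
        {"id": 2, "subject": "Re: Launch plan", "replies": [
            {"id": 4, "subject": "Re: Re: Launch plan", "replies": []},
        ]},
        {"id": 3, "subject": "Re: Launch plan", "replies": []},
    ],
}

def flatten(message, parent_id=None):
    """Turn the nested thread into flat (id, parent_id, subject) rows."""
    rows = [(message["id"], parent_id, message["subject"])]
    for reply in message["replies"]:
        # Each reply remembers its parent only through parent_id.
        rows.extend(flatten(reply, message["id"]))
    return rows

rows = flatten(thread)
for row in rows:
    print(row)
```

The flat rows hold the same information, but the shape of the conversation is gone: getting a whole thread back means following parent pointers row by row, which is the kind of awkwardness I have in mind.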

2. Hadoop is really good at analyzing web data

Hadoop is incredibly good at looking at huge amounts of web data and figuring out why people clicked on the blue button instead of the red one. This can be generalized to a few other computer log formats, but the list is relatively small.

How many other data types look like click streams? Not very many. How many other real world problems lend themselves to analysis using web data analytic techniques? Also not as many as you might think.

This is not to take anything from the Hadoop market opportunity - as more of the world interacts with each other via web applications and devices, more of the world's data will be reducible to click-stream-like formats. 

The big data craze has taken over the tech media world much like the cloud craze. Most people know it is important but they don't know why. Many vendors get caught up in the hype cycle and start to believe that their technology has some sort of manifest destiny that will allow it to do much more than it can reasonably be expected to do.

3. Hadoop is a Pay Me Later Technology

Traditional data warehouses work on a "pay me now" basis. To get data into the data warehouse - even data that may not end up being useful in any way - you have to massage the data into a formal relational model. This is expensive and the data normalization process itself may make it impossible to get at the data in exactly the way you want to.

In contrast, Hadoop works on a "pay me later" basis. Data can be shoved into the Hadoop file system any old way. It is not until someone wants to analyze the data that you have to worry about how to connect all the pieces. The gotcha is that the price you pay in this "pay me later" model is much higher, requiring extensive programming in order to ask each question. 

In addition, because the normalization process wasn't done up front, it won't be until later that you may discover that you were missing crucial pieces of information all along. Thus it does bear some thinking up front on what sort of data to store in your Hadoop database and what kinds of questions you might want to be able to answer about that data in the future.  
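Here is a rough sketch of what "pay me later" looks like in practice. It is plain Python rather than real Hadoop code, and the log format, field names and sample lines are all made up for the example. The raw lines are stored as they arrived, and the structure is only imposed when someone asks a question.

```python
from collections import Counter

# Hypothetical raw click log lines, stored exactly as they arrived.
RAW_LOG = [
    "2013-03-18T10:00:01 user=17 button=blue",
    "2013-03-18T10:00:02 user=42 button=red",
    "2013-03-18T10:00:05 user=17 button=blue",
    "2013-03-18T10:00:09 user=99",  # this record never captured the button
]

def parse_line(line):
    """Apply a schema at read time: a timestamp plus key=value fields."""
    timestamp, *pairs = line.split()
    record = {"timestamp": timestamp}
    for pair in pairs:
        key, _, value = pair.partition("=")
        record[key] = value
    return record

def clicks_per_button(lines):
    """Answer one question; a different question needs new code."""
    counts = Counter()
    missing = 0
    for line in lines:
        record = parse_line(line)
        if "button" in record:
            counts[record["button"]] += 1
        else:
            missing += 1  # the gap only shows up at query time
    return counts, missing

counts, missing = clicks_per_button(RAW_LOG)
print(dict(counts), "records missing a button:", missing)
```

Loading cost nothing, but every new question needs its own parsing and counting code, and the record without a button field is only discovered once the query runs.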

Realistically, it will take most businesses that implement Hadoop several years to figure out whether all the data they are dumping into it produces real value out the back end, just as it was several years before companies started to get a payout from their investments in relational data warehouses.

4. Use the right tool for the right job

Back in my - very brief - high school shop days, we learned that the trick to making a really nice looking ash tray is picking the right tool for the right job.
  • Hadoop is a web data query engine that requires a high level of effort for each new query.
  • Relational is a business data query engine that requires a high level of effort to format and load data into the datastore.
The fastest way for companies to get into trouble with Hadoop is to try to use it as a one-size-fits-all data warehouse. Much of the news in the Hadoop world today has to do with SQL parsers that run on top of Hadoop data. This is a powerful and valuable technology, but it does not mean that you can throw out your data warehouse and replace it with Hadoop just yet.



Tuesday, February 05, 2013

What I'm Talking About When I'm Talking About PaaS


I recently got some feedback on my previous musing that from the customer viewpoint, PaaS equals automation. That led me to think of ways to articulate better what this means both to customers and vendors.

Customers are basically indifferent to PaaS. This can be seen in the very modest market for PaaS as opposed to all the other aaS-es. Where is the PaaS that is producing anywhere near the value of SalesForce's $2.3B in SaaS revenues or Amazon's ~$1B in IaaS revenues?

Customers are indicating - in the only way that matters - that the value they perceive from PaaS is orders of magnitude lower than the value of other cloud offerings.

Are customers right to be so indifferent about PaaS? In a word, yes.

Vendors have not done a good job of explaining the value of PaaS beyond singing paeans to the productivity that comes from being able to deploy a complete application without having to configure the platform services for that application.

The NIST definition of PaaS defines it as "the capability to deploy applications onto the cloud without requiring the consumer to manage the underlying cloud infrastructure." (note: paraphrasing here as the NIST folks don't seem to write in English)

Here's the problem with that definition: it mirrors exactly how 99% of Enterprise developers already work! In the enterprise, the functional equivalent of PaaS is IT. Once an enterprise developer is done with their app, they throw it over the wall to dev ops/app ops folks who magically push it through the production cycle.

For most developers, the value proposition articulated by PaaS vendors just doesn't seem all that different from what they can get from internal IT or external IaaS.


  • IaaS allows me to rent a data center with a credit card and zero delay versus going through a six month IT acquisition cycle - eureka!
  • SaaS allows me to deploy whole new business capabilities without a two-year funding and development cycle - hallelujah!
  • PaaS has a lot more to offer than just productivity, but so far, that is all customers understand about it - so they let out a collective yawn.


Until PaaS vendors find ways to connect their platform to solving critical IT and business problems, PaaS will remain an under-performing member of the cloud family.