What Makes Your Data Warehouse a “Big Data Warehouse”?

I’ve been watching with great interest how the classic database and data warehouse products have been marketed over the past two years. Now that Big Data is top of mind for most CIOs in corporations around the globe, traditional data vendors like IBM, Oracle, Teradata and Microsoft are referring to their platforms as “Big Data” or “Big Data Warehouses”.

In the final analysis, I suppose this is really an attempt by the data vendors to shift perceptions and steer CIO thinking about Big Data away from Apache Hadoop, Cloudera and Hortonworks and toward their own platforms. Certainly, some of the changes taking place on those traditional data warehouse platforms (MPP, in-memory, columnstore) are important for workloads that are classic “Big Data” use cases: clickstream analysis, big data analytics, log analytics, risk modeling and so on. And most of those vendors will even tack on a version of Hadoop with their databases!

But this is not necessarily breaking new ground or an inflection point in terms of technology. Teradata pioneered MPP decades ago; Oracle led the way with smart caching and proved (once again) that the infamous bottleneck in databases is I/O. Columnar databases like Vertica proved their worth in this space, which led Microsoft and Oracle to adopt those technologies, while Aster Data led with MapReduce-style distributed UDFs and analytics, and Teradata simply bought the company outright.

In other words, the titans of the data market finally felt enough pressure from their core target audiences, as Hadoop came out of the shadows of Silicon Valley to threaten their data warehouse market share, that you will now hear these sorts of slogans from the traditional data warehouse vendors:

Oracle: http://www.oracle.com/us/technologies/big-data/index.html. Oracle lists different products for dealing with different “Big Data” problems: acquire, organize and analyze. The product page lists the Oracle Big Data Appliance, Exadata and Advanced Analytics as just a few products for those traditional data warehouse problems. Yikes.

Teradata: In the world of traditional DWs, Teradata is the Godfather and pioneered many of the concepts that we are talking about today for Big Data Analytics and Big Data DWs. But Aster Data is still a separate technology and technology group under Teradata and sometimes they step on their own messaging by forcing their EDW database products into the same “Big Data” space as Aster Data: http://www.prnewswire.com/news-releases/latest-teradata-database-release-supports-big-data-and-the-convergence-of-advanced-analytics-105674593.html.

But the fact remains that “Hadoop” is still seen as synonymous with “Big Data”, even though the traditional DW platforms had been used in many of those same scenarios for decades. Hadoop has been seen as an alternative means to provide Big Data Analytics at a lower cost per scale. Just adding Hadoop to an Oracle Exadata installation, for example, doesn’t solve that problem for customers outside of the original NoSQL and Hadoop community: Yahoo, Google, Amazon, etc.

So what are your criteria for a data warehouse to qualify as a “Big Data Warehouse”? Here are a few that I use:

  1. MPP scale-out nodes
  2. Column-oriented compression and data stores
  3. Distributed programming framework (e.g. MapReduce; see the sketch after this list)
  4. In-memory options
  5. Built-in analytics
  6. Parallel and fast-load data loading options
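
To make the third criterion concrete, here is a minimal sketch of the MapReduce programming model in plain Python. It is a hypothetical word-count example of my own, not tied to Hadoop or any vendor’s implementation; in a real Big Data Warehouse the map and reduce phases would run in parallel across the scale-out nodes rather than in a single process.

```python
from collections import defaultdict

# Map phase: emit (key, value) pairs from each input record.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group values by key (handled by the framework on a real cluster).
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the values for each key.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    sample = ["big data warehouse", "big data analytics"]
    counts = reduce_phase(shuffle(map_phase(sample)))
    print(counts)  # {'big': 2, 'data': 2, 'warehouse': 1, 'analytics': 1}
```

On a platform like Hadoop or Aster Data, the same map and reduce logic is expressed as distributed jobs or UDFs and executed against data spread across the MPP nodes, which is what makes the framework a “Big Data Warehouse” criterion rather than just a coding style.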

To me, the “pure-play” Big Data Analytics “warehouses” are Vertica (HP), Greenplum (EMC) and Aster (Teradata). But the next generation of platforms, with improved distributed access and programming beyond today’s MapReduce and Hive, will be Microsoft with PDW & Polybase, Teradata’s appliance with Aster & SQL-H, and Cloudera’s Impala if you prefer Open Source Software.

Continuation of System Center with SQL Server Series – With a SQL Server Appliance

OK, I know, I am waaaay behind on the next parts of my series on System Center for SQL Server. I promise to continue with SCCM and VMM in the coming weeks. Promise!

That being said, I want to put up a tiny post on this topic today. What I am describing in this series is a data center environment in which you leverage Microsoft’s best-in-class tools to run a smooth operation: monitoring, managing and tuning your SQL Server footprint with minimal wasted time and effort, while maximizing your staff’s time and your hardware investments.

Since the current trend in IT data centers is toward virtualization to achieve that server hardware optimization, Microsoft has worked with HP to put out an appliance built just for SQL Server database consolidation: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/Appliances/HP-dca.aspx.

This is a hardware rack solution optimized for virtualizing your SQL Server footprint on a Windows Server Hyper-V environment. It comes with SQL Server and the full System Center suite installed and licensed (included in the purchase price), the same stack that I have been explaining to you in this System Center for the SQL Server DBA series.

A pre-built, optimized HP rack with SQL Server & System Center may not be the answer for everyone. But if you are reading this series because you would like to consider SQL Server consolidation on Hyper-V, managing your complex SQL Server environment with Virtual Machine Manager, monitoring, configuration management and so on, the Database Consolidation Appliance is a good option to consider. The entire “private cloud” or “optimized data center” environment with SQL Server & System Center that I am describing in this series comes already installed, configured and rack-mounted for you!

Best, Mark