What Makes your Data Warehouse a “Big Data Warehouse”?

I’ve been closely observing the evolution of marketing of the classic database and data warehouse products over the past 2 years with great interest. Now that Big Data is top-of-mind of most CIOs in corporations around the globe, traditional data vendors like IBM, Oracle, Teradata and Microsoft are referring to their platforms as “Big Data” or “Big Data Warehouses”.

I guess, in the final analysis, this is really an attempt by data vendors at shifting perceptions and melding CIO thinking about Big Data away from Apache Hadoop, Cloudera and Hortonworks and toward their own platforms. Certainly, there are some changes taking place to those traditional data warehouse platforms (MPP, in-memory, columnstore) that are important for workloads that are classic “Big Data” use cases: clickstream analysis, big data analytics, log analytics, risk modeling … And most of those vendors will even tack-on a version of Hadoop with their databases!

But this is not necessarily breaking new ground or an inflection point in terms of technologies. Teradata pioneered MPP decade ago, Oracle led the way with smart caching and proved (once again) the infamous bottleneck in databases is I/O. Columnar databases like Vertica proved their worth in this space and that led to Microsoft and Oracle adopting those technologies, while Aster Data led with MapReduce-style distributed UDFs and analytics, which Teradata just simply bought up in whole.

In other words, the titans in the data market finally felt enough pressure from their core target audiences that Hadoop was coming out of the shadows and Silicon Valley to threaten their data warehouse market share that you will now hear these sorts of slogans from traditional data warehouses:

Oraclehttp://www.oracle.com/us/technologies/big-data/index.html. Oracle lists different products for dealing with different “Big Data” problems: acquire, organize and analyze. The product page lists the Oracle Big Data Appliance, Exadata and Advanced Analytics as just a few products for those traditional data warehouse problems. Yikes.

Teradata: In the world of traditional DWs, Teradata is the Godfather and pioneered many of the concepts that we are talking about today for Big Data Analytics and Big Data DWs. But Aster Data is still a separate technology and technology group under Teradata and sometimes they step on their own messaging by forcing their EDW database products into the same “Big Data” space as Aster Data: http://www.prnewswire.com/news-releases/latest-teradata-database-release-supports-big-data-and-the-convergence-of-advanced-analytics-105674593.html.

But the fact remains that “Hadoop” is still seen as synonymous with “Big Data” and the traditional DW platforms had been used in many of those same scenarios for decades. Hadoop has been seen as an alternative means to provide Big Data Analaytics at a lower cost per scale. Just adding Hadoop to an Oracle Exadata installation, for example, doesn’t solve that problem for customers outside of the original NoSQL and Hadoop community: Yahoo, Google, Amazon, etc.

So what are your criteria for a database data warehouse to qualify as a “Big Data Warehouse”? Here are a few for me that I use:

  1. MPP scale-out nodes
  2. Column-oriented compression and data stores
  3. Distributed programming framework (i.e. MapReduce)
  4. In-memory options
  5. Built-in analytics
  6. Parallel and fast-load data loading options

To me, the “pure-play” Big Data Analytics “warehouses” are: Vertica (HP), Greenplum (EMC) and Aster (Teradata). But the next-generation of platforms that will include improved distributed access & programming, better than today’s MapReduce and Hive, will be Microsoft with PDW & Polybase, Teradata’s appliance with Aster & SQL-H and Cloudera’s Impala, if you like Open Source Software.

Oracle Data in SQL Server, Part 3: Oracle Replication

I’ll finish-up on my series on data migration from Oracle to SQL Server with a few notes on migrating data from Oracle to SQL Server utilizing the SQL Server replication functionality. A SQL Server 2005 & 2008 enterprise edition feature called Oracle Publisher to enable transactional or snapshot replication from Oracle to SQL Server is very useful, particularly to DBAs. Application developers can migrate, integrate and mix & match Oracle and SQL Server data through OLEDB, SSIS, .NET, etc. But DBAs who focus on database management and do not wish to get dirty in the marsh of Visual Studio or BIDS, can use SSMA or T-SQL to configure SQL Server replication with Oracle.

This is very easy to configure, can be maintained through the traditional SQL Server replication monitoring mechanisms already familiar to you and performs very well.

When I set-up my first Oracle Publisher from SQL Server 2008 replication, I used these steps outlined here. You will need to run a script that is a PL/SQL file dropped in your SQL Server instance install directory. I ran it from SQL*PLUS as “system” and it set-up a replication user in Oracle that the Oracle Publisher will use from SQL Server’s replication engine. Also please note from that link above with the steps that you need to follow, that you must have access to the Oracle server or the Oracle client libraries so that you can reach the Oracle database while you are setting up the replication.

On my latest test servers, I have Oracle co-located with SQL Server. But the SQL Server that you set-up as an Oracle replication server will not likely have the Oracle database co-located there like I do. Therefore, use the Oracle TNS and SQL*PLUS utilies to test connectivity before going through this process and getting a headache trying to troubleshoot the configuration process.

Once you’ve configure the publication, you can then monitor and manage the replication as you would with SQL Server replication sources.

Oracle OBIEE with SQL Server

I promised several weeks back to begin blogging more about my experiences with and at Oracle over the years. I’ve been tardy in doing do, but today I’m getting at it!

I want to start with Oracle BI because I worked quite a bit on this product while @ Oracle where we built packaged analytics products using OBIEE. OBIEE is the Enterprise Edition of OBI from Oracle and is now in release 11g, which just hit the market this summer (2010). It was finally updated using new ADF capabilities and other improved features which came from the Oracle acquisition of Siebel.

OBIEE started as a CRM Sales Analytics product from Siebel and Oracle primarily uses it to sell packaged analytics offerings for their ERP applications. But you can use it as a customizable BI tool, which many customers do. It is quite pricey, but if you have invested in it within your business, you can still utilize your SQL Server sources and applications.

First, SQL Server can be a source database. When you are in the OBI Admin Tool, you can use the MS SQL ODBC driver to bring in your SQL Server database sources as is demonstrated in this blog here.

Secondly, the OBIEE metadata, reports and KPIs are exposed via ODBC, meaning that you can including those in your Microsoft SharePoint BI solutions or in your SQL Server BI applications such as is discussed nicely here, laying out some options.

I’m not going to say that the second part above is easy. In fact, I have experience with Oracle customers that sometimes would fall back to the OBIEE add-in for Office to get at OBI reports. But I have used the SQL Server data source approach with OBIEE in the past to build dimensional models in OBIEE and it worked quite well without too much effort.

Microsoft BI with Oracle Applications

Is it necessary to always think in terms of Microsoft v. Oracle from a DBA or developer’s perspective? Or can these 2 technologies co-exist and flourish together when you are faced with such a heterogenous environment?

Of course, they can exist side-by-side and I have experienced many environments like this and have written a few pieces about some of these mixes. An example would be a Microsoft .NET or Microsoft BI developer writing solutions and reports against an Oracle application. I have some examples such as on my Oracle blog here which I authored when I was an Oracle product manager, where I demonstrate using Report Builder with Oracle applications and an Oracle data warehouse.

I am going to expand on using Microsoft BI & reporting from SQL Server to access Oracle applications and Oracle databases, as well as best practices in exchanging data bi-direction between the 2 databases. It is important to understand best practices, limitations and gotchas which I run into from time to time, especially when migrating databases from Oracle to SQL Server. I’ll run through this in about 3 posts as a new series starting when I return home later in the week.

In the meantime, if you are interested in more sharing or Oracle & SQL Server co-existing, check out the SQL Server Oracle Virtual Chapter from PASS, the Professional Association of SQL Server chapter for Oracle.