Convert SSAS Snowflake Schemas to Pentaho Mondrian

Last weekend at a local tech conference, I was speaking to a few of my friends in the SQL Server & Microsoft BI consulting business about migrating cubes and BI solutions from SSAS to Pentaho using the Mondrian ROLAP engine with tools like ssas2mondrian. I explained different methods of starting over and generating new OLAP models with Pentaho's auto modeler and Analyzer here, and how to start migrating simple SSAS cubes to Mondrian ROLAP schemas here.

So I felt it would be a good challenge to see whether I could take a more complex SSAS cube, one that includes Reference Dimensions, and convert those relationships into a snowflake model in Mondrian using Joins in the dimension definitions.

Additionally, if your Microsoft BI solution uses Excel in SharePoint for visualizations (Excel remains the de facto BI front end in many Microsoft shops), you will want to use the Pentaho visualizations to connect to the Mondrian ROLAP engine after migration. I think I've got this most of the way there, so I thought I'd write it up in a blog post here.

The best way to begin the migration process is to use the ssas2mondrian utility that I put here on Box as a compiled EXE, or grab the source code on GITHUB and have at it. I would love to get some community additions to grow this migration capability.

The Adventure Works sample data set includes snowflake dimension designs, but I felt like working with a different data set this time. So I used the Mondrian Foodmart sample data set and built a simple cube in SSAS that includes a dimension-to-dimension join, relating the Product Class dimension table to the Product dimension table to represent a snowflake schema, along with Regular dimension relationships to fill out a simple, minimal cube:

[screenshot: ssas2_5]

After deploying that cube to the SSAS server, Microsoft BI consultants will typically build visualizations for end users in Excel, SharePoint and SSRS that look something like this:

[screenshot: ssas2_1]

This is a simple Excel representation of sales from the Foodmart fact table, sliced by Product Family: a leaf-level dimension attribute coming from the Product Class dimension table, represented in SSAS as a Reference Dimension and in Mondrian as a snowflake join.
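On the Mondrian side, that snowflake relationship is expressed as a Join inside the dimension definition. Here is a rough sketch of where we'll end up (the table and column names follow the standard Foodmart sample; the actual generated schema may differ):

<Dimension name="Product" type="StandardDimension">
  <Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="p">
    <!-- The Join is the Mondrian equivalent of the SSAS Reference Dimension relationship -->
    <Join leftAlias="p" leftKey="product_class_id" rightAlias="pc" rightKey="product_class_id">
      <Table name="product" alias="p"/>
      <Table name="product_class" alias="pc"/>
    </Join>
    <Level name="Product Family" table="pc" column="product_family" uniqueMembers="false"/>
    <Level name="Product Name" table="p" column="product_name" uniqueMembers="false"/>
  </Hierarchy>
</Dimension>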

Now let’s convert that to Pentaho …

First, I'll run the ssas2mondrian utility to build out the base model. It reads the online cube from my SSAS database and generates Mondrian schema XML output, which I'll redirect to a local file on my filesystem from Windows PowerShell (a DOS prompt will work just fine, too):

[screenshot: ssas2_3]
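As a sketch, the invocation looks something like this (the switch names below are illustrative shorthand, not necessarily the tool's exact syntax; run ssas2mondrian /help for the real options):

ssas2mondrian /server:localhost /database:Foodmart /cube:Sales /schema:FoodmartSales > foodmart-schema.xml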

Ok, let’s open this up and see what we get in PSW (Pentaho Schema Workbench):

[screenshot: ssas2_6]

Notice that ssas2mondrian has converted measure groups to virtual cubes (Mondrian 3.x has no direct support for measure groups), assumptions have been made about key columns for dimensions (you can't have composite keys for dimensions in Mondrian 3.x), hierarchies were generated, and physical cubes were generated from fact tables. Check the help (ssas2mondrian /help) in the utility for other options when migrating from SSAS to Mondrian.
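For context, a measure group lands in the Mondrian schema as a virtual cube that stitches dimensions and measures from the physical cubes back together. A minimal sketch (the cube and measure names here are illustrative, not the actual generated output):

<VirtualCube name="Sales" enabled="true">
  <!-- Re-use a dimension and a measure already defined on a physical cube -->
  <VirtualCubeDimension cubeName="SalesFact" name="Product"/>
  <VirtualCubeMeasure cubeName="SalesFact" name="[Measures].[Store Sales]"/>
</VirtualCube>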

To fix the red Xs in the schema, we need to clean up a few things. In this case, PSW did not like the table names, because Mondrian is a ROLAP engine and will issue SQL queries translated from the MDX it receives from client tools. I had used SELECT statements in the SSAS data source because I kept my Foodmart database in MySQL as the source for this SSAS schema. So I cleaned up those names and removed a few levels in the hierarchies that didn't make sense for these sample reports, to get to a clean schema check:

[screenshot: ssas2_4]

That's better. Now publish the Mondrian schema to your Pentaho Business Analytics server and you're ready to start building Analyzer reports like this pie chart, which is an OLAP query sending MDX to Mondrian using the snowflake-joined Product Family dimension:

[screenshot: ssas2_2]
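Under the covers, Analyzer sends Mondrian MDX along these lines (the cube and member names are illustrative), and Mondrian translates it into SQL that joins across the snowflaked tables:

SELECT {[Measures].[Store Sales]} ON COLUMNS,
  [Product].[Product Family].Members ON ROWS
FROM [Sales]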

Let's do a quick solution check: we've migrated the cube model from SSAS to Pentaho Mondrian and the report from Excel to Pentaho Analyzer; now let's build a dashboard to replace SharePoint … For more info on building Pentaho Dashboards, check out this YouTube video.

[screenshot: ssas2_7]

There ya have it. Now, admittedly, I used a set of controlled samples that pretty much led me to a successful migration in a snowflake model. That being said, I am hopeful that these instructions, the demo and the ssas2mondrian utility give you the means to get started with a migration.

BTW, I’ve posted the SSAS & Mondrian schemas here if you’d like to download them to play around with.

AdventureWorks DW on Pentaho, part 3: Geo Mapping AdventureWorks

In the previous parts of this series, I've shown you how to make quick, auto-modeled OLAP cubes and reports on a SQL Server AdventureWorks data warehouse, as well as how to take your existing SSAS cubes on top of those SQL Server DWs and turn them into Mondrian cubes, which run natively inside of the Pentaho Business Analytics suite.

Here in part 3, I am going to take the analytics a step further by using Geo Maps on top of the SQL Server Adventure Works data warehouse, modeling the ROLAP schema manually instead of with the Pentaho auto-modeler. And since I've finally upgraded to the latest 5.0 version of the Pentaho Suite, you may notice a few UI differences and workflow changes in the process. To follow along, make sure you download the latest & greatest Pentaho 5.0 product here.

Instead of starting the analysis process from the Pentaho User Console or Analyzer, we will build a model first as an XML document and we’ll use the Pentaho cube design tool, Pentaho Schema Workbench:

[screenshot: p5_04]

You can download the complete sample XML schema that I built, which you can use with Pentaho Mondrian and Schema Workbench to connect to your SQL Server AdventureWorks data warehouse database and try this out on your own Pentaho Business Analytics suite. What I typically do when building a manual Mondrian schema is start up the Pentaho Schema Workbench (PSW) client and build out the primary parts of the model, such as the measures, dimensions and calculations. Then I save the schema as an XML file and edit it with an XML editor such as Visual Studio, Eclipse or Notepad++.

The reason we are following this workflow for this example is due to the nature of the SQL Server sample data warehouse, Adventure Works (AW). AW makes use of a number of "snowflake" schema joins between dimension tables. The auto-modeler in Analyzer and PUC supports only "star" schemas, so I wrote the model in PSW and manual XML to join the Customer dimension to the Geography dimension, and the Customer key back to the customer key in the fact table, which is FactInternetSales in this model. This gives Analyzer a dimension that Mondrian can resolve from the fact table via MDX and plot on the map, using the hierarchy that I've defined below as Country, State, City, Postal Code:

<Dimension type="StandardDimension" visible="true" name="Geo">
  <Hierarchy name="default" visible="true" hasAll="true" primaryKey="CustomerKey" primaryKeyTable="c">
    <Join leftAlias="c" leftKey="GeographyKey" rightAlias="g" rightKey="GeographyKey">
      <Table name="DimCustomer" schema="dbo" alias="c"/>
      <Table name="DimGeography" schema="dbo" alias="g"/>
    </Join>
    <Level name="Country" visible="true" table="g" column="CountryRegionCode" uniqueMembers="false"/>
    <Level name="State" visible="true" table="g" column="StateProvinceName" uniqueMembers="false"/>
    <Level name="City" visible="true" table="g" column="City" uniqueMembers="false"/>
    <Level name="PostalCode" visible="true" table="g" column="PostalCode" uniqueMembers="false"/>
  </Hierarchy>
</Dimension>

. . .

<Cube name="SalesGeo" visible="true" cache="true" enabled="true">
  <Table name="FactInternetSales" schema="dbo"/>
  <DimensionUsage source="Geo" name="Geo" visible="true" foreignKey="CustomerKey"/>
  <Measure name="TotalSales" column="SalesAmount" datatype="Numeric" aggregator="sum" visible="true"/>
  <Measure name="NumOfCusts" column="CustomerKey" aggregator="distinct-count" visible="true"/>
</Cube>

Now that we have a functioning schema, publish it to the Pentaho BA Server, either from Schema Workbench under File >> Publish or through the Import mechanism in the Pentaho User Console. This will make the schema available both to the Mondrian ROLAP engine and to your report designers who are using Analyzer for OLAP queries and reports:

[screenshot: p5_05]

I'll select the AWGeo model, and now we're in the familiar Pentaho Analyzer experience, where I can grab the measures that I built in the model from AdventureWorks: total sales and a distinct count of unique customers:

[screenshot: p5_01]

To put this into a Geo Map, I'll remove the lower levels from the Geography dimension and leave just the top-level Country level as the Geo dimension on the design surface. This way, we'll drill down from the highest level to the finest grain in the map. Select Geo Map from the visualization drop-down menu at the top-right of the Analyzer screen … Now inside the map, I can drill to detail based on my 2 measures, which define the circle size and color:

[screenshots: p5_02, p5_03]

There ya have it … a SQL Server 2012 data warehouse serving data to the Pentaho ROLAP engine (Mondrian), with a defined schema that models the Geography dimension for Adventure Works customers, along with total sales and unique customers.

Migrate Adventure Works Sales Cubes from SSAS MOLAP to Mondrian ROLAP

In the 2 previous blog entries about recreating the SSAS example cubes for Adventure Works sales BI on Pentaho, I focused on generating new departmental-sized ROLAP cubes through the thin client Auto Modeler tools here & here.

In this entry, I'm going to show you how you can move over to the Pentaho BA Suite quickly by getting started with your existing SSAS cubes, such as the Adventure Works sales cubes, and migrating them pretty much as-is to the Pentaho ROLAP engine, called Mondrian. This way, you can use the modeling tools that you are already familiar with (i.e. BIDS) and just convert the cubes to Mondrian. Once you are converted over to the Pentaho platform, you can start playing around with Pentaho analytics and manage the new version of the cubes in Pentaho.

In case you didn't already do so, make sure you download the evaluation of the Pentaho Business Analytics suite here. And here is the Codeplex link for the open-source download of the Adventure Works SSAS 2012 MOLAP cubes that we'll use for this demonstration.

Make sure you also download the SQL Server 2012 JDBC driver and the tool that I built, called SSAS2MONDRIAN (available here), to help you migrate your online SSAS cubes from SQL Server Analysis Services to Mondrian XML schema output files using AMO. The full source code and C# project are available on GitHub in case you want to add more to it, have a look around, etc. The tool needs more features and error handling, so please contribute your ideas back to the source project!

So basically, instead of recreating a new OLAP model in Pentaho's Analyzer, we are going to migrate as-is from SSAS. Instead of rewriting the details of the primary differences between the 2 engines that you need to be aware of, I'm going to just link you back to my last posting on this topic at KromerBigData. You'll need to become familiar with the Mondrian schema and MDX differences, because we don't have a complete migration tool that can do it all for you. Instead, the ssas2mondrian utility will take your model outline of dimensions, measures, expressions, attributes, hierarchies and measure groups and convert them to Mondrian descriptors and names, leaving decisions about conflicts and conversion gaps in the hands of the BI developer.

Ok, once you’ve loaded the AdventureWorks 2012 DW database and the MOLAP cubes from Codeplex into SQL Server, you should see them appear in your SSMS:

[screenshot: s0]

Next, run the ssas2mondrian utility and redirect the output to an XML file so that we’ll be able to open the model in Pentaho Schema Workbench to make it ready for Mondrian:

[screenshot: mon0]

You can see some of the output in the PowerShell screenshot above. I first ran the conversion utility without redirecting the output to a file, so some of the stdout is visible. I selected the server, cube and SSAS database, and chose a new name for the resulting Mondrian schema, which is what will appear to report users when they build reports in Pentaho Analyzer.

Notice that the conversion utility simply uses Console.WriteLine to emit the Mondrian schema format. But now I can open that schema file in Pentaho Schema Workbench (PSW) and work with it to clean things up and make it ready for Pentaho Analytics:

[screenshot: psw2]

Make sure that you also connect to the AdventureWorksDW2012 database from PSW under Database Connections. Remember that Mondrian is ROLAP only, so queries always go to the source database, with Mondrian handling the caching. Also take notice of the red marks in the tree above: they mean you'll want to open those nodes and look at the error shown at the bottom of the PSW screen. It is possible that some of the table names are actually named queries in the SSAS cube model, which means those will need to become views in the source Adventure Works database to resolve the red marks in PSW for Mondrian.
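For example, if the SSAS data source view defined a named query over the fact table, you might recreate it as a database view so Mondrian can query it directly. A sketch, with illustrative object and column names:

-- Hypothetical replacement for an SSAS named query (names are illustrative)
CREATE VIEW dbo.vwInternetSales
AS
SELECT ProductKey, CustomerKey, OrderDateKey, SalesAmount, OrderQuantity
FROM dbo.FactInternetSales;

Then point the corresponding <Table> element in the Mondrian schema at dbo.vwInternetSales.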

[screenshot: psw1]

You can now use that ROLAP model to build Pentaho Analyzer reports and continue to maintain the models directly in Pentaho Mondrian, taking SSAS away from the solution. Or, if you like designing models in BIDS, build your cube there and then run ssas2mondrian to load the model into the Pentaho suite!

OPASS Discussions & Big Data in the Real World

If you are in the Orlando Sanford area tomorrow night (Thursday, October 24 at 6:30; register here), then come join the Orlando PASS gang, where I will lead a discussion on Big Data and Big Data Analytics in the real world. I am basing this discussion on the systems that we designed and took to market for Razorfish, and it will include demos of HDInsight, Hadoop, NoSQL databases and Big Data Analytics with Pentaho and SQL Server. Here is the Slideshare link to the presentation for the meeting.

Adventure Works SQL DW on Pentaho OLAP, part 2

In Part 1, I built a rudimentary OLAP model for reporting in Pentaho's Business Analytics suite by pointing the Web-based auto-modeling tool at my SQL Server Adventure Works data warehouse.

In part 2, I am going to show you how to use the OLAP Thin Modeler and Analyzer tool to modify that base auto-generated model and to create custom calculations from Adventure Works.

I'm going to start in the Pentaho User Console and edit the model that was generated for AdventureWorks in Part 1 by selecting "Manage Existing":

[screenshot: aw3]

In the subsequent Thin Model Editor screens, you will be able to perform a number of operations on the cube model that may be familiar if you have developed SSAS cubes in the past and had to modify data source views, dimensions, measures, calculations, etc. What I am going to do is take out some of the fields that would just cause confusion and noise for the end Analyzer user, and make sure that I have the proper formatting and properties set on the most important and common reporting fields in the model.

[screenshot: aw10]

You may notice that the default auto-generated OLAP model contains dimension levels that were created with assumptions that don't fit your business needs. This is where you go to clean those up, modify hierarchies, etc. In my case, I renamed CalendarYear to just "Calendar" and dragged in Year, Quarter and Semester to build a hierarchy that helps users navigate the data.

Now when users build Analyzer reports from my model, they will have a much cleaner logical model to work from. Make sure that you also use that last step to clean up names and make the entity names user-friendly with business entity descriptors, not database column names!

Next, create an Analyzer report from this new AdventureWorks cube model. I've dragged over UnitPrice as a measure into the pivot table, and I'm going to slice the data by Model Name. But I also want to see the total amount per member, which is Unit Price + Tax Amount. To do this, I made a custom measure in Analyzer with a right-click on the table header and entered this simple formula into the edit box: [UnitPrice] + [TaxAmt]. Any valid MDX expression will work here to make calculated measures, e.g. ([Measures].[Unit Sales], Ancestor([Product].CurrentMember, [Product].[Product Family])).
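If you would rather bake that calculation into the schema itself instead of into a single report, the Mondrian-side equivalent is a CalculatedMember. A sketch, assuming the measure names from this model:

<CalculatedMember name="TotalPrice" dimension="Measures" visible="true">
  <!-- Any valid MDX expression can go in the Formula element -->
  <Formula>[Measures].[UnitPrice] + [Measures].[TaxAmt]</Formula>
</CalculatedMember>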

[screenshot: aw11]

And here's what that looks like as a chart, with a bar for the UnitPrice from the AdventureWorks DW and my new OLAP calculated measure for Total Price:

[screenshot: aw12]

Next up in the Part 3 finale of the series: we'll take that cube model outside of the Analyzer / Auto Modeler realm, build it out even further and test it with our own MDX queries directly.

Adventure Works SQL DW on Pentaho OLAP, part 1

Let’s say you have SQL Server databases for your data marts and warehouses in your organization, but you are looking for alternatives to SSAS cubes and SharePoint-based dashboards. If you are like me and have built many BI solutions in SSAS OLAP and PPS, then you’ll find a transition to OSS with Pentaho to be very easy and a natural fit for those BI scenarios. We’re going to take this SQL Server 2012 DW and SSAS cube and recreate it all in Pentaho:

[screenshots: aw1, aw2]

I’m going to kick this series off with a very simple BI dashboard using the traditional SQL Server Adventure Works data warehouse data set and put a Pentaho ROLAP cube and dashboard on top of that data. What I think you’ll really be impressed with is how much easier it is than building cubes in Visual Studio and PPS dashboards.

Start by going to Pentaho.com to download an evaluation version of the Pentaho Business Analytics suite and run the installer. We’re going to stay completely in a thin client browser experience for this demo, so no need to open any IDE tools through this entire workflow.

I’m using 4.8.2 of the Pentaho suite for this series and so when I log into the portal, I will use the Analysis feature to point to the SQL Server database and auto-generate the OLAP cube as well as design the visualizations. This is also where I can put my reports and analysis views together for end-user dashboards:

[screenshot: aw3]

When you click to create new Analysis content, all you have to do to recreate a cube in the Pentaho Mondrian OLAP engine is point to your DW database source, and the engine will auto-generate the model for you. So to replace the SSAS cube I have above, just follow the flow of the next 3 steps as I point to SQL Server, select my fact and dimension tables and create the star joins. This is a very simple beginning model, which we will build upon for the rest of this series with custom calculations and other features. In classic AdventureWorks fashion, I will pick InternetSales as my fact table and, to keep things simple and straightforward for this first intro to Pentaho modeling, just join in the Date and Product dimension tables. Just like SSAS, Mondrian will treat facts as measures and dimension tables as dimensional hierarchies and aggregation levels from your data warehouse. Much, much more on this in the coming weeks …

[screenshot: aw4]

[screenshots: aw5, aw6]

That's all you have to do. Now select the model, which will show up in your list under the name you typed in the screenshot above; I called mine simply "AdventureWorks". Selecting that model drops you into the interactive Analyzer reporting tool, where you will see a field list for making pivot tables and data visualizations, which you can see below. Think of that last step as running through the SSAS cube wizard in Visual Studio with data source views and publishing the cube to the SSAS server, accomplished in far fewer steps here in Pentaho.

[screenshot: aw8]

Remember that this cube uses a ROLAP engine (Pentaho's Mondrian), so in the upcoming parts of the series, I'll talk about optimizing and customizing the logical OLAP models so that these reports perform well for you. Your final step is to put your reports together as an interactive dashboard. In upcoming parts of the series, we'll make the dashboard fully interactive as well. For now, just pick a couple of reports that you've generated from the AdventureWorks model and drop them onto the Dashboard Designer surface:

[screenshot: aw9]

That’s pretty much it. We’ve recreated the OLAP cubes and reports and put them into a dashboard all in a browser using Pentaho with the SQL Server database source, replacing SSAS and SharePoint PPS. I know it was quick & brief, so think of this first part as just a teaser. We’ll continue to build this out with more features so that you can have some guidance when trying out Pentaho’s BI Suite.

As always, just reach out to me with any questions or requests. Enjoy! Br, Mark

FYI, if you need to download the SQL Server sample databases above, you can go here to Microsoft's Codeplex site to get AdventureWorks.

SQL Dude Adventures and BI Blogging

Hi Everyone! My apologies for the gap in blogging … It's been a long couple of months transitioning over to the Open Source Software world and relocating to Orlando for Pentaho. It's been an amazing adventure so far, and I've just started transitioning my work to the hybrid OSS / commercial world for this MSSQLDUDE blog, SQL PASS, local BI Meetups & the SQL Server Pro Magazine BI Blog.

I will post my Adventure Works on Pentaho OLAP blog post this evening to kick it off, and if you are looking for more purist Big Data Analytics and Pentaho coverage, check out my Big Data Analytics blog here.

See ya soon! Best, Mark

What Makes your Data Warehouse a “Big Data Warehouse”?

I've been closely observing the evolution of the marketing of classic database and data warehouse products over the past 2 years with great interest. Now that Big Data is top of mind for most CIOs in corporations around the globe, traditional data vendors like IBM, Oracle, Teradata and Microsoft are referring to their platforms as "Big Data" or "Big Data Warehouses".

I guess, in the final analysis, this is really an attempt by data vendors to shift perceptions and meld CIO thinking about Big Data away from Apache Hadoop, Cloudera and Hortonworks and toward their own platforms. Certainly, there are some changes taking place in those traditional data warehouse platforms (MPP, in-memory, columnstore) that are important for classic "Big Data" workloads: clickstream analysis, big data analytics, log analytics, risk modeling … And most of those vendors will even tack on a version of Hadoop with their databases!

But this is not necessarily breaking new ground or an inflection point in terms of technologies. Teradata pioneered MPP decades ago, and Oracle led the way with smart caching and proved (once again) that the infamous bottleneck in databases is I/O. Columnar databases like Vertica proved their worth in this space, which led to Microsoft and Oracle adopting those technologies, while Aster Data led with MapReduce-style distributed UDFs and analytics, which Teradata simply bought up in whole.

In other words, the titans in the data market finally felt enough pressure, as Hadoop came out of the shadows of Silicon Valley to threaten their data warehouse market share, that you will now hear these sorts of slogans from the traditional data warehouse vendors:

Oracle: http://www.oracle.com/us/technologies/big-data/index.html. Oracle lists different products for dealing with different "Big Data" problems: acquire, organize and analyze. The product page lists the Oracle Big Data Appliance, Exadata and Advanced Analytics as just a few products for those traditional data warehouse problems. Yikes.

Teradata: In the world of traditional DWs, Teradata is the Godfather and pioneered many of the concepts that we are talking about today for Big Data Analytics and Big Data DWs. But Aster Data is still a separate technology and technology group under Teradata and sometimes they step on their own messaging by forcing their EDW database products into the same “Big Data” space as Aster Data: http://www.prnewswire.com/news-releases/latest-teradata-database-release-supports-big-data-and-the-convergence-of-advanced-analytics-105674593.html.

But the fact remains that "Hadoop" is still seen as synonymous with "Big Data", even though the traditional DW platforms have been used in many of those same scenarios for decades. Hadoop is seen as an alternative means to provide Big Data Analytics at a lower cost per scale. Just adding Hadoop to an Oracle Exadata installation, for example, doesn't solve that problem for customers outside of the original NoSQL and Hadoop community: Yahoo, Google, Amazon, etc.

So what are your criteria for a data warehouse platform to qualify as a "Big Data Warehouse"? Here are a few that I use:

  1. MPP scale-out nodes
  2. Column-oriented compression and data stores
  3. Distributed programming framework (e.g. MapReduce)
  4. In-memory options
  5. Built-in analytics
  6. Parallel and fast-load data loading options

To me, the "pure-play" Big Data Analytics "warehouses" are Vertica (HP), Greenplum (EMC) and Aster (Teradata). But the next generation of platforms, with improved distributed access & programming beyond today's MapReduce and Hive, will be Microsoft with PDW & Polybase, Teradata's appliance with Aster & SQL-H, and Cloudera's Impala if you like Open Source Software.

Monitor SQL Server CDC with a Custom Data Collector

I've posted several times over the years here @MSSQLDUDE on the benefits of the SQL Server Management Data Warehouse (MDW) for baselining your SQL Server performance and sending those measurements to a central server for monitoring. MDW makes use of the SQL Server Data Collector feature and comes out of the box. Here are some of my writings on MDW here @MSSQLDUDE: http://en.search.wordpress.com/?q=mdw&site=mssqldude.wordpress.com.

There is a debate among SQL Server DBAs about the worthiness of MDW and Data Collectors, because they can appear to be a "heavy burden" on your server and add overhead that you may not wish to incur on a production server. That being said, the way things work in our world and jobs in databases is a give and take between performance, monitoring, scalability and providing functionality to our business users & customers to allow them to do their jobs. Any monitoring and baselining product incurs a small performance cost on your servers. There are a number of options out there in the market, and many DBAs write their own scripts to store DMV data in tables and files to provide similar functionality.

What I wanted to point you all to is something that I think proves the value of Data Collectors (and MDW) through extending the functionality with custom data collectors.

I've been working on a CDC project recently in a large BI/DW scenario, and one thing you notice when working with CDC, as opposed to longer-standing SQL Server features like Replication, is that there is not much support for CDC in SSMS or the other built-in SQL Server tools.

But what is really nice is the custom data collector for CDC that is included on MSDN in this terrific article on Monitoring the Change Data Capture Process. Scroll down to the Data Collector portion, and the code to paste into SSMS is right there for you to create and start your data collector. This is a great way to monitor your CDC log processes.

BTW, you have to make one update to the XML in the code from MSDN … Look for the SELECT @parameters code and replace the entire SELECT … XML statement with this:

-- Replace UserDB below with the name of your CDC-enabled database
SELECT @parameters = CONVERT(xml, 
    N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
        <Query>
          <Value>SELECT * FROM UserDB.sys.dm_cdc_log_scan_sessions</Value>
          <OutputTable>cdc_log_scan_data</OutputTable>
        </Query>
      </ns:TSQLQueryCollector>');
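Once the collection set is running and uploading, the collected rows should land in your MDW database under the custom_snapshots schema, in the table named by <OutputTable> above. A quick sanity check might look like this (assuming your MDW database is simply named MDW; the snapshot_id column is added by the collector upload process):

-- Peek at the most recently collected CDC log scan sessions
SELECT TOP 100 *
FROM MDW.custom_snapshots.cdc_log_scan_data
ORDER BY snapshot_id DESC;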

Enjoy! Best, Mark

Big Data and the Telecom World

The complicated world of telecommunications analytics continues to be a primary driver behind complex data analytics solutions and I find it mentioned time and time again in Big Data use cases and scenarios.

Those of us who have lived in this world for years will probably agree with me that we’ve been pioneers in “Big Data techniques” ever since we were asked to build CDR (call detail record) data warehouses. My first CDR solution was for customer service and marketing at AT&T in the 1990s. We used Oracle for the DW and hired PhD statisticians to build models for predictive analytics on top of that data.

The marketing department was able to utilize that data to better understand customer patterns of usage and make data-driven decisions about how to package subscriptions and products. The call center team used the analytics from our cubes for market basket and association algorithms that provided reps with the ability to cross-sell to customers, which was also used by sales for up-sell opportunities to corporate accounts.

Then there are also the massive amounts of streaming data coming from network equipment, which were used by engineering and the NOC for troubleshooting, predicting outages and tuning the network. Rules for correlation, thresholds and root cause were needed to make sense of the thousands of events per second without overwhelming systems and dashboards.

Does that sound familiar to today's "Big Data use cases"? It should. We used to call these techniques CEP (complex event processing) and VLDB (very large databases). Really, at the end of the day, what this meant was that our DBAs, architects and developers needed to think about distributed architectures to account for the scale that we were dealing with.

Today, it is a nice evolution of these techniques to see Hadoop, sharded databases, NoSQL and in-database analytics providing packaged, easier ways to process and manage systems of TB & PB scale.

Essentially, this means that these techniques now become available to all IT departments, and solutions like churn & customer analytics (churn management being the holy grail of telcos) become better and faster, with improved data sampling, because of new, emerging Big Data technologies.

I found this story by Harish Vadada from Telecom Cloud here. It talks about T-Mobile using Big Data technologies such as Hadoop alongside databases like Oracle & SQL Server to improve the delivery of customer & churn analytics, driving both the bottom line and top line of their business. Very impressive, and spot-on to what I am saying here in this post.

Cheers! Mark