Convert SSAS Snowflake Schemas to Pentaho Mondrian

Last weekend at a local tech conference, I was speaking with a few of my friends in the SQL Server & Microsoft BI consulting business about migrating cubes and BI solutions from SSAS to Pentaho using the Mondrian ROLAP engine with tools like ssas2mondrian. I explained different methods of starting over and generating new OLAP models with Pentaho’s auto modeler and Analyzer here, and how to start migrating simple SSAS cubes to Mondrian ROLAP schemas here.

So I felt it would be a good challenge to see if I could convert a more complex SSAS cube, one that includes Reference Dimensions, to a Mondrian model, translating those relationships into a snowflake design using joins in the Mondrian dimension definitions.

Additionally, if your Microsoft BI solution is using Excel in SharePoint for visualizations (Excel is widely used for business intelligence), you will want to use the Pentaho visualizations to connect to the Mondrian ROLAP engine after migration. I think I’ve got this most of the way there, so I thought I’d kick it out in a blog post here.

The best way to begin the migration process is to use the ssas2mondrian utility, which I put here on Box as a compiled EXE, or grab the source code on GitHub and have at it. I would love to get some community additions to grow this migration capability.

The Adventure Works sample data set includes snowflake dimension designs, but I felt like working with a different data set this time, so I used the Mondrian Foodmart sample data set and made a simple cube in SSAS that included a dimension-to-dimension join, from the Product Class dimension to the Product dimension, to represent a snowflake schema, along with Regular dimension relationships to fill out a simple, minimal cube:

[Screenshot ssas2_5: the simple Foodmart cube in SSAS, with the Product Class to Product snowflake relationship]

After deploying that cube to the SSAS server, Microsoft BI consultants will typically want to build visualizations for end users in Excel, SharePoint and SSRS that look something like this:

[Screenshot ssas2_1: Foodmart sales in an Excel pivot table, sliced by Product Family]

This is a simple Excel view of sales from the Foodmart fact table, sliced by Product Family, a leaf-level dimension attribute coming from the Product Class dimension table, modeled in SSAS as a Referenced dimension relationship and in Mondrian as a snowflake join.

Now let’s convert that to Pentaho …

First, I’ll run the ssas2mondrian utility to build out the base model; it takes the online cube from my SSAS database and generates Mondrian schema XML output, which I’ll redirect to a local file on my filesystem from Windows PowerShell (a DOS prompt will work just fine, too):

[Screenshot ssas2_3: running ssas2mondrian from PowerShell and redirecting the output to a local schema file]

Ok, let’s open this up and see what we get in PSW (Pentaho Schema Workbench):

[Screenshot ssas2_6: the generated schema opened in Pentaho Schema Workbench]

Notice that ssas2mondrian has converted measure groups to virtual cubes (Mondrian 3.x has no support for measure groups yet), made assumptions about key columns for dimensions (Mondrian 3.x does not allow composite dimension keys), generated hierarchies, and generated physical cubes from the fact tables. Check the help (ssas2mondrian /help) in the utility for other options when migrating from SSAS to Mondrian.
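For context, here is a minimal sketch of what a measure group becomes on the Mondrian side: a virtual cube that stitches together measures from multiple physical cubes over shared dimensions. The cube, dimension and measure names below are illustrative (borrowed from the stock Foodmart schema), not the exact output of the utility:

    <VirtualCube name="Warehouse and Sales">
      <!-- Conformed dimension shared by both physical cubes -->
      <VirtualCubeDimension name="Product"/>
      <!-- Measures pulled in from each physical cube -->
      <VirtualCubeMeasure cubeName="Sales" name="[Measures].[Store Sales]"/>
      <VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Units Shipped]"/>
    </VirtualCube>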

To fix the red Xs in the schema, we need to clean up a few things. In this case, PSW did not like the table names, because Mondrian is a ROLAP engine that issues SQL queries translated from the MDX sent by a client tool, and I had used SELECT statements in SSAS because I kept my Foodmart database in MySQL as the source for this SSAS schema. So I cleaned up those names and removed a few hierarchy levels that didn’t make sense for these sample reports, to get to a clean schema check:

[Screenshot ssas2_4: the cleaned-up schema passing the PSW schema check]
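For reference, this is the shape of the cleaned-up snowflake dimension in Mondrian schema XML: the hierarchy declares a Join between the two dimension tables, and each Level names the table it belongs to. This sketch follows the stock Foodmart table and column names, which may differ slightly from what the utility generated for my cube:

    <Dimension name="Product" foreignKey="product_id">
      <Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="product">
        <!-- Snowflake: join the product table out to product_class -->
        <Join leftKey="product_class_id" rightKey="product_class_id">
          <Table name="product"/>
          <Table name="product_class"/>
        </Join>
        <!-- Levels sourced from the outer (snowflaked) table -->
        <Level name="Product Family" table="product_class" column="product_family" uniqueMembers="true"/>
        <Level name="Product Department" table="product_class" column="product_department" uniqueMembers="false"/>
        <!-- Leaf level from the base product table -->
        <Level name="Product Name" table="product" column="product_name" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>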

That’s better. Now publish the Mondrian schema to your Pentaho Business Analytics server and you’re ready to start building Analyzer reports like this pie chart, an OLAP query that sends MDX to Mondrian through the snowflake-joined Product Family dimension:

[Screenshot ssas2_2: a Pentaho Analyzer pie chart of sales by Product Family]
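Under the covers, Analyzer generates MDX along these lines for that pie chart (the cube and measure names here follow the stock Foodmart schema; yours will match whatever ssas2mondrian carried over):

    -- Rough equivalent of the MDX that Analyzer sends to Mondrian
    SELECT
      {[Measures].[Store Sales]} ON COLUMNS,
      NON EMPTY [Product].[Product Family].Members ON ROWS
    FROM [Sales]

Because Mondrian is a ROLAP engine, that MDX is in turn translated into SQL with the product-to-product_class join from the dimension definition baked in.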

Let’s do a quick solution check: we’ve migrated the cube model from SSAS to Pentaho Mondrian and the report from Excel to Pentaho Analyzer; now let’s build a dashboard to replace SharePoint … For more info on building Pentaho Dashboards, check out this YouTube video.

[Screenshot ssas2_7: a Pentaho dashboard built from the migrated model]

There ya have it. Now, admittedly, I used a set of controlled samples that pretty much led me to a successful migration of a snowflake model. That being said, I am hopeful that these instructions, the demo and the ssas2mondrian utility provide you with the means to get started on a migration of your own.

BTW, I’ve posted the SSAS & Mondrian schemas here if you’d like to download them to play around with.

Adventure Works SQL DW on Pentaho OLAP, part 2

In Part 1, I built a rudimentary OLAP model for reporting in Pentaho’s Business Analytics suite by pointing the Web-based auto modeling tool at my SQL Server Adventure Works data warehouse.

In Part 2, I am going to show you how to use the OLAP Thin Modeler and the Analyzer tool to modify that auto-generated base model and create custom calculations from Adventure Works.

I’m going to start in the Pentaho User Console and edit the model that was generated for AdventureWorks in Part 1 by selecting “Manage Existing”.

[Screenshot aw3: the “Manage Existing” option in the Pentaho User Console]

In the subsequent Thin Model Editor screens, you will be able to perform a number of operations on the cube model that may be familiar if you have developed SSAS cubes in the past and had to modify data source views, dimensions, measures, calculations, etc. What I am going to do is take out some of the fields that would just cause confusion and noise for the end Analyzer user, and make sure that I have the proper formatting and properties set on the most important and common reporting fields in the model.

[Screenshot aw10: editing the model in the Thin Model Editor]


You may notice that the default auto-generated OLAP model contains Dimension Levels created with assumptions that don’t fit your business needs. This is where you go to clean those up, modify hierarchies, etc. In my case, I renamed CalendarYear to just “Calendar” and dragged in Year, Quarter and Semester to build a hierarchy that helps users navigate the data.
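Under the covers, the model the thin modeler maintains is ultimately published as a Mondrian schema, so the hierarchy I assembled is conceptually equivalent to XML along these lines. This is a sketch only: the column names assume the AdventureWorks DimDate table, and the generated model may name things differently:

    <Dimension name="Calendar" foreignKey="OrderDateKey">
      <Hierarchy hasAll="true" primaryKey="DateKey">
        <Table name="DimDate"/>
        <!-- Drill path: Year -> Semester -> Quarter -->
        <Level name="Year" column="CalendarYear" uniqueMembers="true"/>
        <Level name="Semester" column="CalendarSemester" uniqueMembers="false"/>
        <Level name="Quarter" column="CalendarQuarter" uniqueMembers="false"/>
      </Hierarchy>
    </Dimension>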

Now when a user builds Analyzer reports from my model, they will have a much cleaner logical model to work from. Make sure that you also use that last step to clean up names and make the entity names user-friendly with business entity descriptors, not database column names!

Next, create an Analyzer report from this new AdventureWorks cube model. I’ve dragged UnitPrice into the pivot table as a measure and I’m going to slice the data by Model Name. But I also want to see, per member, the total amount, which is Unit Price + Tax Amount. To do this, I made a custom measure in Analyzer by right-clicking on the table header and entering this simple formula in the edit box: [UnitPrice] + [TaxAmt]. Any valid MDX expression will work here for calculated measures, e.g. ([Measures].[Unit Sales], Ancestor([Product].CurrentMember, [Product].[Product Family])).

[Screenshot aw11: creating the custom measure in Analyzer]
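For the curious, the custom measure behaves like an MDX calculated member. A rough hand-written equivalent of the query Analyzer issues, assuming the measure and level names from the auto-generated model, would be:

    -- Hand-written approximation; names assume the auto-generated model
    WITH MEMBER [Measures].[Total Price] AS
      [Measures].[UnitPrice] + [Measures].[TaxAmt]
    SELECT
      {[Measures].[UnitPrice], [Measures].[Total Price]} ON COLUMNS,
      NON EMPTY [Product].[Model Name].Members ON ROWS
    FROM [AdventureWorks]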


And here’s what that looks like as a chart, with a bar for UnitPrice from the AdventureWorks DW and my new OLAP calculated measure for Total Price:

[Screenshot aw12: bar chart comparing UnitPrice and the new Total Price calculated measure]


Next up in the Part 3 finale of the series: we’ll take that cube model outside of the Analyzer / Auto Modeler realm, build it out even further, and test it with our own MDX queries directly.


2011 End-of-year Microsoft BI Wrap-up

Welcome to the final week of 2011 everyone!

I have a couple of really quick loose ends to tie up here in the blogosphere for Microsoft Business Intelligence. Ever since Microsoft acquired ProClarity in 2006, they have shown a real commitment throughout the organization to business intelligence. Microsoft’s investments in and focus on the democratization of BI, BI for the masses, have been a huge success, particularly with self-service for business decision makers and the decision to make BI accessible to Excel and business users.

So, that being said, there are just 3 topics that I want to close up on in my MSSQLDUDE blog this year:

  1. A few friends of mine in the Microsoft SQL Server & BI field on the east coast of the U.S. have started a YouTube channel where we are recording our monthly lunch ‘n learn sessions, with a focus on BI & DW: http://www.youtube.com/user/MSFTSQLBIEASTEPG
  2. I have written about the new Enterprise Information Management (EIM) capabilities that SQL Server 2012 is exposing and expanding at blogs such as SearchSQLServer, MSSQLDUDE and SQL Pro BI Blog. I can remember pushing hard for data quality, MDM and the expansion of those capabilities into ETL through SSIS in a series of meetings in Redmond back in 2007. That is all falling into place in the SQL Server product now and finally coming to market. I want to point you all to a nice series of videos explaining how to get started with the DQS product coming in SQL Server 2012, to provide data quality services to your BI solutions, here.
  3. Lastly, I wrote a quick blog post with an intro to the new Silverlight Power View BI reporting and data exploration tool in SQL Server 2012 here. Unfortunately, the image links do not seem to be working, so I’ve included thumbnails of the screenshots of Power View and the SharePoint BISM data connection page below. Notice the awesome Silverlight-based data visualizations that you can now generate out of the box with Power View, which is part of SQL Server Reporting Services in SQL Server 2012. You can try this today with the SQL Server 2012 RC0 download.

Happy New Year!!

Let’s Start with Change Data Capture

Change Data Capture (CDC) is a capability that SQL Server added starting with SQL Server 2008. It is functionality that other databases, such as Oracle, and third-party providers have long offered as general CDC behavior. In the past, many database administrators would use SQL Server’s transactional or snapshot replication to achieve similar data refresh and data capture functionality in data warehouse systems.

CDC is very powerful, and with the capabilities of CDC in SQL Server 2008 it is much easier to keep applications such as data warehouse and business intelligence systems refreshed with recent transactional system changes. If you were to use CDC in an SSIS pipeline, for example, you could query for the changed rows and perform transformations on just that updated dataset.
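As a quick taste of the T-SQL surface area, this is the basic pattern for enabling CDC and reading the changed rows; the dbo.Customers table here is a hypothetical example:

    -- Enable CDC at the database level (requires sysadmin)
    EXEC sys.sp_cdc_enable_db;

    -- Track changes on a source table (dbo.Customers is a hypothetical example)
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Customers',
        @role_name     = NULL;  -- NULL means no gating role is required to read changes

    -- Read everything captured so far for that table's capture instance
    DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Customers');
    DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Customers(@from_lsn, @to_lsn, N'all');

In an SSIS package, a change query like that last SELECT typically becomes the source component of the data flow, so only the delta rows move through the transformation pipeline.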

I also started the blog off with CDC because I previously wrote an article for SearchSQLServer that I can just point you to for more insights here!

But there is also a lengthier, and very good, article on MSSQLTips here that shows the process in its entirety as part of an SSIS package. Check it out.