Convert SSAS Snowflake Schemas to Pentaho Mondrian

Last weekend at a local tech conference, I was speaking with a few of my friends in the SQL Server & Microsoft BI consulting business about migrating cubes and BI solutions from SSAS to Pentaho using the Mondrian ROLAP engine with tools like ssas2mondrian. I explain different methods of starting over and generating new OLAP models with Pentaho’s auto-modeler and Analyzer here, and how to start migrating simple SSAS cubes to Mondrian ROLAP schemas here.

So I felt it would be a good challenge to see if I could convert a more complex SSAS cube, one that included Reference Dimensions, into a Mondrian model, turning those reference relationships into a snowflake design using joins in the dimension definitions.

Additionally, if your Microsoft BI solution uses Excel in SharePoint for visualizations (Excel is widely used for business intelligence), you will want to connect the Pentaho visualizations to the Mondrian ROLAP engine after migration. I think I’ve got this most of the way there, so I thought I’d kick it out in a blog post here.

The best way to begin the migration process is to use the ssas2mondrian utility, which I put here on Box as a compiled EXE, or grab the source code on GitHub and have at it. I would love to get some community additions to grow this migration capability.

The Adventure Works sample data set includes snowflake dimension designs, but I felt like working with a different data set this time. So I used the Mondrian sample Foodmart data set and made a simple cube in SSAS that included a dimension-to-dimension join, from the Product Class dimension to the Product dimension, to represent a snowflake schema, along with Regular dimension relationships to fill out a simple, minimal cube:

[Image: ssas2_5]

After deploying that cube to the SSAS server, Microsoft BI consultants will want to build visualizations for end users in Excel, SharePoint and SSRS that look something like this:

[Image: ssas2_1]

This is a simple Excel representation of sales from the Foodmart fact table, sliced by Product Family, a leaf-level dimension attribute coming from the Product Class dimension table, represented in SSAS as a Reference Attribute and in Mondrian as a snowflake join.
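
In Mondrian terms, that snowflake relationship is expressed as a Join element inside the dimension’s Hierarchy. Here is a minimal sketch along the lines of the stock Foodmart Mondrian schema, trimmed to two levels; the table and column names come from the standard Foodmart database, and your generated schema will differ in the details:

<Dimension name="Product" foreignKey="product_id">
  <Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="product">
    <!-- Snowflake: product joins out to product_class, which carries the family level -->
    <Join leftKey="product_class_id" rightKey="product_class_id">
      <Table name="product"/>
      <Table name="product_class"/>
    </Join>
    <Level name="Product Family" table="product_class" column="product_family" uniqueMembers="true"/>
    <Level name="Product Name" table="product" column="product_name" uniqueMembers="false"/>
  </Hierarchy>
</Dimension>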

Now let’s convert that to Pentaho …

First, I’ll run the ssas2mondrian utility to build out the base model. It reads the online cube from my SSAS database and generates Mondrian schema XML, which I’ll redirect to a local file on my filesystem from Windows PowerShell (a DOS prompt will work just fine, too):

[Image: ssas2_3]
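
The command in that screenshot takes roughly this shape; the switch names below are illustrative stand-ins rather than the utility’s documented options (run ssas2mondrian /help for the real ones), and the Mondrian XML simply goes to standard output for redirection:

# Illustrative switches only -- check ssas2mondrian /help for the actual option names
.\ssas2mondrian.exe /server:localhost /database:Foodmart /cube:Sales > foodmart.mondrian.xml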

Ok, let’s open this up and see what we get in PSW (Pentaho Schema Workbench):

[Image: ssas2_6]

Notice that ssas2mondrian has converted measure groups to virtual cubes (there is no support yet for measure groups in Mondrian 3.x), made assumptions about key columns for dimensions (you can’t have composite dimension keys in Mondrian 3.x), generated hierarchies, and generated physical cubes from fact tables. Check the help (ssas2mondrian /help) in the utility for other options when migrating from SSAS to Mondrian.
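
For reference, here is roughly what that measure-group-to-virtual-cube mapping looks like in Mondrian 3.x schema XML. This sketch borrows the cube and measure names from the stock Foodmart schema; the names in your generated file will differ:

<VirtualCube name="Warehouse and Sales" defaultMeasure="Store Sales">
  <!-- Shared dimensions exposed through the virtual cube -->
  <VirtualCubeDimension name="Product"/>
  <!-- Each measure is pulled in from the physical cube that owns it -->
  <VirtualCubeMeasure cubeName="Sales" name="[Measures].[Store Sales]"/>
  <VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Units Shipped]"/>
</VirtualCube>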

To fix the red Xs in the schema, we need to clean up a few things. In this case, PSW did not like the table names: Mondrian is a ROLAP engine that issues SQL converted from the MDX sent by a client tool, and I had used SELECT statements as the table definitions in SSAS because I kept my Foodmart database in MySQL as the source for this SSAS schema. So I cleaned up those names and removed a few levels in the hierarchies that didn’t make sense for these sample reports.
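
The cleanup itself is a small edit per table in the schema XML: either point the Table element at the physical table, or keep the query by wrapping it in a Mondrian View element. A sketch, assuming the Foodmart product_class table:

<!-- Before: the SSAS named query came across as the "table" name -->
<Table name="SELECT * FROM product_class"/>

<!-- Option 1: reference the physical table directly -->
<Table name="product_class"/>

<!-- Option 2: keep the query by wrapping it in a View -->
<View alias="product_class">
  <SQL dialect="generic">SELECT * FROM product_class</SQL>
</View>

With the table references fixed and the extra levels removed, the schema check comes back clean: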

[Image: ssas2_4]

That’s better. Now publish the Mondrian schema to your Pentaho Business Analytics server and you’re ready to start building Analyzer reports. This pie chart, for example, is an OLAP query sending MDX to Mondrian through the snowflake-joined Product Family dimension:

[Image: ssas2_2]

Let’s do a quick solution check: we’ve migrated the cube model from SSAS to Pentaho Mondrian and the report from Excel to Pentaho Analyzer; now let’s build a dashboard to replace SharePoint … For more info on building Pentaho Dashboards, check out this YouTube video.

[Image: ssas2_7]

There ya have it. Now, admittedly, I used a set of controlled samples that pretty much led me to a successful migration in a snowflake model. That being said, I am hopeful that these instructions, the demo and the ssas2mondrian utility provide you with the means to get started on a migration.

BTW, I’ve posted the SSAS & Mondrian schemas here if you’d like to download them to play around with.

AdventureWorks DW on Pentaho, part 3: Geo Mapping AdventureWorks

In the previous parts of this series, I’ve shown you how to make quick, auto-modeled OLAP cubes and reports on a SQL Server AdventureWorks data warehouse, as well as how to take your existing SSAS cubes on top of those SQL Server DWs and turn them into Mondrian cubes, which run natively inside the Pentaho Business Analytics suite.

Here in part 3, I am going to take the analytics a step further by using Geo Maps on top of the SQL Server Adventure Works data warehouse, modeling the ROLAP schema manually instead of with the Pentaho auto-modeler. And since I’ve finally upgraded to the latest 5.0 version of the Pentaho Suite, you may notice a few UI differences and workflow changes in the process. To follow along, make sure you download the latest & greatest Pentaho 5.0 product here.

Instead of starting the analysis process from the Pentaho User Console or Analyzer, we will first build a model as an XML document, using the Pentaho cube design tool, Pentaho Schema Workbench:

[Image: p5_04]

You can download the complete sample XML schema that I built, which you can use in Pentaho Mondrian and Schema Workbench to connect to your SQL Server AdventureWorks Data Warehouse database and try this out on your own Pentaho Business Analytics suite. What I typically do when building a manual Mondrian schema is start up the Pentaho Schema Workbench (PSW) client and build out the primary parts of the model: the measures, dimensions, calculations, etc. Then I save the schema as an XML file and edit it with an XML editor such as Visual Studio, Eclipse or Notepad++.

The reason we are following this workflow for this example is due to the nature of the SQL Server sample data warehouse, Adventure Works (AW). AW makes use of a number of “snowflake” schema joins between dimension tables. The auto-modeler in Analyzer and PUC supports only “star” schemas, so I write the model using PSW and manual XML to join the Customer dimension with the Geography dimension, and the Customer key back to the customer key in the fact table, which is FactInternetSales in this model. This lets Analyzer issue MDX queries that resolve the geography from the fact table and graph it on the map using the hierarchy that I’ve defined below as Country, State, City, Postal Code:

<Dimension type="StandardDimension" visible="true" name="Geo">
  <Hierarchy name="default" visible="true" hasAll="true" primaryKey="CustomerKey" primaryKeyTable="c">
    <Join leftAlias="c" leftKey="GeographyKey" rightAlias="g" rightKey="GeographyKey">
      <Table name="DimCustomer" schema="dbo" alias="c"/>
      <Table name="DimGeography" schema="dbo" alias="g"/>
    </Join>
    <Level name="Country" visible="true" table="g" column="CountryRegionCode" uniqueMembers="false"/>
    <Level name="State" visible="true" table="g" column="StateProvinceName" uniqueMembers="false"/>
    <Level name="City" visible="true" table="g" column="City" uniqueMembers="false"/>
    <Level name="PostalCode" visible="true" table="g" column="PostalCode" uniqueMembers="false"/>
  </Hierarchy>
</Dimension>

. . .

<Cube name="SalesGeo" visible="true" cache="true" enabled="true">
  <Table name="FactInternetSales" schema="dbo"/>
  <DimensionUsage source="Geo" name="Geo" visible="true" foreignKey="CustomerKey"/>
  <Measure name="TotalSales" column="SalesAmount" datatype="Numeric" aggregator="sum" visible="true"/>
  <Measure name="NumOfCusts" column="CustomerKey" aggregator="distinct-count" visible="true"/>
</Cube>
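
One note for the Geo Map step coming up: Pentaho Analyzer decides which levels it can plot based on geography annotations in the Mondrian schema. The downloadable schema may already carry these, but if the map does not recognize your levels automatically, annotating each one should do it. Here is a sketch for the Country level, using the annotation names from Pentaho’s geo support as I understand them:

<Level name="Country" visible="true" table="g" column="CountryRegionCode" uniqueMembers="false">
  <!-- Marks this level as geographic and assigns it the "country" role -->
  <Annotations>
    <Annotation name="Data.Role">Geography</Annotation>
    <Annotation name="Geo.Role">country</Annotation>
  </Annotations>
</Level>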

Now that we have a functioning schema, publish it to the Pentaho BA Server, either from Schema Workbench under File >> Publish or through the Import mechanism in the Pentaho User Console. This makes the schema available both to the Mondrian ROLAP engine and to your report designers who are using Analyzer for OLAP queries and reports:

[Image: p5_05]

I’ll select the AWGeo model, and now we’re in the familiar Pentaho Analyzer experience, where I can grab the measures that I built in the model from AdventureWorks for total sales and the distinct count of unique customers:

[Image: p5_01]

To put this into a Geo Map, I’ll remove the lower levels from the Geography dimension and leave just the top-level Country level as the Geo dimension on the design surface. This way, we’ll drill down from the highest level to the finest grain in the map. Select Geo Map from the visualization drop-down menu at the top right of the Analyzer screen … Now, inside the map, I can drill to detail based on my two measures, which define the circle size and color:

[Images: p5_02, p5_03]

There ya have it … a SQL Server 2012 data warehouse serving data to the Pentaho ROLAP engine (Mondrian), with a defined schema that models the Geography dimension for Adventure Works customers, along with total sales and unique customer counts.

Adventure Works SQL DW on Pentaho OLAP, part 1

Let’s say you have SQL Server databases for your data marts and warehouses in your organization, but you are looking for alternatives to SSAS cubes and SharePoint-based dashboards. If you are like me and have built many BI solutions in SSAS OLAP and PPS, then you’ll find a transition to OSS with Pentaho to be very easy and a natural fit for those BI scenarios. We’re going to take this SQL Server 2012 DW and SSAS cube and recreate it all in Pentaho:

[Images: aw1, aw2]

I’m going to kick this series off with a very simple BI dashboard using the traditional SQL Server Adventure Works data warehouse data set, putting a Pentaho ROLAP cube and dashboard on top of that data. What I think you’ll really be impressed with is how much easier it is than building cubes in Visual Studio and dashboards in PPS.

Start by going to Pentaho.com to download an evaluation version of the Pentaho Business Analytics suite and run the installer. We’re going to stay completely in a thin client browser experience for this demo, so no need to open any IDE tools through this entire workflow.

I’m using version 4.8.2 of the Pentaho suite for this series, so when I log into the portal I will use the Analysis feature to point to the SQL Server database, auto-generate the OLAP cube and design the visualizations. This is also where I can put my reports and analysis views together for end-user dashboards:

[Image: aw3]

When you click to create new Analysis content, all you have to do to recreate a cube in the Pentaho Mondrian OLAP engine is point to your DW database source, and the engine will auto-generate the model for you. So, to replace the SSAS cube I have above, just follow the flow of the next three steps as I point to SQL Server, select my fact and dimension tables and create the star joins. This is a very simple beginning model which we will build upon for the rest of this series with custom calculations and other features. In classic AdventureWorks mode, I will name InternetSales as my fact table and, to keep things simple and straightforward for this first intro to Pentaho modeling, just join in the Date and Product dimension tables. Just like SSAS, Mondrian will treat facts as measures and dimension tables as dimensional hierarchies and aggregation levels from your data warehouse. Much, much more on this in the coming weeks …
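
Under the covers, the auto-modeler is writing ordinary Mondrian schema XML for those star joins. A hand-written equivalent of the Product join would look roughly like this, using the AdventureWorks DW table and column names; the level names the auto-modeler generates will differ:

<Dimension name="Product" foreignKey="ProductKey">
  <Hierarchy hasAll="true" primaryKey="ProductKey">
    <!-- Star join: one dimension table keyed straight to the fact table -->
    <Table name="DimProduct" schema="dbo"/>
    <Level name="Product" column="EnglishProductName" uniqueMembers="false"/>
  </Hierarchy>
</Dimension>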

[Image: aw4]

[Images: aw5, aw6]

That’s all you have to do. Now you will select the model, which will show up in your list under the name that you typed in the screenshot above (I called mine simply “AdventureWorks”). Selecting that model will drop you into the interactive Analyzer reporting tool, where you will see a field list for making pivot tables and data visualizations, which you can see below. Think of that last step as running through the SSAS cube wizard in Visual Studio with data source views and publishing the cube on the SSAS server; here in Pentaho, it takes far fewer steps.

[Image: aw8]

Remember that this is using ROLAP for the cube engine (Pentaho’s Mondrian), so later in the series I’ll talk about optimizing and customizing the logical OLAP models so that these reports perform well for you. Your final step is to put your reports together as an interactive dashboard; we’ll make the dashboard fully interactive in upcoming parts as well. For now, just pick a couple of reports that you’ve generated from the AdventureWorks model and drop them onto the Dashboard Designer surface:

[Image: aw9]

That’s pretty much it. We’ve recreated the OLAP cubes and reports and put them into a dashboard all in a browser using Pentaho with the SQL Server database source, replacing SSAS and SharePoint PPS. I know it was quick & brief, so think of this first part as just a teaser. We’ll continue to build this out with more features so that you can have some guidance when trying out Pentaho’s BI Suite.

As always, just reach out to me with any questions or requests. Enjoy! Br, Mark

FYI, if you need to download the SQL Server sample databases used above, you can go here on Microsoft’s Codeplex site to get AdventureWorks.

PSSUG November 2012 Presentation: Big Data with SQL Server

Thank you all for coming out on a rainy, snowy, cold evening to join us for this month’s PSSUG meeting!

Here is a link to the slides that I used tonight during my presentation of Big Data with SQL Server and Hadoop demos: http://www.slideshare.net/kromerm/pssug-nov-2012-big-data-with-sql-server.

Br, Mark

What Does the New SQL 2012 BI Environment Look Like?

This is sort of a continuation of my recent tool-by-tool exploration of the SQL Server BI ecosystem in SQL Server 2012, which I’ve been writing about for SQL Server Pro Magazine here and here.

For years, I’ve carried around with me many different high-level data flow diagrams of what an end-to-end BI solution using the Microsoft stack would look like. Come to think of it, I was able to use essentially the same diagram for SQL Server 2005 and SQL Server 2008. Some of the rendering tools changed, like ProClarity and PerformancePoint, and SharePoint added more BI features. But there was always SSRS, SSIS and SSAS, so I would use something like this below:

When SQL Server 2008 R2 came to market and introduced PowerPivot, I still stuck with this general architecture, because PowerPivot was still on the uptake / heavy-lift portion of the curve and the majority of production-ready BI solutions were using SSAS for the semantic modeling and cube building.

SQL Server 2012 has changed the game enough that I’ve started a new data flow diagram in Visio, albeit not as detailed or fancy as the one that I show above. A big reason for that is (1) I just created this new diagram this week! And (2) it has to evolve over time. As SQL Server 2012 BI solutions using Tabular Model databases and techniques become more mature and build up a larger set of best practices and lessons learned, I will update these diagrams and share them here on my blog as well as over at SQL Server Pro Mag.

Now that SQL Server 2012 fully embeds and supports columnar compression through the VertiPaq engine in SSAS, you can build semantic models with Visual Studio or with PowerPivot. To use the Power View visualizations such as I am depicting in this diagram, you will need a BI Semantic Model, so I’m now shifting to this guidance in many cases. Using PowerPivot for data modeling, IMO, is very advantageous because it expands the data analyst community to Excel users and data experts, and it allows for an easy trial-and-error style of data modeling whereby Excel becomes the design surface to test your models through pivot reports.

SQL Saturday Philly: SSAS vs PowerPivot Cubes

Thank you to all of those who joined me for my Philly SQL Saturday session on building SQL Server cubes in SSAS, PowerPivot and Denali. We can use this thread for follow-up questions, discussions and links related to the topic.

First, you can download the slide deck that I used here. Second, if you want to try creating a simple baseline starting-point cube by hand using SSAS like I did in the session, I highly recommend this post here by one of my favorite, highly recognized leaders in training Microsoft BI users, Stacia Misner. You can walk through her step-by-step instructions in a fairly short amount of time. And I am certain that once you’ve gone through the process of building a cube, focusing on the basics, you will find it is not very difficult at all. After you’ve built your BI solution proof, go back and partition the cube, adjust aggregation designs, add actions, etc. You do not need to use those cool SSAS features and, most importantly, you really do not need to know MDX to get a cube started and published out to your business users.

Ok, that transitions nicely into the PowerPivot portion of the talk yesterday. To review, PowerPivot is new with SQL Server 2008 R2 and requires Excel 2010. If you wish to try out PowerPivot, all you need are Excel 2010 (the evaluation version will do, if you are not already using it) and the PowerPivot add-in. In order to truly experience the collaborative nature of PowerPivot models, though, you will need SharePoint 2010 with SQL Server 2008 R2 to persist the in-memory cubes as they are hydrated by SharePoint.

The sample PowerPivot workbook that I used, which gives you the exact same results as the SSAS cube that I built, is available as a download here. PowerPivot makes it even easier to build solutions than the cube method I demo’d with SSAS, because the cube is processed immediately in memory and is abstracted away from the modeling power-user. But it has many limitations in V1, and you need to decide which Microsoft technology to use for your BI solution based on a knowledge and understanding of the impact of those limitations. For example, you cannot build explicit hierarchies in PowerPivot like you can in SSAS. Role-based security is not available in the Excel version like you have in SQL Server, and advanced capabilities beyond what I showed yesterday, such as partitioning or many-to-many relationships, are not available to you.

Now, you may not need those for your solution. And as I stated several times on Saturday, I am a true believer in smaller departmental cubes as opposed to mondo big enterprise cubes. So PowerPivot will work nicely in those circumstances. Go to http://www.powerpivot.com for more examples and tutorials.

Lastly, we did not get a chance to talk about this too much, but SQL Server Denali CTP 2, which will be made available publicly soon, will give you a first chance to preview and try out the new BIDS environment, a convergence of the PowerPivot and SSAS technologies. What this means is that if you are intimidated by MDX, you can build cubes in BIDS with DAX instead. DAX is the PowerPivot scripting language; it uses Excel-like syntax and is much more familiar to users of the Excel functions you find in statistics, math and finance. This makes sense to me because the primary users of business intelligence are, after all, experts in math, stats & finance.

But if you are really good at making fast, efficient, complicated cubes in SSAS, you can still use the UDM model and MDX, and there will be new features added for those developers as well. But you won’t get the quick, immediate feedback on your cube through the VertiPaq technology that PowerPivot brings to the table. Instead, you have to process your cubes and worry about storage techniques as you do now. I think of this direction in SQL Server as bringing the power of the PowerPivot compression engine and in-memory analysis server to all BI developers. Very powerful stuff.

Probably the best source of information about this direction in SSAS future is directly from the SSAS product team on their blog, which you can read here.

I probably did not hit all the areas in this blog post that everyone wanted as follow-up from the session, so please send in your comments and feedback and we’ll keep the thread going. My goal here is to see you all using Microsoft SQL Server in one fashion or another for your BI solutions. Let me know what I can do to help. And your feedback on the session, presentation, contents, etc. is also greatly appreciated.

Thanks & best! Mark