AdventureWorks DW on Pentaho, part 3: Geo Mapping AdventureWorks

In the previous parts of this series, I’ve showed you how to make quick, auto-modeled OLAP cubes and reports on a SQL Server AdventureWorks data warehouse as well as how to take your existing SSAS cubes on top of those SQL Server DWs and turn them into Mondrian cubes, which runs natively inside of the Pentaho Business Analytics suite.

Here in part 3, I am going to take the Analytics a step futher by using Geo Maps on top of the SQL Server Adventure Works data warehouse by modeling the ROLAP schema manually instead of with the Pentaho auto-modeler. And since I’m finally upgraded to the latest 5.0 version of the Pentaho Suite. you may notice a few UI differences and workflow changes in the process. To follow along, make sure you download the latest & greatest Pentaho 5.0 product here.

Instead of starting the analysis process from the Pentaho User Console or Analyzer, we will build a model first as an XML document and we’ll use the Pentaho cube design tool, Pentaho Schema Workbench:


You can download the complete sample XML schema that I built which you use in Pentaho Mondrian and Schema Workbench to connect to your SQL Server AdventureWorks Data Warehouse database to try this out on your own Pentaho Business Analytics suite. What I typically do when building a manual Mondrian schema is to start-up the Pentaho Schema Workbench (PSW) client and build out the primary parts of the model such as the measures, dimensions, calculations, etc. Then save the schema file as an XML file and edit it with an XML editor such as Visual Studio, Eclipse or Notepad++.

The reason why we are following this workflow for this example is do to the nature of the SQL Server sample data warehouse, Adventure Works (AW). AW makes use of a number of “snowflake” schema joins between dimension tables. The auto-modeler from Analyzer and PUC in Pentaho support only “star” schemas, so I write the model using PSW and manual XML to join the Customer dimension with the Geography dimension and the Customer key back to the customer key in the Fact table, which is FactInternetSales in this model. This will provide Analyzer with an MDX query to resolve the geography from the Fact table and graph it on the map using the hierarchy that I’ve defined below as Country, State, City, Postal Code:

<Dimension type=”StandardDimension” visible=”true” name=”Geo”>
<Hierarchy name=”default” visible=”true” hasAll=”true” primaryKey=”CustomerKey” primaryKeyTable=”c”>
<Join leftAlias=”c” leftKey=”GeographyKey” rightAlias=”g” rightKey=”GeographyKey”>
<Table name=”DimCustomer” schema=”dbo” alias=”c”>
<Table name=”DimGeography” schema=”dbo” alias=”g”>
<Level name=”Country” visible=”true” table=”g” column=”CountryRegionCode” uniqueMembers=”false”>
<Level name=”State” visible=”true” table=”g” column=”StateProvinceName” uniqueMembers=”false”>
<Level name=”City” visible=”true” table=”g” column=”City” uniqueMembers=”false”>
<Level name=”PostalCode” visible=”true” table=”g” column=”PostalCode” uniqueMembers=”false”>

. . .

<Cube name=”SalesGeo” visible=”true” cache=”true” enabled=”true”>
<Table name=”FactInternetSales” schema=”dbo”>
<DimensionUsage source=”Geo” name=”Geo” visible=”true” foreignKey=”CustomerKey”>
<Measure name=”TotalSales” column=”SalesAmount” datatype=”Numeric” aggregator=”sum” visible=”true”>
<Measure name=”NumOfCusts” column=”CustomerKey” aggregator=”distinct-count” visible=”true”>

Now that we have a functioning schema, publish it to the Pentaho BA Server either from Schema Workbench under File >> Publish, or through the Import mechanism in the Pentaho website. This will make the schema available to both the Mondrian ROLAP engine and to your report designers who are using Analyzer for OLAP queries and reports:



I’ll select the AWGeo model and now we’re in the familiar Pentaho Analyzer experience where I can grab the measures that I built in the model from AdventureWorks for total sales and distinct count of unique customers:



To put this into a Geo Map, I’ll remove the levels from the Geography dimension and leave just the top-level Country level as the Geo Dimension on the design surface. This way, we’ll drill-down from highest level down to finest grain in the map. Select Geo Map from the visualization drop-down menu at the top-right of the Analyzer screen … Now inside the map, I can drill to detail based on my 2 measures that are defining the circle size and color:

p5_02 p5_03


There ya have it … SQL Server 2012 data warehouse serving data to the Pentaho ROLAP engine (Mondrian) with a defined schema that models the Geography dimension for customers of Adventure Works with total sales and unique customers.


Adventure Works SQL DW on Pentaho OLAP, part 2

In Part 1, I built a very simple rudimentary OLAP model for reporting in Pentaho’s Business Analytics suite by pointing the Web-based auto modeling tool to my SQL Server Adventure Works data warehouse.

In part 2, I am going to show you how to use the OLAP Thin Modeler and Analyzer tool to modify that base auto-generated model and to create custom calculations from Adventure Works.

I’m going to start in the Pentaho User Console and I’m going to edit the model that was generated for AdventureWorks in Part 1 by selecting “Manage Existing”


In the subsequent Thin Model Editor screens, you will be able to do a number of operations on the cube model that you may be familiar with if you are developed SSAS cubes in the past and had to modify data source views, dimensions, measures, calculations, etc. What I am going to do is to take out some of the fields that will just cause confusion and noise for the end Analyzer user and to make sure that I have the proper formatting and properties set on the most important and common reporting fields in the model.



You may notice that the default auto-generated OLAP model contains Dimension Levels that may have been created with assumptions that don’t fit your business needs. This is where you will go in there to clean those up, modify hierarchies, etc. In my case, I renamed CalendarYear to just “Calendar” and dragged in Year, Quarter, Semester to build a hierarchy to aid users in navigating the data.

Now when a user builds Analyzer reports from my model, they will have a much cleaner logical model to work from. Make sure that you also use that last step to clean up names and make the entity names user-friendly with business entity desciptors, not database column names!

Next, create an Analyzer report from this new AdventureWorks cube model. I’ve dragged over UnitPrice as m measure into the pivot table and I’m going to slice the data my Model Name. But I also want to see the total per member of the total amount, which is Unit Price + Tax Amount. To do this, i made a custom measure from Analyzer with a right-click on the table header and enter this simple formula into the edit box: [UnitPrice] + [TaxAmt]”. Any valid MDX statement will work here to make calculated measures, i.e. ([Measures].[Unit Sales], Ancestor([Product].CurrentMember, [Product].[Product Family]))”.



And here’s what that looks like as a chart with a bar for the UnitPrice from the AventureWorks DW and my new OLAP Calculated Measure for Total Price:



Next up in the Part 3 finale of the series: We’ll take that cube model outside of the Analyzer / Auto Modeler realm and build it out even further and test it with our own MDX queries directly.