AdventureWorks DW on Pentaho, part 3: Geo Mapping AdventureWorks

In the previous parts of this series, I’ve showed you how to make quick, auto-modeled OLAP cubes and reports on a SQL Server AdventureWorks data warehouse as well as how to take your existing SSAS cubes on top of those SQL Server DWs and turn them into Mondrian cubes, which runs natively inside of the Pentaho Business Analytics suite.

Here in part 3, I am going to take the Analytics a step futher by using Geo Maps on top of the SQL Server Adventure Works data warehouse by modeling the ROLAP schema manually instead of with the Pentaho auto-modeler. And since I’m finally upgraded to the latest 5.0 version of the Pentaho Suite. you may notice a few UI differences and workflow changes in the process. To follow along, make sure you download the latest & greatest Pentaho 5.0 product here.

Instead of starting the analysis process from the Pentaho User Console or Analyzer, we will build a model first as an XML document and we’ll use the Pentaho cube design tool, Pentaho Schema Workbench:

p5_04

You can download the complete sample XML schema that I built which you use in Pentaho Mondrian and Schema Workbench to connect to your SQL Server AdventureWorks Data Warehouse database to try this out on your own Pentaho Business Analytics suite. What I typically do when building a manual Mondrian schema is to start-up the Pentaho Schema Workbench (PSW) client and build out the primary parts of the model such as the measures, dimensions, calculations, etc. Then save the schema file as an XML file and edit it with an XML editor such as Visual Studio, Eclipse or Notepad++.

The reason why we are following this workflow for this example is do to the nature of the SQL Server sample data warehouse, Adventure Works (AW). AW makes use of a number of “snowflake” schema joins between dimension tables. The auto-modeler from Analyzer and PUC in Pentaho support only “star” schemas, so I write the model using PSW and manual XML to join the Customer dimension with the Geography dimension and the Customer key back to the customer key in the Fact table, which is FactInternetSales in this model. This will provide Analyzer with an MDX query to resolve the geography from the Fact table and graph it on the map using the hierarchy that I’ve defined below as Country, State, City, Postal Code:

<Dimension type=”StandardDimension” visible=”true” name=”Geo”>
<Hierarchy name=”default” visible=”true” hasAll=”true” primaryKey=”CustomerKey” primaryKeyTable=”c”>
<Join leftAlias=”c” leftKey=”GeographyKey” rightAlias=”g” rightKey=”GeographyKey”>
<Table name=”DimCustomer” schema=”dbo” alias=”c”>
</Table>
<Table name=”DimGeography” schema=”dbo” alias=”g”>
</Table>
</Join>
<Level name=”Country” visible=”true” table=”g” column=”CountryRegionCode” uniqueMembers=”false”>
</Level>
<Level name=”State” visible=”true” table=”g” column=”StateProvinceName” uniqueMembers=”false”>
</Level>
<Level name=”City” visible=”true” table=”g” column=”City” uniqueMembers=”false”>
</Level>
<Level name=”PostalCode” visible=”true” table=”g” column=”PostalCode” uniqueMembers=”false”>
</Level>
</Hierarchy>
</Dimension>

. . .

<Cube name=”SalesGeo” visible=”true” cache=”true” enabled=”true”>
<Table name=”FactInternetSales” schema=”dbo”>
</Table>
<DimensionUsage source=”Geo” name=”Geo” visible=”true” foreignKey=”CustomerKey”>
</DimensionUsage>
<Measure name=”TotalSales” column=”SalesAmount” datatype=”Numeric” aggregator=”sum” visible=”true”>
</Measure>
<Measure name=”NumOfCusts” column=”CustomerKey” aggregator=”distinct-count” visible=”true”>
</Measure>
</Cube>

Now that we have a functioning schema, publish it to the Pentaho BA Server either from Schema Workbench under File >> Publish, or through the Import mechanism in the Pentaho website. This will make the schema available to both the Mondrian ROLAP engine and to your report designers who are using Analyzer for OLAP queries and reports:

p5_05

 

I’ll select the AWGeo model and now we’re in the familiar Pentaho Analyzer experience where I can grab the measures that I built in the model from AdventureWorks for total sales and distinct count of unique customers:

p5_01

 

To put this into a Geo Map, I’ll remove the levels from the Geography dimension and leave just the top-level Country level as the Geo Dimension on the design surface. This way, we’ll drill-down from highest level down to finest grain in the map. Select Geo Map from the visualization drop-down menu at the top-right of the Analyzer screen … Now inside the map, I can drill to detail based on my 2 measures that are defining the circle size and color:

p5_02 p5_03

 

There ya have it … SQL Server 2012 data warehouse serving data to the Pentaho ROLAP engine (Mondrian) with a defined schema that models the Geography dimension for customers of Adventure Works with total sales and unique customers.

 

Advertisements

2 responses to “AdventureWorks DW on Pentaho, part 3: Geo Mapping AdventureWorks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s