How to Make Use of Azure Cognitive Services

Azure Cognitive Services is a service available from your Azure portal that lets you set up accounts to use Microsoft's pre-built algorithms and models for common advanced analytics functions such as Text Analytics, recommendation engines, speech and vision recognition, and many more service-oriented APIs. These models are already trained and ready for you to consume in your solutions, such as Power BI business intelligence solutions.

A very common use case is emerging in the cloud analytics space around one specific piece of the Text Analytics function: a number of solutions can take advantage of the Sentiment Analysis capability that Azure Cognitive Services provides. By sending a text string to the API service, you can leverage the existing Microsoft sentiment engine to determine whether the message you sent is positive, negative or neutral, based on a score.
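
If you want to see what that exchange looks like before wiring it into a BI tool, here is a minimal C# sketch against the v2.0 sentiment endpoint (the westus URL matches the one used later in this post; the subscription key and sample text are placeholders):

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SentimentQuickTest
{
    static async Task Main()
    {
        using var client = new HttpClient();
        // Your key from the Azure portal (or the free limited tier) goes here
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "your api key here");

        // The service takes a batch of documents, each with an id and the text to score
        var body = "{\"documents\":[{\"id\":\"1\",\"text\":\"The keynote demo was fantastic!\"}]}";

        var response = await client.PostAsync(
            "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
            new StringContent(body, Encoding.UTF8, "application/json"));

        // v2.0 responds with something like {"documents":[{"score":0.97,"id":"1"}],"errors":[]},
        // where scores near 1 are positive, near 0 negative and around 0.5 neutral
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}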

The best way to demonstrate this is with Power BI. I am going to briefly touch on two of the most common social media sources to pull real-time data from and score the messages for sentiment analysis: Twitter & Facebook.

First, the easiest with Power BI (PBI) is Facebook. There is an out-of-the-box connector for Facebook included with PBI, so you can use Get Data to bring data directly into a Power BI model from Facebook very easily:

[Screenshot: Power BI Get Data with the out-of-the-box Facebook connector]

Now, to hook into the Microsoft Sentiment Analysis, follow the method described very well by Gil from DataChant on this PBI community page and in the associated PASS Virtual Chapter presentation on this very topic: http://community.powerbi.com/t5/Community-Blog/Sentiment-Analysis-in-Power-BI/ba-p/55898. His full working sample is awesome and you can play around with it here: https://app.powerbi.com/view?r=eyJrIjoiMjU5NWM3MTMtMTg0NS00YzE2LTk0YTYtMmZkZWM4NjYwMmFjIiwidCI6IjIyNzNjNDFiLWI4ZDAtNDVhZi1iZWU2LWUwODQ5NmFlNjcxOCIsImMiOjN9.

To download and use a similar sample in PBI Desktop, just download the PBIX from my GitHub. I recommend going through Gil's detailed PBI Community link above; it rocks, and it gives you the background you need to customize this for yourself, with step-by-step instructions on how it was made.

For the demo above that I uploaded to GitHub, I included Gil's Trump & Clinton Facebook election sentiment analysis data on Tab 2 of the report view in PBI. Tab 1 has Microsoft BI Facebook sentiment data, where I used the Microsoft BI Facebook page as the data source. Each of those datasets is run through the Cognitive Services API for Sentiment Analysis, part of the Text Analytics suite.

To make it work for you, follow Gil's instructions on getting an API key from Cognitive Services and plug your API key into the GetSentimentResults function in the Power Query / Edit Queries tool. You can use the free limited service for Sentiment Analysis or go to the Azure portal and create a full-use Cognitive Services account. The code that calls out to the API and saves the sentiment score in the PBI model is a Power Query function in the PBIX and looks like this:

(Source as table) as any =>
let
    // Serialize the incoming table to JSON and wrap it in the
    // "documents" envelope that the Text Analytics API expects
    JsonRecords = Text.FromBinary(Json.FromValue(Source)),
    JsonRequest = "{""documents"": " & JsonRecords & "}",

    JsonContent = Text.ToBinary(JsonRequest, TextEncoding.Ascii),
    // POST to the sentiment endpoint with your subscription key in the header
    Response =
        Web.Contents("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment?",
        [
            Headers = [#"Ocp-Apim-Subscription-Key" = "your api key here",
                #"Content-Type" = "application/json", Accept = "application/json"],
            Content = JsonContent
        ]),
    // Parse the JSON response, which carries a 0-to-1 sentiment score per document
    JsonResponse = Json.Document(Response, 1252)
in
    JsonResponse

As you can tell from my sample Sentiment Facebook dashboard below, I did not put as much time into the report as Gil did. So take the PBIX and add your own visuals to make it better!

[Screenshot: my sample Facebook sentiment dashboard in Power BI]

Now on to the second example, using Twitter data. There is no out-of-the-box connector directly into PBI for live Twitter feeds, so I used the Solution Template for Twitter (marketing use case) from the Power BI Solution Template site: https://bpsolutiontemplates.com/.

That site provides you with a "wizard" that walks through the set-up of an Azure SQL Database, your Cognitive Services API as I mentioned above (free or the Azure account) and the Twitter handles/keywords that you want to use as a filter. This creates a much more robust solution for grabbing the Twitter data, in that the solution template will spin up Logic Apps, an App Service and API connectors to Twitter and Sentiment Analysis on the backend, instead of calling directly from Power BI as in the Facebook example above.

So, essentially, Power BI just connects to the Azure SQL DB that you chose in the template, and that SQL DB gets populated by the Azure app that is built automatically for you. This separates the data from the business logic and from the presentation layer in PBI. The code that calls Twitter and Cognitive Services is all in the "middle tier" in the App Service in Azure, which you can open directly from your Azure portal. Modify the API calls, strings and sentiment bins in that code:

// Sentiment analysis - Cognitive APIs
// The API returns a score between 0 and 1; rescale it to the -1..1 range
// (e.g. a raw score of 0.85 becomes 0.7)
string sentiment = await MakeSentimentRequest(tweet);
sentiment = (double.Parse(sentiment) * 2 - 1).ToString(CultureInfo.InvariantCulture);
// Round down to one decimal place to bucket the score into a bin
string sentimentBin = (Math.Floor(double.Parse(sentiment) * 10) / 10).ToString(CultureInfo.InvariantCulture);
string sentimentPosNeg = String.Empty;
if (double.Parse(sentimentBin) > 0)
{
    sentimentPosNeg = "Positive";
}
else if (double.Parse(sentimentBin) < 0)
{
    sentimentPosNeg = "Negative";
}
else
{
    sentimentPosNeg = "Neutral";
}
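
The post doesn't show MakeSentimentRequest itself, since it's generated as part of the template. As a rough, hypothetical sketch, it boils down to the same v2.0 sentiment call from the Facebook example, returning the raw 0-to-1 score as a string for the rescaling code above (the names and JSON handling here are my illustrative assumptions, not the template's actual code):

using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

static class SentimentClient
{
    static readonly HttpClient client = new HttpClient();

    // Illustrative stand-in for the template's MakeSentimentRequest helper:
    // POST the tweet text and pull the 0..1 score back out as a string.
    public static async Task<string> MakeSentimentRequest(string tweet)
    {
        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment");
        request.Headers.Add("Ocp-Apim-Subscription-Key", "your api key here");
        var body = JsonSerializer.Serialize(
            new { documents = new[] { new { id = "1", text = tweet } } });
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");

        var response = await client.SendAsync(request);
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        double score = doc.RootElement.GetProperty("documents")[0].GetProperty("score").GetDouble();
        return score.ToString(System.Globalization.CultureInfo.InvariantCulture);
    }
}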

The out-of-the-box reports from these Power BI templates are quite good with a social graph visualization and a number of report tabs for different pivots of the Twitter data and sentiment scores.

[Screenshots: report pages from the Power BI Twitter solution template]

Azure Marketplace Solutions

Yes, it's been a while, and I'm just now getting back into the swing of things, keeping MS-SQL Dude running after my two-year hiatus in the open source world.

To get things rolling again, I'll share with you all a series of videos that I put together with the Microsoft Azure partner team around the database, analytics and big data offerings from other software vendors that are available as solution template images on the Azure Marketplace:

We first started at the fundamental data level with my old friends at DataStax and their Cassandra database product

We then moved up the stack into batch analytics, streaming and distributed execution engines on Hadoop with my friends at Hortonworks

Lastly, we finished up with the end-user data visualization and BI layer from Looker

All of these data solutions are available from the Azure Marketplace link above.

You can also find several of the presentations from this series at my Slideshare site here.


Pentaho and HP Vertica – Big Data Analytics

Anyone going to the HP Vertica conference next week in Boston? (August 12, 2014)

If so, stop by and say Hi at the Pentaho booth in the expo center!

BTW, I uploaded a short video that I threw together with no voiceover, but it shows you how quick and easy it is to make a dashboard in Pentaho 5.1 against a Big Data source like HP Vertica.

See it here.

HP Vertica is a world-class MPP analytical database and a perfect match for Pentaho's Analytics Suite. In the video you'll see me throw together a quick dashboard by stacking Analyzer report components that query a Mondrian cube.

The business model is based on Vertica's VMart data warehouse sample data, and the semantic model is generated automatically through Pentaho's auto modeler. I'm using our web-based Thin Modeler to modify the model a bit before publishing it for ad-hoc interactive analysis.

Hope to see you @ the HP Vertica conference in Boston next week!  Best, Mark

Convert SSAS Snowflake Schemas to Pentaho Mondrian

Last weekend at a local tech conference, I was speaking with a few of my friends in the SQL Server & Microsoft BI consulting business about migrating cubes and BI solutions from SSAS to Pentaho using the Mondrian ROLAP engine with tools like ssas2mondrian. I explain different methods of starting over and generating new OLAP models using Pentaho's auto modeler and Analyzer here, and how to start migrating simple SSAS cubes to Mondrian ROLAP schemas here.

So I felt it would be a good challenge to see if I could convert a more complex SSAS cube, one that includes Reference Dimensions, to a Mondrian model, turning those reference dimensions into a snowflake design in Mondrian using JOINs in the dimension definitions.

Additionally, if your Microsoft BI solution uses Excel in SharePoint for visualizations (Excel is widely used for business intelligence), you will want to use Pentaho visualizations connected to the Mondrian ROLAP engine after migration. I think I've got this most of the way there, so I thought I'd kick it out in a blog post here.

The best way to begin the migration process is to use the ssas2mondrian utility that I put here on Box as a compiled EXE, or grab the source code on GitHub and have at it. I would love to get some community additions to grow this migration capability.

The Adventure Works sample data set includes snowflake dimension designs, but I felt like working with a different data set this time. So I used the Mondrian Foodmart sample data set and made a simple cube in SSAS that includes a dimension-to-dimension join, Product Class to Product, to represent a snowflake schema, along with regular dimension relationships to fill out a simple, minimal cube:

[Screenshot: the simple Foodmart cube design in SSAS, with Product Class as a reference dimension]

After deploying that cube to the SSAS server, Microsoft BI consultants will want to put visualizations for end-users in Excel, SharePoint and SSRS that will look something like this:

[Screenshot: Excel PivotTable of Foodmart sales by Product Family]

This is a simple Excel representation of sales from the Foodmart fact table, sliced by Product Family, a leaf-level dimension attribute coming from the Product Class dimension table and represented in SSAS as a reference attribute and in Mondrian as a snowflake join.

Now let’s convert that to Pentaho …

First, I'll run the ssas2mondrian utility to build out the base model. It takes the online cube from my SSAS database and generates Mondrian schema XML, which I'll redirect to a local file on my filesystem from Windows PowerShell (a DOS prompt will work just fine, too):

[Screenshot: running ssas2mondrian in PowerShell and redirecting the output to a local XML file]

Ok, let’s open this up and see what we get in PSW (Pentaho Schema Workbench):

[Screenshot: the generated schema opened in Pentaho Schema Workbench]

Notice that ssas2mondrian has converted measure groups to virtual cubes (no support yet for measure groups in Mondrian 3.x), made assumptions about key columns for dimensions (you can't have composite keys for dimensions in Mondrian 3.x), generated hierarchies, and generated physical cubes from fact tables. Check the help (ssas2mondrian /help) in the utility for other options when migrating from SSAS to Mondrian.

To fix the red Xs in the schema, we need to clean up a few things. In this case, PSW did not like the table names, because Mondrian is a ROLAP engine and will issue SQL queries converted from the MDX sent by a client tool. I had used SELECT statements in SSAS because I kept my Foodmart database in MySQL as the source for this SSAS schema. So I cleaned up those names and removed a few levels in the hierarchies that didn't make sense for these sample reports, to get to a clean schema check:

[Screenshot: the cleaned-up schema passing the schema check in PSW]

That's better. Now publish the Mondrian schema to your Pentaho Business Analytics server and you're ready to start building Analyzer reports, like this pie chart, which is an OLAP query sending MDX to Mondrian using the snowflake-joined Product Family dimension:

[Screenshot: Analyzer pie chart of sales by Product Family]

Let's do a quick solution check: we've migrated the cube model from SSAS to Pentaho Mondrian and the report from Excel to Pentaho Analyzer, and now let's build a dashboard to replace SharePoint. For more info on building Pentaho dashboards, check out this YouTube video.

[Screenshot: the replacement dashboard built in Pentaho]

There ya have it. Now, admittedly, I used a set of controlled samples that pretty much led me to a successful migration in a snowflake model. That being said, I am hopeful that these instructions, the demo and the ssas2mondrian utility provide you with the means to get started on a migration.

BTW, I’ve posted the SSAS & Mondrian schemas here if you’d like to download them to play around with.

AdventureWorks DW on Pentaho, part 3: Geo Mapping AdventureWorks

In the previous parts of this series, I showed you how to make quick, auto-modeled OLAP cubes and reports on a SQL Server AdventureWorks data warehouse, as well as how to take your existing SSAS cubes on top of those SQL Server DWs and turn them into Mondrian cubes, which run natively inside the Pentaho Business Analytics suite.

Here in part 3, I am going to take the analytics a step further by using Geo Maps on top of the SQL Server AdventureWorks data warehouse, modeling the ROLAP schema manually instead of with the Pentaho auto modeler. And since I've finally upgraded to the latest 5.0 version of the Pentaho Suite, you may notice a few UI differences and workflow changes in the process. To follow along, make sure you download the latest & greatest Pentaho 5.0 product here.

Instead of starting the analysis process from the Pentaho User Console or Analyzer, we will build a model first as an XML document and we’ll use the Pentaho cube design tool, Pentaho Schema Workbench:

[Screenshot: Pentaho Schema Workbench, the Pentaho cube design tool]

You can download the complete sample XML schema that I built, which you can use in Pentaho Mondrian and Schema Workbench to connect to your SQL Server AdventureWorks data warehouse database and try this out on your own Pentaho Business Analytics suite. What I typically do when building a manual Mondrian schema is start up the Pentaho Schema Workbench (PSW) client and build out the primary parts of the model, such as the measures, dimensions and calculations. Then I save the schema as an XML file and edit it with an XML editor such as Visual Studio, Eclipse or Notepad++.

The reason we are following this workflow for this example is due to the nature of the SQL Server sample data warehouse, Adventure Works (AW). AW makes use of a number of "snowflake" schema joins between dimension tables. The auto modeler in Analyzer and PUC in Pentaho supports only "star" schemas, so I write the model in PSW and manual XML to join the Customer dimension to the Geography dimension, and the customer key back to the customer key in the fact table, which is FactInternetSales in this model. This lets Analyzer issue an MDX query that resolves the geography from the fact table and graphs it on the map using the hierarchy that I've defined below as Country, State, City, Postal Code:

<Dimension type="StandardDimension" visible="true" name="Geo">
  <Hierarchy name="default" visible="true" hasAll="true" primaryKey="CustomerKey" primaryKeyTable="c">
    <Join leftAlias="c" leftKey="GeographyKey" rightAlias="g" rightKey="GeographyKey">
      <Table name="DimCustomer" schema="dbo" alias="c"/>
      <Table name="DimGeography" schema="dbo" alias="g"/>
    </Join>
    <Level name="Country" visible="true" table="g" column="CountryRegionCode" uniqueMembers="false"/>
    <Level name="State" visible="true" table="g" column="StateProvinceName" uniqueMembers="false"/>
    <Level name="City" visible="true" table="g" column="City" uniqueMembers="false"/>
    <Level name="PostalCode" visible="true" table="g" column="PostalCode" uniqueMembers="false"/>
  </Hierarchy>
</Dimension>

. . .

<Cube name="SalesGeo" visible="true" cache="true" enabled="true">
  <Table name="FactInternetSales" schema="dbo"/>
  <DimensionUsage source="Geo" name="Geo" visible="true" foreignKey="CustomerKey"/>
  <Measure name="TotalSales" column="SalesAmount" datatype="Numeric" aggregator="sum" visible="true"/>
  <Measure name="NumOfCusts" column="CustomerKey" aggregator="distinct-count" visible="true"/>
</Cube>

Now that we have a functioning schema, publish it to the Pentaho BA Server either from Schema Workbench under File >> Publish, or through the Import mechanism in the Pentaho website. This will make the schema available to both the Mondrian ROLAP engine and to your report designers who are using Analyzer for OLAP queries and reports:

[Screenshot: selecting the published AWGeo model in Pentaho]

I’ll select the AWGeo model and now we’re in the familiar Pentaho Analyzer experience where I can grab the measures that I built in the model from AdventureWorks for total sales and distinct count of unique customers:

[Screenshot: Pentaho Analyzer with the TotalSales and NumOfCusts measures from the AWGeo model]

To put this into a Geo Map, I'll remove the lower levels from the Geography dimension and leave just the top-level Country level as the Geo dimension on the design surface. This way, we'll drill down from the highest level to the finest grain in the map. Select Geo Map from the visualization drop-down menu at the top right of the Analyzer screen. Now, inside the map, I can drill to detail based on my two measures, which define the circle size and color:

[Screenshots: the Analyzer Geo Map, drilling from Country down to finer-grained detail]

There ya have it … SQL Server 2012 data warehouse serving data to the Pentaho ROLAP engine (Mondrian) with a defined schema that models the Geography dimension for customers of Adventure Works with total sales and unique customers.


Migrate Adventure Works Sales Cubes from SSAS MOLAP to Mondrian ROLAP

In the two previous blog entries about recreating the SSAS example cubes for Adventure Works sales BI on Pentaho, I focused on generating new departmental-sized ROLAP cubes through the thin-client auto modeler tools here & here.

In this entry, I'm going to show you how you can move over to the Pentaho BA Suite quickly by starting with your existing SSAS cubes, such as the Adventure Works sales cubes, and migrating them pretty much as-is to the Pentaho ROLAP engine, Mondrian. This way, you can keep using the modeling tools that you are already familiar with (i.e. BIDS) and just convert the results to Mondrian. Once you are converted over to the Pentaho platform, you can start playing around with Pentaho analytics and manage the new version of the cubes in Pentaho.

In case you didn't already do so, make sure you go here to download the evaluation of the Pentaho Business Suite. And here is the Codeplex link for the open source download of the Adventure Works SSAS 2012 MOLAP cubes that we'll use for this demonstration.

Make sure you also download the SQL Server 2012 JDBC driver, along with the tool that I built, called SSAS2MONDRIAN, which uses AMO to migrate your online SSAS cubes from the SQL Server Analysis Server into Mondrian XML schema output files; get it here. The full source code and C# project are available on GitHub in case you want to add more to it, have a look around, etc. The tool needs more features and error catching, so please contribute your ideas back to the source project!

So basically, instead of recreating a new OLAP model in Pentaho's Analyzer, we are going to migrate as-is from SSAS. Rather than rewriting the details of the primary differences between the two engines that you need to be aware of, I'm going to link you back to my last posting on this topic at KromerBigData. You'll need to become familiar with the Mondrian schema and MDX differences, because we don't have a complete migration tool that can do it all for you. Instead, the ssas2mondrian utility takes your model outline of dimensions, measures, expressions, attributes, hierarchies and measure groups and converts them to Mondrian descriptors and names, leaving decisions about conflicts and conversion gaps in the hands of the BI developer.
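
To give you a feel for the approach before you dig into the GitHub source, the pattern is to walk the online cube's metadata through AMO and print Mondrian XML descriptors as you go. Here's a simplified, hypothetical sketch, not the actual ssas2mondrian code; the server, database and cube names are placeholders:

using System;
using Microsoft.AnalysisServices; // AMO

class Ssas2MondrianSketch
{
    static void Main()
    {
        var server = new Server();
        server.Connect("localhost"); // placeholder SSAS instance

        Database db = server.Databases.FindByName("AdventureWorksDW2012Multidimensional-EE");
        Cube cube = db.Cubes.FindByName("Adventure Works");

        // Emit a Mondrian schema skeleton from the SSAS metadata
        Console.WriteLine("<Schema name=\"" + cube.Name + "\">");

        foreach (CubeDimension dim in cube.Dimensions)
        {
            Console.WriteLine("  <Dimension name=\"" + dim.Name + "\"/>");
        }

        // Measures hang off the cube's measure groups; the real utility also maps
        // aggregator names properly (e.g. DistinctCount -> distinct-count)
        foreach (MeasureGroup mg in cube.MeasureGroups)
        {
            foreach (Measure m in mg.Measures)
            {
                Console.WriteLine("  <Measure name=\"" + m.Name +
                    "\" aggregator=\"" + m.AggregateFunction.ToString().ToLower() + "\"/>");
            }
        }

        Console.WriteLine("</Schema>");
        server.Disconnect();
    }
}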

Ok, once you’ve loaded the AdventureWorks 2012 DW database and the MOLAP cubes from Codeplex into SQL Server, you should see them appear in your SSMS:

[Screenshot: the AdventureWorks 2012 DW database and MOLAP cubes in SSMS]

Next, run the ssas2mondrian utility and redirect the output to an XML file so that we’ll be able to open the model in Pentaho Schema Workbench to make it ready for Mondrian:

[Screenshot: ssas2mondrian output in PowerShell]

You can see some of the output in that PowerShell screenshot above. I first ran the conversion utility without redirecting the output to a file, so some of the stdout is visible. I selected the server, cube and SSAS database, and chose a new name for the resulting Mondrian schema, which is what report users will see when they build reports in Pentaho Analyzer.

Notice that the conversion utility just uses Console.WriteLine to output the Mondrian schema format. Now I can open that schema file in Pentaho Schema Workbench (PSW) and work with it to clean things up and make it ready for Pentaho Analytics:

[Screenshot: the converted schema opened in PSW, with red check marks flagging errors]

Make sure that you also connect to the AdventureWorksDW2012 database from PSW under Database Connections. Remember that Mondrian is ROLAP-only, so it will always send queries to the source database, and Mondrian will handle caching as well. Also take notice of the red check marks above. They mean that you'll want to open those nodes in the tree and look at the error at the bottom of the PSW screen. It is possible that some of the table names are actually named queries in the SSAS cube model, which means those will need to become views in the source Adventure Works database to resolve the red check in PSW for Mondrian.

[Screenshot: the cleaned-up schema in PSW]

You can now use that ROLAP model to build Pentaho Analyzer reports and continue to maintain the models directly in Pentaho Mondrian, taking SSAS out of the solution. Or, if you like designing models in BIDS, build your cube there and then run ssas2mondrian to load the model into the Pentaho suite!

OPASS Discussions & Big Data in the Real World

If you are in the Orlando/Sanford area tomorrow night (Thursday, October 24 @ 6:30; register here), then come join the Orlando PASS gang, where I will lead a discussion on Big Data and Big Data Analytics in the real world. I am basing this discussion on the systems that we designed and took to market for Razorfish, and it will include demos of HDInsight, Hadoop, NoSQL databases, and Big Data Analytics with Pentaho and SQL Server. Here is the Slideshare link to the presentation for the meeting.