In 2008, when I first started really using SSIS 2008, I wrote-up the joy I was having using the Data Profiler task in SSIS at SearchSQLServer.com here. It was a good first step into built-in funcationality around data quality and data profiling in SQL Server. Still not on par with some of my favorite tools from past lives like Informatica & ETI. Prior to that toolbox task in SSIS, SQL Server ETL developers would perform data quality and profiling through 3rd party tools or use the data mining and fuzzy lookup tasks to roll your own. In fact, there is a popular MSDN entry online here that walks you through an entire example of roll-your-own, including data cleansing, which the current SSIS 2008 profile task does not do. It was written and reviewed by SQL Server heavy-weights like Donald Farmer and Stacia Misner. Definitely it is worth your time checking it out.
Then came the exciting news of Microsoft in 2008, acquiring Strature for MDM and Zoomix for Data Quality. Sounds like a perfect marriage. But these technologies are finding their way into the SQL Server family piecemeal. MDM became Master Data Services in SQL Server 2008 R2 and Zoomix is Data Quality Services, which will be part of SSIS in SQL 11, althought the current CTP 1 of Denali does not include these capability so you can’t try it out yet. Personally, I would like to see SSIS, MDS & DQS come together and I’m sure that those of us in the SQL Server community will find ways to do that on our own.
In the interim, I tried to find some good info for my readers on what Data Quality Services (DQS) will look like and do for you in SQL 11 and this is a paraphrased quote from a Microsoft program manager on what to expect out of DQS:
Data Quality Service (DQS) creates and maintains a Data Quality Knowledge Base (DQKB) that is used for performing various data quality operations, such as data cleansing and matching and profiling.
DQS will support three major functionalities:
- Knowledge Management – creating and maintaing a DQJB, including supporting “knowledge discovery” – an automated computer-assisted acquisition of knowledge from a data source sample. In this secnario you will also define a train a set of rules that will serve as the policy governing the matching process, as well as choose and integrate reference data from different parties into the DQKB domains.
- Data Quality Projects – enable correcting, standardizing and matching source data according to domain values, rules and reference data associated with a designated data quality knowledge base.
- Administration – monitoring the current and past DQ processes and defining the various parameters governing the overall DQ activities in your server.
All of the above will be available through our DQS client experience. We will also include a new SSIS correction component, that enables correcting the input data according to domain values, rules, and reference data associated with a designated data quality knowledge base.