So I will now continue to clean-up and work through my backlog of items from previous writings by introducing you to another of my favorite data warehouse-related improvements from SQL Server 2005 to SQL Server 2008: Data Profiler. As in my previous post, start with my deeper and lengthier treatment of this topic from SearchSQLServer here.
Now consider this: most data warehouse and BI projects need to start by finding, collecting, analyzing and then finally utilizing data from disparate systems, commonly transactional applications and databases, as the source for analsis. But how often do you find that data to be spot-on, clean, no duplicates, no confusing similarities and ALWAYS 100% just the way that the customer said the data would look?
Not very often.
Before SQL Server 2008 and the data profile task in SSIS, I used to work on projects where we would look to 3rd party tools as part of the data discovery and cleansing part of the process. Or we would write separate SSIS packages to cleanse data.
But now with Data Profiler as a core part of SSIS in SQL Server 2008, you can easily perform data quality checks for matches, patterns, dupes, etc.