Azure Data Factory: Delete Files From Azure Data Lake Store (ADLS)

In a previous post over at Kromer Big Data, I posted examples of deleting files from Azure Blob Storage and Table Storage as part of your ETL pipeline using Azure Data Factory (ADF). In those examples, I built a small, quick Logic App that used the Azure Storage APIs to delete data. In this post, I’m going to demonstrate how to remove files from Azure Data Lake Store (ADLS). For this demo, we’ll use ADF’s V2 service.

Deleting / removing files after they’ve been processed is a very common task in ETL Data Integration routines. Here’s how to do that for Azure Data Lake Store files in ADF:


  1. Start by creating a new Data Factory from Azure
  2. Click “Author & Monitor” from your factory in order to launch the ADF UI.
  3. Create a new pipeline and add a single Web Activity.
  4. Switch to the “Settings” tab on the properties pane at the bottom of the pipeline builder UI.
  5. The URL in the Web Activity will need to be the URI pointing to the ADLS file you wish to delete:
    • https://<yourstorename>
  6. The URL above (i.e. file names, folder names) can be parameterized. Click the “Add Dynamic Content” link when editing the URL text box.
  7. Set the Web Activity “Method” to “DELETE”.
  8. For authentication, you will need an Azure Active Directory (AAD) access token for the Data Lake Store.
  9. The access token returned will need to be captured and used in the Web Activity header as such:
    • Header = "Authorization"  Expression = "Bearer <ACCESS TOKEN>"
  10. You can now validate and test run your pipeline with the Web Activity. Click the “Debug” button to give it a try.
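The Web Activity is really just issuing an HTTPS call, so the same delete can be sketched outside of ADF. Below is a minimal Python sketch of the two requests involved: acquiring an AAD token and issuing the DELETE against the ADLS WebHDFS-style endpoint. The store name, file path, tenant and service principal values are all placeholders you would supply; treat the exact `resource` URI as an assumption to verify for your tenant.

```python
import urllib.parse
import urllib.request


def build_adls_delete_request(store_name: str, file_path: str, access_token: str):
    """Build the HTTPS DELETE that the ADF Web Activity issues.

    ADLS Gen1 exposes a WebHDFS-compatible REST endpoint; deleting a file
    is an HTTP DELETE against /webhdfs/v1/<path>?op=DELETE with an AAD
    bearer token in the Authorization header.
    """
    url = (f"https://{store_name}.azuredatalakestore.net"
           f"/webhdfs/v1/{urllib.parse.quote(file_path.lstrip('/'))}?op=DELETE")
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers, method="DELETE")


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the AAD client-credentials POST that produces the access token."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Data-plane resource for ADLS Gen1 (an assumption to confirm):
        "resource": "https://datalakestore.azure.net/",
    }).encode()
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body, method="POST")


# The same values you would parameterize in the Web Activity:
req = build_adls_delete_request("mystore", "staging/2018/01/file1.csv",
                                "<ACCESS TOKEN>")
# urllib.request.urlopen(req)  # would actually issue the DELETE
```

In ADF, the parameterized URL expression plays the role of `build_adls_delete_request`, and the header row in step 9 carries the bearer token.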




System Center for the SQL Server DBA, cont’d: SCOM Pack for SQL Server

My original posting on System Center Operations Manager for SQL Server DBAs started in part 1 of the series here. Today, I want to take just a minute to give you some pointers on getting started with the System Center Monitoring Pack for SQL Server. Notice in that download link that the pack has been updated to support monitoring up through SQL Server 2012, which now also includes monitoring your Always On Availability Groups from SCOM, a very important new feature.

One thing that I always mention to DBAs or other IT professionals who are new to using the SCOM SQL monitoring pack is to start by downloading and taking a look at the online guide for the MP:

There is a LOT in there. This is good. As a DBA and someone responsible for SQL Server database health, you want a lot of monitoring tools and capabilities at your fingertips. But I recommend starting off with a small subset of counters that you will monitor.

As a common best practice when using the SCOM MP for SQL Server, start by picking a list of critical areas in your SQL Server environment where you most need visibility in order to provide the highest level of service, such as performance, uptime, high availability, replication, etc. This kind of approach will force you to think about which counters and reports to light up in SCOM instead of overloading a dashboard or NOC with too much noise & chatter.

One of the best top-10 lists that you can use to get started is by Gregory A. Larsen from Database Journal:

I generally agree with this list and monitor these counters through normal tools like perfmon, DMVs or MDW. It’s a good starting point for you for database performance. From there, you can start to configure the infrastructure monitoring pieces in the monitoring pack through SCOM for clustering, replication, Always On, etc.

Complete SQL Server 2008 Auditing Solution Example

SQL Server 2008 helped to alleviate the need for DBAs to write DDL triggers or to use traces to audit SQL Server by adding the “Server Audit” capability. However, to have a complete audit solution that can be used by auditors or folks outside of the SQL Server administrator teams, you need a tool to format the audit information. It’s not reasonable to test for compliance by manually looking through complex audit logs.

There are a number of very good products on the market today that are built for compliance standards and auditing of entire ecosystems. But if you would like to take the SQL Audit capability at least a step further, so that the audit data from your audit objects is parsed out of the logs into nicely formatted reports and stored in a database that you can manage, search and monitor, then I suggest taking a look at a really cool sample solution.

This is something that the Microsoft SQL CAT team put together with consultants and a customer. It will not do all the things a true auditing and compliance product will do, but it will take you a step beyond the current SQL Server Audit object, which today stops at writing to a log file or a Windows log. Everything that I’m showing you in this posting, including the complete sample application, can be downloaded here.

Once you download that material, you will expand it out to a series of folders. You’ll find the full complete set of instructions on how to set-up & configure the complete solution from the Word document called “SQL Audit Solution” in the \SQLComplianceLab\SQLAudit folder. There are SSIS packages, SSRS reporting projects, the Audit database and scripts and even hands-on labs and policy-based management samples. Pretty cool stuff.

The details, requirements and relevance to compliance standards that this audit solution strives to meet are explained very well in this whitepaper on Reaching Compliance with SQL Server 2008. You will see quite a few references to one of Microsoft’s biggest SQL Server customers, CareGroup, because I’m sure that this is where the Microsoft partner and SQL Server CAT team first implemented the solution. Be sure to replace all things related to CareGroup and their file system (you’ll see a lot of references to E drive and C drive folders that aren’t relevant to your servers). When going through this material and implementing this SQL Server Audit solution in your environment, I think you’ll agree that Denny Lee and the partner teams did an awesome job with this, and it is great to see that they’ve shared the entire solution with everyone at no cost.

The diagram below is a very simplified high-level look at what the solution will do for you as an end-to-end auditing application. There are sample best-practice policies and security objects that you can use from the documentation and install. Once your SQL Server starts writing the audit logs, you can schedule the ETL process from SSIS (see below) to parse the audit logs and store it in the partitioned tables in the SQL Server Audit tables that are included in the download from above.



Based on my efforts in setting this solution up on my own systems, here are a few things to focus on when setting up the reports on your SSRS report server and your SSIS packages for the ETL:

  1. In my BIDS screenshot of the SSIS package, I have the variables window open. Go through and make sure that you are pointing to your Server Audit logs folder and that you’ve removed any references to CareGroup that still exist in some of the code from the implementation at the customer site.
  2. Do the same scan and scrub of the SSRS data sources in BIDS and make sure you are pointing to your SQL Servers and the audit database that came with the install. You’ll need to do this on the source & target adapters in SSIS, too.
  3. Run the ETL from BIDS a few times so that it is easier to troubleshoot and find errors before deploying to SSIS and scheduling. It took me a couple of runs to get the variables set right in the packages. Most of it is actually pretty well documented, but a few things took one or two tries to get right.


Ok. So you’ve gone through the trouble of downloading the material and the scripts. You’ve installed the audit database, configured the SSIS packages, modified the connection strings in SSRS for the reports … now, what do you get for all of your work & effort?

See below … I’ve taken just a few of the report screenshots from the Reaching Compliance whitepaper from SQL CAT. Now, this is not going to be a robust, supported solution for auditing in the same vein as off-the-shelf solutions sold by Microsoft (System Center Audit Collection Services) or the 3rd party auditing tools on the market today from companies like Lumigent and Idera. You are essentially implementing custom code and are on your own to configure and support it. You can always hire a consultant to provide help and support on a solution like this. And like I said earlier, this is much more robust and accurate than wiring together a bunch of triggers or manually scanning log tables and files. Just be aware of the risks of implementing something that is not a supported product.

Mid-July Updates

I just completed 2 weeks of prep and delivery of my new SQL Server 2008 R2 Security Audit workshop, so I’m catching up on the blogs and other publishing matters this week and next week. In the meantime, I’ve posted a new blog on SQL Server Magazine’s BI blog on using Change Tracking and CDC to reduce ETL load times here, along with a few updates on Twitter around announcements, including a free Philly SQL Server User Group evening session on SQL Server PDW in our Malvern office on August 10.

I am going to start a SQL Server security auditing series, probably split across 3 parts, starting this week that includes most of the material from my workshop, so look for that and enjoy!

All the best, Mark

SQL Server 2008 MDW Data Collectors for SQL Server 2005

UPDATED January 22: Ruggerio kindly pointed out that I had called MDW an Enterprise Edition feature in SQL Server. UCP, which uses MDW data, is an Enterprise Edition feature. However, the Data Collectors and the MDW Reports are available in Standard Edition as well. H/T to Ruggerio.

SQL Server 2008 introduced a great piece of functionality to help you monitor performance natively on your SQL Server called Management Data Warehouse, or MDW. It is very easy to configure with a wizard from SSMS and comes with a prebuilt set of “data collectors” that you schedule to run against your databases to collect performance counters and stats. There are a number of very interesting reports that come out of the box and run from the SSMS right-click Reports capability. I pasted an example of one of the reports below. These reports include drill-down with interactive click-through and timeline slider capabilities. The data warehouse database that the tool creates on your instance is open and easy to understand, so you can modify it, add your own reports and build your own data collectors. Very cool and out of the box with SQL Server 2008 & SQL Server 2008 R2. It is not as feature-rich as the awesome SQL Server performance tools that are sold on the market today, but it is a more advanced way to monitor your performance and keep trends for historical baselines than was previously provided by Microsoft.

MDW relies on SQL Server’s performance data collectors, a feature that creates the jobs, schedules and SSIS packages needed to parse, collect, upload and store your server’s performance metrics. (As noted in the update above, the data collectors and MDW reports are available in Standard Edition; only UCP requires Enterprise Edition.) So what I wanted to show you today is a way that you can pull in counters and stats from your SQL Server 2005 instances as well, by creating a custom data collector that runs on a schedule from a SQL Server 2008 instance acting as a monitoring hub. You cannot create a custom data collector from SSMS as a new object; you have to script it in T-SQL. But once you create it, you can manage it in SSMS by setting the collection schedule and data retention policy.

The easiest way to create the new data collector is by scripting one of the existing “System Data Collectors” that are provided for you out of the box and then modifying that. I’m not going to go into the details here and instead will point you to an awesome posting that Greg Larsen did when this feature was first introduced into SQL Server. Check it out here. It is one of the best tutorials that I have seen on the topic of SQL Server MDW custom data collectors, and he did it over 4 pages of blog postings. Nice work!

Follow through that tutorial, and then if you would like to collect performance data from a SQL Server 2005 instance, follow these steps:

1. Create a Linked Server to your SQL Server 2005 instance. My sample uses a SQL Server 2005 instance called “Chicago”.

2. Use the Linked Server in the system views query from Greg’s example in Section C like this:

-- Begin Section C --------------------------------------
Declare @collection_item_id_4 int
EXEC [msdb].[dbo].[sp_syscollector_create_collection_item]
@name=N'Remote Number of Rows',
@parameters=N'<ns:TSQLQueryCollector xmlns:ns="DataCollectorType"><Query><Value>
SELECT OBJECT_NAME(o.object_id) ObjectName
, SCHEMA_NAME(o.schema_id) SchemaName
, SUM(p.Rows) NumOfRows
FROM Chicago.msdb.sys.partitions p
JOIN Chicago.msdb.sys.objects o ON p.object_id = o.object_id
WHERE Index_ID &lt; 2 AND o.type = ''U''
GROUP BY SCHEMA_NAME(o.schema_id), OBJECT_NAME(o.object_id)
ORDER BY SCHEMA_NAME(o.schema_id), OBJECT_NAME(o.object_id)
</Value><OutputTable>NumOfRows</OutputTable></Query><Databases UseSystemDatabases="true" UseUserDatabases="true" /></ns:TSQLQueryCollector>',
@collection_item_id=@collection_item_id_4 OUTPUT,
-- End Section C ----------------------------------------

Notice that I added my Linked Server and database reference (MSDB) to the front of the system views in the FROM clause. That’s it. But be careful with the syntax: that SP parameter is looking for XML, not general T-SQL, so characters like less than (<) need to be changed to &lt;.
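As a small sketch of that escaping rule (the helper name here is my own, not part of any SQL Server tooling), this shows the two transformations needed before pasting a query into the @parameters literal: XML-escape the markup characters, then double the single quotes so the result survives inside the N'...' T-SQL string.

```python
from xml.sax.saxutils import escape


def tsql_to_collector_xml(query: str) -> str:
    """Prepare a T-SQL query for embedding in the @parameters XML value.

    1. XML-escape &, < and > (e.g. < becomes &lt;).
    2. Double single quotes so the text survives inside an N'...' literal.
    """
    return escape(query).replace("'", "''")


snippet = tsql_to_collector_xml("WHERE Index_ID < 2 AND o.type = 'U'")
```

Running this on the WHERE clause from Section C produces exactly the `&lt;` and doubled-quote forms shown in the script above.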

More Spring Cleaning: SSIS Data Profiler

So I will now continue to clean up and work through my backlog of items from previous writings by introducing you to another of my favorite data warehouse-related improvements from SQL Server 2005 to SQL Server 2008: the Data Profiler. As in my previous post, start with my deeper and lengthier treatment of this topic from SearchSQLServer here.

Now consider this: most data warehouse and BI projects need to start by finding, collecting, analyzing and then finally utilizing data from disparate systems, commonly transactional applications and databases, as the source for analysis. But how often do you find that data to be spot-on: clean, no duplicates, no confusing similarities, and ALWAYS 100% just the way that the customer said the data would look?

Not very often.

Before SQL Server 2008 and the data profile task in SSIS, I used to work on projects where we would look to 3rd party tools as part of the data discovery and cleansing part of the process. Or we would write separate SSIS packages to cleanse data.

But now with Data Profiler as a core part of SSIS in SQL Server 2008, you can easily perform data quality checks for matches, patterns, dupes, etc.
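To make the idea concrete outside of SSIS, here is a rough Python sketch (the column names and sample rows are invented for illustration) of two of the checks the Data Profiling Task automates for you: duplicate detection on a candidate key and pattern profiling on a column’s formats.

```python
import re
from collections import Counter

# Hypothetical source rows pulled from a transactional system.
rows = [
    {"customer_id": "C001", "phone": "610-555-0101"},
    {"customer_id": "C002", "phone": "6105550102"},
    {"customer_id": "C001", "phone": "610-555-0101"},  # duplicate key
]

# Candidate-key / duplicate check: which customer_id values repeat?
id_counts = Counter(r["customer_id"] for r in rows)
dupes = {k: n for k, n in id_counts.items() if n > 1}

# Column-pattern check: which phone values break the expected format?
phone_pattern = re.compile(r"^\d{3}-\d{3}-\d{4}$")
nonconforming = [r["phone"] for r in rows if not phone_pattern.match(r["phone"])]
```

The Data Profiling Task produces this same kind of information (key candidates, value patterns, NULL ratios and more) as an XML profile you can inspect in the Data Profile Viewer, without writing any of this by hand.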