In a previous post over at Kromer Big Data, I posted examples of deleting files from Azure Blob Storage and Table Storage as part of your ETL pipeline using Azure Data Factory (ADF). In those examples, I built a small, quick Logic App that used the Azure Storage APIs to delete data. In this post, I’m going to demonstrate how to remove files from Azure Data Lake Store (ADLS). For this demo, we’ll use ADF’s V2 service.
Deleting / removing files after they’ve been processed is a very common task in ETL Data Integration routines. Here’s how to do that for Azure Data Lake Store files in ADF:
- Start by creating a new Data Factory from the Azure portal.
- Click “Author & Monitor” from your factory in order to launch the ADF UI.
- Create a new pipeline and add a single Web Activity.
- In that Web Activity, we are going to call the ADLS DELETE REST API.
- Switch to the “Settings” tab on the properties pane at the bottom of the pipeline builder UI.
- The URL in the Web Activity must be the URI that points to the ADLS file you wish to delete.
- The URL above (i.e. file names, folder names) can be parameterized. Click the “Add Dynamic Content” link when editing the URL text box.
- Set the Web Activity “Method” to “DELETE”.
- For authentication, you will need to have an access token. You can use this method to produce one:
curl -X POST https://login.microsoftonline.com/<TENANT-ID>/oauth2/token \
  -F grant_type=client_credentials \
  -F resource=https://management.core.windows.net/ \
  -F client_id=<CLIENT-ID> \
  -F client_secret=<AUTH-KEY>
- More info on using Curl to get your access token is here
- The access token returned will need to be captured and used in the Web Activity header as such:
Header = "Authorization" Expression = "Bearer <ACCESS TOKEN>"
- You can now validate and test run your pipeline with the Web Activity. Click the “Debug” button to give it a try.
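If you want to sanity-check the token and the DELETE call outside of ADF before wiring them into the Web Activity, the same two REST requests can be sketched in Python. Treat this as a rough sketch under a few assumptions rather than the definitive way to do it. It assumes the file lives in an ADLS (Gen1) account reachable through its WebHDFS-compatible endpoint (`https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=DELETE`). The helper names (`build_delete_url`, `build_token_request`, `build_delete_request`, `delete_file`) are made up for illustration. Unlike the curl `-F` example, the token request here sends a URL-encoded form body.

```python
import json
import urllib.parse
import urllib.request


def build_delete_url(account: str, path: str) -> str:
    """Build the DELETE URL for one file (assumes the Gen1 WebHDFS-style endpoint)."""
    quoted = urllib.parse.quote(path.strip("/"))  # keep "/" separators, escape spaces etc.
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{quoted}?op=DELETE"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials token request (same fields as the curl call)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "resource": "https://management.core.windows.net/",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def build_delete_request(url: str, access_token: str) -> urllib.request.Request:
    """Build the DELETE request with the same Authorization header the Web Activity uses."""
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def delete_file(tenant_id, client_id, client_secret, account, path):
    """Fetch a token, then issue the DELETE. Makes real network calls."""
    with urllib.request.urlopen(build_token_request(tenant_id, client_id, client_secret)) as resp:
        token = json.load(resp)["access_token"]
    request = build_delete_request(build_delete_url(account, path), token)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)  # parsed JSON body returned by the service
```

Calling `delete_file("<TENANT-ID>", "<CLIENT-ID>", "<AUTH-KEY>", "<account>", "folder/file.csv")` with real values performs the same steps as the pipeline above. The value returned by `build_delete_url` is also what you would paste into the Web Activity URL box.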