Big Data with SQL Server, part 2: Sqoop

I started off my series on Hadoop on Windows and the new Windows distribution of Hadoop, Microsoft HDInsight, by talking about installing the local version of Hadoop on Windows. There is also a public cloud version, Hadoop on Azure: http://www.hadooponazure.com.

Here in part 2, I’ll focus on moving data between SQL Server and HDFS using Sqoop.

In this demo, I’m going to move data to and from a very simple sample SQL Server 2012 database that I’ve created called “sqoop_test”, which has a single table called “customers”. The table is deliberately simple for this demo: just a customer ID and a customer name. I’ll show you how the Microsoft & Hortonworks Hadoop distribution for Windows (HDInsight) includes Sqoop for moving data between SQL Server and Hadoop.
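
If you want to follow along, something like this would set up the sample database and table (a sketch; the column names and types are my assumptions, since the demo only needs a customer ID and a customer name):

> sqlcmd -S localhost -Q "CREATE DATABASE sqoop_test"
> sqlcmd -S localhost -d sqoop_test -Q "CREATE TABLE customers (id INT PRIMARY KEY, name NVARCHAR(100)); INSERT INTO customers VALUES (5, N'Bob Smith');"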

You can also move data between HDFS and SQL Server with the Linux distributions of Hadoop and Sqoop by using the Microsoft Sqoop adapter available for download here.

First, I’ll start with moving data from SQL Server to Hadoop. From Sqoop’s perspective this is an “import”: you are importing data into Hadoop from SQL Server. Once the data has been copied to HDFS, this gives you a way to perform distributed processing and analysis of it via MapReduce:

sqoop import --connect jdbc:sqlserver://localhost --username sqoop --password password --table customers -m 1
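
In my setup the customers table lives in the sqoop_test database; if your login’s default database is different, you can name it explicitly in the JDBC connection string, and you can also control where the files land in HDFS with --target-dir. A sketch of those options:

sqoop import --connect "jdbc:sqlserver://localhost;databaseName=sqoop_test" --username sqoop --password password --table customers --target-dir /user/mark/customers -m 1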

I have one record in my customers table, and the import command copies it into my Hadoop cluster, where I can view it as a plain text file, which is what most things in Hadoop ultimately resolve to:

> hadoop fs -cat /user/mark/customers/part-m-00000

> 5,Bob Smith

My SQL Server table has just that one row, so that is the row that was imported into HDFS.
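
You can confirm that from the SQL Server side with a quick query (via sqlcmd here; adjust the server and database names for your environment):

> sqlcmd -S localhost -d sqoop_test -Q "SELECT * FROM customers"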

The more common scenario is likely to be moving data into SQL Server from Hadoop, and to do this I will export from HDFS to SQL Server. In Hadoop I have a schema that I created with Hive, defining a table called Employees, and I’m going to transform those records into Customer rows in my SQL Server schema with Sqoop.
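
For reference, the Hive side could look something like this. It’s only a sketch under my assumptions: the files under /user/mark/data/employees3 are comma-delimited id,name pairs that line up with the customers table.

> hive -e "CREATE EXTERNAL TABLE employees (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/mark/data/employees3';"

With that data sitting in HDFS, the Sqoop export pushes it into the SQL Server customers table: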

> sqoop export --connect jdbc:sqlserver://localhost --username sqoop --password password -m 1 --table customers --export-dir /user/mark/data/employees3

12/11/11 22:19:24 INFO mapreduce.ExportJobBase: Transferred 201 bytes in 32.6364 seconds (6.1588 bytes/sec)
12/11/11 22:19:24 INFO mapreduce.ExportJobBase: Exported 4 records.

Those MapReduce jobs extract my data from HDFS and send it to SQL Server, so when I query my SQL Server Customers table I now have my original Bob Smith record plus the 4 new records that I transferred from Hadoop.
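
Running the same kind of sqlcmd check as before (the exact rows you see will depend on what’s in your Hive data) should now return all five records:

> sqlcmd -S localhost -d sqoop_test -Q "SELECT id, name FROM customers ORDER BY id"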


6 responses to “Big Data with SQL Server, part 2: Sqoop”

  1. Hi. Please let me know how I can connect from SQL Server 2012 to Apache Hadoop/Hive on Linux. I believe the Microsoft SQL connector is available for SQL Server 2008 R2 only. Is that statement right?

    • Hi Paresh – If you want to transfer data to/from Hadoop with Sqoop, then you can use the Microsoft Sqoop-based adapter for SQL Server that is available for download on Microsoft.com. For my demo, I was using SQL Server 2012 and it worked just fine for me.

  2. Also, please let me know whether I can connect SQL Server 2012 and Apache Hadoop/Hive on Linux through the Hive ODBC driver.

    • Hi Paresh – In my demo for the blog, I did not try using the Hive ODBC adapter for moving data to/from SQL Server; I only used Excel as the sample. But you should be able to use SSIS with the Hive ODBC adapter. In fact, I found an MSDN blog post by Benjamin Guinebertiere about doing just that here.

      I will do some testing with that mechanism (the Hive ODBC connector in SSIS) as well and post my findings to the MSSQLBLOG.

      Br, Mark

  3. Hi all,

    I am facing an issue regarding Sqoop export on Ubuntu.

    In our ETL we have relationships like:
    One to One
    One to Many
    Many to One
    Many to Many

    Of the above 4 relationships, we have completely solved the first one, One to One. But now I have to export the data from one HDFS table to many SQL Server tables.

    If you know of a solution, please send me the query for Sqoop export…

    Naveen
