I find it be an interesting thought because the currently available Microsoft cloud databsae, SQL Aure, is a complete SQL Server-based transactional database complete with support for SQL Server tools, T-SQL, ODBC, referential integrity, etc. The current maximium single stand-alone database size is 50 GB.
But Microsoft has recently shown a lot of interest in providing support for scaled-out large database workloads, first with SQL Server Parallel Data Warehouse (PDW) and then the recent announcement of PDW support for Hadoop. Scale-out applications built on traditional SQL Server have been around for some time. The typical mechanisms used to do this are based on partitioning or “sharding” the data to fan-out the data and queries across SQL Servers using MS-DTC, replication or Service Broker.
SQL Azure is coming out with a built-in capability to enable this style of “sharded” partitioning called Database Federations. This is a link to a terrific write-up of using these concepts in a Big-Data application, written by Roger Jennings for Visual Studio Magazine.
Note that this capability is not yet available even in CTP (beta) for SQL Azure yet at the time that I am writing this blog post. I like the fact that these capabilities are being surfaced as a built-in T-SQL command. There will be accompanying ADO.NET library changes with APIs and methods to manipulate the distributed data and to query it appropriately as well.
Very interesting, exciting ways that SQL Azure can be used. Once I get access to the CTPs, I’ll start building out distributed apps using that capability and blog my results here for you. In the meantime, that article link above gives you some code samples to start thinking about your Big Data architectures.