Monthly Archives: April 2014

Change the Game with APS and PolyBase

 Guest blog post by: James Rowland-Jones (JRJ), SQL Server MVP, PASS Board Director, SQLBits organiser and owner of The Big Bang Data Company (@BigBangDataCo). James specializes in Microsoft Analytics Platform System and delivers scale-out solutions that are both simple and elegant in their design. He is passionate about the community, sitting on both the PASS Board of Directors and the SQLBits organising committee. He recently co-authored a book on Microsoft Big Data Solutions and also authored the APS training course for Microsoft. You can find him on LinkedIn (JRJ) and Twitter (@jrowlandjones).

*     *     *     *     *

On April 15, 2014 Microsoft announced the next evolution of their Modern Data Warehouse strategy; launching the Analytics Platform System (APS). APS is an important step for many reasons. However, to me, the most important of those reasons is that it helps businesses complete the jigsaw on business data. In this blog post I am going to define what I mean by business data and explain how PolyBase has evolved; providing the bridge between heterogeneous data sources. In short, we are going to put the “Poly” in PolyBase.

Business Data

Business data comes in a variety of forms and exists in a diverse set of data sources. Those forms are sometimes described using terms such as relational, non-relational, structured, semi-structured or even un-structured. However, whatever term you choose to use doesn’t really matter. What matters is that the business has generated it and its employees (a.k.a. the users) need to be able to access said data, integrate it and draw data insights from it. This data is often disparate; spread liberally across the enterprise.

These users don’t see themselves as technical (although many are) and are often frustrated by the barriers created by having disparate data in a variety of forms. Having to write separate queries for different sources is difficult, time-consuming and raises many data quality challenges. I am sure you have seen this many times before. However, in the world of analytics the latency introduced by this kind of data integration is the real killer. By the time the data integration barrier has been solved the value of the insight has diminished. Consequently, business users need to have frictionless access to all of the data, all of the time.

In the modern world, there is only data, questions and a desire for answers. To enhance adoption we also need *something* that delivers using simple, familiar tools leveraging commodity technology and offering both high performance and low latency.

That *something* is PolyBase – underpinned by APS.


What is PolyBase, how does it work, and why is it such an important, innovative technology?

Put simply – it’s the bridge to your business data.

Why is it important? It is unique, innovative technology and it is available today in APS.

PolyBase was created by the team at the Jim Gray Systems Lab, led by Dr David DeWitt. Dr DeWitt is a technical fellow at Microsoft (i.e. he is important) and he’s also been a PASS Summit key-note speaker for several years. If you’ve never seen any of his presentations then you should absolutely address that. They are all free to watch and are available now; including a great session on PolyBase.

As I mentioned a moment ago, PolyBase is a bridge but it’s not just any old bridge. It is a fully parallelised super-highway for data. It’s like having your own fibre-optic bridge when everyone else has a copper ADSL bridge. It offers fast, run-time integration across relational data stored in APS and non-relational data stored in both Hadoop and Microsoft Azure Storage Blobs. 

Notice I didn’t just say the new Hadoop Region in APS – I just said Hadoop. That’s because PolyBase is different. It is agnostic, not proprietary, in its approach and in its architecture. PolyBase integrates with Hadoop clusters that reside outside the appliance just as it does with the new Hadoop Region that exists inside the appliance. This agnostic approach is also evident in its Hadoop distribution support; covering both Hortonworks (HDP) on both Windows and Linux and Cloudera (CDH) on Linux.

To achieve this unparalleled level of agnosticism, PolyBase uses a well-established enterprise pattern of employing “external tables” to provide the metadata of the external data. However, PolyBase takes this concept further by de-coupling the format and the data source from the definition of the external table.

This enables PolyBase to access data in a variety of sources and data formats, including RCFiles and Microsoft Azure Storage Blobs using wasb[s]. This is a key step. This process lays the foundation for other data sources to be plugged into the PolyBase architecture; putting the “Poly” in PolyBase.

Building Bridges

PolyBase builds the bridges to where the data is. Once the bridge has been defined (a simple case of a few DDL commands), PolyBase enables users to simply write queries using T-SQL. These queries can be against data in APS, Hadoop and/or Azure all at the same time. How amazing is that? I call this dynamic hybrid query execution. You can do some really clever things using hybrid queries. For example, you can read data from Hadoop, transform and enrich it in APS and persist the data back in Hadoop or Azure. That’s called round-tripping the data and that is just a taster of what is possible with hybrid query support.

There is more.

PolyBase can also leverage the computational resources available at the data source. In other words it can selectively issue MapReduce jobs against a Hadoop cluster. This is called split query execution. Like a true data surgeon, PolyBase is able to dissect a query into push-able and non-pushable expressions. The push-able ones are considered for submission as MapReduce jobs and the non-push-able parts are processed by APS.

It gets better.

The decision to push an expression is made on cost by the APS distributed query engine: Cost based split query execution against APS, Hadoop and Azure. Fantastic.

To achieve this feat PolyBase is able to hold detailed statistical information in the form of table and column level statistics. This level of knowledge about the data is lacking in Hadoop today. By having a mechanism for generating statistics APS and PolyBase can selectively assess when it is appropriate to use MapReduce and when it would be more cost-effective to simply import the data. 

The results can be dramatic. Even with “small” data you can see huge data volume reduction through the MapReduce split query process and significant delegation of computation to low-cost Hadoop clusters; providing maximum efficiency and business value. Plus if you are using the APS Hadoop Region you can also draw comfort from the ultra-low latency Infiniband connection between the two regions – leading to unparalleled data transfer speeds. This offers a completely new paradigm to the world of Hadoop.

Simple, Familiar Tools

Did I mention that all this is possible with just T-SQL? Literally there is nothing to really “learn” in order to be able to write PolyBase queries. If you can write T-SQL then you can query any PolyBase-enabled data source.

That is really important.

Think about how many users know T-SQL. Having a technology that is SQL-based is massive for adoption. Many projects have failed in the adoption phase only to wither on the vine. Imagine how many of your users would be able to simply access all of their data, gaining new insights, using nothing but their existing T-SQL skills thanks to PolyBase.

PolyBase changes the game and is available now in APS.

The Microsoft Infinity Room Photo Contest Has a Winner!

Congratulations to Edgar Rivera, whose Microsoft Infinity Room photo won the #InsightsAwait Photo Sweepstakes. “I never thought that stepping into some data visualization could be this cool,” Edgar tweeted. You can see his photo here.

Visitors to the Microsoft Infinity Room were invited to capture their experiences and tag their photos on Twitter or Instagram with the #InsightsAwait hashtag. You can view all of the contest entries here.

If you didn’t have a chance to visit the Infinity Room in San Francisco from April 15-17, take the 360-degree virtual tour and be inspired by the extraordinary found through data surrounding an ordinary object.

Also – want to learn more about Microsoft Big Data solutions? Hear CEO Satya Nadella discuss Microsoft’s drive towards a data culture during the Accelerate your insights event in San Francisco earlier this month. Watch the keynote on-demand now.

Truncate all tables in a SQL Server database

As a DBA I have found many occasions in testing of various SQL Server deployments and scripts where I need to load a database with data from a lower environment or where I need to alter a table that might already have data in it in such a way that I need to first eliminate the data before I can proceed. Depending on the foreign key constraints in place, clearing old data can be a tedious process.

ICYMI: Data platform momentum

The last couple months have seen the addition of several new products that extend Microsoft’s data platform offerings.  

At the end of January, Quentin Clark outlined his vision for the complete data platform, exploring the various inputs that are driving new application patterns, new considerations for handling data of all shapes and sizes, and ultimately changing the way we can reveal business insights from data.

 In February, we announced the general availability of Power BI for Office 365, and you heard from Kamal Hathi about how this exciting release simplifies business intelligence and how features like Power BI sites and Power BI Q&A, Power BI helps anyone, not just experts, gain value from their data. You also heard from Quentin Clark about how Power BI helps make big data work for everyone by bringing together easy access to data, robust tools that everyone can use, and a complete data platform.

In March, we announced that SQL Server 2014 would be general available beginning April 1, and shared how companies are already taking advantage of in-memory capabilities and hybrid cloud scenarios that SQL Server enables. Shawn Bice explored the platform continuum, and how with this latest release, developers can continue to use SQL Server on-premises while also dipping their toes into the possibilities with the cloud using Microsoft Azure. Additionally, Microsoft Azure HDInsight was made generally available to support Hadoop 2.2, making it easy to deploy Hadoop in the cloud.

 And earlier this month at the Accelerate your insights event in San Francisco, CEO Satya Nadella discussed Microsoft’s drive towards a data culture. In addition, we announced two other key capabilities to extend the robustness of our data platform: the Analytics Platform System, an evolution of the Parallel Data Warehouse with the addition of a Hadoop region for your unstructured data, and then a preview of the Microsoft Azure Intelligent Systems Service to help tap into the Internet of Your Things. In case you missed it, watch the keynotes on-demand, and don’t miss out on experiencing the Infinity Room, to inspire you with the extraordinary things that can be found in your data.

On top of our own announcements, we’ve been recently honored to be recognized by Gartner as a Leader in the 2014 Magic Quadrants for Data Warehouse Database Management Systems and Business Intelligence and Analytics Platforms. And SQL Server 2014, in partnership with Hewlett Packard, set two world records for data warehousing performance and price/performance.

With these enhancements across the entire Microsoft data platform, there is no better time than now to dig in. Learn more about our data platform offerings. Brush up on your technical skills for free on the Microsoft Virtual Academy. Connect with other SQL Server experts through the PASS community. Hear from Microsoft’s engineering leaders about Microsoft’s approach to developing the latest offerings. Read about the architecture of data-intensive applications in the cloud computing world from Mark Souza, which one commenter noted was a “great example for the future of application design/architecture in the Cloud and proof that the toolbox of the future for Application and Database Developers/DBAs is going to be bigger than the On-Prem one of the past.” And finally, come chat in-person – we’ll be hanging out at the upcoming PASS Business Analytics and TechEd events and are eager to hear more about your data opportunities, challenges, and of course, successes.

What can your data do for you?

Upcoming conference speeches and workshops #ssas #tabular #dax #powerpivot

Between May and July I and Alberto will be speaker at several conferences, and I think it could be useful to write a single blog post with a recap:

We will also deliver several courses:

See you around the world! 

Improve SSIS data flow buffer performance

I have a query that executes quite slowly in SQL Server Management Studio, but when I transfer the data with a SQL Server Integration Services data flow, it takes even more time! The source and destination are both fast and have no bottlenecks, and there are no significant transformations inside the data flow. I read some best practices on the Internet, and they told me to enlarge the buffer size used in the data flow. It is a bit faster now, but still too slow compared to the original query in SSMS. What is happening here?

SQL Saturday Chicago (#291) – Clash of the Row Goals – Demos

Thank you to everyone who joined me today at SQL Saturday Chicago for my “Clash of the Row Goals” session. I’m honored that so many people would attend a talk with such an incredibly arcane title — and always glad to find a curious, interested, and intelligent audience. The demos I showed during the session are attached to this post. If you have any followup questions, feel free to leave a comment below. Enjoy!…(read more)

Review of the book Learning Windows Azure Mobile Services for Windows 8 and Windows Phone 8

Recently I had the opportunity to read the book Learning Windows Azure Mobile Services for Windows 8 and Windows Phone 8 written by Geoff Webber-Cross (@webbercross) and published by Packt Publishing.

In the last year, Windows Azure has increased the offer of cloud-based services which are hosted on Windows Azure platform. One of those new services is Windows Azure Mobile Services that allows developers to build web-connected application easily.

Before reading this book my knowledge on Windows Azure was on other topics such as SQL Azure Database, Storage and Windows Azure Virtual Machine. When I have heard about the opportunity to read and review this book I have thought it was a great opportunity to learn something new about the services offered by Windows Azure for mobile application.

The book covers all features of Windows Azure Mobile Services starting from the activity to prepare the Windows Azure Mobile Services Portal up to the Best Practices for Web-Connected Applications development. When you start to develop an Apps for Windows 8 or Windows Phone 8 with Windows Azure Mobile Services you may want to know what software and hardware are needed, this topic is covered in the second chapter. Security, Customization, Notification and Scalability are topics covered in the chapters 3, 4, 5 and 6.

Another thing I have appreciated in this book is the attention to the cost of services; many times in the book I read sentences like this “At this point, if we choose the … we will start incurring costs”. As confirm, the concept “Choosing a pricing plan for services you wish to implement” is covered at the beginning of the first chapter.

There are lot of pictures in the book, which make it practical and easy to read. If you want to look inside the book you can download a sample chapter here and this is the table of contents:

  • Chapter 1: Preparing the Windows Azure Mobile Services Portal
  • Chapter 2: Start Developing with Windows Azure Mobile Services 19
  • Chapter 3: Securing Data and Protecting the User
  • Chapter 4: Service Customization with Scripts
  • Chapter 5: Implementing Push Notifications
  • Chapter 6: Scaling Up with the Notifications Hub
  • Best Practices for Chapter 7: Web-connected Apps 

This book cannot missing in your digital or physical library, enjoy!

SQL File Layout Viewer at SQL Saturday Chicago 2014

I’m very excited to be speaking at SQL Saturday Chicago again tomorrow, 4/26. I’ll be doing a deep dive session on SQL Server file and storage internals with this freeware tool, SQL File Layout Viewer: Also looking forward to seeing all my Chicago SQL Server friends. Come on out!…(read more)