Monthly Archives: May 2014

Azure now in the leader quadrant for IaaS from Gartner

Gartner publishes Magic Quadrant reports for a variety of technology areas.

It was interesting to note that in the latest report, Azure has moved up into the Leader quadrant for IaaS. The only other vendor in that quadrant is Amazon. That’s a big step up for the team, given the IaaS offering only reached general availability in April last year.

You’ll find details here: Gartner Report

 

Enabling Cross DB Access to Contained SQL Server Users in Partial Contained Databases

Recently, in one of our HR projects, we adopted the SQL Server contained database concept. We implemented a policy that, going forward, we will create contained SQL Server users (without logins) in contained databases, which eases migrating databases across data centers without worrying about login scripts. For this project we created two partially contained databases, Empprofile and Empcompensation, with the contained users ProfileUser and HrinfoUser in each respective database. Both users were created using “SQL USER WITH PASSWORD” and were granted read, write, and execute privileges. This was sufficient for the HR application to work.
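
For reference, here is a minimal sketch of the setup described above, using the database and user names from the post; the password, exact permissions, and configuration steps are illustrative assumptions rather than our production script.

-- Enable contained database authentication at the instance level
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;
GO

-- Create a partially contained database
CREATE DATABASE Empprofile CONTAINMENT = PARTIAL;
GO

USE Empprofile;
GO

-- Create a contained user with a password (no server-level login required)
CREATE USER ProfileUser WITH PASSWORD = 'StrongPasswordHere!1';

-- Grant read, write, and execute privileges
ALTER ROLE db_datareader ADD MEMBER ProfileUser;
ALTER ROLE db_datawriter ADD MEMBER ProfileUser;
GRANT EXECUTE TO ProfileUser;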

The updated Survey pattern for Power Pivot and Tabular #powerpivot #tabular #ssas #dax

One of the first models I created for the many-to-many revolution white paper was the Survey one. At the time, it was in Analysis Services Multidimensional, and then we implemented it in Analysis Services Tabular and in Power Pivot, using the DAX language.

I recently reviewed the data model and published it in the Survey article on the DAX Patterns site. The Survey pattern is the foundation for others, such as Basket Analysis, and it is widely used in many different business scenarios. I was particularly happy to learn it has been used to perform data analysis for cancer research!

In this article I did some maintenance on the DAX formulas, checking that proper error handling is part of the formulas, and highlighting some differences in slicer behavior between Excel 2010 and Excel 2013, which could be particularly important for the Survey scenario. As usual, we provide sample workbooks for both Excel 2010 and Excel 2013, and we use DAX Formatter to make the DAX code easier to read. Any feedback will be appreciated!

Working with Temporal Data in SQL Server

My third Pluralsight course, Working with Temporal Data in SQL Server, is published. I am really proud of the second part of the course, where I discuss optimization of temporal queries. This was a nearly impossible task for decades; the first solutions appeared only recently. I present six solutions altogether (and one more that is not a solution), and I invented four of them. http://pluralsight.com/training/Courses/TableOfContents/working-with-temporal-data-sql-server

Virginia Tech Exec Q&A

Virginia Tech is using the Microsoft Azure Cloud to create cloud-based tools to assist with medical breakthroughs via next-generation sequencing (NGS) analysis. NGS analysis requires both big computing and big data resources. A team of computer scientists at Virginia Tech is addressing this challenge by developing an on-demand cloud-computing model using the Azure HDInsight Service. By moving to an on-demand cloud-computing model, researchers will now have easier, more cost-effective access to DNA sequencing tools and resources, which could lead to even faster, more exciting advancements in medical research.

We caught up with Wu Feng, Professor in the Department of Computer Science, the Department of Electrical & Computer Engineering, and the Health Sciences at Virginia Tech, to discuss the benefits he is seeing with cloud computing.

Q: What is the main goal of your work?

We are working on accelerating our ability to use computing to assist in the discovery of medical breakthroughs, including the holy grail of “computing a cure” for cancer. While we are just one piece of a giant pipeline in this research, we seek to use computing to more rapidly understand where cancer starts in the DNA. If we could identify where and when mutations are occurring, it could provide an indication of which pathways may be responsible for the cancer and could, in turn, help identify targets to help cure the cancer. It’s like finding a “needle in a haystack,” but in this case we are searching through massive amounts of genomic data to try to find these “needles” and how they connect and relate to each other “within the haystack.”

Q: What are some ways technology is helping you?

We want to enable scientists, engineers, physicists, and geneticists, equipping them with tools so they can focus on their craft and not on the computing. There are many interesting computing and big data questions that we can help them with along this journey of discovery.

Q: Why is cloud computing with Microsoft so important to you?

The cloud can accelerate discovery and innovation by computing answers faster, particularly when you don’t have bountiful computing resources at your disposal. It enables people to compute on data sets that they might not have otherwise tried because they didn’t have ready access to such resources.

For any institution, whether a company, government lab or university, the cost of creating or updating datacenter infrastructure, such as the building, the power and cooling, and the raised floors, just so a small group of people can use the resource, can outweigh the benefits. Having a cloud environment with Microsoft allows us to leverage the economies of scale to aggregate computational horsepower on demand and give users the ability to compute big data, while not having to incur the institutional overhead of personally housing, operating and maintaining such a facility.

Q: Do you see similar applications for businesses?

Just as the Internet leveled the playing field and served as a renaissance for small businesses, particularly those involved with e-commerce, so will the cloud. By commoditizing “big data” analytics in the cloud, small businesses will be able to intelligently mine data to extract insight with activities, such as supply-chain economics and personalized marketing and advertising.

Furthermore, quantitative analytic tools, such as Excel DataScope in the cloud, can enable financial advisors to accelerate data-driven decision-making via commoditized financial analytics and prediction. Specifically, Excel DataScope delivers data analytics, machine learning and information visualization to the Microsoft Azure Cloud.

In any case, just as in the life sciences, these financial entities have their own sources of data deluge. One example is trades and quotes (TAQ), where the amount of financial information is also increasing exponentially. Unfortunately, to make the analytics process on the TAQ data more tractable, the data is often triaged into summary format, which can inadvertently filter out critical data that should not have been discarded.

Q: Are you saving money or time or experiencing other benefits?

Back when we first thought of this approach, we wondered whether it would even be a feasible solution for the cloud. For example, with so much data to upload to the cloud, would the cost of transferring data from the client to the cloud outweigh the benefits of computing in the cloud? With our cloud-enabling of a popular genome analysis pipeline, combined with our synergistic co-design of the algorithms, software, and hardware in the genome analysis pipeline, we realized about a three-fold speed-up over the traditional client-based solution.

Q: What does the future look like?

There is big business in computing technology, whether it is explicit, as in the case of personal computers and laptops, or implicit, as in the case of smartphones, TVs or automobiles. Just look how far we have come over the past seven years with mobile devices. However, the real business isn’t in the devices themselves, it’s in the ecosystem and content that supports these devices: the electronic commerce that happens behind the scenes. In another five years, I foresee the same thing happening with cloud computing. It will become a democratized resource for the masses. It will get to the point where it will be just as easy to use storage in the cloud as it will be to flip a light switch; we won’t think twice about it. The future of computing and data lies in the cloud, and I’m excited to be there as it happens.

 

For more information about Azure HDInsight, check out the website and start a free trial today.

Detecting Hyper-Threading state

To interpret performance counters and execution statistics correctly, it is necessary to know the state of Hyper-Threading (on or off). In principle, at low overall CPU utilization, for non-parallel execution plans, it should not matter whether HT is enabled or not. Of course, DBA life is never that simple (see my other blogs on HT). The state of HT does matter at high overall utilization and in parallel execution plans, depending on the DOP. SQL Server does seem to try to allocate threads on distinct…(read more)

Columnstore Case Study #2: Columnstore faster than SSAS Cube at DevCon Security

Preamble

This is the second in a series of posts documenting big wins encountered using columnstore indexes in SQL Server 2012 & 2014. Many of these can be found in my big deck, along with details such as internals, best practices, caveats, etc. The purpose of sharing the case studies in this context is to provide an easy-to-consume quick-reference alternative. See also Columnstore Case Study #1: MSIT SONAR Aggregations.

Why Columnstore?

As stated previously, if we’re looking for a subset of columns from one or a few rows, given the right indexes, SQL Server can do a superlative job of providing an answer. If we’re asking a question which by design needs to hit lots of rows—DW, reporting, aggregations, grouping, scans, etc.—SQL Server has never had a good mechanism, until columnstore.

Columnstore indexes were introduced in SQL Server 2012, yet they’re still largely unknown. Some adoption blockers existed; columnstore was nonetheless a game changer for many apps. In SQL Server 2014, the potential blockers have been largely removed & columnstore is going to profoundly change the way we interact with our data. The purpose of this series is to share the performance benefits of columnstore & to document why columnstore is a compelling reason to upgrade to SQL Server 2014.
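
As a rough illustration of the feature itself (the table and column names below are hypothetical, not taken from the case study), here is what creating a columnstore index looks like in each version:

-- SQL Server 2012: nonclustered columnstore index. The base table becomes
-- read-only while the index exists, so data is typically loaded via partition switching.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (DateKey, CustomerKey, ProductKey, Quantity, SalesAmount);

-- SQL Server 2014: clustered columnstore index (an alternative to the above,
-- not combined with it on the same table). The whole table is stored in
-- columnar format and remains updatable.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;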

The Customer

DevCon Security provides home & business security services & has been in business for 135 years. I met DevCon personnel while speaking to the Utah County SQL User Group on 20 February 2012. (Thanks to TJ Belt (b|@tjaybelt) & Ben Miller (b|@DBADuck) for the invitation which serendipitously coincided with the height of ski season.)

The App: DevCon Security Reporting: Optimized & Ad Hoc Queries

DevCon users interrogate a SQL Server 2012 Analysis Services cube via SSRS. In addition, the SQL Server 2012 relational back end is the target of ad hoc queries; this DW back end is refreshed nightly during a brief maintenance window via conventional table partition switching.

SSRS, SSAS, & MDX

Conventional relational structures were unable to provide adequate performance for user interaction with the SSRS reports. An SSAS solution was implemented, requiring personnel to ramp up technically, including learning enough MDX to satisfy requirements.

Ad Hoc Queries

Even though the fact table is relatively small—only 22 million rows & 33 GB—the table was a typical DW table in terms of its width: 137 columns, any of which could be the target of ad hoc interrogation. As is common in DW reporting scenarios such as this, it is often nearly impossible to optimize for such queries using conventional indexing.
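
To give a feel for the workload (the column names here are assumptions for illustration, not DevCon’s actual schema), a typical ad hoc interrogation scans and aggregates millions of rows over an arbitrary handful of those 137 columns:

SELECT Region,
       AlarmType,
       COUNT(*) AS EventCount,
       AVG(ResponseMinutes) AS AvgResponseMinutes
FROM dbo.FactSecurityEvent
WHERE EventDate >= '20130101'
GROUP BY Region, AlarmType;

-- With conventional rowstore indexes it is impractical to cover every such
-- column combination; a columnstore index answers these scan-and-aggregate
-- queries directly from compressed column segments.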

DevCon DBAs & developers attended PASS 2012 & were introduced to the marvels of columnstore in a session presented by Klaus Aschenbrenner (b|@Aschenbrenner).

The Details

Classic vs. columnstore before-&-after metrics are impressive.

Scenario       | Conventional Structures           | Columnstore   | Δ
SSRS via SSAS  | 10 – 12 seconds                   | 1 second      | >10x
Ad Hoc         | 5 – 7 minutes (300 – 420 seconds) | 1 – 2 seconds | >100x

Here are two charts characterizing this data graphically.  The first is a linear representation of Report Duration (in seconds) for Conventional Structures vs. Columnstore Indexes. 

[Chart: Report Duration in seconds, Conventional Structures vs. Columnstore Indexes, linear scale]

As is so often the case when we chart such significant deltas, the linear scale doesn’t expose some of the dramatically improved values corresponding to the columnstore metrics. Just to make it fair, here’s the same data represented logarithmically; yet even here the values corresponding to 1 – 2 seconds aren’t visible.

[Chart: the same data on a logarithmic scale]

The Wins

  1. Performance: Even prior to the columnstore implementation, canned report performance of 10 – 12 seconds against the SSAS cube was tolerable. Yet the 1 second performance afterward is clearly better. As significant as that is, imagine the user experience for ad hoc interrogation. The difference between several minutes and one or two seconds is a game changer, literally changing the way users interact with their data: no mental context switching, no wondering when the results will appear, no preoccupation with the spinning, mind-numbing hurry-up-&-wait indicators. As we’ve commonly found elsewhere, columnstore indexes here provided performance improvements of one, two, or more orders of magnitude.
  2. Simplified Infrastructure: Because in this case a nonclustered columnstore index on a conventional DW table was faster than an Analysis Services cube, the entire SSAS infrastructure was rendered superfluous & was retired.
  3. PASS Rocks: Once again, the value of attending PASS is proven out. The trip to Charlotte combined with eager & enquiring minds led directly to this success story. Find out more about the next PASS Summit here, hosted this year in Seattle on November 4 – 7, 2014.

DevCon BI Team Lead Nathan Allan provided this unsolicited feedback:

“What we found was pretty awesome. It has been a game changer for us in terms of the flexibility we can offer people that would like to get to the data in different ways.”

Summary

For DW, reporting, & other BI workloads, columnstore often provides significant performance enhancements relative to conventional indexing. In this post, the second in a series of reports on columnstore implementations, I have documented results from DevCon Security, a live customer production app: performance increased by factors of 10x to 100x across all report queries, with canned reports dropping from 10 – 12 seconds to 1 second and ad hoc queries dropping from 5 – 7 minutes to 1 – 2 seconds. As a result of columnstore performance, the customer retired their SSAS infrastructure.

I invite you to consider leveraging columnstore in your own environment. Let me know if you have any questions.