Category Archives: Business Intelligence

A Biml Case Study

Business Intelligence Markup Language (Biml) is a SQL Server Integration Services (SSIS) design patterns engine.

A Story of Productivity

A while back I was contacted by a client experiencing a familiar issue. The Production SQL Server instance was experiencing performance issues due to conflicts between customers and enterprise reporting needs. They contacted a friend who contacted me. When we spoke they asked, “Can you help?” “Yes,” I replied, “I can help.” We scheduled a three-day visit. On Day…(read more)

PASS Summit 2014 Pre-Con Preview: Davide Mauri

If you’re into Data Warehousing, you may be interested in attending the workshop I’ll deliver at PASS Summit 2014 in Seattle on November 4th.

http://www.sqlpass.org/summit/2014/Sessions/PreConferenceSessions.aspx

The workshop is entirely dedicated to explaining why and how a *successful* Data Warehouse can be conceived, designed, architected, built, loaded and tested using the Agile approach that, so far, has mainly been applied to application development and in the last year has (finally, I would say) gained traction in the BI field as well. Both Gartner and Forrester underline that Agile is a key factor for success in the modern BI world, since it has been verified that 50% of requirements change in the first year of a BI project.

If you want to read more about the workshop, you can read the Q&A just published here:

http://www.sqlpass.org/Community/PASSBlog/tabid/1476/entryid/676/PASS-Summit-2014-Pre-Con-Preview-Davide-Mauri.aspx

In addition, I’d also like to share the agenda of the workshop, which will give you even more information on what we’ll discuss that day:

  • Why a Data Warehouse?
  • The Agile Approach
  • Modeling the Data Warehouse
    • Kimball, Inmon & Data Vault
    • Dimensional Modeling
    • Dimension, Fact, Measures
    • Star & Snowflake Schema
    • Transactional, Snapshot and Temporal Fact Tables
    • Slowly Changing Dimensions
  • Engineering the Solution
    • Building the Data Warehouse
      • Solution Architecture
      • Naming conventions, mandatory columns and other rules
      • Views and Stored Procedure usage
    • Loading the Data Warehouse
      • ETL Patterns
      • Best Practices
    • Automating Extraction and Loading
      • Making the solution automatable
      • BIML
  • Unit Testing Data
  • The Complete Picture
    • Where does Big Data come into play?
  • After the Data Warehouse
    • Optimized Hardware & Software
  • Conclusions

As you can see, it will be a fully packed day…so bring two cups of coffee and you’ll be good 🙂

See you in Seattle!

PASS SQL Saturday #356 Slovenia Pre-Conference Seminars

I am proud and glad to announce two top pre-conference seminars at the PASS SQL Saturday #356 Slovenia conference. The speakers and seminar titles are:

Both seminars will take place on Friday, December 12th, in the classrooms of our sponsor Kompas Xnet. The price for a seminar is € 149, with an early bird price of € 119, valid until October 31st.

I am also using this opportunity to explain how and why we decided on these two seminars. The decision was made by the conference organizers: Matija Lah, Mladen Prajdič, and Dejan Sarka. There has been a lot of discussion on social networks about PASS Summit pre-conference seminars lately. If you have any objections to our seminars, please do not start big discussions in public; please raise them with the three of us directly.

First of all, unlike at the PASS Summit seminars, the speakers are not going to earn big money here, and therefore it is not really worth spending much time and energy on our decision. We think that any of the speakers who sent proposals for our SQL Saturday could present a top-quality seminar. We would like to enable seminars for every speaker who wants to deliver one. However, in a small country, we will already have a hard time filling up the two seminars we currently have. Our intention is to reimburse at least part of the money the speakers spent out of their own pockets on travel and accommodation. In our opinion, it makes sense to do this for the speakers who spent the most on travel. Coming here from the USA is expensive, and the trip also takes three days in each direction. That’s why we decided to organize the seminars for the first two speakers from the USA.

Of course, this is not the last event. If everything goes well with SQL Saturday #356 and with the seminars, we will definitely try to organize more events in the future, and invite more speakers to deliver seminars as well.

Thank you for understanding!

24 Hours of PASS (September 2014): Recordings Now Available!

Sessions of the event 24 Hours of PASS: Summit Preview Edition (which was held on September 9th) were recorded and are now available for online streaming!

If you missed one session in particular or the entire event, you can watch or re-watch your preferred sessions; you can find all the details here.

What can you expect from the next PASS Summit? Find out in the recorded sessions of this edition of 24 Hours of PASS.

Announcing the PASS 24 HOP Challenge

Calling all data junkies! How smart are you?  Want to get smarter?

Play along with the #pass24hop Challenge on Twitter starting at 5:00 AM PT Tuesday, September 9, 2014 to win a free Microsoft Exam Voucher! Simply watch 24 Hours of PASS and be the first to answer the question correctly. At the beginning of each of the 24 live 24 Hours of PASS sessions (approximately 5-8 minutes into each hour), a new question regarding the session will be posted on the @SQLServer Twitter account. The first tweet with the correct answer will win a prize. Your answer must include the hashtags #pass24hop and #24hopquiz.

To take part in the #pass24hop Challenge, you must:

  1. Sign in to your Twitter account. If you do not have an account, visit www.twitter.com to create one. Twitter accounts are free.
  2. Once logged into your Twitter account, follow the links and instructions to become a follower of @SQLServer.
  3. From your own account, reply with your response to the question tweeted by @SQLServer.
  4. Your tweet must contain both the #pass24hop and #24hopquiz hashtags to be eligible for entry.
  5. Your tweet must include the complete answer to the question, or it will be disqualified.
  6. The first person to tweet a correct reply to the corresponding question will win the prize described below.

Register now for 24 Hours of PASS and get ready for 24 hours of play!  

To learn more about 24 Hours of PASS, read the official rules below.

 

NO PURCHASE NECESSARY. COMMON TERMS USED IN THESE RULES:

These are the official rules that govern how the ’24 Hours of PASS Social Media Answer & Question Challenge (“Sweepstakes”) promotion will operate. This promotion will be simply referred to as the “Sweepstakes” throughout the rest of these rules. In these rules, “we,” “our,” and “us” refer to Microsoft Corporation, the sponsor of the Sweepstakes. “You” refers to an eligible Sweepstakes entrant.

WHAT ARE THE START AND END DATES?

This Sweepstakes starts at 5:00 AM PT Tuesday, September 9, 2014 and ends at 7:00 AM PT Wednesday, September 10, 2014 (“Entry Period”). The Sweepstakes consists of 24 prizes. Each Prize Period will begin immediately following each of the 24 sessions and run for 60 minutes.

CAN I ENTER?

You are eligible to enter this Sweepstakes if you meet the following requirements at time of entry:

· You are a professional or enthusiast with expertise in SQL Server or Business Intelligence and are 18 years of age or older; and

o If you are 18 years of age or older, but are considered a minor in your place of residence, you should ask your parent’s or legal guardian’s permission prior to submitting an entry into this Sweepstakes; and

· You are NOT a resident of any of the following countries: Cuba, Iran, North Korea, Sudan, and Syria.

PLEASE NOTE: U.S. export regulations prohibit the export of goods and services to Cuba, Iran, North Korea, Sudan and Syria. Therefore, residents of these countries/regions are not eligible to participate.

• You are NOT an employee of Microsoft Corporation or an employee of a Microsoft subsidiary; and

• You are NOT involved in any part of the administration and execution of this Sweepstakes; and

• You are NOT an immediate family (parent, sibling, spouse, child) or household member of a Microsoft employee, an employee of a Microsoft subsidiary, or a person involved in any part of the administration and execution of this Sweepstakes.

This Sweepstakes is void wherever prohibited by law.

HOW DO I ENTER?  

At the beginning of each of the 24 live 24 Hours of PASS sessions (approximately 5-8 minutes into each hour), a new question regarding the session will be posted on the @SQLServer Twitter account. The first tweet with the correct answer will win a prize. Your answer must include the hashtags #pass24hop and #24hopquiz. Failure to use these hashtags will automatically disqualify you.

To enter, you must do all of the following:

  1. Sign in to your Twitter account. If you do not have an account, visit www.twitter.com to create one. Twitter accounts are free.
  2. Once logged into your Twitter account, follow the links and instructions to become a follower of @SQLServer.
  3. From your own account, reply with your response to the question tweeted by @SQLServer.
  4. Your tweet must contain both the #pass24hop and #24hopquiz hashtags to be eligible for entry.
  5. Your tweet must include the complete answer to the question, or it will be disqualified.
  6. The first person to tweet a correct reply to the corresponding question will win the prize described below.

Limit one entry per person, per session. For the purposes of these Official Rules, a “day” begins at 5:00 AM PT Tuesday, September 9, 2014 and ends at 7:00 AM PT Wednesday, September 10, 2014 (“Entry Period”). If you reply with more than one answer per session, all replies received from you for that session will be automatically disqualified. You may submit one answer to each session, but will be eligible to win only one prize within the 24-hour contest period.

We are not responsible for entries that we do not receive for any reason, or for entries that we receive but are not decipherable for any reason, or for entries that do not include your Twitter handle.

We will automatically disqualify:

  • Any incomplete or illegible entry; and
  • Any entries that we receive from you that do not meet the requirements described above.

WINNER SELECTION AND PRIZES

The first person to correctly respond will receive a Microsoft Exam Voucher. Approximate Retail Value: $150 each. A total of twenty-four prizes are available.

Within 48 hours following the Entry Period, we, or a company acting under our authorization, will select one winner per session to win one free Microsoft Certification Exam. The voucher has a retail value of $150. Prize eligibility is limited to one prize within the contest period. If you are selected as a winner for a session, you will be ineligible for additional prizes for any other session. In the event that you are the first to answer correctly in multiple sessions, the prize will go to the next person with the correct answer.

If there is a dispute as to who is the potential winner, we reserve the right to make the final decision on who is the winner, based on the accuracy of the answer provided, whether the rules on including hashtags were followed, and the times the answers arrived as listed on www.twitter.com.

Selected winners will be notified via a Direct Message (DM) on Twitter within 48 business hours of the daily drawing. The winner must reply to our Direct Message (DM) within 48 hours of notification via DM on Twitter. If the notification that we send is returned as undeliverable, or you are otherwise unreachable for any reason, or you do not respond within 48 business hours, we will award the prize to a randomly selected alternate winner. Only one alternate winner will be selected and notified; after that, if unclaimed, the prize will remain unclaimed.

If you are a potential winner, we may require you to sign an Affidavit of Eligibility, Liability/Publicity Release within 10 days of notification. If you are a potential winner and you are 18 or older, but are considered a minor in your place of legal residence, we may require your parent or legal guardian to sign all required forms on your behalf. If you do not complete the required forms as instructed and/or return the required forms within the time period listed on the winner notification message, we may disqualify you and select an alternate winner.

If you are confirmed as a winner of this Sweepstakes:

  • You may not exchange your prize for cash or any other merchandise or services. However, if for any reason an advertised prize is unavailable, we reserve the right to substitute a prize of equal or greater value; and
  • You may not designate someone else as the winner. If you are unable or unwilling to accept your prize, we will award it to an alternate potential winner; and
  • If you accept a prize, you will be solely responsible for all applicable taxes related to accepting the prize; and
  • If you are otherwise eligible for this Sweepstakes, but are considered a minor in your place of residence, we may award the prize to your parent/legal guardian on your behalf.

WHAT ARE YOUR ODDS OF WINNING? 
There will be 24 opportunities to respond with the correct answer. Your odds of winning this Challenge depend on the number of responses and on being the first to reply with the correct answer.

WHAT OTHER CONDITIONS ARE YOU AGREEING TO BY ENTERING THIS CHALLENGE? 
By entering this Challenge you agree:

· To abide by these Official Rules; and

· To release and hold harmless Microsoft, and its respective parents, subsidiaries, affiliates, employees and agents from any and all liability or any injury, loss or damage of any kind arising from or in connection with this Challenge or any prize won; and

· That Microsoft’s decisions will be final and binding on all matters related to this Challenge; and

· That by accepting a prize, Microsoft may use your proper name and state of residence online and in print, or in any other media, in connection with this Challenge, without payment or compensation to you, except where prohibited by law.

WHAT LAWS GOVERN THE WAY THIS CHALLENGE IS EXECUTED AND ADMINISTRATED? 
This Challenge will be governed by the laws of the State of Washington, and you consent to the exclusive jurisdiction and venue of the courts of the State of Washington for any disputes arising out of this Challenge.

WHAT IF SOMETHING UNEXPECTED HAPPENS AND THE CHALLENGE CAN’T RUN AS PLANNED? 
If cheating, a virus, bug, catastrophic event, or any other unforeseen or unexpected event that cannot be reasonably anticipated or controlled (also referred to as force majeure) affects the fairness and/or integrity of this Challenge, we reserve the right to cancel, change or suspend this Challenge. This right is reserved whether the event is due to human or technical error. If a solution cannot be found to restore the integrity of the Challenge, we reserve the right to select winners from among all eligible entries received before we had to cancel, change or suspend the Challenge. If you attempt to compromise the integrity or the legitimate operation of this Challenge by hacking or by cheating or committing fraud in ANY way, we may seek damages from you to the fullest extent permitted by law. Further, we may ban you from participating in any of our future Challenges, so please play fairly.

HOW CAN YOU FIND OUT WHO WON? 
To find out who won, send an email to v-daconn@microsoft.com by September 15, 2014 with the subject line: “SQL Server QQ Winners”.

WHO IS SPONSORING THIS CHALLENGE? 
Microsoft Corporation 
One Microsoft Way 
Redmond, WA 98052

On Agile Data Warehousing

In the last month, I’ve been working on the slide deck for my Agile Data Warehousing workshop. In order to give it additional value that goes beyond the pure technical aspects, and since the “Agile” approach is now becoming more and more mainstream in BI too (and finally!), I did some research to check what one can find on the web regarding this topic. Many things have happened since the very first time I presented at PASS 2010, where I first mentioned the need to be Agile (or “Adaptive”, as I prefer to say when talking about BI & Agility). In 2011, at their BI Summit, Gartner stated through the voice of Andy Bitterer that

50% of requirements change in the first year of a BI project

and, as a result, the only possible way to succeed in a BI project is to be able to adapt quickly to new requirements and requests. The doors to Agile BI were opened.

Agile BI has grown from that point on, to the point that Forrester even started to evaluate Agile Business Intelligence Platforms, naming Microsoft one of the Leaders:

Microsoft named a Leader in Agile Business Intelligence by Forrester

I must say I’m not 100% on board with the definition of Agile BI that Forrester gives, since it lumps together too many things (Data Visualization, Automation, Self-Service BI, just to name a few), but I understand that they see things from the end-user perspective: the user simply wants to “do everything, immediately, easily and nicely” with their data. There is also a definition on Wikipedia (page created in January 2013) that is better, more complete and less marketing-oriented:

http://en.wikipedia.org/wiki/Agile_Business_Intelligence

Besides those definitions, terms like Agile BI and Lean BI have become quite common. Of course, with them came the ideas of Agile Project Management and Agile Modeling. This latter subject in particular seems to be very hot and, of course, is something close to my interests. Now, I don’t want to go into a deep discussion of the topic, telling you what is good and what is bad. There is already a lot on the web for or against any possible modeling solution. Data Vault, BEAM, Model-Storming…a simple search on the web and you’ll find thousands of articles. Which is the best? Should we go for Data Vault? Or for an Inmon-style DWH? Or Kimball? Or something else?

Well…I don’t really care. Or, to be honest, I care just a little bit.

Now, since “ideas without execution are hallucinations”, and models are ideas after all, it’s my strong opinion that you don’t model to be agile: you “engineer” to be agile. Why? It’s simple: all models are agile…since they are models, and nothing more. It’s not a problem to change a model, since it’s “just” a formal definition of a system (of course, I’m taking the idea to the extreme here). And since we’re assuming that business requirements will keep changing, you know in advance that no model exists that will satisfy them all immediately (yes, you can try to model the “Universal Data Model”, but it’s going to be *very* complex…). So, the main point is to be able to make changes quickly, with measurable quality, in a controlled and economic way.

We all know that the one and only model that should be presented to the end user is the Dimensional Model. This is how your Data Mart should look. But how you model your Data Warehouse is completely up to you. And it will change over time, for sure. So how you implement the process to extract, transform and load the data is the key point. That implementation must be agile. What lies behind the scenes, following the information-hiding principle, should be considered a simple “technical enabler” that could change at any time. So, if one prefers to use Data Vault, or Inmon, or to just store everything on some hundreds of Hadoop servers…I don’t see any problem with that. As long as you have defined an engineered approach with naming conventions, design pattern automation, quality checks, metadata and all the other elements needed to make sure that when you have to change something, you can make the smallest change possible, measure its impact, and test the result.

I’ve been trying to apply Agile principles to BI since 2003…I’ve been through every possible change you can imagine (even a complete change of the ERP that was the main source of data), and the most important thing I’ve learned is that the only model that works is one that is liquid and able to adapt quickly to changing requirements. I usually start modeling in the easiest way possible, and thus I apply the Dimensional Model, and then I make all the changes needed to keep

  • All the data at the highest granularity
  • Optimal performances
  • Historical information (which may not be visible to the end user, but may be needed to correctly transform data)

Which, for a complex DWH, means that at the beginning the DWH and the Data Mart overlap, and that they diverge as the project goes on. In one project we even decided to go for a normalized model of the data, since the DWH became the source not only for reporting and analysis but also for other, more operational, duties.

Now, in order to be really agile, it’s mandatory to have an engineered approach that makes sure the project doesn’t slip from agility into anarchy. Because this is the biggest risk. The line that separates the two realities is very thin, and crossing it is very easy. When you have a team of people, either they work as one, or Agile BI is not for you. Otherwise chaos will reign. And to make sure this does not happen, you have to have a streamlined building process, tools and methods (design patterns, frameworks and so on) so that everyone can technically do a good job and the technical quality of the outcome is not merely proportional to the experience of the person doing it.

It’s really important that everyone who wants to approach Agile BI understands the “engineering” part. I have always found it underestimated, and in all the posts I’ve found on the web, I never read anyone stressing the importance of that part. That’s why I felt the urge to write this post, and that’s why I’ll go very deep into this topic during my PASS Workshop.

Now, before finishing the post, there is still one thing missing, but vital, for the success of an Agile BI solution: testing. Agility cannot exist if you don’t have an automated (or semi-automated) testing framework that assures you and your users that no errors will be introduced into the data as a result of a change made to satisfy some new or changed requirement. This is mandatory, and I’m quite disappointed to see that almost no one underlines this point enough. Forrester didn’t even take this point into consideration when evaluating the existing “Agile BI Platforms”. That’s a very big mistake in my opinion…since everyone takes data quality for granted, but it’s actually the most difficult thing to obtain and maintain.

Testing frameworks are quite common in development (even Visual Studio has a testing engine integrated), and they should become common in BI too. Something is starting to appear (https://nbi.codeplex.com/), but I wish the big players (Microsoft above all) would start to take this subject more seriously. How cool and useful would a strong integration of testing in SSIS be? After DWH/DM/Cube loading, one could launch all the tests (maybe authored right from Excel by a power user, or even created automatically if certain conditions are met…say the generation of a year balance) and make sure that the freshly produced data is of good quality.
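
As a minimal sketch of what such an automated data test could look like (all database, schema and table names here are hypothetical, purely for illustration), a post-load T-SQL check might simply compare row counts between the source and the data warehouse and fail the load on a mismatch:

```sql
-- Minimal post-load data test (all object names are hypothetical).
-- Raises an error, failing the ETL run, when source and DWH row counts diverge.
DECLARE @src BIGINT = (SELECT COUNT_BIG(*) FROM SourceDb.dbo.Orders);
DECLARE @dwh BIGINT = (SELECT COUNT_BIG(*) FROM Dwh.dbo.FactOrders);

IF @src <> @dwh
    RAISERROR('Row count mismatch: source=%I64d, dwh=%I64d', 16, 1, @src, @dwh);
```

A real testing framework would collect many such assertions (row counts, sums of measures, referential checks) and report them together, which is exactly what tools like NBi aim to do.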

Just like water. Because data IS water. I won’t drink it if not tested.

SSIS Design Patterns Training in London 8-11 Sep!

A few seats remain for my course SQL Server Integration Services 2012 Design Patterns, to be delivered in London 8-11 Sep 2014. Register today to learn more about:

  • New features in SSIS 2012 and 2014
  • Advanced patterns for loading data warehouses
  • Error handling
  • The (new) Project Deployment Model
  • Scripting in SSIS
  • The (new) SSIS Catalog
  • Designing custom SSIS tasks
  • Executing, managing, monitoring, and administering SSIS in the enterprise
  • Business Intelligence Markup Language (Biml)
  • BimlScript
  • ETL Instrumentation

…(read more)

24 Hours of PASS (September 2014): Summit Preview Edition

Which sessions can you expect to find at the next PASS Summit 2014? Find out on September 9, 2014 (12:00 GMT) at the free online event: 24 Hours of PASS: Summit Preview Edition.

Register now at this link.

No matter what part of the world you follow the event from, the important thing to know is that there will be 24 hours of continuous training on SQL Server and Business Intelligence on your computer!

24 Hours of PASS (June 2014): Recordings Now Available!

Sessions of the event 24 Hours of PASS: SQL Server 2014 (which was held last June 25th and 26th) were recorded and are now available for online streaming!

If you missed one session in particular or the entire live event, you can watch or re-watch your preferred sessions here.


What can you expect from the next PASS Summit? Find out on September 9 at 24 Hours of PASS: Summit 2014 Preview Edition!

Updateable Column Store Indexes in SQL Server 2014

Introduction

Column store indexes were released with SQL Server 2012 to optimize data warehouse workloads that follow a specific pattern: data is loaded through T-SQL scripts or SSIS packages, perhaps several times a day (the exact frequency is not important; what matters is that the available data is loaded in the same execution). At the end of the ETL process, data is read with reporting tools. Usually data is written once, then read multiple times.

In SQL Server 2012 there was only the non-clustered column store index; like a traditional B-Tree non-clustered index, it is a secondary index. However, it differs from a traditional B-Tree index because it is based on a columnar structure, while the base table remains organized by row (in a structure called a row store, saved in 8 KB data pages).

Column store indexes are part of the Microsoft In-Memory Technologies because they use the xVelocity engine for data compression optimization, and their implementation is based on a columnar structure like PowerPivot and SSAS Tabular. Data in a column store index is organized by column: each memory page stores data from a single column, so each column can be accessed independently. This means that the SQL Server Storage Engine will be able to fetch only the columns it needs. In addition, data is highly compressed, so more data will fit in memory and I/O operations can decrease dramatically.
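
As a quick sketch of the difference between the two versions (the table and index names below are hypothetical, used only for illustration), SQL Server 2012 lets you add a read-only non-clustered column store index on top of a row-store table, while SQL Server 2014 also allows an updateable clustered column store index that replaces the row store entirely:

```sql
-- Hypothetical fact table used for illustration.
CREATE TABLE dbo.FactSales
(
    SaleDate  DATE  NOT NULL,
    ProductID INT   NOT NULL,
    Quantity  INT   NOT NULL,
    Amount    MONEY NOT NULL
);

-- SQL Server 2012: non-clustered column store index
-- (a secondary index; it makes the table read-only).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
    ON dbo.FactSales (SaleDate, ProductID, Quantity, Amount);

-- SQL Server 2014: clustered column store index; the table itself becomes
-- columnar and remains updateable. In 2014 it cannot coexist with other
-- indexes, so the non-clustered one is dropped first.
DROP INDEX NCCI_FactSales ON dbo.FactSales;
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;
```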

Column store indexes structure

Before talking about the new features of column store indexes in SQL Server 2014, it is important to introduce three keywords: Segment, Row Group and Compression. In a column store index, a segment contains the values of one column for a particular set of rows called a row group. As you can see in the following picture, each red and gray portion is a segment. When you create a column store index, the rows in the table are divided into groups, and each row group contains about 1 million rows (the exact number of rows in each row group is 1,048,576; in other words, there are 2^20 rows in each row group). The column store transforms the internal index organization from row organization to columnar organization, and there will be one segment for each column and for each row group. Column store indexes are part of the Microsoft In-Memory technologies in which data is compressed, and compression plays a very important role, so each segment is compressed and stored in a separate LOB.
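
If you want to inspect row groups and segments on your own system, the catalog views expose them. The sketch below assumes a hypothetical table dbo.FactSales that already has a column store index:

```sql
-- Row groups of the column store index on a hypothetical dbo.FactSales:
-- about 1,048,576 rows each, with their size on disk.
SELECT row_group_id, total_rows, size_in_bytes
FROM   sys.column_store_row_groups
WHERE  object_id = OBJECT_ID('dbo.FactSales');

-- One segment per column per row group, for every
-- column store index in the current database.
SELECT column_id, segment_id, row_count, on_disk_size
FROM   sys.column_store_segments;
```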

This article does not detail the algorithms used to compress data in a column store index. At any rate, keep in mind that each segment stores the same type of data, so within a segment there will be homogeneous data; in this scenario, the compression algorithms will be more efficient than those used to compress table rows, because a row usually contains different types of data. In general, data compression can be implemented using different techniques such as:

  • Dictionary Encoding
  • Run-length Encoding
  • Bit Packing
  • Archival Compression (only in SQL Server 2014)
    • It can reduce disk space up to 27%

The techniques used by SQL Server to compress data are undocumented.
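
Archival compression, for instance, is enabled per index (or per partition) at rebuild time. A sketch on a hypothetical index (the index and table names are illustrative only):

```sql
-- Rebuild with archival compression (SQL Server 2014+);
-- trades extra CPU for additional disk space savings.
ALTER INDEX CCI_FactSales ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Revert to the default column store compression:
ALTER INDEX CCI_FactSales ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE);
```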

The following picture shows an example of row groups and segments in a column store index… continue reading the full article here.

 

Thanks to my friend Viviana Viola for the grammar review.