Monthly Archives: March 2014

Calculate Distinct Count in a Group By operation in Power Query #powerquery #powerbi

The current version of Power Query does not have a user interface to create a Distinct Count calculation in a Group By operation. However, you can do this in “M” with a simple edit of the code generated by the Power Query window.

Consider the following table in Excel:

DistinctPowerQuery_01

You want to obtain a table containing the number of distinct products bought by every customer. You create a query starting from a table

DistinctPowerQuery_02

You keep in the query only the columns required for the group by and the distinct count calculation, removing the others. For example, select Products and Customers and right-click the Remove Other Columns menu choice.

DistinctPowerQuery_03

Select the Customer column and click the Group By transformation. You see a dialog box that by default creates a count rows column.

DistinctPowerQuery_04

This query counts how many transactions have been made by each customer, and you don’t have a way to apply a distinct count calculation. At this point, simply change the query from this:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    R
emovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
    GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(_), type number}})
in
    GroupedRows

To this:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    RemovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
    GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(Table.Distinct(_)), type number}})
in
    GroupedRows

The Table.RowCount function counts how many rows exist in the group. By calling Table.DistinctCount here, you reduce the number of rows in the table to a list of distinct count values, returning a correct value.

DistinctPowerQuery_05

I hope Power Query team will implement a distinct count option in the user interface. In the meantime, you can apply this easy workaround.

Demos: SQL Saturday Boston (#262) – Clash of the Row Goals!

Thanks to everyone who joined me yesterday afternoon in Cambridge for my Clash of the Row Goals talk. It was great to receive such a welcome reception for a rather out there topic. Three things I should clarify for those who attended: The Many() function appeared to create an invalid estimate in one case; but that was due to auto-parameterization (which one of the attendees in the front row correctly called out at the time; but I’m not sure everyone heard her) Regarding the demo that failed at the…(read more)

SQL Saturday #262 : Boston

Today I gave my T-SQL : Bad Habits & Best Practices session at SQL Saturday #262 at Microsoft in Cambridge. Fun event with a lot of friendly faces. A few mishaps along the way: had to wake up at 5:00 AM to take the only sensible commuter rail option to get me there on time; spilled, then proceeded to smear, bagel droppings all over my pants on said train; blew up the projector (literally!) pressing F5 during one of my demos; had to proceed with just hand-waving and a little interpretive dance…(read more)

BI Beginner: Power Query and OData

I’ve observed that not very many data professionals are familiar with OData . It’s an open data access protocol built on AtomPub and JSON . It provides a REST ful means of retrieving data, which is what this post is about. One of the things that makes OData important is that it is recommended by the Open Government Data Initiative. Although this post uses the familiar SQL Server AdventureWorks database, remember, OData is open. It’s not limited to SQL Server or even SQL databases in general. A small…(read more)

Unlocking insights from Big Data can be simple with the right tools

Data enthusiasts & Microsoft talk about Big Data in the enterprise at Gigaom Structure Data 2014

Last week in New York, our data solutions experts spent a few days with more than 900 big data practitioners, technologists and executives at Structure Data for a conversation how big data can drive business success.

The rich conversations with attendees at the event were inspiring, and the broad range of speakers was impressive.  Our discussions over the two days in New York centered on what the Big Data solution looks like inside an enterprise and the challenges around accessing and processing big data to make better data-driven decisions. 

Structure Data attendees want to combine data from multiple sources to do a couple key things — to gain deeper insights and to ask new questions and get answers.  However, without the right technology to support that, it can be very challenging to do this.  That's where Microsoft comes in — and where we continued the dialog with attendees as our data experts used a huge Microsoft touchscreens to show how easy it can be to transform big data to insights using simple, front-end tools (like Excel or Power BI for Office 365) and back-end technology for scale, power and speed (like Windows Azure HDInsight and SQL Server). 

Microsoft Research Distinguished Scientist John Platt also spoke at Structure Data and shared the latest on our work in machine learning, which is quite pervasive throughout many Microsoft products. If you missed it, take a moment to watch the short chat here

Our data experts also gave attendees an insiders’ view at how Microsoft’s Cybercrime Center is using data to fight worldwide organized crime and BotNets. (See the video below for more.) 

Take the first step and learn more about Microsoft Big Data solutions:

Or, connect with us on Facebook.com/sqlserver and Twitter @SQLServer and learn how Microsoft’s approach to data, big and small, helps employees, IT professionals and data scientists quickly transform data into insight and action.

And don't forget about the April 15th Accelerate your insights event where Microsoft will unveil the details of new capabilities for the appliance that has both scale-out relational data warehouse and Hadoop in the same box, thus evolving PDW from a solution built for high performance relational data warehousing to a true turnkey Big Data Analytics appliance.

Truncate Table – DDL or DML Statement?

Many times, categories of concepts and things overlap. It can be hard to categorize some items in a single category. The SQL TRUNCATE TABLE statement is an example of an item that is not so easy to categorize. Is it a DDL (Data Definition Language) or DML (Data Manipulation Language) statement?

There is an ongoing discussion about this topic. However, if you quickly bingle for this question, you get the impression that the majority is somehow leaning more toward defining the TRUNCATE TABLE statement as a DDL statement. For example, Wikipedia clearly states: “In SQL, the TRUNCATE TABLE statement is a Data Definition Language (DDL) operation that marks the extents of a table for deallocation (empty for reuse).” Disclaimer: please note that I do not find Wikipedia as the “ultimate, trustworthy source” – I prefer sources that are signed!

Some of the reasons why many people define the statement as a DDL statement include:

  • It requests schema locks in some systems
  • It is not possible to rollback it in some systems
  • It does not include a WHERE clause
  • It does not fire triggers in some systems
  • It resets the autonumbering column value in some systems
  • It deallocates system pages directly, not through an internal table operation
  • and more.

On the other hand, it looks like there is only one reason to treat the statement as a DML statement:

  • Logically, you just get rid of the data, like with the DELETE statement.

Even the Wikipedia article that I referred to says “The TRUNCATE TABLE mytable statement is logically (though not physically) equivalent to the DELETE FROM mytable statement (without a WHERE clause).”

Like many times, I have to disagree with the majority. I understand that the categorization is somehow confusing, and might even be overlapping. However, the only reason for categorizing the TRUNCATE TABLE statement in the DML category is “THE” reason in my understanding. One of the most important ideas in the Relational Model is the separation between the logical and the physical level. We, users, or people, if you wish, are manipulating with data on the logical level; the physical implementation is left to the database management system. And this is the important part – logically, when you truncate table, you don’t care how this statement is implemented internally, you just want to get rid of the data. It really does not matter what kind of locks a system uses, does it allow WHERE clause or not, etc. The logical point is what matters. Therefore, I would categorize the TRUNCATE TABLE statement as a DML statement.

Of course, this is a purely theoretical question, and is really not important for your practical implementation. As long as your app is doing what it should do, you don’t care too much about these nuances. However, IMO in general there is not enough of theoretical knowledge spread around, and therefore it makes sense to try to get the correct understanding.

But there is always a “but”. Of course, I have immediately another question. What about the ALTER TABLE mytable SWITCH [PARTITION…] TO… statement? ALTER statements have been defined as DDL statements forever. however, again, logically you are just moving the data from one table to another. Therefore – what? What do you think?

Join us on April 15 and Accelerate Your Insights

Mark your calendar now to join us online on April 15 for the Accelerate your insights event, streaming live from San Francisco, California at 10:00 AM PDT.

Wondering what your data can do for you? Join us to online find out how to drive your business in real-time, from apps to insights. You’ll hear from several of Microsoft’s top executives, including Chief Executive Officer Satya Nadella, Chief Operating Officer Kevin Turner, and Corporate Vice President of the Data Platform Group, Quentin Clark.

Save the date to watch the keynotes streamed live on 4/15:

–        Mark your calendar

–        RSVP on Facebook

Join us as we share Microsoft’s data platform vision, and how the latest advancements in our data technologies can help you transform data into action.

See you there.

Simple Text Mining with the SSIS Term Extraction Component

Advanced analytics, big data, data science; they’re all hyped topics nowadays. They’re not new however, and one specific subject area has always had some presence in Microsoft’s ETL tool: text mining. Integration Services offers two transformations that can help you to do some rudimentary analysis on text: the Term Extraction and the Term Lookup components. This tip will introduce you to the Term Extraction component using a simple use case.