Category Archives: Power Query

Power Query now connects to Analysis Services #ssas #mdx #dax

The last update of Power Query finally supports Analysis Services Multidimensional and Tabular. I waited this version for a very long time, but now it’s finally here!

Chris Webb already wrote an interesting blog post with several useful link and informations. 

You can connect to both Multidimensional and Tabular, but queries are generated in MDX. For this reason, I consider Multidimensional as a primary source by now. Many improvement can be done for multidimensional, whereas Tabular would benefit from DAX support at first.

I want to share my feedback and I already look forward to future improvements: please use the comment to this blog post to share your priorities for these features I would like to see.

Expose Keys

Each Hierarchy in Multidimensional has one or more levels, corresponding to dimension attributes.
Each attribute has up to three properties:

  • Key
    • Can be multipart, which means it has one or more columns, each column can have a different data type
  • Name
    • This is always a string. If not defined, it corresponds to the Key, which must have a single column. If the attribute has multipart key, the Name has to be defined in the Multidimensional model
  • Value
    • This property is optional and is not used often. However, it could represent the name in a numeric value and it’s used when the key cannot be used to represent the corresponding value. This property is not shown to the user but can be used in MDX calculations.

The Name is the one shown in the user interface of power Query. However, when you extract data from a cube, most of the times you need the key column(s) in order to create relationships with other query. For example, a very common scenario is creating three tables, corresponding to two dimensions and a fact table. The keys used to connect the dimension tables to the fact table are mandatory, but are not exposed in the attribute name visible in the user interface of a cube.
Thus, exposing the Key is very important. Please remember that the Key can be a multipart key, so you might have one or more columns to add.
If a user request the Key, by default I would show all the columns of a multipart key so he can remove those columns that are not required (but usually you’ll need all of them). Since the Value is not requested so often, I would expose it as a separate menu.

Surrogate Keys and Other Hidden Attributes

Depending on the cube design, it would be nice exposing those attributes that are hidden in the cube.
For example, in a well-designed cube, the model should not expose the surrogate keys in a visible attribute, because this would create a dependency in reports that would break a filter based on a surrogate key in case the relational tables are reprocessed and surrogate keys are regenerated. The general idea is that a surrogate key does not have a semantic meaning. Thus, it shouldn’t be exposed to an end user.
However, if you are importing several tables from a Multidimensional model, trying to create one table per dimension and one table per fact table (maybe by changing the cardinality, reducing the number of dimensions used), then you should import the surrogate keys too, at least for SCD type 2. Unfortunately, there are no info in the cube model that could help you discriminating between SCD1 and SCD2, so I’m not sure about what could be the best practice in this case. Probably, hidden attributes should be accessible only by advanced users, exposing them by default could be really confusing and I would avoid that.
This is an area where it’s hard to make a right choice, a compromise is required.

Related Properties

An attribute can have other properties related to it. By default, all browsable parent attributes can be considered attribute’s properties. However, when we talk about attribute’s properties we usually refer to the non-browsable attribute. Each non-browsable attribute is just another attribute. Non-browsable attributes are not shown in the list of attributes of a dimension, but they should be available to the user that want to import data for a certain column. The user interface could be designed in several ways for that:

  • Show attribute’s properties in a window from which the user can drag&drop – but maybe confusing – the UI should show only properties existing for a particular attribute and each attribute might have different properties. User interface might be a concern here.
  • Automatically import all the properties of an attribute (maybe by asking confirmation to the end user?) when adding that attribute to a query. Then the user can remove the columns that are not required in the Query.

Show Formatted Measures

Sometime it might be useful to import the formatted value of a measure. However, I would not replace the number imported today with the formatted value, because the latter could be a string that does not have any semantic meaning. Providing the option of importing the formatted measure as an additional column in the PowerQuery window would be very welcome, but don’t just replace one with the other.

Invisible attributes

Currently, Power Query shows all the attributes of a dimension, which in general is a good thing. However, I would put a checkbox that shows/hides invisible attributes. By default, I would show only visible attributes, because this is what user would be more familiar with. THe “show invisible attributes/columns” should be an advanced view.

Multiple selection without measures

if you select attributes from several dimensions without selecting a measure, you obtain as a result the Crossjoin between the tables you selected. In my opinion, this is counterintuitive and useless: I cannot imagine a use case where this would be meaningful. A much better solution would be importing every dimension as a single table, just as you do when you select many tables from SQL Server. It is the user that will handle joins between table, if necessary. My suggestion is to keep the existing behavior (import a single table) only when you import also a measure, even if I would like to be able to import all the dimensions and the set of measures as separate tables in the data model – creating one query for each dimension and one query for each measure group (or for each cube – not sure about the better approach here).
 

Power Query support for Analysis Services (MDX)

Today at TechEd Europe 2014 Miguel Llopis made the first public show of Power Query support for Analysis Services.

This is still not available, but it should be released soon (hopefully it will be our Christmas gift!).

Here is a list of features shown:

  • It should be able to query both Multidimensional and Tabular
  • Generates query in MDX (no DAX by now)
  • Load one table at a time (but a query can mix dimensions and measures)
  • Shows dimensions, measures, hierarchies and attributes in Navigator
  • Use the typical Power Query transformations working on a “table” result
  • You import one table at a time

I think the last point deserves an explanation. When you write a query in Power Query, the result is a single table. If I want to build a Power Pivot data model getting data from an existing cube in Analysis Services, but with a different granularity, I have to run one query for each dimension and one query for the fact table. Depending on the definition of the cube, this could be easier or harder, because original columns could have been hidden because measures are exposed instead. Moreover, the result of a measure that is not aggregated with a sum (imagine just an average) could be impossible to aggregate in Power Pivot in the right way.

Thus, if you want your user to take advantage of Power Query, make sure you expose in a model measures that can be aggregated to compute non-additive calculations (such as an average!)

Now I look forward for receiving this Christmas gift!

Power Query in Modern Corporate BI–Copenhagen, June 3, 2014–#powerquery

I will be in Copenhagen to deliver the SSAS Tabular Workshop on June 2-4, 2014 (few seats still available, but hurry up!).

In the same week I will be a speaker in an evening community event, MsBIP møde nr. 21, delivering the Power Query in Modern Corporate BI session that I also presented at TechEd North America 2014 last week. It’s not just a session about Power Query, there is a broader scope related to Corporate BI vs. Self-Service BI, which could be open to many consideration. I think that the two worlds can (and should) collaborate, instead of fighting against each other, especially when there is an existing investment in Corporate BI. I hope to meet many of you there!

First steps with Scheduled Data Refresh for Power Query #powerbi #powerquery

Just a few days before my session about Power Query at TechEd 2014, Microsoft released a new update that enables the scheduled data refresh of a Power Pivot workbook containing Power Query transformations.

This is a very good news, because it enables the data refresh of a number of different data sources. Even if the number of providers supported by this release is limited (only SQL Server and Oracle), you can use a SQL Server database as a bridge to access different data sources through views using Linked Server connections.

If you want to use this feature, first of all read carefully the Scheduled Data Refresh for Power Query blog post on MSDN web site. It guides you through are the steps required in order to enable the data source connection through the Data Management Gateway. As you will see, in reality you need to create the data source connections corresponding to the Power Query databases you use. Thus, in reality you might skip the data source configuration if you already have the corresponding databases enabled in the Power BI admin center. However, I suggest you to go through the steps described in that blog post at the beginning, because if the same database has two different drivers, it needs two different data sources. For this reason, I have a number of notes that might be helpful to avoid certain issues.

  • Power Query uses the .NET Framework Data Provider for SQL Server and Oracle Data Provider for .NET, whereas Power Pivot by default creates a SQL Server connection using the SQL Server Native Client 11.0 (SQLNCLI11).
    • Even if you already created a data source for a SQL Server database you refresh in a Power Pivot workbook, you have to create another data source for the same SQL Server database for Power Query, because you use two different drivers.
    • You might consolidate these data sources to only one, by changing the data provide in the advanced options of a Power Pivot configuration, but I am not sure this is a good idea. I would keep the two version of data sources, one for each provider, in case I use the same database in both connections
  • Power Query creates one connection string in Excel for each query you create. The connection string contains the entire transformation and when you copy it in the New Data Source page in Power BI admin, the internal query is analyzed to extract the required connection to SQL Server. If these connections are already configured as Power BI data sources, then you don’t need to do anything else. I suggest you to iterate all the queries you have following this step until you are confident of the internals and you are sure the required data sources are already available.
    • Even if you create a single query in M language accessing to different databases, the referenced connections will be found and each database will have a separate data source configuration in Power BI. I was worried that loading multiple tables from different database on the same server would have produced a single data source enabling to access all the databases on the server, but luckily this does not happen and security is preserved!
  • I spot an issue using certain DateTimeZone functions (DateTimeZone.FixedLocalNow, DateTimeZone.FixedUtcNow, DateTimeZone.LocalNow, and DateTimeZone.UtcNow) that seem not working with scheduled data refresh. You can read more about such issue in this thread on Power Query MSDN forum. I found a workaround using the Table.Buffer function, so that by stopping query folding the expression is not translated in SQL but evaluated directly by the Power Query engine. However, I hope this will be fixed soon.
  • A Power Query transformation that contains only a script, without accessing to any data source, currently is not refreshed. This would be useful for generating a Date table, I opened this other thread about this issue on the forum, I hope there will be news on that, too.
    • In the same thread you will find another tip: the literal in the form #literal, such as #table, are being mis-analyzed by scheduled refresh, but at least for this issue there are workarounds available, until the issue is not fixed by Microsoft.
  • You can use SQL Server views based on linked servers to overcome the limitation of providers currently supported by Data Management Gateway (which is the component used by scheduled data refresh).
  • Now that it is possible to publish SSIS packages as OData Feed Sources, you can expose a SQL Server view to Power BI, and accessing it from Power Pivot or Power Query, you can execute SSIS packages at refresh time. If the package is not too long to execute (it would timeout the connection), this is a smart way to arrange execution of some small “corporate ETL” in sync with the data refresh on Power BI, without relying on synchronized scheduling dates (which is always one more thing to maintain). This further extends the range of providers you can use with scheduled data refresh.

I would like to get more detailed errors when something goes wrong and scheduled data refresh stops, but this is a good start.

Calculate Distinct Count in a Group By operation in Power Query #powerquery #powerbi

The current version of Power Query does not have a user interface to create a Distinct Count calculation in a Group By operation. However, you can do this in “M” with a simple edit of the code generated by the Power Query window.

Consider the following table in Excel:

DistinctPowerQuery_01

You want to obtain a table containing the number of distinct products bought by every customer. You create a query starting from a table

DistinctPowerQuery_02

You keep in the query only the columns required for the group by and the distinct count calculation, removing the others. For example, select Products and Customers and right-click the Remove Other Columns menu choice.

DistinctPowerQuery_03

Select the Customer column and click the Group By transformation. You see a dialog box that by default creates a count rows column.

DistinctPowerQuery_04

This query counts how many transactions have been made by each customer, and you don’t have a way to apply a distinct count calculation. At this point, simply change the query from this:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    R
emovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
    GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(_), type number}})
in
    GroupedRows

To this:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    RemovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
    GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(Table.Distinct(_)), type number}})
in
    GroupedRows

The Table.RowCount function counts how many rows exist in the group. By calling Table.DistinctCount here, you reduce the number of rows in the table to a list of distinct count values, returning a correct value.

DistinctPowerQuery_05

I hope Power Query team will implement a distinct count option in the user interface. In the meantime, you can apply this easy workaround.

Power Query now imports relationships #powerquery #powerpivot

The December release of Power Query preview (version 2.9.3502.261) introduced an interesting feature: when you import multiple tables in the data model, relationships existing between tables in the data source are automatically detected and added to the data model as well. This is similar to the behavior you have in Power Pivot, but there is an important difference. When you import one or more tables in a data model where you already imported other tables before, the relationships are detected also between tables already imported and new tables selected for import. As you might know, Power Pivot detects relationships only between tables imported at once, not between tables imported and other tables already in the data model, even when they comes from the same data source.

The detection in Power Query doesn’t work between tables imported in data model directly by Power Pivot. It only works between tables that have been imported using Power Query.

I didn’t test the performance (I can imagine there is some price to pay using Power Query instead of a direct connection), but it’s clear that in the long term Power Query should be THE tool to import data in Power Pivot, or better in Excel, removing the overriding with existing functionalities that do similar things. I really like the progress that Power Query is doing, and I hope that the best is yet to come…

Microsoft Updates Power BI for Office 365 Preview with New Natural Language Search, Mapping Capabilities

Today we’re pleased to announce the addition of significant new features to the Power BI for Office 365 preview, including natural language search with Q&A and improved experiences in two preview add-ins for Excel with 3D mapping visualizations through Power Map and improved data search in Power Query.

Introduced in July and currently in preview, Power BI for Office 365 is a self-service business intelligence (BI) service delivered through Office 365. Complementing Excel, it arms information workers with data analysis and visualization capabilities, enabling them to identify deeper business insights either on premises or within a trusted cloud environment. With Power BI for Office 365, customers can connect to data in the cloud or extend their existing on premises data sources and systems to quickly build and deploy self-service BI solutions hosted in Microsoft’s enterprise cloud. You can sign up to register for the preview here.

We’ve had the preview open to an initial wave of customers for the past month and are encouraged by the enthusiastic response we’ve received. Today we’re excited to share some of the new features we’ve added recently to both Excel and the Power BI for Office 365 service.

Search-driven data visualization with Q&A

One of the Power BI features users have been most interested in is Q&A, which takes enterprise data search and exploration to a whole new level. With Q&A, we looked at how consumers experienced Bing search and used that knowledge to enable customers to query their enterprise data and generate stunning visual results. The search experience is instantaneous and uses natural language query – Q&A interprets the question the user is asking and serves up the correct interactive chart or graph. We’ve received great responses from customers who have tested this capability and look forward to hearing what you think. To see Q&A in action, check out this video:

Storytelling through 3D mapping with Power Map

First previewed a few months ago, Power Map (formerly GeoFlow) is an add-in for Excel which gives users the ability to plot geographic and temporal data visually on Bing Maps, analyze that data in 3D, and create interactive tours to share with others. This month, we made some significant updates to Power Map on the Download Center including immediate geo-coding of geospatial elements of data coupled with new region-based visualization that color-codes these geo-political areas: zip code, county, state, country/region. Users can also take the interactive tours designed in Power Map to create videos optimized for mobile, tablets/computer, and HD displays. These videos can be shared anywhere, including social media, PowerPoint slides, and Office 365. To read more about the new features if Power Map, check out the Excel blog.

Power Map

Simplifying data discovery with Power Query

We’ve also updated Power Query an add-in to Excel which helps customers easily discover, combine and transform their data. We have improved the online search experience and expanded the number of available datasets including popular datasets from data.gov and the Windows Azure Marketplace, in addition to Wikipedia. We’ve also improved the external data import for SQL Server/Windows Azure SQL Database, as well as the overall filter capabilities across all supported data sources. Additionally, Power Query now supports different merge options for more flexibility in building your queries. We’re offering better integration with Excel so users can share queries with others in their organizations.

Power Query

These new features compliment the current capabilities already included in the Power BI preview, such as:

  • Power BI Sites – Quickly create collaborative BI sites in Office 365 for teams to share reports and data views. Larger workbook viewing is now supported (up to 250MB) so users can view and interact with larger workbooks through the browser.
  • Data Stewardship – Users can now not only share their workbooks but also the data queries they create in Excel.
  • Data Catalog – IT departments now have a new way to provision users by enabling data search. IT departments can register corporate data with the Data Catalog so that users can discover this data with the new online search feature introduced with Power Query for Excel.
  • Mobile Access – Mobile BI access to reports in Office 365 is provided through new HTML 5 support and a native mobile application for Windows 8 tablets available in the Windows Store.

Bringing big data to a billion users

Power BI for Office 365 is just one way we are delivering on our vision to enable the broadest set of people to gain actionable insights from big data, at any time and from anywhere. With Power BI we are providing access to powerful business analytics tools, built into our existing products including Excel and Office 365 to make data analysis engaging and impactful.

To learn more and register for the preview visit www.powerbi.com. You can also download Power Map and Power Query along with sample datasets on the Power BI add-in Getting Started page. To see Power BI for Office 365 in action, check out this demonstration. Tell us what you think by posting in the comments below or tweeting us at @SQLServer #MSBI #PowerBI. And check out the Power BI blog for more detailed information on the features and functionality in Power BI.

Power BI for Office 365

Declarative ETL

T-SQL is a declarative programming language. This means that a T-SQL developer defines what (s)he wants the query to do, not how to do it. The secret sauce that turns the “what” into the “how” is a piece of software called the query optimiser and there are tomes dedicated to how one can make the query optimiser dance to their will. Generally the existence of a query optimiser is a good thing as in most cases it will do a better job of figuring out the “how” than a human being could*.

SSIS dataflows on the other hand are an imperative programming language**, the data integration developer dataflow builds a data pipeline to move data exactly as (s)he desires it to happen. In SSIS, There is no equivalent of the query optimiser.

I’ve often pondered whether there is an opportunity for someone to build a declarative data pipeline, that is, a method for a data integration developer to define what data should be moved rather than how to move it. Over the last few months I’ve come to the realisation that Power Query (formerly known as Data Explorer) actually fits that description pretty well. If you don’t believe me go and read Matt Masson’s blog post Filtering in Data Explorer in which he talks about query folding:

[Power Query] will automatically push filters directly to the source query

Even though we selected the full table in the UI before hiding the columns certain columns, we can see the source query only contains the columns we want

“filters” aren’t the only type of query folding that Data Explorer can do. If you watch the queries, you’ll see that other operations, such as column removal, renaming, joins, and type conversions are pushed as close to the source as possible.

Let’s not idly dismiss this. Query folding is very impressive and is (I think) analogous to predicate pushdown that is done by the query optimiser (for more on predicate pushdown read Nested Views, Performance, and Predicate Pushdown by Dave Wentzel). Does Power Query then have the equivalent of a query optimiser? I would say yes, it does although its more accurate to say that its the underlying query language called M for which this query optimiser exists!

So, we have a declarative data integration language which is surfaced in Power Query and hence can only (currently) push data into an Excel spreadsheet. Imagine if M were extended were extended into a fully-fledged data integration tool that could move data between any heterogeneous source, would that constitute “declarative ETL” and is there a need for such a tool?

I’ll leave that one out there for you to ponder. Comments would be most welcome.

@Jamiet 

*OK, you might debate whether the query optimiser can do a better job than a human but let’s save that for another day.

**We can debate whether or not SSIS dataflows are a programming language or not but again, let’s save that for another day. For the purposes of this discussion just go with me, OK.