Monthly Archives: February 2013

Traversing the Facebook Graph using Data Explorer

As I mentioned yesterday, Microsoft Data Explorer Preview for Excel has hit the streets, and one of its new features is a built-in adapter for talking to the Facebook Graph API:

[screenshot]

I’ve been taking it for a spin today and am quite intrigued by its capabilities. Even without landing any data into Excel this is still a handy interactive Facebook graph explorer, as I’ll demonstrate herein.


A good place to start is with a list of statuses. I have chosen “me” as the “Username or object id” (because frankly I didn’t know what else to put in there) and “statuses” as the dataset to return:

[screenshot]

After the standard merry authentication dance with Facebook Connect, one’s history of Facebook statuses is returned, which on its own might make interesting reading:

[screenshot]

However, let’s do something a bit more interesting. I remove the columns I’m not interested in:

[screenshot]

to leave just my statuses, likes and comments:

[screenshot]

Notice that the likes and comments fields contain either a hyperlinked “Table” or null; “Table” means we have a set of likes or comments respectively – in other words, Data Explorer is letting me browse through the one-to-many relationships inherent in the data, and I happen to think that is very cool indeed. For example, selecting a link shows me the likes or comments for the status in question (“Want to hear my Elton John joke? Its a little bit funny…”):

[screenshot]

What would be more interesting would be if I could expand to show all the comments against a status and also who made each comment. It turns out we can do just that; first, click on this button called “Expand”:

[screenshot]

which displays a list of columns that I want to show in the resultset; I choose “from” & “message”:

[screenshot]

and now I can see all the comments against the relevant status. For all you out there whose heads are in SQL-land, I’ve effectively LEFT JOINed statuses to comments:

[screenshot]

Last step: the comments.from field tells me who left the comment:

[screenshot]

So let’s do a similar trick to show the name of the commenter in my resultset and voila:

[screenshot]

we have a list of my statuses along with all the comments and the name of the person that made the comment! We have effectively joined three linked datasets (statuses, comments, person), all inside Excel, and all without writing a scrap of code. Personally I think that’s very cool.
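For those same SQL-land readers, here is a rough sketch of the query that those Expand steps have effectively built for us. The table and column names are hypothetical, loosely modelled on the Graph API’s statuses, comments and users objects:

    -- Hypothetical, flattened versions of the Graph API objects:
    --   statuses(id, message, created_time)
    --   comments(id, status_id, from_id, message)
    --   users(id, name)
    SELECT    s.message AS status_message,
              c.message AS comment_message,
              u.name    AS commenter_name
    FROM      statuses  AS s
    LEFT JOIN comments  AS c ON c.status_id = s.id   -- statuses with no comments still appear
    LEFT JOIN users     AS u ON u.id = c.from_id     -- who left each comment
    ORDER BY  s.created_time;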


Now let’s go back to just our list of statuses:

[screenshot]

Let’s say we now want to count the number of likes and comments on each status and sort the data accordingly. I use the Expand feature as before, except this time I choose to aggregate rather than expand:

[screenshot]

That gives me the number of likes and comments per status, which I can then sort however I choose:

[screenshot]

and with a couple of quick aesthetic column renames I have all my statuses with the number of likes and comments, in descending order of popularity (as measured by the count of likes):

[screenshot]
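For completeness, the aggregate option corresponds roughly to a GROUP BY over those same hypothetical tables (plus an equally hypothetical likes table):

    -- Count likes and comments per status, most-liked first.
    -- COUNT(DISTINCT ...) avoids double-counting caused by joining
    -- both likes and comments onto the same status rows.
    SELECT    s.message            AS status_message,
              COUNT(DISTINCT l.id) AS number_of_likes,
              COUNT(DISTINCT c.id) AS number_of_comments
    FROM      statuses  AS s
    LEFT JOIN likes     AS l ON l.status_id = s.id
    LEFT JOIN comments  AS c ON c.status_id = s.id
    GROUP BY  s.id, s.message
    ORDER BY  number_of_likes DESC;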

I can then pull that resultset into Excel and chart it to see if there is any correlation between number of likes and number of comments:

[screenshot]

Imagine aggregating over something more interesting than Facebook statuses and I think you can realise the potential here.


Hopefully this has served as a useful introduction to Data Explorer’s ability to query Facebook data. What interests me most about this is not the ability to query Facebook per se; it’s the ability to query and traverse over graph data using a tool that isn’t aimed at developers. Now imagine being able to do the same with the growing corpus of linked data graphs that exist on the web in RDF format and you start to realise the potential here; I truly hope that the ability to query RDF data is high on the agenda of the Data Explorer team*.

If you want to have a go with Data Explorer yourself then go ahead and download it from http://www.microsoft.com/en-us/download/details.aspx?id=36803; it’s free, although you will need Excel 2010 or Excel 2013 installed.

@Jamiet 

* For what it’s worth I’ve had a watching brief on RDF, the Semantic Web and Linked Data since way back in November 2004 and I’ve been waiting since then for it to hit the big time. I don’t think that’s going to happen any time soon but having support from tools like Data Explorer would be a step in the right direction.

O’Reilly Strata: Busting Big Data Adoption Myths with Halo 4–Part 3

Structured or unstructured – what to store? How about all of it?

I spoke at the O’Reilly Strata Conference last year and am back again this year to share thoughts about big data. It’s been a remarkable year for Microsoft’s data platform team. While we’ve had a long history in the data space, last year was noteworthy for several reasons.

  1. We announced our commitment to the Hadoop community and partnered with Hortonworks to provide enterprises with an Apache Hadoop-compatible implementation on Windows.
  2. We released new versions of our business intelligence (BI) tools, providing business users with access to powerful analytics through familiar tools including Excel 2013, Power View and PowerPivot. The columnar in-memory analytics engine in PowerPivot, available in Excel, is the same technology that sits inside the SQL Server Analysis Services Tabular model. We’ve made it very easy in SQL Server 2012 to migrate PowerPivot models into Analysis Services, so you can take solutions and insights developed in Excel and scale them up to server-based deployments that can serve many users.
  3. And tomorrow you can start ordering SQL Server 2012 Parallel Data Warehouse (PDW), the next generation of our parallel-processing data warehousing appliance. The latest version of PDW includes a technology we call PolyBase. PolyBase makes it very easy to meld Apache Hadoop with structured relational data using a standard T-SQL query, enabling large businesses to do efficient information production at scale with the skills and knowledge they already have in-house (a rough sketch of such a query follows below).
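By way of illustration only (the table names below are hypothetical, and the external table over HDFS-resident click data is assumed to have been defined already), a PolyBase-style query is just ordinary T-SQL that happens to join Hadoop data with relational data:

    -- Hypothetical example: dbo.Clickstream_ext is an external table defined
    -- over files sitting in Hadoop; dbo.Customers is an ordinary PDW table.
    -- The join, filter and aggregation are all plain T-SQL.
    SELECT   c.Region,
             COUNT(*) AS PageViews
    FROM     dbo.Customers       AS c
    JOIN     dbo.Clickstream_ext AS k   -- external table over HDFS data
             ON k.CustomerId = c.CustomerId
    WHERE    k.EventDate >= '20130101'
    GROUP BY c.Region
    ORDER BY PageViews DESC;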

While that may read like a list of features, here’s why I think they are noteworthy. A recent Microsoft study revealed that 38% of respondents’ current data stores contain unstructured data and 53% rated increased amounts of unstructured data to analyze as extremely important. This trend is only increasing as businesses realize their unstructured data holds the key to new value not accessible in their existing structured data. In my personal experience, over the last two years, many businesses have shifted from thinking of big data as a challenge to perceiving it as an opportunity.

I’m often asked, “Where is the ultimate value in big data and how do I tap into it?” There are two key measures in my mind: 1) time to insight and 2) return on accessible data. These measures are, in turn, enabled through a process I call information production.

Information production is the process of converting data or information from one domain into another. Consider the following example. Assume you have an ambulance fleet equipped with GPS units that collect telemetry. Information production techniques allow you to convert the raw GPS telemetry – a sequence of records containing {Timestamp, Latitude, Longitude} elements – into an incident response time. The magic of information production is that it takes data which is difficult to deal with in traditional information systems, such as raw GPS telemetry, and transforms it into information (incident ID and response time) which is both more structured and more business relevant. Once we’ve produced the incident response time, we can logically join it with models which predict patient outcome as a function of response time.
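As a minimal sketch of that example (all table and column names here are hypothetical), the “produced” information is just one response-time figure per incident, derived from the raw telemetry:

    -- Hypothetical tables:
    --   gps_telemetry(vehicle_id, ts, latitude, longitude)               -- raw GPS readings
    --   incidents(incident_id, vehicle_id, dispatch_ts, latitude, longitude)
    -- Response time = seconds from dispatch until the ambulance's first GPS
    -- reading close to the incident location (a crude proximity test).
    SELECT   i.incident_id,
             DATEDIFF(SECOND, i.dispatch_ts, MIN(t.ts)) AS response_time_sec
    FROM     incidents     AS i
    JOIN     gps_telemetry AS t
             ON  t.vehicle_id = i.vehicle_id
             AND t.ts >= i.dispatch_ts
             AND ABS(t.latitude  - i.latitude)  < 0.001
             AND ABS(t.longitude - i.longitude) < 0.001
    GROUP BY i.incident_id, i.dispatch_ts;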

Great information production tools allow you to reduce the time to insight. They allow you to get from a hunch to validation very quickly. In fact, there is an emerging class of information production tools which stimulate hypothesis generation by finding correlations in diverse data sets that may hold the key to new value.

Valuable answers require logically joining different data sets – something every database person is familiar with. In traditional databases the “accessible” data is constrained to data which is contained within the database. This data has been normalized, cleaned, and indexed so it can be used to efficiently answer a fixed set of questions over that data domain.

Big data and information production enable a much larger definition of accessible data, though. Going back to the ambulance example, where would you get the patient outcome model to determine how many lives you could save by reducing response time? By using accessible demographic and population data, you could determine how many heart attack victims’ lives could be saved by moving or adding ambulances.

We will know big data has made it big when it makes every day experiences better. In fact, one of the things I spoke about in my Strata talk today is how our Halo 4 team is using our HDInsight and big data tools to create a better gaming experience. By using a preview of the Windows Azure HDInsight Service to do Hadoop-based analysis on their unstructured data, the team gained invaluable insight on usage patterns and as a result had the agility to make changes to improve the overall gaming experience.

Information production is a key part of our vision and product offerings and enables you to achieve fast time to insight and greater return on accessible data. Our BI tools, like PowerPivot and Power View, are geared to making it easier for a certain class of users to reduce their time to insight and then to be able to effectively share those insights with others. Data Explorer, which we released in preview yesterday, makes it easy to find, transform, and join information to both ease information production and increase the range of accessible data. HDInsight Service, our Apache Hadoop based service, available in preview on Windows Azure, is being used by our Halo 4 team and external customers to realize new value from their unstructured data. SQL Server 2012 PDW with its PolyBase technology enables extremely large scale information production for the largest business needs.

For all of us involved in big data, and me personally, it is an incredibly exciting time. The next 5-10 years are going to be breathtaking.

You can find out more about our big data solutions by visiting www.microsoft.com/bigdata, or for those interested in reading the Halo 4 case study on their cloud-based big data solution  – it’s now available online here.

Dave Campbell
Technical Fellow, STB

See more from the Busting Big Data Adoption Myths blog series:

Geek City: A Grammar Geek in the Cloud

Those of you who know me well know that I am usually a stickler for spelling, grammar and proper word usage. I may have even lost a few friends because of what some people see as an obsession. I am not infallible, and I do make typos, but I like to try to correct them, if possible, as soon as I discover them (or as soon as they are brought to my attention). So now I will admit that I made a mistake in my usage of the word ‘premise’, when talking about Cloud vs. non-Cloud databases. I was using the…(read more)

Don’t let the facts interfere with the truth

When you are joining N tables, it is true that unless you have N-1 joins, you’re going to have a Cartesian product. Today I saw a case where there were N-2 joins and no evidence of a Cartesian product in the result set. No additional rows in the result set – that’s a fact. But what about the truth, that a Cartesian product occurs when you have fewer than N-1 joins? In the code sample below, where there are N-2 joins, a Cartesian product does occur, but the WHERE clause prevents you from seeing it….(read more)
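The author’s actual sample sits behind the “read more” link; as a generic illustration of the point, with hypothetical tables, the shape is something like this:

    -- Three tables (N = 3) but only N-2 = 1 join predicate:
    SELECT o.OrderId, c.CustomerName, r.RegionName
    FROM   Orders o, Customers c, Regions r
    WHERE  o.CustomerId = c.CustomerId        -- the one real join
      AND  r.RegionName = 'Europe';           -- a filter, not a join
    -- Regions is cross-joined to the other two tables, so a Cartesian product
    -- really does occur; but if the filter leaves only one matching Regions row,
    -- the result set contains no extra rows and the product stays invisible.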

O’Reilly Strata: Busting Big Data Adoption Myths–Part 2

If concerns about having the right skillsets on staff are stopping you from trying big data, Microsoft’s BI tools may hold the key to busting down those barriers.

It’s the second day of the O’Reilly Strata Conference and one common theme I expect to hear about is staffing and skillsets related to big data. Due to the industry hype around big data, a majority of business leaders believe they need employees with specialized experience to get the highest return from their big data investments. IDC recently found that 24% of organizations are concerned they lack employees with the right skillsets to do big data analytics and that CIOs are overwhelmed by the amount of data their company manages.† While it is true that the industry needs more data scientists, it is equally true that most organizations are equipped with the employees they need today to help them gather the valuable insights from their data that will better their business.

According to the Gartner* Magic Quadrant for Business Intelligence and Analytics Platforms, the market for BI and analytics platforms “will remain one of the fastest growing software markets.” And that, “although this is a mature market and has been a top CIO priority for years, there is still a lot of unmet demand. Every company has numerous subject areas — such as HR, marketing, social and so on — that have yet to even start with BI and analytics.”[1] That’s why Microsoft’s approach to business analytics is to make it accessible and easy for employees to analyze and extract meaningful results from big data, small data – really any data. Microsoft has long argued for the “democratization” of BI, where individuals are empowered to do their own data analysis using familiar tools and without IT assistance. To help our customers do that, we have been investing in making Excel a true self-service BI client. With the general availability of Office 2013 today, Excel continues its journey to become a full-featured BI client and we continue to invest in self-service BI by also releasing a preview of Data Explorer for Excel.

Data Explorer is an Excel add-in that enhances the self-service BI experience by simplifying data discovery and access. It also supports a broad range of data sources including relational, structured and semi-structured data, OData, the Web, Hadoop, Windows Azure Marketplace, and more. In short, Data Explorer helps connect customers with the data they need – try it out for yourself and download the Data Explorer Preview for Excel at: http://www.microsoft.com/en-us/bi/Products/OfficePreview.aspx.

Be sure to check back here tomorrow morning as we bust our final big data myth in part 3 of this series or read part 1 where we bust our first big data myth. And if you’re attending Strata, be sure to stop by Server and Tools Business (STB) Technical Fellow Dave Campbell’s keynote tomorrow at 9:05AM PT.

Eron Kelly
General Manager
SQL Server

*Disclaimer: Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.


†Source: IDC presentation Big Data: Exploiting the Opportunities That Exist NOW for ICT Vendors, December 11, 2012

[1] Gartner, Magic Quadrant for Business Intelligence and Analytics Platforms, Kurt Schlegel et al, 5 February 2013

Data Explorer hits full preview

A short recap

At the PASS Summit 2011 a project that existed as part of the now-defunct SQL Azure Labs was announced – Data Explorer. I posted some initial thoughts at Thoughts on Data Explorer and continued to post some walkthroughs and feedback. Data Explorer was an interesting technology to me which in October 2011 I described as:

Data Explorer is an ETL tool and given my obvious SSIS affiliations that makes it very interesting to me. That it runs as a cloud service and will be available to non-developers only makes it more intriguing and I can’t wait until Data Explorer becomes available for us to tinker with later this year.

That being said, Data Explorer always felt, somewhat uncomfortably, like an interesting piece of tech trying to find both a problem to solve and a place to live.

Now in preview

Today the second of those conundrums has been answered by the release of a new public preview; Data Explorer is going to live in Excel and its official title is now “Microsoft Data Explorer Preview for Excel” (I still dislike the name by the way):

[screenshot]

Here’s a video that describes Data Explorer’s pitch today:

Microsoft Data Explorer Preview for Excel

If the video doesn’t render properly check it out on YouTube instead.

From the video:

Data Explorer enhances the self-service experience in Excel by simplifying data discovery and access to a broad range of public and enterprise data sources enabling richer insights from data that has traditionally been difficult for users to get at.
(emphasis is mine)

In short, Data Explorer is targeted at Excel users (Microsoft calls them “Information Workers”) and now looks like this:

[screenshot]
The data in this screenshot was imported from http://en.wikipedia.org/wiki/Super_league.

which (I think) is neat and tidy and pretty intuitive, whereas a year ago it looked like this:

[screenshot]

and seemingly didn’t know what it wanted to be other than yet another tool for developers. It didn’t have an identity and it didn’t have a raison d’être. Today’s preview release goes some way to solving those things.

Online Search

One very interesting new feature is Online Search which enables you to search the web for datasets. In this example I use the feature to search for data on Apollo space missions:

[screenshot]

I find a list of all the missions and with one click it’s in Excel:

[screenshot]

where I can do all the stuff that one would normally do in Excel. Here I have used Excel 2013’s Flash Fill feature to extract the year and pivot the data to discover which rocket was most popular and in which year the most missions were launched:

[screenshot]

You get the idea. This is powerful stuff! The workbook that I built for this simple demo is available for viewing online or download at 20130227 Apollo Missions – Simple Data Explorer Demo.xlsx.

Today’s release is also the first from a product team that has a number of other interesting projects in the pipeline that you will see more of in 2013 – watch this space.

Data Explorer. Is it A2P?

One last point I’ll make is that Data Explorer as it exists today ably fulfils the “Acquire” and “Processing” aspects of Mike Reich’s A2P approach, which I talked about only twelve days ago at ETL is dead, long live AP2?. Interestingly, Data Explorer in its October 2011 guise also provided the last part of A2P – “Publishing” (I demo’d the Publishing ability at Data Explorer walkthrough – Parsing a Twitter list); however, that ability has been removed in this release in favour of landing the data into Excel. I remain hopeful that the ability to publish a Data Explorer feed returns in the future.

Get the bits

Download “Microsoft Data Explorer Preview for Excel” from http://www.microsoft.com/en-us/download/details.aspx?id=36803

Read more

I’m not the only one blogging about this. Check out:

@Jamiet

Update to the PowerPivot for Excel 2013 licensing #powerpivot #excel

Ken Puls (not Rob Collie as originally stated; thanks Bob!) wrote a very interesting post about the PowerPivot for Excel 2013 licensing.

Short recap:

  • PowerPivot for Excel 2013 is only available in a few editions of Excel/Office (more details here)
  • If you want to buy a perpetual license of Excel with PowerPivot, you need to be included in a Volume Licensing program
  • The easiest way to enter a volume licensing program is to buy 5 licenses for Microsoft products, even different ones
  • The cheapest license you can buy is around $7 (e.g. the Microsoft DVD Playback Pack for Windows Vista Business)
  • Buying 4 of them grants you the ability to buy an Office 2013 Pro Plus license as the fifth product, entering the volume licensing agreement

This really does not make any sense to me, and like everyone else I hope Microsoft will fix the licensing issues that could limit the adoption of PowerPivot for Excel 2013. In the meantime, there is a way to get the product by spending just about $30 more (four of those cheap licenses at roughly $7 each).

Using Excel to interact with a SSAS cube

Sometimes we need to access and work with a SSAS cube using Excel. In this tip we will show how to access SSAS with Excel, add dimensions to a cube, work with SSAS KPIs, add SSAS named sets, see the MDX query that Excel creates, create new calculations, and find multidimensional members.