Category Archives: big data

Data Science Perspectives: Q&A with Microsoft Data Scientists Val Fontama and Wee Hyong Tok

You can’t read the tech press without seeing news of exciting advancements and opportunities in data science and advanced analytics. We sat down with two of our own Microsoft data scientists to learn more about their roles in the field, hear about some of the real-world successes they’ve seen, and get their perspectives on today’s opportunities in these evolving areas of data analytics.

If you want to learn more about predictive analytics in the cloud or hear more from Val and Wee Hyong, check out their new book, Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes.

First, tell us about your roles at Microsoft?

 [Val] Principal Data Scientist in the Data and Decision Sciences Group (DDSG) at Microsoft

 [Wee Hyong] Senior Program Manager, Azure Data Factory team at Microsoft

And how did you get here? What’s your background in data science?

[Val] I started in data science over 20 years ago when I did a PhD in Artificial Intelligence. I used artificial neural networks to solve challenging engineering problems, such as the measurement of fluid velocities and heat transfer. After my PhD, I applied data mining in environmental science and the credit industry: I did a year’s postdoctoral fellowship before joining Equifax as a New Technology Consultant in their London office. There, I pioneered the application of data mining to risk assessment and marketing in the consumer credit industry. I hand-coded over ten machine learning algorithms – including neural networks, genetic algorithms, and Bayesian belief networks – in C++ and applied them to fraud detection, predicting risk of default, and customer segmentation.

[Wee Hyong] I’ve worked on database systems for over 10 years, from academia to industry. I joined Microsoft after I completed my PhD in data streaming systems. When I started, I worked on shaping the SSIS server from concept to release in SQL Server 2012. I was passionate about data science even before joining Microsoft: prior to joining, I wrote code to integrate association rule mining into a relational database management system, which allowed users to combine association rule mining queries with SQL queries. As a SQL Server Most Valuable Professional (MVP), I ran data mining boot camps for IT professionals in Southeast Asia and showed how to transform raw data into insights using the data mining capabilities in Analysis Services.

What are the common challenges you see with people, companies, or other organizations who are building out their data science skills and practices?

[Val] The first challenge is finding the right talent. Many of the executives we talk to are keen to form their own data science teams but may not know where to start. First, they are not clear on what skills to hire for – should they hire PhDs in math, statistics, computer science, or another field? Should the data scientist also have strong programming skills? If so, in what programming languages? What domain knowledge is required? We have learned that data science is a team sport, because it spans so many disciplines – math, statistics, computer science, and more – and it is hard to find all the requisite skills in a single person. So you need to hire people with complementary skills across these disciplines to build a complete team.

The next challenge arises once there is a data science team in place – what’s the best way to organize this team? Should the team be centralized or decentralized? Where should it sit relative to the BI team? Should data scientists be part of the BI team or separate? In our experience at Microsoft, we recommend having a hybrid model with a centralized team of data scientists, plus additional data scientists embedded in the business units. Through the embedded data scientists, the team can build good domain knowledge in specific lines of business. In addition, the central team allows them to share knowledge and best practices easily. Our experience also shows that it is better to have the data science team separate from the BI team. The BI team can focus on descriptive and diagnostic analysis, while the data science team focuses on predictive and prescriptive analysis. Together they will span the full continuum of analytics.

The last major challenge I often hear about is the actual practice of deploying models in production. Once a model is built, it takes time and effort to deploy it in production. Today many organizations rewrite the models to run on their production environments. We’ve found success using Azure Machine Learning, as it simplifies this process significantly and allows you to deploy models to run as web services that can be invoked from any device.
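
As a concrete illustration of the web-service pattern Val describes, the sketch below builds the kind of JSON request body a classic Azure Machine Learning web service expects and shows how it would be posted. The endpoint URL, API key, and column names are hypothetical placeholders, not details from any project mentioned above.

```python
import json

# Sketch of invoking a model deployed as an Azure ML web service.
# The payload follows the request/response shape used by classic Azure ML
# web services; the columns and values here are made up for illustration.

def build_scoring_request(column_names, values):
    """Build the JSON body for a request/response scoring call."""
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                "Values": values,  # one inner list per row to score
            }
        },
        "GlobalParameters": {},
    }

payload = build_scoring_request(
    ["age", "income", "num_purchases"],
    [["34", "52000", "7"]],
)
body = json.dumps(payload)

# To call the service from any device, you would POST `body` with an
# Authorization header (endpoint and key below are hypothetical):
#
#   req = urllib.request.Request(
#       "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<svc-id>/execute",
#       body.encode("utf-8"),
#       {"Content-Type": "application/json", "Authorization": "Bearer <api-key>"},
#   )
#   response = urllib.request.urlopen(req)

print(body)
```

Because the service is just an HTTPS endpoint, the same call works from web apps, mobile clients, or back-end jobs without rewriting the model for production.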

[Wee Hyong] I also hear about challenges in identifying tools and resources to help build these data science skills. There is a significant number of online and printed resources covering a wide spectrum of data science topics – from the theoretical foundations of machine learning to its practical applications. One of the challenges is navigating this sea of resources and selecting the right ones to get started with.

Another challenge I often see is identifying the right set of tools to model a given predictive analytics scenario. Once they have figured out the right tools, it is equally important for people and companies to be able to easily operationalize the predictive analytics solutions they have built, so they create new value for their organization.

What is your favorite data science success story?

[Val] My two favorite projects are the predictive analytics projects for ThyssenKrupp and Pier 1 Imports. I’ll speak today about the Pier 1 project. Last spring my team worked with Pier 1 Imports and their partner, MAX451, to improve cross-selling and upselling with predictive analytics. We built models that predict the next logical product category once a customer makes a purchase. Based on Azure Machine Learning, this solution will lead to a much better experience for Pier 1 customers.

[Wee Hyong] One of my favorite data science success stories is how OSIsoft collaborated with the Carnegie Mellon University (CMU) Center for Building Performance and Diagnostics to build an end-to-end solution that addresses several predictive analytics scenarios. With predictive analytics, they were able to solve many business challenges, ranging from predicting energy consumption in different buildings to fault detection. The team was able to effectively operationalize the machine learning models built with Azure Machine Learning, which led to better energy utilization in the buildings at CMU.

What advice would you give to developers looking to grow their data science skills?

[Val] I would highly recommend learning multiple subjects: statistics, machine learning, and data visualization. Statistics is a critical skill for data scientists that offers a good grounding in correct data analysis and interpretation. With good statistical skills we learn best practices that help us avoid pitfalls and wrong interpretation of data. This is critical because it is too easy to unwittingly draw the wrong conclusions from data. Statistics provides the tools to avoid this. Machine learning is a critical data science skill that offers great techniques and algorithms for data pre-processing and modeling. And last, data visualization is a very important way to share the results of analysis. A good picture is worth a thousand words – the right chart can help to translate the results of complex modeling into your stakeholder’s language. So it is an important skill for a budding data scientist.

[Wee Hyong] Be obsessed with data, and acquire a good understanding of the problems that can be solved by the different algorithms in the data science toolbox. It is a good exercise to jumpstart by modeling a business problem in your organization where predictive analytics can help to create value. You might not get it right in the first try, but it’s OK. Keep iterating and figuring out how you can improve the quality of the model. Over time, you will see that these early experiences help build up your data science skills.

Besides your own book, what else are you reading to help sharpen your data science skills?

[Val] I am reading the following books:

  • Data Mining and Business Analytics with R by Johannes Ledolter
  • Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems) by Ian H. Witten, Eibe Frank, and Mark A. Hall
  • Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die by Eric Siegel

[Wee Hyong] I am reading the following books:

  • Super Crunchers: Why Thinking-By-Numbers Is the New Way to Be Smart by Ian Ayres
  • Competing on Analytics: The New Science of Winning by Thomas H. Davenport and Jeanne G. Harris.

Any closing thoughts?

[Val]  One of the things we share in the book is that, despite the current hype, data science is not new. In fact, the term data science has been around since 1960. That said, I believe we have many lessons and best practices to learn from other quantitative analytics professions, such as actuarial science. These include the value of peer reviews, the role of domain knowledge, etc. More on this later.

[Wee Hyong] One of the reasons that motivated us to write the book is we wanted to contribute back to the data science community, and have a good, concise data science resource that can help fellow data scientists get started with Azure Machine Learning. We hope you find it helpful. 

Results are Beautiful: 4 Best Practices for Big Data in Healthcare

When you put big data to work, results can be beautiful. Especially when those results are as impactful as saving lives. Here are four best practice examples of how big data is being used in healthcare to improve, and often save, lives.

Aerocrine improves asthma care with near-real-time data

Millions of asthma sufferers worldwide depend on Aerocrine monitoring devices to diagnose and treat their disease effectively. But those devices are sensitive to small changes in ambient environment. That’s why Aerocrine is using a cloud analytics solution to boost reliability. Read more.

Virginia Tech advances DNA sequencing with cloud big data solution

DNA sequencing analysis is a form of life sciences research that has the potential to lead to a wide range of medical and pharmaceutical breakthroughs. However, this type of analysis requires supercomputing resources and Big Data storage that many researchers lack. Working through a grant provided by the National Science Foundation in partnership with Microsoft, a team of computer scientists at Virginia Tech addressed this challenge by developing an on-demand, cloud-computing model using the Windows Azure HDInsight Service. By moving to an on-demand cloud computing model, researchers will now have easier, more cost-effective access to DNA sequencing tools and resources, which could lead to even faster, more exciting advancements in medical research. Read more.

The Grameen Foundation expands global humanitarian efforts with cloud BI

Global nonprofit Grameen Foundation is dedicated to helping as many impoverished people as possible, which means continually improving the way Grameen works. To do so, it needed an ongoing sense of its programs’ performance. Grameen and Microsoft brought people and technology together to create a BI solution that helps program managers and financial staff: glean insights in minutes, not hours; expand services to more people; and make the best use of the foundation’s funding. Read more.

Ascribe transforms healthcare with faster access to information

Ascribe, a leading provider of IT solutions for the healthcare industry, wanted to help clinicians identify trends and improve services by supplying faster access to information. However, exploding volumes of structured and unstructured data hindered insight. To solve the problem, Ascribe designed a hybrid-cloud solution with built-in business intelligence (BI) tools based on Microsoft SQL Server 2012 and Windows Azure. Now, clinicians can respond faster with self-service BI tools. Read more.

Learn more about Microsoft’s big data solutions

Best of 2014: Top 10 Data Exposed Channel 9 Videos for Data Devs

Have you been watching Data Exposed over on Channel 9? If you’re a data developer, Data Exposed is a great place to learn more about what you can do with data: relational and non-relational, on-premises and in the cloud, big and small.

On the show, Scott Klein and his guests demonstrate features, discuss the latest news, and share their love for data technology – from SQL Server, to Azure HDInsight, and more!

We rounded up the year’s top 10 most-watched videos from Data Exposed. Check them out below – we hope you learn something new!

  • Introducing Azure Data Factory: Learn about Azure Data Factory, a new service for data developers and IT pros to easily transform raw data into trusted data assets for their organization at scale.
  • Introduction to Azure DocumentDB: Get an introduction to Azure DocumentDB, a NoSQL document database-as-a-service that provides rich querying, transactional processing over schema free data, and query processing and transaction semantics that are common to relational database systems.
  • Introduction to Azure Search: Learn about Azure Search, a new fully-managed, full-text search service in Microsoft Azure which provides powerful and sophisticated search capabilities to your applications.
  • Azure SQL Database Elastic Scale: Learn about Azure SQL Database Elastic Scale, .NET client libraries and Azure cloud service packages that provide the ability to easily develop, scale, and manage the stateful data tiers of your SQL Server applications.
  • Hadoop Meets the Cloud: Scenarios for HDInsight: Explore real-life customer scenarios for big data in the cloud, and gain some ideas of how you can use Hadoop in your environment to solve some of the big data challenges many people face today.
  • Azure Stream Analytics: See the capabilities of Azure Stream Analytics and how it helps make working with mass volumes of data more manageable.
  • The Top Reasons People Call Bob Ward: Scott Klein is joined by Bob Ward, Principal Escalation Engineer for SQL Server, to talk about the top two reasons why people want to talk to Bob Ward and the rest of his SQL Server Services and Support team.
  • SQL Server 2014 In-Memory OLTP Logging: Learn about In-Memory OLTP, a memory-optimized and OLTP-optimized database engine integrated into SQL Server. See how transactions and logging work on memory-optimized-tables, and how a system can recover in-memory data in case of a system failure.
  • Insights into Azure SQL Database: Get a candid and insightful behind-the-scenes look at Azure SQL Database, the new service tiers, and the process around determining the right set of capabilities at each tier.
  • Using SQL Server Integration Services to Control the Power of Azure HDInsight: Join Scott and several members of the #sqlfamily to talk about how to control the cloud from on-premises SQL Server.

Interested in taking your learning to the next level? Try SQL Server or Microsoft Azure now.

How to Hadoop: 4 Resources to Learn and Try Cloud Big Data

Are you curious about how to begin working with big data using Hadoop? Perhaps you know you should be looking into big data analytics to power your business, but you’re not quite sure about the various big data technologies available to you, or you need a tutorial to get started.  

  1. If you want a quick overview on why you should consider cloud Hadoop: read this short article from MSDN Magazine that explores the implications of combining big data and the cloud and provides an overview of where Microsoft Azure HDInsight sits within the broader ecosystem.

  2. If you’re a technical leader who is new to Hadoop: check out this webinar about Hadoop in the cloud, and learn how you can take advantage of the new world of data and gain insights that were not possible before.

  3. If you’re on the front lines of IT or data science and want to begin or expand your big data capabilities: check out the ‘Working with big data on Azure’ Microsoft Press eBook, which provides an overview of the impact of big data on businesses, offers a step-by-step guide for deploying Hadoop clusters and running MapReduce in the cloud, and covers several use cases and helpful techniques.

  4. If you want a deeper tutorial for taking your big data capabilities to the next level: master the ins and outs of Hadoop for free on the Microsoft Virtual Academy with this ‘Implementing Big Data Analysis’ training series.

What questions do you have about big data or Hadoop? Are there any other resources you might find helpful as you learn and experiment? Let us know. And if you haven’t yet, don’t forget to claim your free one-month Microsoft Azure trial.

Azure previews fully-managed NoSQL database and search services

I am pleased to announce previews of new NoSQL database and search services and the evolution of our Hadoop-based service. Available as previews today are Azure DocumentDB, a fully-managed transactional NoSQL document database-as-a-service, and Azure Search, which enables developers to easily add search capabilities to mobile and cloud applications. Generally available today, Azure HDInsight, our Hadoop-based solution for the cloud, now supports Apache HBase clusters.

With these new and updated services, we’re continuing to make it easier for customers to work with data of any type and size – using the tools, languages and frameworks they want to — in a trusted cloud environment. From Microsoft products like Azure Machine Learning, Azure SQL Database and Azure HDInsight to data services from our partners, we’re committed to supporting the broadest data platform so our customers get data benefits, in the cloud, on their terms.

Preview of Azure DocumentDB

Applications today must support multiple devices, multiple platforms with rapid iterations from the same data source, and also deliver high-scale and reliable performance. NoSQL has emerged as the leading database technology to address these needs. According to Gartner inquiries, flexible data schemas and application development velocity are cited as primary factors influencing adoption. Secondary factors attracting enterprises are global replication capabilities, high performance and developer interest.*

However, while NoSQL technologies address some document database needs, we’ve been hearing feedback that customers want a way to bridge document database functionality with the transactional capabilities of relational databases. Azure DocumentDB is our answer to that feedback – it’s a NoSQL document database-as-a-service that provides the benefits of a NoSQL document database but also adds the query processing and transaction semantics common to relational database systems.

Built for the cloud, Azure DocumentDB natively supports JSON documents, enabling easy object mapping and iteration of data models for application development. Azure DocumentDB offers programming libraries for several popular languages and platforms, including .NET, Node.js, JavaScript, and Python. We will be contributing the client libraries to the open source community, so the community can incorporate improvements into the versions published on Azure.com.
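
To illustrate the kind of query DocumentDB accepts over schema-free JSON documents, here is a small in-memory sketch. The documents and query are made up, and the Python list comprehension stands in for the service’s query engine rather than showing the actual client library.

```python
# Schema-free JSON documents: note the third document carries an extra
# field the others lack, which DocumentDB permits without any schema change.
docs = [
    {"id": "1", "category": "lamps", "price": 49.0},
    {"id": "2", "category": "rugs", "price": 120.0},
    {"id": "3", "category": "lamps", "price": 15.5, "onSale": True},
]

# DocumentDB-style SQL over a collection aliased as `c` (hypothetical data):
query = "SELECT * FROM c WHERE c.category = 'lamps' AND c.price < 50"

# Equivalent in-memory evaluation of that predicate, standing in for the engine:
results = [d for d in docs if d["category"] == "lamps" and d["price"] < 50]

print([d["id"] for d in results])  # → ['1', '3']
```

The appeal is exactly what the Additive Labs quote below describes: familiar SQL-like predicates applied directly to JSON, with no fixed schema required up front.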

One DocumentDB customer, Additive Labs, builds online services to help their customers move to the cloud. "DocumentDB is the NoSQL database I am expecting today,” said Additive Labs Founder Thomas Weiss. “The ease and power of SQL-like queries had me started in a matter of minutes. And the ability to augment the engine’s behavior with custom JavaScript makes it way easier to adapt to our customers’ new requirements.”

Preview of Azure Search

Search has become a natural way for users to interact with applications that manage volumes of data. However, managing search infrastructure at scale can be difficult and time-consuming, and often requires specialized skills and knowledge. Azure Search is a fully-managed search-as-a-service that customers can use to integrate complete search experiences into applications and connect search results to business objectives through fine-tuned ranking profiles. Customers do not have to worry about the complexities of full-text search or deploying, maintaining or managing a search infrastructure.

With Azure Search developers can easily provision a search service, quickly create and tune one or more indexes, upload data to be indexed and start issuing searches. The service offers a simple API that’s usable from any platform or development environment and makes it easy to integrate search into new or existing applications. With Azure Search, developers can use the Azure portal or management APIs to increase or decrease capacity in terms of queries per second and document count as load changes, delivering a more cost effective solution for search scenarios. 
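
As a sketch of the "create and tune an index" step, the snippet below assembles an index definition in the JSON shape the Azure Search REST API accepts. The index name, fields, and attribute choices are hypothetical examples, not taken from any customer above.

```python
import json

# Sketch of an Azure Search index definition. Each field declares how it
# participates in search: `key` identifies documents, `searchable` enables
# full-text matching, `filterable`/`sortable`/`facetable` drive refinement.
index_definition = {
    "name": "products",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "price", "type": "Edm.Double", "filterable": True, "sortable": True},
        {"name": "category", "type": "Edm.String", "facetable": True, "filterable": True},
    ],
}

body = json.dumps(index_definition)

# You would send this definition to the service over REST, roughly:
#   PUT https://<service>.search.windows.net/indexes/products?api-version=<version>
#   api-key: <admin-key>
# then upload documents to the index and start issuing search queries.
print(body)
```

After the index exists, uploading documents and issuing queries are likewise plain HTTP calls, which is what makes the service usable from any platform or development environment.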

Retail platform provider Xomni is already using Azure Search to help the company manage its cloud infrastructure. "We have the most technically advanced SaaS solution for delivering product catalogue data to the retail industry in the market today,” said Xomni CTO Daron Yondem. “Integrating Azure Search into our platform will help solidify our leadership as datasets and faceted search requirements evolve over time."

General availability of Apache HBase for HDInsight

In partnership with Hortonworks, we’ve invested in the Hadoop ecosystem through contributions across projects like Tez, Stinger and Hive. Azure HDInsight, our Hadoop-based service, is another outcome of that partnership.

Azure HDInsight combines the best of Hadoop open source technology with the elasticity and manageability that enterprises require. Today, we’re making HBase generally available as a managed cluster type inside HDInsight. HBase clusters are configured to store data directly in Azure Blob storage. For example, customers can use HDInsight to analyze large datasets in Azure Blobs generated from highly-interactive websites, or to analyze sensor and telemetry data from millions of end points.

Microsoft data services

Azure data services provide unparalleled choice for businesses, data scientists, developers and IT pros, with a variety of managed services from Microsoft and our partners that work together seamlessly and connect to our customers’ data platform investments – from relational data to non-relational data, structured data to unstructured data, constant and evolving data models. I encourage you to try out our new and expanded Azure data services and let us know what you think.

*Gartner, Hype Cycle for Information Infrastructure, 2014, Mark Beyer and Roxane Edjlali, 06 August 2014

 


Real world use cases of the Microsoft Analytics Platform System

This blog post was authored by: Murshed Zaman, AzureCAT PM and Sumin Mohanan, DS SDET

With the advent of SQL Server Parallel Data Warehouse (the MPP version of SQL Server) V2 AU1 (Appliance Update 1), PDW got a new name: the Analytics Platform System [Appliance] or APS. The name changed with the addition of Microsoft’s Windows distribution of Hadoop (HDInsight or HDI) and PDW sharing the same communication fabric in one appliance. Customers can buy an APS appliance with PDW or with PDW and HDI in configurable combinations.

Used in current versions of PDW, Polybase is a technology that allows PDW users to query HDFS data. SQL users can quickly get results from Hadoop data without learning Java or C#.

Features of Polybase include:

  1. Schematization of Hadoop data in PDW as external tables
  2. Querying Hadoop data
  3. Querying Hadoop data and joining with PDW tables
  4. High speed export and archival of PDW data into Hadoop
  5. Creating persisted tables in PDW from Hadoop data 

In V2 AU1, Polybase improvements include:

  1. Predicate push-down for queries in Hadoop as Map/Reduce jobs
  2. Statistics on Hadoop data in PDW

Another new feature introduced in PDW V2AU1 is the capability to query data that resides in Microsoft Azure Storage Accounts. Just like HDFS data, PDW can place a schema on data in Microsoft Azure Storage Accounts and move data from PDW to Azure and back.
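
As a sketch of what "placing a schema on" external data looks like in practice, the snippet below composes a Polybase-style CREATE EXTERNAL TABLE statement (Python is used here only to assemble and check the T-SQL text). The table name, columns, and HDFS location are hypothetical, and the exact WITH options vary by PDW version and configuration.

```python
# Compose a Polybase-style external table definition. Once declared, the
# external table can be queried in ordinary T-SQL and joined with relational
# PDW tables, which is how SQL users reach Hadoop data without Java or C#.

def external_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement over an HDFS location."""
    cols = ",\n    ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"    {cols}\n"
        f") WITH (LOCATION = '{location}');"
    )

ddl = external_table_ddl(
    "dbo.WebClicks",
    [("UserId", "BIGINT"), ("Url", "VARCHAR(2048)"), ("ClickTime", "DATETIME2")],
    "hdfs://head-node:8020/clicks/2014/",
)
print(ddl)

# A join against a relational PDW table then looks like plain T-SQL, e.g.:
#   SELECT s.ProductId, COUNT(*) AS Clicks
#   FROM dbo.Sales AS s
#   JOIN dbo.WebClicks AS w ON w.UserId = s.CustomerId
#   GROUP BY s.ProductId;
```

The same shape applies to data in Microsoft Azure Storage Accounts: only the location changes, while the querying and joining stay in T-SQL.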

With these new features and improvements, APS has become a first-class citizen in analytics for any type of data. Any company that has big data requirements and wants a highly scalable, scale-out data warehouse appliance can use APS.

Here are four cases that illustrate how different industries are leveraging APS:

One: Retail brand vs. Name brand

Retail companies that use PDW often also want to harvest and curate data from their social analytics sites. This data provides insight into their products and helps them understand the behavior of their customers. Using APS, a company can offer the right promotion at the right time to the right demographics. The data also allows companies to find brand recommendations coming from a friend, relative or trusted support group, which can be much more effective than marketing literature alone. By monitoring and profiling social media, these companies can also gain a competitive advantage.

Today’s empowered shoppers want personalized offers that appeal to their emotional needs. Using social media, retailers offer promotions that are tailored to individuals using real-time analytics. This process starts by ranking blogs, forums, Twitter feeds and Facebook posts for predetermined KPIs revealed in these posts and conversations. Retail organizations analyze and use the data to profile shoppers and personalize future marketing campaigns. Measurable sales data reveals the effectiveness of the campaign, and the whole process starts again with the insight gained.

In this example, PDW houses the relational sales data and Hadoop houses the social sentiment. PDW with a built-in HDI region gives the company the means to analyze both data sources in a timely manner, so it can react and make changes.

Retail store APS diagram:

Two: Computer Component Manufacturing

Companies that generate massive amounts of electronic test data can get valuable insights from APS. Test data is usually a good candidate for Hadoop due to its key-value (JSON or XML) structure.

One example in this space is a computer component manufacturer. Due to the volume, velocity and variety of this data (e.g., Sort/Class), a conventional ETL process can be very resource-expensive. Using APS, companies can gain insight from their data by putting the semi-structured (key-value pair) data into an HDI region and other complementary structured data sources (e.g., Wafer Electrical Test) into PDW. With the Polybase query feature, these two types of data can easily be combined and evaluated for success/failure rates.

Computer Component Manufacturing Diagram:

Three: Game Analytics Platform for online game vendors

PDW with HDI regions can offer a complete solution for online game companies to derive insights from their data. MMORPGs (Massively Multiplayer Online Role-Playing Games) are good examples of where APS can deliver value. Game engines produce a lot of transactional data (events such as which avatar was killed in the currently active game) and a lot of semi-structured data, such as activity logs containing chat data and historical logs. APS is well suited to this mix: the transactional data is loaded into the PDW workload and the semi-structured data into the HDI region. The data can then be used to derive insights such as:

  1. Customer retention – Discovering when to give customers offers and incentives to keep them in the game
  2. Improving game experience – Discovering where customers are spending more time in the game, and improving in-game experience
  3. Detecting fraudulent gaming activities

Currently these companies deal with multiple solutions and products to achieve the goal. APS provides a single solution to power both their transactional and non-transactional analytics.

Four: Clickstream analysis of product websites for targeted advertising

In the past, a relational database system was sufficient to satisfy the data requirements of a medium-scale production website. Ever-increasing competition and advancements in technology have changed the way in which websites interact with customers. Apart from storing data that customers explicitly provide the company, sites now record how customers interact with their website.  As an example, when a registered user browses a particular car model, additional targeted advertisements and offers can be sent to the user.

This scenario can be addressed using collected clickstream data and the Hadoop ecosystem. APS acts as a complete solution for these companies by offering the PDW workload to store and analyze transactional data, combined with the HDI region to derive insights from the clickstream data.

This solution also applies to third-party companies that specialize in targeted advertising campaigns for their clients.

While “Big Data” is a hot topic, we very often receive questions from customers about the actual use cases that apply to them and how they can derive new business value from “Big Data.” Hopefully these use cases highlight how various industries can truly leverage their data to mine insights that deliver business value, in addition to showcasing how traditional data warehouse capabilities work together with Hadoop.

Visit the Microsoft Analytics Platform System page to learn more. 

Virginia Tech Exec Q&A

Virginia Tech is using the Microsoft Azure Cloud to create cloud-based tools to assist with medical breakthroughs via next-generation sequence (NGS) analysis. This NGS analysis requires both big computing and big data resources. A team of computer scientists at Virginia Tech is addressing this challenge by developing an on-demand, cloud-computing model using the Azure HDInsight Service. By moving to an on-demand cloud computing model, researchers will now have easier, more cost-effective access to DNA sequencing tools and resources, which could lead to even faster, more exciting advancements in medical research.

We caught up with Wu Feng, Professor in the Department of Computer Science and Department of Electrical & Computer Engineering and the Health Sciences at Virginia Tech, to discuss the benefits he is seeing with cloud computing.

Q: What is the main goal of your work?

We are working on accelerating our ability to use computing to assist in the discovery of medical breakthroughs, including the holy grail of “computing a cure” for cancer. While we are just one piece of a giant pipeline in this research, we seek to use computing to more rapidly understand where cancer starts in the DNA. If we could identify where and when mutations are occurring, it could provide an indication of which pathways may be responsible for the cancer and could, in turn, help identify targets to help cure the cancer. It’s like finding a “needle in a haystack,” but in this case we are searching through massive amounts of genomic data to try to find these “needles” and how they connect and relate to each other “within the haystack.”

Q: What are some ways technology is helping you?

We want to enable the scientists, engineers, physicists and geneticists and equip them with tools so they can focus on their craft and not on the computing. There are many interesting computing and big data questions that we can help them with, along this journey of discovery.

Q: Why is cloud computing with Microsoft so important to you?

The cloud can accelerate discovery and innovation by computing answers faster, particularly when you don’t have bountiful computing resources at your disposal. It enables people to compute on data sets that they might not have otherwise tried because they didn’t have ready access to such resources.

For any institution, whether a company, government lab or university, the cost of creating or updating datacenter infrastructure, such as the building, the power and cooling, and the raised floors, just so a small group of people can use the resource, can outweigh the benefits. Having a cloud environment with Microsoft allows us to leverage the economies of scale to aggregate computational horsepower on demand and give users the ability to compute big data, while not having to incur the institutional overhead of personally housing, operating and maintaining such a facility.

Q: Do you see similar applications for businesses?

Just as the Internet leveled the playing field and served as a renaissance for small businesses, particularly those involved with e-commerce, so will the cloud. By commoditizing “big data” analytics in the cloud, small businesses will be able to intelligently mine data and extract insight from activities such as supply-chain economics and personalized marketing and advertising.

Furthermore, quantitative analytic tools, such as Excel DataScope in the cloud, can enable financial advisors to accelerate data-driven decision-making via commoditized financial analytics and prediction. Specifically, Excel DataScope delivers data analytics, machine learning and information visualization to the Microsoft Azure Cloud.

In any case, just as in the life sciences, these financial entities face their own data deluge. One example is trades and quotes (TAQ), where the amount of financial information is also increasing exponentially. Unfortunately, to make analytics on TAQ data more tractable, the data is often triaged into summary format, which can inadvertently filter out critical data that should have been retained.

Q: Are you saving money or time or experiencing other benefits?

Back when we first thought of this approach, we wondered whether it would even be feasible in the cloud. For example, with so much data to upload to the cloud, would the cost of transferring data from the client to the cloud outweigh the benefits of computing in the cloud? With our cloud-enabling of a popular genome analysis pipeline, combined with our synergistic co-design of the algorithms, software, and hardware in the pipeline, we realized about a three-fold speed-up over the traditional client-based solution.
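The trade-off Feng describes can be sketched with a back-of-the-envelope model: the cloud pays off when upload time plus accelerated compute time beats the local runtime. The numbers below are hypothetical assumptions for illustration, not measurements from the Virginia Tech pipeline.

```python
# Hypothetical back-of-envelope: does moving the job to the cloud still
# win once data-transfer time is included? All figures are assumed.

def total_cloud_time(data_gb, upload_mbps, local_hours, speedup):
    """End-to-end cloud time in hours: upload + (local runtime / speedup)."""
    upload_hours = (data_gb * 8 * 1024) / upload_mbps / 3600  # GB -> megabits
    return upload_hours + local_hours / speedup

local_hours = 30.0  # assumed client-side runtime
cloud_hours = total_cloud_time(data_gb=100, upload_mbps=1000,
                               local_hours=local_hours, speedup=3.0)
print(f"local: {local_hours:.1f} h, cloud incl. upload: {cloud_hours:.1f} h")
```

Under these assumptions the upload adds well under an hour, so the three-fold compute speed-up dominates; on a much slower link or with far more data, the balance could tip the other way.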

Q: What does the future look like?

There is big business in computing technology, whether it is explicit, as in the case of personal computers and laptops, or implicit, as in the case of smartphones, TVs or automobiles. Just look how far we have come over the past seven years with mobile devices. However, the real business isn’t in the devices themselves, it’s in the ecosystem and content that supports these devices: the electronic commerce that happens behind the scenes. In another five years, I foresee the same thing happening with cloud computing. It will become a democratized resource for the masses. It will get to the point where it will be just as easy to use storage in the cloud as it will be to flip a light switch; we won’t think twice about it. The future of computing and data lies in the cloud, and I’m excited to be there as it happens.


For more information about Azure HDInsight, check out the website and start a free trial today.

The Microsoft Infinity Room Photo Contest Has a Winner!

Congratulations to Edgar Rivera, whose Microsoft Infinity Room photo won the #InsightsAwait Photo Sweepstakes. “I never thought that stepping into some data visualization could be this cool,” Edgar tweeted. You can see his photo here.

Visitors to the Microsoft Infinity Room were invited to capture their experiences and tag their photos on Twitter or Instagram with the #InsightsAwait hashtag. You can view all of the contest entries here.

If you didn’t have a chance to visit the Infinity Room in San Francisco from April 15-17, take the 360-degree virtual tour and be inspired by the extraordinary found through data surrounding an ordinary object.

Also – want to learn more about Microsoft Big Data solutions? Hear CEO Satya Nadella discuss Microsoft’s drive towards a data culture during the Accelerate your insights event in San Francisco earlier this month. Watch the keynote on-demand now.

ICYMI: Data platform momentum

The last couple months have seen the addition of several new products that extend Microsoft’s data platform offerings.  

At the end of January, Quentin Clark outlined his vision for the complete data platform, exploring the various inputs that are driving new application patterns, new considerations for handling data of all shapes and sizes, and ultimately changing the way we can reveal business insights from data.

In February, we announced the general availability of Power BI for Office 365, and you heard from Kamal Hathi about how this exciting release simplifies business intelligence and how features like Power BI sites and Power BI Q&A help anyone, not just experts, gain value from their data. You also heard from Quentin Clark about how Power BI helps make big data work for everyone by bringing together easy access to data, robust tools that everyone can use, and a complete data platform.

In March, we announced that SQL Server 2014 would be generally available beginning April 1, and shared how companies are already taking advantage of the in-memory capabilities and hybrid cloud scenarios that SQL Server enables. Shawn Bice explored the platform continuum, and how with this latest release, developers can continue to use SQL Server on-premises while also dipping their toes into the possibilities of the cloud using Microsoft Azure. Additionally, Microsoft Azure HDInsight was made generally available with support for Hadoop 2.2, making it easy to deploy Hadoop in the cloud.

And earlier this month at the Accelerate your insights event in San Francisco, CEO Satya Nadella discussed Microsoft’s drive towards a data culture. In addition, we announced two other key capabilities that extend the robustness of our data platform: the Analytics Platform System, an evolution of the Parallel Data Warehouse with the addition of a Hadoop region for your unstructured data, and a preview of the Microsoft Azure Intelligent Systems Service to help tap into the Internet of Your Things. In case you missed it, watch the keynotes on-demand, and don’t miss the Infinity Room, which shows the extraordinary things that can be found in your data.

On top of our own announcements, we’ve recently been honored to be recognized by Gartner as a Leader in the 2014 Magic Quadrants for Data Warehouse Database Management Systems and for Business Intelligence and Analytics Platforms. And SQL Server 2014, running on Hewlett-Packard hardware, set two world records for data warehousing performance and price/performance.

With these enhancements across the entire Microsoft data platform, there is no better time than now to dig in. Learn more about our data platform offerings. Brush up on your technical skills for free on the Microsoft Virtual Academy. Connect with other SQL Server experts through the PASS community. Hear from Microsoft’s engineering leaders about Microsoft’s approach to developing the latest offerings. Read about the architecture of data-intensive applications in the cloud computing world from Mark Souza, which one commenter noted was a “great example for the future of application design/architecture in the Cloud and proof that the toolbox of the future for Application and Database Developers/DBAs is going to be bigger than the On-Prem one of the past.” And finally, come chat in-person – we’ll be hanging out at the upcoming PASS Business Analytics and TechEd events and are eager to hear more about your data opportunities, challenges, and of course, successes.

What can your data do for you?