Category Archives: Data

PASS SQL Saturday #356 Slovenia Recapitulation

So the event is over. I think I can say for all three organizers, Mladen Prajdić, Matija Lah, and me, that we are tired now. However, we are extremely satisfied. It was a great event. First, a few numbers and a comparison with SQL Saturday #274, the first SQL Saturday Slovenia event, which took place last year.

                    SQL Saturday #274    SQL Saturday #356
People              135                  220
Show rate           ~87%                 ~95%
Proposed sessions   40                   82
Selected sessions   15                   24
Selected speakers   14                   23
Countries           12                   16

The numbers nearly doubled. We are especially proud of the show rate; at 95%, it is much better than the average for a free event, and probably the highest so far for any SQL Saturday. We asked registered attendees to be fair and to unregister if they knew they could not attend the event, in order to make room for those on the waiting list. An old Slovenian proverb says “A nice word finds a nice place”, and it works: 36 registered attendees unregistered. Therefore, we have to thank both the attendees of the event and those who unregistered.

Of course, as always, we also need to thank all of the speakers, sponsors and volunteers. All of the volunteers were very helpful; however, I would like to especially point out Saša Mašič. Her work went well beyond simple volunteering. I must also mention the FRI, the Faculty of Computer and Information Science, which hosted the event for free. It is also worth mentioning that we are lucky to live in Ljubljana, such a beautiful city, with extremely nice inhabitants who like to enjoy good food, hanging around and mingling, and long parties. Because of that, we could be sure in advance that speakers and attendees from other countries would enjoy spending time here outside the event as well, that they would feel safe, and that they would get help whenever they needed it.

From the organizational perspective, we tried to do our best, and we hope that everything was OK for speakers, sponsors, volunteers, and attendees. Thank you all!

Data Modeling Resources

You can find many different data modeling resources. It is impossible to list all of them. I have selected only the ones most valuable to me and, of course, the ones I contributed to.

  • Books
    • Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly.
    • Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders.
    • Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data.
    • Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server-focused data modeling book I know, written by two of my friends.
    • Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014.
    • Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model.
    • Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data.
    • Dejan Sarka, Matija Lah, Grega Jerkič: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation.
  • Courses
    • Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver as a shorter seminar, you can contact your closest SolidQ subsidiary or, of course, contact me directly at dsarka@solidq.com or dsarka@siol.net. This course could also complement the existing courseware portfolio of training providers, who are welcome to contact me as well.
    • Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight.
    • Working with Temporal data in SQL Server – my latest Pluralsight course, where, besides theory and implementation, I introduce many original ways to optimize temporal queries.
  • Forthcoming presentations
    • SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar, Advanced Data Modeling Topics, there.

Working with Temporal Data in SQL Server

My third Pluralsight course, Working with Temporal Data in SQL Server, is published. I am really proud of the second part of the course, where I discuss the optimization of temporal queries. This was a nearly impossible task for decades; the first solutions appeared only recently. I present six solutions altogether (plus one more that is not a solution), four of which I invented. http://pluralsight.com/training/Courses/TableOfContents/working-with-temporal-data-sql-server
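
To illustrate why this is so hard, here is a minimal T-SQL sketch (not taken from the course; the table and all names are hypothetical). The classic point-in-time query filters on both ends of a validity interval, and a single B-tree index can seek on only one of the two range predicates:

    -- Hypothetical table with a validity interval: b = start (inclusive), e = end (exclusive)
    CREATE TABLE dbo.Suppliers
    (
      supplierid  INT          NOT NULL,
      companyname NVARCHAR(40) NOT NULL,
      b           DATE         NOT NULL,
      e           DATE         NOT NULL,
      CONSTRAINT PK_Suppliers PRIMARY KEY (supplierid, b),
      CONSTRAINT CHK_b_before_e CHECK (b < e)
    );

    -- Which rows were valid on a given date?
    DECLARE @d AS DATE = '20140101';

    SELECT supplierid, companyname, b, e
    FROM dbo.Suppliers
    WHERE b <= @d
      AND e >  @d;
    -- An index on b (or on e) supports a seek on only one of the two predicates;
    -- the other is applied as a residual filter, so on average the seek still
    -- scans roughly half the index. This is the optimization problem the course
    -- addresses.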

SQL Server 2012 Reporting Services Blueprints Review

I had the opportunity to read the SQL Server 2012 Reporting Services Blueprints book by Marlon Ribunal (@MarlonRibunal) and Mickey Stuewe (@SQLMickey), published by Packt Publishing. Here is my short review.


I find the book very practical. The authors guide you right to the point, without unnecessary digressions. Step by step, you create more and more complex reports and learn the features of SQL Server Reporting Services (SSRS) 2012. If you are more of a hands-on person, this is the right book for you. Well, to be honest, there is not much theory in reporting. Reporting is the starting point of business intelligence (BI), with on-line analytical processing as the next step, and data mining as the “I” in BI.

The book presents all of the SSRS features correctly. In some chapters and appendices, the authors also go beyond the basics. I especially enjoyed the advanced topics in Chapter 5, “Location, Location, Locations!”, Appendix A, “SSRS Best Practices”, and Appendix B, “Transactional Replication for Reporting Services”. Altogether, the book is more than sufficient for creating nice reports quickly, without much previous knowledge, while avoiding common pitfalls at the same time.

The only thing I miss is a bit more theory. Yes, as a data mining person, I like to learn things in more depth. I usually don’t deal with presentation; I prefer numbers. And this is my point – I would like to see more guidelines about the proper usage of report elements: when to use which graph type, when to use maps, how to properly present different kinds of data, and so on.

Anyway, altogether, the book is very useful, and I would recommend it to anybody who wants to learn SSRS in a short time.

SQL Saturday #274 Slovenia Recapitulation

Pure success!

I could simply stop here. However, I want to mention again everybody involved in this, and also some who were unfortunately missing.

First of all, PASS is the organization that defined SQL Saturdays. And apparently the idea works!

I have to thank all of the speakers once again. Coming to share your amazing knowledge is something we really appreciate. The presentations were great, from the technical and every other perspective.

Of course, we could not do the event without sponsors. I am not going to list all of them again; I will just mention the host, pixi* labs, the company that hosted the event and whose employees helped with all of the organization. In addition, I need to mention Vina Kukovec. Boštjan Kukovec, a long-standing member of the Slovenian SQL Server and Developers users group, organized a free wine tasting after the event. And what a wine it is!

Finally, thanks to all attendees for coming. We had approximately an 85% show-up rate; only 15% of the registered attendees didn’t come. This is an incredible result, worldwide! And from the applause after the raffle, when we closed the event (and started the wine tasting), I conclude that the attendees were very satisfied as well.

I want to mention three people who wanted to come, but ran out of luck this time. Peter Stegnar from pixi* labs was the one who immediately offered the venue and infected other pixi* labs members with his enthusiasm. Due to family reasons, he couldn’t join us to see the results of his help. Tobiasz Janusz Koprowski wanted to speak, organized his trip, and looked forward to joining us; however, just a couple of days before the event he had to cancel because of some urgent work at a customer’s site. And what to say about Kevin Boles? He really tried hard to come. Think of it: he was prepared to come from the USA! He was already at the airport when his flight got cancelled due to technical problems. We were in constant touch on Friday evening. He managed to change the flight and went to the gate, but was not admitted to the plane because there were only seven minutes left until take-off. Catching a later flight would not have made sense anymore, because he would have arrived too late anyway. He really did the best he could; he just didn’t have enough luck this time. Peter, Tobiasz, and Kevin, thank you for your enthusiasm; we seriously missed you, and we certainly hope to meet at our next event!

Data Science and the Cloud

More than perhaps any other computing discipline, Data Science lends itself to Cloud Computing in general, and to Windows Azure in particular. That’s a big claim, but before I offer some evidence, I need to explain what I mean by “Data Science”. I’ve written before on Data Science (http://blogs.msdn.com/b/buckwoody/archive/2012/10/16/is-data-science-science.aspx and https://www.simple-talk.com/cloud/data-science/data-science-laboratory-system—keyvalue-pair-systems/), but since it’s an evolving field, here’s what I’ve observed as the areas that a Data Scientist focuses on:

  • Research – Standard researching techniques such as domain knowledge, data sources and impact analysis
  • Statistics – A focus on probability and descriptive statistics
  • Programming – At least one functional or object-oriented language, often Python, F#, LISP, Haskell, Java, or JavaScript
  • Sources of data – Internal organizational data as well as external sources such as weather, economics, spatial, geo-political sources and more
  • Data movement – Traditional Extract, Transform and Load (ETL), along with ingress or referencing external data sources
  • Complex Event Processing (CEP) – Analyzing or triggering computing as data moves through a source
  • Data storage – Storage systems including distributed storage and remote storage
  • Data processing – Both single-node and distributed processing systems, RDBMS, NoSQL (Hadoop, Key/Value Pair, Document Store, Graph databases, etc)
  • Machine learning – Data-instructive programming as well as Artificial Intelligence and Natural Language Processing
  • Decision analysis – Interpreting the results of data processing to identify patterns, make predictions, and perform data mining
  • Business Intelligence – Design of exploratory data, visualizations, business and organization impacts and communication to the stakeholders of the use of data and visualization tools

There are, of course, other aspects of data science, but I believe this list covers the majority of skills I’ve seen in individuals with the Data Scientist title. And it is normally an individual, or at least a very limited group of people. As you examine the list above, you can see that this person requires a fairly extensive technical background, and in the domain knowledge area in particular, there’s a pretty large time element. That isn’t to say a very bright person couldn’t ramp up on these areas, just that having all of that in your portfolio takes time.

Given that these are the skillsets, why is cloud computing well suited to assisting in the data science function?

It’s obvious that a researcher needs good Internet skills, beyond simply referencing a Wikipedia article – although that’s certainly a good thing to include from time to time. While searching isn’t specific to Windows Azure, there are platform components that allow the programming function to call out to the web for data access. Windows Azure includes a platform that supports languages from Python to F#, JavaScript (including NodeJS), Java and more.

Cloud computing allows the data scientist to access data stored in Windows Azure (Blobs, Tables, Queues, and RDBMSs as a service, such as SQL Server and MySQL) as well as IaaS systems that can run full RDBMS systems such as SQL Server, Oracle, PostgreSQL and others. In addition, the Windows Azure Marketplace contains “Data as a Service”, which has free and fee-based data to include in a single application.

The Windows Azure Service Bus allows architecting a CEP system, and using SQL Server adds the StreamInsight feature; such a system can communicate across on-premises environments, Windows Azure IaaS and PaaS, and other data sources.

For data storage and computing, Windows Azure allows everything from traditional RDBMSs, as described, to any NoSQL offering in IaaS, on both Windows and Linux operating systems. Statistical packages such as “R” are also supported. The elasticity allows the data scientist to spin up huge clusters, such as Hadoop or other NoSQL offerings, perform some analysis, and then stop the process when complete, saving cost and bypassing the internal IT systems (which may have its own dangers, to be sure). Windows Azure also offers the High Performance Computing (HPC) version of Windows Server on Windows Azure, for large-scale massively parallel data processing, in constant and “burst” modes.

In addition, Windows Azure has many services, such as the HDInsight Service (Hadoop on demand) and other analysis offerings that don’t even require the data scientist to stand up and manage a Virtual Machine in IaaS. For visualization, Microsoft has included the ability to use Excel with the HDInsight Service, which of course works with all Microsoft Business Intelligence functions, and there are several other data visualization tools, such as Power View. You can enter the tools you have in the Microsoft stack into this tool (http://www.microsoft.com/en-us/bi/Products/bi-solution-builder.aspx) to see more of the visualization options you have. The data scientist can also build visualizations in web pages, on iPhone, Android or Windows mobile devices, or in full client-code installations.

Because of the need for elasticity, multiple operating systems, and changing landscapes for data and processing, data science is well served by cloud computing – and by Windows Azure in particular, because of the services and features offered, not only on Microsoft Windows but also on Open Source platforms.

 

How Does the Cloud Change a Developer’s Job?

I’ve recently posted a blog on how cloud computing would change the Systems Architect’s role in an organization, another on how the cloud changes a Database Administrator’s job, and the last post dealt with the Systems Administrator. In this post I’ll cover the changes facing the Software Developer when using the cloud.

The software developer role was the earliest adopter of cloud computing. This makes perfect sense, because the software developer has always used computing “as a service” – they (most often) don’t buy and configure servers, platforms and the like; they write code that runs on those platforms. And there’s probably not a simpler definition of a software developer to be found, but as with all simple statements, you lose fidelity and detail. I’ll offer a more complete list in a moment.

Because the software developer’s process involves designing, testing and writing code locally and then migrating it to a production environment, all of the paradigms in cloud computing – from IaaS to PaaS to SaaS – come naturally.

The Software Developer’s Role

The software developer has evolved since the earliest days of programming. The software developer does not only “write code” – there are far more tasks involved in modern systems development:

  • Assisting the Business Role(s) in developing software specifications
  • Planning software system components and modules
  • Designing system components
  • Working in teams writing classes, modules, interfaces and software endpoints
  • Designing data layouts, architectures, access and other data controls
  • Designing and implementing security, either programmatic, declarative, or referential
  • Mixing and matching various languages, scripting and other constructs within the system
  • Designing and implementing user and account security rights and restrictions
  • Designing various software code tests – unit, functional, fuzz, integration, regression, performance and others
  • Deploying systems
  • Managing and maintaining code updates and changes

As with most of the previous roles, each of those tasks unpacks into a larger set of tasks, and no single developer has exactly that same list. And like the DBA, the role often covers more, or less, of that list depending on where the developer works. Smaller companies may include the development platform in the duties, so that a developer is also a systems administrator. In larger organizations I’ve seen developers who specialized in User Interfaces, Engine Components, Data Controls or other specific areas.

How the Cloud Changes Things

The software developer role obviously has the same concerns and impacts from “the cloud” as the Systems Architect. They need to educate themselves on the available options (Knowledge), try a few test solutions out (Experience) and, of course, work with others on various parts of the implementation (Coordination).

The big changes for a developer include three major areas: Hybrid Software Design, Security, and Distributed Computing.

Hybrid Software Design

After the PC revolution, software developers designed systems that ran primarily on a single computer. From there the industry moved to “client/server”, where most of the code still lived on the user’s workstation, and various levels of state (such as the data layer) moved to a server over fast connected lines. After that followed the Internet phase, which had less to do with HTML coding than with stateless architectures. While no architecture is truly stateless, there are ways of allowing the client to be in a different state than the server of the application at any one time – this is the way the Web works.

Even so, the developer often simply moved one of the primary layers (such as Model, View or Controller) to the server, using the User Interface merely as the View or Presentation layer. While technically stateless, this doesn’t require a great deal of architectural change – there are various software modules that run on a server, and perhaps those connect to a remote data server. In the end, it’s still a single paradigm.

We now have the ability to run IaaS (hardware abstraction), PaaS (hardware, operating system and runtime abstraction) and SaaS (everything abstracted, API calls only) in a single environment such as Windows Azure. A single application might have a Web-based Interface Server with federated processes (using a PaaS set of roles), a database service (using a SaaS provider such as Windows Azure SQL Database), a specialized process on Linux (using an IaaS role in Windows Azure) and a translator API (from the Windows Azure Marketplace). This example involves only one vendor – Microsoft. I’ve seen applications that use multiple vendors in this same way.

Thinking this way opens up a great deal of flexibility – and complexity. Complexity isn’t evil; it’s often how complicated things get done. The modern developer needs to understand how to build hybrid software architectures.

Resources: Hybrid Architectures with step-by-step instructions and examples: http://msdn.microsoft.com/en-us/library/hh871440.aspx and Windows Azure Hybrid Systems: http://msdn.microsoft.com/en-us/library/hh871440.aspx?AnnouncementFeed

Security

Having a single security boundary, such as “everyone who works in my company”, is a relatively simple problem to solve. Normally the System Administrators configure and control a security provider, such as Active Directory, and developers can access that security layer programmatically.  That allows for good separation of duties and role-based control.

In modern applications, clients, managers, and users both internal and external need various levels of access to the same objects, code and data. A client should be able to enter an order, a store should be able to accept the order, the credit-card company should be able to check the order and authorize payment, and the managers should be able to report on the order or change it if needed. Using role-based security across multiple domains would be impossible to maintain.

Enter “claims-based” authentication. In this paradigm, the user logs in with whatever security they use – corporate or other Active Directory, Facebook, Google, whatever. The application (using Windows Identity Foundation, or WIF) can accept a “claim” from that provider, and the developer can match whatever parts of that claim they wish to the objects, code and data. An example might be useful.

Buck logs in to his corporate Active Directory (AD) and attempts to use a program based in Windows Azure. Windows Azure rejects the login silently, and is configured to check with Buck’s AD. Buck’s AD says: “Yes, I know Buck, and he has been granted the following claims: partner, manager, approver.” The developer does not need to know about Buck’s AD, Buck, his login, or anything else. She simply codes the proper data access to allow an “approver” to approve a sale.
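
To make that last sentence concrete, here is a minimal T-SQL sketch of the data-layer side. Everything in it is hypothetical – the table, the procedure, and the semicolon-delimited claim list – and it assumes WIF has already validated the token before the application passes the claims along:

    -- Hypothetical sales table
    CREATE TABLE dbo.Sales
    (
      saleid   INT   NOT NULL PRIMARY KEY,
      amount   MONEY NOT NULL,
      approved BIT   NOT NULL DEFAULT (0)
    );
    GO

    -- The application passes the caller's validated claims; the data layer
    -- only needs to know that the 'approver' claim is required here.
    CREATE PROCEDURE dbo.ApproveSale
      @SaleId       INT,
      @CallerClaims NVARCHAR(400)  -- e.g. N'partner;manager;approver'
    AS
    BEGIN
      IF CHARINDEX(N';approver;', N';' + @CallerClaims + N';') = 0
      BEGIN
        RAISERROR(N'Caller does not hold the approver claim.', 16, 1);
        RETURN;
      END;

      UPDATE dbo.Sales
      SET approved = 1
      WHERE saleid = @SaleId;
    END;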

This allows a great deal of control, at a very fine level, without having to get into the details of each security provider.

Resources: Overview of using claims-based Azure Security: http://adnanboz.wordpress.com/2011/02/06/claims-based-access-and-windows-azure/

Distributed Computing

Is there a difference between stateless computing, or even the hybrid programming I mentioned earlier, and “Distributed Computing”? Yes – the primary difference is latency. Even stateless code can have too small a tolerance for latency. 

Dealing with slow connectivity, or breaks in connections, has many impacts. One method of dealing with this is to locate the data and the computing on that data as closely together as possible, even if this means relaxing consistency or duplicating data. Another method is to go back to a great paradigm from the past that is probably underused today: a Service Oriented Architecture. The Windows Azure Service Bus is possibly one of the fastest and easiest ways to adopt cloud computing without completely rearchitecting your application.

References: Great breakdown of the thought process around a distributed architecture: http://msdn.microsoft.com/en-us/magazine/jj553517.aspx and using a Windows Azure Relay Service: http://www.windowsazure.com/en-us/develop/net/how-to-guides/service-bus-relay/ 

How Does the Cloud Change a Database Administrator’s Job?

I recently posted a blog entry on how cloud computing would change the Systems Architect’s role in an organization. In a way, the Systems Architect has the easiest transition to a new way of using computing technologies. In fact, that’s actually part of the job description. I mentioned that a Systems Architect has three primary vectors to think about for cloud computing, as it applies to what they should do:

  1. Knowledge – Which options are available to solve problems, and what are their strengths and weaknesses.
  2. Experience – What has the System Architect seen and worked with in the past.
  3. Coordination – A system design is based on multiple factors, and one person can’t make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.

The Database Administrator Role

But a Database Administrator (DBA) is probably one of the harder roles to think about when it comes to cloud computing. First, let’s define what a Database Administrator usually thinks about as part of their job:

  • Planning, Installing and Configuring a Database Platform
  • Planning, designing and creating databases
  • Planning, designing and implementing High Availability and Disaster Recovery for each database (HADR) based on requirements for its workload
  • Maintaining and monitoring the database platform
  • Implementing performance tuning on the databases based on monitoring
  • Re-balancing workloads across database servers based on monitoring
  • Securing database platforms and individual databases based on requirements and implementation

That’s just a short list, and each of those unpacks into a larger set of tasks.

The issue is that I’ve never actually met a DBA who does all of those things, or only those things. Many times they do much more; sometimes the systems are so large that they specialize in just a few of them.

And as you can see from the list, some of these areas are shared with other roles. For instance, in some shops, the DBA plans, purchases, sets up and configures the hardware for database servers. In others, that’s done by the Infrastructure Team. In some shops the DBA designs databases from software requirements, and in others the developers do that – or perhaps it’s done as a joint effort. The same holds true for database code – sometimes the DBA writes it, other times the developer, and in still others it’s a shared task.

In fact, you could argue that there are few other roles in IT that are so intermixed. Also, the DBA works with software the company develops and software the company buys. They work with hardware, networking, security and software. There are certain aspects of design and tuning that are outside the purview of some of those areas, and inside others.

With all of these variables, simply telling a DBA that they should “use the cloud” is not the proper approach.

How the Cloud Changes Things

To be sure, the DBA has the same vectors as the Systems Architect. They need to educate themselves on the available options (Knowledge), try a few test solutions out (Experience) and of course work with others on various parts of the implementation (Coordination). But it goes beyond that.

There are three big buckets of cloud computing, ranging from simply using a Virtual Machine (IaaS), to writing code without worrying about the virtualization or even the operating system (PaaS), to using software that’s already written and is delivered via an Application Programming Interface (SaaS). Each of these has so many options and configurations that it’s often better to think about the problem you’re trying to solve rather than all of the technology within a given area – although some of that is certainly necessary anyway.

Database Platform Architecture

I’ll start with when the DBA should even consider cloud computing for a solution. Once again, it’s not an “all or nothing” paradigm, where you either run something on-premises or in the cloud – it’s often a matter of selecting the right components to solve a problem. In my design sessions with DBAs, I break these down into three big areas where they might want to consider the cloud – and then we talk about how to implement each one:

  1. Audiences
  2. HADR
  3. Data Services

Audiences

If the users of your database systems all sit in the same facility, you own the servers and networking, and the application servers are separate from the database server, it doesn’t usually make sense to take that database workload and place it on Windows Azure – or any other cloud provider. The latency alone prevents a satisfactory performance profile, and in some cases it won’t work at all. It doesn’t matter if the cloud solution is cheaper or easier – if you’re moving a lot of data every second between an on-premises system and the cloud, it won’t work well.

However, if your users are in multiple locations, especially globally, or you have a mix of company and external customer users, it might make sense to evaluate a shared data location. You still need to consider the implications of how much data the application server pushes back and forth, but you may be able to locate both the application server and SQL Server in an IaaS role. Assuming the data sent to the final client will work across public Internet channels, there may be a fit. There are security implications, but unless you have point-to-point connections for your current solution, you face the same security questions in both options.

Your audience might also be developers looking for a way to quickly spin up a server and then turn it down when they are done, paying for the time rather than the hardware or licenses. This is also a prime case for evaluating IaaS. And there are other audiences that you’ll find in your own organization as you work through your requirements.

Resources: Windows Azure Virtual Machines: http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/ and Windows Azure SQL Server Virtual Machines: http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/

HADR

The next place to consider using cloud computing with SQL Server is as part of your High Availability and Disaster Recovery plans. In fact, this is the most common use I see for cloud computing and the Database Administrator. The key is the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). Based on each application’s requirements, you may find that using Windows Azure, or even just supplementing your current plan with it, is the right option to evaluate. I’ve covered this use-case in more detail in another article.
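
As one small, concrete example of supplementing a DR plan: recent builds of SQL Server 2012 (SP1 CU2 and later) can back up directly to Windows Azure Blob storage. The sketch below is illustrative only – the database name, storage account, container, and credential names are all placeholders:

    -- Store the Windows Azure storage account name and access key once
    CREATE CREDENTIAL AzureBackupCredential
    WITH IDENTITY = 'mystorageaccount',        -- storage account name (placeholder)
         SECRET   = '<storage-access-key>';

    -- Back up straight to a blob container as an off-site copy
    BACKUP DATABASE SalesDb
    TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SalesDb.bak'
    WITH CREDENTIAL = 'AzureBackupCredential',
         COMPRESSION,
         STATS = 10;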

References: SQL Server High Availability and Disaster Recovery options with Windows Azure: http://blogs.msdn.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx

Data Services

Windows Azure, along with other cloud providers, offers another way to design, create and consume data. In this use-case, however, the tasks DBA’s normally perform for sizing, ordering and configuring a system don’t apply.

With Windows Azure SQL Database (the artist formerly known as SQL Azure), you can simply create a database and begin using it. There are places where this fits and others where it doesn’t, and there are differences, limitations and enhancements, so it isn’t meant as a replacement for what you could do with “full-up” SQL Server on a Windows Azure Virtual Machine or an on-premises instance. If a developer needs a Relational Database Management System (RDBMS) data store for a web-based application, then this might be a perfect fit.
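
For instance, once connected to the logical server’s master database, creating and using a database takes only a couple of statements. This is just a sketch – the database name, edition, and size are placeholders, and the exact options available depend on the service at the time:

    -- Run against the master database of the Windows Azure SQL Database server
    CREATE DATABASE OrdersDb (EDITION = 'WEB', MAXSIZE = 1 GB);

    -- Then connect directly to OrdersDb and use it like any other database
    -- (every table needs a clustered index, which the primary key provides)
    CREATE TABLE dbo.Orders
    (
      orderid    INT  NOT NULL IDENTITY PRIMARY KEY,
      customerid INT  NOT NULL,
      orderdate  DATE NOT NULL DEFAULT (SYSDATETIME())
    );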

But there is more to data services than Windows Azure SQL Database. Windows Azure also offers MySQL as a service, Riak and MongoDB (among others), and even Hadoop for larger distributed data sets. In addition, you can use Windows Azure Reporting Services, and also tap into datasets and data functions in the Windows Azure Marketplace.

The key for the DBA with this option is that you will have to do a little investigation, potentially without a specific workload in mind this time. I think that’s an acceptable thing to ask – DBAs constantly keep up with data processing trends, and most will consider different ways to solve a problem.

References:

Windows Azure SQL Databases: http://www.windowsazure.com/en-us/home/features/data-management/

Windows Azure Reporting Services: http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/

HDInsight Service (Hadoop on Azure): https://www.hadooponazure.com/

MongoDB Offerings on Windows Azure: http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/

Windows Azure Marketplace: http://www.windowsazure.com/en-us/store/overview/

 

How Does the Cloud Change a Systems Architect’s Job?

I know – I said I didn’t like the “cloud” term, but my better-phrased “Distributed Systems” moniker just never took off like I had hoped. So I’ll stick with the “c” word for now, at least until the search engines catch up with my more accurate term.

I thought I might spend a little time on how the cloud affects the way we work – from Systems Architects to Database Administrators, Developers, and Systems Administrators – a group often referred to as “IT Pros”. But each role within this group is affected differently by cloud computing. In this post we’ll take a look at the role of the Systems Architect, and in the posts that follow I’ll talk more about the other roles in the IT Pro area.

The Systems Architect Role

What does a “Systems Architect” do? Like most IT roles, it depends on the company or organization where they work. In fact, the term isn’t even specific to technology, but I’ll use it in that context here. In general, a Systems Architect takes the requirements for a given system, and assembles the relevant technology areas that best fulfill those requirements. That’s a single-sentence explanation, and needs further unpacking.

As an example, a Systems Architect at a medical firm is presented with a set of requirements for tracking a patient through the entire care cycle. The Systems Architect first looks at all of the requirements for the data that needs to be collected – business, financial, regulatory, and others – and then at how that data needs to flow from one system to another. They check the security requirements, performance, location and other aspects of the system. They then check to see which options are available for processing that data, and which parts they should “build or buy”.

For instance, the requirements might be so specific that only custom code is the proper solution – but even there, choices still exist, such as which language(s) to use, what type of data persistence (a Relational Database Management System or other data storage and processing) will be used, what talent within the company is available for the system, and a myriad of other decisions.

All of this boils down to three primary vectors:

  1. Knowledge – Which options are available to solve problems, and what are their strengths and weaknesses.
  2. Experience – What has the System Architect seen and worked with in the past.
  3. Coordination – A system design is based on multiple factors, and one person can’t make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.

How the Cloud Changes Things

From the outset, it doesn’t seem that using a distributed system would change anything in the Systems Architect role. Isn’t the cloud simply another option that the Systems Architect needs to learn and apply? Yes, that is true – but it goes a bit deeper. Let’s return to those vectors for a moment to see what a Systems Architect needs to take into account.

Knowledge

The first and probably most obvious impact is learning about cloud technologies. But the important part of that knowledge is learning when and where to use each service. It’s a common misconception that the cloud has to be an “all or nothing” approach. That’s just not true – every Windows Azure project I work on has some element of on-premises interaction, and in some cases only one small part of a solution is placed on the Windows Azure architecture. Since Windows Azure contains IaaS (VMs), PaaS (you write code, we run it) and even SaaS (such as Hadoop or Media Services), a given architecture can use multiple components even within just one provider. And I’ve worked on several projects where the customer used not only Windows Azure and on-premises environments, but also components from other providers. That’s not only acceptable, but often the best way to solve a given problem.

As part of the learning experience, it’s vital to keep in mind what you need to pick as key decision points. In your organization, cost could be ranked higher than performance, or perhaps security is the highest decision point.

To stay educated, there are various journals, websites and conferences that Systems Architects use to keep current. Almost all of those are talking about “cloud” – but there is no substitute for learning from the vendor about their solution. I’m speaking here of the technical information, not the marketing information. The marketing information is also useful, at least from a familiarity standpoint, but the technical information is what you need.

Resource: For Windows Azure, the Systems Architect can start here: http://blogs.msdn.com/b/buckwoody/archive/2012/06/13/windows-azure-write-run-or-use-software.aspx 

Experience

Cloud computing is relatively new – it’s only been out a few years, and the main competitors are only now settling in to their respective areas. It might not be common for a Systems Architect to have a lot of hands-on experience with cloud projects.

Even so, there are ways to leverage the experience of others, such as direct contact or even attending conferences where customers present findings from their experiences.

You can also gain hands-on experience by setting up pilots and proof-of-concept projects yourself. Almost all vendors – Microsoft included – have free time available on their systems. The key to an experiment like this is choosing a problem you are familiar with that exercises as many features of the platform as possible. There is no substitute for working with a platform when you want to design a solution.

Coordination

Probably one of the largest changes in the Systems Architect role that the cloud brings is in the area of coordination. When a Systems Architect deals with the business and other technical professionals, there is a 20+ year history of technology that we are all familiar with. When you mention “the cloud”, those audiences may not have spent the time you have in understanding what that means – and often they think it means the “all or nothing” approach I mentioned earlier.

I’ve found that a series of “lunch and learns” for the technical staff is useful for explaining to each role-group how the cloud is used in their area. In the posts that follow this one, I’ll give you some material for those. For managers and business professionals, you’ll want to go a different route. There, I’ve found that an “Executive Briefing” e-mail works well – about a page, with headings that are applicable to your audience.

Resource: Writing Executive Summaries: http://writing.colostate.edu/guides/guide.cfm?guideid=76