Category Archives: Data modeling

Data Modeling Resources

You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to.

  • Books
    • Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly.
    • Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders.
    • Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data.
    • Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends.
    • Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014.
    • Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model.
    • Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data.
    • Dejan Sarka, Matija Lah, Grega Jerkič: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation.
  • Courses
    • Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses dsarka@solidq.com or dsarka@siol.net. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well.
    • Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight.
    • Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries.
  • Forthcoming presentations
    • SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

Working with Temporal Data in SQL Server

My third Pluralsight course, Working with Temporal Data in SQL Server, is published. I am really proud on the second part of the course, where I discuss optimization of temporal queries. This was a nearly impossible task for decades. First solutions appeared only lately. I present all together six solutions (and one more that is not a solution), and I invented four of them. http://pluralsight.com/training/Courses/TableOfContents/working-with-temporal-data-sql-server

Truncate Table – DDL or DML Statement?

Many times, categories of concepts and things overlap. It can be hard to categorize some items in a single category. The SQL TRUNCATE TABLE statement is an example of an item that is not so easy to categorize. Is it a DDL (Data Definition Language) or DML (Data Manipulation Language) statement?

There is an ongoing discussion about this topic. However, if you quickly bingle for this question, you get the impression that the majority is somehow leaning more toward defining the TRUNCATE TABLE statement as a DDL statement. For example, Wikipedia clearly states: “In SQL, the TRUNCATE TABLE statement is a Data Definition Language (DDL) operation that marks the extents of a table for deallocation (empty for reuse).” Disclaimer: please note that I do not find Wikipedia as the “ultimate, trustworthy source” – I prefer sources that are signed!

Some of the reasons why many people define the statement as a DDL statement include:

  • It requests schema locks in some systems
  • It is not possible to rollback it in some systems
  • It does not include a WHERE clause
  • It does not fire triggers in some systems
  • It resets the autonumbering column value in some systems
  • It deallocates system pages directly, not through an internal table operation
  • and more.

On the other hand, it looks like there is only one reason to treat the statement as a DML statement:

  • Logically, you just get rid of the data, like with the DELETE statement.

Even the Wikipedia article that I referred to says “The TRUNCATE TABLE mytable statement is logically (though not physically) equivalent to the DELETE FROM mytable statement (without a WHERE clause).”

Like many times, I have to disagree with the majority. I understand that the categorization is somehow confusing, and might even be overlapping. However, the only reason for categorizing the TRUNCATE TABLE statement in the DML category is “THE” reason in my understanding. One of the most important ideas in the Relational Model is the separation between the logical and the physical level. We, users, or people, if you wish, are manipulating with data on the logical level; the physical implementation is left to the database management system. And this is the important part – logically, when you truncate table, you don’t care how this statement is implemented internally, you just want to get rid of the data. It really does not matter what kind of locks a system uses, does it allow WHERE clause or not, etc. The logical point is what matters. Therefore, I would categorize the TRUNCATE TABLE statement as a DML statement.

Of course, this is a purely theoretical question, and is really not important for your practical implementation. As long as your app is doing what it should do, you don’t care too much about these nuances. However, IMO in general there is not enough of theoretical knowledge spread around, and therefore it makes sense to try to get the correct understanding.

But there is always a “but”. Of course, I have immediately another question. What about the ALTER TABLE mytable SWITCH [PARTITION…] TO… statement? ALTER statements have been defined as DDL statements forever. however, again, logically you are just moving the data from one table to another. Therefore – what? What do you think?

Central Europe Trip with a Reason

Why wouldn’t you make a nice trip in Central Europe? Besides spending some nice days in beautiful cities, you have also a business-type excuse for the trip. You can join the two SQL Saturday events. The first one is going to take place in Budapest, March 1st, and the second one in Vienna, March 6th.

I am very happy to meet the SQL community again, and proud to speak at both events. Although I would prefer to have the dates turned around. The two sessions selected are Temporal Data in SQL Server, which I will preset in Vienna, and Optimizing Temporal Queries, which I will present in Budapest. The sessions are more or less independent; however, I see the Vienna session as a prequel to the Budapest one.

Come, learn and enjoy Central Europe!

Database Design training – for free?

When I started this blog, my plan was to focus on two main subject areas. One of them is SQL Server, T-SQL, and performance. The other is database design. Looking back over my post history, I honestly cannot say I delivered on the second area. Not because I have nothing to say about database design, but because I found it doesn’t lend itself for a blog. In the SQL Server / T-SQL area, there are subjects that can be isolated and described in a single post, or in a short series. In database design,…(read more)

SQL Set Operators. Set? Really?

UNION, INTERSECT and EXCEPT operators are commonly called Set Operators. For example, in Books Online you can find a topic “Set Operators”, where  these three operators are explained. They should represent set operations UNION, INTERSECT and MINUS (synonym for EXCEPT DISTINCT). Also Wikipedia has a topic called “SET OPERATIONS (SQL)”, where these three operators are introduced. And these operators are commonly represented by Venn diagrams. Logically, Venn diagrams are also called Set diagrams. Here are the three operators presented with Venn diagrams:

image

However, is the name “Set Operators” really correct? The first question I asked myself was very simple: why would we have 10 and more relational operator and three set operators in the relational algebra? Well, makes no sense. The relational algebra comprise relational operators only, of course.

So what exactly is a relation? A relation is a special kind of set, set of entities that are related, i.e that are of the same kind. How do we know that the two entities are of the same kind and can thus be grouped in a single entity set, i.e. in a relation? Of course, two entities are of the same kind if they have the same attributes. Therefore, every relation is a set; however, not every set is a relation.

Set operators work on sets and produce a set. Relational operators work on relations and produce a relation. SQL operators UNION, INTERSECT and EXCEPT produce relations, i.e. special kind of sets. Set operator UNION can combine a set of differential equations and a set of hammers into a single set. Relational operator UNION can’t combine a relation of differential equations and a relation of hammers into a single relation, because elements of these two relations have nothing in common. And don’t think that if you take only keys of both relation, and both have a single-column integer key, that a UNION of this would be a relation. First of all, you can do such an union because we don’t use strong types in a relational database (each key should be of its own type – in this case, you should have a “hammer” and a “differential equation” key types, which would disallow such operations). In addition, in the case I mentioned, you would get two a relations that has with a single attribute, the key only, which would probably be meaningless from the business perspective. A relation without a meaningful attribute is not really an entity set, as defined by Peter Chen. An entity is something we can identify and is of interest. If we don’t have any real attribute, then this “thing” (whatever) is definitely not an entity, because without attributes it can’t be of any interest.

To summarize: SQL UNION, INTERSECT and EXCEPT are simply relational operators. Talking about them as of set operators is at least imprecise. However, representing them with Venn diagrams is not just imprecise, it is wrong. Here is a better presentation of there three relational operators.

image

Logical and Physical Modeling for Analytical Applications

I am proud to announce that my first course for Pluralsight is released. The course title is Logical and Physical Modeling for Analytical Applications. Here is the description of the course.

A bad data model leads to an application that does not perform well. Therefore, when developing an application, you should create a good data model from the start. However, even the best logical model can’t help when the physical implementation is bad. It is also important to know how SQL Server stores and accesses data, and how to optimize the data access. Database optimization starts by splitting transactional and analytical applications. In this course, you learn how to support analytical applications with logical design, get understanding of the problems with data access for queries that deal with large amounts of data, and learn about SQL Server optimizations that help solving these problems.

Enjoy the course!

Jetzt geht’s los – speaking in Germany!

It feels just like yesterday that I went to Munich for the very first German edition of SQL Saturday – and it was a great event. An agenda that was packed with three tracks of great sessions, and lots of fun with the organization, attendees, and other speakers. That was such a great time that I didn’t have to hesitate long before deciding that I wanted to repeat this event this year. Especially when I heard that it will be held in Rheinland, on July 13 – that is a distance I can travel by car! The…(read more)