Category Archives: In-Memory

Database Backup with Memory-Optimized Tables

Memory-optimized tables are backed up as part of regular database backups, so you don’t need to do anything special to manage backups on databases with memory-optimized tables. As with disk-based tables, the CHECKSUM of data/delta files is validated as part of the database backup to proactively detect any storage corruption. However, as described in the blog on storage allocation and management, the storage used by memory-optimized tables can be much larger than their footprint in memory. A full backup of a database with one or more memory-optimized tables consists of the allocated storage for disk-based tables, the active transaction log, and the data/delta file pairs (i.e. checkpoint file pairs) for memory-optimized tables. This blog focuses on the size of database backups that you can expect for databases with memory-optimized tables.

Full Database Backup

For the discussion here, we will focus on database backups for databases with just durable memory-optimized tables, because the backup of disk-based tables is the same irrespective of the existence of memory-optimized tables. The data/delta file pairs, also referred to as Checkpoint File Pairs or CFPs, residing in the memory-optimized filegroup can be in various states at a given time; please refer to merge-operation-in-memory-optimized-tables for details. The table below describes what part of the files is backed up, both in CTP2 and in RTM.

CFP State                  | Backup in CTP2                  | Backup in RTM
PRECREATED                 | File metadata only              | File metadata only
UNDER CONSTRUCTION         | File metadata + allocated bytes | File metadata only
ACTIVE                     | File metadata + allocated bytes | File metadata + used bytes
MERGE SOURCE               | File metadata + allocated bytes | File metadata + used bytes
MERGE TARGET               | File metadata + allocated bytes | File metadata only
REQUIRED FOR BACKUP/HA     | File metadata + allocated bytes | File metadata + used bytes
IN TRANSITION TO TOMBSTONE | File metadata + allocated bytes | File metadata only
TOMBSTONE                  | File metadata + allocated bytes | File metadata only

Table - 1: CFP and Database Backup

You will find that the size of a database backup in SQL Server 2014 RTM is smaller than what you had in CTP2.

Let us walk through a few examples to show the size of the backups. All these examples are based on the following database and table schema, using pre-RTM bits. For the states of the checkpoint file pairs (i.e. CFPs) used in these examples, please refer to the blog state-transition-of-checkpoint-files-in-databases-with-memory-optimized-tables.

CREATE DATABASE imoltp

GO

ALTER DATABASE imoltp ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA

ALTER DATABASE imoltp ADD FILE (name='imoltp_mod', filename='c:\data\imoltp_mod') TO FILEGROUP imoltp_mod

GO

use imoltp

go

-- create the table with each row around 8K

CREATE TABLE dbo.t_memopt (

       c1 int NOT NULL,

       c2 char(40) NOT NULL,

       c3 char(8000) NOT NULL,

       CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1)

WITH (BUCKET_COUNT = 100000)

) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

Go

We will use the following query to look at the states of CFPs

select file_type_desc, state, state_desc, internal_storage_slot,file_size_in_bytes, file_size_used_in_bytes, inserted_row_count, deleted_row_count, lower_bound_tsn, upper_bound_tsn

from sys.dm_db_xtp_checkpoint_files

order by file_type_desc, upper_bound_tsn

Example-1: Backup a database with no rows in memory-optimized tables

  • Backup Command:

BACKUP DATABASE [imoltp] TO DISK = N'C:\data\imoltp-empty-data.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,STATS = 10

  • Result: The size of the full database backup is around 5.3MB. As you can see, even though the total storage taken on disk is (16MB*9) + (1MB*9) = 153MB, the size of the backup is much smaller. Another thing to note is that even though no data rows have been inserted, significant storage (i.e. 144MB of data files) is already allocated, primarily because of the fixed storage overhead. You can verify the allocated vs. used storage with the query sketched below.
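To see where the gap between allocated storage and backup size comes from, you can compare allocated bytes against used bytes in the checkpoint file metadata. The following is a minimal sketch against the imoltp database created above; it uses only columns already shown in this post, and the exact numbers will vary by machine.

use imoltp
go
-- compare total allocated bytes vs. bytes actually used by the checkpoint file pairs;
-- in RTM, only the used bytes of CFPs in states such as ACTIVE end up in the backup
select sum(file_size_in_bytes) / 1024 / 1024 as allocated_mb,
       sum(file_size_used_in_bytes) / 1024 / 1024 as used_mb
from sys.dm_db_xtp_checkpoint_files
go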

Example-2: Backup the database after loading 8000 rows

-- load 8000 rows. This should use 5 16MB files

declare @i int = 0

while (@i < 8000)

begin

       insert t_memopt values (@i, 'a', replicate ('b', 8000))

       set @i += 1;

end

Checkpoint files: Besides the 8 CFPs in ‘PRECREATED’ state, there are now 5 CFPs in ‘UNDER CONSTRUCTION’ state, each storing up to 1870 rows, which together contain the 8000 data rows just inserted. Please refer to the blog state-transition-of-checkpoint-files-in-databases-with-memory-optimized-tables. A quick way to check the CFP state distribution is sketched below.
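If you only want the count of files per state rather than the full per-file detail, the following variation of the earlier DMV query is a minimal sketch (it assumes the same imoltp database):

-- count data and delta files by state
select file_type_desc, state_desc, count(*) as file_count
from sys.dm_db_xtp_checkpoint_files
group by file_type_desc, state_desc
order by file_type_desc, state_desc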

  • Backup Command:

BACKUP DATABASE [imoltp] TO  DISK = N'C:\data\imoltp-full-data.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

  • Output:

Processed 304 pages for database 'imoltp', file 'imoltp' on file 1.

Processed 0 pages for database 'imoltp', file 'imoltp_mod' on file 1.

10 percent processed.

20 percent processed.

Processed 543 pages for database 'imoltp', file 'imoltp_log' on file 1.

100 percent processed.

BACKUP DATABASE successfully processed 8847 pages in 1.191 seconds (58.027 MB/sec).

  • Result: The size of the full database backup is around 73.5MB. Note that the size of the transaction log is 70MB and no data is actually backed up, because the CFPs are either in ‘PRECREATED’ or ‘UNDER CONSTRUCTION’ state, as shown in Table-1. The data is only guaranteed to be in the data/delta files after a CFP transitions into the ‘ACTIVE’ state.

Example-3: Backup the database after taking an explicit checkpoint

-- do a manual checkpoint

checkpoint

  • Checkpoint files: Besides 8 CFPs in ‘PRECREATED’, there are now 5 CFPs with state ‘ACTIVE’. The manual checkpoint transitions the CFPs in ‘UNDER CONSTRUCTION’ state to ‘ACTIVE’.
  • Backup Command:

-- the backup will include the full log and the data, so the size roughly doubles to 138MB

BACKUP DATABASE [imoltp] TO DISK = N'C:\data\imoltp-full-data-chkpt.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

  • Output:

Processed 304 pages for database 'imoltp', file 'imoltp' on file 1.

10 percent processed.

20 percent processed.

Processed 8064 pages for database 'imoltp', file 'imoltp_mod' on file 1.

30 percent processed.

40 percent processed.

Processed 8548 pages for database 'imoltp', file 'imoltp_log' on file 1.

100 percent processed.

BACKUP DATABASE successfully processed 16916 pages in 1.872 seconds (70.594 MB/sec).

  • Result: The size of the full database backup is around 138MB. It consists of 70MB of transaction log plus 66MB (8064 8KB pages) of contents from the data/delta files.

Example-4: Backup the database after deleting 50% rows and merging the CFPs

-- now delete 50% rows

declare @i int = 0

while (@i <= 8000)

begin

       delete t_memopt where c1 = @i

       set @i += 2;

end

checkpoint

-- Do the manual merge. It generates a merge target; the other files stay as regular files

-- The transaction range here is picked up by querying the DMV

-- sys.dm_db_xtp_checkpoint_files. Please find the appropriate range for your test.

--

exec sys.sp_xtp_merge_checkpoint_files 'imoltp',  1877, 12004

go

-- This installs the merge leading to 5 CFPs as MERGE SOURCE and the merge target

-- transitions to ACTIVE state

Checkpoint

go

  • Checkpoint files: Besides the 8 CFPs in ‘PRECREATED’ state, there are now 5 CFPs in ‘MERGE SOURCE’ state and 2 CFPs in ‘ACTIVE’ state.
  • Backup Command:

-- the backup will include the full log and the data
BACKUP DATABASE [imoltp]
TO DISK = N'C:\data\imoltp-full-data-chkpt-del50-merge-chkpt.bak'
WITH NOFORMAT, INIT, NAME = N'imoltp-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

  • Output:

Processed 304 pages for database 'imoltp', file 'imoltp' on file 1.

10 percent processed.

20 percent processed.

Processed 12143 pages for database 'imoltp', file 'imoltp_mod' on file 1.

30 percent processed.

40 percent processed.

Processed 8815 pages for database 'imoltp', file 'imoltp_log' on file 1.

100 percent processed.

BACKUP DATABASE successfully processed 21262 pages in 1.794 seconds (92.588 MB/sec).

  • Result: The size of the full database backup is around 172MB. It consists of 72MB of transaction log and 100MB of data/delta files. The reason the size has grown by around 50%, even though we have deleted 4000 rows, is that the ‘MERGE SOURCE’ CFPs still contain all 8000 rows.

Example-5: Backup after MERGE SOURCE CFPs transition to TOMBSTONE state

checkpoint

go

-- Do the log backup. This log backup is around 71MB

BACKUP LOG [imoltp] TO DISK = N'C:\data\imoltp-log-1.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-log Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

checkpoint

go

-- Do one more log backup. This backup reduces active transaction log size

-- to 7MB

BACKUP LOG [imoltp] TO DISK = N'C:\data\imoltp-log-2.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-log Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

checkpoint

go

-- do one more log backup

BACKUP LOG [imoltp] TO DISK = N'C:\data\imoltp-log-3.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-log Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

  • Checkpoint files: Besides the 8 CFPs in ‘PRECREATED’ state, there are now 6 CFPs ‘IN TRANSITION TO TOMBSTONE’ and 5 CFPs in ‘ACTIVE’ state. Only one ACTIVE CFP contains the 4000 data rows; the other ACTIVE CFPs were created due to the manual checkpoints. In a production environment, checkpoints will occur automatically and log backups will be taken as part of the regular operational process.
  • Backup Command:

Since the CFPs have either been converted to TOMBSTONE or are in transition to it, the size of the database backup is now 38MB (only one copy of the data, as the transaction log has been backed up and freed).

BACKUP DATABASE [imoltp]

TO DISK = N'C:\data\imoltp-full-data-chkpt-del50-merge-chkpt-logbkup3.bak'

WITH NOFORMAT, INIT, NAME = N'imoltp-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

  • Output:

Processed 288 pages for database 'imoltp', file 'imoltp' on file 1.

10 percent processed.

Processed 4128 pages for database 'imoltp', file 'imoltp_mod' on file 1.

Processed 23 pages for database 'imoltp', file 'imoltp_log' on file 1.

100 percent processed.

BACKUP DATABASE successfully processed 4439 pages in 0.394 seconds (88.012 MB/sec).

  • Result: The size of the full database backup is around 38MB. This is because we have around 32MB of data in the ACTIVE CFPs, and there are no CFPs in ‘MERGE SOURCE’ or ‘REQUIRED FOR BACKUP/HA’ states.

Summary

The size of the backup of a database with one or more memory-optimized tables is typically bigger than the in-memory size of the memory-optimized tables, but smaller than their on-disk storage. The extra size depends upon the number of Checkpoint File Pairs (i.e. CFPs) in the ‘MERGE SOURCE’ and ‘REQUIRED FOR BACKUP/HA’ states, which indirectly depends upon the workload; a query sketch for estimating this extra size follows.
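As a rough way to estimate that extra size before taking a backup, you can sum the used bytes of the CFPs still in those states. This is a sketch only, using columns shown earlier in this post; note that the exact state_desc strings can vary between builds (for example ‘MERGE SOURCE’ vs. ‘MERGED SOURCE’), so check the values your build returns first.

-- estimate the backup contribution of CFPs kept only for backup/HA purposes
select state_desc, count(*) as cfp_count,
       sum(file_size_used_in_bytes) / 1024 / 1024 as used_mb
from sys.dm_db_xtp_checkpoint_files
where state_desc in (N'MERGE SOURCE', N'MERGED SOURCE', N'REQUIRED FOR BACKUP/HA')
group by state_desc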

In-Memory OLTP Index Troubleshooting, Part II

With the In-Memory OLTP feature in SQL Server 2014, we introduce two new types of indexes for memory-optimized tables: memory-optimized NONCLUSTERED indexes and memory-optimized HASH indexes.

Memory-optimized NONCLUSTERED indexes behave similarly to traditional NONCLUSTERED indexes from an application point of view. The main difference is that memory-optimized indexes are always covering (i.e. all columns are virtually included), while with traditional disk-based NONCLUSTERED indexes you need to specify which columns you want to include alongside the index key columns.

Memory-optimized HASH indexes behave differently from NONCLUSTERED indexes. They are optimized for point-lookup operations, and do not support ordered scans or inequality seek operations. In addition, you need to specify a BUCKET_COUNT when creating the index, and you need to pick the right value for the index to perform optimally. In earlier posts we covered the following three index troubleshooting aspects:

  • Bucket_count troubleshooting for HASH indexes: how to find out whether a bucket_count is too low, and how to determine what should be the correct bucket_count. You can also find more information about determining the right bucket_count in Books Online.
  • Searching on a subset of the index key: HASH indexes do not support searching on the leading columns of an index key.
  • Searching on inequality predicates and ordered scans: both operations are not supported with HASH indexes, but they are supported with NONCLUSTERED indexes. Note, however, that NONCLUSTERED indexes support ordered scans only in the direction indicated by the index key: for example, if the index key is (c1 DESC), the index supports retrieving the values for c1 in descending order, but not in ascending order.

All that said, HASH indexes have quite a few limitations compared with NONCLUSTERED indexes. For that reason, it is usually a safer bet to start with NONCLUSTERED indexes, both for new applications and when migrating existing disk-based tables to memory-optimized tables. You can then use HASH indexes to further optimize the workload. Indeed, HASH indexes are the most efficient index type for equality search (point lookup) and full table scan operations. A sketch showing both index types on one table follows.
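To make the comparison concrete, here is a minimal, hypothetical table definition (the table, column, and index names are ours, not from the original post) that carries both index types; in SQL Server 2014 all indexes on a memory-optimized table must be declared at CREATE TABLE time.

-- a HASH index for point lookups on c1, and a NONCLUSTERED (range) index
-- for ordered scans and inequality predicates on c2
CREATE TABLE dbo.t_lookup (
       c1 int NOT NULL,
       c2 int NOT NULL,
       CONSTRAINT pk_t_lookup PRIMARY KEY NONCLUSTERED HASH (c1) WITH (BUCKET_COUNT = 2000000),
       INDEX ix_t_lookup_c2 NONCLUSTERED (c2)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO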

In the remainder of this post we:

  • Describe a way to troubleshoot seek vs. scans on memory-optimized indexes using the new DMV sys.dm_db_xtp_index_stats: if there are more full index scans than expected, one of the above-mentioned issues may be the case: trying to search on a subset of a HASH index key or trying to search on inequality predicates with a HASH index.
  • Troubleshoot duplicates: you may run into problems if the index keys of a memory-optimized index contain a lot of duplicate values, particularly when using a HASH index. We describe how to detect this situation and how to work around it.

Troubleshooting seek vs. scan using XTP index DMVs

The DMV sys.dm_db_xtp_index_stats shows statistics for index operations performed by the in-memory storage engine. It contains stats about the usage of each index since its creation in memory – note that memory-optimized indexes are always recreated on database restart. You can use the following query to retrieve key statistics about the usage of the indexes on your table:

SELECT ix.index_id, ix.name, scans_started, rows_returned

FROM sys.dm_db_xtp_index_stats ixs JOIN sys.indexes ix ON

ix.object_id=ixs.object_id AND ix.index_id=ixs.index_id

WHERE ix.object_id=object_id('<table name>')

For troubleshooting indexes, the columns ‘scans_started’ and ‘rows_returned’ contain key information:

  • scans_started – the number of scan operations the in-memory storage engine has started. Note that, from the storage engine’s point of view, all operations that locate a row or the location to insert a new row are scans: for example, a full index scan, a point lookup, and a row insert all require a single scan operation.
  • rows_returned – the cumulative number of rows returned by all scan operations in the storage engine. Note that this number reflects the rows returned by the storage engine, not the number of rows returned to the client. For example, the query plan may call for a filter condition or aggregation that reduces the number of rows before they are returned to the client.
    • Insert operations do not result in rows being returned. Update and delete operations consist of a scan, to locate the row to be updated, followed by the actual row update/delete.

If the number of rows_returned is significantly larger than scans_started, this is an indication that, on average, index operations scan a large part of the index. If all index operations are expected to be point lookups, this could be an indication of one of the earlier-mentioned problems, where the query calls for an operation that is not supported by the index and therefore reverts to a full index scan: for example, a search on a subset of the hash index key columns, or a search on inequality predicates with a hash index.

If scans_started is larger than rows_returned, this is an indication that the workload is insert-heavy, or that a lot of point lookups failed to locate a row.

Index keys with many duplicate values

Issue: Index keys with many duplicate values can cause performance problems. If each index key has 5 duplicates this is usually not a problem, but if the discrepancy between the number of unique index keys and the number of rows in the table becomes very large – more than 10X – it can become problematic.

All rows with the same index key end up in the same duplicate chain. For hash indexes this can create a lot of overhead in case of hash collisions: if multiple index keys end up in the same bucket due to a hash collision, index scanners always need to scan the full duplicate chain for the first value before they can locate the first row corresponding to the second value. For nonclustered indexes this causes additional overhead for garbage collection.

Symptom: For hash indexes the performance of DML operations degrades and CPU utilization increases. In addition, there is an increase in CPU utilization during database startup, and a potential increase in recovery time. This becomes especially clear when inserting a large number of rows.

For nonclustered indexes garbage collection will start to consume more resources, resulting in an overall increased CPU utilization in the system. The problem does not affect DML operations directly,[1] but it does put more overall pressure on system resources.

How to troubleshoot: The average number of duplicate rows for a given index key can be obtained using T-SQL queries: first determine the row count of the table, then determine the number of unique index key values, and divide the row count by the number of unique index keys to obtain the average number of duplicates. A combined query is sketched after the two building-block queries below.

To determine the row count for the table use the following query:

select count(*) as 'Row count' from <tableName>

To determine the number of unique index key values use the following query:

select count(*) as 'Distinct index key values' from (select distinct <indexKeyColumns> from <tableName>) a
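Putting the two queries together (with the placeholders replaced by the t_memopt table and the c2 column purely for illustration), the average number of duplicates per key can be computed in one statement:

-- average duplicates per key = row count / distinct key count
select count(*) * 1.0 /
       (select count(*) from (select distinct c2 from dbo.t_memopt) as k) as avg_duplicates_per_key
from dbo.t_memopt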

For hash indexes specifically, you can also troubleshoot using the hash index stats DMV. Use the following query:

SELECT hs.object_id, object_name(hs.object_id) AS 'object name', i.name as 'index name', hs.*

FROM sys.dm_db_xtp_hash_index_stats AS hs

JOIN sys.indexes AS i ON hs.object_id=i.object_id AND hs.index_id=i.index_id;

If the average chain length is high and the number of empty buckets is also high, it is likely that there are many rows with duplicate index key values, or that there is a skew in the key values. A narrowed-down version of the query is sketched below.
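For convenience, the same DMV join can be narrowed down to the two indicators mentioned above. This is a sketch using the documented columns of sys.dm_db_xtp_hash_index_stats:

SELECT object_name(hs.object_id) AS 'table name', i.name AS 'index name',
       hs.total_bucket_count,
       floor(100.0 * hs.empty_bucket_count / hs.total_bucket_count) AS 'empty bucket %',
       hs.avg_chain_length, hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i ON hs.object_id=i.object_id AND hs.index_id=i.index_id;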

Workaround: First, evaluate if the index is truly needed. The index may have been added in the past to support queries that are no longer part of the workload.

For HASH indexes there are two ways to work around the issue of many duplicates:

  • In most cases you will want to use a NONCLUSTERED index instead, as NONCLUSTERED indexes generally perform better in case of duplicates. If you go for this option, consider uniquifying the index key, as indicated below.
  • Alternatively, you can over-size the index by using a very high bucket count; for example, 20 – 100 times the number of unique index key values. This will make hash collisions unlikely.

For NONCLUSTERED indexes with a lot of duplicates, consider adding additional columns to the index key. For example, you can add the primary key columns to the index key to make it unique, in other words to uniquify the index.

Adding columns to the index key does come with a performance penalty for DML operations. Therefore, it is important to test the new uniquified index under production load and compare its performance with the old index. Consider the overall throughput of the system, query performance, and also the overall resource utilization, especially CPU. A sketch of a uniquified index key follows.
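For illustration, here is a hypothetical table (the names and columns are ours, not from the original post) where a duplicate-heavy CustomerId key is uniquified by appending the primary key column. In SQL Server 2014 this has to be done when the table is created, since indexes on memory-optimized tables cannot be added afterwards.

-- CustomerId alone would have many duplicates; appending OrderId makes the key unique
CREATE TABLE dbo.t_order (
       OrderId    int NOT NULL,
       CustomerId int NOT NULL,
       Quantity   int NOT NULL,
       CONSTRAINT pk_t_order PRIMARY KEY NONCLUSTERED HASH (OrderId) WITH (BUCKET_COUNT = 1000000),
       INDEX ix_t_order_customer NONCLUSTERED (CustomerId, OrderId)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO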



[1] In SQL 2014 CTP2 user transactions could be affected, due to garbage collection performed at commit time. This issue will be resolved in SQL 2014 RTM: garbage collection no longer directly affects user transactions.

State Transition of Checkpoint Files in Databases with Memory-Optimized Tables

In the previous blog, Merge Operations, we described how a CFP transitions across various states in its life cycle. In this blog, we will walk through an actual example to show the state transitions. Note that some of the states shown for CFPs are only visible in RTM bits (not in CTP2), but internally the state transitions are the same. Basing this example on RTM bits adds a lot more clarity to how storage is managed. Another note: RTM bits allocate smaller CFPs (a 16MB data file and a 1MB delta file) on machines with physical memory <= 16GB. This example was run on a machine with 8GB of RAM using pre-RTM bits.

CREATE DATABASE hkdb1 ON 

 PRIMARY (NAME = [hkdb1_hk_fs_data], FILENAME = 'C:\hekaton_test\data\hkdb1_data.mdf'),

 FILEGROUP [hkdb1_hk_fs_fg] CONTAINS MEMORY_OPTIMIZED_DATA

 (NAME = [hkdb1_hk_fs_dir],  FILENAME = 'C:\hekaton_test\data\hkdb1_hk_fs_dir')

 LOG ON (name = [hktest_log], Filename='C:\hekaton_test\data\hkdb1_log.ldf', size=100MB)

go

-- disable auto-merge so that we can show the merge

-- in a predictable way

dbcc traceon (9851, -1)

-- set the database to full recovery. This is a common practice for production scenario

alter database hkdb1 set recovery full

go

use hkdb1

go

-- create a memory-optimized table with each row of size > 8KB

CREATE TABLE dbo.t_memopt (

       c1 int NOT NULL,

       c2 char(40) NOT NULL,

       c3 char(8000) NOT NULL,

 

       CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1) WITH (BUCKET_COUNT = 100000)

) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

go

-- do the database backup. We do this so that we can do transaction log backups

BACKUP DATABASE [hkdb1] TO  DISK = N'C:\hekaton_test\data\hk-empty-data.bak'

WITH NOFORMAT, INIT, NAME = N'hkdb1-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

-- load 8000 rows. This should load 5 16MB data files on a machine with <= 16GB of memory

declare @i int = 0

while (@i < 8000)

begin

        insert t_memopt values (@i, 'a', replicate ('b', 8000))

        set @i += 1;

end

go

In this example, we will show the output of DMV sys.dm_db_xtp_checkpoint_files using the following query.

select file_type_desc, state_desc, internal_storage_slot,file_size_in_bytes, file_size_used_in_bytes,

inserted_row_count, deleted_row_count, lower_bound_tsn, upper_bound_tsn, last_backup_page_count, drop_table_deleted_row_count 

from sys.dm_db_xtp_checkpoint_files

order by container_id, file_type_desc, upper_bound_tsn

Here is the output after removing some columns and rows (just showing rows with DATA file).

  • There are 8 data files in the ‘PRECREATED’ state. This demo was run on a machine with 4 logical processors and 8GB of memory. The size of the files is 16MB (supported in SQL Server 2014 RTM on machines with <= 16GB of memory).
  • There are 5 data files marked ‘UNDER CONSTRUCTION’, as a checkpoint has not been taken, but these files together store the 8000 rows that we inserted. If the database is restarted, the data rows will be loaded using the transaction log and NOT from these files, as they are not part of a durable checkpoint.

Now, execute a manual checkpoint by running the following command and observe the state transition of the CFPs. You will note that the UNDER CONSTRUCTION CFPs are now marked ACTIVE, as they are part of the durable checkpoint we have just completed. The manual checkpoint closed the last data file (see its internal_storage_slot in the DMV output) even though it was only around 25% full. This data file can potentially be merged in the future, as it qualifies under the merge policy.

checkpoint

go

Now, we will delete 50% of the rows.

-- now delete 50% rows

declare @i int = 0

while (@i <= 8000)

begin

       delete t_memopt where c1 = @i

       set @i += 2;

end

go

checkpoint

go

Here is the output of the DMV for CFPs in ACTIVE state using the following query

select file_type_desc, state_desc, internal_storage_slot,file_size_in_bytes, file_size_used_in_bytes,

inserted_row_count, deleted_row_count, lower_bound_tsn, upper_bound_tsn, last_backup_page_count, drop_table_deleted_row_count 

from sys.dm_db_xtp_checkpoint_files

where state = 2

order by container_id, file_type_desc, upper_bound_tsn

Note that the delta files now have references to the deleted rows. There is a CFP with 0 data rows due to the manual checkpoint.

 For this example, we had disabled auto-merge so that we can see the state transitions with no surprises.

We will now force a manual merge (merging all the ACTIVE CFPs) by executing the following command

-- now do the manual merge

-- it generates merge-target and other files stay as regular files

exec sys.sp_xtp_merge_checkpoint_files 'hkdb1',  1877, 12004

go

The DMV shows that the merge was executed and there is a new CFP marked as MERGE TARGET containing 4000 rows. Also, note that the data file is around 33MB, much larger than 16MB. This happened because we forced the merge; auto-merge would not have merged all ACTIVE CFPs, because doing so would not meet the merge policy.

At this time, the CFPs that are sources of the MERGE are still marked as ACTIVE. This is because the merge has not been installed yet. You can query the state of the merge request as follows:

select request_state_desc, lower_bound_tsn, upper_bound_tsn

from sys.dm_db_xtp_merge_requests

go

request_state_desc   lower_bound_tsn      upper_bound_tsn

-------------------- -------------------- --------------------

PENDING              0                    12007

 

Now, we will force a checkpoint to install the merge.

checkpoint

go

Here is the output of the DMVs. It shows that the MERGE TARGET has now changed to the ACTIVE state, and all the CFPs that were the source of the MERGE are now marked as MERGED SOURCE. Referring to the Storage Allocation and Management for Memory-Optimized Tables blog, these CFPs are now in category-3.

The source CFPs will transition into category-4 automatically as automatic checkpoints and log backups are taken. For this example, we will force the checkpoint and log backups as follows:

checkpoint

go

-- do the log backup.

BACKUP LOG [hkdb1] TO  DISK = N'C:\hekaton_test\data\hk-log-1.bak'

WITH NOFORMAT, INIT, NAME = N'hk-log Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

Here is the output of the DMV, showing that the earlier CFPs have now transitioned to the REQUIRED FOR BACKUP/HA state, meaning that these CFPs can’t be garbage collected because they are needed for the operational correctness of the database with memory-optimized tables.

 

After a couple of manual checkpoints followed by log backups, the DMV output shows that the CFPs are finally being handed off to the FILESTREAM garbage collector. At this stage, these CFPs are not needed for the operational correctness of our database. To state it differently, a full database backup will NOT include these CFPs, but they continue to take storage space in the memory-optimized filegroup until they are garbage collected.

After one more cycle of a manual checkpoint followed by a log backup, the CFPs finally transition to the TOMBSTONE state, which indicates that they will be removed as part of FILESTREAM garbage collection. Please refer to FS Garbage Collection for details.

 

This concludes the state transitions of CFPs. The main point to remember is that it takes a few checkpoints/log backups after a set of CFPs has been merged before the storage can be garbage collected. We expect these state transitions to occur automatically in production environments, as checkpoints occur automatically for each additional 512MB of transaction log generated and regular log backups are taken as part of the backup strategy. However, you will need to account for the extra storage as CFPs transition from MERGE SOURCE to finally being garbage collected.

Merge Operation in Memory-Optimized Tables

In Storage Allocation and Management for Memory-Optimized Tables, we briefly touched upon the Merge operation. This blog describes the Merge operation in detail. We also recommend referring to implementing durability for memory-optimized tables for a good understanding of data/delta files and CFPs.

The metadata of all Checkpoint File Pairs (i.e. CFPs) that exist in storage is stored in an internal array structure referred to as the storage array. It is a finitely sized array (4096 entries in CTP2 and 8192 entries in RTM) of CFPs, supporting a cumulative size of 256GB for durable memory-optimized tables in the database. The entries in the storage array are ordered by transaction range. The CFPs in the storage array (along with the tail of the log) represent all the on-disk state required to recover a database with memory-optimized tables.

Running an OLTP workload over time can cause the storage array to fill up, leading to the inefficiencies described here:

  1. Deleted rows – Deleted rows are not actually removed from the data file; instead, a reference to each deleted row is added to the corresponding delta file. This wastes storage, which in turn impacts recovery time negatively.
  2. Storage array manipulation overhead – Internal processes search the storage array for operations such as finding the delta file in which to record a deleted row. The cost of these operations increases with the number of entries.

To alleviate these inefficiencies, older closed CFPs are merged based on a merge policy (described below), so that the storage array is compacted to represent the same set of data with a reduced number of CFPs and reduced storage.

Merge Operation

The Merge operation takes one or more closed CFPs, called MERGE SOURCE(s), based on an internally defined merge policy (described later), and produces one resultant CFP, called the MERGE TARGET. The entries in each delta file of the source CFP(s) are used to filter rows from the corresponding data file, removing the data rows that are no longer needed. The remaining rows in the source CFPs are consolidated into one target CFP. After the merge is complete, the resultant CFP (i.e. the merge target) replaces the source CFPs (i.e. the merge sources). The merge-source CFPs go through a transition phase before they are eventually removed from storage.

In the example below, the memory-optimized filegroup has four data and delta file pairs at timestamp 500, containing data from previous transactions. For example, the rows in the first data file correspond to transactions with timestamp > 100 and <= 200, alternatively represented as (100, 200]. The second and third data files are shown to be less than 50% full after accounting for the rows marked deleted. The merge operation combines these two CFPs and creates a new CFP containing transactions with timestamp > 200 and <= 400, which is the combined range of these two CFPs. The presence of another CFP with range (500, 600] and a non-empty delta file for transaction range (200, 400] shows that the merge operation can run concurrently with transactional activity, including deleting more rows from the source CFPs.

In the In-Memory OLTP engine, a background thread evaluates all closed CFPs using the merge policy and then initiates one or more merge requests for the qualifying CFPs. These merge requests are processed by the offline checkpoint thread. The evaluation of the merge policy is done periodically and also whenever a checkpoint is closed.

Merge Policy

SQL Server 2014 implements the following merge policy

  • A merge is scheduled if 2 or more consecutive CFPs can be consolidated, after accounting for deleted rows, such that the resultant rows fit into one data file of the ideal size. The ideal size of a data file is determined as follows:
    • For machines with <= 16GB of memory, the data file is 16MB and the delta file is 1MB (not supported in CTP2).
    • For machines with > 16GB of memory, the data file is 128MB and the delta file is 8MB.

A maximum of 10 consecutive CFPs can be part of one merge operation.

  • A single CFP can be self-merged if the data file is larger than 256MB and over half of the rows are marked deleted. A data file can grow larger than 128MB if, for example, a single transaction or multiple concurrent transactions insert/update a large amount of data. This results in the data file growing beyond its ideal size, because a transaction cannot span multiple CFPs. Note, self-merge is not supported in CTP2.

Here are some examples that show the CFPs that will be merged under the merge policy.

Adjacent CFP Source Files (% full)             | Merge Selection
CFP0 (30%), CFP1 (50%), CFP2 (50%), CFP3 (90%) | (CFP0, CFP1). CFP2 is not chosen, as it would make the resultant data file > 100% of the ideal size.
CFP0 (30%), CFP1 (20%), CFP2 (50%), CFP3 (10%) | (CFP0, CFP1, CFP2). Files are chosen starting from the left. CFP3 is not combined, as it would make the resultant data file > 100% of the ideal size.
CFP0 (80%), CFP1 (30%), CFP2 (10%), CFP3 (40%) | (CFP1, CFP2, CFP3). Files are chosen starting from the left. CFP0 is skipped because, if combined with CFP1, the resultant data file would be > 100% of the ideal size.

Not all CFPs with available space qualify for merge. For example, if two adjacent CFPs are 60% full, they will not qualify for merge, which results in 40% wasted storage for those CFPs. In the worst case, all CFPs can be a little over 50% full, leading to storage utilization of only approximately 50%. It is important to note that deleted rows may still exist in storage even though they may have already been removed (i.e. garbage collected) from memory; the management of storage and memory is somewhat independent from a garbage collection perspective. A rule of thumb is that the storage taken by ‘ACTIVE’ CFPs (note that this does not account for all CFPs) can be up to 2x larger than the size of the durable tables in memory.

You can force a merge using the stored procedure sys.sp_xtp_merge_checkpoint_files (Transact-SQL) to override the merge policy. For details, please refer to http://msdn.microsoft.com/en-us/library/dn198330(v=sql.120).aspx

Life Cycle of a CFP

As mentioned earlier, once CFPs are merged, they go through a state transition before they can be deallocated. At any given time, a CFP (data/delta file pair) can be in one of the following states:

  • PRECREATED – A small set of CFPs is kept pre-allocated to minimize or eliminate any waits to allocate new files as transactions are being executed. These are full sized, with a data file size of 128MB and a delta file size of 8MB, but contain no data. The number of CFPs is computed as the number of logical processors or schedulers, with a minimum of 8. This is a fixed storage overhead in databases with memory-optimized tables.
  • UNDER CONSTRUCTION – The set of CFPs that store newly inserted and possibly deleted data rows since the last checkpoint.
  • ACTIVE – These contain the inserted/deleted rows from previous closed checkpoints. These CFPs contain all the inserted/deleted rows required before applying the active part of the transaction log at database restart. We expect the size of these CFPs to be approximately 2x the in-memory size of the memory-optimized tables, assuming the merge operation is keeping up with the transactional workload.
  • MERGE TARGET – The CFP that stores the consolidated data rows from the CFP(s) identified by the merge policy. Once the merge is installed, the MERGE TARGET transitions into the ACTIVE state.
  • MERGED SOURCE – Once the merge operation is installed, the source CFPs are marked as MERGED SOURCE. Note that the merge policy evaluator may identify multiple merges, but a CFP can only participate in one merge operation.
  • REQUIRED FOR BACKUP/HA – Once the merge has been installed and the MERGE TARGET CFP is part of a durable checkpoint, the merge-source CFPs transition into this state. CFPs in this state are needed for the operational correctness of a database with memory-optimized tables, for example to recover from a durable checkpoint to go back in time. A CFP can be marked for garbage collection once the log truncation point moves beyond its transaction range.
  • IN TRANSITION TO TOMBSTONE – These CFPs are no longer needed by the In-Memory OLTP engine and can be garbage collected. This state indicates that they are waiting for the background thread to transition them to the next state, TOMBSTONE.
  • TOMBSTONE – These CFPs are waiting to be garbage collected by the FILESTREAM garbage collector. Please refer to FS Garbage Collection for details.

After accounting for the storage taken by CFPs in the various states, the overall storage taken by durable memory-optimized tables can be much larger than 2x. The DMV sys.dm_db_xtp_checkpoint_files can be queried to list all the CFPs in the memory-optimized filegroup, including the state they are in. Transitioning CFPs from the MERGED SOURCE state to TOMBSTONE, and ultimately getting them garbage collected, can take up to 5 checkpoint and transaction log backup steps (the log backups apply if the database is configured for the full or bulk-logged recovery model). You can, of course, manually force checkpoints followed by log backups to expedite the garbage collection, but this will add 5 empty CFPs (i.e. 5 data/delta file pairs with a data file of 128MB each). In production scenarios, the automatic checkpoints and the log backups taken as part of the backup strategy will seamlessly transition CFPs through these states without requiring any manual intervention. The impact of this garbage collection process is that databases with memory-optimized tables may have a disproportionate storage footprint compared to their size in memory; it is not uncommon for the storage occupied by memory-optimized tables to be 3x their size in memory.

In the next blog, we will walk through an example showing transition of CFPs across all phases.

Storage Allocation and Management for Memory-Optimized Tables

As described in implementing durability for memory-optimized tables, the storage requirements and management for memory-optimized tables are very different compared to disk-based tables. If you are migrating a subset of your disk-based tables to memory-optimized tables, or are developing a new application with memory-optimized tables, you will notice these differences. Unlike disk-based tables in SQL Server, which use 8K pages to store data in the same format both on disk and in memory, memory-optimized tables use a different size and format in memory and on disk. This has been a point of concern for many customers, as they have sometimes observed disproportionate storage consumed by memory-optimized tables. The goal of this blog is to help alleviate this confusion by describing how storage is managed over the lifetime of data rows.

The data for memory-optimized tables is stored in one or more data/delta file pairs (also referred to as checkpoint file pairs or CFPs), with data files storing inserted rows and delta files referencing deleted rows. During the execution of an OLTP workload, as DML operations update, insert, and delete rows, new data/delta files are created to persist the data, and existing delta files are updated to record deletes of existing rows. Over time, if the number of active rows (after accounting for deleted rows) in two or more consecutive CFPs falls below a threshold (usually < 50%) such that they can be merged into one CFP of 128MB, they are merged automatically by a background merge process into a new CFP. Once the merge operation is complete, the older CFPs go through a transition phase and are eventually removed (i.e. garbage collected) from storage. Note, SQL Server 2014 CTP2 supports up to 4096 CFPs within a database; this limit will be changed to 8192 CFPs in RTM.

At any given time, the data/delta file pairs are in one of the following 4 categories

  1. Pre-allocated CFPs (data and delta): A small set of CFPs is kept pre-allocated to minimize or eliminate any waits to allocate new files as transactions are being executed. These are full sized, with a data file size of 128MB and a delta file size of 8MB, but contain no data. The number of CFPs is computed as the number of logical processors or schedulers, with a minimum of 8. This is a fixed storage overhead in databases with memory-optimized tables.
  2. Active CFPs: These contain the inserted/deleted rows for the last ‘durable’ checkpoint. These CFPs contain all inserted/deleted rows required before applying the active part of the transaction log at database restart. We expect the combined size of the ACTIVE CFPs to be at most 2x the in-memory size of the memory-optimized tables. However, there are cases, specifically with data files > 128MB (which can happen due to large concurrent transactions) or if the merge operation falls behind, when we may exceed this 2x limit. To simplify the discussion here, let us just assume all data files are 128MB and that the merge/checkpoint operations are keeping up.
  3. CFPs required for operational correctness: These represent files that were a source of MERGE operations, where rows from these files were moved to new data/delta files. These files (i.e. CFPs) are not needed for database restart. However, they can’t be deallocated or marked for deallocation (i.e. garbage collection) because they are needed if SQL Server has to recover from a previous durable checkpoint. A CFP can be marked for garbage collection once the log truncation point moves beyond its transaction range. This requires a combination of checkpoint operations and transaction log backups (transaction log backups are not needed if the database is configured with the simple recovery model).
  4. CFPs that can be removed from storage: These are garbage collected in a manner similar to how filestream files are garbage collected. Please refer to FS Garbage Collection for details.

Transitioning CFPs out of category-3 and category-4 can take up to 5 checkpoint and transaction log backup steps (the log backups apply if the database is not running in the simple recovery model). For databases with memory-optimized tables, an automatic checkpoint is taken when the size of the transaction log exceeds 512MB since the last checkpoint. You can, of course, manually force a checkpoint followed by a log backup to expedite the garbage collection, but this will add 5 empty CFPs (i.e. 5 data/delta file pairs with a data file of 128MB each). Typically, in a test environment, you may have to force transaction log backups and checkpoints to remove the CFPs that are no longer needed; a sketch of that sequence follows. In production scenarios, however, we don’t expect customers to issue manual checkpoints but to rely on the automatic checkpoints and the log backups taken as part of the backup strategy. The impact of this garbage collection process is that databases with memory-optimized tables may have a disproportionate storage footprint compared to their size in memory.
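For a test environment only, the forced cycle mentioned above might look like the following sketch; the database name and backup path reuse the hkdb1 example from elsewhere in this archive and are purely illustrative. In production, rely on automatic checkpoints and your regular log backup schedule instead.

use hkdb1
go
checkpoint
go
BACKUP LOG [hkdb1] TO DISK = N'C:\hekaton_test\data\hkdb1-log-gc.bak'
WITH NOFORMAT, INIT, NAME = N'hkdb1-log backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
-- repeat the checkpoint + log backup cycle a few times, then re-check the CFP states
select state_desc, count(*) as cfp_count
from sys.dm_db_xtp_checkpoint_files
group by state_desc
go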

To clarify the point further, let us walk through some examples of databases with memory-optimized tables under various scenarios, and compare their size in memory and on storage. For these examples, we assume a SQL Server instance with 8 logical processors and one memory-optimized table with the following schema. Note that this table has a row size of approximately 8KB.

CREATE TABLE dbo.t_memopt (

       c1 int NOT NULL,

       c2 char(40) NOT NULL,

       c3 char(8000) NOT NULL,

 

       CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1) WITH

       (BUCKET_COUNT = 100000)

) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

As indicated earlier, the fixed storage overhead for this configuration will be 1 GB (i.e. 8 data files of 128MB each).

  • Example-1:
    • DML Operation: Insert 1 row
    • In-memory Size – you can get the size by querying sys.dm_db_xtp_table_memory_stats. The data row takes 8KB, while the hash index takes 1MB for the 100K buckets allocated:

Name       memory_used_by_indexes_kb memory_used_by_table_kb

---------- ------------------------- -----------------------

t1         1024                      8

    • Storage for memory-optimized table – The storage is more interesting. On a machine with 8 logical processors, there are 8 data and 8 delta files pre-allocated, and there is 1 data/delta file pair used to store the currently active rows. So the overall storage for the memory-optimized filegroup is (9*128MB) = 1.12GB, which is disproportionately larger than the size of the in-memory table. This is expected, because this configuration has a fixed overhead of 1GB, plus the granularity of storage allocation is 128MB. This is different from disk-based tables, where the unit of storage allocation is an 8K page for tables < 64KB in size. This additional storage for memory-optimized tables may surprise you if you are just playing with small data sets. However, we don’t expect this to be a cause of concern, as a typical customer workload will have a much larger dataset (e.g. tens or hundreds of GB) stored in memory-optimized tables.
  • Example-2
    • Operation: Insert 1-million rows
    • In-memory Size – 7.8GB
    • Storage for memory-optimized table – 8.9GB, which is pretty reasonable compared to the previous example. 1 million rows should take approximately 8GB of storage, plus the 1GB of fixed overhead, which is now around 10%.
    • Storage for disk-based table – around 8GB, so it is of the same order as the memory-optimized table.
  • Example-3
    • Operation: update non-key values for the first 500K rows of the table loaded in the previous example.
    • In-memory Size (used) – 8GB. When I measured this, most of the stale row versions had already been garbage collected.
    • Storage for memory-optimized table – 13GB. You may wonder why the size of storage has gone up for an update operation. An update on a memory-optimized table is done as a delete followed by an insert. For the deleted rows, the delta file(s) are updated to reflect them, and the new row versions are inserted into another data file. So, at this time, both the old and new versions of the data rows are persisted, which is why storage consumption has gone up by approximately 50%. The data/delta files will eventually get merged and garbage collected, as described earlier.
    • Storage for disk-based tables – stays around 8GB. This is expected, because updates of non-key columns are done in place for disk-based tables.
  • Example-4
    • Workload/Operations – Update only OLTP workload running against a database of size 150GB for over 24 hours to measure storage and memory in the steady state
    • In-memory size (used) – 167 GB. This shows that the stale row versions are getting garbage collected. Please refer to Memory Management for Memory-Optimized Tables for details.
    • Storage for memory-optimized table – 320GB. Given that the size of the database is 150GB, the storage in steady state is around 2.2 times that.

Though these examples are oversimplified, they do illustrate the difference in storage provisioning for memory-optimized tables compared to disk-based tables. Your storage requirement will depend upon key factors such as the following (a monitoring query is sketched after this list):

    • The size of your durable memory-optimized tables.
    • The DML operations on the durable tables. For example, if your workload is update-heavy, it leads to merges, thereby inflating the storage requirement until the merged files get garbage collected.
    • The frequency of checkpoints and log backups.
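To keep an eye on the ratio between the two sizes discussed in these examples, the following is a rough sketch; it uses only DMVs and columns already shown in this archive and should be run in the database that owns the memory-optimized tables.

-- in-memory size of all memory-optimized tables vs. storage allocated to CFPs
select
  (select sum(memory_used_by_table_kb + memory_used_by_indexes_kb) / 1024
     from sys.dm_db_xtp_table_memory_stats) as in_memory_size_mb,
  (select sum(file_size_in_bytes) / 1024 / 1024
     from sys.dm_db_xtp_checkpoint_files) as storage_allocated_mb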

Boosting Transaction Performance in Windows Azure Virtual Machines with In-Memory OLTP

With the release of SQL Server 2014 CTP2, you can now significantly boost the performance of your OLTP workloads in Windows Azure Virtual Machines. By creating a new VM with our preloaded image of SQL Server 2014 CTP2 on Windows Server 2012 R2, or installing SQL Server 2014 CTP2 on your VM, In-Memory OLTP functionalities are immediately available to you. This blog post provides a good guide on how to create a Windows Azure VM.

However, since the transition to In-Memory OLTP is not as simple as flipping a switch, you must carefully evaluate your application scenario and see if it is the right solution for you.

Recommended Scenarios

For SQL Server 2014 CTP 2, we recommend the following scenarios for SQL Server In-Memory OLTP on a Windows Azure Virtual Machine:

  • Development and Test scenarios, e.g. familiarizing with In-Memory OLTP’s capabilities. The on-demand provisioning of Windows Azure VMs and its low cost make it easy to gain full access to In-Memory OLTP functionalities without a large capital investment. Once you are comfortable with its capabilities and understand its limitations, you can move to deploy In-Memory OLTP on your local server or keep using the VM if it suits your business needs.
  • Scenarios with relaxed data persistence requirements, e.g. web browser session state. SQL Server In-Memory OLTP provides non-durable tables that are perfect for transient data, regardless of the transaction read/write mix. By completely eliminating I/O operations, non-durable tables can provide amazing performance boosts for your non-persistent workload. If some persistence is still required but strict durability is not, you can leverage the new Delayed Durability features added in CTP2. Delayed Durability commits transactions without immediately writing the log records to disk, lightening log I/O pressure by allowing larger and less frequent log writes. For more details see the Books Online topic here. A sketch of both options follows after this list.
  • Read-mostly and read-only scenarios, e.g. an online product catalogue. In-Memory OLTP provides extreme performance and parallelism in read-mostly and read-only scenarios due to new algorithms and data structures optimized for memory-resident data. Furthermore, native compilation for stored procedures can dramatically increase CPU efficiency and throughput.
  • Durable read-write scenarios not under log I/O pressure, e.g. workloads with heavy contention. In-Memory OLTP can also provide benefits for workloads with full durability requirements that are not under pressure from log I/O latency or throughput. Because In-Memory OLTP eliminates page contention by using an optimistic concurrency system, it can provide significant boosts to parallelism for your workload if it is suffering from contention problems. In addition, native compilation can improve the speed of stored procedures with heavy business-logic processing.
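As a concrete illustration of the two relaxed-persistence options mentioned in the list above, here is a minimal sketch; the table and column names are hypothetical and not from the original post.

-- non-durable table: the schema is persisted, the rows are not (they are lost on restart)
CREATE TABLE dbo.SessionState (
       SessionId int NOT NULL,
       Payload   varbinary(8000) NOT NULL,
       CONSTRAINT pk_SessionState PRIMARY KEY NONCLUSTERED HASH (SessionId) WITH (BUCKET_COUNT = 1000000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO
-- Delayed Durability for durable tables: allow it at the database level,
-- then opt in per transaction at commit time
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED
GO
BEGIN TRAN
       -- ... DML against durable (SCHEMA_AND_DATA) tables ...
COMMIT TRAN WITH (DELAYED_DURABILITY = ON)
GO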

However, if your workload is suffering from long log I/O latency or is under pressure from log I/O throughput, and at the same time you require strict durability of your data, In-Memory OLTP on a Windows Azure VM will not alleviate these problems.

SQL Server 2014 CTP2 on a Windows Azure VM is not suitable for testing the performance of applications deployed on machines with a similar configuration on premises.

Selecting the Right Windows Azure VM Size

The selection of VM sizes is important for any workload running in the IaaS space, whether you are provisioning a new VM using our preloaded CTP2 image or adding CTP2 onto an existing instance. Windows Azure provides a selection of virtual machine sizes for a variety of purposes, as listed in this article.

Since In-Memory OLTP is designed to serve extremely high session concurrency, and the nature of memory-resident data requires sufficient memory space, we recommend the following Windows Azure VM sizes for adopting In-Memory OLTP:

Compute Instance Name | Virtual Cores | RAM
Extra Large (A4)      | 8             | 14 GB
A6                    | 4             | 28 GB
A7                    | 8             | 56 GB

The exact instance size you choose will depend on the scenario you wish to run and the size of the data you wish to make memory-resident in SQL Server In-Memory OLTP. We recommend provisioning 100% more memory than the estimated size of data in memory, and leaving sufficient space for the buffer pool used by disk-based workloads on the same machine. This article on SQL Server Books Online has more information on how to estimate the size of a table in memory, and we have published a blog post on hardware considerations for In-Memory OLTP, some of which apply to provisioning VMs as well.

Virtual Machine Configuration

To configure a Windows Azure Virtual Machine for the best In-Memory OLTP performance, we suggest you follow the best practices outlined in this whitepaper. A summary of key considerations, plus some attributes unique to In-Memory OLTP, is listed below:

  • Use a single storage account for all disks attached to the same VM.
  • Do not turn on disk host caching for the data and log drives.
    When creating new drives for a Windows Azure VM, the disk host caching option is turned off by default. We do not recommend you change this option.
  • Do not use the Temporary Disk (D: Drive) for log storage.
    This will sacrifice durability for your database, as the content on the Temporary Disk is transient and may be lost on a VM failure or reboot. Please use a persistent Windows Azure Storage disk for your log storage needs.
  • Do not use native operating system striping for storing the log of an In-Memory OLTP-enabled database.
    Logging for memory-optimized tables is latency-sensitive, and striping of drives increases the latency of I/O operations. If you wish to use any form of striping for your log drive, you should test your configuration to ensure that the I/O latency is sufficiently low.
  • Compression is not supported for In-Memory OLTP tables. Compression will still work with regular tables in an In-Memory OLTP-enabled database.

Boosting Transaction Performance in Windows Azure Virtual Machines with In-Memory OLTP

With the release of SQL Server 2014 CTP2, you can now significantly boost the performance of your OLTP workloads in Windows Azure Virtual Machines. By creating a new VM with our preloaded image of SQL Server 2014 CTP2 on Windows Server 2012 R2, or installing SQL Server 2014 CTP2 on your VM, In-Memory OLTP functionalities are immediately available to you. This blog post provides a good guide on how to create a Windows Azure VM.

However, since the transition to In-Memory OLTP is not as simple as flipping a switch, you must carefully evaluate your application scenario and see if it is the right solution for you.

Recommended Scenarios

For SQL Server 2014 CTP 2, we recommend the following scenarios for SQL Server In-Memory OLTP on a Windows Azure Virtual Machine:

  • Development and Test scenarios, e.g. familiarizing with In-Memory OLTP’s capabilities. The on-demand provisioning of Windows Azure VMs and its low cost make it easy to gain full access to In-Memory OLTP functionalities without a large capital investment. Once you are comfortable with its capabilities and understand its limitations, you can move to deploy In-Memory OLTP on your local server or keep using the VM if it suits your business needs.
  • Scenarios with relaxed data persistence requirements, e.g. web browser session state. SQL Server In-Memory OLTP provides non-durable tables that are perfect for transient data regardless of the transaction read/write mix. By completely eliminating I/O operations, non-durable tables could provide amazing boosts to performance for your non-persistent workload. If some persistence is still required but strict durability is not, you can leverage the new Delayed Durability features we have added to CTP2. Delayed Durability commits transactions but do not immediately write the log records to disk, lightening log I/O pressure by allowing larger and less frequent log writes to disk. For more details see the Books Online topic here.
  • Read-mostly and read-only scenarios, e.g. an online product catalogue. In-Memory OLTP provides extreme performance and parallelism in read-mostly and read-only scenarios due to new algorithms and data structures optimized for memory-resident data. Furthermore, native compilation for stored procedures can dramatically increase CPU efficiency and throughput.
  • Durable read-write scenarios not under log I/O pressure, e.g. workloads with heavy contention. In-Memory OLTP can also provide benefits for workloads with full durability requirements that is not under pressure from log I/O latency or throughput. Because In-Memory OLTP eliminates page contention by using an optimistic concurrency system, it could provide significant boosts to parallelism for your workload if it is suffering from contention problems. In addition, native compilation can improve the speed of a stored procedure with heavy business logic processing.

However, if your workload is suffering from long log I/O latency or if it is under pressure from log I/O throughput, and at the same time you require strict durability of your data, In-Memory OLTP on Windows Azure VM will not alleviate these problems.

SQL Server 2014 CTP2 on Windows Azure VM is not suitable for testing the performance of applications deployed in machines with similar configuration on premise.

Selecting the Right Windows Azure VM Size

The selection of VM sizes is important for any workload running in the IaaS space, whether you are provisioning a new VM using our preloaded CTP2 image or adding CTP2 onto an existing instance. Windows Azure provides a selection of virtual machine sizes for a variety of purposes, as listed in this article.

Since In-Memory OLTP is designed to serve extreme high session concurrency and the nature of memory-resident data requires sufficient memory space, we recommend the following Windows Azure VM sizes for adopting In-Memory OLTP:

Compute Instance Name    Virtual Cores    RAM
Extra Large (A4)         8                14 GB
A6                       4                28 GB
A7                       8                56 GB

The exact instance size you choose will depend on the scenario you wish to run and the size of the data you wish to make memory-resident in SQL Server In-Memory OLTP. We recommend provisioning 100% more memory than the estimated size of data in memory, and leaving sufficient space for the buffer pool used by disk-based workloads on the same machine. This article on SQL Server Books Online has more information on how to estimate the size of a table in memory, and we have published a blog post on hardware considerations for In-Memory OLTP, some of which apply to provisioning VMs as well.
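
As a rough, purely illustrative sizing exercise (the data size below is hypothetical):

-- Hypothetical sizing exercise:
-- estimated size of memory-optimized data + indexes :  20 GB
-- recommended provisioning (100% head-room)         :  2 x 20 GB = 40 GB
-- plus memory for the buffer pool, plan cache and the OS
-- => of the sizes listed above, A7 (56 GB) is the only one that leaves comfortable head-room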

Virtual Machine Configuration

To configure a Windows Azure Virtual Machine for the best In-Memory OLTP performance, we suggest that you follow the best practices outlined in this whitepaper. A summary of key considerations, plus some attributes unique to In-Memory OLTP, is listed below:

  • Use a single storage account for all disks attached to the same VM.
  • Do not turn on disk host caching for the data and log drives.
    When creating new drives for a Windows Azure VM, the disk host caching option is turned off by default. We do not recommend you change this option.
  • Do not use the Temporary Disk (D: Drive) for log storage.
    This will sacrifice durability for your database, as the content on the Temporary Disk is transient and may be lost on a VM failure or reboot. Please use a persistent Windows Azure Storage disk for your log storage needs.
  • Do not use native operating system striping for storing the log of an In-Memory OLTP-enabled database.
    Logging for memory-optimized tables is latency-sensitive, and striping of drives increases the latency of I/O operations. If you wish to use any form of striping for your log drive, you should test your configuration to ensure that the I/O latency is sufficiently low.
  • Compression is not supported for In-Memory OLTP tables. Compression will still work with regular tables in an In-Memory OLTP-enabled database.

SQL Server 2014 In-Memory OLTP: Memory Management for Memory-Optimized Tables

Memory-optimized tables must fully reside in memory and can’t be paged out. Unlike disk-based tables, where insufficient memory can slow down an application, the impact of running out of memory on memory-optimized tables can be severe, causing DML (i.e. delete, insert or update) operations to fail. While this adds a new dimension to managing memory, application failure due to resource errors is not something new. For example, applications using disk-based tables can fail with resource errors such as running out of transaction log, TempDB or storage. It is the responsibility of DBAs/administrators to make sure resources are provisioned and managed appropriately to avoid such failures. SQL Server provides a rich set of monitoring tools, including DMVs, PerfMon and XEvents, to help administrators identify problems early so that corrective action can be taken. Similarly, for memory-optimized tables, SQL Server provides a rich set of monitoring capabilities and configuration options so that you can manage your database/instance well and keep your application running smoothly. The remainder of this blog walks through each of these challenges and details how it can be addressed.

How do I estimate the memory needed?

This is the first question you need to consider when migrating an existing table to a memory-optimized table, or when you are designing a new application that uses memory-optimized tables. When migrating a disk-based table, you already know its current size, for example using sp_spaceused as described in http://technet.microsoft.com/en-us/library/ms188776.aspx, so it is a simple calculation to find the corresponding size for the memory-optimized table. The key differences to be aware of are that memory-optimized tables cannot be compressed like disk-based tables with ROW and PAGE compression, so the memory-optimized table will likely be bigger. However, unlike indexes for disk-based tables, the indexes on memory-optimized tables are much smaller. For example, the index key is not stored with hash indexes, and all indexes are, by definition, covering indexes. Please refer to http://msdn.microsoft.com/en-us/library/dn247639(v=sql.120).aspx for details. A more challenging task is to estimate data growth. While you can make a reasonable guess, the best approach is to continuously monitor the table size and the memory consumed by memory-optimized tables in your database and instance. The same monitoring approach holds for new applications that are created with In-Memory OLTP in mind.
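
For example, a starting point for the calculation could look like the following (the table name is illustrative; sp_spaceused reports the current on-disk size, which you then adjust for the differences described above):

-- Current size of the disk-based table you plan to migrate
EXEC sp_spaceused 'dbo.SalesOrderDetail'
-- Take the reported data size (after removing any ROW/PAGE compression savings)
-- as the base, then add the estimated size of row headers and of the hash and
-- nonclustered indexes you plan to create, per the sizing topic referenced above.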

How does SQL Server manage memory for memory-optimized tables?

The in-memory OLTP engine is integrated with SQL Server. Memory allocations to memory-optimized tables are managed by SQL Server Memory Manager and the allocated memory is tracked using familiar constructs and tools such as memory clerks and DMVs. The following DMV shows XTP memory clerks. The first row shows the memory allocated by system threads. The second row with name DB_ID_5 represents the consumers in the database objects and the third row with memory node-id 64 represents memory allocated to DAC (Dedicated Admin Connection).

-- Show the memory allocated to in-memory OLTP objects

select type, name, memory_node_id, pages_kb/1024 as pages_MB

from sys.dm_os_memory_clerks where type like '%xtp%'

type                 name       memory_node_id pages_MB

-------------------- ---------- -------------- --------------------

MEMORYCLERK_XTP      Default    0              18

MEMORYCLERK_XTP      DB_ID_5    0              1358

MEMORYCLERK_XTP      Default    64             0

Also, there are new DMVs that can be used to monitor the memory consumed by the in-memory OLTP engine and memory-optimized tables. Please refer to http://msdn.microsoft.com/en-us/library/dn133203(v=sql.120).aspx for details.
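
For example, a query along the following lines against sys.dm_db_xtp_table_memory_stats shows the memory allocated to and used by each memory-optimized table in the current database (a sketch; adjust to your needs):

-- Memory allocated/used per memory-optimized table in the current database
select object_name(object_id) as table_name,
       memory_allocated_for_table_kb, memory_used_by_table_kb,
       memory_allocated_for_indexes_kb, memory_used_by_indexes_kb
from sys.dm_db_xtp_table_memory_stats
order by memory_allocated_for_table_kb desc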

Like any other memory consumer, the in-memory OLTP engine responds to memory pressure, but only to a limited degree. For example, the memory consumed by data and indexes can’t be released even under memory pressure. This is different from disk-based tables, where external memory pressure may cause the buffer pool to shrink, which simply means there will be fewer data/index pages in memory. For this reason, it is all the more important to provision the memory for memory-optimized tables appropriately; otherwise the in-memory OLTP engine can starve other memory consumers, including the memory needed by SQL Server for its own operations, which can ultimately lead to a slow or unresponsive application. To address this, SQL Server provides a configuration option to limit the memory consumed by memory-optimized tables.

How can I limit memory consumed by memory-optimized tables?

Starting with SQL Server 2014, you can bind a database to a Resource Pool. This binding is only relevant when the database has one or more memory-optimized tables. The memory available in the resource pool controls the total memory available to memory-optimized tables in the database.

For example, create a resource pool, mem_optpool, as follows:

CREATE RESOURCE POOL mem_optpool WITH (MAX_MEMORY_PERCENT = 40);

Now map the database, mydatabase, to this resource pool by executing the following command. With this command, you are specifying that the total memory taken by memory-optimized tables and indexes cannot exceed the limit of the resource pool. In this case, the other 60% of memory is available to other consumers.

EXEC sp_xtp_bind_db_resource_pool 'mydatabase', 'mem_optpool'

When configuring memory for memory-optimized tables, capacity planning should be done based on MIN_MEMORY_PERCENT, not on MAX_MEMORY_PERCENT. This provides more predictable memory availability for memory-optimized tables, because pools that have the MIN_MEMORY_PERCENT option set can cause memory-pressure notifications against other pools to ensure the minimum percentage is honored. To ensure that memory is available for the In-Memory OLTP database and to help avoid OOM (Out of Memory) conditions, the values for MAX_MEMORY_PERCENT and MIN_MEMORY_PERCENT should be the same. SQL Server target memory is dynamic relative to the OS, and setting a minimum memory is recommended only if the server is not dedicated. For details, please refer to http://msdn.microsoft.com/en-us/library/dn465873(v=sql.120).aspx.
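
Putting this recommendation together with the example above, a minimal sketch might look like this (the pool and database names are the placeholders used above):

-- Size the pool by MIN_MEMORY_PERCENT and keep MAX_MEMORY_PERCENT equal to it
ALTER RESOURCE POOL mem_optpool WITH (MIN_MEMORY_PERCENT = 40, MAX_MEMORY_PERCENT = 40);
ALTER RESOURCE GOVERNOR RECONFIGURE;
-- If the database is already bound and has allocated memory, taking it offline
-- and back online makes the new binding fully effective (see the binding topic):
-- ALTER DATABASE mydatabase SET OFFLINE WITH ROLLBACK IMMEDIATE
-- ALTER DATABASE mydatabase SET ONLINE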

How does SQL Server reclaim memory taken by deleted/updated rows?

The rows of memory-optimized tables are stored in memory and are linked through hash and nonclustered indexes, as described in http://msdn.microsoft.com/en-us/library/dn133190(v=sql.120).aspx. Concurrent access to memory-optimized tables uses optimistic concurrency control based on row versions. Over time, existing rows may get updated (an update generates a new row version) or deleted, but these rows can’t be removed immediately, as there may be concurrent transactions that still need those row versions. These older row versions are garbage collected (GC’d) asynchronously when it is determined, based on the active transactions, that they are no longer needed. A GC system thread shares the row-version cleanup (i.e. GC) with user transaction activity to ensure that SQL Server is able to keep up with the GC. When you configure the memory for your workload, you must account for the additional memory needed for stale row versions. You can roughly estimate the memory needed for stale row versions using the following steps:

  1. Find the average number of rows updated/deleted by querying the DMV dm_db_xtp_object_stats at the beginning and end of a desired duration for your peak workload, and then compute the average number of rows updated/deleted per second over that duration.
  2. Estimate the number (NR) of row versions = (duration of the longest transaction accessing memory-optimized tables, in seconds) * (average number of rows updated/deleted per second)
  3. Assuming that delete/update operations are uniformly distributed across all memory-optimized tables, find the average row length RLen (including the row header and index pointers). You can now estimate the memory for row versions as (NR * RLen).
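
As a purely illustrative calculation with hypothetical numbers:

-- Hypothetical example:
-- longest transaction touching memory-optimized tables :  10 seconds
-- average rows updated/deleted per second              :  5,000
-- average row length (header + index pointers + data)  :  200 bytes
-- NR   = 10 * 5,000         = 50,000 stale row versions
-- Size = 50,000 * 200 bytes = ~10 MB of additional memory for row versions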

You can use DMVs and Perfmon counters to monitor the progress of Garbage collection. Please refer to http://msdn.microsoft.com/en-us/library/dn133203(v=sql.120).aspx.

SQL Server 2014 In-Memory OLTP: Nonclustered Indexes for Memory-Optimized Tables

SQL Server 2014 CTP1 introduced hash indexes for memory-optimized tables. Hash indexes are very efficient for point lookups, when you know exactly the value you are looking for. However, they do not perform well if you need a range of values, for example a date range, or if you need to retrieve the rows in a particular order.

Memory-optimized nonclustered indexes are a new index type for memory-optimized tables in CTP2. Nonclustered indexes support retrieving ranges of values, and also support retrieving the table rows in the order that was specified when the index was created. They can be used to optimize the performance of queries on memory-optimized tables that use inequality predicates like ‘<’ and ‘>’, and queries using an ORDER BY clause.

Nonclustered indexes also support point lookups, but hash indexes still offer far better performance. You should continue to use hash indexes to optimize the performance of queries with equality predicates.

Hash Index Limitations

Consider, for example, the following (simplified) CREATE TABLE statement. The table SalesOrderHeader_inmem has a hash index on the primary key column ‘SalesOrderID’, as well as a hash index on the column ‘OrderDate’. The index definitions are bold-faced. Note that the BUCKET_COUNT should typically be set between one and two times the number of unique index key values.

CREATE TABLE [Sales].[SalesOrderHeader_inmem](
[SalesOrderID] uniqueidentifier NOT NULL
PRIMARY KEY
NONCLUSTERED HASH WITH (BUCKET_COUNT=1000000)
DEFAULT (NEWID()),
[RevisionNumber] [tinyint] NOT NULL CONSTRAINT 
[IMDF_SalesOrderHeader_RevisionNumber] DEFAULT ((0)),
[OrderDate] [datetime2] NOT NULL
INDEX ix_OrderDate HASH WITH (BUCKET_COUNT=1000000)
) WITH (MEMORY_OPTIMIZED=ON)

If you run a query of the form:

SELECT * FROM Sales.SalesOrderHeader_inmem WHERE SalesOrderID = @ID

SQL Server will use the hash index on the column SalesOrderID to quickly identify the row corresponding to the parameter @ID. Witness the Index Seek operation in the query plan:

[Query plan: Index Seek on the hash index on SalesOrderID]

Hash indexes are very efficient for point lookup operations, because they require only a simple lookup in a hash table rather than traversing an index tree structure, as is required for traditional (non)clustered indexes. If you are looking for the rows corresponding to a particular index key value, using a hash index is the way to go.

Now, hash indexes do have some limitations. Because of the nature of hash tables, rows appear in the index in random order. This means it is not possible to retrieve ranges of values, or to scan rows in any particular order. Therefore, hash indexes do not support Index Seek operations on inequality predicates, and do not support ordered scans. The former results in a full index scan, and the latter results in a scan followed by an (expensive) sort operator. All this results in a potential performance degradation when using such queries with hash indexes.

Consider the following two example queries:

Query with an inequality predicate

SELECT * FROM Sales.SalesOrderHeader_inmem 
WHERE OrderDate > @Date

The plan for this query does not use the index on OrderDate; it simply performs a full scan of the primary index. This means that SQL Server will process all the rows in the table, and only later filter out the ones with OrderDate > @Date.

[Query plan: full scan of the primary index followed by a filter on OrderDate]

Query with an ORDER BY

SELECT * FROM Sales.SalesOrderHeader_inmem 
ORDER BY OrderDate

The plan for this query includes a sort operator, which is very costly: after scanning the rows, all rows will need to be ordered to obtain the desired sort-order.

[Query plan: index scan followed by a Sort operator]

Nonclustered indexes

The new memory-optimized nonclustered indexes support both Index Seek operations using inequality predicates, and ordered scans. Consider the following amended CREATE TABLE statement, which now includes a nonclustered index on the column OrderDate.

CREATE TABLE [Sales].[SalesOrderHeader_inmem](
[SalesOrderID] uniqueidentifier NOT NULL
PRIMARY KEY
NONCLUSTERED HASH WITH (BUCKET_COUNT=1000000)
DEFAULT (NEWID()),
[RevisionNumber] [tinyint] NOT NULL CONSTRAINT 
[IMDF_SalesOrderHeader_RevisionNumber] DEFAULT ((0)),
[OrderDate] [datetime2] NOT NULL
INDEX ix_OrderDate NONCLUSTERED
) WITH (MEMORY_OPTIMIZED=ON)

Note that the keyword NONCLUSTERED is optional in this example. Also note that the syntax for memory-optimized nonclustered indexes is similar to the traditional disk-based nonclustered indexes. The only difference is that with memory-optimized indexes you need to specify the index with the CREATE TABLE statement, while with traditional indexes, you can create the index after creating the table.

Consider now the following two queries:

Query with an inequality predicate

SELECT * FROM Sales.SalesOrderHeader_inmem 
WHERE OrderDate > @Date

The plan for this query now uses the index on OrderDate. And when inspecting the properties of the Index Seek operator, you will see the inequality predicate OrderDate > @Date. This means that SQL Server will only need to process the rows with OrderDate > @Date.

Query with an ORDER BY

SELECT * FROM Sales.SalesOrderHeader_inmem 
ORDER BY OrderDate

The plan for this query does not include a sort operator. SQL Server will scan the rows in the index in order; in this case the sort-order of the index is the same as the sort-order required by the query, thus no additional sort operator is needed.

[Query plan: ordered scan of the index ix_OrderDate, with no Sort operator]

Note that, in contrast to disk-based nonclustered indexes, memory-optimized nonclustered indexes are uni-directional. This means that they support index scans only in the order that was specified when the index was created. If the ORDER BY clause in the above example required OrderDate in descending order, the index ix_OrderDate could not be used.
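
If descending order is what your queries need, the index has to be declared with that order. The following is a sketch, assuming the table-level inline index syntax (which accepts ASC/DESC) and a hypothetical table name:

CREATE TABLE [Sales].[SalesOrderHeader_desc](
[SalesOrderID] uniqueidentifier NOT NULL
PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=1000000) DEFAULT (NEWID()),
[OrderDate] [datetime2] NOT NULL,
-- descending index: supports ORDER BY OrderDate DESC, but not ascending ordered scans
INDEX ix_OrderDate_desc NONCLUSTERED ([OrderDate] DESC)
) WITH (MEMORY_OPTIMIZED=ON)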

Limitations on memory-optimized indexes

Memory-optimized indexes do have a few limitations in SQL Server 2014 that we hope to address in future versions. You do need to consider these limitations when deciding on an indexing strategy for your memory-optimized tables.

  • At most 8 indexes – you cannot specify more than 8 indexes on a single memory-optimized table.
  • BIN2 collations – when using n(var)char columns in an index key, the columns must have a _BIN2 collation. Using BIN2 allows very fast lookup, as this can be based simply on binary comparison. However, you need to consider the implications of using a BIN2 collation, such as case and accent sensitivity. For more details see the Books Online topic on Collations and Code Pages.
  • NOT NULL columns in the index key – memory-optimized indexes do not support nullable columns; all columns in the index key must be declared as NOT NULL.

Guidelines for choosing indexes

A few simple guidelines to help choose the type of index you need for your application:

  • If you need to perform only point lookups, meaning you need to retrieve only the rows corresponding to a single index key value, use a hash index.
  • If you need to retrieve ranges of rows, or need to retrieve the rows in a particular sort-order, use a nonclustered index.
  • If you need to do both, particularly if point lookups are frequent, you can consider creating two indexes: it is possible to create both a hash and a nonclustered index with the same index key.

Hash indexes require an estimate of the number of unique index key values in order to determine an appropriate value for the BUCKET_COUNT – typically between one and two times that number. Because nonclustered indexes do not require setting the BUCKET_COUNT, it can be tempting to simply use a nonclustered index instead of a hash index. However, this could lead to sub-optimal performance for point lookup operations. In many scenarios where point lookup performance is critical and there is no good estimate for the number of unique index key values, it is better to over-provision the BUCKET_COUNT: i.e., pick a very high bucket count that you know is going to be larger than the number of index keys. Note that over-provisioning does not affect the performance of point lookups, but under-provisioning does. However, over-provisioning does increase memory consumption and it slows down full index scans.
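
For example, combining the guidelines above, a table that needs both fast point lookups and ordered range scans on OrderDate could declare both index types on that column. The following is a sketch based on the earlier example (the table name is hypothetical and the hash bucket count is deliberately over-provisioned):

CREATE TABLE [Sales].[SalesOrderHeader_both](
[SalesOrderID] uniqueidentifier NOT NULL
PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=1000000) DEFAULT (NEWID()),
[OrderDate] [datetime2] NOT NULL,
-- hash index: point lookups such as WHERE OrderDate = @Date
INDEX ix_OrderDate_hash NONCLUSTERED HASH ([OrderDate]) WITH (BUCKET_COUNT=2000000),
-- nonclustered index: range predicates and ORDER BY on OrderDate
INDEX ix_OrderDate_range NONCLUSTERED ([OrderDate])
) WITH (MEMORY_OPTIMIZED=ON)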

For more details on usage of memory-optimized indexes see the Books Online article on Guidelines for Using Indexes on Memory-Optimized Tables.