We believe that the In-Memory OLTP engine advances the industry state of the art with respect to concurrency control. This advancement stems from the combination of lock free algorithms and the row-versioned architecture of the engine.
This post examines precisely what we mean when we describe the In-Memory OLTP engine as being ‘lock free’, both in abstract terms and, more importantly, in terms of its impact on user workloads.
Let’s start with a brief definition capturing the core attributes implied by the term ‘lock free’ in In-Memory OLTP:
“At steady state, transaction-carrying threads executing in the context of the In-Memory OLTP engine are designed to require no blocking operation.”
Element by element, this definition implies the following exclusions:
- Steady state – we explicitly do not claim that execution will be lock free when the system ramps up or shuts down, or when the workload fluctuates significantly. For instance, the In-Memory OLTP engine allocates large pages of memory from the SQL Server host (and implicitly from the host OS). Clearly these operations will acquire locks, since neither the OS nor SQL Server in general is ‘lock free’. The engine does implement its own lock free small-block allocator (currently capped at 8K blocks), so once a page is obtained from the host process, future block allocations are serviced without acquiring any subsequent locks; the workload ramp-up, however – characterized as it is by page acquisition from the host – will incur some locking overhead.
- Transaction-carrying threads – we use this term to distinguish threads doing work on behalf of the user workload from threads doing work on behalf of the system itself. Examples of system (or worker) threads include threads involved in checkpoint, some GC maintenance, file allocation, and so on. Since these threads are never visible to the end user, and since their responsiveness does not impact the end user workload, their execution is not designed to be entirely lock free.
- Execution in the context of the In-Memory OLTP engine – we use this phrase to distinguish execution that takes place within the context of the engine from execution that takes place in the SQL host, in the OS, or even in the client stack. One example triggering this exclusion was presented above: acquiring large pages from the SQL host will occasionally acquire locks. Similarly, file access in the context of checkpoint or logging will incur OS-level locking; waiting for log IO before acknowledging a transaction commit will also incur some form of waiting; clearing of transaction dependencies – which is required for correctness – will also cause a thread to stall, and so on. These examples all demonstrate that locking is still present in SQL Server running In-Memory OLTP. However, it also remains true that we have eliminated all locks and waits within the engine itself – and that is precisely the location where locks (or in our case their absence) have the most impact on the end user.
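To make the steady-state exclusion concrete, here is a minimal sketch of the idea behind a lock free small-block allocator: a large page is acquired from the host once (a potentially locking path), after which blocks are carved out of it with a single atomic instruction and no Acquire/Release region. This is an illustrative bump-pointer design under assumed names (`BumpAllocator`, `Allocate`), not the engine's actual allocator:

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: once a page has been obtained from the host process
// (which may take locks), blocks within it are handed out lock free via a
// single atomic fetch_add -- no lock is acquired on this path.
class BumpAllocator {
public:
    BumpAllocator(char* page, std::size_t page_size)
        : page_(page), size_(page_size), offset_(0) {}

    // Returns nullptr when the page is exhausted; a real allocator would then
    // go back to the host for another page (the potentially locking path).
    void* Allocate(std::size_t bytes) {
        bytes = (bytes + 7) & ~std::size_t(7);  // 8-byte alignment
        std::size_t old = offset_.fetch_add(bytes, std::memory_order_relaxed);
        if (old + bytes > size_) return nullptr;  // page exhausted
        return page_ + old;
    }

private:
    char* page_;
    std::size_t size_;
    std::atomic<std::size_t> offset_;
};
```

Note that even on a failed allocation the cursor simply keeps advancing; a production allocator would recycle freed blocks, but the essential point survives: the steady-state allocation path is one atomic add, with locking confined to page acquisition.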
Once we agree on the definitions above, we note that in the context of any database system, locking behavior always implies two distinct considerations: we use locks for logical data protection (also known as transactional isolation) and for physical data protection.
One example of logical data protection is row payload protection: ensuring that while one user modifies a row, no other user modifies the same row. This is what logical (transactional) locks are used for in traditional SQL Server (row locks, page locks, database locks and even app locks can be used for the same purpose). With In-Memory OLTP we rely on row versioning to ensure that row content is never modified by two users at the same time – in other words, we don’t use transactional locks because we never update data in place. If two or more users try to update the same row at the same time, one will succeed while the others will fail due to a write/write conflict. Note that this can be achieved without any locking: each user creates a new version and then tries to install that version atomically in the same index location (using ‘InterlockedCompareExchange’ – or ICX). Of the multiple users that try to update the same row at the same time, one will succeed and the others will fail this hardware-level ICX operation – and that translates directly into the behavior reported back by the system.
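The install-or-fail step can be sketched in a few lines. This is an illustrative model, not engine code: `std::atomic` compare-exchange stands in for the Windows `InterlockedCompareExchange` the post refers to, and the `RowVersion`/`TryInstall` names are assumptions:

```cpp
#include <atomic>

// Hypothetical sketch of write/write conflict detection via one
// compare-exchange. Each updater builds its new version privately, then
// tries to swing the index slot from the version it read to the new one.
struct RowVersion {
    int payload;
    RowVersion* prior;  // older version, still visible to concurrent readers
};

// Returns true if this updater won the race. A false return signals a
// write/write conflict, which surfaces as a transaction failure -- note
// that no lock is taken on either path.
bool TryInstall(std::atomic<RowVersion*>& slot,
                RowVersion* expected_current,
                RowVersion* new_version) {
    new_version->prior = expected_current;
    return slot.compare_exchange_strong(expected_current, new_version);
}
```

If two updaters both read the same current version and race to install, exactly one compare-exchange succeeds; the hardware arbitrates, and the loser's failure is reported back as the write/write conflict described above.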
Physical data protection is a different problem altogether. In traditional database systems (SQL Server, but also more broadly in the industry) we protect internal data structures via spinlocks and latches. Broadly speaking (and oversimplifying), the difference between spinlocks and latches is that a thread trying to acquire a spinlock will spin when the lock is found to be currently held by a different thread, whereas a thread trying to acquire a latch will yield its CPU time back to the OS if the latch is found to be currently held by a different thread. Given this difference, traditional SQL Server uses latches for waits that could take a while (such as bringing a page into the buffer pool), while spinlocks are used for short-term waits (such as waiting on an in-memory linked-list traversal).

However, both spinlocks and latches have in common the fact that they are ‘region locks’. We use the term ‘region lock’ to describe a mechanism that protects a region of code or data structures against simultaneous thread access. Region locks implement an ‘Acquire/Release’ pattern – a lock (either latch or spinlock) is first acquired, the protected region executes, and then the lock is released. The problem with that approach is that it does not scale: in a system with many cores or very high concurrency, the protected region becomes a bottleneck. For instance, in Windows Server the scheduling quantum is around 180ms, so if a thread that holds a spinlock gets preempted, that spinlock will be held for 180ms regardless of how short the protected region would otherwise be. Region locks have other negative side effects as well, because they all involve writing to shared cache lines even when the lock is acquired for read access – which becomes problematic in many-core and NUMA systems or under high concurrency. The lock free engine avoids these issues by implementing all operations in an atomic fashion.
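The contrast between the two styles can be shown side by side. Below is a generic sketch (all names illustrative, not engine internals): a minimal test-and-set spinlock exhibiting the Acquire/Release region pattern, next to the lock free alternative in which the state transition is itself a single atomic operation, leaving no region to protect and no lock holder to preempt:

```cpp
#include <atomic>

// Minimal 'region lock': a test-and-set spinlock. Every acquisition --
// even one taken only to read -- writes the shared cache line holding
// 'flag_', which is what hurts on many-core and NUMA machines.
class SpinLock {
public:
    void Acquire() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Spin. If the holder is preempted here, every waiter spins
            // for the remainder of the holder's scheduling quantum.
        }
    }
    void Release() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Region-lock style: Acquire, execute the protected region, Release.
long counter = 0;
SpinLock counter_lock;
void IncrementLocked() {
    counter_lock.Acquire();
    ++counter;  // the protected region
    counter_lock.Release();
}

// Lock free style: the transition is one atomic instruction, so there is
// no protected region and preemption cannot block other threads.
std::atomic<long> atomic_counter{0};
void IncrementLockFree() {
    atomic_counter.fetch_add(1, std::memory_order_relaxed);
}
```

A trivial counter flatters the lock free side, of course; the engineering challenge the engine takes on is expressing *every* transaction-path state transition this way, including multi-step index updates.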
In other words, the In-Memory OLTP engine does not define any protected regions in transaction executing paths. The data structures and algorithms are structured such that state transitions are atomic and therefore are not subject to the whims of the scheduling subsystem. In addition, many operations are done without any shared-cache line modifying instructions at all (meaning that the entire operation does not even use ICX but rather touches in write mode only cache lines that are private to the local processor) which improves scalability and concurrency to the limits supported by the hardware.
The prime example of that is index traversal: the engine walks both hash and range indices without any locks or ICX instructions. In the process, the engine detects whether the underlying data structure has changed in a manner that could invalidate the current traversal, and re-attempts the small portion of the traversal that was invalidated. These re-traversals are extremely rare even at very high concurrency (in the early days of the In-Memory OLTP engine we measured under 100 retries for a million-tx/sec workload) – so their measurable performance impact is virtually non-existent. When the engine needs to modify one of these lock free data structures, it does so via ICX – which makes the modification visible atomically.
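The traverse-detect-retry pattern can be illustrated with a textbook lock free sorted list (insert-only, so this sketch sidesteps memory reclamation). This is a generic technique under assumed names, not the engine's index code: readers follow atomic `next` pointers with no locks, a writer splices its node in via one compare-exchange, and on a lost race only the short segment of the search that was invalidated is re-attempted:

```cpp
#include <atomic>
#include <climits>

struct Node {
    int key;
    std::atomic<Node*> next;
    Node(int k, Node* n = nullptr) : key(k), next(n) {}
};

// Sorted singly linked list with a sentinel head. Readers and writers
// traverse without locks; writers publish via compare-exchange.
struct List {
    Node head{INT_MIN};  // sentinel

    void Insert(int key) {
        Node* node = new Node(key);
        Node* pred = &head;
        for (;;) {
            // Lock free search from pred for the insertion point.
            Node* curr = pred->next.load();
            while (curr && curr->key < key) {
                pred = curr;
                curr = curr->next.load();
            }
            node->next.store(curr);
            // Atomically splice the node in.
            if (pred->next.compare_exchange_weak(curr, node)) return;
            // CAS failed: another thread changed pred->next concurrently.
            // Since nodes are never removed in this sketch, pred is still
            // valid, so only the search from pred onward is re-attempted --
            // not the whole traversal.
        }
    }

    // Pure read: no locks, no ICX, no shared cache-line writes.
    bool Contains(int key) {
        Node* curr = head.next.load();
        while (curr && curr->key < key) curr = curr->next.load();
        return curr && curr->key == key;
    }
};
```

Note that `Contains` touches shared cache lines only in read mode, matching the point above about operations that avoid even ICX; the retry loop fires only when a writer actually loses a race, which mirrors why observed retry counts stay tiny.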
Current hardware trends present overwhelming evidence that the number of available cores in any given computing platform will continue to rise. In this context, concurrency control that is lean and efficient at its core is absolutely crucial to achieving first-rate performance. With In-Memory OLTP we have taken these insights to heart and built an engine that relies on no locks, waits, latches, or other synchronization primitives to ensure consistency of execution. We believe this approach removes locking and latching as a concern for even our most demanding users.