Practical Lock-Free and Wait-Free LL/SC/VL Implementations Using 64-Bit CAS

Authors

Maged Michael, IBM Thomas J Watson Research Center

Abstract

The ABA problem is a fundamental problem for lock-free and wait-free algorithms. The ideal semantics of the atomic primitives LL/SC/VL (Load-Linked, Store-Conditional, Validate) are inherently immune to that problem. However, for practical architectural reasons, no processor architecture supports the ideal semantics of LL/VL/SC. Current mainstream architectures support either CAS (Compare-and-Swap) or LL/SC with restricted semantics, which are susceptible to the ABA problem. Furthermore, most current mainstream 64-bit architectures do not support atomic instructions on more than 64-bit memory blocks, thus making LL/SC/VL implementations that require support for the atomic manipulation of wider memory blocks impractical. The recent 64-bit wait-free implementations of 64-bit LL/SC/VL variables by Jayanti and Petrovic entail space overhead per LL/SC/VL variable that is proportional to the number of threads that may operate on it. In programs with a large number of LL/SC/VL variables, the space overhead can be unacceptable in practice. The implementations also require advance knowledge of the maximum number of threads that may operate on the LL/SC/VL variables, which is not always possible to predict without making conservative estimates that further increase the space overhead beyond practical limits. In this paper, we present practical lock-free and wait-free implementations of arbitrary-sized LL/SC/VL that require only 64-bit CAS in 64-bit programs. The implementations require only a constant space overhead per LL/SC/VL variable (one word for the lock-free implementation and four words for the wait-free implementation), and linear space per participating thread that is amortized over all the LL/SC/VL variables in the program, and may be subsumed by the independent memory reclamation needs of the program. The implementations do not require advance knowledge of the maximum number of participating threads. In the wait-free implementation, LL and VL take constant time, and SC takes constant amortized expected time. In the lock-free implementation, VL takes constant time and SC takes constant amortized expected time, and in the absence of interfering successful SC operations, LL takes constant time. In both implementations---regardless of contention---concurrent LL, VL, and unsuccessful SC operations do not interfere with each other, and SC and VL are wait-free. Using the work performance measure, the amortized expected complexity of any set of LL/SC/VL operations using either of our implementations is the same as that assuming hypothetical hardware support for ideal LL/SC/VL.