Rust has become a popular choice for developing high-performance, concurrent systems. Its ownership model and strong type system provide powerful guarantees for memory safety and thread safety. However, when it comes to building lock-free concurrent data structures, we need to employ specific techniques to ensure correctness and efficiency.
In this article, I’ll share six essential Rust techniques for implementing lock-free concurrent data structures. These methods have proven invaluable in my experience developing high-performance systems.
Atomic Operations
At the core of lock-free programming are atomic operations. Rust provides atomic types in the std::sync::atomic module, which allow for thread-safe updates to shared data without using locks.
Let’s look at a simple example of using an atomic counter:
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc gives each thread its own handle to the shared counter;
    // a plain local AtomicUsize can't be moved into multiple threads.
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..10)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final count: {}", counter.load(Ordering::SeqCst));
}
In this example, we create an AtomicUsize to represent our counter. Multiple threads can safely increment the counter concurrently using the fetch_add method, which atomically adds a value and returns the previous value.
Atomic operations are the building blocks for more complex lock-free data structures. They allow us to perform reads, writes, and compound operations on shared data without the need for locks.
Memory Ordering
When working with atomic operations, we must consider memory ordering. Memory ordering defines the guarantees about how memory accesses are observed by different threads.
Rust provides several memory ordering options:
- Relaxed: Guarantees atomicity only; imposes no ordering visible to other threads.
- Release: Used with stores; all writes before the store become visible to any thread whose Acquire load observes it.
- Acquire: Used with loads; if the load observes a Release store, all writes that preceded that store are visible afterward.
- AcqRel: Combines Acquire and Release semantics, for read-modify-write operations.
- SeqCst: The strongest ordering; Acquire/Release plus a single total order over all SeqCst operations across all threads.
Choosing the right memory ordering is crucial for both correctness and performance. Let’s modify our previous example to use Release-Acquire ordering:
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..10)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // Release ordering on the read-modify-write.
                    counter.fetch_add(1, Ordering::Release);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Acquire pairs with the Release increments above.
    println!("Final count: {}", counter.load(Ordering::Acquire));
}
Here we use Ordering::Release for the fetch_add operations and Ordering::Acquire for the final load, which can be cheaper than SeqCst on weakly ordered hardware. Strictly speaking, this example doesn't need it: join() already establishes a happens-before relationship between each thread's increments and the final load, so even Relaxed ordering would print 10000. The Release/Acquire pairing earns its keep when a value is published to a thread that is still running.
ABA Problem Mitigation
The ABA problem is a common issue in lock-free programming. It occurs when a thread reads a value A, another thread changes it to B and back to A, and the first thread incorrectly concludes that the value hasn’t changed.
One technique to mitigate the ABA problem is to use tagged pointers. By combining a pointer with a tag that’s incremented on each update, we can detect ABA situations.
Here’s a simplified example of a tagged pointer implementation:
use std::sync::atomic::{AtomicUsize, Ordering};

struct TaggedPointer<T> {
    // Pointer and 2-bit tag packed into a single atomic word.
    data: AtomicUsize,
    _marker: std::marker::PhantomData<T>,
}

impl<T> TaggedPointer<T> {
    fn new(ptr: *mut T) -> Self {
        // Requires the low two bits of `ptr` to be zero,
        // i.e. T must be aligned to at least 4 bytes.
        debug_assert_eq!(ptr as usize & 0x3, 0);
        TaggedPointer {
            data: AtomicUsize::new(ptr as usize),
            _marker: std::marker::PhantomData,
        }
    }

    fn load(&self, order: Ordering) -> (*mut T, usize) {
        let data = self.data.load(order);
        let ptr = (data & !0x3) as *mut T;
        let tag = data & 0x3;
        (ptr, tag)
    }

    fn compare_and_swap(&self, current: *mut T, new: *mut T, current_tag: usize, order: Ordering) -> bool {
        let current_data = (current as usize) | current_tag;
        // Bump the tag (mod 4) so an A-B-A sequence changes the packed word.
        let new_data = (new as usize) | ((current_tag + 1) & 0x3);
        self.data
            .compare_exchange(current_data, new_data, order, Ordering::Relaxed)
            .is_ok()
    }
}
This implementation stores a 2-bit tag in the least significant bits of the pointer, which are guaranteed to be zero for types with at least 4-byte alignment. The compare_and_swap method increments the tag on each successful update, so an A-B-A sequence changes the stored word even when the pointer itself is restored. Note that a 2-bit tag wraps around quickly; production implementations typically use wider tags, for example by packing a pointer and a counter into a 128-bit atomic.
Epoch-based Reclamation
Memory reclamation is a significant challenge in lock-free programming. We need to ensure that memory is freed only when no other threads are accessing it. Epoch-based reclamation is an efficient technique for managing memory in lock-free data structures.
The crossbeam crate provides an implementation of epoch-based reclamation. Here’s an example of how to use it:
use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::Ordering;

struct Node<T> {
    // ManuallyDrop prevents a double drop: pop moves the value out with
    // ptr::read, while defer_destroy later frees the node itself.
    data: ManuallyDrop<T>,
    next: Atomic<Node<T>>,
}

struct Stack<T> {
    head: Atomic<Node<T>>,
}

impl<T> Stack<T> {
    fn new() -> Self {
        Stack { head: Atomic::null() }
    }

    fn push(&self, data: T) {
        let mut new_node = Owned::new(Node {
            data: ManuallyDrop::new(data),
            next: Atomic::null(),
        });
        let guard = &epoch::pin();
        loop {
            let head = self.head.load(Ordering::Relaxed, guard);
            new_node.next.store(head, Ordering::Relaxed);
            match self.head.compare_exchange(
                head,
                new_node,
                Ordering::Release,
                Ordering::Relaxed,
                guard,
            ) {
                Ok(_) => break,
                // The CAS failed; take ownership of the node back and retry.
                Err(e) => new_node = e.new,
            }
        }
    }

    fn pop(&self) -> Option<T> {
        let guard = &epoch::pin();
        loop {
            let head = self.head.load(Ordering::Acquire, guard);
            match unsafe { head.as_ref() } {
                Some(node) => {
                    let next = node.next.load(Ordering::Relaxed, guard);
                    if self.head
                        .compare_exchange(head, next, Ordering::Release, Ordering::Relaxed, guard)
                        .is_ok()
                    {
                        unsafe {
                            // Free the node only after no pinned thread can reach it.
                            guard.defer_destroy(head);
                            return Some(ManuallyDrop::into_inner(ptr::read(&node.data)));
                        }
                    }
                }
                None => return None,
            }
        }
    }
}
This example implements a lock-free stack (a Treiber stack) using epoch-based reclamation. epoch::pin() pins the current thread to the current epoch and returns a guard; while any thread remains pinned, memory retired in that epoch is not freed. defer_destroy schedules a node's destruction for after every thread that might still hold a reference to it has unpinned.
Hazard Pointers
Hazard pointers are another technique for safe memory reclamation in lock-free data structures. They work by having threads explicitly declare which pointers they’re currently accessing.
While Rust doesn’t have built-in support for hazard pointers, we can implement a basic version:
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

const MAX_THREADS: usize = 100;
const MAX_HAZARD_POINTERS: usize = 2;

struct HazardPointer<T> {
    pointer: AtomicPtr<T>,
}

struct RetiredNode<T> {
    data: *mut T,
    next: *mut RetiredNode<T>,
}

struct HazardPointerDomain<T> {
    // One row of hazard slots per thread.
    hazard_pointers: [[HazardPointer<T>; MAX_HAZARD_POINTERS]; MAX_THREADS],
    // Treiber-style list of retired pointers awaiting reclamation.
    retired_list: AtomicPtr<RetiredNode<T>>,
}

impl<T> HazardPointerDomain<T> {
    fn new() -> Self {
        HazardPointerDomain {
            // AtomicPtr is not Copy, so build the arrays element by element.
            hazard_pointers: std::array::from_fn(|_| {
                std::array::from_fn(|_| HazardPointer {
                    pointer: AtomicPtr::new(ptr::null_mut()),
                })
            }),
            retired_list: AtomicPtr::new(ptr::null_mut()),
        }
    }

    // Announce that `thread` is about to dereference `ptr` through `slot`.
    fn protect(&self, thread: usize, slot: usize, ptr: *mut T) -> *mut T {
        self.hazard_pointers[thread][slot].pointer.store(ptr, Ordering::SeqCst);
        ptr
    }

    fn clear(&self, thread: usize, slot: usize) {
        self.hazard_pointers[thread][slot].pointer.store(ptr::null_mut(), Ordering::SeqCst);
    }

    // Queue `ptr` for reclamation once no thread protects it.
    fn retire(&self, ptr: *mut T) {
        self.push_retired(Box::new(RetiredNode { data: ptr, next: ptr::null_mut() }));
        self.try_reclaim();
    }

    fn push_retired(&self, node: Box<RetiredNode<T>>) {
        let raw = Box::into_raw(node);
        loop {
            let head = self.retired_list.load(Ordering::Relaxed);
            unsafe { (*raw).next = head };
            if self.retired_list.compare_exchange(head, raw, Ordering::Release, Ordering::Relaxed).is_ok() {
                break;
            }
        }
    }

    fn try_reclaim(&self) {
        // Snapshot every currently protected pointer.
        let mut protected = Vec::new();
        for row in &self.hazard_pointers {
            for hp in row {
                let p = hp.pointer.load(Ordering::SeqCst);
                if !p.is_null() {
                    protected.push(p);
                }
            }
        }
        // Detach the whole retired list, then free each unprotected
        // pointer and re-queue the rest for a later pass.
        let mut node = self.retired_list.swap(ptr::null_mut(), Ordering::AcqRel);
        while !node.is_null() {
            let boxed = unsafe { Box::from_raw(node) };
            node = boxed.next;
            if protected.contains(&boxed.data) {
                self.push_retired(boxed);
            } else {
                unsafe { drop(Box::from_raw(boxed.data)) };
            }
        }
    }
}
This implementation provides a basic hazard pointer system. Threads can protect pointers they’re accessing, clear their hazard pointers when done, and retire pointers for later reclamation. The try_reclaim method attempts to free memory that’s no longer being accessed by any thread.
Lock-free Linked Lists
Now that we’ve covered these fundamental techniques, let’s put them together to implement a lock-free linked list. This example will use atomic operations and epoch-based reclamation:
use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::Ordering;

pub struct Node<T> {
    // ManuallyDrop prevents a double drop: remove moves the value out,
    // and defer_destroy later frees the node itself.
    data: ManuallyDrop<T>,
    next: Atomic<Node<T>>,
}

pub struct List<T> {
    head: Atomic<Node<T>>,
}

impl<T> List<T> {
    pub fn new() -> Self {
        List { head: Atomic::null() }
    }

    pub fn insert(&self, data: T) {
        let mut new_node = Owned::new(Node {
            data: ManuallyDrop::new(data),
            next: Atomic::null(),
        });
        let guard = &epoch::pin();
        loop {
            let head = self.head.load(Ordering::Relaxed, guard);
            new_node.next.store(head, Ordering::Relaxed);
            match self.head.compare_exchange(
                head,
                new_node,
                Ordering::Release,
                Ordering::Relaxed,
                guard,
            ) {
                Ok(_) => break,
                Err(e) => new_node = e.new,
            }
        }
    }

    pub fn remove(&self, data: &T) -> Option<T>
    where
        T: PartialEq,
    {
        let guard = &epoch::pin();
        let mut prev = &self.head;
        let mut curr = prev.load(Ordering::Acquire, guard);
        while let Some(curr_ref) = unsafe { curr.as_ref() } {
            let next = curr_ref.next.load(Ordering::Acquire, guard);
            if *curr_ref.data == *data {
                if prev.compare_exchange(curr, next, Ordering::Release, Ordering::Relaxed, guard).is_ok() {
                    unsafe {
                        guard.defer_destroy(curr);
                        return Some(ManuallyDrop::into_inner(ptr::read(&curr_ref.data)));
                    }
                }
                // The CAS failed (the list changed under us): restart from the head.
                prev = &self.head;
                curr = prev.load(Ordering::Acquire, guard);
            } else {
                prev = &curr_ref.next;
                curr = next;
            }
        }
        None
    }

    pub fn contains(&self, data: &T) -> bool
    where
        T: PartialEq,
    {
        let guard = &epoch::pin();
        let mut curr = self.head.load(Ordering::Acquire, guard);
        while let Some(node) = unsafe { curr.as_ref() } {
            if *node.data == *data {
                return true;
            }
            curr = node.next.load(Ordering::Acquire, guard);
        }
        false
    }
}
This implementation provides a lock-free linked list with insert, remove, and contains operations. It uses atomic operations for thread-safe updates and epoch-based reclamation for safe memory management.
The insert method creates a new node and repeatedly attempts to CAS it onto the head of the list. The remove method traverses the list and unlinks the first node with matching data, while contains simply walks the list looking for a value. Note that this remove is a simplification: if a node's predecessor is itself removed concurrently, a CAS can succeed against an already-unlinked node and silently lose the update. Production-grade lock-free lists handle this with logical deletion marks, as in Harris's linked-list algorithm.
These techniques form the foundation for building efficient, lock-free concurrent data structures in Rust. By leveraging atomic operations, carefully managing memory ordering, and employing advanced memory reclamation techniques, we can create high-performance, thread-safe data structures without relying on locks.
Remember, lock-free programming is complex and requires careful consideration of edge cases and potential race conditions. Always thoroughly test your implementations and consider using established libraries like crossbeam for production use.
As we continue to push the boundaries of concurrent programming, these techniques will become increasingly important. They allow us to harness the full power of modern multi-core processors while maintaining the safety and reliability that Rust is known for.
In my experience, mastering these techniques has been crucial for developing high-performance systems. They’ve allowed me to create efficient, scalable solutions that can handle high levels of concurrency without the overhead of traditional locking mechanisms.
As you explore lock-free programming in Rust, remember that it’s a journey. Start with simpler data structures and gradually work your way up to more complex ones. With practice and persistence, you’ll be able to create robust, high-performance concurrent systems that can handle the demands of modern computing environments.