Concurrency used to scare me. I’d stare at code that worked perfectly one moment and then, for reasons I couldn’t easily see, it would crash or produce subtly wrong answers when multiple threads got involved. The problems had names like data races and deadlocks, but they felt more like ghosts in the machine—unpredictable and hard to catch. Then I started working with Rust. It wasn’t a magic fix, but it gave me a new set of tools and, more importantly, a new way of thinking. The language’s rules guide you toward writing code that can safely do many things at once. Here are some of the most effective ways I’ve learned to structure that code.
Think of data as a big loaf of bread. If you want several people to put different toppings on pieces of it, you wouldn’t give them all the same loaf to fight over. You’d slice it first. Data parallelism works the same way. You split your data into distinct, non-overlapping pieces and hand each piece to a different thread. Because each thread owns its slice, there’s no conflict. In Rust, methods like chunks or chunks_mut on slices make this straightforward. This is often the first and most effective pattern I try.
use std::thread;

fn parallel_sum(data: &[i32]) -> i32 {
    let num_threads = 4;
    // Ensure the chunk size is at least 1; `chunks(0)` would panic.
    let chunk_size = std::cmp::max(1, data.len() / num_threads);
    let mut handles = vec![];
    for chunk in data.chunks(chunk_size) {
        // Each thread gets its own owned copy of the chunk data.
        let chunk_vec = chunk.to_vec();
        handles.push(thread::spawn(move || {
            // This thread can safely work on `chunk_vec` without affecting others.
            let local_sum: i32 = chunk_vec.iter().sum();
            println!("Thread calculated sum: {}", local_sum);
            local_sum
        }));
    }
    // Collect results from all threads.
    let total: i32 = handles
        .into_iter()
        .map(|h| h.join().expect("Thread panicked"))
        .sum();
    total
}

fn main() {
    let numbers: Vec<i32> = (1..=100).collect(); // 1 to 100
    let sum = parallel_sum(&numbers);
    println!("The parallel sum is: {}", sum); // Should print 5050
}
Sometimes, making a full copy of each data chunk feels heavy, especially if the data is large. You might want threads to work on views into the original data. Plain `thread::spawn` has a limitation: anything the thread borrows must satisfy the `'static` lifetime, meaning it must remain valid for the rest of the program. That rules out borrowing temporary data from the caller's stack. This is where scoped threads come in, and the crossbeam crate provides an excellent implementation. Scoped threads let you promise the compiler that all child threads will finish before the scope ends, which makes it safe for them to borrow data from the parent's stack.
use crossbeam::thread; // Note: using crossbeam's scoped threads

fn find_large_numbers(data: &[i32], threshold: i32) -> Vec<i32> {
    let mut results = Vec::new();
    // The `scope` function creates a boundary. All threads spawned inside
    // are guaranteed to terminate before the closure ends.
    thread::scope(|s| {
        let num_chunks = 4;
        let chunk_size = std::cmp::max(1, data.len() / num_chunks);
        let mut handles = vec![];
        for chunk in data.chunks(chunk_size) {
            // We are spawning a thread that borrows `chunk`.
            // `chunk` itself is a borrow of `data`, which lives in the outer function.
            // This is only safe because of the scoped guarantee.
            handles.push(s.spawn(move |_| {
                chunk
                    .iter()
                    .filter(|&&x| x > threshold)
                    .copied()
                    .collect::<Vec<_>>()
            }));
        }
        // Collect results from the scoped threads.
        for handle in handles {
            results.extend(handle.join().unwrap());
        }
    })
    .unwrap(); // If a thread panicked, this will propagate the error.
    results
}

fn main() {
    let readings = vec![5, 23, 7, 48, 12, 56, 3, 71];
    let big_readings = find_large_numbers(&readings, 20);
    println!("Readings above 20: {:?}", big_readings); // [23, 48, 56, 71]
}
Not all data needs to be split. Sometimes, many threads need to read the same information, like a configuration map or a dataset that’s expensive to create. For this, we need shared ownership. Rust’s Arc<T>—which stands for Atomically Reference Counted—is the tool for the job. It’s like a smart pointer that keeps track of how many places are using it. When the count drops to zero, the data is cleaned up. Because the reference counting is atomic, it’s safe to clone the Arc and send those clones across threads. The key here is that the data inside is immutable. Many readers, no writers.
use std::sync::Arc;
use std::thread;

fn main() {
    // This vector of data might be large or costly to compute.
    let reference_data = Arc::new(vec!["Alpha", "Beta", "Gamma", "Delta"]);
    let mut thread_handles = vec![];
    for thread_id in 0..5 {
        // Clone the `Arc`. This does not clone the underlying data,
        // just increments the reference count.
        let data_for_thread = Arc::clone(&reference_data);
        thread_handles.push(thread::spawn(move || {
            // Each thread can read from the shared, immutable data.
            let item = &data_for_thread[thread_id % data_for_thread.len()];
            println!("Thread {} is processing item: {}", thread_id, item);
        }));
    }
    for handle in thread_handles {
        handle.join().unwrap();
    }
    // When the last `Arc` clone is dropped here, the vector is freed.
}
What happens when threads genuinely need to change shared data? This is where things traditionally get dangerous. Rust’s answer is the Mutex (short for mutual exclusion). You can think of a Mutex as a box that holds your data, but it has one key. A thread must ask for the key to open the box and look at or change the data. If another thread has the key, the requesting thread will wait. In Rust, the “key” is called a guard. The brilliance of Rust’s system is that the lock is automatically released when the guard goes out of scope. You can’t forget to unlock it.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    // Wrap the data we want to share and mutate in a Mutex.
    // Then wrap the Mutex in an Arc so multiple threads can own it.
    let shared_counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];
    for _ in 0..10 {
        let counter_ref = Arc::clone(&shared_counter);
        let handle = thread::spawn(move || {
            // Simulate some work before updating.
            thread::sleep(Duration::from_millis(10));
            {
                // `lock()` acquires the mutex. It returns a `MutexGuard`.
                // If another thread holds the lock, this thread will block here.
                let mut guard = counter_ref.lock().unwrap();
                // The guard dereferences to our inner data (`i32`).
                *guard += 1;
                // The lock is automatically released when `guard` is
                // dropped at the end of this block.
            }
            // Other work can happen here without holding the lock.
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    // Acquire the lock one last time to read the final value.
    let final_value = *shared_counter.lock().unwrap();
    println!("Final counter value: {}", final_value); // Should be 10
}
Channels provide a different model for communication: message passing. Instead of multiple threads touching the same data, one thread sends a message and another receives it. The simplest use is a one-time channel, often called a oneshot: perfect for sending a single result from a worker thread back to the main thread, or for signaling shutdown. Rust's standard library has no dedicated oneshot type, but its multi-producer, single-consumer (mpsc) channel handles the job just fine: send once, receive once.
use std::sync::mpsc; // Multi-producer, single-consumer
use std::thread;

fn expensive_task(input: i32) -> i32 {
    thread::sleep(std::time::Duration::from_millis(500)); // Simulate work
    input * 2
}

fn main() {
    // Create a channel. `tx` is the sending end (transmitter),
    // `rx` is the receiving end (receiver).
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = expensive_task(21);
        // Send the result down the channel. `tx` was moved into this
        // closure; `send` itself only borrows it, so a sender can send
        // many messages, and it can be cloned for more producers.
        tx.send(result).expect("Failed to send result");
        // The thread ends here.
    });
    // The main thread waits for the message to arrive.
    // `recv()` will block until a message is sent.
    let received = rx.recv().expect("Failed to receive or sender dropped");
    println!("The answer is: {}", received); // Prints "The answer is: 42"
}
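The "multi-producer" half of the name is worth a quick demonstration: cloning the sender lets several threads feed results into the same receiver. A minimal sketch (the function name is mine):

```rust
use std::sync::mpsc;
use std::thread;

// Several producers, one consumer: clone the sender for each thread,
// then drop the original so the channel closes when they all finish.
fn gather_from_producers(num_producers: i32) -> i32 {
    let (tx, rx) = mpsc::channel();
    for id in 0..num_producers {
        let tx = tx.clone(); // each producer gets its own sender
        thread::spawn(move || {
            tx.send(id * 10).expect("receiver hung up");
        });
    }
    // Drop the original sender; once every clone is also dropped,
    // the receiver's iterator below will end.
    drop(tx);
    // Iterate over messages until all senders are gone.
    rx.iter().sum()
}

fn main() {
    let total = gather_from_producers(4);
    println!("Sum of all messages: {}", total); // 0 + 10 + 20 + 30 = 60
}
```

The messages can arrive in any order, but the sum is deterministic, which makes this fan-in pattern easy to reason about.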
A single message is useful, but many systems involve a steady stream of tasks. This is where a multi-producer, multi-consumer work queue shines. You create a pool of worker threads at startup. A main thread (or multiple producer threads) sends units of work into a channel. Worker threads sit in a loop, taking tasks from the channel and executing them. This pattern elegantly balances load across available CPU cores. One caveat: the standard library's channel is strictly single-consumer, and its receiving end cannot be cloned. The crossbeam crate's channel supports multiple consumers, so that is what this example uses.
use crossbeam::channel; // std's Receiver can't be cloned; crossbeam's can
use std::thread;

fn main() {
    let (tx, rx) = channel::unbounded();
    let num_workers = 4;
    let mut workers = vec![];
    // Create worker threads.
    for worker_id in 0..num_workers {
        // Each worker gets its own clone of the receiving end. All clones
        // pull from the same underlying queue of tasks.
        let worker_rx = rx.clone();
        workers.push(thread::spawn(move || {
            // The worker listens for tasks until the channel is closed.
            for task in worker_rx {
                println!("Worker {} processing task: {}", worker_id, task);
                thread::sleep(std::time::Duration::from_millis(100)); // Simulate work
            }
            println!("Worker {} shutting down.", worker_id);
        }));
    }
    // Send some tasks from the main thread.
    for task_number in 0..15 {
        tx.send(task_number).unwrap();
    }
    // Important: We must drop the original `tx` to close the channel.
    // If we don't, the worker threads will wait forever for more messages.
    drop(tx);
    // Wait for all workers to finish processing their tasks and exit.
    for worker in workers {
        worker.join().unwrap();
    }
    println!("All work completed.");
}
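For completeness, the same pool can be built with only the standard library. Since std's receiver is single-consumer, the usual workaround is to share it behind a `Mutex`, with each worker locking it just long enough to pull one task. A sketch, with function names of my own:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A std-only worker pool: the single mpsc receiver is shared behind a
// Mutex, and each worker locks it only to pull the next task.
fn run_pool(num_workers: usize, tasks: Vec<i32>) -> i32 {
    let (tx, rx) = mpsc::channel();
    let rx = Arc::new(Mutex::new(rx));
    let (result_tx, result_rx) = mpsc::channel();

    let mut workers = vec![];
    for _ in 0..num_workers {
        let rx = Arc::clone(&rx);
        let result_tx = result_tx.clone();
        workers.push(thread::spawn(move || loop {
            // The guard is a temporary here, so the lock is released as
            // soon as a task is pulled, before the work happens.
            let task = match rx.lock().unwrap().recv() {
                Ok(t) => t,
                Err(_) => break, // channel closed: no more tasks
            };
            result_tx.send(task * 2).unwrap();
        }));
    }

    for t in tasks {
        tx.send(t).unwrap();
    }
    drop(tx);        // close the task channel so the workers exit
    drop(result_tx); // drop our clone so the result channel closes too

    for w in workers {
        w.join().unwrap();
    }
    result_rx.iter().sum()
}

fn main() {
    let total = run_pool(3, vec![1, 2, 3, 4, 5]);
    println!("Doubled sum: {}", total); // 2 + 4 + 6 + 8 + 10 = 30
}
```

This works, but the shared lock serializes task pickup, which is exactly the kind of contention crossbeam's channel avoids.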
Mutexes are powerful, but they can become a bottleneck. If many threads are constantly trying to lock the same data, they spend a lot of time waiting. For high-contention scenarios, lock-free data structures can offer better performance. These are complex to build correctly, but libraries like crossbeam provide safe, battle-tested implementations. They allow multiple threads to push and pop items concurrently without ever blocking each other.
use crossbeam::queue::ArrayQueue;
use std::sync::Arc;
use std::thread;

fn main() {
    // Create a fixed-capacity, lock-free queue.
    let queue = Arc::new(ArrayQueue::new(10)); // Holds up to 10 items.
    let mut handles = vec![];

    // Producer thread
    let producer_queue = Arc::clone(&queue);
    let producer = thread::spawn(move || {
        for i in 0..10 {
            // `push` will fail if the queue is full.
            while producer_queue.push(i).is_err() {
                // In a real system, you might wait or do other work.
                thread::yield_now();
            }
            println!("Produced: {}", i);
        }
        println!("Producer done.");
    });
    handles.push(producer);

    // Consumer thread
    let consumer_queue = Arc::clone(&queue);
    let consumer = thread::spawn(move || {
        let mut consumed = 0;
        while consumed < 10 {
            // `pop` returns `None` if the queue is empty.
            if let Some(val) = consumer_queue.pop() {
                println!("Consumed: {}", val);
                consumed += 1;
            } else {
                thread::yield_now(); // Briefly yield the CPU if queue is empty.
            }
        }
        println!("Consumer done.");
    });
    handles.push(consumer);

    for handle in handles {
        handle.join().unwrap();
    }
    println!("Queue test finished.");
}
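Before reaching for a lock-free library, it's also worth remembering that the simplest lock-free tools ship with the standard library: the atomic integer types. The Mutex-guarded counter from earlier can be rewritten so that threads never block at all (the function name here is my own):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// A lock-free counter: `fetch_add` is a single atomic hardware
// operation, so no thread ever waits on a lock or holds a guard.
fn count_in_parallel(num_threads: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let mut handles = vec![];
    for _ in 0..num_threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments {
                // Relaxed ordering is enough for a plain counter; the
                // `join` below synchronizes before we read the total.
                counter.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    let total = count_in_parallel(8, 1000);
    println!("Final count: {}", total); // 8 * 1000 = 8000
}
```

Atomics only cover single integers and pointers; for whole collections under contention, structures like crossbeam's queues are the next step up.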
Writing all this thread spawning and channel management can be repetitive. For many common operations on collections, the rayon crate is a fantastic choice. It provides parallel iterators. You can often change a regular .iter() to .par_iter() and suddenly your loop runs across multiple cores. Under the hood, Rayon manages a thread pool and splits the work for you. It’s a high-level pattern that lets you focus on what to compute, not how to split it up.
use rayon::prelude::*; // Import the parallel iterator traits.

fn is_prime(n: u32) -> bool {
    if n <= 1 {
        return false;
    }
    let limit = (n as f64).sqrt() as u32;
    for i in 2..=limit {
        if n % i == 0 {
            return false;
        }
    }
    true
}

fn main() {
    // A large range of numbers to check.
    let range = 1..=100_000u32;
    // Sequential version (for comparison):
    // let prime_count_seq = range.clone().filter(|&n| is_prime(n)).count();

    // Parallel version: Rayon splits the range and processes the pieces
    // on its internal thread pool.
    let prime_count_par = range
        .into_par_iter() // This turns it into a parallel iterator.
        .filter(|&n| is_prime(n))
        .count();
    println!("Number of primes found: {}", prime_count_par);

    // You can also collect results into a vector in parallel.
    // (There are 25 primes up to 100, so slicing the first 10 is safe.)
    let small_primes: Vec<u32> = (1..=100)
        .into_par_iter()
        .filter(|&n| is_prime(n))
        .collect(); // `collect()` works with parallel iterators too!
    println!("First few primes: {:?}", &small_primes[..10]);
}
These patterns form a practical toolkit. When I approach a problem now, I have a mental checklist. Can I split the data? If yes, that’s often the fastest path. Do threads just need to read? Arc is my friend. Do they need to write? A Mutex or a channel is likely the answer. Is it a stream of tasks? I set up a worker pool. Am I iterating over a collection? I reach for Rayon first to see if it simplifies things. The goal is never to use the most complex pattern, but to use the simplest one that makes the compiler happy and gets the job done safely. Rust’s strict compiler is a partner in this process. It will stop me if I try to share data in an unsafe way, forcing me to pick an explicit, safe pattern. This initial friction saves immense time later that would have been spent debugging unpredictable concurrency errors. These patterns are the bridges that help you cross from single-threaded thought into parallel execution, with the guardrails firmly in place.