Ruby applications often require handling computationally heavy tasks. I’ve faced scenarios where complex calculations slowed down entire systems. This article shares five practical techniques I use for parallelizing CPU-bound operations in Ruby. Each approach helps maximize processor utilization while respecting the language’s runtime characteristics.
Thread pooling efficiently manages worker allocation. I implement pools to control concurrency levels and avoid thread creation overhead. The pool maintains a queue of tasks and reusable worker threads. Keep in mind that under CRuby's Global VM Lock (GVL), only one thread executes Ruby code at a time, so this pattern shines for mixed workloads with I/O components; on JRuby or TruffleRuby the same pool achieves true CPU parallelism.
class ThreadPool
  def initialize(size: 4)
    @size = size
    @tasks = Queue.new                # thread-safe FIFO of pending tasks
    @pool = Array.new(size) do
      Thread.new do
        catch(:exit) do               # a worker exits when a poison-pill task throws :exit
          loop { @tasks.pop.call }
        end
      end
    end
  end

  def schedule(&task)
    @tasks << task
  end

  def shutdown
    @size.times { schedule { throw :exit } }  # one poison pill per worker
    @pool.each(&:join)
  end
end
# Usage
pool = ThreadPool.new(size: 8)
100.times do |i|
  pool.schedule do
    Fibonacci.calculate(30 + i) # CPU-intensive
  end
end
pool.shutdown
Process forking creates independent memory spaces. I fork child processes when I need true parallelism: the parent distributes work while children handle computation, which bypasses the Global VM Lock entirely. Note that fork is only available on POSIX platforms, not on Windows or JRuby.
def parallel_map(items, &block)
  read_pipes = []
  items.each do |item|
    read, write = IO.pipe
    read_pipes << read
    fork do
      read.close                          # child only writes
      Marshal.dump(block.call(item), write)
      write.close
      exit!(0)                            # skip at_exit hooks in the child
    end
    write.close                           # parent only reads
  end
  read_pipes.map { |pipe| Marshal.load(pipe.read) }
ensure
  read_pipes.each(&:close) if read_pipes
  Process.waitall                         # reap children to avoid zombies
end
# Execute
matrix_inverses = parallel_map(large_matrices) do |matrix|
  matrix.inverse # Computation-heavy
end
Ractors provide memory isolation without full process overhead. I use them for thread-safe parallel execution. Each Ractor maintains independent state and communicates through message passing (send/receive and yield/take) rather than shared memory. Note that Ractors are still marked experimental in current Ruby releases.
def calculate_aggregates(datasets)
  ractors = datasets.map do |ds|
    Ractor.new(ds) do |dataset|   # ds is deep-copied into the Ractor unless shareable
      {
        mean: dataset.mean,       # assumes dataset objects respond to these methods
        std_dev: dataset.standard_deviation
      }
    end
  end
  ractors.map(&:take)             # block until each Ractor yields its result
end
# Processing
stats = calculate_aggregates(partitioned_data)
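When datasets outnumber cores, spawning one Ractor per dataset adds needless overhead. A fixed-size worker pool fed through a pipe Ractor keeps the count bounded. This is a minimal sketch following the pattern in Ruby's Ractor documentation; the squaring is a stand-in for real CPU work:

pipe = Ractor.new do
  loop { Ractor.yield(Ractor.receive) }       # fan items out to idle workers
end

workers = 4.times.map do
  Ractor.new(pipe) do |source|
    loop { Ractor.yield(source.take ** 2) }   # placeholder for real computation
  end
end

items = (1..100).to_a
items.each { |item| pipe.send(item) }
results = items.size.times.map { Ractor.select(*workers).last }
# Note: results arrive in completion order, not submission order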
Work stealing dynamically balances load. I give each worker its own queue and let idle workers take tasks from busy ones. This self-adjusting pattern prevents workers from sitting idle while others are backlogged.
class WorkStealingPool
  def initialize(worker_count: 4)
    @worker_queues = Array.new(worker_count) { Queue.new }
    @cursor = 0
    @lock = Mutex.new
    @workers = worker_count.times.map do |i|
      Thread.new do
        loop do
          task = begin
            @worker_queues[i].pop(true)   # own queue first, non-blocking
          rescue ThreadError
            steal_work(i)                 # own queue empty; try stealing
          end
          task ? task.call : sleep(0.01)  # back off briefly when no work anywhere
        end
      end
    end
  end

  # Distribute tasks round-robin so every worker queue receives work to steal
  def schedule(&task)
    index = @lock.synchronize { @cursor = (@cursor + 1) % @worker_queues.size }
    @worker_queues[index] << task
  end

  private

  # Idle workers scan the other queues and take a pending task
  def steal_work(worker_id)
    @worker_queues.each_with_index do |queue, i|
      next if i == worker_id
      begin
        return queue.pop(true)
      rescue ThreadError
        # queue emptied between check and pop; try the next one
      end
    end
    nil
  end
end
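A brief usage sketch in the same style as the earlier pools; Fibonacci.calculate is the same illustrative placeholder, and the sleep stands in for the shutdown logic a production pool would expose:

# Usage
pool = WorkStealingPool.new(worker_count: 8)
50.times do |i|
  pool.schedule { Fibonacci.calculate(25 + (i % 10)) } # irregular durations
end
sleep 5 # crude wait; a real pool would add a shutdown method like ThreadPool's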
Lock-free structures reduce synchronization costs. I use atomic operations to manage shared state without mutexes, which minimizes blocking during concurrent access. The old atomic gem is no longer maintained and its functionality lives on in concurrent-ruby, so I use Concurrent::AtomicFixnum for counters.
require 'concurrent'   # maintained successor to the deprecated atomic gem

class LockFreeCounter
  def initialize
    @value = Concurrent::AtomicFixnum.new(0)  # compare-and-swap based, no mutex
  end

  def increment
    @value.increment
  end

  def decrement
    @value.decrement
  end

  def value
    @value.value
  end
end
# Usage in concurrent processing
counter = LockFreeCounter.new
threads = 10.times.map do
  Thread.new { 1000.times { counter.increment } }
end
threads.each(&:join)
puts counter.value # Correctly outputs 10000
These techniques significantly improve throughput for numerical computation, image processing, and statistical analysis. I choose thread pooling for mixed workloads, process forking for maximum isolation, Ractors for memory safety, work stealing for dynamic balancing, and lock-free structures for high-contention scenarios. Each method offers distinct advantages depending on specific performance requirements and operational constraints.
In my benchmarks, process forking typically provides the highest throughput for pure CPU tasks. Ractors offer promising performance with lower memory overhead. Thread pooling delivers excellent results for workloads with intermittent I/O. Work stealing maintains efficiency when task durations are irregular. Lock-free approaches minimize latency under high contention.
I combine these patterns based on workload characteristics. For matrix operations, process forking often works best. For data pipeline processing, thread pools with work stealing provide flexibility. Statistical simulations benefit from Ractor isolation. The key is measuring actual performance rather than assuming theoretical advantages.
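A minimal harness with Ruby's standard benchmark library makes that measurement concrete. This sketch assumes a fork-capable platform and the parallel_map helper from earlier; heavy_task is a hypothetical placeholder for your actual workload:

require 'benchmark'

def heavy_task(n)
  (1..n).reduce(:*)   # placeholder CPU work: factorial
end

inputs = Array.new(8) { 20_000 }

Benchmark.bm(10) do |bm|
  bm.report('serial') { inputs.each { |n| heavy_task(n) } }
  bm.report('forked') { parallel_map(inputs) { |n| heavy_task(n) } }
end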
These approaches help Ruby applications make full use of modern multi-core processors while preserving the language's developer-friendly character. Careful, measurement-driven implementation can yield order-of-magnitude improvements for computation-heavy workloads.