Rails Database Sharding: Production Patterns for Horizontal Scaling and High-Performance Applications
Learn how to implement database sharding in Rails applications for horizontal scaling. Complete guide with shard selection, connection management, and migration strategies.
Scaling Rails applications requires thoughtful approaches to database management. When datasets grow beyond a single database’s capacity, horizontal partitioning becomes essential. I’ve implemented sharding in multiple production systems, learning valuable lessons about maintaining performance and reliability.
Shard selection strategies form the foundation. A well-designed mapping system directs queries to appropriate databases. I typically use range-based partitioning for predictable growth patterns.
# Range-based shard mapping
class ShardNotFoundError < StandardError; end

class TenantShardResolver
  SHARD_CONFIG = {
    'europe_customers'   => (1..500_000),
    'asia_customers'     => (500_001..1_000_000),
    'americas_customers' => (1_000_001..1_500_000)
  }.freeze

  def self.resolve(tenant_id)
    SHARD_CONFIG.each do |shard, range|
      return shard if range.cover?(tenant_id)
    end
    raise ShardNotFoundError, "No shard configured for tenant #{tenant_id}"
  end
end
# Hash-based alternative for uniform distribution
require 'zlib'

class HashedShardResolver
  SHARDS = %w[shard_a shard_b shard_c shard_d].freeze

  def self.resolve(user_id)
    # Object#hash is seeded per process; use a stable digest so every
    # process routes the same user to the same shard
    SHARDS[Zlib.crc32(user_id.to_s) % SHARDS.size]
  end
end
Connection management handles routing between database instances. ActiveRecord’s connection switching works well when properly isolated. I wrap critical sections with connection guards to prevent leakage.
module ShardSwitcher
  # connected_to restores the previous shard automatically when the
  # block exits, so no manual ensure/restore is required
  def self.with_shard(shard_name, role: :writing, &block)
    ActiveRecord::Base.connected_to(shard: shard_name.to_sym, role: role, &block)
  end
end
# Usage in controller
class UsersController < ApplicationController
  def show
    # Params arrive as strings; the resolver's ranges expect integers
    shard = TenantShardResolver.resolve(params[:tenant_id].to_i)
    ShardSwitcher.with_shard(shard) do
      @user = User.find(params[:id])
    end
  end
end
Migration patterns require special attention. I implement phased data transfers using background workers with idempotent operations. This snippet shows a safe migration approach.
class ShardMigrationWorker
  include Sidekiq::Worker

  def perform(user_id, source_shard, target_shard)
    # Fetch from source under a row lock to prevent changes during the copy
    source_data = ShardSwitcher.with_shard(source_shard) do
      User.transaction do
        user = User.lock.find(user_id) # lock is held for this transaction
        { user: user.attributes, profile: user.profile&.attributes }
      end
    end

    # Insert into target atomically
    new_id = ShardSwitcher.with_shard(target_shard) do
      User.transaction do
        new_user = User.create!(source_data[:user].except('id'))
        new_user.create_profile!(source_data[:profile].except('id')) if source_data[:profile]
        new_user.id
      end
    end

    # Delete from source only after verifying the copy exists on the target
    ShardSwitcher.with_shard(target_shard) { User.find(new_id) }
    ShardSwitcher.with_shard(source_shard) { User.find(user_id).destroy }
  rescue StandardError => e
    notify_operations(e, user_id)
    raise # re-raise so Sidekiq retries
  end
end
Shard key design significantly impacts performance. I avoid using timestamps or frequently updated columns as shard keys. Instead, I choose stable identifiers with uniform distribution characteristics. Composite keys sometimes help when multiple dimensions require partitioning.
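Where one identifier is not enough, a composite key can layer the partitioning dimensions. The sketch below resolves first by region, then by a stable tenant range within the region; the region names, ranges, and shard names here are illustrative, not taken from the configuration above.

```ruby
# Hypothetical composite-key resolver: partition by region first, then by a
# stable numeric tenant range within that region. All names and ranges are
# illustrative examples.
class CompositeShardResolver
  REGION_SHARDS = {
    'eu' => { (1..500_000) => 'eu_shard_1', (500_001..1_000_000) => 'eu_shard_2' },
    'us' => { (1..500_000) => 'us_shard_1', (500_001..1_000_000) => 'us_shard_2' }
  }.freeze

  def self.resolve(region, tenant_id)
    ranges = REGION_SHARDS.fetch(region) { raise ArgumentError, "Unknown region #{region}" }
    ranges.each { |range, shard| return shard if range.cover?(tenant_id) }
    raise ArgumentError, "No shard for tenant #{tenant_id} in #{region}"
  end
end
```

Because the region prefix is stable and the range lookup mirrors the single-key resolver, the same rebalancing and migration tooling applies per region.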
Connection pooling configuration prevents resource exhaustion. Each shard needs dedicated settings in database.yml. I adjust pool size based on worker thread counts.
production:
  primary:
    database: main_app
    pool: 25
  europe_customers:
    database: eu_db
    pool: 15
    migrations_paths: db/eu_migrate
  asia_customers:
    database: asia_db
    pool: 15
    migrations_paths: db/asia_migrate
Global sequence generation avoids ID collisions. I use PostgreSQL’s sequence objects with different starting offsets.
-- For shard 1 (INCREMENT BY equals the maximum number of shards)
CREATE SEQUENCE global_user_id_seq START WITH 1 INCREMENT BY 100;
-- For shard 2
CREATE SEQUENCE global_user_id_seq START WITH 2 INCREMENT BY 100;
Query federation combines results from multiple shards. I implement parallel query patterns using concurrent Ruby.
# Requires the 'parallel' gem; in_threads keeps each query on its own
# checked-out connection instead of forking processes
def federated_query(user_ids)
  shard_groups = user_ids.group_by { |id| TenantShardResolver.resolve(id) }
  results = Parallel.map(shard_groups, in_threads: shard_groups.size) do |shard, ids|
    ShardSwitcher.with_shard(shard) do
      User.where(id: ids).to_a
    end
  end
  results.flatten
end
Background migration tools handle large data movements. I schedule migrations during low-traffic periods using cron-based triggers. The Lhm gem (Large Hadron Migrator) proves valuable for online schema changes.
Lhm.change_table :users, atomic_switch: true do |m|
  m.add_column :new_tier, "VARCHAR(32) DEFAULT NULL" # Lhm takes SQL type definitions
  m.add_index [:new_tier]
end
Health monitoring tracks critical metrics. I instrument shard performance with Datadog integration.
statsd = Datadog::Statsd.new
statsd.time('shard.query_time', tags: ["shard:#{shard}"]) do
  ShardSwitcher.with_shard(shard) { User.find(id) }
end
Operational considerations include backup strategies and connection failover. I implement staggered backups using Percona XtraBackup with shard-specific schedules. Connection failover uses HAProxy with health checks.
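The health checks feeding that failover can also be driven from Ruby. Below is a minimal sketch of a shard health probe; `ShardHealthProbe` and its timeout are illustrative assumptions, and the block passed to `healthy?` stands in for a real `SELECT 1` round trip against a shard.

```ruby
require 'timeout'

# Hypothetical health probe: a shard is healthy if a caller-supplied check
# (e.g. a SELECT 1 against that shard) completes without error inside the
# timeout. The class name and default timeout are illustrative.
class ShardHealthProbe
  def initialize(timeout_seconds: 2)
    @timeout = timeout_seconds
  end

  def healthy?(&check)
    Timeout.timeout(@timeout, &check)
    true
  rescue StandardError # includes Timeout::Error
    false
  end
end
```

A cron job or load-balancer agent could run one probe per shard and mark shards that fail consecutive checks out of rotation.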
Transactional integrity requires special patterns. A single ActiveRecord transaction cannot span connections to two shards, so for critical cross-shard operations I use compensating (saga-style) transactions rather than one wrapping transaction.
def transfer_funds(sender, receiver, amount)
  sender_shard = TenantShardResolver.resolve(sender.tenant_id)
  receiver_shard = TenantShardResolver.resolve(receiver.tenant_id)

  ShardSwitcher.with_shard(sender_shard) do
    Account.transaction { sender.withdraw(amount) }
  end

  begin
    ShardSwitcher.with_shard(receiver_shard) do
      Account.transaction { receiver.deposit(amount) }
    end
  rescue StandardError
    # Compensate: refund the sender if the deposit fails
    ShardSwitcher.with_shard(sender_shard) do
      Account.transaction { sender.deposit(amount) }
    end
    raise
  end
end
Testing sharded systems demands environment parity. I use Docker Compose to replicate shard topologies in CI pipelines. FactoryBot sequences account for shard-specific ID ranges.
FactoryBot.define do
  sequence :sharded_id do |n|
    ShardResolver.current.start_range + n
  end

  factory :user do
    id { generate(:sharded_id) }
  end
end
Performance optimization includes proper indexing strategies. I create shard-local indexes rather than assuming global uniqueness. Query analysis uses pg_stat_statements filtered by shard.
SELECT * FROM pg_stat_statements
WHERE query LIKE '% FROM users%'
AND dbid = (SELECT oid FROM pg_database WHERE datname = 'europe_customers');
Error handling must account for shard-specific failures. I implement circuit breakers to isolate problematic shards.
class CircuitOpenError < StandardError; end

class ShardCircuit
  def self.with_circuit(shard)
    # Fail fast while the breaker is open instead of hammering a sick shard
    raise CircuitOpenError, "Circuit open for #{shard}" if circuit_open?(shard)

    result = yield
    mark_success(shard)
    result
  rescue ActiveRecord::StatementInvalid, ActiveRecord::ConnectionNotEstablished
    mark_failure(shard)
    raise
  end
end
Rollout strategies start with read-only sharding. I first redirect reporting queries to replica shards before implementing full read-write partitioning. Feature flags control gradual activation.
if FeatureFlag.enabled?(:sharded_writes, user)
  ShardSwitcher.with_shard(resolved_shard) { operation }
else
  operation # Legacy path
end
Schema management requires coordination across shards. I use Rails multi-database migrations with version tracking per shard.
RAILS_ENV=production bin/rails db:migrate:europe_customers
RAILS_ENV=production bin/rails db:migrate:asia_customers
Connection warming prevents cold-start latency. I add initializers that establish connections during application boot.
Rails.application.config.after_initialize do
  TenantShardResolver::SHARD_CONFIG.each_key do |shard|
    ActiveRecord::Base.connected_to(shard: shard.to_sym) do
      ActiveRecord::Base.connection.execute('SELECT 1') # touch the pool
    end
  end
end
Resource cleanup prevents memory bloat. I implement connection recyclers in long-running processes.
# Clearing on every ActiveRecord notification would thrash the pool;
# release idle connections once per completed unit of work instead
Rails.application.executor.to_complete do
  ActiveRecord::Base.connection_handler.clear_active_connections!
end
These techniques evolved through practical application in high-traffic environments. Sharding introduces complexity but enables horizontal scaling that single-database solutions cannot match. Proper implementation balances immediate performance needs with long-term operational sustainability.