Building robust API rate limiting systems has become essential for modern Ruby on Rails applications. After implementing rate limiting across numerous production APIs, I’ve discovered that effective rate limiting requires more than basic request counting. It demands sophisticated patterns that balance user experience with system protection.
Rate limiting serves multiple purposes in API design. It prevents abuse, ensures fair resource allocation, and maintains service availability during traffic spikes. The challenge lies in implementing these protections without hampering legitimate usage or creating unnecessary complexity.
Token Bucket Algorithm for Smooth Rate Control
The token bucket algorithm provides elegant rate limiting by allowing burst traffic while maintaining average rate control. I prefer this approach for APIs that experience irregular traffic patterns.
class TokenBucketRateLimiter
  def initialize(capacity: 100, refill_rate: 10, redis: Redis.current)
    @capacity = capacity
    @refill_rate = refill_rate
    @redis = redis
  end

  def allow_request?(client_id)
    bucket_key = "token_bucket:#{client_id}"
    now = Time.current.to_f

    # Get current bucket state
    bucket_data = @redis.hmget(bucket_key, 'tokens', 'last_refill')
    current_tokens = bucket_data[0].to_f
    last_refill = bucket_data[1].to_f

    # Calculate tokens to add based on time elapsed
    if last_refill > 0
      time_elapsed = now - last_refill
      tokens_to_add = time_elapsed * @refill_rate
      current_tokens = [@capacity, current_tokens + tokens_to_add].min
    else
      current_tokens = @capacity
    end

    # Check if request can be allowed. Note: this read-modify-write cycle is
    # not atomic across processes; under heavy concurrency, move it into a
    # Lua script so two workers cannot both spend the same token.
    if current_tokens >= 1
      new_token_count = current_tokens - 1
      @redis.multi do |pipeline|
        pipeline.hmset(bucket_key, 'tokens', new_token_count, 'last_refill', now)
        pipeline.expire(bucket_key, 3600)
      end
      set_bucket_headers(new_token_count)
      true
    else
      set_bucket_headers(current_tokens)
      false
    end
  end

  private

  def set_bucket_headers(tokens_remaining)
    Thread.current[:rate_limit_headers] = {
      'X-RateLimit-Bucket-Capacity' => @capacity.to_s,
      'X-RateLimit-Bucket-Tokens' => tokens_remaining.round(2).to_s,
      'X-RateLimit-Bucket-RefillRate' => @refill_rate.to_s
    }
  end
end
This implementation maintains smooth traffic flow by letting clients accumulate tokens during quiet periods and spend them during burst activity. The refill rate keeps sustained usage bounded while accommodating legitimate traffic spikes.
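The burst-then-refill behavior is easiest to see without Redis in the way. Here is a minimal in-memory sketch of the same math (the class name, the injectable clock, and the parameters are mine, chosen for illustration, not part of the limiter above):

```ruby
# Minimal in-memory token bucket: same refill arithmetic as the Redis
# version, with an injectable clock so behavior is deterministic.
class InMemoryTokenBucket
  def initialize(capacity:, refill_rate:, clock: -> { Time.now.to_f })
    @capacity = capacity.to_f
    @refill_rate = refill_rate.to_f
    @clock = clock
    @tokens = @capacity
    @last_refill = @clock.call
  end

  def allow?
    now = @clock.call
    # Refill proportionally to elapsed time, capped at capacity
    @tokens = [@capacity, @tokens + (now - @last_refill) * @refill_rate].min
    @last_refill = now
    return false if @tokens < 1
    @tokens -= 1
    true
  end
end

# A fake clock lets us fast-forward time deterministically
t = 0.0
bucket = InMemoryTokenBucket.new(capacity: 5, refill_rate: 1, clock: -> { t })

burst = (1..6).map { bucket.allow? }       # 5 allowed at once, the 6th rejected
t += 2.0                                   # two seconds pass: ~2 tokens refill
after_wait = (1..3).map { bucket.allow? }  # 2 allowed, the 3rd rejected
```

The full burst succeeds immediately, yet sustained throughput converges to the refill rate, which is exactly the trade-off the token bucket buys you.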
Sliding Window Rate Limiting for Precise Control
Sliding window rate limiting offers more precise control than fixed windows by considering the exact timing of requests. This pattern prevents clients from gaming the system by clustering requests at window boundaries.
class SlidingWindowRateLimiter
  def initialize(limit: 1000, window_seconds: 3600, redis: Redis.current)
    @limit = limit
    @window_seconds = window_seconds
    @redis = redis
  end

  def within_limit?(client_id, endpoint = 'default')
    key = sliding_window_key(client_id, endpoint)
    now = Time.current.to_f
    window_start = now - @window_seconds

    # Remove expired entries and count current requests
    results = @redis.multi do |pipeline|
      pipeline.zremrangebyscore(key, 0, window_start)
      pipeline.zcard(key)
      pipeline.expire(key, @window_seconds * 2)
    end
    current_count = results[1].to_i

    if current_count < @limit
      # Add current request to the window. The count-then-add step is not
      # atomic, so concurrent requests can briefly overshoot the limit;
      # fold both steps into a Lua script if strict enforcement matters.
      @redis.zadd(key, now, "#{now}:#{SecureRandom.uuid}")
      set_sliding_window_headers(current_count + 1, window_start)
      true
    else
      set_sliding_window_headers(current_count, window_start)
      false
    end
  end

  def get_window_stats(client_id, endpoint = 'default')
    key = sliding_window_key(client_id, endpoint)
    now = Time.current.to_f
    window_start = now - @window_seconds

    # Clean expired entries first
    @redis.zremrangebyscore(key, 0, window_start)

    {
      current_requests: @redis.zcard(key),
      window_start: Time.at(window_start),
      window_end: Time.at(now),
      oldest_request: get_oldest_request_time(key),
      request_distribution: get_request_distribution(key, window_start, now)
    }
  end

  private

  def sliding_window_key(client_id, endpoint)
    "sliding_window:#{client_id}:#{endpoint.gsub('/', ':')}"
  end

  def set_sliding_window_headers(current_count, window_start)
    Thread.current[:rate_limit_headers] = {
      'X-RateLimit-Limit' => @limit.to_s,
      'X-RateLimit-Remaining' => [@limit - current_count, 0].max.to_s,
      'X-RateLimit-Window' => @window_seconds.to_s,
      'X-RateLimit-WindowStart' => window_start.to_s
    }
  end

  def get_oldest_request_time(key)
    oldest = @redis.zrange(key, 0, 0, with_scores: true).first
    oldest ? Time.at(oldest[1]) : nil
  end

  def get_request_distribution(key, window_start, window_end)
    requests = @redis.zrangebyscore(key, window_start, window_end, with_scores: true)

    # Group requests by minute for distribution analysis
    distribution = Hash.new(0)
    requests.each do |_, timestamp|
      minute_bucket = (timestamp.to_i / 60) * 60
      distribution[Time.at(minute_bucket)] += 1
    end
    distribution
  end
end
The sliding window approach provides granular control over request patterns and helps identify traffic anomalies. I find this particularly useful for APIs that need to enforce strict quotas or detect potential abuse patterns.
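The boundary-gaming protection is the key difference from fixed windows, and it is easy to demonstrate with a stripped-down in-memory version (an illustrative sketch mirroring the ZSET logic, not code the limiter above uses):

```ruby
# In-memory sliding window: store request timestamps, count only
# those inside the trailing window, just like the Redis ZSET version.
class InMemorySlidingWindow
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window = window_seconds
    @timestamps = []
  end

  def allow?(now)
    @timestamps.reject! { |ts| ts <= now - @window }  # drop expired entries
    return false if @timestamps.size >= @limit
    @timestamps << now
    true
  end
end

limiter = InMemorySlidingWindow.new(limit: 3, window_seconds: 60)

# Three requests at t=50..52 fill the window
[50, 51, 52].each { |ts| limiter.allow?(ts) }

# A fixed window that resets at t=60 would permit a fresh burst here,
# but the sliding window still counts the three recent requests:
blocked = limiter.allow?(61)    # => false
allowed = limiter.allow?(111)   # => true, the t=50 entry has expired
```

With a fixed hourly window, a client could send its full quota at 59 minutes past and again at 1 minute past, doubling its effective rate at the boundary; the trailing window makes that impossible.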
Distributed Rate Limiting with Redis
When running multiple application instances, distributed rate limiting becomes crucial. Redis provides the shared state necessary for consistent rate limiting across all instances.
class DistributedRateLimiter
  def initialize(redis: Redis.current, node_id: nil)
    @redis = redis
    @node_id = node_id || Socket.gethostname
    @lua_script = load_rate_limit_script
  end

  def check_distributed_limit(client_id, limit: 100, window: 60, endpoint: 'api')
    key_prefix = "distributed_rate_limit:#{endpoint}:#{client_id}"

    # Use Lua script for atomic operations
    result = @redis.eval(@lua_script, keys: [
      "#{key_prefix}:count",
      "#{key_prefix}:nodes",
      "#{key_prefix}:reset_time"
    ], argv: [
      limit,
      window,
      Time.current.to_i,
      @node_id
    ])

    current_count, reset_time, allowed = result
    set_distributed_headers(current_count.to_i, limit, reset_time.to_i)

    if allowed == 1
      record_node_activity(client_id, endpoint)
      true
    else
      record_rate_limit_breach(client_id, endpoint, current_count)
      false
    end
  end

  def get_cluster_stats(client_id, endpoint = 'api')
    key_prefix = "distributed_rate_limit:#{endpoint}:#{client_id}"
    nodes_data = @redis.hgetall("#{key_prefix}:nodes")
    total_requests = @redis.get("#{key_prefix}:count").to_i

    {
      total_requests: total_requests,
      active_nodes: nodes_data.keys.size,
      node_distribution: nodes_data.transform_values(&:to_i),
      cluster_health: calculate_cluster_health(nodes_data)
    }
  end

  private

  def load_rate_limit_script
    <<~LUA
      local count_key = KEYS[1]
      local nodes_key = KEYS[2]
      local reset_key = KEYS[3]
      local limit = tonumber(ARGV[1])
      local window = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])
      local node_id = ARGV[4]

      -- Get or set reset time
      local reset_time = redis.call('GET', reset_key)
      if not reset_time then
        reset_time = now + window
        redis.call('SET', reset_key, reset_time, 'EX', window * 2)
      else
        reset_time = tonumber(reset_time)
      end

      -- Check if window has expired
      if now >= reset_time then
        redis.call('DEL', count_key, nodes_key)
        reset_time = now + window
        redis.call('SET', reset_key, reset_time, 'EX', window * 2)
      end

      -- Get current count
      local current_count = tonumber(redis.call('GET', count_key) or 0)

      -- Check if limit exceeded
      if current_count >= limit then
        return {current_count, reset_time, 0}
      end

      -- Increment counters
      redis.call('INCR', count_key)
      redis.call('HINCRBY', nodes_key, node_id, 1)
      redis.call('EXPIRE', count_key, window * 2)
      redis.call('EXPIRE', nodes_key, window * 2)

      return {current_count + 1, reset_time, 1}
    LUA
  end

  def set_distributed_headers(current, limit, reset_time)
    Thread.current[:rate_limit_headers] = {
      'X-RateLimit-Limit' => limit.to_s,
      'X-RateLimit-Remaining' => [limit - current, 0].max.to_s,
      'X-RateLimit-Reset' => reset_time.to_s,
      'X-RateLimit-Node' => @node_id
    }
  end

  def record_node_activity(client_id, endpoint)
    activity_key = "node_activity:#{@node_id}:#{Date.current}"
    @redis.hincrby(activity_key, "#{endpoint}:#{client_id}", 1)
    @redis.expire(activity_key, 86400 * 7) # Keep for a week
  end

  def record_rate_limit_breach(client_id, endpoint, count)
    breach_key = "rate_limit_breaches:#{Date.current}"
    breach_data = {
      client_id: client_id,
      endpoint: endpoint,
      count: count,
      node: @node_id,
      timestamp: Time.current.iso8601
    }
    @redis.lpush(breach_key, breach_data.to_json)
    @redis.expire(breach_key, 86400 * 30) # Keep for a month
  end

  def calculate_cluster_health(nodes_data)
    return 1.0 if nodes_data.empty?

    values = nodes_data.values.map(&:to_i)
    avg = values.sum.to_f / values.size
    max_deviation = values.map { |v| (v - avg).abs }.max

    # Health score based on request distribution evenness
    max_deviation == 0 ? 1.0 : [1.0 - (max_deviation / avg), 0.0].max
  end
end
This distributed approach ensures consistent rate limiting regardless of which server handles the request. The Lua script guarantees atomic operations, preventing race conditions that could allow rate limit bypasses.
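The Lua script's control flow is easier to reason about as plain Ruby. The sketch below is an illustrative transliteration of the same window-reset-and-increment logic (not code the limiter calls), returning the same `[count, reset_time, allowed]` triple:

```ruby
# Transliteration of the Lua script: one shared counter per window,
# reset when the window expires, reject once the limit is reached.
class FixedWindowCounter
  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @count = 0
    @reset_time = nil
  end

  # Returns [current_count, reset_time, allowed] like the Lua script
  def check(now)
    @reset_time ||= now + @window   # first request starts the window
    if now >= @reset_time           # window expired: start a new one
      @count = 0
      @reset_time = now + @window
    end
    return [@count, @reset_time, 0] if @count >= @limit
    @count += 1
    [@count, @reset_time, 1]
  end
end

counter = FixedWindowCounter.new(limit: 2, window: 60)
counter.check(0)           # => [1, 60, 1]
counter.check(1)           # => [2, 60, 1]
third = counter.check(2)   # => [2, 60, 0]  limit hit
fresh = counter.check(61)  # => [1, 121, 1] new window
```

In the real system the whole `check` body runs inside Redis as one atomic unit, which is the point: no instance can observe a half-updated counter between the read and the increment.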
Per-User and Per-Endpoint Quota Management
Different API endpoints require different rate limiting strategies. Administrative endpoints need stricter limits than read-only operations, while premium users might deserve higher quotas.
class QuotaManager
  def initialize(redis: Redis.current)
    @redis = redis
    @default_quotas = load_default_quotas
  end

  def check_quota(user_id, endpoint, user_tier: 'basic')
    user_quota = get_user_quota(user_id, endpoint, user_tier)
    endpoint_quota = get_endpoint_quota(endpoint, user_tier)

    # Use the more restrictive quota. Hashes aren't comparable,
    # so pick by the :limit key explicitly.
    active_quota = [user_quota, endpoint_quota].min_by { |quota| quota[:limit] }

    quota_key = "quota:#{user_id}:#{normalized_endpoint(endpoint)}"
    period_key = get_current_period_key
    current_usage = @redis.hget(quota_key, period_key).to_i

    if current_usage >= active_quota[:limit]
      set_quota_headers(current_usage, active_quota, period_key)
      record_quota_exhaustion(user_id, endpoint, user_tier)
      return false
    end

    new_usage = @redis.hincrby(quota_key, period_key, 1)
    @redis.expire(quota_key, active_quota[:period] * 2)
    set_quota_headers(new_usage, active_quota, period_key)
    true
  end

  def get_quota_status(user_id, endpoint, user_tier: 'basic')
    user_quota = get_user_quota(user_id, endpoint, user_tier)
    endpoint_quota = get_endpoint_quota(endpoint, user_tier)
    active_quota = [user_quota, endpoint_quota].min_by { |quota| quota[:limit] }

    quota_key = "quota:#{user_id}:#{normalized_endpoint(endpoint)}"
    period_key = get_current_period_key
    current_usage = @redis.hget(quota_key, period_key).to_i

    {
      limit: active_quota[:limit],
      used: current_usage,
      remaining: [active_quota[:limit] - current_usage, 0].max,
      period: active_quota[:period],
      resets_at: calculate_reset_time(active_quota[:period]),
      quota_source: active_quota[:source],
      user_tier: user_tier
    }
  end

  def bulk_quota_check(user_id, endpoints, user_tier: 'basic')
    results = {}
    endpoints.each do |endpoint|
      results[endpoint] = {
        allowed: check_quota(user_id, endpoint, user_tier: user_tier),
        status: get_quota_status(user_id, endpoint, user_tier: user_tier)
      }
    end
    results
  end

  def reset_user_quota(user_id, endpoint = nil)
    if endpoint
      quota_key = "quota:#{user_id}:#{normalized_endpoint(endpoint)}"
      @redis.del(quota_key)
    else
      # Reset all quotas for user. KEYS blocks Redis while it scans
      # the keyspace; prefer SCAN in production.
      pattern = "quota:#{user_id}:*"
      keys = @redis.keys(pattern)
      @redis.del(*keys) unless keys.empty?
    end
    record_quota_reset(user_id, endpoint)
  end

  private

  def load_default_quotas
    {
      'basic' => {
        'GET' => { limit: 1000, period: 3600 },
        'POST' => { limit: 100, period: 3600 },
        'PUT' => { limit: 100, period: 3600 },
        'DELETE' => { limit: 50, period: 3600 },
        'admin' => { limit: 10, period: 3600 }
      },
      'premium' => {
        'GET' => { limit: 5000, period: 3600 },
        'POST' => { limit: 1000, period: 3600 },
        'PUT' => { limit: 1000, period: 3600 },
        'DELETE' => { limit: 500, period: 3600 },
        'admin' => { limit: 100, period: 3600 }
      },
      'enterprise' => {
        'GET' => { limit: 20000, period: 3600 },
        'POST' => { limit: 5000, period: 3600 },
        'PUT' => { limit: 5000, period: 3600 },
        'DELETE' => { limit: 2000, period: 3600 },
        'admin' => { limit: 1000, period: 3600 }
      }
    }
  end

  def get_user_quota(user_id, endpoint, user_tier)
    # Check for custom user quotas first
    custom_quota = @redis.hget("custom_quotas:#{user_id}", normalized_endpoint(endpoint))
    if custom_quota
      parsed_quota = JSON.parse(custom_quota, symbolize_names: true)
      parsed_quota[:source] = 'custom'
      return parsed_quota
    end

    # Fall back to tier-based quotas; dup so tagging :source below
    # doesn't mutate the shared defaults
    method_type = extract_method_type(endpoint)
    quota = (@default_quotas.dig(user_tier, method_type) || @default_quotas.dig('basic', 'GET')).dup
    quota[:source] = 'tier'
    quota
  end

  def get_endpoint_quota(endpoint, user_tier)
    # Check for specific endpoint overrides
    endpoint_override = @redis.hget("endpoint_quotas:#{user_tier}", normalized_endpoint(endpoint))
    if endpoint_override
      parsed_quota = JSON.parse(endpoint_override, symbolize_names: true)
      parsed_quota[:source] = 'endpoint'
      return parsed_quota
    end

    # Return a high default to not interfere with user quotas
    { limit: Float::INFINITY, period: 3600, source: 'default' }
  end

  def normalized_endpoint(endpoint)
    endpoint.gsub(/\/\d+/, '/:id').downcase
  end

  def extract_method_type(endpoint)
    # Order matters: match DELETE before the admin catch-all so the
    # DELETE branch is actually reachable
    case endpoint
    when /admin|manage/i
      'admin'
    when /delete/i
      'DELETE'
    when /create|post|update|put/i
      'POST'
    else
      'GET'
    end
  end

  def get_current_period_key
    Time.current.to_i / 3600 # Hourly periods
  end

  def calculate_reset_time(period)
    # Align to the quota's own period; the hourly bucket key above
    # only lines up with it when period == 3600
    Time.at((Time.current.to_i / period + 1) * period)
  end

  def set_quota_headers(usage, quota, period_key)
    Thread.current[:quota_headers] = {
      'X-Quota-Limit' => quota[:limit].to_s,
      'X-Quota-Used' => usage.to_s,
      'X-Quota-Remaining' => [quota[:limit] - usage, 0].max.to_s,
      'X-Quota-Period' => quota[:period].to_s,
      'X-Quota-Source' => quota[:source]
    }
  end

  def record_quota_exhaustion(user_id, endpoint, user_tier)
    exhaustion_key = "quota_exhaustions:#{Date.current}"
    exhaustion_data = {
      user_id: user_id,
      endpoint: endpoint,
      user_tier: user_tier,
      timestamp: Time.current.iso8601
    }
    @redis.lpush(exhaustion_key, exhaustion_data.to_json)
    @redis.expire(exhaustion_key, 86400 * 7) # Keep for a week
  end

  def record_quota_reset(user_id, endpoint)
    Rails.logger.info("Quota reset: user=#{user_id}, endpoint=#{endpoint}")
  end
end
This quota system provides flexible rate limiting based on user tiers and endpoint sensitivity. It allows for custom quotas while maintaining reasonable defaults for different user types.
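The "most restrictive quota wins" rule deserves a careful look: Ruby hashes do not implement `<=>`, so `[user_quota, endpoint_quota].min` raises `ArgumentError`; the comparison has to name a key. A small standalone sketch (the `effective_quota` helper is hypothetical, not part of QuotaManager):

```ruby
# Pick the effective quota: the candidate with the smallest limit wins.
# Hashes aren't Comparable, so compare on :limit explicitly.
def effective_quota(*candidates)
  candidates.min_by { |quota| quota[:limit] }
end

user_quota     = { limit: 1000, period: 3600, source: 'tier' }
endpoint_quota = { limit: 100,  period: 3600, source: 'endpoint' }

active = effective_quota(user_quota, endpoint_quota)
# active[:limit]  => 100
# active[:source] => 'endpoint'
```

Note that `min_by` also plays nicely with the `Float::INFINITY` fallback quota: an absent endpoint override never wins the comparison, so it never overrides a real user limit.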
Rate Limit Headers and Client Communication
Clear communication with API clients prevents confusion and helps developers integrate properly. Standard rate limit headers provide essential information about current limits and remaining capacity.
class RateLimitHeaders
  STANDARD_HEADERS = %w[
    X-RateLimit-Limit
    X-RateLimit-Remaining
    X-RateLimit-Reset
    X-RateLimit-RetryAfter
  ].freeze

  def initialize(response)
    @response = response
  end

  def apply_rate_limit_headers(limit_info)
    set_basic_headers(limit_info)
    set_advanced_headers(limit_info) if limit_info[:advanced]
    set_debug_headers(limit_info) if Rails.env.development?
    @response
  end

  def apply_quota_headers(quota_info)
    quota_info.each do |key, value|
      header_name = format_header_name(key)
      @response.headers[header_name] = value.to_s
    end
    @response
  end

  def set_rate_limit_exceeded_headers(limit_info)
    @response.status = 429
    @response.headers['X-RateLimit-Limit'] = limit_info[:limit].to_s
    @response.headers['X-RateLimit-Remaining'] = '0'
    @response.headers['X-RateLimit-Reset'] = limit_info[:reset_time].to_s
    @response.headers['Retry-After'] = calculate_retry_after(limit_info[:reset_time]).to_s

    # Add helpful error message
    error_body = {
      error: 'Rate limit exceeded',
      message: generate_helpful_message(limit_info),
      retry_after: calculate_retry_after(limit_info[:reset_time]),
      documentation_url: 'https://api.example.com/docs/rate-limits'
    }
    @response.body = error_body.to_json
    @response.headers['Content-Type'] = 'application/json'
    @response
  end

  private

  def set_basic_headers(limit_info)
    @response.headers['X-RateLimit-Limit'] = limit_info[:limit].to_s
    @response.headers['X-RateLimit-Remaining'] = limit_info[:remaining].to_s
    @response.headers['X-RateLimit-Reset'] = limit_info[:reset_time].to_s

    if limit_info[:remaining].to_i <= 0
      @response.headers['Retry-After'] = calculate_retry_after(limit_info[:reset_time]).to_s
    end
  end

  def set_advanced_headers(limit_info)
    @response.headers['X-RateLimit-Policy'] = limit_info[:policy] if limit_info[:policy]
    @response.headers['X-RateLimit-Scope'] = limit_info[:scope] if limit_info[:scope]

    if limit_info[:burst_available]
      @response.headers['X-RateLimit-Burst-Capacity'] = limit_info[:burst_capacity].to_s
      @response.headers['X-RateLimit-Burst-Remaining'] = limit_info[:burst_remaining].to_s
    end

    if limit_info[:global_limit]
      @response.headers['X-RateLimit-Global-Limit'] = limit_info[:global_limit].to_s
      @response.headers['X-RateLimit-Global-Remaining'] = limit_info[:global_remaining].to_s
    end
  end

  def set_debug_headers(limit_info)
    @response.headers['X-RateLimit-Debug-Window'] = limit_info[:window_size].to_s if limit_info[:window_size]
    @response.headers['X-RateLimit-Debug-Algorithm'] = limit_info[:algorithm] if limit_info[:algorithm]
    @response.headers['X-RateLimit-Debug-Node'] = Socket.gethostname
    @response.headers['X-RateLimit-Debug-Timestamp'] = Time.current.iso8601
  end

  def format_header_name(key)
    key.to_s.split('_').map(&:capitalize).join('-').prepend('X-')
  end

  def calculate_retry_after(reset_time)
    [reset_time.to_i - Time.current.to_i, 1].max
  end

  def generate_helpful_message(limit_info)
    base_message = "Rate limit of #{limit_info[:limit]} requests exceeded."
    retry_seconds = calculate_retry_after(limit_info[:reset_time])

    time_message = if retry_seconds < 60
      "Try again in #{retry_seconds} seconds."
    elsif retry_seconds < 3600
      "Try again in #{retry_seconds / 60} minutes."
    else
      "Try again in #{retry_seconds / 3600} hours."
    end

    "#{base_message} #{time_message}"
  end
end

# Middleware to automatically apply headers
class RateLimitHeadersMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, response = @app.call(env)

    # Apply any rate limit headers set by controllers
    if Thread.current[:rate_limit_headers]
      Thread.current[:rate_limit_headers].each do |key, value|
        headers[key] = value
      end
      Thread.current[:rate_limit_headers] = nil
    end

    if Thread.current[:quota_headers]
      Thread.current[:quota_headers].each do |key, value|
        headers[key] = value
      end
      Thread.current[:quota_headers] = nil
    end

    [status, headers, response]
  end
end
Proper header communication transforms rate limiting from a frustrating black box into a transparent system that developers can work with effectively.
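Those headers only pay off when clients honor them. A minimal sketch of client-side backoff driven by `Retry-After` (the helper name and the capped exponential fallback are my own assumptions, not part of the server code above):

```ruby
# Decide how long a client should wait after a 429 response.
# Prefers the server's Retry-After header; falls back to capped
# exponential backoff when the header is absent or unparseable.
def retry_delay(headers, attempt:, base: 1, cap: 60)
  retry_after = headers['Retry-After']
  return retry_after.to_i if retry_after && retry_after.to_i > 0
  [base * (2**attempt), cap].min  # 1s, 2s, 4s, ... capped at 60s
end

retry_delay({ 'Retry-After' => '30' }, attempt: 0)  # => 30
retry_delay({}, attempt: 3)                          # => 8
retry_delay({}, attempt: 10)                         # => 60
```

Honoring `Retry-After` when it is present keeps well-behaved clients from hammering a limit they cannot pass, while the capped fallback protects against servers that omit the header.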
Burst Handling Mechanisms
Real-world traffic patterns include sudden spikes that legitimate users generate. Effective burst handling allows these spikes while preventing sustained abuse.
class BurstHandler
  def initialize(redis: Redis.current)
    @redis = redis
  end

  def handle_burst_request(client_id, endpoint, base_limit: 100, burst_multiplier: 2.0, burst_window: 300)
    burst_key = "burst:#{client_id}:#{endpoint}"
    base_key = "base:#{client_id}:#{endpoint}"
    current_time = Time.current.to_i
    base_window_start = current_time - 3600 # 1 hour base window
    burst_window_start = current_time - burst_window

    # Check base rate compliance
    base_usage = get_usage_in_window(base_key, base_window_start, current_time)
    if base_usage >= base_limit
      return {
        allowed: false,
        reason: 'base_limit_exceeded',
        base_usage: base_usage,
        base_limit: base_limit
      }
    end

    # Check burst allowance
    burst_limit = calculate_dynamic_burst_limit(client_id, base_limit, burst_multiplier)
    burst_usage = get_usage_in_window(burst_key, burst_window_start, current_time)
    if burst_usage >= burst_limit
      return {
        allowed: false,
        reason: 'burst_limit_exceeded',
        burst_usage: burst_usage,
        burst_limit: burst_limit,
        burst_window: burst_window
      }
    end

    # Allow request and record usage
    record_request(base_key, current_time, 3600)
    record_request(burst_key, current_time, burst_window * 2)

    {
      allowed: true,
      base_usage: base_usage + 1,
      base_limit: base_limit,
      burst_usage: burst_usage + 1,
      burst_limit: burst_limit,
      burst_capacity_remaining: burst_limit - burst_usage - 1
    }
  end

  def analyze_burst_patterns(client_id, days: 7)
    end_time = Time.current.to_i
    start_time = end_time - (days * 86400)

    # Collect burst events from the specified period
    burst_events = @redis.zrangebyscore(
      "burst_history:#{client_id}",
      start_time,
      end_time,
      with_scores: true
    )
    analyze_burst_frequency(burst_events, days)
  end

  def adjust_burst_limits_for_client(client_id, adjustment_factor: 1.0)
    current_limits = @redis.hgetall("burst_limits:#{client_id}")
    current_limits.each do |endpoint, limit|
      new_limit = (limit.to_f * adjustment_factor).round
      @redis.hset("burst_limits:#{client_id}", endpoint, new_limit)
    end
    record_burst_adjustment(client_id, adjustment_factor)
  end

  private

  def get_usage_in_window(key, window_start, window_end)
    @redis.zcount(key, window_start, window_end)
  end

  def record_request(key, timestamp, ttl)
    @redis.multi do |pipeline|
      pipeline.zadd(key, timestamp, "#{timestamp}:#{SecureRandom.uuid}")
      pipeline.expire(key, ttl)
    end
  end

  def calculate_dynamic_burst_limit(client_id, base_limit, burst_multiplier)
    # Check client's burst history
    burst_violations = @redis.get("burst_violations:#{client_id}").to_i
    client_trust_score = calculate_trust_score(client_id)

    # Adjust burst multiplier based on trust and violations
    adjusted_multiplier = if burst_violations > 5
      [burst_multiplier * 0.5, 1.0].max
    elsif client_trust_score > 0.8
      burst_multiplier * 1.2
    else
      burst_multiplier
    end

    (base_limit * adjusted_multiplier).round
  end

  def calculate_trust_score(client_id)
    # Factors influencing trust score
    violation_count = @redis.get("total_violations:#{client_id}").to_i
    account_age_days = get_account_age_days(client_id)
    successful_requests = @redis.get("successful_requests:#{client_id}").to_i

    # Simple trust score calculation
    base_score = 0.5

    # Age bonus (up to 0.2)
    age_bonus = [account_age_days / 365.0 * 0.2, 0.2].min

    # Success rate bonus (up to 0.3)
    total_requests = successful_requests + violation_count
    success_rate = total_requests > 0 ? successful_requests.to_f / total_requests : 0
    success_bonus = success_rate * 0.3