Production Rails applications require robust monitoring to maintain performance and reliability. I’ve implemented numerous monitoring systems throughout my career, and these twelve practices have consistently proven essential for keeping applications running smoothly.
Understanding Ruby Metrics Collection
Metrics collection is the foundation of application monitoring. In Rails applications, we gather data about application performance, resource usage, and user behavior to identify issues before they affect users.
Effective metrics provide visibility into what’s happening inside your application. They help you respond to incidents faster, plan capacity, and make informed optimization decisions.
# Basic metrics collector using StatsD (dogstatsd-ruby gem)
require 'datadog/statsd'

class MetricsCollector
  def initialize
    @statsd = Datadog::Statsd.new('localhost', 8125, tags: ["app:#{Rails.application.class.module_parent_name.downcase}"])
  end

  def increment(metric, tags = {})
    @statsd.increment(metric, tags: tags_array(tags))
  end

  def timing(metric, milliseconds, tags = {})
    @statsd.timing(metric, milliseconds, tags: tags_array(tags))
  end

  def gauge(metric, value, tags = {})
    @statsd.gauge(metric, value, tags: tags_array(tags))
  end

  private

  # Convert a tags hash into StatsD's ["key:value", ...] format
  def tags_array(tags)
    tags.map { |k, v| "#{k}:#{v}" }
  end
end
Request Performance Monitoring
Tracking request performance is critical for understanding user experience. I’ve found that monitoring endpoint response times reveals bottlenecks and helps prioritize optimization efforts.
A simple piece of Rack middleware can instrument request durations:
class RequestTimingMiddleware
  def initialize(app)
    @app = app
    @metrics = MetricsCollector.new
  end

  def call(env)
    start_time = Time.now
    status, headers, response = @app.call(env)
    duration = (Time.now - start_time) * 1000
    # The router populates these parameters during the inner call;
    # they can be absent for unrouted requests (e.g. 404s), so guard against nil.
    params = env['action_dispatch.request.parameters'] || {}
    controller = params['controller']
    action = params['action']
    if controller && action
      @metrics.timing(
        'request.duration',
        duration.round,
        controller: controller,
        action: action
      )
    end
    [status, headers, response]
  end
end
Add this middleware to your Rails application config:
# config/application.rb
config.middleware.insert_before 0, RequestTimingMiddleware
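If you prefer not to maintain custom middleware, Rails' built-in instrumentation exposes the same data. Here's a minimal sketch subscribing to the process_action.action_controller event; its payload includes the controller and action, and event.duration is the elapsed time in milliseconds:
# config/initializers/request_metrics.rb
# Reuse a single collector rather than creating one per request
REQUEST_METRICS = MetricsCollector.new
ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  payload = event.payload
  REQUEST_METRICS.timing(
    'request.duration',
    event.duration.round,
    controller: payload[:controller],
    action: payload[:action]
  )
end
One caveat: this event only fires for requests that reach a controller, so the middleware approach still sees traffic the router rejects.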
Database Query Monitoring
Slow database queries often cause application performance issues. Tracking query times helps identify problematic database operations.
I like using ActiveSupport notifications to track database performance:
# config/initializers/query_tracking.rb
# Reuse one collector; don't open a new StatsD socket per query
QUERY_METRICS = MetricsCollector.new

ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  payload = event.payload
  # Skip schema and transaction queries
  next if payload[:name] =~ /SCHEMA/ || payload[:name] =~ /TRANSACTION/
  duration = event.duration
  QUERY_METRICS.timing(
    'database.query.duration',
    duration,
    { name: payload[:name], statement_name: payload[:statement_name] }.compact
  )
  # Log slow queries for investigation
  if duration > 500
    Rails.logger.warn("[SLOW QUERY] (#{duration.round}ms) #{payload[:sql][0..100]}")
  end
end
Background Job Monitoring
Background jobs can silently fail without proper monitoring. I track job execution times, failure rates, and queue depths to ensure my background processing systems work efficiently.
For Sidekiq, I’ve implemented the following monitoring approach:
# app/workers/application_worker.rb
class ApplicationWorker
  include Sidekiq::Worker

  # Wrap the actual job body in a block to record start/success/failure and duration
  def perform_with_metrics
    metrics = MetricsCollector.new
    start_time = Time.now
    begin
      metrics.increment('worker.start', class: self.class.name)
      yield
      metrics.increment('worker.success', class: self.class.name)
    rescue => e
      metrics.increment('worker.failure', class: self.class.name, error: e.class.name)
      raise
    ensure
      duration = (Time.now - start_time) * 1000
      metrics.timing('worker.duration', duration.round, class: self.class.name)
    end
  end
end

# Example worker using the monitoring wrapper
class ProductImportWorker < ApplicationWorker
  def perform(product_id)
    perform_with_metrics do
      # Actual import logic
      product = Product.find(product_id)
      ProductImporter.new(product).import
    end
  end
end
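The wrapper above covers execution times and failure rates; queue depths need their own reporter. Here's a minimal sketch using Sidekiq's public stats API (Sidekiq::Queue.all, with size and latency per queue):
# lib/queue_depth_reporter.rb
require 'sidekiq/api'
class QueueDepthReporter
  def self.start
    Thread.new do
      metrics = MetricsCollector.new
      loop do
        Sidekiq::Queue.all.each do |queue|
          # Number of jobs waiting in the queue
          metrics.gauge('sidekiq.queue.size', queue.size, queue: queue.name)
          # Seconds the oldest job has been waiting
          metrics.gauge('sidekiq.queue.latency', queue.latency, queue: queue.name)
        end
        sleep 30
      end
    end
  end
end
Start it from an initializer the same way as the memory reporter below.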
Memory Usage Monitoring
Memory leaks and bloat can degrade performance over time. I monitor process memory usage to detect leaks and optimize garbage collection.
This periodic reporter helps collect memory metrics:
# lib/memory_reporter.rb
class MemoryReporter
  def self.start
    Thread.new do
      metrics = MetricsCollector.new
      loop do
        begin
          memory_stats = get_memory_stats
          metrics.gauge('memory.rss', memory_stats[:rss])
          metrics.gauge('memory.heap_live_slots', memory_stats[:heap_live_slots])
          metrics.gauge('memory.heap_free_slots', memory_stats[:heap_free_slots])
        rescue => e
          # Don't let a transient failure kill the reporter thread
          Rails.logger.warn("[MemoryReporter] #{e.class}: #{e.message}")
        end
        sleep 60 # Report every minute
      end
    end
  end

  def self.get_memory_stats
    stats = {}
    # Get RSS from the OS (ps reports KB; convert to MB)
    stats[:rss] = `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
    # Get Ruby GC stats
    gc_stats = GC.stat
    stats[:heap_live_slots] = gc_stats[:heap_live_slots]
    stats[:heap_free_slots] = gc_stats[:heap_free_slots]
    stats
  end
end

# In an initializer:
# config/initializers/memory_reporter.rb
Rails.application.config.after_initialize do
  MemoryReporter.start if Rails.env.production?
end
API Endpoint Performance
For applications with APIs, I track endpoint performance separately from regular web requests because they often have different performance characteristics and SLAs.
module ApiMetrics
  extend ActiveSupport::Concern

  included do
    around_action :track_api_metrics
  end

  private

  def track_api_metrics
    metrics = MetricsCollector.new
    start_time = Time.now
    begin
      yield
    ensure
      duration = (Time.now - start_time) * 1000
      endpoint = "#{controller_name}##{action_name}"
      metrics.timing(
        'api.request.duration',
        duration.round,
        endpoint: endpoint,
        status: response.status
      )
      metrics.increment(
        'api.request.count',
        endpoint: endpoint,
        status: response.status
      )
    end
  end
end

# In API controllers:
class Api::V1::BaseController < ApplicationController
  include ApiMetrics
  # Other API controller setup
end
Error Rate Monitoring
Error rates reveal application health issues. I’ve found that tracking errors by type helps prioritize fixes and monitor the impact of deployments.
# config/initializers/error_tracking.rb
Rails.application.config.middleware.use(
  ExceptionNotification::Rack,
  email: {
    email_prefix: '[ERROR] ',
    sender_address: %{"Error Notifier" <[email protected]>},
    exception_recipients: %w{[email protected]}
  }
)

# Custom error tracker
module ErrorTracking
  def self.track(exception, context = {})
    metrics = MetricsCollector.new
    metrics.increment(
      'error',
      type: exception.class.name,
      component: context[:component] || 'unknown'
    )
    # Additional context for error tracking services
    Sentry.capture_exception(exception, extra: context)
    Rails.logger.error("[ERROR] #{exception.class}: #{exception.message}\n#{exception.backtrace.join("\n")}")
  end
end

# Usage in application code
begin
  # Risky operation
rescue => e
  ErrorTracking.track(e, component: 'payment_processor')
  raise # Re-raise if needed
end
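To see how a deployment affects error rates, it also helps to tag each error with the running release. A sketch, assuming your deploy process exposes the commit SHA in a GIT_SHA environment variable (the variable name is hypothetical; use whatever your pipeline sets):
# Inside ErrorTracking.track, extend the increment call:
metrics.increment(
  'error',
  type: exception.class.name,
  component: context[:component] || 'unknown',
  release: ENV.fetch('GIT_SHA', 'unknown') # hypothetical env var set at deploy time
)
Grouping error counts by release makes a bad deploy show up as a step change on the dashboard.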
Cache Hit Ratio Monitoring
Cache efficiency significantly impacts application performance. Monitoring cache hit ratios helps optimize caching strategies.
module CacheMonitoring
  # A thin decorator around any cache store. Note that it deliberately does
  # not inherit from ActiveSupport::Cache::Store: inheriting would expose
  # half-configured built-in methods (fetch, etc.) instead of delegating
  # them to the wrapped store.
  class Store
    def initialize(store)
      @store = store
      @metrics = MetricsCollector.new
    end

    def read(name, options = nil)
      @metrics.increment('cache.read.attempt', cache_type: @store.class.name)
      value = @store.read(name, options)
      if value.nil?
        @metrics.increment('cache.read.miss', cache_type: @store.class.name)
      else
        @metrics.increment('cache.read.hit', cache_type: @store.class.name)
      end
      value
    end

    def write(name, value, options = nil)
      @metrics.increment('cache.write', cache_type: @store.class.name)
      @store.write(name, value, options)
    end

    def delete(name, options = nil)
      @metrics.increment('cache.delete', cache_type: @store.class.name)
      @store.delete(name, options)
    end

    # Delegate everything else (fetch, exist?, clear, ...) to the wrapped store
    def method_missing(method, *args, &block)
      @store.send(method, *args, &block)
    end

    def respond_to_missing?(method, include_private = false)
      @store.respond_to?(method, include_private) || super
    end
  end
end

# In an initializer:
# config/initializers/cache_monitoring.rb
Rails.application.config.after_initialize do
  Rails.cache = CacheMonitoring::Store.new(Rails.cache)
end
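Most metrics backends can derive the hit ratio at query time as hits / (hits + misses) from the counters above. If you'd rather compute it in-process, here's a small sketch that keeps thread-safe local counts and reports a gauge periodically (it uses the concurrent-ruby gem, which Rails already depends on):
# lib/cache_ratio_reporter.rb
require 'concurrent'
class CacheRatioReporter
  def initialize
    @hits = Concurrent::AtomicFixnum.new(0)
    @misses = Concurrent::AtomicFixnum.new(0)
  end

  def record_hit
    @hits.increment
  end

  def record_miss
    @misses.increment
  end

  # Call periodically, e.g. from a reporter thread
  def report(metrics)
    hits = @hits.value
    misses = @misses.value
    total = hits + misses
    metrics.gauge('cache.hit_ratio', (hits.to_f / total).round(3)) if total > 0
  end
end
You would call record_hit and record_miss from the store wrapper above.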
Throughput Measurement
Understanding application throughput helps with capacity planning. I track requests per minute across different endpoints to identify traffic patterns.
# config/initializers/throughput_metrics.rb
# This uses the request timing middleware we defined earlier
# but adds specific throughput tracking
class ThroughputTracker
  def initialize
    @metrics = MetricsCollector.new
    # Counters are touched from request threads and the reporter thread,
    # so guard them with a mutex
    @mutex = Mutex.new
    @counters = Hash.new(0)
    start_reporting
  end

  def track_request(controller, action)
    key = "#{controller}##{action}"
    @mutex.synchronize do
      @counters[key] += 1
      @counters['total'] += 1
    end
  end

  private

  def start_reporting
    Thread.new do
      loop do
        # Sleep until the next minute
        sleep_until_next_minute
        # Swap in a fresh hash, then report the previous minute's counters
        counters = @mutex.synchronize do
          finished = @counters
          @counters = Hash.new(0)
          finished
        end
        report_counters(counters, Time.now.beginning_of_minute - 60)
      end
    end
  end

  def sleep_until_next_minute
    now = Time.now
    next_minute = (now + 60).beginning_of_minute
    sleep((next_minute - now).to_f)
  end

  def report_counters(counters, timestamp)
    counters.each do |key, count|
      @metrics.gauge(
        'requests_per_minute',
        count,
        endpoint: key,
        timestamp: timestamp.to_i
      )
    end
  end
end

# Initialize in an initializer
Rails.application.config.after_initialize do
  $throughput_tracker = ThroughputTracker.new if Rails.env.production?
end

# Modify the request middleware to use this
class RequestTimingMiddleware
  # ... existing code ...
  def call(env)
    # ... existing timing code ...
    if controller && action && $throughput_tracker
      $throughput_tracker.track_request(controller, action)
    end
    [status, headers, response]
  end
end
Log Aggregation and Analysis
Logs provide context for metrics. I’ve found that structured logging combined with a good aggregation system makes troubleshooting much easier.
# config/initializers/logging.rb
class JsonLogger < ActiveSupport::Logger
  def initialize(*args)
    super
    self.formatter = JsonFormatter.new
  end

  class JsonFormatter < ::Logger::Formatter
    def call(severity, timestamp, progname, msg)
      payload = {
        severity: severity,
        timestamp: timestamp.utc.iso8601(3),
        pid: Process.pid
      }
      case msg
      when String
        payload[:message] = msg
      when Exception
        payload[:error] = {
          class: msg.class.name,
          message: msg.message,
          backtrace: msg.backtrace
        }
      when Hash
        payload.merge!(msg)
      else
        payload[:message] = msg.inspect
      end
      "#{payload.to_json}\n"
    end
  end
end

# Configure Rails to use structured logging. The default logger is built
# before initializers run, so set the formatter on the live logger too.
Rails.application.configure do
  config.log_formatter = JsonLogger::JsonFormatter.new
end
Rails.logger.formatter = JsonLogger::JsonFormatter.new if Rails.logger
# Enhanced logging helper
module StructuredLogging
  def self.info(message, context = {})
    Rails.logger.info(context.merge(message: message))
  end

  def self.error(message, context = {})
    Rails.logger.error(context.merge(message: message))
  end

  def self.warn(message, context = {})
    Rails.logger.warn(context.merge(message: message))
  end

  def self.debug(message, context = {})
    Rails.logger.debug(context.merge(message: message))
  end
end
# Usage
StructuredLogging.info("User signed up", user_id: user.id, plan: user.plan)
Resource Utilization Tracking
Tracking CPU, memory, and disk usage helps detect performance bottlenecks. I’ve implemented resource tracking that works well for Ruby applications:
# lib/resource_tracker.rb
class ResourceTracker
  def self.start
    Thread.new do
      metrics = MetricsCollector.new
      loop do
        # Track process CPU usage
        metrics.gauge('process.cpu_percent', process_cpu_percent)
        # Track Ruby VM stats
        gc_stats = GC.stat
        metrics.gauge('ruby.gc.total_allocated_objects', gc_stats[:total_allocated_objects])
        metrics.gauge('ruby.gc.total_freed_objects', gc_stats[:total_freed_objects])
        metrics.gauge('ruby.gc.count', gc_stats[:count])
        # Track file descriptors
        metrics.gauge('process.file_descriptors', file_descriptor_count)
        # Track thread count
        metrics.gauge('process.thread_count', Thread.list.count)
        sleep 30 # Report every 30 seconds
      end
    end
  end

  def self.process_cpu_percent
    # Simple CPU reading from process stats
    # Note: more accurate implementations would use CPU time deltas
    `ps -o %cpu= -p #{Process.pid}`.strip.to_f
  end

  def self.file_descriptor_count
    # Count open file descriptors for this process.
    # /proc exists on Linux; elsewhere Dir.glob returns an empty array
    # rather than raising, so check for the directory instead of rescuing.
    if File.directory?("/proc/#{Process.pid}/fd")
      Dir.glob("/proc/#{Process.pid}/fd/*").count
    else
      # Fall back on lsof for macOS/BSD
      `lsof -p #{Process.pid} | wc -l`.to_i
    end
  end
end

# Initialize in an initializer
Rails.application.config.after_initialize do
  ResourceTracker.start if Rails.env.production?
end
Custom Health Checks
Health checks are vital for load balancers and containers. I implement comprehensive health checks that verify all application dependencies.
# app/controllers/health_controller.rb
class HealthController < ActionController::Base
  def index
    checks = {
      database: check_database,
      redis: check_redis,
      sidekiq: check_sidekiq,
      cache: check_cache,
      disk_space: check_disk_space
    }
    # Overall status: any error makes us unhealthy; warnings keep us up
    overall_status = checks.values.any? { |c| c[:status] == 'error' } ? 'error' : 'ok'
    response_body = {
      status: overall_status,
      checks: checks,
      timestamp: Time.now.utc.iso8601,
      version: Rails.application.config.version # assumes you set config.version at boot
    }
    if overall_status == 'ok'
      render json: response_body
    else
      render json: response_body, status: :service_unavailable
    end
  end

  # Lightweight liveness probe that skips dependency checks
  def basic
    render json: { status: 'ok' }
  end

  private

  def check_database
    ActiveRecord::Base.connection.execute("SELECT 1")
    { status: 'ok' }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_redis
    result = Sidekiq.redis { |conn| conn.ping }
    { status: result == 'PONG' ? 'ok' : 'error' }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_sidekiq
    ps = Sidekiq::ProcessSet.new
    { status: 'ok', workers: ps.size }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_cache
    test_key = "health_check_#{SecureRandom.hex(10)}"
    test_value = SecureRandom.hex
    Rails.cache.write(test_key, test_value, expires_in: 1.minute)
    read_value = Rails.cache.read(test_key)
    { status: test_value == read_value ? 'ok' : 'error' }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_disk_space
    stat = Sys::Filesystem.stat(Rails.root.to_s) # sys-filesystem gem
    gb_available = stat.block_size * stat.blocks_available / 1024.0 / 1024.0 / 1024.0
    # Warn if less than 5GB available
    {
      status: gb_available < 5 ? 'warning' : 'ok',
      available_gb: gb_available.round(2)
    }
  rescue => e
    { status: 'error', message: e.message }
  end
end

# routes.rb
Rails.application.routes.draw do
  get '/health' => 'health#index'
  get '/health/basic' => 'health#basic'
end
Bringing It All Together
I’ve found that integrating these monitoring practices requires a systematic approach. Start with the most critical metrics for your application, gradually adding more as needed.
To tie it all together, implement a centralized metrics interface that standardizes how metrics are collected and reported:
# lib/application_monitoring.rb
module ApplicationMonitoring
  class << self
    def configure
      yield(configuration)
      setup_integrations if configuration.enabled
    end

    def configuration
      @configuration ||= Configuration.new
    end

    def track_request(controller, action)
      return yield unless configuration.enabled
      start_time = Time.now
      begin
        yield
      ensure
        # Recorded in an ensure block so failed requests are timed too
        duration = (Time.now - start_time) * 1000
        metrics.timing(
          'request.duration',
          duration.round,
          controller: controller,
          action: action
        )
      end
    end

    def track_method(class_name, method_name)
      return yield unless configuration.enabled
      start_time = Time.now
      begin
        yield
      ensure
        duration = (Time.now - start_time) * 1000
        metrics.timing(
          'method.duration',
          duration.round,
          class: class_name,
          method: method_name
        )
      end
    end

    def metrics
      @metrics ||= MetricsCollector.new
    end

    private

    def setup_integrations
      if configuration.track_memory
        require_relative 'memory_reporter'
        MemoryReporter.start
      end
      if configuration.track_resources
        require_relative 'resource_tracker'
        ResourceTracker.start
      end
      # Set up other integrations
    end
  end

  class Configuration
    attr_accessor :enabled, :track_memory, :track_resources

    def initialize
      @enabled = false
      @track_memory = false
      @track_resources = false
    end
  end
end

# In an initializer
# config/initializers/monitoring.rb
ApplicationMonitoring.configure do |config|
  config.enabled = Rails.env.production?
  config.track_memory = true
  config.track_resources = true
end
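With that in place, instrumenting a hot path is a one-liner. A usage sketch (ProductImporter is the example class from the worker section):
# Anywhere in application code:
ApplicationMonitoring.track_method('ProductImporter', 'import') do
  ProductImporter.new(product).import
end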
Monitoring is a continuous process, not a one-time setup. I regularly review my metrics to ensure they’re providing valuable information and adjust my monitoring strategy as application needs evolve.
By implementing these twelve monitoring practices, I’ve been able to maintain healthy, performant Ruby on Rails applications in production. Most importantly, I catch issues before users report them, which is the ultimate goal of any monitoring system.