Production Rails applications require robust monitoring to maintain performance and reliability. I’ve implemented numerous monitoring systems throughout my career, and these twelve practices have consistently proven essential for keeping applications running smoothly.
Understanding Ruby Metrics Collection
Metrics collection is the foundation of application monitoring. In Rails applications, we gather data about application performance, resource usage, and user behavior to identify issues before they affect users.
Effective metrics provide visibility into what’s happening inside your application. They help you respond to incidents faster, plan capacity, and make informed optimization decisions.
# Basic metrics collector using StatsD (dogstatsd-ruby gem)
require 'datadog/statsd'

class MetricsCollector
  def initialize
    @statsd = Datadog::Statsd.new('localhost', 8125, tags: ["app:#{Rails.application.class.module_parent_name.downcase}"])
  end

  def increment(metric, tags = {})
    @statsd.increment(metric, tags: tags_array(tags))
  end

  def timing(metric, milliseconds, tags = {})
    @statsd.timing(metric, milliseconds, tags: tags_array(tags))
  end

  def gauge(metric, value, tags = {})
    @statsd.gauge(metric, value, tags: tags_array(tags))
  end

  private

  # Convert a tags hash into StatsD's ["key:value", ...] format
  def tags_array(tags)
    tags.map { |k, v| "#{k}:#{v}" }
  end
end
Request Performance Monitoring
Tracking request performance is critical for understanding user experience. I’ve found that monitoring endpoint response times reveals bottlenecks and helps prioritize optimization efforts.
A simple piece of Rack middleware can instrument request durations:
class RequestTimingMiddleware
  def initialize(app)
    @app = app
    @metrics = MetricsCollector.new
  end

  def call(env)
    start_time = Time.now
    status, headers, response = @app.call(env)
    duration = (Time.now - start_time) * 1000
    # The router populates these parameters during the inner call;
    # they can be absent for unrouted requests (e.g. 404s), so guard against nil.
    params = env['action_dispatch.request.parameters'] || {}
    controller = params['controller']
    action = params['action']
    if controller && action
      @metrics.timing(
        'request.duration',
        duration.round,
        controller: controller,
        action: action
      )
    end
    [status, headers, response]
  end
end
Add this middleware to your Rails application config:
# config/application.rb
config.middleware.insert_before 0, RequestTimingMiddleware
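If you prefer not to maintain custom middleware, Rails' built-in instrumentation exposes the same data. Here's a minimal sketch subscribing to the process_action.action_controller event; its payload includes the controller and action, and event.duration is the elapsed time in milliseconds:
# config/initializers/request_metrics.rb
# Reuse a single collector rather than creating one per request
REQUEST_METRICS = MetricsCollector.new
ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  payload = event.payload
  REQUEST_METRICS.timing(
    'request.duration',
    event.duration.round,
    controller: payload[:controller],
    action: payload[:action]
  )
end
One caveat: this event only fires for requests that reach a controller, so the middleware approach still sees traffic the router rejects.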
Database Query Monitoring
Slow database queries often cause application performance issues. Tracking query times helps identify problematic database operations.
I like using ActiveSupport notifications to track database performance:
# config/initializers/query_tracking.rb
# Reuse one collector; don't open a new StatsD socket per query
QUERY_METRICS = MetricsCollector.new

ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  payload = event.payload
  # Skip schema and transaction queries
  next if payload[:name] =~ /SCHEMA/ || payload[:name] =~ /TRANSACTION/
  duration = event.duration
  QUERY_METRICS.timing(
    'database.query.duration',
    duration,
    { name: payload[:name], statement_name: payload[:statement_name] }.compact
  )
  # Log slow queries for investigation
  if duration > 500
    Rails.logger.warn("[SLOW QUERY] (#{duration.round}ms) #{payload[:sql][0..100]}")
  end
end
Background Job Monitoring
Background jobs can silently fail without proper monitoring. I track job execution times, failure rates, and queue depths to ensure my background processing systems work efficiently.
For Sidekiq, I’ve implemented the following monitoring approach:
# app/workers/application_worker.rb
class ApplicationWorker
  include Sidekiq::Worker

  # Wrap the actual job body in a block to record start/success/failure and duration
  def perform_with_metrics
    metrics = MetricsCollector.new
    start_time = Time.now
    begin
      metrics.increment('worker.start', class: self.class.name)
      yield
      metrics.increment('worker.success', class: self.class.name)
    rescue => e
      metrics.increment('worker.failure', class: self.class.name, error: e.class.name)
      raise
    ensure
      duration = (Time.now - start_time) * 1000
      metrics.timing('worker.duration', duration.round, class: self.class.name)
    end
  end
end

# Example worker using the monitoring wrapper
class ProductImportWorker < ApplicationWorker
  def perform(product_id)
    perform_with_metrics do
      # Actual import logic
      product = Product.find(product_id)
      ProductImporter.new(product).import
    end
  end
end
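The wrapper above covers execution times and failure rates; queue depths need their own reporter. Here's a minimal sketch using Sidekiq's public stats API (Sidekiq::Queue.all, with size and latency per queue):
# lib/queue_depth_reporter.rb
require 'sidekiq/api'
class QueueDepthReporter
  def self.start
    Thread.new do
      metrics = MetricsCollector.new
      loop do
        Sidekiq::Queue.all.each do |queue|
          # Number of jobs waiting in the queue
          metrics.gauge('sidekiq.queue.size', queue.size, queue: queue.name)
          # Seconds the oldest job has been waiting
          metrics.gauge('sidekiq.queue.latency', queue.latency, queue: queue.name)
        end
        sleep 30
      end
    end
  end
end
Start it from an initializer the same way as the memory reporter below.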
Memory Usage Monitoring
Memory leaks and bloat can degrade performance over time. I monitor process memory usage to detect leaks and optimize garbage collection.
This periodic reporter helps collect memory metrics:
# lib/memory_reporter.rb
class MemoryReporter
  def self.start
    Thread.new do
      metrics = MetricsCollector.new
      loop do
        begin
          memory_stats = get_memory_stats
          metrics.gauge('memory.rss', memory_stats[:rss])
          metrics.gauge('memory.heap_live_slots', memory_stats[:heap_live_slots])
          metrics.gauge('memory.heap_free_slots', memory_stats[:heap_free_slots])
        rescue => e
          # Don't let a transient failure kill the reporter thread
          Rails.logger.warn("[MemoryReporter] #{e.class}: #{e.message}")
        end
        sleep 60 # Report every minute
      end
    end
  end

  def self.get_memory_stats
    stats = {}
    # Get RSS from the OS (ps reports KB; convert to MB)
    stats[:rss] = `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
    # Get Ruby GC stats
    gc_stats = GC.stat
    stats[:heap_live_slots] = gc_stats[:heap_live_slots]
    stats[:heap_free_slots] = gc_stats[:heap_free_slots]
    stats
  end
end

# In an initializer:
# config/initializers/memory_reporter.rb
Rails.application.config.after_initialize do
  MemoryReporter.start if Rails.env.production?
end
API Endpoint Performance
For applications with APIs, I track endpoint performance separately from regular web requests because they often have different performance characteristics and SLAs.
module ApiMetrics
  extend ActiveSupport::Concern

  included do
    around_action :track_api_metrics
  end

  private

  def track_api_metrics
    metrics = MetricsCollector.new
    start_time = Time.now
    begin
      yield
    ensure
      duration = (Time.now - start_time) * 1000
      endpoint = "#{controller_name}##{action_name}"
      metrics.timing(
        'api.request.duration',
        duration.round,
        endpoint: endpoint,
        status: response.status
      )
      metrics.increment(
        'api.request.count',
        endpoint: endpoint,
        status: response.status
      )
    end
  end
end

# In API controllers:
class Api::V1::BaseController < ApplicationController
  include ApiMetrics
  # Other API controller setup
end
Error Rate Monitoring
Error rates reveal application health issues. I’ve found that tracking errors by type helps prioritize fixes and monitor the impact of deployments.
# config/initializers/error_tracking.rb
Rails.application.config.middleware.use(
  ExceptionNotification::Rack,
  email: {
    email_prefix: '[ERROR] ',
    sender_address: %{"Error Notifier" <[email protected]>},
    exception_recipients: %w{[email protected]}
  }
)

# Custom error tracker
module ErrorTracking
  def self.track(exception, context = {})
    metrics = MetricsCollector.new
    metrics.increment(
      'error',
      type: exception.class.name,
      component: context[:component] || 'unknown'
    )
    # Additional context for error tracking services
    Sentry.capture_exception(exception, extra: context)
    Rails.logger.error("[ERROR] #{exception.class}: #{exception.message}\n#{exception.backtrace.join("\n")}")
  end
end

# Usage in application code
begin
  # Risky operation
rescue => e
  ErrorTracking.track(e, component: 'payment_processor')
  raise # Re-raise if needed
end
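To see how a deployment affects error rates, it also helps to tag each error with the running release. A sketch, assuming your deploy process exposes the commit SHA in a GIT_SHA environment variable (the variable name is hypothetical; use whatever your pipeline sets):
# Inside ErrorTracking.track, extend the increment call:
metrics.increment(
  'error',
  type: exception.class.name,
  component: context[:component] || 'unknown',
  release: ENV.fetch('GIT_SHA', 'unknown') # hypothetical env var set at deploy time
)
Grouping error counts by release makes a bad deploy show up as a step change on the dashboard.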
Cache Hit Ratio Monitoring
Cache efficiency significantly impacts application performance. Monitoring cache hit ratios helps optimize caching strategies.
module CacheMonitoring
  # A thin decorator around any cache store. Note that it deliberately does
  # not inherit from ActiveSupport::Cache::Store: inheriting would expose
  # half-configured built-in methods (fetch, etc.) instead of delegating
  # them to the wrapped store.
  class Store
    def initialize(store)
      @store = store
      @metrics = MetricsCollector.new
    end

    def read(name, options = nil)
      @metrics.increment('cache.read.attempt', cache_type: @store.class.name)
      value = @store.read(name, options)
      if value.nil?
        @metrics.increment('cache.read.miss', cache_type: @store.class.name)
      else
        @metrics.increment('cache.read.hit', cache_type: @store.class.name)
      end
      value
    end

    def write(name, value, options = nil)
      @metrics.increment('cache.write', cache_type: @store.class.name)
      @store.write(name, value, options)
    end

    def delete(name, options = nil)
      @metrics.increment('cache.delete', cache_type: @store.class.name)
      @store.delete(name, options)
    end

    # Delegate everything else (fetch, exist?, clear, ...) to the wrapped store
    def method_missing(method, *args, &block)
      @store.send(method, *args, &block)
    end

    def respond_to_missing?(method, include_private = false)
      @store.respond_to?(method, include_private) || super
    end
  end
end

# In an initializer:
# config/initializers/cache_monitoring.rb
Rails.application.config.after_initialize do
  Rails.cache = CacheMonitoring::Store.new(Rails.cache)
end
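Most metrics backends can derive the hit ratio at query time as hits / (hits + misses) from the counters above. If you'd rather compute it in-process, here's a small sketch that keeps thread-safe local counts and reports a gauge periodically (it uses the concurrent-ruby gem, which Rails already depends on):
# lib/cache_ratio_reporter.rb
require 'concurrent'
class CacheRatioReporter
  def initialize
    @hits = Concurrent::AtomicFixnum.new(0)
    @misses = Concurrent::AtomicFixnum.new(0)
  end

  def record_hit
    @hits.increment
  end

  def record_miss
    @misses.increment
  end

  # Call periodically, e.g. from a reporter thread
  def report(metrics)
    hits = @hits.value
    misses = @misses.value
    total = hits + misses
    metrics.gauge('cache.hit_ratio', (hits.to_f / total).round(3)) if total > 0
  end
end
You would call record_hit and record_miss from the store wrapper above.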
Throughput Measurement
Understanding application throughput helps with capacity planning. I track requests per minute across different endpoints to identify traffic patterns.
# config/initializers/throughput_metrics.rb
# This uses the request timing middleware we defined earlier
# but adds specific throughput tracking
class ThroughputTracker
  def initialize
    @metrics = MetricsCollector.new
    # Counters are touched from request threads and the reporter thread,
    # so guard them with a mutex
    @mutex = Mutex.new
    @counters = Hash.new(0)
    start_reporting
  end

  def track_request(controller, action)
    key = "#{controller}##{action}"
    @mutex.synchronize do
      @counters[key] += 1
      @counters['total'] += 1
    end
  end

  private

  def start_reporting
    Thread.new do
      loop do
        # Sleep until the next minute
        sleep_until_next_minute
        # Swap in a fresh hash, then report the previous minute's counters
        counters = @mutex.synchronize do
          finished = @counters
          @counters = Hash.new(0)
          finished
        end
        report_counters(counters, Time.now.beginning_of_minute - 60)
      end
    end
  end

  def sleep_until_next_minute
    now = Time.now
    next_minute = (now + 60).beginning_of_minute
    sleep((next_minute - now).to_f)
  end

  def report_counters(counters, timestamp)
    counters.each do |key, count|
      @metrics.gauge(
        'requests_per_minute',
        count,
        endpoint: key,
        timestamp: timestamp.to_i
      )
    end
  end
end

# Initialize in an initializer
Rails.application.config.after_initialize do
  $throughput_tracker = ThroughputTracker.new if Rails.env.production?
end

# Modify the request middleware to use this
class RequestTimingMiddleware
  # ... existing code ...
  def call(env)
    # ... existing timing code ...
    if controller && action && $throughput_tracker
      $throughput_tracker.track_request(controller, action)
    end
    [status, headers, response]
  end
end
Log Aggregation and Analysis
Logs provide context for metrics. I’ve found that structured logging combined with a good aggregation system makes troubleshooting much easier.
# config/initializers/logging.rb
class JsonLogger < ActiveSupport::Logger
  def initialize(*args)
    super
    self.formatter = JsonFormatter.new
  end

  class JsonFormatter < ::Logger::Formatter
    def call(severity, timestamp, progname, msg)
      payload = {
        severity: severity,
        timestamp: timestamp.utc.iso8601(3),
        pid: Process.pid
      }
      case msg
      when String
        payload[:message] = msg
      when Exception
        payload[:error] = {
          class: msg.class.name,
          message: msg.message,
          backtrace: msg.backtrace
        }
      when Hash
        payload.merge!(msg)
      else
        payload[:message] = msg.inspect
      end
      "#{payload.to_json}\n"
    end
  end
end

# Configure Rails to use structured logging. The default logger is built
# before initializers run, so set the formatter on the live logger too.
Rails.application.configure do
  config.log_formatter = JsonLogger::JsonFormatter.new
end
Rails.logger.formatter = JsonLogger::JsonFormatter.new if Rails.logger
# Enhanced logging helper
module StructuredLogging
  def self.info(message, context = {})
    Rails.logger.info(context.merge(message: message))
  end

  def self.error(message, context = {})
    Rails.logger.error(context.merge(message: message))
  end

  def self.warn(message, context = {})
    Rails.logger.warn(context.merge(message: message))
  end

  def self.debug(message, context = {})
    Rails.logger.debug(context.merge(message: message))
  end
end
# Usage
StructuredLogging.info("User signed up", user_id: user.id, plan: user.plan)
Resource Utilization Tracking
Tracking CPU, memory, and disk usage helps detect performance bottlenecks. I’ve implemented resource tracking that works well for Ruby applications:
# lib/resource_tracker.rb
class ResourceTracker
  def self.start
    Thread.new do
      metrics = MetricsCollector.new
      loop do
        # Track process CPU usage
        metrics.gauge('process.cpu_percent', process_cpu_percent)
        # Track Ruby VM stats
        gc_stats = GC.stat
        metrics.gauge('ruby.gc.total_allocated_objects', gc_stats[:total_allocated_objects])
        metrics.gauge('ruby.gc.total_freed_objects', gc_stats[:total_freed_objects])
        metrics.gauge('ruby.gc.count', gc_stats[:count])
        # Track file descriptors
        metrics.gauge('process.file_descriptors', file_descriptor_count)
        # Track thread count
        metrics.gauge('process.thread_count', Thread.list.count)
        sleep 30 # Report every 30 seconds
      end
    end
  end

  def self.process_cpu_percent
    # Simple CPU reading from process stats
    # Note: more accurate implementations would use CPU time deltas
    `ps -o %cpu= -p #{Process.pid}`.strip.to_f
  end

  def self.file_descriptor_count
    # Count open file descriptors for this process.
    # /proc exists on Linux; elsewhere Dir.glob returns an empty array
    # rather than raising, so check for the directory instead of rescuing.
    if File.directory?("/proc/#{Process.pid}/fd")
      Dir.glob("/proc/#{Process.pid}/fd/*").count
    else
      # Fall back on lsof for macOS/BSD
      `lsof -p #{Process.pid} | wc -l`.to_i
    end
  end
end

# Initialize in an initializer
Rails.application.config.after_initialize do
  ResourceTracker.start if Rails.env.production?
end
Custom Health Checks
Health checks are vital for load balancers and containers. I implement comprehensive health checks that verify all application dependencies.
# app/controllers/health_controller.rb
class HealthController < ActionController::Base
  def index
    checks = {
      database: check_database,
      redis: check_redis,
      sidekiq: check_sidekiq,
      cache: check_cache,
      disk_space: check_disk_space
    }
    # Overall status: any error makes us unhealthy; warnings keep us up
    overall_status = checks.values.any? { |c| c[:status] == 'error' } ? 'error' : 'ok'
    response_body = {
      status: overall_status,
      checks: checks,
      timestamp: Time.now.utc.iso8601,
      version: Rails.application.config.version # assumes you set config.version at boot
    }
    if overall_status == 'ok'
      render json: response_body
    else
      render json: response_body, status: :service_unavailable
    end
  end

  # Lightweight liveness probe that skips dependency checks
  def basic
    render json: { status: 'ok' }
  end

  private

  def check_database
    ActiveRecord::Base.connection.execute("SELECT 1")
    { status: 'ok' }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_redis
    result = Sidekiq.redis { |conn| conn.ping }
    { status: result == 'PONG' ? 'ok' : 'error' }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_sidekiq
    ps = Sidekiq::ProcessSet.new
    { status: 'ok', workers: ps.size }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_cache
    test_key = "health_check_#{SecureRandom.hex(10)}"
    test_value = SecureRandom.hex
    Rails.cache.write(test_key, test_value, expires_in: 1.minute)
    read_value = Rails.cache.read(test_key)
    { status: test_value == read_value ? 'ok' : 'error' }
  rescue => e
    { status: 'error', message: e.message }
  end

  def check_disk_space
    stat = Sys::Filesystem.stat(Rails.root.to_s) # sys-filesystem gem
    gb_available = stat.block_size * stat.blocks_available / 1024.0 / 1024.0 / 1024.0
    # Warn if less than 5GB available
    {
      status: gb_available < 5 ? 'warning' : 'ok',
      available_gb: gb_available.round(2)
    }
  rescue => e
    { status: 'error', message: e.message }
  end
end

# routes.rb
Rails.application.routes.draw do
  get '/health' => 'health#index'
  get '/health/basic' => 'health#basic'
end
Bringing It All Together
I’ve found that integrating these monitoring practices requires a systematic approach. Start with the most critical metrics for your application, gradually adding more as needed.
To tie it all together, implement a centralized metrics interface that standardizes how metrics are collected and reported:
# lib/application_monitoring.rb
module ApplicationMonitoring
  class << self
    def configure
      yield(configuration)
      setup_integrations if configuration.enabled
    end

    def configuration
      @configuration ||= Configuration.new
    end

    def track_request(controller, action)
      return yield unless configuration.enabled
      start_time = Time.now
      begin
        yield
      ensure
        # Recorded in an ensure block so failed requests are timed too
        duration = (Time.now - start_time) * 1000
        metrics.timing(
          'request.duration',
          duration.round,
          controller: controller,
          action: action
        )
      end
    end

    def track_method(class_name, method_name)
      return yield unless configuration.enabled
      start_time = Time.now
      begin
        yield
      ensure
        duration = (Time.now - start_time) * 1000
        metrics.timing(
          'method.duration',
          duration.round,
          class: class_name,
          method: method_name
        )
      end
    end

    def metrics
      @metrics ||= MetricsCollector.new
    end

    private

    def setup_integrations
      if configuration.track_memory
        require_relative 'memory_reporter'
        MemoryReporter.start
      end
      if configuration.track_resources
        require_relative 'resource_tracker'
        ResourceTracker.start
      end
      # Set up other integrations
    end
  end

  class Configuration
    attr_accessor :enabled, :track_memory, :track_resources

    def initialize
      @enabled = false
      @track_memory = false
      @track_resources = false
    end
  end
end

# In an initializer
# config/initializers/monitoring.rb
ApplicationMonitoring.configure do |config|
  config.enabled = Rails.env.production?
  config.track_memory = true
  config.track_resources = true
end
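With that in place, instrumenting a hot path is a one-liner. A usage sketch (ProductImporter is the example class from the worker section):
# Anywhere in application code:
ApplicationMonitoring.track_method('ProductImporter', 'import') do
  ProductImporter.new(product).import
end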
Monitoring is a continuous process, not a one-time setup. I regularly review my metrics to ensure they’re providing valuable information and adjust my monitoring strategy as application needs evolve.
By implementing these twelve monitoring practices, I’ve been able to maintain healthy, performant Ruby on Rails applications in production. Most importantly, I catch issues before users report them, which is the ultimate goal of any monitoring system.