As a Ruby developer working on production applications, I’ve learned that performance issues often surface under real-world conditions that are hard to simulate in development environments. Over the years, I’ve developed and refined several profiling techniques that provide actionable insights into application behavior. These methods help identify bottlenecks across different layers of the application stack.
Performance profiling in production requires careful consideration of overhead and data collection strategies. I prefer approaches that minimize impact on user experience while capturing enough detail to diagnose issues. The patterns I’ll share have proven effective across various Ruby applications, from monolithic Rails apps to specialized services.
Let me start with execution profiling, which forms the foundation of performance analysis. When I need to understand how specific code blocks perform, I use a custom profiler that captures both timing and memory metrics. This approach helps identify sections with significant performance impact.
class ExecutionProfiler
  def initialize
    @measurements = {}
  end

  def measure(label, &block)
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    start_memory = memory_usage
    result = yield
    end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end_memory = memory_usage

    # Accumulate across repeated calls with the same label
    entry = (@measurements[label] ||= { duration: 0.0, memory_delta: 0, calls: 0 })
    entry[:duration] += end_time - start_time
    entry[:memory_delta] += end_memory - start_memory
    entry[:calls] += 1

    result
  end

  def report
    @measurements.each do |label, data|
      puts "#{label}: #{data[:duration].round(3)}s, #{data[:memory_delta]} bytes, #{data[:calls]} calls"
    end
  end

  private

  # Resident set size in bytes (ps reports kilobytes)
  def memory_usage
    `ps -o rss= -p #{Process.pid}`.to_i * 1024
  end
end

profiler = ExecutionProfiler.new
profiler.measure("user_processing") do
  User.all.each { |u| u.process_profile }
end
profiler.report
This profiler uses high-resolution timing for accurate measurements. The memory tracking helps spot sections with heavy allocation patterns. I often wrap critical paths with this profiler to gather baseline performance data before optimization.
Database performance frequently becomes a bottleneck in Ruby applications. I’ve found that query analysis provides immediate insights into database interaction patterns. By leveraging Rails’ instrumentation system, we can capture detailed query information.
class QueryAnalyzer
  def initialize
    @queries = []
    setup_listeners
  end

  def setup_listeners
    ActiveSupport::Notifications.subscribe("sql.active_record") do |*args|
      event = ActiveSupport::Notifications::Event.new(*args)
      @queries << {
        sql: event.payload[:sql],
        duration: event.duration,
        name: event.payload[:name],
        binds: event.payload[:binds]
      }
    end
  end

  def analyze_queries(&block)
    @queries.clear
    result = yield
    generate_report
    result
  end

  def generate_report
    total_time = @queries.sum { |q| q[:duration] }
    puts "Total queries: #{@queries.size}"
    puts "Total query time: #{total_time.round(2)}ms"

    @queries.group_by { |q| q[:name] }.each do |name, group|
      count = group.size
      time = group.sum { |q| q[:duration] }
      puts "#{name}: #{count} calls, #{time.round(2)}ms"
    end
  end
end

analyzer = QueryAnalyzer.new
analyzer.analyze_queries do
  User.includes(:posts, :comments).where(active: true).each(&:process)
end
This analyzer groups queries by name to identify repetitive patterns. Duration tracking helps pinpoint slow database operations. I typically run this during peak traffic to understand real-world query performance.
Memory issues can be subtle and cumulative. I’ve developed memory profiling techniques that capture object allocation patterns during code execution. This helps identify memory growth sources and potential leaks.
class MemoryProfiler
  def self.snapshot(label = "")
    {
      label: label,
      timestamp: Time.now,
      memory: current_memory,
      object_counts: count_objects
    }
  end

  def self.diff(snapshot1, snapshot2)
    {
      memory_delta: snapshot2[:memory] - snapshot1[:memory],
      object_deltas: snapshot2[:object_counts].map do |type, count|
        [type, count - (snapshot1[:object_counts][type] || 0)]
      end.to_h
    }
  end

  def self.trace_execution(&block)
    start_snapshot = snapshot("start")
    result = yield
    end_snapshot = snapshot("end")
    log_results(diff(start_snapshot, end_snapshot))
    result
  end

  def self.log_results(diff_result)
    puts "Memory delta: #{diff_result[:memory_delta]} KB"
    diff_result[:object_deltas].each do |type, delta|
      puts format("  %s: %+d", type, delta) unless delta.zero?
    end
  end

  # Resident set size in kilobytes
  def self.current_memory
    `ps -o rss= -p #{Process.pid}`.to_i
  end

  # Per-type counts (keys such as :T_STRING, :T_ARRAY) from ObjectSpace
  def self.count_objects
    ObjectSpace.count_objects.select { |key, _| key.to_s.start_with?("T_") }
  end

  private_class_method :current_memory, :count_objects
end

MemoryProfiler.trace_execution do
  1000.times { User.new(name: "test", email: "[email protected]") }
end
ObjectSpace integration provides detailed counts of different Ruby object types. Snapshot comparison highlights memory growth between code sections. I use this when investigating memory bloat in long-running processes.
Understanding request-level performance requires profiling entire request cycles. I implement this as Rack middleware to capture metrics across all application endpoints.
class RequestProfiler
  # A shared Redis connection lets multiple application instances aggregate
  # into one store; inject whichever client your app already configures.
  def initialize(app, redis: Redis.new)
    @app = app
    @redis = redis
  end

  def call(env)
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    start_memory = memory_usage

    status, headers, response = @app.call(env)

    end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end_memory = memory_usage

    record_metrics(
      env['PATH_INFO'],
      end_time - start_time,
      end_memory - start_memory,
      status
    )

    [status, headers, response]
  end

  def record_metrics(path, duration, memory_delta, status)
    key = "request:#{path}:#{status}"

    @redis.pipelined do |pipeline|
      pipeline.hincrby(key, "count", 1)
      pipeline.hincrby(key, "total_duration", (duration * 1000).to_i)
      pipeline.hincrby(key, "total_memory", memory_delta)
      pipeline.expire(key, 3600)
    end
  end

  # Resident set size in kilobytes
  def memory_usage
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end
This middleware aggregates metrics across multiple application instances. Path-based grouping enables endpoint-specific performance analysis. I’ve found this invaluable for identifying slow endpoints under production load.
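Reading those counters back is straightforward. Below is a minimal reporting sketch that scans the keys written by the middleware and computes per-endpoint averages; the key pattern mirrors record_metrics above and should be adjusted if you change it.

# Minimal reader for the request metrics written by RequestProfiler.
# Assumes the "request:PATH:STATUS" key layout used in record_metrics.
require "redis"

redis = Redis.new

redis.scan_each(match: "request:*") do |key|
  data = redis.hgetall(key)
  count = data["count"].to_i
  next if count.zero?

  avg_ms = data["total_duration"].to_f / count
  puts "#{key}: #{count} requests, avg #{avg_ms.round(1)}ms"
end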
Garbage collection significantly impacts Ruby application performance. I profile GC behavior to understand its overhead and optimize memory usage patterns.
class GCProfiler
  def self.profile(&block)
    GC.start
    initial_stats = GC.stat
    result = yield
    final_stats = GC.stat
    analyze_gc_impact(initial_stats, final_stats)
    result
  end

  def self.analyze_gc_impact(initial, final)
    puts "GC Statistics:"
    puts "  Minor GCs: #{final[:minor_gc_count] - initial[:minor_gc_count]}"
    puts "  Major GCs: #{final[:major_gc_count] - initial[:major_gc_count]}"
    puts "  Total allocated: #{final[:total_allocated_objects] - initial[:total_allocated_objects]} objects"
    puts "  Heap slots: #{final[:heap_live_slots]} live, #{final[:heap_free_slots]} free"
  end

  def self.monitor_gc_pressure(threshold: 0.1)
    Thread.new do
      loop do
        stats = GC.stat
        # GC.stat[:time] (total GC time in milliseconds) requires Ruby 3.1+
        gc_time = (stats[:time] || 0) / 1000.0
        average_pause = stats[:count] > 0 ? gc_time / stats[:count] : 0

        if average_pause > threshold
          warn "High GC pressure: #{average_pause.round(3)}s per collection"
        end

        sleep 60
      end
    end
  end
end

GCProfiler.profile do
  large_array = Array.new(100000) { |i| "string_#{i}" }
  large_array.map(&:upcase)
end
Time-based analysis identifies excessive GC overhead. Live object tracking helps optimize memory usage. I run GC profiling during performance testing to establish baseline behavior.
Method-level profiling provides granular insights into performance hotspots. I use TracePoint to track exact method execution duration within specific classes.
class MethodLevelProfiler
  def initialize(target_class)
    @target_class = target_class
    @method_timings = {}
    setup_tracepoint
  end

  def setup_tracepoint
    @tracepoint = TracePoint.new(:call, :return) do |tp|
      next unless tp.defined_class == @target_class

      # Keep one accumulating entry per method so totals survive repeat calls
      timing = (@method_timings[tp.method_id] ||= { call_count: 0, total_duration: 0.0 })

      if tp.event == :call
        timing[:call_count] += 1
        timing[:start_time] = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      elsif timing[:start_time]
        duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - timing[:start_time]
        timing[:total_duration] += duration
        timing[:start_time] = nil
      end
    end
  end

  def start
    @tracepoint.enable
  end

  def stop
    @tracepoint.disable
    generate_report
  end

  def generate_report
    puts "Method performance report for #{@target_class}:"
    @method_timings.each do |method, data|
      next if data[:call_count].zero?

      avg_time = data[:total_duration] / data[:call_count]
      puts "  #{method}: #{data[:call_count]} calls, avg #{avg_time.round(6)}s"
    end
  end
end

profiler = MethodLevelProfiler.new(UserService)
profiler.start
UserService.new.process_users
profiler.stop
Call and return events track exact method execution duration. This approach identifies performance hotspots within specific classes. I use it when optimizing critical service objects.
External service interactions often introduce performance variability. I profile these calls to understand their impact and identify unreliable dependencies.
class ExternalServiceProfiler
  def initialize
    @http_metrics = {}
    setup_http_listeners
  end

  def setup_http_listeners
    # Assumes the HTTP client (or custom instrumentation around it) publishes
    # "http.request" events through ActiveSupport::Notifications; this is not
    # a built-in Rails event name.
    ActiveSupport::Notifications.subscribe("http.request") do |*args|
      event = ActiveSupport::Notifications::Event.new(*args)
      service = extract_service_name(event.payload[:uri])

      @http_metrics[service] ||= { count: 0, total_duration: 0.0, errors: 0 }
      metrics = @http_metrics[service]
      metrics[:count] += 1
      metrics[:total_duration] += event.duration
      metrics[:errors] += 1 if event.payload[:error]
    end
  end

  def profile_external_calls(&block)
    @http_metrics.clear
    result = yield
    report_external_metrics
    result
  end

  def extract_service_name(uri)
    URI.parse(uri).host
  end

  def report_external_metrics
    puts "External service performance:"
    @http_metrics.each do |service, metrics|
      avg_time = metrics[:total_duration] / metrics[:count]
      error_rate = (metrics[:errors].to_f / metrics[:count] * 100).round(2)
      puts "  #{service}: #{metrics[:count]} calls, avg #{avg_time.round(2)}ms, #{error_rate}% errors"
    end
  end
end

profiler = ExternalServiceProfiler.new
profiler.profile_external_calls do
  PaymentGateway.charge(amount: 100)
  NotificationService.send(message: "test")
end
Error rate tracking identifies unreliable external dependencies. Host-based grouping organizes metrics by service provider. I monitor external calls during integration testing and production.
Implementing these profiling patterns requires consideration of overhead and data storage. I typically run profilers in staging environments first to validate their impact. Production deployment should include sampling to minimize performance degradation.
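A simple way to add sampling is a thin wrapper that measures only a fraction of calls and runs the rest unobserved. The sketch below assumes the ExecutionProfiler from earlier; the 1% rate is a placeholder to tune for your traffic.

# Sketch of sampled profiling: only a fraction of calls pay the measurement
# cost. SAMPLE_RATE and the reuse of ExecutionProfiler are assumptions.
class SampledProfiler
  SAMPLE_RATE = 0.01 # profile roughly 1% of calls

  def initialize(profiler)
    @profiler = profiler
  end

  def measure(label, &block)
    if rand < SAMPLE_RATE
      @profiler.measure(label, &block)
    else
      yield
    end
  end
end

sampled = SampledProfiler.new(ExecutionProfiler.new)
sampled.measure("user_processing") { User.all.each(&:process_profile) }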
Data aggregation strategies vary by use case. For request-level profiling, I prefer centralized storage like Redis. Method-level profiling might use local aggregation with periodic reporting. The key is matching the storage solution to the profiling scope and frequency.
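For local aggregation with periodic reporting, a small thread-safe accumulator that flushes on an interval is usually enough. The sketch below is illustrative only; the 60-second interval and the plain counters are assumptions to adapt to your metrics pipeline.

# Sketch of local aggregation with periodic flushing.
class LocalAggregator
  def initialize(flush_interval: 60)
    @mutex = Mutex.new
    @counters = Hash.new(0)
    @flusher = Thread.new do
      loop do
        sleep flush_interval
        flush
      end
    end
  end

  def record(metric, value)
    @mutex.synchronize { @counters[metric] += value }
  end

  def flush
    snapshot = @mutex.synchronize do
      current = @counters.dup
      @counters.clear
      current
    end
    snapshot.each { |metric, value| puts "#{metric}: #{value}" }
  end
end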
I’ve found that combining multiple profiling approaches provides the most comprehensive view. Execution profiling identifies broad patterns, while method-level drilling pinpoints specific issues. Database and external service profiling complete the picture by covering integration points.
Regular profiling helps establish performance baselines and detect regressions. I incorporate profiling into continuous integration pipelines for early detection of performance issues. Production profiling with appropriate sampling provides real-world insights.
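In CI, even a coarse budget check catches large regressions early. The sketch below times a known-hot code path and fails the build when it exceeds a threshold; ReportBuilder and the 2-second budget are hypothetical placeholders.

# Sketch of a CI performance check: fail the build if a hot path regresses.
require "benchmark"

elapsed = Benchmark.realtime do
  ReportBuilder.new.generate_monthly_summary # hypothetical hot path
end

if elapsed > 2.0
  abort "Performance regression: monthly summary took #{elapsed.round(2)}s (budget 2.0s)"
end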
These techniques have helped me optimize numerous Ruby applications. The patterns scale from small services to large monolithic applications. Adaptation to specific use cases might involve adjusting measurement granularity or storage backends.
Performance optimization should always be data-driven. Profiling provides the empirical evidence needed to make informed decisions. I prioritize fixes based on impact and effort, focusing on areas with significant performance gains.
Memory profiling deserves special attention in Ruby applications. The language’s garbage collected nature means memory usage patterns directly impact performance. Understanding object allocation helps write more efficient code.
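Ruby's objspace standard extension can attribute each allocation to the source line that created it, which is a quick way to locate allocation hot spots. A minimal sketch:

# Minimal allocation-tracing sketch using the objspace standard extension.
require "objspace"

ObjectSpace.trace_object_allocations do
  list = Array.new(3) { "user-#{rand(100)}" }
  list.each do |str|
    puts "#{str.class} allocated at " \
         "#{ObjectSpace.allocation_sourcefile(str)}:#{ObjectSpace.allocation_sourceline(str)}"
  end
end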
Database optimization often provides the biggest performance improvements. Query profiling identifies N+1 queries, missing indexes, and inefficient data access patterns. Combined with execution profiling, it guides database schema and query optimization.
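For example, the query analyzer above makes an N+1 pattern obvious: many identical queries grouped under one name. The usual fix is eager loading; the models here are illustrative.

# Illustrative N+1 pattern and its eager-loading fix (models are assumed).
# N+1: one query for users, then one posts query per user.
User.where(active: true).each do |user|
  user.posts.map(&:title)
end

# Eager loading: two queries total regardless of user count.
User.where(active: true).includes(:posts).each do |user|
  user.posts.map(&:title)
end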
External service profiling highlights integration bottlenecks. It helps design appropriate timeouts, retry strategies, and fallback mechanisms. Monitoring error rates informs reliability improvements.
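A sketch of what that can look like with Net::HTTP: explicit timeouts plus bounded retries with backoff. The URL, timeout values, and retry count are illustrative, not recommendations.

# Sketch of timeout and bounded-retry handling for an external call.
require "net/http"

def fetch_with_retries(uri, attempts: 3)
  attempts.times do |attempt|
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = uri.scheme == "https"
    http.open_timeout = 2   # seconds to establish the connection
    http.read_timeout = 5   # seconds to wait for the response
    return http.get(uri.request_uri)
  rescue Net::OpenTimeout, Net::ReadTimeout
    raise if attempt == attempts - 1
    sleep(2**attempt * 0.1) # simple exponential backoff
  end
end

response = fetch_with_retries(URI("https://payments.example.com/status"))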
Garbage collection profiling reveals memory pressure issues. High GC frequency or duration indicates excessive object allocation. Optimization might involve object reuse or algorithm changes.
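One common change is reusing a buffer and frozen constants in hot loops instead of allocating new objects on every iteration. A small sketch; the join_names helper is hypothetical and assumes objects that respond to name.

# Sketch of reducing allocations in a hot loop: one reusable buffer and a
# frozen separator instead of fresh strings per iteration.
SEPARATOR = ", ".freeze

def join_names(users)
  buffer = +"" # single mutable string reused for the whole result
  users.each_with_index do |user, index|
    buffer << SEPARATOR unless index.zero?
    buffer << user.name
  end
  buffer
end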
Method-level profiling targets specific performance hotspots. It’s particularly useful for optimizing frequently called methods in core business logic. The detailed timing data guides refactoring efforts.
Request-level profiling provides user-centric performance metrics. It helps ensure consistent response times across application endpoints. Aggregation over time identifies performance trends and seasonal patterns.
In practice, I often start with execution profiling to identify broad issues. Then I drill down with more specific profilers based on initial findings. This layered approach balances comprehensive coverage with focused investigation.
Profiling overhead must be managed carefully in production. I use sampling rates and lightweight measurements to minimize impact. The benefits of performance insights typically outweigh the modest overhead.
These profiling patterns create a foundation for continuous performance improvement. They help transform subjective perceptions of slowness into objective data for optimization. Regular profiling becomes part of the development culture.
I encourage teams to incorporate profiling into their workflow. The initial setup time pays dividends through faster issue resolution and proactive performance management. The patterns adapt to various Ruby application architectures and scales.
Performance work is never truly complete. Applications evolve, usage patterns change, and new bottlenecks emerge. Consistent profiling ensures performance remains a priority throughout the application lifecycle.
The tools and techniques I’ve shared represent years of refinement across different projects. They’ve helped me deliver faster, more reliable Ruby applications. I hope they provide similar value in your performance optimization efforts.