7 Ruby Gems That Give You Full Production Observability in Rails Apps
I remember the first time I deployed a Rails app to production and felt that knot in my stomach. Everything worked on my laptop. On the server, requests crawled, memory climbed, and background jobs piled up like unread emails. I had no idea what was happening. Logs were a wall of text. Error messages appeared hours late. That’s when I learned the difference between monitoring and observability.
Monitoring tells you something is wrong. Observability tells you exactly what is wrong, where, and why. In a running Rails application, you need both. Over the years I’ve settled on seven gems that form a complete observability stack. They don’t require a PhD in DevOps. They just work. Let me walk you through each one, with code you can copy and personal lessons from the trenches.
ScoutAPM – The first thing you add
When my app started feeling slow, I guessed. I looked at the database, added indexes, cached queries. Still slow. Then a friend told me to install ScoutAPM. It was like turning on the lights in a dark room.
# Gemfile
gem 'scout_apm'
That’s it. After bundle install and a key from scoutapm.com, Scout starts instrumenting every request. It shows you which controller actions are slow, which SQL queries eat up time, and how long views take to render. The transaction traces are beautiful—they break down each step in order.
One day I saw a trace where a page took 12 seconds. The culprit was a query running inside a loop, once for every record on the page. Scout showed me the exact line number. I fixed it by eager loading the association with includes. The page dropped to 200 milliseconds.
Scout also catches N+1 queries automatically. It highlights the offending code and suggests fixes. For a junior developer on my team, that was a crash course in database performance.
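To make the N+1 pattern concrete, here is a minimal sketch in plain Ruby with no database involved. FakeDb and both helper methods are hypothetical names I made up for illustration; they only count how many queries a piece of code would issue. The batched version corresponds roughly to what includes does for you in ActiveRecord.

```ruby
# Hypothetical in-memory "database" that counts queries (not ActiveRecord).
class FakeDb
  attr_reader :query_count

  def initialize(posts, authors)
    @posts = posts       # array of { id:, author_id: }
    @authors = authors   # hash of author_id => name
    @query_count = 0
  end

  def all_posts
    @query_count += 1
    @posts
  end

  def find_author(id)
    @query_count += 1
    @authors.fetch(id)
  end

  def find_authors(ids)
    @query_count += 1
    @authors.slice(*ids)
  end
end

# N+1: one query for the posts, then one more query per post.
def author_names_n_plus_one(db)
  db.all_posts.map { |post| db.find_author(post[:author_id]) }
end

# Batched: one query for the posts, one query for all their authors.
def author_names_batched(db)
  posts = db.all_posts
  authors = db.find_authors(posts.map { |post| post[:author_id] }.uniq)
  posts.map { |post| authors.fetch(post[:author_id]) }
end
```

With three posts the first version issues four queries and the second issues two. The gap grows with every extra row, which is why a trace full of near-identical SQL statements is the thing to look for.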
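You can drill into any slow transaction from the dashboard. It shows the SQL, the request parameters, even the backtrace. Beyond the key and an app name, there is almost nothing to configure, and both can live in environment variables (SCOUT_KEY, SCOUT_NAME, SCOUT_MONITOR) instead of a YAML file.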
AppSignal – Error tracking with performance hints
After Scout told me where the slow parts lived, I needed to understand why errors happened. AppSignal does both: error tracking and performance monitoring.
gem 'appsignal'
Add your API key to config/appsignal.yml and you get a unified view. Every exception is grouped by type, with occurrence counts. But the useful part is context: AppSignal captures the request body, session data, user agent, and even the current Sidekiq job metadata.
I once had a bug that only appeared for users in Australia. AppSignal showed me the request headers included a strange timezone offset. A daylight saving edge case. Without that context, I would have spent days reproducing it.
AppSignal also sends alerts when error rates spike. You can set thresholds per endpoint. When your payment gateway starts failing, you get a Slack message before your customers complain.
The performance traces work like Scout’s, but AppSignal adds host-level metrics: CPU, memory, disk I/O. You can correlate a slow endpoint with a memory spike. That helped me identify a Ruby garbage collection issue that was slowing down a specific controller.
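When a trace points at garbage collection, I like to confirm it by hand. Below is a minimal sketch using Ruby's built-in GC.stat on MRI. The gc_delta helper is a hypothetical name of mine, and this is not how AppSignal gathers its metrics; it only shows the kind of numbers worth comparing.

```ruby
# Hypothetical helper: reports how many objects a block allocated and how
# many GC runs happened while it executed, using Ruby's built-in GC.stat.
def gc_delta
  allocated_before = GC.stat(:total_allocated_objects)
  runs_before = GC.stat(:count)
  result = yield
  {
    result: result,
    allocated_objects: GC.stat(:total_allocated_objects) - allocated_before,
    gc_runs: GC.stat(:count) - runs_before
  }
end

# Usage: wrap the suspicious code path.
stats = gc_delta { Array.new(10_000) { |i| "row-#{i}" }.size }
# stats[:allocated_objects] is at least 10_000 here: one string per element.
```

A controller action that allocates far more objects than its neighbours is a good candidate for the kind of GC slowdown described above.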
Lograge – Stop drowning in log noise
Default Rails logs are verbose. Each request prints every SQL query, every partial render, every filter. In development that’s fine. In production, you’re paying for log storage and you’re searching for a needle in a haystack.
I spent hours grepping through gigabytes of logs. Then I added Lograge.
# Gemfile
gem 'lograge'
# config/initializers/lograge.rb
Rails.application.configure do
  config.lograge.enabled = true
  config.lograge.formatter = Lograge::Formatters::Json.new
  config.lograge.custom_options = lambda do |event|
    { params: event.payload[:params].except('controller', 'action') }
  end
end
After this, every HTTP request becomes one JSON line. It includes method, path, status, duration, and any custom data you add. No more multi-line logs.
I pipe these logs to Elasticsearch via Filebeat or Fluentd. Then I can search for all requests that took longer than 5 seconds, or all 500 errors from a specific IP. Lograge makes log aggregation tools useful.
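Even without Elasticsearch, one JSON object per line is easy to query. Here is a minimal sketch of the two searches mentioned above, assuming each log line carries the status and duration fields (duration in milliseconds) that Lograge writes. The method names are my own, not part of Lograge.

```ruby
require 'json'

# Parses JSON log lines and keeps the entries for which the block is true.
# Lines that are not valid JSON (boot messages, stack traces) are skipped.
def select_requests(lines)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    entry if entry.is_a?(Hash) && yield(entry)
  rescue JSON::ParserError
    nil
  end
end

# Requests slower than the threshold, in milliseconds.
def slow_requests(lines, threshold_ms: 5000)
  select_requests(lines) { |entry| entry['duration'].to_f > threshold_ms }
end

# Requests that ended in a 5xx status.
def server_errors(lines)
  select_requests(lines) { |entry| entry['status'].to_i >= 500 }
end
```

Something like slow_requests(File.foreach('log/production.log')) is enough for a quick look on a single server; the log path here is just the Rails default.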
One tip: Lograge already logs controller and action as their own fields, so you can group by endpoint without extra work. I also add the current user ID for debugging support tickets.
Yabeda – Your own custom metrics
I needed to track business metrics that standard tools don’t cover. How many signups per minute? How many orders failed because of stock? How long does it take from checkout to payment confirmation? Those questions are not answered by request duration.
Yabeda gives me a DSL to define and emit custom metrics.
# Gemfile
gem 'yabeda'
gem 'yabeda-prometheus'
# config/initializers/yabeda.rb
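Yabeda.configure do
  # Tags are declared up front so the Prometheus adapter can register them as labels
  counter :signups_total, comment: "Total number of user signups", tags: %i[source]
  gauge :cache_miss_latency, comment: "Time in ms for cache misses"
  histogram :order_processing_time do
    comment "Time to process an order"
    # measure with a block records elapsed seconds, so unit and buckets are in seconds
    unit :seconds
    buckets [0.1, 0.5, 1, 5, 10, 30]
  end
end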
# In your controller
Yabeda.signups_total.increment({ source: params[:source] })
# In a background job
# Tags come first; pass an empty hash when the metric has none
Yabeda.order_processing_time.measure({}) do
  ProcessOrder.new.call
end
I expose these metrics via a Prometheus endpoint. Grafana dashboards show real-time signups and order processing times. When we ran a marketing campaign, the signup counter spiked and we saw a correlated increase in cache misses. Yabeda made that connection obvious.
The gem handles threading and periodic collection. You don’t worry about race conditions.
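To show what "handling threading" means for a counter, here is a minimal sketch in plain Ruby. TaggedCounter is a hypothetical class of mine, not Yabeda's implementation: it guards each increment with a Mutex and renders the result in the Prometheus text exposition format. Label escaping is left out to keep it short.

```ruby
# Hypothetical stand-in for a tagged counter; not Yabeda's implementation.
class TaggedCounter
  def initialize(name, comment:)
    @name = name
    @comment = comment
    @values = Hash.new(0)
    @lock = Mutex.new
  end

  # The mutex makes the read-modify-write step atomic across threads.
  def increment(tags = {}, by: 1)
    @lock.synchronize { @values[tags] += by }
  end

  def value(tags = {})
    @lock.synchronize { @values[tags] }
  end

  # Renders one sample line per tag combination in Prometheus text format.
  # Assumes label values contain no quotes or backslashes.
  def to_prometheus
    lines = ["# HELP #{@name} #{@comment}", "# TYPE #{@name} counter"]
    @lock.synchronize do
      @values.each do |tags, count|
        labels = tags.map { |key, val| %(#{key}="#{val}") }.join(',')
        lines << (labels.empty? ? "#{@name} #{count}" : "#{@name}{#{labels}} #{count}")
      end
    end
    lines.join("\n") + "\n"
  end
end
```

Eight threads incrementing the same tag a thousand times each should always end at 8000. Getting that guarantee for free is the part I am happy to leave to the gem.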
Rollbar – Error grouping that respects reality
I’ve used many error trackers. Rollbar impressed me because it groups errors intelligently. When the same exception occurs a thousand times, it doesn’t flood my inbox. It shows one occurrence and a count.
gem 'rollbar'
You configure it with an access token and optionally a Sidekiq integration. Rollbar catches exceptions automatically and enriches them with request data.
The killer feature is deploy tracking. After each deploy, Rollbar shows whether error rates changed. When I deploy a new feature and see a spike, I can roll back confidently.
Custom fingerprinting lets me override how errors are grouped. Suppose you have an ActiveRecord::RecordNotFound whose message varies by product ID. Rollbar may then create a separate group for each product. I override the fingerprint to group all “not found” errors together.
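# Assumes rollbar-gem's transform hook: each handler receives an options hash
# holding the :exception and the outgoing :payload.
group_not_found = proc do |options|
  if options[:exception].is_a?(ActiveRecord::RecordNotFound)
    # Items that share a fingerprint land in the same group
    options[:payload]['data'][:fingerprint] = 'record_not_found'
  end
end

Rollbar.configure do |config|
  config.transform << group_not_found
end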
That reduces noise and makes the dashboard actionable.
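The idea behind a fingerprint is easy to sketch in plain Ruby. The rule below (exception class plus the message with numbers masked) is a hypothetical one I picked for illustration, not Rollbar's actual grouping algorithm.

```ruby
# Hypothetical fingerprint: exception class plus the message with every run
# of digits replaced, so messages that differ only by an ID collide.
def fingerprint_for(error_class, message)
  "#{error_class}:#{message.gsub(/\d+/, 'N')}"
end

# Counts occurrences per fingerprint.
# Each occurrence is a hash with :class and :message keys.
def group_occurrences(occurrences)
  occurrences
    .group_by { |occurrence| fingerprint_for(occurrence[:class], occurrence[:message]) }
    .transform_values(&:size)
end
```

Two errors for "Product 42 not found" and "Product 7 not found" end up under one key with a count of two, which is the behaviour I want from the dashboard.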
RedisCloud – Watch your queues and caches
I used to think Redis was a magic box. It was fast, so I never looked inside. Then one day a Sidekiq queue grew to 50,000 jobs and my app slowed to a crawl. I needed visibility into Redis itself.
I use the redis gem with the connection_pool gem to pull usage metrics out of Redis, in my case an instance hosted on Redis Cloud.
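# Shared pool so monitoring calls reuse connections (needs the connection_pool gem)
REDIS_POOL = ConnectionPool.new(size: 10, timeout: 5) do
  Redis.new(url: ENV['REDISCLOUD_URL'])
end

# In your monitoring endpoint
def redis_info
  # Borrow a pooled connection instead of opening a new one on every call
  REDIS_POOL.with { |redis| redis.info }
rescue StandardError => e
  { error: e.message }
end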
I track cache hit rates by incrementing counters on cache reads and cache misses. If the hit rate drops below 90%, I know I need to rethink my caching strategy.
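Redis keeps its own hit and miss counters too, which makes a useful cross-check. Here is a minimal sketch that computes the hit rate from the hash returned by INFO, using its keyspace_hits and keyspace_misses fields. The method names are mine, and note that these counters cover every key lookup on the instance, not only the Rails cache.

```ruby
# Computes the hit rate from a Redis INFO hash.
# INFO values arrive as strings, so they are converted before dividing.
# Returns nil when there have been no lookups yet.
def cache_hit_rate(info)
  hits = info.fetch('keyspace_hits', 0).to_i
  misses = info.fetch('keyspace_misses', 0).to_i
  total = hits + misses
  return nil if total.zero?

  hits.to_f / total
end

# True when the hit rate meets the minimum, or when there is no data yet.
def cache_healthy?(info, minimum: 0.90)
  rate = cache_hit_rate(info)
  rate.nil? || rate >= minimum
end
```

Calling cache_healthy?(redis.info) from a scheduled job, where redis is any connected client, is one way to turn the 90% rule into an alert.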
I also watch queue lengths for Sidekiq. When a queue grows past a threshold, I send an alert. Redis reports command statistics as well: the commandstats section of INFO gives call counts and average microseconds per call for each command. A sudden increase in latency often means the Redis instance is under memory pressure or the network is saturated.
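The queue alert itself is only a threshold check. Here is a minimal sketch, assuming you already have a hash of queue names and lengths. In a Sidekiq app that hash could come from Sidekiq::Queue.all.to_h { |q| [q.name, q.size] }; the method name and the limits are my own choices.

```ruby
# Returns an alert message for every queue whose length exceeds its limit.
# sizes:         queue name => current length
# limits:        queue name => maximum acceptable length
# default_limit: applies to queues not listed in limits
def queue_alerts(sizes, limits: {}, default_limit: 1_000)
  sizes.filter_map do |queue, size|
    limit = limits.fetch(queue, default_limit)
    "queue #{queue} has #{size} jobs (limit #{limit})" if size > limit
  end
end
```

An empty array means all queues are within bounds. Anything else goes straight to Slack.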
You can expose these metrics via a simple endpoint and scrape them with Prometheus, or push them to a time series database.
Pghero – See inside your database
PostgreSQL is a workhorse, but it hides its problems. Long-running queries, missing indexes, table bloat—these accumulate over time. Pghero opens the hood.
gem 'pghero'
Mount the dashboard at /pghero in your routes. You get a web UI showing slow queries, index recommendations, and space usage. It even shows which queries are blocked waiting on a lock.
I added Pghero to our production app. The first thing I saw: a query that scanned the entire orders table, taking 45 seconds. It was from a background job that ran every hour. I added an index on created_at. The query dropped to 50 milliseconds.
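Pghero's numbers can also go to Prometheus. It does not ship a metrics endpoint of its own as far as I can tell, but its Ruby API (PgHero.long_running_queries, PgHero.total_connections) returns plain Ruby values that I publish as Yabeda gauges and scrape into Grafana. Now I see database load in the same dashboard as request latency and error rates.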
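The best part: it reports on the database rather than changing your data. The dashboard can still kill running queries, though, so put it behind authentication before you expose it in production.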
Putting it all together
These gems don’t compete. They complement each other.
Scout and AppSignal give you end-to-end transaction tracing. They tell you “this request is slow because of this SQL query in this file”. Lograge structures the noise so you can search and alert efficiently. Yabeda lets you instrument business events that no generic tool can guess. Rollbar groups errors intelligently and ties them to deploys. RedisCloud and Pghero watch the infrastructure that your code depends on.
When you combine them, you get a system that tells you not only that something is wrong, but exactly what needs to change.
I once had a production incident where signups failed for ten minutes. Scout showed that the signup controller timed out on a database query. Lograge confirmed that all failing requests hit the same controller with the same params. Rollbar grouped the exceptions with a user_id. Pghero showed a missing index on users.email. Yabeda’s signup counter went to zero. RedisCloud showed a Sidekiq queue backing up because jobs couldn’t find the user.
Each gem provided one piece. Together, they painted the full picture. I added the index, restarted the queue, and everything returned to normal within five minutes.
You don’t need a PhD to set these up. Add the gems, configure the API keys, and let them run. The insights will come. Production runs itself—until it doesn’t. When it doesn’t, you want these tools waiting.
Start with one. I recommend ScoutAPM or AppSignal. Then add Lograge. The others you can add as you grow. But don’t wait until the knot in your stomach returns. Add them now, while everything is calm. You’ll thank yourself later.