Implementing Full-Text Search in Ruby on Rails
Search functionality separates functional applications from great ones. When users can’t find what they need quickly, they leave. I’ve implemented search across dozens of Rails applications, from small content sites to e-commerce platforms with millions of products. Here’s what actually works.
1. PostgreSQL Full-Text Search
PostgreSQL’s built-in search tools are my first choice for many applications. You avoid external dependencies while getting solid performance. Start with tsvector and tsquery:
# Migration
class AddSearchVectorToProducts < ActiveRecord::Migration[7.0]
  def change
    add_column :products, :search_vector, :tsvector
    add_index :products, :search_vector, using: :gin
  end
end
# Model
class Product < ApplicationRecord
  before_save :update_search_vector

  private

  # Build the tsvector from the in-memory attributes so new and
  # edited records are indexed with their current values.
  def update_search_vector
    self.search_vector = self.class.connection.select_value(
      self.class.sanitize_sql_array(
        ["SELECT to_tsvector('english', ? || ' ' || ?)", title.to_s, description.to_s]
      )
    )
  end
end
# Querying
Product.where("search_vector @@ plainto_tsquery('english', ?)", "organic coffee")
This approach handles stemming (searching for “run” matches “running”), ignores stop words, and supports ranking. For medium-sized datasets (under 1M records), response times stay under 50ms. I use this for content-heavy sites like documentation portals.
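Ranking is worth wiring in from the start. A sketch, assuming the search_vector column from the migration above (ts_rank is PostgreSQL’s built-in relevance function):

```ruby
# Order matches by relevance; ts_rank scores how well each
# search_vector matches the query.
query = "organic coffee"
Product
  .where("search_vector @@ plainto_tsquery('english', :q)", q: query)
  .order(Arel.sql(
    Product.sanitize_sql_array(
      ["ts_rank(search_vector, plainto_tsquery('english', ?)) DESC", query]
    )
  ))
```

Wrapping the raw ORDER BY in Arel.sql is required on modern Rails, which rejects unrecognized raw SQL in order clauses.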
2. pg_trgm for Fuzzy Search
Users misspell queries constantly. PostgreSQL’s pg_trgm extension saves searches that would otherwise fail:
# Enable extension
enable_extension :pg_trgm
# Migration
add_index :users, :username, using: :gin, opclass: :gin_trgm_ops
# Query
User.where("similarity(username, ?) > 0.3", "johndoe")
    .order(Arel.sql("similarity(username, 'johndoe') DESC"))
The trigram approach pads each word with spaces and breaks it into three-character chunks. A search for “john” can still surface “Jonathan” because they share the leading trigrams “  j” and “ jo”. I set similarity thresholds between 0.3 and 0.6 depending on strictness needs. It’s perfect for username/name searches where typos are common.
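To see which trigrams two strings actually share, here’s a small pure-Ruby sketch that approximates pg_trgm’s extraction (the real extension handles multi-word strings and non-alphanumerics differently):

```ruby
# Rough approximation of pg_trgm's trigram extraction: pad with two
# leading spaces and one trailing space, then slide a three-character
# window across the padded string.
def trigrams(word)
  padded = "  #{word.downcase} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.uniq
end

shared = trigrams("john") & trigrams("jonathan")
# shared => ["  j", " jo"]

# pg_trgm-style similarity: shared trigrams over the union of both sets.
similarity = shared.size.to_f / (trigrams("john") | trigrams("jonathan")).size
```

Note that this pair only scores about 0.17, so a 0.3 threshold would exclude it; loosen the threshold when you want prefix-style near-misses to rank.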
3. Searchkick + Elasticsearch
When you need industrial-strength search, Elasticsearch with Searchkick is my go-to. The setup is simpler than raw Elasticsearch:
class Article < ApplicationRecord
  searchkick settings: { number_of_shards: 3 },
             text_middle: [:title],
             suggest: [:title]

  def search_data
    {
      title: title,
      content: ActionView::Base.full_sanitizer.sanitize(content),
      tags: tags,
      status: status
    }
  end
end
# Indexing asynchronously
Article.reindex(async: true)
# Searching with typo tolerance
results = Article.search("programming langugae",
  fields: [:title, :content],
  match: :phrase,
  misspellings: { edit_distance: 2 },
  suggest: true)
I always enable async: true in production; blocking requests during reindexing causes timeouts. For a client’s e-commerce site, this handled 200 queries/second with 15ms average response time at peak. The cost? Managing an Elasticsearch cluster. Use it when you need:
- Phonetic matching (“smith” matches “smyth”)
- Synonym expansion (“TV” = “television”)
- Custom analyzers for non-English languages
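Synonym expansion, for instance, is a one-line setting in Searchkick (search_synonyms is the documented option; reindex after changing it):

```ruby
class Product < ApplicationRecord
  # Each group is treated as interchangeable at search time.
  searchkick search_synonyms: [
    ["tv", "television"],
    ["sneakers", "trainers"]
  ]
end
```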
4. Ransack for Simple Filtering
Don’t overcomplicate simple search needs. Ransack provides search forms with minimal configuration (recent Ransack versions do require you to allowlist searchable attributes via ransackable_attributes):
# Controller
def index
  @q = Product.ransack(params[:q])
  @products = @q.result(distinct: true).includes(:category)
end

# View
<%= search_form_for @q do |f| %>
  <%= f.search_field :name_cont %>
  <%= f.submit "Search" %>
<% end %>

# Supports associations
@q = Product.ransack(category_name_eq: "Electronics")
The _cont predicate does partial (substring) matches. I add distinct: true to avoid duplicate records from joins. For admin dashboards and basic filtering, Ransack saves hours of development time. Avoid it for full-text content search - it lacks relevance scoring.
5. Sunspot + Solr for Enterprise Search
When you need faceted search and complex relevancy tuning, Solr delivers:
class Book < ApplicationRecord
  searchable do
    text :title, boost: 2.0
    text :author
    string :category, multiple: true
    time :published_at
  end
end

# Searching with facets
Sunspot.search(Book) do
  fulltext "ruby programming"
  facet :category
  with(:published_at).greater_than(1.year.ago)
  paginate page: params[:page], per_page: 30
end
Boost parameters let you prioritize title matches over content. Facets enable drill-down navigation (“Show only technical books published this year”). On a legal document platform, we reduced average search time from 2 minutes to 3 seconds using Solr. The Java-based stack requires more ops overhead but handles billion-document indexes.
6. SQLite FTS5 for Local Apps
For local-first applications or small projects, SQLite’s full-text search is surprisingly capable:
# Migration (the FTS5 index lives in a separate virtual table,
# so the notes table itself only needs the content column)
create_table :notes do |t|
  t.text :content
end
# Create virtual table
execute "CREATE VIRTUAL TABLE notes_fts USING fts5(content, content='notes', content_rowid='id')"
# Trigger for indexing (add matching UPDATE/DELETE triggers to keep
# the index in sync)
execute <<-SQL
  CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
    INSERT INTO notes_fts(rowid, content) VALUES (new.id, new.content);
  END;
SQL
# Search
Note.where("id IN (SELECT rowid FROM notes_fts WHERE notes_fts MATCH ?)", "meeting notes")
I use this for desktop applications built with Rails like inventory managers. The entire search index lives in the app DB with zero dependencies. Avoid for high-write volumes - triggers add overhead.
7. Hybrid Approaches
In production systems, I often combine techniques:
# Use pg_search for basic text, Elasticsearch for advanced
def search
if advanced_search?(params)
ElasticsearchSearch.new(params).run
else
PgSearch.multisearch(params[:query])
end
end
# Sample PgSearch setup (multisearch requires multisearchable)
class Article < ApplicationRecord
  include PgSearch::Model
  multisearchable against: [:title, :content]

  pg_search_scope :search_by_content,
                  against: [:title, :content],
                  using: { tsearch: { dictionary: 'english' } }
end
This strategy reduces load on Elasticsearch for simple queries. I route 80% of traffic through PostgreSQL, only hitting Elasticsearch for complex queries or filters. Monitor your query patterns to set the right routing rules.
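The advanced_search? predicate is whatever your routing rule needs to be. A minimal sketch, assuming the rule is "filters, quoted phrases, or boolean operators go to Elasticsearch" (the specific checks here are illustrative, not a fixed recipe):

```ruby
# Route to Elasticsearch when the query uses filters, quoted phrases,
# or boolean operators; plain keyword searches stay on PostgreSQL.
def advanced_search?(params)
  q = params[:query].to_s
  !params[:filters].nil? || q.match?(/["()]|\bAND\b|\bOR\b/)
end
```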
Performance Essentials
Bulk indexing can block production traffic, so always batch the work and push it to background jobs:
# Batch indexing for large datasets
Product.find_in_batches do |batch|
ProductIndexer.perform_async(batch.map(&:id))
end
# Use dedicated queues
Sidekiq.configure_server do |config|
config.queues = %w[default indexing critical]
end
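The ProductIndexer above might look like this - a sketch assuming Sidekiq and Searchkick; swap the reindex call for whatever per-record indexing method your search engine provides:

```ruby
class ProductIndexer
  include Sidekiq::Job
  sidekiq_options queue: :indexing, retry: 3

  def perform(ids)
    # Searchkick's per-record reindex updates each document in place.
    Product.where(id: ids).find_each(&:reindex)
  end
end
```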
Set up monitoring:
# Track latency (assuming searches are instrumented under a
# custom "search.app" notification)
ActiveSupport::Notifications.subscribe("search.app") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  Metrics.timing("search.latency", event.duration)
end
# Log slow queries
ActiveSupport::Notifications.subscribe("search.solr") do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
if event.duration > 1000
Rails.logger.warn "Slow search: #{event.payload[:query]}"
end
end
For autocomplete, prefix matching outperforms full-text scans:
# PostgreSQL
Product.where("name ILIKE ?", "#{query}%")
# Elasticsearch (Searchkick's word_start prefix matching)
Article.search(query, fields: [:title], match: :word_start)
Security Practices
Search exposes injection risks:
# Bad - direct interpolation
Product.where("to_tsvector(description) @@ to_tsquery('#{params[:query]}')")
# Good - parameterization
Product.where("to_tsvector(description) @@ plainto_tsquery(:query)", query: params[:query])
# Escape special characters in Elasticsearch
def sanitize_query(query)
  query.gsub(/([+\-!(){}\[\]^"~*?:\\\/])/, '\\\\\1')
end
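To make the escaping concrete, here is the same sanitizer with a usage example (pure Ruby, runnable anywhere):

```ruby
# Prefix each Elasticsearch query-string special character with a
# backslash so user input is treated as literal text.
def sanitize_query(query)
  query.gsub(/([+\-!(){}\[\]^"~*?:\\\/])/, '\\\\\1')
end

sanitize_query('c++ (fast)')  # => "c\\+\\+ \\(fast\\)"
```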
Choosing Your Approach
- < 50k records: Stick with PostgreSQL
- 50k-5M records: Add Elasticsearch/Solr
- User-generated content: Always use typo tolerance
- Multi-language: Prioritize stemming support
Start simple. Add complexity only when metrics show search failures or slow queries. I’ve seen teams deploy Elasticsearch for 10k-record apps - the maintenance burden wasn’t worth the 200ms speed gain.
Good search feels magical when done right. Implement these patterns methodically, monitor performance, and your users will find exactly what they need - instantly.