Let’s talk about testing in a way that moves past the basics. If you’re like me, you started with unit tests for models and integration tests for controllers. They’re essential, like learning to walk. But as an application grows, these tests can leave gaps—places where things break in production that you never saw coming. I want to share some methods I use to fill those gaps, to test the parts of the system that are easy to miss.
Think about when your application talks to an external service, like a payment gateway or a weather API. Your unit tests might use stubs or mocks for that service. But how do you know what the real service will actually send back? And how do you ensure that when that service updates, it doesn’t break your app? This is where contract testing comes in.
The idea is simple: you and the service provider agree on a “contract”—the expected request and response format. My job is to verify that my code can still talk to their service using that contract. In practice, I write a test that doesn’t call the real service in my CI pipeline. Instead, it checks my code against a saved copy of the expected response structure, which I call the contract.
# A simple tool to check if a service response still matches our expectations.
require 'redis'
require 'json'

class ServiceContractTest
  def initialize(service_client, contract_version)
    @client = service_client
    @version = contract_version
    # I use Redis to store the contract snapshots, but a file works too.
    @contract_store = Redis.new
  end
  def verify_contract(endpoint, request_fixture)
    # This is the key part. In a real test, I'd use a mocked client.
    # But for contract generation, I might call a real dev endpoint once.
    response = @client.call(endpoint, request_fixture)
    contract_key = "contract:#{@version}:#{endpoint}"
    stored_contract = @contract_store.get(contract_key)

    if stored_contract
      # Verify my code works with the saved contract structure.
      expected = JSON.parse(stored_contract)
      verify_response_structure(response, expected)
    else
      # First run: store the contract as the new source of truth.
      @contract_store.set(contract_key, response.to_json)
      true
    end
  end
  def verify_response_structure(actual, expected)
    # This recursively checks that the 'shape' of the data matches.
    # It assumes string-keyed hashes on both sides, i.e. parsed JSON.
    case expected
    when Hash
      return false unless actual.is_a?(Hash)
      expected.keys.all? do |key|
        actual.key?(key) &&
          verify_response_structure(actual[key], expected[key])
      end
    when Array
      return false unless actual.is_a?(Array)
      # An empty expected array tells us nothing about element shape.
      expected.empty? ||
        actual.all? { |item| verify_response_structure(item, expected.first) }
    else
      # Just check the type is the same (String, Integer, etc.)
      actual.class == expected.class
    end
  end
end
# How I might use it in a test suite.
describe 'OrderService Consumer' do
  before do
    # I use version 'v2' of the contract with the Payment service.
    @pact = ServiceContractTest.new(OrderServiceClient.new, 'v2')
  end

  it 'still works with the contract for creating an order' do
    request = { user_id: 123, items: [{ id: 456, quantity: 2 }] }
    expect(@pact.verify_contract('/orders', request)).to be true
  end
end
This approach caught a problem for me once. An external API changed a field from a string to an integer. My mock in the unit test was still a string, so those tests passed. But the contract test failed because the stored contract had a string, and the new real response had an integer. It showed me I needed to update my code to handle both, preventing a production bug.
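Here's a toy illustration of that exact failure mode, using the verify_response_structure check from above (the field name and values are made up):

# The contract snapshot was captured back when the field was a string.
stored_contract = { 'order_count' => '42' }
# The provider now returns an integer for the same field.
new_response = { 'order_count' => 42 }

checker = ServiceContractTest.new(nil, 'v2') # no client needed for this check
checker.verify_response_structure(new_response, stored_contract)
# => false: Integer != String, so the contract test fails loudly.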
Now, let’s talk about a question that nagged at me: are my tests any good? I had high code coverage, but I wasn’t confident. I heard about mutation testing. The concept is brilliant and a bit funny. It deliberately introduces small bugs into my code and then runs my test suite. If the tests pass, it means they didn’t catch the bug—they’re not effective against that change.
I built a simple analyzer to understand the idea. It’s not a full mutation system, but it shows the mechanics.
# A basic look at how mutation testing works under the hood.
class MutationAnalyzer
  # These are simple mutators: they change + to -, == to !=, etc.
  MUTATORS = {
    arithmetic: { '+' => '-', '-' => '+', '*' => '/', '/' => '*' },
    logical: { '==' => '!=', '!=' => '==', '&&' => '||', '||' => '&&' },
    relational: { '>' => '<', '<' => '>', '>=' => '<=', '<=' => '>=' }
  }.freeze
  def analyze(file_path)
    original_code = File.read(file_path)
    # Generate versions of the code with small errors.
    mutations = generate_mutations(original_code)

    results = mutations.map do |mutated_code|
      test_result = run_tests_with_mutation(file_path, mutated_code)
      {
        mutation: mutated_code,
        killed: test_result[:failed] > 0, # Was the bug caught?
        test_output: test_result[:output]
      }
    end

    calculate_mutation_score(results) # What fraction of bugs were caught?
  end

  def generate_mutations(code)
    mutations = []
    MUTATORS.each_value do |replacements|
      replacements.each do |original, replacement|
        # Swap one occurrence at a time so each mutant contains exactly
        # one bug. (A production tool like mutant works on the parsed AST
        # rather than raw strings.)
        offset = 0
        while (index = code.index(original, offset))
          mutated = code.dup
          mutated[index, original.length] = replacement
          mutations << mutated
          offset = index + original.length
        end
      end
    end
    mutations.uniq.first(10) # I limit this because it can be slow.
  end

  def run_tests_with_mutation(file_path, mutated_code)
    # Swap the buggy code into place, run the suite, then restore the original.
    original = File.read(file_path)
    File.write(file_path, mutated_code)
    output = `bundle exec rspec --format json 2>&1`
    {
      output: output,
      # RSpec's JSON formatter reports a failure_count in its summary.
      failed: output[/"failure_count":(\d+)/, 1].to_i
    }
  ensure
    File.write(file_path, original)
  end

  def calculate_mutation_score(results)
    # The score: what fraction of mutants did the suite detect ("kill")?
    return 0.0 if results.empty?
    results.count { |r| r[:killed] }.to_f / results.size
  end
end
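Invoking it is a one-liner (the target file here is just an example):

score = MutationAnalyzer.new.analyze('app/models/order.rb')
puts "Mutation score: #{(score * 100).round}% of injected bugs were caught"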
Running this was humbling. It showed me lines of code where changing > to < didn’t cause a test failure. It meant my tests weren’t checking the boundary condition. It’s a powerful way to find weak spots in a test suite.
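Here's a contrived example of the kind of surviving mutant it reported (the adult? method and specs are hypothetical):

def adult?(age)
  age > 18 # the relational mutator flips this to: age < 18
end

# Weak: 18 satisfies neither `> 18` nor `< 18`, so this passes
# against both the original and the mutant. The mutant survives.
it 'rejects minors' do
  expect(adult?(18)).to be false
end

# Strong: 19 > 18 is true but 19 < 18 is false, so this kills the mutant.
it 'accepts someone just past the boundary' do
  expect(adult?(19)).to be true
end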
As test suites grow, they get slow. A slow test suite is a problem because people stop running it. Parallel execution is the obvious answer, but in Rails, the big hurdle is the database. Tests can’t step on each other’s data.
I solved this by giving each parallel worker its own database.
# Running tests in parallel without them interfering.
class ParallelTestRunner
  def initialize(worker_count: 4)
    @worker_count = worker_count
    @queues = Array.new(worker_count) { Queue.new }
  end
  def run_all(spec_files)
    # Split the test files evenly across workers.
    spec_files.each_with_index do |file, index|
      queue_index = index % @worker_count
      @queues[queue_index] << file
    end

    # Tell each worker when to stop.
    @queues.each { |q| q << :STOP }

    # Start the worker processes.
    workers = @queues.map.with_index do |queue, worker_id|
      Process.fork do
        run_worker(worker_id, queue)
      end
    end

    # Wait for all workers to finish.
    workers.each { |pid| Process.waitpid(pid) }
    combine_results
  end
  def run_worker(worker_id, queue)
    # This is the key: a unique database for this worker.
    db_name = "test_worker_#{worker_id}"
    setup_isolated_database(db_name)

    results = []
    while (file = queue.pop) != :STOP
      results << run_spec_file(file, db_name)
    end

    # A forked worker doesn't share memory with the parent, so it writes
    # its results to disk for combine_results to read back afterwards.
    File.write("tmp/worker_#{worker_id}_results.json", results.to_json)
  end
  def setup_isolated_database(db_name)
    # Note: IF NOT EXISTS is MySQL syntax; PostgreSQL needs a catalog check.
    ActiveRecord::Base.connection.execute(
      "CREATE DATABASE IF NOT EXISTS #{db_name}"
    )

    # Connect this process to its own database.
    config = ActiveRecord::Base.configurations['test'].dup
    config['database'] = db_name
    ActiveRecord::Base.establish_connection(config)

    # Load the schema into the new database.
    load Rails.root.join('db/schema.rb')
  end

  def combine_results
    # Workers are separate processes, so results come back via their files.
    Dir.glob('tmp/worker_*_results.json').flat_map do |path|
      JSON.parse(File.read(path))
    end
  end
end
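Kicking it off is straightforward; a usage sketch (the glob pattern is the usual RSpec layout):

runner = ParallelTestRunner.new(worker_count: 4)
results = runner.run_all(Dir.glob('spec/**/*_spec.rb'))
puts "Collected results from #{results.size} spec files"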
This cut a 20-minute test run down to 5 minutes for me. The setup is a bit more complex, but the time savings are worth it. The database isolation is complete: a test in worker 2 can't touch data a test in worker 4 depends on.
We test for success, but what about failure? In production, networks lag, third-party services go down, servers run out of memory. Chaos testing is about deliberately causing those failures in a test environment to see if the system handles them.
I created a simple “Chaos Monkey” for my Rails app. It randomly injects problems.
# A tool to deliberately break things in a controlled way.
class ServiceUnavailableError < StandardError; end

class ChaosMonkey
  OPERATIONS = {
    network_latency: ->(ms) { sleep(ms / 1000.0) },
    service_error: -> { raise ServiceUnavailableError },
    memory_pressure: -> { Array.new(100_000) { 'x' * 1024 } },
    cpu_stress: -> { 100.times { Math.sqrt(rand(1000)) } }
  }.freeze

  def initialize(failure_rate: 0.01, enabled: true)
    @failure_rate = failure_rate # e.g., 1% of calls will have issues
    @enabled = enabled && Rails.env.test?
    @injected_failures = Hash.new(0)
  end
  def inject_failure(operation_name, *args)
    # Only inject a failure sometimes, based on the rate.
    return unless @enabled && rand < @failure_rate

    operation = OPERATIONS[operation_name]
    return unless operation

    @injected_failures[operation_name] += 1
    Rails.logger.debug("ChaosMonkey: injecting #{operation_name}")
    # Let any injected error propagate; swallowing it here would mean the
    # code under test never has to handle the failure at all.
    operation.call(*args)
  end
  # A helper to wrap a service call with possible chaos.
  def wrap_service_call(service, method_name, *args, &block)
    inject_failure(:network_latency, rand(100..500)) # Add 100-500ms of delay
    inject_failure(:service_error)                   # Maybe raise an error
    block.call
  rescue ServiceUnavailableError
    # This is where my application's fallback logic kicks in.
    handle_service_degradation(service, method_name, args)
  end
end
# Using it in a test.
describe 'PaymentService with failures' do
  let(:chaos) { ChaosMonkey.new(failure_rate: 0.5) } # High rate for testing
  let(:order) { create(:order) }

  it 'handles network latency gracefully' do
    service = PaymentService.new
    result = chaos.wrap_service_call(service, :charge, order) do
      service.charge(order)
    end
    # My system should either succeed or schedule a retry.
    expect(result).to be_success.or be_retry_scheduled
  end
end
By doing this, I found that a background job would fail immediately if an email service was down. I added a retry mechanism with exponential backoff because of this test.
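The fix itself was a few lines. Here's a sketch of what it looked like, assuming a hypothetical MailDeliveryJob and using ActiveJob's retry_on:

class MailDeliveryJob < ApplicationJob
  # Instead of failing on the first outage, back off exponentially
  # and give the email service time to recover.
  retry_on ServiceUnavailableError, wait: :exponentially_longer, attempts: 5

  def perform(message_id)
    EmailService.deliver(message_id) # hypothetical mail client
  end
end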
Most tests use specific examples: “given this input, expect this output.” Property-based testing flips this. It says: “for all possible inputs (of a certain type), this property should hold true.” I use it to find edge cases.
# Defining properties about my code that should always be true.
class PropertyViolationError < StandardError; end

class PropertyTest
  GENERATORS = {
    integer: -> { rand(-1000..1000) },
    string: -> { SecureRandom.alphanumeric(rand(1..50)) },
    email: -> { "#{SecureRandom.alphanumeric(10)}@example.com" },
    date: -> { rand(1.year.ago..Time.current) },
    boolean: -> { [true, false].sample }
  }.freeze
  def for_all(*types, &property)
    # Check the property against 100 random sets of inputs.
    100.times do
      args = types.map { |type| GENERATORS[type].call }
      begin
        result = property.call(*args)
        unless result
          raise PropertyViolationError,
                "Property failed for args: #{args.inspect}"
        end
      rescue => e
        # A crash on random input is a failure too: record it, then re-raise
        # so the test actually fails instead of silently moving on.
        record_failure(types, args, e)
        raise
      end
    end
  end
  def record_failure(types, args, error)
    # Save the failing example to study later.
    FailureCase.create!(
      property_types: types,
      arguments: args,
      error_message: error.message,
      backtrace: error.backtrace.first(5)
    )
  end
end
# How I use property tests.
describe 'User validation' do
  let(:prop_test) { PropertyTest.new }

  it 'always accepts valid emails' do
    prop_test.for_all(:string, :email) do |name, email|
      user = User.new(name: name, email: email)
      user.valid?
      # The property: for any name and a well-formed email, there's no email error.
      user.errors[:email].empty?
    end
  end

  it 'never accepts negative ages' do
    prop_test.for_all(:string, :integer) do |name, age|
      user = User.new(name: name, age: age)
      if age.negative?
        # If age is negative, the user should be invalid.
        !user.valid? && user.errors[:age].any?
      else
        true # We're not testing non-negative ages here.
      end
    end
  end
end
This found a bug in my age validation that I had missed. My validation only checked age.present?, so the randomly generated negative ages sailed straight through and the property failed. While fixing that, I realized age zero was just as nonsensical: zero is not negative, so my property let it pass, but it's not a valid age in my app. I updated the validation to require age.present? && age > 0.
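In model terms, the fix looked roughly like this (a sketch; the validator mirrors the condition above):

class User < ApplicationRecord
  validate :age_must_be_positive

  private

  def age_must_be_positive
    # Presence alone isn't enough: zero and negative ages are invalid too.
    errors.add(:age, 'must be greater than zero') if age.present? && age <= 0
  end
end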
For applications with important user interfaces, a button moving 2 pixels can be a problem. Visual regression testing takes screenshots and compares them to a known good version.
# Catching visual changes automatically.
require 'chunky_png'
require 'fileutils'

class VisualRegressionTest
  def initialize(screenshot_dir: 'tmp/screenshots', threshold: 0.01)
    @screenshot_dir = screenshot_dir
    @threshold = threshold # 1% pixel difference allowed
    FileUtils.mkdir_p(@screenshot_dir)
  end
  def capture_page(element_selector = nil)
    page = Capybara.current_session

    if element_selector
      element = page.find(element_selector)
      screenshot_path = element_screenshot_path(element_selector)
      # Element-level screenshots go through the underlying driver;
      # this assumes one that supports them, such as Selenium 4.
      element.native.save_screenshot(screenshot_path)
    else
      screenshot_path = page_screenshot_path
      page.save_screenshot(screenshot_path)
    end

    screenshot_path
  end
  def compare_with_baseline(current_path, baseline_name)
    baseline_path = baseline_file_path(baseline_name)

    unless File.exist?(baseline_path)
      # First run: this becomes the standard to compare against.
      FileUtils.cp(current_path, baseline_path)
      return { match: true, similarity: 1.0 }
    end

    similarity = calculate_similarity(current_path, baseline_path)
    {
      match: similarity >= (1 - @threshold),
      similarity: similarity,
      diff_path: generate_diff_image(current_path, baseline_path) # Helpful for debugging
    }
  end
  def calculate_similarity(image_a_path, image_b_path)
    image_a = ChunkyPNG::Image.from_file(image_a_path)
    image_b = ChunkyPNG::Image.from_file(image_b_path)

    # If the dimensions changed, the layout changed; don't even count pixels.
    return 0.0 unless image_a.dimension == image_b.dimension

    diff_pixels = 0
    total_pixels = image_a.width * image_a.height

    image_a.height.times do |y|
      image_a.row(y).each_with_index do |pixel_a, x|
        diff_pixels += 1 if pixel_a != image_b[x, y]
      end
    end

    1.0 - (diff_pixels.to_f / total_pixels)
  end

  # (The path helpers and generate_diff_image are omitted here for brevity.)
end
I run these tests after any CSS or layout change. It once caught a margin change that accidentally pushed a form submit button behind a footer on mobile.
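In a Capybara feature spec, using it looks something like this (the page and selector are made up):

describe 'Checkout page layout', type: :feature do
  let(:visual) { VisualRegressionTest.new }

  it 'matches the approved baseline' do
    visit '/checkout'
    shot = visual.capture_page('#checkout-form')
    result = visual.compare_with_baseline(shot, 'checkout_form')
    expect(result[:match]).to be(true),
      "Only #{(result[:similarity] * 100).round(1)}% similar; diff: #{result[:diff_path]}"
  end
end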
Finally, performance. A feature can work perfectly but be too slow. Performance regression testing monitors the speed of operations over time.
# Watching for slow-downs as code changes.
class PerformanceTest
  def initialize(baseline_store: Redis.new)
    @store = baseline_store
    @measurements = []
  end

  def measure(operation_name, &block)
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = block.call
    end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    duration = end_time - start_time

    @measurements << {
      operation: operation_name,
      duration: duration,
      timestamp: Time.current
    }

    compare_with_baseline(operation_name, duration)
    result
  end

  def compare_with_baseline(operation_name, current_duration)
    baseline_key = "perf:baseline:#{operation_name}"
    baseline = @store.get(baseline_key)

    if baseline
      baseline_duration = baseline.to_f
      percent_change = (current_duration - baseline_duration) / baseline_duration
      if percent_change > 0.1 # Alert on a 10% slowdown
        alert_performance_regression(operation_name, baseline_duration, current_duration)
      end
    else
      # First measurement sets the baseline.
      @store.set(baseline_key, current_duration)
    end
  end

  def alert_performance_regression(operation_name, baseline, current)
    # Create a record so the team can investigate.
    PerformanceAlert.create!(
      operation: operation_name,
      baseline_duration: baseline,
      current_duration: current,
      percent_change: (current - baseline) / baseline,
      environment: Rails.env
    )
  end
end

# Using it in a test.
describe 'Order processing performance' do
  let(:perf_test) { PerformanceTest.new }

  it 'completes within acceptable time' do
    order = create(:large_order)
    result = perf_test.measure('order_processing') do
      OrderProcessor.new(order).process
    end
    expect(result).to be_success
    # The `measure` method will also check against the stored baseline.
  end
end
This setup warned me when a new gem I added was making database queries slower. I was able to roll it back before it reached production.
None of these strategies are silver bullets. They are tools. I don’t use all of them on every project. For a simple internal tool, visual regression testing is overkill. For a large e-commerce platform, all seven might be necessary.
The goal is to build confidence. Confidence that when I deploy, the system will work as expected, even when things go wrong. These methods help me find the problems that hide between the lines of my unit tests. They shift testing from just checking if I built the thing right, to checking if I built the right thing, and if it will stay right over time. Start with one that addresses your biggest current worry, and see how it changes your relationship with your test suite.