Let me tell you about a testing method that changed how I think about writing reliable code. You know how most tests work, right? You write a specific example. You say, “When I pass 2 and 2 to my add function, it should return 4.” That’s example-based testing. It’s like checking a single point on a map.
Property-based testing is different. Instead of checking points, it describes the shape of the entire territory. It states a rule that must always be true, no matter what valid input you throw at it. Then, it lets the computer generate hundreds or thousands of random inputs to try and break that rule. My job shifts from thinking of clever examples to defining the fundamental truths of my code.
Think of it like testing a sorting function. An example test says, “Sorting [3, 1, 2] gives [1, 2, 3].” A property test says, “For any list of numbers, after sorting, each element should be less than or equal to the next one.” The property is the universal law. The computer’s job is to find the asteroid that proves the law wrong.
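To make the idea concrete before bringing in any library, here's a rough hand-rolled sketch in plain Ruby. The helper names `random_int_array` and `first_counterexample` are made up for this illustration, and a real library does far more (smarter generators, shrinking, better reports).

```ruby
# A hand-rolled sketch of a property check, using only Ruby's standard library.
# `random_int_array` and `first_counterexample` are hypothetical helper names.

# Build a random array of 0..max_len integers between -1000 and 1000.
def random_int_array(rng, max_len: 20)
  Array.new(rng.rand(0..max_len)) { rng.rand(-1000..1000) }
end

# Run the given block (the property) against `runs` random inputs.
# Returns nil if the property always held, otherwise the first input that broke it.
def first_counterexample(runs: 200, seed: 42)
  rng = Random.new(seed) # fixed seed so a failure can be replayed
  runs.times do
    input = random_int_array(rng)
    return input unless yield(input)
  end
  nil
end

# The property from the text: after sorting, each element is <= the next one.
sort_result = first_counterexample do |arr|
  arr.sort.each_cons(2).all? { |a, b| a <= b }
end

# A deliberately false "law": every array is already in order.
false_law_result = first_counterexample do |arr|
  arr.each_cons(2).all? { |a, b| a <= b }
end

puts "sort property counterexample: #{sort_result.inspect}"
puts "false law counterexample:     #{false_law_result.inspect}"
```

The first check should report `nil`, because no random input breaks the law. The second should print some unsorted array, which is the computer doing its job of finding the counterexample.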
I started using this in Ruby with a library called Rantly. It plugs right into RSpec, which I was already using. The mental shift was the hardest part. I had to stop asking, “What examples should I test?” and start asking, “What is always true about my code?”
Here’s a simple place to begin. Think about reversing an array. What’s a universal truth about that operation?
```ruby
require 'rantly'
require 'rantly/rspec_extensions'

RSpec.describe 'Array' do
  it 'reversing twice gives you the original array' do
    property_of {
      # Generate an array of 0 to 100 random integers.
      # Rantly's array takes an optional length and a block that builds each element.
      array(range(0, 100)) { integer }
    }.check { |random_array|
      expect(random_array.reverse.reverse).to eq(random_array)
    }
  end
end
```
When I run this, Rantly generates 100 random arrays by default (pass a count to check, as in check(500), to run more). Depending on how the generator is sized, the inputs can include the empty array [], long arrays, arrays with negative numbers, and arrays with duplicate values. My single property test covers all those cases. If reversing twice ever doesn’t give me the original array back, I have a serious bug, and the test is very likely to find it.
Let’s look at sorting. The properties are richer.
```ruby
RSpec.describe 'Array' do
  it 'produces elements in order' do
    property_of {
      array(range(0, 100)) { integer }
    }.check { |arr|
      sorted = arr.sort
      # Property 1: Each element is <= the next one
      sorted.each_cons(2) do |a, b|
        expect(a).to be <= b
      end
      # Property 2: Sorting doesn't create or lose elements
      expect(sorted).to match_array(arr)
    }
  end
end
```
This test defines two core properties of a correct sort. The first checks ordering. The second checks that the sorted list is a permutation of the original—no elements added or removed. This one test is more powerful than a dozen example tests. It found a bug for me once in a custom comparator where certain equal elements were being dropped.
The real power comes when you move beyond basic types and generate data that looks like your business domain. You don’t just generate random strings; you generate valid emails. You don’t just generate random numbers; you generate plausible transaction amounts, where most are small but a few are large.
Here’s how I build custom generators.
```ruby
require 'date'

module MyGenerators
  # Generate a plausible email address
  def email
    # sized sets the length that string uses; here 1 to 10 characters
    local_part = sized(range(1, 10)) { string(:alnum) } # e.g., "john42"
    domain = choose('example.com', 'test.org', 'company.net')
    "#{local_part}@#{domain}"
  end

  # Generate a date within a specific range
  def date_in_range(start_date, end_date)
    timestamp = range(start_date.to_time.to_i, end_date.to_time.to_i)
    Time.at(timestamp).to_date
  end

  # Generate transaction amounts: mostly small, rarely huge
  def transaction_amount
    freq(
      [90, :range, 0, 1000],     # 90%: Small, normal transactions
      [9, :range, 1000, 10000],  # 9%: Larger transactions
      [1, :range, 10000, 100000] # 1%: Very large, edge-case transactions
    )
  end
end

# Generator blocks run inside a Rantly instance, so mix the module into Rantly.
Rantly.include(MyGenerators)
```
I include this module, and suddenly I can write tests like property_of { { email: email, date: date_in_range(...), amount: transaction_amount } }. The data is random, but it’s meaningfully random. It respects the constraints of my application.
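The “mostly small, rarely huge” idea is just weighted random choice. Here's a rough plain-Ruby sketch of how that can work. It is not Rantly's implementation, and `weighted_pick` and `AMOUNT_TABLE` are hypothetical names used only for this illustration.

```ruby
# Plain-Ruby sketch of weighted choice between generators.
# `weighted_pick` and `AMOUNT_TABLE` are hypothetical, not part of Rantly.

# `table` is a list of [weight, generator] pairs. Each generator is a
# callable that takes the random number generator and returns a value.
def weighted_pick(rng, table)
  total = table.sum { |weight, _gen| weight }
  roll = rng.rand(1..total) # a number from 1 to total, inclusive
  table.each do |weight, gen|
    return gen.call(rng) if roll <= weight
    roll -= weight # skip past this bucket and try the next one
  end
end

AMOUNT_TABLE = [
  [90, ->(rng) { rng.rand(0..1_000) }],       # common: small amounts
  [9,  ->(rng) { rng.rand(1_001..10_000) }],  # occasional: larger amounts
  [1,  ->(rng) { rng.rand(10_001..100_000) }] # rare: very large amounts
].freeze

rng = Random.new(2024)
amounts = Array.new(10_000) { weighted_pick(rng, AMOUNT_TABLE) }
small_share = amounts.count { |a| a <= 1_000 } / amounts.size.to_f
puts "share of small amounts: #{(small_share * 100).round(1)}%"
```

Over many draws the share of small amounts should hover around 90%, while the rare large values still show up often enough to exercise the edge cases.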
This is where property-based testing starts uncovering bugs I’d never think to write an example for. Let’s test a simple OrderValidator.
```ruby
require 'ostruct'

class OrderValidator
  def validate(order)
    errors = []
    errors << 'Total must be positive' if order.total <= 0
    errors << 'Must have at least one item' if order.items.empty?
    errors << 'Customer email required' if order.customer_email.to_s.strip.empty?
    errors
  end
end

RSpec.describe OrderValidator do
  it 'always rejects an order with a zero or negative total' do
    property_of {
      {
        total: range(-1000, 0), # Always generate a bad total
        items: array { { id: integer, qty: range(1, 5) } },
        customer_email: string(:print)
      }
    }.check(100) { |order_data| # Check 100 random bad orders
      validator = OrderValidator.new
      order = OpenStruct.new(order_data)
      errors = validator.validate(order)
      # The universal property: If total <= 0, this error must be present.
      expect(errors).to include('Total must be positive')
    }
  end

  it 'always accepts a perfectly valid order' do
    property_of {
      {
        total: range(1, 10000),
        items: array(range(1, 10)) { { id: integer, qty: range(1, 5) } }, # 1 to 10 items
        customer_email: "#{string(:alnum)}@example.com"
      }
    }.check { |order_data|
      validator = OrderValidator.new
      order = OpenStruct.new(order_data)
      errors = validator.validate(order)
      # The universal property: For all valid inputs, the error list is empty.
      expect(errors).to be_empty
    }
  end
end
```
The first test is fascinating. It says, “For any order with a total less than or equal to zero, the validator must flag it.” The computer will generate all sorts of weird orders with negative totals—some with many items, some with weird emails—but the rule must hold. The second test defines what a “valid” order looks like (positive total, 1-10 items, proper email) and asserts they always pass.
But what about code that has state? What about a shopping cart where you can add and remove items? This is called stateful or state machine property testing. You don’t just test one operation; you test random sequences of operations.
```ruby
class ShoppingCart
  def initialize
    @items = {}
  end

  def add(product_id, quantity)
    @items[product_id] = (@items[product_id] || 0) + quantity
  end

  def remove(product_id, quantity)
    current = @items[product_id] || 0
    new_qty = current - quantity
    if new_qty <= 0
      @items.delete(product_id)
    else
      @items[product_id] = new_qty
    end
  end

  def total_quantity
    @items.values.sum
  end
end

RSpec.describe ShoppingCart do
  it 'never has a negative total quantity, no matter what sequence of operations' do
    property_of {
      # Generate an array of random operations: either :add or :remove
      array(range(0, 30)) {
        {
          op: choose(:add, :remove),
          product_id: range(1, 5), # Small id range so adds and removes hit the same products
          qty: range(1, 5)
        }
      }
    }.check { |sequence|
      cart = ShoppingCart.new
      # Apply each random operation
      sequence.each do |command|
        case command[:op]
        when :add
          cart.add(command[:product_id], command[:qty])
        when :remove
          cart.remove(command[:product_id], command[:qty])
        end
        # The INVARIANT: After *every single operation*, this must be true.
        expect(cart.total_quantity).to be >= 0
      end
    }
  end
end
```
This test generates random scripts like [add(5,2), remove(5,1), add(3,4), remove(5,10)] and plays them out. The property, or invariant, is that the cart’s total quantity can never be negative. If a bug in my remove method lets it go negative, this test will find a sequence that breaks the invariant. It’s like having a fuzzer for my object’s API.
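To see what “finding a sequence that breaks the invariant” looks like, here's a small plain-Ruby sketch with a deliberately broken cart. `BuggyCart`, `random_sequence`, and `first_breaking_sequence` are hypothetical names for this illustration, and the random runner is hand-rolled rather than Rantly's.

```ruby
# Sketch: a deliberately buggy cart plus a hand-rolled random sequence runner.
# All names here are hypothetical and exist only for this illustration.

class BuggyCart
  def initialize
    @items = Hash.new(0)
  end

  def add(product_id, quantity)
    @items[product_id] += quantity
  end

  # Bug: subtracts without clamping at zero or deleting the entry.
  def remove(product_id, quantity)
    @items[product_id] -= quantity
  end

  def total_quantity
    @items.values.sum
  end
end

# Generate a random list of [operation, product_id, quantity] commands.
def random_sequence(rng, max_len: 10)
  Array.new(rng.rand(0..max_len)) do
    [%i[add remove].sample(random: rng), rng.rand(1..3), rng.rand(1..5)]
  end
end

# Play random sequences against fresh carts. Returns the shortest prefix of
# the first sequence that breaks the invariant (total_quantity >= 0),
# or nil if no generated sequence breaks it.
def first_breaking_sequence(cart_class, runs: 200, seed: 7)
  rng = Random.new(seed)
  runs.times do
    sequence = random_sequence(rng)
    cart = cart_class.new
    sequence.each_with_index do |(op, product_id, qty), i|
      cart.public_send(op, product_id, qty)
      return sequence[0..i] if cart.total_quantity.negative?
    end
  end
  nil
end

p first_breaking_sequence(BuggyCart)
```

Because the invariant is checked after every step, the returned script always ends with the `:remove` that pushed the total below zero, which points straight at the faulty method.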
Now, the killer feature: shrinking. When property-based testing finds a failure, it doesn’t just shout, “Here’s a huge, messy input that broke your code!” That’s not helpful. Instead, it tries to simplify that failing input to the smallest, most understandable example that still causes the failure. This process is called shrinking.
Imagine a test fails on a 50-element array. The shrinker will try removing elements. Does it still fail with 49? With 10? With 2? It will also try reducing numbers. If it failed with total: -873, does it fail with total: -1? With total: 0? It homes in on the core of the problem.
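Here's a minimal plain-Ruby sketch of that loop for an array of integers. It is a simplified greedy strategy, not how any particular library does it, and `shrink_array` is a hypothetical helper. The `fails` argument is any callable that returns true while the candidate still triggers the failure.

```ruby
# Plain-Ruby sketch of greedy shrinking for an array of integers.
# `shrink_array` is a hypothetical helper for illustration only.

def shrink_array(input, fails)
  current = input.dup
  changed = true
  while changed
    changed = false

    # Step 1: try dropping one element at a time.
    current.each_index do |i|
      candidate = current[0...i] + current[(i + 1)..]
      next unless fails.call(candidate)
      current = candidate
      changed = true
      break
    end
    next if changed # restart from the smaller array

    # Step 2: try moving one number closer to zero (0, half, or one step).
    current.each_with_index do |n, i|
      candidates = [0, (n / 2.0).truncate, n - (n <=> 0)].uniq - [n]
      replacement = candidates.find do |smaller|
        fails.call(current[0...i] + [smaller] + current[(i + 1)..])
      end
      next if replacement.nil?
      current = current[0...i] + [replacement] + current[(i + 1)..]
      changed = true
      break
    end
  end
  current
end

# Failure condition: the array contains a number of 100 or more.
too_big = ->(arr) { arr.any? { |n| n >= 100 } }
p shrink_array([12, 250, -7, 93, 41], too_big) # => [100]
```

The messy five-element input collapses to `[100]`, the smallest array that still trips the condition. A greedy shrinker like this only guarantees a local minimum, but in practice that is usually enough to make the bug readable.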
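Rantly ships shrinking for basic types (integers, strings, arrays, and hashes) in rantly/shrinks, but you can build custom shrinkers for your domain objects. Here’s a simplified idea of how you might approach it for an order. Treat it as a standalone sketch that you call yourself with the failing data and the property, not something Rantly invokes for you.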
```ruby
class OrderShrinker
  # Given a failing order hash, try to make it smaller but still failing.
  # The property block is expected to raise when the order fails.
  def shrink(failing_order, &property)
    # Copy the item hashes too, so the caller's data is never mutated.
    current = failing_order.dup
    current[:items] = current[:items].map(&:dup) if current[:items]

    # Strategy 1: Try removing items from the list, last to first.
    if current[:items] && current[:items].size > 1
      (current[:items].size - 1).downto(0) do |index|
        candidate = current.merge(items: current[:items].dup)
        candidate[:items].delete_at(index)
        # If it *still* fails with this simpler data, keep the simpler version.
        current = candidate if fails?(candidate, &property)
      end
    end

    # Strategy 2: Try reducing quantities to 1.
    if current[:items]
      current[:items].each do |item|
        next if item[:qty] == 1 # Already minimal
        original_qty = item[:qty]
        item[:qty] = 1
        # If the property passes with qty 1, the quantity matters: revert.
        item[:qty] = original_qty unless fails?(current, &property)
      end
    end

    current # Return the shrunk failing example.
  end

  private

  # True when the property raises an expectation failure for this order.
  def fails?(order, &property)
    property.call(order)
    false
  rescue RSpec::Expectations::ExpectationNotMetError
    true
  end
end
```
When a test fails, I get a report like: “Found a failure. Original failing case was a huge order. Shrunk to: {total: 0, items: [{id: 1, qty: 1}], customer_email: ""}”. Immediately, I see the issue: a total of zero is invalid, and an empty email is invalid. The bug is obvious.
You can mix this with the tools you already use. I often use FactoryBot to create the initial “shape” of realistic data, then let Rantly randomize the details.
```ruby
RSpec.describe User do
  it 'enforces unique email addresses' do
    # Start with a small pool of factory-generated emails so collisions are likely
    email_pool = Array.new(10) { FactoryBot.build(:user).email }
    property_of {
      # Generate random lists of emails drawn from the pool
      array(range(2, 20)) { choose(*email_pool) }
    }.check { |emails|
      # Assumes an ActiveRecord-style model; each generated case starts from an empty table
      User.delete_all
      saved_emails = Set.new
      emails.each do |email|
        # Build a fresh, otherwise valid user for every attempt
        user = FactoryBot.build(:user, email: email)
        # The property: If the email is already saved, save should fail.
        if saved_emails.include?(email)
          expect(user.save).to be false
          expect(user.errors[:email]).to include('has already been taken')
        else
          saved_emails.add(email)
          expect(user.save).to be true
        end
      end
    }
  end
end
```
This test uses FactoryBot’s knowledge of what makes a valid User (with a name, encrypted password, etc.) but then property-based testing stresses the uniqueness constraint with random combinations.
Finally, you can test things beyond correctness. You can test performance properties.
```ruby
RSpec.describe 'Search function' do
  it 'scales linearly with input size' do
    property_of {
      size = range(100, 10000)
      { size: size, data: array(size) { integer } }
    }.check(20) { |test_case| # Only run 20 large tests
      data = test_case[:data]
      target = data.sample # Search for an element known to be present
      start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      data.find { |x| x == target }
      duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
      # Very rough check: time per element should be roughly constant.
      # We allow for some noise but expect no quadratic blow-up.
      time_per_element = duration / test_case[:size]
      expect(time_per_element).to be < 0.00001 # Some small threshold
    }
  end
end
```
This is a sanity check. It won’t give you precise Big O analysis, but if someone accidentally changes a linear search to a quadratic one, this property test will likely fail on a large, random input.
How do you start? Don’t try to convert your entire test suite. That’s overwhelming. Next time you write a test for a pure function—a method that takes values and returns a value based only on those inputs—pause. Ask yourself: “What is always true about the output, given the inputs?” Write that as a property test alongside your example tests.
Start with simple invariants:
- Encoding and then decoding data should give you the original input.
- The result of a calculation should always be within a certain range.
- A filter function should never return more items than you gave it.
- Parsing a string and then formatting it back should be equal to the original (or at least preserve the meaning).
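Two of the invariants above, sketched in plain Ruby with only the standard library. `random_record` and `check_all` are hypothetical helpers for this illustration. In a real suite you would express the same checks with your property-testing library's generators.

```ruby
require 'json'

# Plain-Ruby sketch of two starter invariants.
# `random_record` and `check_all` are hypothetical helper names.

# Build a small random record with string keys so it survives a JSON round trip.
def random_record(rng)
  letters = ('a'..'z').to_a
  {
    'id'   => rng.rand(1..1_000_000),
    'name' => Array.new(rng.rand(0..12)) { letters.sample(random: rng) }.join,
    'tags' => Array.new(rng.rand(0..5)) { rng.rand(-50..50) }
  }
end

# Returns true when the block holds for every generated record.
def check_all(runs: 200, seed: 1)
  rng = Random.new(seed)
  runs.times.all? { yield(random_record(rng)) }
end

# Invariant: encoding and then decoding gives back the original input.
round_trip_ok = check_all do |record|
  JSON.parse(JSON.generate(record)) == record
end

# Invariant: a filter never returns more items than it was given,
# and everything it returns satisfies the predicate.
filter_ok = check_all do |record|
  kept = record['tags'].select(&:positive?)
  kept.size <= record['tags'].size && kept.all?(&:positive?)
end

puts "round trip holds: #{round_trip_ok}"
puts "filter holds:     #{filter_ok}"
```

Both checks are deliberately boring. That's the point of a starter invariant: it is cheap to state, and it fails loudly the day someone changes the encoder or the filter in a way that breaks it.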
You’ll find that property-based testing makes you think more deeply about your code’s design and contracts. It finds edge cases you missed. It gives you confidence that your code isn’t just correct for the examples you thought of, but for the entire domain of valid inputs. For me, it turned testing from a chore into a puzzle of discovering the fundamental laws of my own programs. Give it a try on one small function. You might be surprised at what you—and the computer—discover.