Document management in Ruby on Rails requires careful consideration of storage, performance, and user experience. I’ve implemented numerous document management systems, and here are the most effective techniques I’ve discovered.
File Versioning Implementation
Version control is crucial for tracking document changes. I implement this by pairing an ActiveStorage attachment with a has_many :versions association that records a snapshot, a checksum, and a version number for each revision.
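class Document < ApplicationRecord
  has_many :versions
  has_one_attached :file

  # Snapshot the attached file as a new, numbered version.
  def create_version
    data = file.download # read once; reused for content and checksum
    versions.create!(
      content: data,
      checksum: calculate_checksum(data),
      version_number: next_version_number,
      metadata: metadata # assumes a metadata column on documents
    )
  end

  private

  def calculate_checksum(data)
    Digest::SHA256.hexdigest(data)
  end

  def next_version_number
    (versions.maximum(:version_number) || 0) + 1
  end
end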
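One refinement worth considering is skipping a new version when the file has not changed. The following is a minimal plain-Ruby sketch of that idea, not the model above: VersionedFile and Version are hypothetical in-memory stand-ins, and the checksum comparison is an assumption about how duplicates might be detected.

```ruby
require "digest"

# Hypothetical in-memory stand-in for a document and its versions.
Version = Struct.new(:number, :checksum, :content, keyword_init: true)

class VersionedFile
  attr_reader :versions

  def initialize
    @versions = []
  end

  # Stores a new version unless the content matches the latest one.
  # Returns the new Version, or nil when nothing changed.
  def add_version(content)
    checksum = Digest::SHA256.hexdigest(content)
    return nil if @versions.last&.checksum == checksum

    version = Version.new(number: @versions.size + 1, checksum: checksum, content: content)
    @versions << version
    version
  end
end

file = VersionedFile.new
file.add_version("first draft")
file.add_version("first draft")  # identical bytes, skipped
file.add_version("second draft")
puts file.versions.map(&:number).inspect # prints [1, 2]
```

In a Rails model the same check could compare the new checksum against the most recent row before calling versions.create!.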
Full-text Search Integration
Elasticsearch provides powerful full-text search. I integrate it with Rails through the Searchkick gem, which handles indexing and querying and lets each model decide what data goes into the index.
class Document < ApplicationRecord
  searchkick

  # Fields Searchkick sends to the index for each document.
  def search_data
    {
      title: title,
      content: extracted_text,
      tags: tags.pluck(:name),
      metadata: metadata
    }
  end

  # TextExtractor is an application-defined class, not part of Searchkick.
  def extracted_text
    TextExtractor.new(file).process
  end
end
Access Control Implementation
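Access control keeps documents visible only to the people allowed to see them. I implement it with Pundit policies: per-action checks on the policy, plus a scope that filters collections by the user's permissions.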
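class DocumentPolicy < ApplicationPolicy
  def show?
    user.has_access_to?(record)
  end

  def update?
    user.can_edit?(record) && !record.locked?
  end

  class Scope < Scope
    # Limit listings to documents the user holds a permission for.
    def resolve
      scope.joins(:permissions)
           .where(permissions: { user_id: user.id })
           .distinct # avoid duplicates when a user has several permission rows
    end
  end
end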
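Because a policy is an ordinary Ruby object that receives a user and a record, its rules can be exercised without Rails. The sketch below illustrates this with hypothetical stand-ins (SketchUser, SketchDocument, SketchDocumentPolicy); the admin, owner, and editor rules are assumptions for illustration, not the has_access_to? and can_edit? logic used above.

```ruby
# Hypothetical stand-ins for the User and Document models.
SketchUser = Struct.new(:id, :role, keyword_init: true)
SketchDocument = Struct.new(:owner_id, :editor_ids, :locked, keyword_init: true) do
  def locked?
    locked == true
  end
end

# Same shape as a Pundit policy: built from a user and a record,
# answering one predicate per action.
class SketchDocumentPolicy
  attr_reader :user, :record

  def initialize(user, record)
    @user = user
    @record = record
  end

  def show?
    can_access?
  end

  # Editing additionally requires the document to be unlocked.
  def update?
    can_access? && !record.locked?
  end

  private

  # Assumed rule: admins, the owner, and listed editors have access.
  def can_access?
    user.role == :admin ||
      record.owner_id == user.id ||
      record.editor_ids.include?(user.id)
  end
end

owner = SketchUser.new(id: 1, role: :member)
stranger = SketchUser.new(id: 2, role: :member)
doc = SketchDocument.new(owner_id: 1, editor_ids: [], locked: true)

puts SketchDocumentPolicy.new(owner, doc).show?     # true
puts SketchDocumentPolicy.new(owner, doc).update?   # false, the document is locked
puts SketchDocumentPolicy.new(stranger, doc).show?  # false
```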
File Format Conversion
Converting documents to standardized formats improves compatibility. I use background jobs for processing.
class DocumentConversionJob < ApplicationJob
  queue_as :default

  # Assumes Document declares has_one_attached :converted_file and that
  # the application-defined DocumentConverter#to_pdf returns an IO-like object.
  def perform(document_id)
    document = Document.find(document_id)
    converted_file = DocumentConverter.new(document).to_pdf
    document.converted_file.attach(
      io: converted_file,
      filename: "#{document.title}.pdf",
      content_type: 'application/pdf'
    )
  end
end
Document Workflow Automation
State machines help manage document lifecycles effectively. I use the AASM gem to declare the states and the events that move a document between them.
class Document < ApplicationRecord
  include AASM

  aasm do
    state :draft, initial: true
    state :under_review
    state :approved
    state :archived

    event :submit do
      transitions from: :draft, to: :under_review
      after do
        notify_reviewers
        create_audit_log
      end
    end

    event :approve do
      transitions from: :under_review, to: :approved
      after :process_approval
    end
  end
end
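To make the underlying idea concrete, here is a minimal plain-Ruby sketch of a transition table with guarded events. It is not how AASM is implemented; DocumentWorkflow is a hypothetical class, and the archive event is an assumption, since the definition above declares an archived state without an event that reaches it.

```ruby
class DocumentWorkflow
  class InvalidTransition < StandardError; end

  # event => allowed source states and the target state
  TRANSITIONS = {
    submit:  { from: [:draft],        to: :under_review },
    approve: { from: [:under_review], to: :approved },
    archive: { from: [:approved],     to: :archived } # assumed event
  }.freeze

  attr_reader :state, :history

  def initialize
    @state = :draft
    @history = []
  end

  # True when the event is known and allowed from the current state.
  def may?(event)
    rule = TRANSITIONS[event]
    !rule.nil? && rule[:from].include?(@state)
  end

  # Applies the event or raises InvalidTransition; records each change.
  def fire(event)
    unless may?(event)
      raise InvalidTransition, "cannot #{event} from #{@state}"
    end

    target = TRANSITIONS[event][:to]
    @history << [@state, target]
    @state = target
  end
end

wf = DocumentWorkflow.new
wf.fire(:submit)
wf.fire(:approve)
puts wf.state          # prints approved
puts wf.may?(:submit)  # prints false
```

Keeping the rules in one table makes illegal moves, such as approving a draft directly, fail loudly instead of silently corrupting the lifecycle.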
Audit Trail Implementation
Tracking document activities is essential for compliance and monitoring.
class AuditLog < ApplicationRecord
  belongs_to :document
  belongs_to :user

  # current_sign_in_ip comes from Devise's :trackable module;
  # browser_info is assumed to be defined on User.
  def self.record_activity(document, user, action)
    create!(
      document: document,
      user: user,
      action: action,
      ip_address: user.current_sign_in_ip,
      metadata: {
        browser: user.browser_info,
        timestamp: Time.current
      }
    )
  end
end
Cloud Storage Optimization
I optimize cloud storage using configurable providers and caching strategies.
class StorageService
  def initialize(provider = Rails.configuration.storage_provider)
    @provider = provider
    @cache = Rails.cache
  end

  def store_document(document)
    key = generate_storage_key(document)
    @provider.store(
      key: key,
      file: document.file,
      metadata: document.metadata,
      options: storage_options
    )
    cache_document_metadata(key, document)
  end

  private

  # Illustrative key layout; adapt it to your provider's conventions.
  def generate_storage_key(document)
    "documents/#{document.id}/#{SecureRandom.uuid}"
  end

  # Placeholder for provider-specific options.
  def storage_options
    {}
  end

  def cache_document_metadata(key, document)
    @cache.write(
      "document_metadata:#{key}",
      document.metadata,
      expires_in: 1.hour
    )
  end
end
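A benefit of injecting the provider is that tests can swap in an in-memory one. The sketch below shows a hypothetical MemoryProvider that accepts the same store keywords the service passes, along with a content-addressed key helper. Both names, and the key layout, are assumptions for illustration rather than part of any storage library.

```ruby
require "digest"

# Hypothetical in-memory provider accepting the same keywords
# that the service above passes to its provider.
class MemoryProvider
  attr_reader :objects

  def initialize
    @objects = {}
  end

  def store(key:, file:, metadata: {}, options: {})
    @objects[key] = { file: file, metadata: metadata, options: options }
    key
  end

  def fetch(key)
    @objects.fetch(key)
  end
end

# Content-addressed key: identical bytes always map to the same key,
# so re-uploading an unchanged file does not create a second object.
def storage_key(bytes, prefix: "documents")
  digest = Digest::SHA256.hexdigest(bytes)
  "#{prefix}/#{digest[0, 2]}/#{digest}"
end

provider = MemoryProvider.new
bytes = "quarterly report"
key = storage_key(bytes)
provider.store(key: key, file: bytes, metadata: { author: "Ada" })
provider.store(key: storage_key(bytes), file: bytes, metadata: { author: "Ada" })
puts provider.objects.size # prints 1
```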
Metadata Management
Effective metadata handling improves document organization and searchability.
class DocumentMetadata
  def initialize(document)
    @document = document
    @metadata = {}
  end

  # Returns the merged metadata hash.
  def extract
    @metadata.merge!(
      file_size: @document.file.byte_size,
      content_type: @document.file.content_type,
      created_at: @document.created_at,
      last_modified: @document.updated_at, # the record's own timestamp, not extraction time
      author: @document.user.name,
      custom_fields: extract_custom_fields
    )
  end

  # MetadataParser is an application-defined class.
  def extract_custom_fields
    MetadataParser.new(@document.file).extract_metadata
  end
end
These techniques form a robust foundation for document management systems. The key is combining them effectively based on specific requirements. I’ve found that focusing on performance optimization and user experience while maintaining security is crucial.
Some practical considerations include implementing batch processing for large documents, using background jobs for resource-intensive operations, and maintaining proper indexes for database queries.
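For large documents, one concrete saving is to avoid loading the whole file into memory just to compute a checksum. The sketch below hashes an IO in fixed-size chunks using only the standard library. The helper name streaming_checksum and the 64 KB chunk size are assumptions; in a Rails app the IO could come from the tempfile that ActiveStorage yields when a blob is opened.

```ruby
require "digest"
require "stringio"

# Computes a SHA-256 checksum by reading the IO in chunks,
# so memory use stays bounded regardless of file size.
def streaming_checksum(io, chunk_size: 64 * 1024)
  digest = Digest::SHA256.new
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

data = "x" * 200_000
streamed = streaming_checksum(StringIO.new(data))
puts streamed == Digest::SHA256.hexdigest(data) # prints true
```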
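Here’s an example of combining these techniques in a document processor. Keep in mind that the transaction only rolls back database writes; an enqueued job or a search index update is not undone if a later step fails.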
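class DocumentProcessor
  def initialize(document)
    @document = document
    @storage = StorageService.new
    @metadata = DocumentMetadata.new(document)
  end

  def process
    ActiveRecord::Base.transaction do
      extract_and_store_metadata
      convert_document
      create_version
      update_search_index
      generate_thumbnails
      record_audit_log
    end
  end

  private

  def extract_and_store_metadata
    @document.update!(metadata: @metadata.extract)
  end

  # Only enqueues the job; the conversion itself runs in the background.
  def convert_document
    DocumentConversionJob.perform_later(@document.id)
  end

  def create_version
    @document.create_version
  end

  def update_search_index
    @document.reindex
  end

  # Skips files that ActiveStorage cannot preview.
  def generate_thumbnails
    return unless @document.file.previewable?

    @document.file.preview(resize_to_limit: [300, 300]).processed
  end

  # Attributes the activity to the document's owner; pass the acting user if you track one.
  def record_audit_log
    AuditLog.record_activity(@document, @document.user, 'processed')
  end
end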
Remember to implement proper error handling and monitoring. I recommend using services like Sentry for error tracking and New Relic for performance monitoring.
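As one possible shape for that error handling, the sketch below retries a block a limited number of times and hands every failure to a reporter callback. The helper with_retries and its parameters are hypothetical. In a real app the callback might forward the exception to an error tracker, and ActiveJob's own retry_on may be a better fit for background jobs.

```ruby
# Runs the block, retrying on StandardError up to `attempts` times.
# Every failure is passed to `on_error` before retrying or re-raising.
# The delay doubles after each failed attempt (exponential backoff).
def with_retries(attempts: 3, base_delay: 0.5, on_error: ->(_error, _try) {})
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    on_error.call(e, tries)
    raise if tries >= attempts

    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

reported = []
result = with_retries(base_delay: 0, on_error: ->(e, n) { reported << "#{n}: #{e.message}" }) do |try|
  raise "storage timeout" if try < 3 # simulated transient failure

  "stored on attempt #{try}"
end

puts result          # prints stored on attempt 3
puts reported.size   # prints 2
```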
Regular testing and maintenance ensure system reliability. Implement comprehensive test coverage using RSpec:
RSpec.describe DocumentProcessor do
  let(:document) { create(:document) }
  let(:processor) { described_class.new(document) }

  describe '#process' do
    it 'processes the document successfully' do
      expect { processor.process }.to change { document.versions.count }.by(1)
      expect(document.metadata).to be_present
      expect(document.search_data).to be_present
    end

    context 'when processing fails' do
      it 'rolls back all changes' do
        # Fail after the version is created so the rollback is actually exercised.
        allow(processor).to receive(:update_search_index)
          .and_raise(StandardError)

        expect { processor.process }.to raise_error(StandardError)
        expect(document.versions.count).to eq(0)
      end
    end
  end
end