Domain-specific languages (DSLs) offer programmers a way to write concise, expressive code tailored to specific problem domains. As a systems programming language with powerful metaprogramming capabilities, Rust provides an excellent foundation for creating custom DSLs. I’ve spent years working with language design, and I’m excited to share these seven key techniques for building effective DSLs in Rust.
Macro-based DSLs
Rust’s macro system provides a powerful foundation for creating DSLs. Unlike C’s text-substitution macros, Rust macros operate on token trees rather than raw text, and their expanded output is parsed and type-checked like any other Rust code, which provides both safety and expressiveness.
Procedural macros are particularly valuable for DSL creation as they can parse custom syntax into Rust code. They run at compile time and can generate substantial amounts of code from concise input.
use proc_macro::TokenStream;
use quote::{quote, format_ident};
use syn::{parse_macro_input, LitStr};
#[proc_macro]
pub fn html(input: TokenStream) -> TokenStream {
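let literal = parse_macro_input!(input as LitStr);
let html_string = literal.value();
// `HtmlParser` and `generate_html_elements` are helpers assumed to be defined elsewhere in this crate
let html_parser = HtmlParser::new(&html_string);
let elements = match html_parser.parse() {
Ok(elements) => elements,
// Report malformed HTML as a compile error at the macro input instead of panicking
Err(err) => {
return syn::Error::new(literal.span(), format!("invalid HTML: {:?}", err))
.to_compile_error()
.into();
}
};
let generated = generate_html_elements(elements);
TokenStream::from(generated)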
}
// Usage
let page = html!(r#"
<div class="container">
<h1>Hello, Rust!</h1>
<p>This is a DSL for HTML in Rust</p>
</div>
"#);
When creating macro-based DSLs, I’ve found it helpful to use the syn and quote crates to parse and generate code. Helpful error messages make all the difference in usability: context-specific errors point users in the right direction.
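Not every macro-based DSL needs a procedural macro. As a minimal sketch, the declarative macro below defines a small key-value DSL and uses a fallback arm to report the expected syntax. The `settings!` macro and the `default_settings` function are names I made up for illustration, and the fallback only catches input that fails to match the first pattern cleanly.

```rust
use std::collections::HashMap;

// Hypothetical `settings!` macro: a tiny key-value DSL built with `macro_rules!`.
macro_rules! settings {
    // Accept `key => value` pairs separated by commas (trailing comma allowed).
    ($($key:ident => $value:expr),* $(,)?) => {{
        let mut map: HashMap<&'static str, String> = HashMap::new();
        $( map.insert(stringify!($key), $value.to_string()); )*
        map
    }};
    // Fallback arm: input that does not match the pattern above
    // produces a compile error that names the expected syntax.
    ($($other:tt)*) => {
        compile_error!("settings! expects `key => value` pairs separated by commas")
    };
}

// Example use: builds a map with two entries.
fn default_settings() -> HashMap<&'static str, String> {
    settings! {
        host => "localhost",
        port => 8080,
    }
}
```

The same idea carries over to procedural macros, where the error can additionally point at the exact span of the offending token.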
Builder Pattern for Fluent Interfaces
The builder pattern enables the creation of fluent, chainable interfaces that read almost like natural language. This technique is particularly effective for configuration-heavy DSLs.
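use std::collections::HashMap;
use std::time::Duration;
// `Method`, `HttpRequest`, and `BuilderError` are assumed to be defined elsewhere
pub struct HttpRequestBuilder {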
method: Option<Method>,
url: Option<String>,
headers: HashMap<String, String>,
body: Option<Vec<u8>>,
timeout: Option<Duration>,
}
impl HttpRequestBuilder {
pub fn new() -> Self {
Self {
method: None,
url: None,
headers: HashMap::new(),
body: None,
timeout: None,
}
}
pub fn get(mut self, url: impl Into<String>) -> Self {
self.method = Some(Method::GET);
self.url = Some(url.into());
self
}
pub fn post(mut self, url: impl Into<String>) -> Self {
self.method = Some(Method::POST);
self.url = Some(url.into());
self
}
pub fn header(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
self.headers.insert(name.into(), value.into());
self
}
pub fn body(mut self, data: impl Into<Vec<u8>>) -> Self {
self.body = Some(data.into());
self
}
pub fn timeout(mut self, duration: Duration) -> Self {
self.timeout = Some(duration);
self
}
pub fn build(self) -> Result<HttpRequest, BuilderError> {
// Validate and construct the HttpRequest
let method = self.method.ok_or(BuilderError::MissingMethod)?;
let url = self.url.ok_or(BuilderError::MissingUrl)?;
Ok(HttpRequest {
method,
url,
headers: self.headers,
body: self.body,
timeout: self.timeout.unwrap_or(Duration::from_secs(30)),
})
}
}
// Usage
let request = HttpRequestBuilder::new()
.post("https://api.example.com/data")
.header("Content-Type", "application/json")
.header("Authorization", "Bearer token123")
.body(r#"{"name": "Example", "value": 42}"#)
.timeout(Duration::from_secs(5))
.build()?;
The key to an effective builder-based DSL is maintaining consistent method naming patterns and ensuring each method returns self to enable chaining. This approach feels natural to users and provides excellent IDE support with autocompletion.
Combinator Patterns
Parser combinators allow you to build complex parsers from simple, reusable components. This approach is particularly effective for text-based DSLs that require sophisticated parsing logic.
// Written against nom 7; nom 8 calls parsers with `.parse(input)` instead of `parser(input)`
use nom::{
IResult,
bytes::complete::take_while1,
character::complete::{char, digit1, multispace0},
sequence::{delimited, tuple},
branch::alt,
combinator::{map, map_res},
};
#[derive(Debug)]
enum Expr {
Number(i64),
Variable(String),
Add(Box<Expr>, Box<Expr>),
Multiply(Box<Expr>, Box<Expr>),
}
fn parse_number(input: &str) -> IResult<&str, Expr> {
map_res(digit1, |s: &str| {
s.parse::<i64>().map(Expr::Number)
})(input)
}
fn parse_variable(input: &str) -> IResult<&str, Expr> {
map(
take_while1(|c: char| c.is_alphabetic() || c == '_'),
|s: &str| Expr::Variable(s.to_string())
)(input)
}
fn parse_parens(input: &str) -> IResult<&str, Expr> {
delimited(
char('('),
delimited(multispace0, parse_expr, multispace0),
char(')')
)(input)
}
fn parse_term(input: &str) -> IResult<&str, Expr> {
alt((parse_number, parse_variable, parse_parens))(input)
}
fn parse_expr(input: &str) -> IResult<&str, Expr> {
let (input, first) = parse_term(input)?;
let (input, _) = multispace0(input)?;
let (input, operations) = nom::multi::many0(
tuple((
alt((char('+'), char('*'))),
multispace0,
parse_term,
multispace0,
))
)(input)?;
Ok((input, operations.into_iter().fold(first, |acc, (op, _, term, _)| {
match op {
'+' => Expr::Add(Box::new(acc), Box::new(term)),
'*' => Expr::Multiply(Box::new(acc), Box::new(term)),
_ => unreachable!(),
}
})))
}
// Usage (note: '+' and '*' share one precedence level here and associate left to right)
let parsed = parse_expr("2 * (x + 5)").unwrap().1;
I’ve found that using libraries like nom or combine significantly streamlines this approach. The beauty of combinators is that they compose well: you can build complex parsers incrementally, testing each component as you go.
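To make the composition idea concrete without any dependency, here is a hand-rolled sketch using only the standard library. It is not nom’s API; the names `number`, `symbol`, `pair`, and `sum` are illustrative assumptions. Each parser takes the input and returns the remaining input plus a value, so every piece can be tested on its own before it is combined.

```rust
// Parse one or more ASCII digits into an i64.
fn number(input: &str) -> Option<(&str, i64)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let value = input[..end].parse::<i64>().ok()?;
    Some((&input[end..], value))
}

// Build a parser that accepts exactly one expected character.
fn symbol(expected: char) -> impl Fn(&str) -> Option<(&str, char)> {
    move |input| {
        let mut chars = input.chars();
        match chars.next() {
            Some(c) if c == expected => Some((chars.as_str(), c)),
            _ => None,
        }
    }
}

// Combinator: run two parsers in sequence and return both results.
fn pair<A, B>(
    first: impl Fn(&str) -> Option<(&str, A)>,
    second: impl Fn(&str) -> Option<(&str, B)>,
) -> impl Fn(&str) -> Option<(&str, (A, B))> {
    move |input| {
        let (rest, a) = first(input)?;
        let (rest, b) = second(rest)?;
        Some((rest, (a, b)))
    }
}

// Parse `number (+ number)*` and fold the results into a total.
fn sum(input: &str) -> Option<(&str, i64)> {
    let (mut rest, mut total) = number(input)?;
    let plus_number = pair(symbol('+'), number);
    while let Some((next, (_, n))) = plus_number(rest) {
        total += n;
        rest = next;
    }
    Some((rest, total))
}
```

Because `sum` is built from `number`, `symbol`, and `pair`, a failing test on `sum` can be narrowed down by testing those three pieces individually.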
Embedded Interpreters
Creating a lightweight interpreter allows your DSL to be evaluated at runtime, offering flexibility for dynamic scenarios. This technique is useful when you need to allow users to write scripts that can be loaded and executed without recompilation.
use std::collections::HashMap;
// PartialEq is required by the `a == b` comparison in evaluate_binary_op
#[derive(Clone, Debug, PartialEq)]
enum Value {
Number(f64),
Boolean(bool),
String(String),
List(Vec<Value>),
}
#[derive(Debug)]
enum Expr {
Literal(Value),
Variable(String),
BinaryOp {
op: Operator,
left: Box<Expr>,
right: Box<Expr>,
},
If {
condition: Box<Expr>,
then_branch: Box<Expr>,
else_branch: Box<Expr>,
},
Function {
params: Vec<String>,
body: Box<Expr>,
},
Call {
function: Box<Expr>,
arguments: Vec<Expr>,
},
}
#[derive(Debug)]
enum Operator {
Add, Subtract, Multiply, Divide,
Equal, NotEqual, Greater, Less,
}
struct Environment {
values: HashMap<String, Value>,
parent: Option<Box<Environment>>,
}
impl Environment {
fn new() -> Self {
Self {
values: HashMap::new(),
parent: None,
}
}
fn with_parent(parent: Environment) -> Self {
Self {
values: HashMap::new(),
parent: Some(Box::new(parent)),
}
}
fn define(&mut self, name: String, value: Value) {
self.values.insert(name, value);
}
fn get(&self, name: &str) -> Option<Value> {
match self.values.get(name) {
Some(value) => Some(value.clone()),
None => match &self.parent {
Some(parent) => parent.get(name),
None => None,
},
}
}
}
struct Interpreter {
environment: Environment,
}
impl Interpreter {
fn new() -> Self {
Self {
environment: Environment::new(),
}
}
fn evaluate(&mut self, expr: &Expr) -> Result<Value, String> {
match expr {
Expr::Literal(val) => Ok(val.clone()),
Expr::Variable(name) => {
self.environment.get(name)
.ok_or_else(|| format!("Undefined variable: {}", name))
},
Expr::BinaryOp { op, left, right } => {
let left_val = self.evaluate(left)?;
let right_val = self.evaluate(right)?;
self.evaluate_binary_op(op, &left_val, &right_val)
},
Expr::If { condition, then_branch, else_branch } => {
let condition_val = self.evaluate(condition)?;
match condition_val {
Value::Boolean(true) => self.evaluate(then_branch),
Value::Boolean(false) => self.evaluate(else_branch),
_ => Err("Condition must evaluate to a boolean".to_string()),
}
},
// Handle other expression types
_ => Err("Not implemented".to_string()),
}
}
fn evaluate_binary_op(&self, op: &Operator, left: &Value, right: &Value) -> Result<Value, String> {
match (op, left, right) {
(Operator::Add, Value::Number(a), Value::Number(b)) => Ok(Value::Number(a + b)),
(Operator::Subtract, Value::Number(a), Value::Number(b)) => Ok(Value::Number(a - b)),
(Operator::Multiply, Value::Number(a), Value::Number(b)) => Ok(Value::Number(a * b)),
(Operator::Divide, Value::Number(a), Value::Number(b)) => {
if *b == 0.0 {
Err("Division by zero".to_string())
} else {
Ok(Value::Number(a / b))
}
},
(Operator::Equal, a, b) => Ok(Value::Boolean(a == b)),
// Handle other operators
_ => Err(format!("Invalid operation: {:?} {:?} {:?}", left, op, right)),
}
}
}
// Usage
let mut interp = Interpreter::new();
let expr = Expr::BinaryOp {
op: Operator::Add,
left: Box::new(Expr::Literal(Value::Number(5.0))),
right: Box::new(Expr::Literal(Value::Number(3.0))),
};
let result = interp.evaluate(&expr).unwrap();
The key to implementing a successful embedded interpreter is breaking down the evaluation process into manageable parts. I typically separate parsing, type checking, and execution phases to maintain clarity and make debugging easier.
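As a rough sketch of that phase separation, the example below runs a type-checking pass before execution. It uses a deliberately simplified AST rather than the full `Expr` from this section, and `Type`, `type_check`, `evaluate`, and `run` are names assumed for illustration.

```rust
// Simplified AST for illustration; smaller than the interpreter's full `Expr`.
#[derive(Debug)]
enum Expr {
    Number(f64),
    Boolean(bool),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Number,
    Boolean,
}

#[derive(Debug, PartialEq)]
enum Value {
    Number(f64),
    Boolean(bool),
}

// Type-checking phase: reject ill-typed expressions before anything runs.
fn type_check(expr: &Expr) -> Result<Type, String> {
    match expr {
        Expr::Number(_) => Ok(Type::Number),
        Expr::Boolean(_) => Ok(Type::Boolean),
        Expr::Add(l, r) | Expr::Less(l, r) => {
            let (lt, rt) = (type_check(l)?, type_check(r)?);
            if lt != Type::Number || rt != Type::Number {
                return Err(format!("expected numbers, found {:?} and {:?}", lt, rt));
            }
            // Addition yields a number; comparison yields a boolean.
            Ok(if matches!(expr, Expr::Add(..)) { Type::Number } else { Type::Boolean })
        }
        Expr::If(c, t, e) => {
            if type_check(c)? != Type::Boolean {
                return Err("condition must be a boolean".to_string());
            }
            let (tt, et) = (type_check(t)?, type_check(e)?);
            if tt == et {
                Ok(tt)
            } else {
                Err(format!("branch types differ: {:?} vs {:?}", tt, et))
            }
        }
    }
}

// Execution phase: may assume the expression already passed `type_check`.
fn evaluate(expr: &Expr) -> Value {
    match expr {
        Expr::Number(n) => Value::Number(*n),
        Expr::Boolean(b) => Value::Boolean(*b),
        Expr::Add(l, r) => match (evaluate(l), evaluate(r)) {
            (Value::Number(a), Value::Number(b)) => Value::Number(a + b),
            _ => unreachable!("rejected by type_check"),
        },
        Expr::Less(l, r) => match (evaluate(l), evaluate(r)) {
            (Value::Number(a), Value::Number(b)) => Value::Boolean(a < b),
            _ => unreachable!("rejected by type_check"),
        },
        Expr::If(c, t, e) => match evaluate(c) {
            Value::Boolean(true) => evaluate(t),
            Value::Boolean(false) => evaluate(e),
            _ => unreachable!("rejected by type_check"),
        },
    }
}

// Pipeline: check first, then execute.
fn run(expr: &Expr) -> Result<Value, String> {
    type_check(expr)?;
    Ok(evaluate(expr))
}
```

With the checks moved into their own pass, the execution phase no longer needs to return errors for type mismatches, which keeps it short and makes failures easier to attribute to a phase.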
Type-driven DSL Design
Rust’s powerful type system can act as a framework for DSL design, enforcing correctness at compile time rather than runtime. This approach uses phantom types, type-level programming, and the type checker to ensure valid DSL expressions.
use std::marker::PhantomData;
// Type-level states
struct Open;
struct Closed;
// Type-level fields
struct Name;
struct Age;
struct Address;
// Our database table builder DSL
struct TableBuilder<State, Fields> {
name: String,
_state: PhantomData<State>,
_fields: PhantomData<Fields>,
}
// Empty tuple type to represent no fields
type NoFields = ();
impl TableBuilder<Open, NoFields> {
fn new(name: impl Into<String>) -> Self {
TableBuilder {
name: name.into(),
_state: PhantomData,
_fields: PhantomData,
}
}
}
// Field definitions with their types
struct Field<N, T> {
name: String,
_name_type: PhantomData<N>,
_value_type: PhantomData<T>,
}
// Add fields to any open table builder
impl<Fields> TableBuilder<Open, Fields> {
fn field<N, T>(self, name: impl Into<String>) -> TableBuilder<Open, (Fields, Field<N, T>)> {
TableBuilder {
name: self.name,
_state: PhantomData,
_fields: PhantomData,
}
}
fn build(self) -> TableBuilder<Closed, Fields> {
TableBuilder {
name: self.name,
_state: PhantomData,
_fields: PhantomData,
}
}
}
// Closed table builders can be used in database operations
impl<Fields> TableBuilder<Closed, Fields> {
fn create_table(&self) -> String {
format!("CREATE TABLE {} (...)", self.name)
}
}
// Type-safe query builder
struct QueryBuilder<Table, Selected, Conditions> {
table: PhantomData<Table>,
_selected: PhantomData<Selected>,
_conditions: PhantomData<Conditions>,
query: String,
}
// Usage
let users_table = TableBuilder::new("users")
.field::<Name, String>("name")
.field::<Age, i32>("age")
.field::<Address, String>("address")
.build();
// The type system ensures we can't add fields after building
// let invalid = users_table.field::<Email, String>("email"); // Compile error!
let create_sql = users_table.create_table();
This technique shines in scenarios where you want to prevent impossible states at compile time. I’ve used it extensively for database query builders, state machines, and protocol implementations where correctness guarantees are critical.
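For the state machine case, a minimal sketch of the same phantom-type idea might look like the following. The `Connection` type, its states, and its methods are hypothetical and only record messages in memory; the point is that `send` exists only in the `Connected` state.

```rust
use std::marker::PhantomData;

// Type-level states (zero-sized markers; names are illustrative).
struct Disconnected;
struct Connected;

struct Connection<State> {
    address: String,
    sent: Vec<String>,
    _state: PhantomData<State>,
}

impl Connection<Disconnected> {
    fn new(address: impl Into<String>) -> Self {
        Connection {
            address: address.into(),
            sent: Vec::new(),
            _state: PhantomData,
        }
    }

    // Consumes the disconnected value, so it cannot be reused afterwards.
    fn connect(self) -> Connection<Connected> {
        Connection {
            address: self.address,
            sent: self.sent,
            _state: PhantomData,
        }
    }
}

impl Connection<Connected> {
    // `send` is only defined for the `Connected` state.
    fn send(&mut self, message: &str) {
        self.sent.push(message.to_string());
    }

    // Moving back to `Disconnected` removes access to `send` again.
    fn disconnect(self) -> Connection<Disconnected> {
        Connection {
            address: self.address,
            sent: self.sent,
            _state: PhantomData,
        }
    }
}
```

Calling `send` on a `Connection<Disconnected>` fails to compile because no such method exists for that type, so the invalid transition never reaches runtime.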
Context-free Grammar Parsing
For more complex DSLs, defining a formal grammar provides structure and clarity. The pest crate allows you to define context-free grammars in a separate file, which translates to a robust parser.
// In grammar.pest
expression = { term ~ (operator ~ term)* }
term = { number | identifier | "(" ~ expression ~ ")" }
number = @{ ASCII_DIGIT+ ~ ("." ~ ASCII_DIGIT+)? }
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
operator = { "+" | "-" | "*" | "/" }
// pest only skips whitespace implicitly when the rule is named WHITESPACE (uppercase)
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
// In your Rust code
use pest::{Parser, iterators::Pair};
use pest_derive::Parser;
#[derive(Parser)]
#[grammar = "grammar.pest"]
struct ExpressionParser;
#[derive(Debug)]
enum Expr {
Number(f64),
Identifier(String),
BinaryOp {
op: String,
left: Box<Expr>,
right: Box<Expr>,
},
}
fn parse_expression(pair: Pair<Rule>) -> Expr {
match pair.as_rule() {
Rule::expression => {
let mut pairs = pair.into_inner();
let mut left = parse_expression(pairs.next().unwrap());
// The remaining pairs alternate: operator, term, operator, term, ...
// A loop is used because `fold` would consume `pairs` while the closure still needs it
while let Some(op_pair) = pairs.next() {
let op = op_pair.as_str().to_string();
let right = parse_expression(pairs.next().unwrap());
left = Expr::BinaryOp {
op,
left: Box::new(left),
right: Box::new(right),
};
}
left
}
Rule::term => {
let pair = pair.into_inner().next().unwrap();
parse_expression(pair)
}
Rule::number => Expr::Number(pair.as_str().parse().unwrap()),
Rule::identifier => Expr::Identifier(pair.as_str().to_string()),
_ => unreachable!(),
}
}
fn parse(input: &str) -> Result<Expr, pest::error::Error<Rule>> {
let pairs = ExpressionParser::parse(Rule::expression, input)?;
let expr = parse_expression(pairs.peek().unwrap());
Ok(expr)
}
// Usage
let parsed = parse("2 * (a + 10)").unwrap();
The main advantage of this approach is the separation of grammar definition from parsing logic. The grammar file serves as clear documentation, and the pest crate handles the heavy lifting of parser generation.
Tree-walking Visitors
The visitor pattern allows for clean separation between your DSL’s structure and the operations performed on it. This technique is especially useful for implementing multiple interpretations of your DSL without modifying its core representation.
trait Visitor<T> {
fn visit_number(&mut self, value: f64) -> T;
fn visit_identifier(&mut self, name: &str) -> T;
fn visit_binary_op(&mut self, op: &str, left: &Expr, right: &Expr) -> T;
}
enum Expr {
Number(f64),
Identifier(String),
BinaryOp {
op: String,
left: Box<Expr>,
right: Box<Expr>,
},
}
impl Expr {
fn accept<T>(&self, visitor: &mut impl Visitor<T>) -> T {
match self {
Expr::Number(n) => visitor.visit_number(*n),
Expr::Identifier(name) => visitor.visit_identifier(name),
Expr::BinaryOp { op, left, right } => visitor.visit_binary_op(op, left, right),
}
}
}
struct Evaluator {
variables: std::collections::HashMap<String, f64>,
}
impl Visitor<Result<f64, String>> for Evaluator {
fn visit_number(&mut self, value: f64) -> Result<f64, String> {
Ok(value)
}
fn visit_identifier(&mut self, name: &str) -> Result<f64, String> {
self.variables.get(name)
.copied()
.ok_or_else(|| format!("Undefined variable: {}", name))
}
fn visit_binary_op(&mut self, op: &str, left: &Expr, right: &Expr) -> Result<f64, String> {
let left_val = left.accept(self)?;
let right_val = right.accept(self)?;
match op {
"+" => Ok(left_val + right_val),
"-" => Ok(left_val - right_val),
"*" => Ok(left_val * right_val),
"/" => {
if right_val == 0.0 {
Err("Division by zero".to_string())
} else {
Ok(left_val / right_val)
}
}
_ => Err(format!("Unknown operator: {}", op)),
}
}
}
struct PrettyPrinter;
impl Visitor<String> for PrettyPrinter {
fn visit_number(&mut self, value: f64) -> String {
value.to_string()
}
fn visit_identifier(&mut self, name: &str) -> String {
name.to_string()
}
fn visit_binary_op(&mut self, op: &str, left: &Expr, right: &Expr) -> String {
format!("({} {} {})", left.accept(self), op, right.accept(self))
}
}
// Usage
let expr = Expr::BinaryOp {
op: "*".to_string(),
left: Box::new(Expr::Number(2.0)),
right: Box::new(Expr::BinaryOp {
op: "+".to_string(),
left: Box::new(Expr::Identifier("x".to_string())),
right: Box::new(Expr::Number(3.0)),
}),
};
let mut evaluator = Evaluator {
variables: [("x".to_string(), 5.0)].into_iter().collect(),
};
let result = expr.accept(&mut evaluator).unwrap(); // 16.0
let mut printer = PrettyPrinter;
let formatted = expr.accept(&mut printer); // "(2 * (x + 3))"
I’ve found that implementing a visitor-based approach early in DSL development pays dividends later. You can add more operations, such as optimization, validation, and code generation, without cluttering your core AST implementation.
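As one example of adding an operation later, here is a sketch of a validation-style pass that collects every identifier an expression refers to. The AST and `Visitor` trait are repeated so the snippet stands alone; `VariableCollector` and `free_variables` are names I introduced for illustration.

```rust
use std::collections::BTreeSet;

// Same shape as the AST and visitor trait in this section, repeated for completeness.
enum Expr {
    Number(f64),
    Identifier(String),
    BinaryOp {
        op: String,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

trait Visitor<T> {
    fn visit_number(&mut self, value: f64) -> T;
    fn visit_identifier(&mut self, name: &str) -> T;
    fn visit_binary_op(&mut self, op: &str, left: &Expr, right: &Expr) -> T;
}

impl Expr {
    fn accept<T>(&self, visitor: &mut impl Visitor<T>) -> T {
        match self {
            Expr::Number(n) => visitor.visit_number(*n),
            Expr::Identifier(name) => visitor.visit_identifier(name),
            Expr::BinaryOp { op, left, right } => visitor.visit_binary_op(op, left, right),
        }
    }
}

// Hypothetical new pass: gathers identifier names without touching `Expr` itself.
struct VariableCollector {
    names: BTreeSet<String>,
}

impl Visitor<()> for VariableCollector {
    fn visit_number(&mut self, _value: f64) {}

    fn visit_identifier(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    fn visit_binary_op(&mut self, _op: &str, left: &Expr, right: &Expr) {
        // Recurse into both operands; the operator itself carries no names.
        left.accept(self);
        right.accept(self);
    }
}

// Convenience wrapper: returns the identifiers in sorted order.
fn free_variables(expr: &Expr) -> BTreeSet<String> {
    let mut collector = VariableCollector {
        names: BTreeSet::new(),
    };
    expr.accept(&mut collector);
    collector.names
}
```

A caller could compare the result against the variables an evaluator knows about and report every undefined name up front, instead of failing on the first one during evaluation.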
By combining these seven techniques, you can create DSLs in Rust that are expressive, type-safe, and efficient. Each approach has its strengths, and often the most effective DSLs combine multiple techniques to achieve the best balance of usability and performance.
The most important lesson I’ve learned is to start with the user experience you want to create and work backward to choose the right implementation techniques. Focus on making your DSL feel natural and intuitive to the target users, and the technical implementation decisions will follow more clearly.