10 Java Error Handling Techniques That Keep Your Application Running in Production
Learn 10 proven Java error-handling techniques, from structured exceptions to circuit breakers, that keep your application running when failures strike.
Errors are part of every production system. I have spent years debugging applications where a single unhandled exception brought down the entire service. The difference between a brittle application and one that survives failures comes down to how you anticipate, catch, and recover from errors. Over time I have collected ten techniques that transformed the way I write Java code. Each one is simple on its own, but together they create a safety net that keeps your application running even when things go wrong.
The first technique is about structuring your exceptions with domain-specific types. Instead of throwing generic Exception or RuntimeException, create a hierarchy that reflects your business logic. For example, start with an abstract AppException that holds an error code. Then create subclasses like PaymentDeclinedException, InsufficientFundsException, and InvalidOrderException. This way, when you catch an exception, you know exactly what went wrong without parsing a string message. Let me show you how I build this.
public abstract class AppException extends RuntimeException {
    private final ErrorCode code;

    public AppException(ErrorCode code, String message, Throwable cause) {
        super(message, cause);
        this.code = code;
    }

    public ErrorCode getCode() { return code; }
}

public class PaymentDeclinedException extends AppException {
    private final String transactionId;

    public PaymentDeclinedException(String transactionId, String reason) {
        super(ErrorCode.PAYMENT_DECLINED, reason, null);
        this.transactionId = transactionId;
    }

    public String getTransactionId() { return transactionId; }
}

public class InsufficientFundsException extends PaymentDeclinedException {
    private final BigDecimal balance;

    public InsufficientFundsException(String transactionId, BigDecimal balance) {
        super(transactionId, "Insufficient funds");
        this.balance = balance;
    }

    public BigDecimal getBalance() { return balance; }
}
I always catch the most specific exception first. If I write catch (PaymentDeclinedException e), I know it is a payment issue, not a database problem. Broad catches like catch (Exception e) in business code hide bugs and make targeted recovery much harder, because every failure looks the same. I also include extra fields like transactionId or balance so the caller can take informed action. This hierarchy makes error handling expressive and lets you react differently to each failure.
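The catch-ordering rule can be sketched as follows. This is a minimal illustration only: the two exception classes are simplified stand-ins for the hierarchy above (no ErrorCode, for brevity), and the describe method is a hypothetical helper.

```java
import java.math.BigDecimal;

// Hypothetical, simplified stand-ins for the hierarchy above.
final class CatchOrderDemo {

    static class PaymentDeclinedException extends RuntimeException {
        private final String transactionId;

        PaymentDeclinedException(String transactionId, String reason) {
            super(reason);
            this.transactionId = transactionId;
        }

        String getTransactionId() { return transactionId; }
    }

    static class InsufficientFundsException extends PaymentDeclinedException {
        private final BigDecimal balance;

        InsufficientFundsException(String transactionId, BigDecimal balance) {
            super(transactionId, "Insufficient funds");
            this.balance = balance;
        }

        BigDecimal getBalance() { return balance; }
    }

    // Runs the action and reports which handler dealt with the failure.
    static String describe(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (InsufficientFundsException e) {
            // Most specific first: the balance is available for a targeted message.
            return "top up required, balance=" + e.getBalance();
        } catch (PaymentDeclinedException e) {
            // Any other decline: only the transaction id is known.
            return "declined, tx=" + e.getTransactionId();
        }
    }
}
```

Swapping the two catch blocks does not compile, because the compiler rejects a catch clause that an earlier, broader clause already covers.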
The second technique is a consistent global error handler. When you build a web application, you want every unhandled exception to return a standard JSON response instead of a stack trace. Frameworks like Spring provide @ControllerAdvice for this. I write a single class that catches all exceptions and maps them to a consistent error envelope.
@ControllerAdvice
public class GlobalExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(GlobalExceptionHandler.class);

    // Domain errors: extractDetails and determineHttpStatus are helper methods not shown here.
    @ExceptionHandler(AppException.class)
    public ResponseEntity<ErrorResponse> handleAppException(AppException ex) {
        ErrorResponse body = new ErrorResponse(
                ex.getCode().toString(),
                ex.getMessage(),
                extractDetails(ex)
        );
        return ResponseEntity.status(determineHttpStatus(ex.getCode())).body(body);
    }

    // Validation errors: one entry per violated constraint.
    @ExceptionHandler(ConstraintViolationException.class)
    public ResponseEntity<ErrorResponse> handleValidation(ConstraintViolationException ex) {
        List<FieldError> errors = ex.getConstraintViolations().stream()
                .map(v -> new FieldError(v.getPropertyPath().toString(), v.getMessage()))
                .toList();
        return ResponseEntity.badRequest()
                .body(new ErrorResponse("VALIDATION_FAILED", "Input validation failed", errors));
    }

    // Catch-all: log the full stack trace, return only a generic message.
    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleUnknown(Exception ex) {
        log.error("Unhandled exception", ex);
        return ResponseEntity.status(500)
                .body(new ErrorResponse("INTERNAL_ERROR", "An unexpected error occurred", List.of()));
    }
}
I log the full stack trace server-side, but the client only sees a safe error envelope with a code and a message. Never expose internal details like class names or line numbers. This handler gives you a single place to change how errors look across your entire API.
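The handler above relies on an ErrorResponse type and a determineHttpStatus helper that are not shown. A minimal sketch of what they might look like, without any Spring dependency, is below. The ErrorCode values and the status mapping are assumptions for illustration, not a fixed convention.

```java
import java.util.List;

final class ErrorEnvelopeSketch {

    // Hypothetical subset of error codes; the real ErrorCode enum is not shown in this article.
    enum ErrorCode { PAYMENT_DECLINED, GATEWAY_TIMEOUT, VALIDATION_FAILED, INTERNAL_ERROR }

    // The envelope sent to clients: a stable code, a safe message, optional details.
    record ErrorResponse(String code, String message, List<?> details) {
        // Convenience constructor for errors that carry no details.
        ErrorResponse(String code, String message) {
            this(code, message, List.of());
        }
    }

    // One possible mapping from domain codes to HTTP status codes.
    static int determineHttpStatus(ErrorCode code) {
        return switch (code) {
            case VALIDATION_FAILED -> 400;
            case PAYMENT_DECLINED -> 402;
            case GATEWAY_TIMEOUT -> 504;
            case INTERNAL_ERROR -> 500;
        };
    }
}
```

Because the switch lists every enum constant, adding a new ErrorCode later produces a compile error here until it is given a status.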
The third technique is using result types for monadic error handling. Sometimes throwing an exception is too heavy for expected failures, like a declined payment or a missing record. Instead, return a type that represents success or failure. This makes error handling part of the method signature, so callers cannot ignore it.
// Requires Java 21 (pattern matching for switch). Needs: import java.util.function.Function;
// The nested records are the only permitted subtypes, so the permits clause can be omitted.
public sealed interface Result<T> {
    record Success<T>(T value) implements Result<T> {}
    record Failure<T>(Throwable cause, ErrorCode code) implements Result<T> {}

    // The parameter is named mapper so it does not clash with the pattern variable f below.
    default <U> Result<U> map(Function<T, U> mapper) {
        return switch (this) {
            case Success<T> s -> new Success<>(mapper.apply(s.value()));
            case Failure<T> f -> new Failure<>(f.cause(), f.code());
        };
    }
}

// Usage in service (assumes Result.Success and Result.Failure are imported)
public Result<Payment> processPayment(Order order) {
    try {
        Payment payment = paymentGateway.charge(order.getTotal());
        return new Success<>(payment);
    } catch (GatewayTimeoutException e) {
        return new Failure<>(e, ErrorCode.GATEWAY_TIMEOUT);
    } catch (GatewayDeclineException e) {
        return new Failure<>(e, ErrorCode.PAYMENT_DECLINED);
    }
}

// Consumer
Result<Payment> result = paymentService.processPayment(order);
switch (result) {
    case Success<Payment> s -> deliverOrder(s.value());
    case Failure<Payment> f -> handlePaymentFailure(f);
}
I love this pattern because it forces you to handle every possible outcome. With a sealed interface and a pattern-matching switch (final since Java 21), the compiler will complain if you do not cover both Success and Failure. Expected failures become ordinary return values instead of exceptions that bubble up unnoticed. Libraries like Vavr have a built-in Try type, but you can implement a simple Result yourself.
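To show how result values chain without exceptions, here is a small self-contained sketch. It is a simplified variant, not the Result type above: the failure carries only a String reason, and fold, flatMap, parseQuantity, checkStock, and reserve are hypothetical names chosen for this example. It avoids the pattern-matching switch so that it also compiles on Java 17.

```java
import java.util.function.Function;

final class ResultSketch {

    sealed interface Result<T> permits Success, Failure {
        // Collapses both outcomes into one value; the caller must supply both branches.
        <R> R fold(Function<? super T, ? extends R> onSuccess,
                   Function<? super String, ? extends R> onFailure);

        // Chains a second step that can itself fail; an earlier failure short-circuits.
        default <U> Result<U> flatMap(Function<? super T, Result<U>> next) {
            return this.<Result<U>>fold(next, reason -> new Failure<U>(reason));
        }
    }

    record Success<T>(T value) implements Result<T> {
        @Override
        public <R> R fold(Function<? super T, ? extends R> onSuccess,
                          Function<? super String, ? extends R> onFailure) {
            return onSuccess.apply(value);
        }
    }

    record Failure<T>(String reason) implements Result<T> {
        @Override
        public <R> R fold(Function<? super T, ? extends R> onSuccess,
                          Function<? super String, ? extends R> onFailure) {
            return onFailure.apply(reason);
        }
    }

    // Step 1: turn raw input into a number, or a failure describing why not.
    static Result<Integer> parseQuantity(String raw) {
        try {
            return new Success<>(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return new Failure<>("not a number: " + raw);
        }
    }

    // Step 2: a business rule that can fail (the stock level of 10 is made up).
    static Result<Integer> checkStock(int quantity) {
        if (quantity <= 10) {
            return new Success<>(quantity);
        }
        return new Failure<>("only 10 in stock");
    }

    // Both steps chained; the first failure wins and no exception escapes.
    static String reserve(String rawQuantity) {
        return parseQuantity(rawQuantity)
                .flatMap(ResultSketch::checkStock)
                .<String>fold(q -> "reserved " + q, reason -> "rejected: " + reason);
    }
}
```

The design choice here is that fold takes both branches as arguments, so a caller cannot read the value without also saying what happens on failure.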
The fourth technique is retrying with exponential backoff. Transient failures happen all the time – a network hiccup, a database connection timeout, a temporary service outage. Throwing up your hands and crashing is not the answer. I write a small retry executor that waits longer after each attempt and adds random jitter to prevent a thundering herd.
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public class RetryExecutor {
    public <T> T retry(Callable<T> action, int maxAttempts, Duration baseDelay) {
        Exception lastException = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                lastException = e;
                if (attempt == maxAttempts) break;
                // Exponential backoff: baseDelay, then 2x, 4x, ...
                long delay = (long) (baseDelay.toMillis() * Math.pow(2, attempt - 1));
                // Jitter of up to 50%; the + 1 keeps the bound positive for very small delays.
                delay += ThreadLocalRandom.current().nextLong(0, delay / 2 + 1);
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw new RetryExhaustedException("All attempts failed", lastException);
    }
}
I only retry operations that are idempotent, meaning repeating them does not cause harm. For example, a read from a cache, or a payment request that carries an idempotency key so a repeated attempt is not charged twice. I also set a maximum number of attempts, usually three, and a base delay of a few hundred milliseconds. The jitter is important because if all threads retry at the same moment, you create a bigger problem. For production systems, I use Resilience4j, which handles configuration and metrics.
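To make the delay arithmetic concrete, here is a small sketch of the backoff calculation on its own. The maxDelay cap is an assumption added for this example (the executor above has no cap), and the method names are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class BackoffSketch {

    // Exponential part only: base * 2^(attempt - 1), capped at maxDelay. Attempts start at 1.
    static long exponentialDelayMillis(Duration base, Duration maxDelay, int attempt) {
        double raw = base.toMillis() * Math.pow(2, attempt - 1);
        return (long) Math.min(raw, maxDelay.toMillis());
    }

    // Adds between 0% and 50% random jitter on top of the exponential delay.
    static long delayWithJitterMillis(Duration base, Duration maxDelay, int attempt) {
        long delay = exponentialDelayMillis(base, maxDelay, attempt);
        // nextLong(origin, bound) requires bound > origin, hence the + 1.
        return delay + ThreadLocalRandom.current().nextLong(0, delay / 2 + 1);
    }
}
```

With a base delay of 200 ms and a cap of 5 seconds, the exponential part is 200, 400, and 800 ms for attempts one to three, and the jittered delay for the third attempt falls somewhere between 800 and 1200 ms.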
The fifth technique is the circuit breaker. When an external service starts failing repeatedly, retrying only makes things worse. The circuit breaker stops further calls immediately and serves a fallback. After a timeout, it allows one request through to test recovery. I built a simple version to understand the concept.
public class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final int threshold;
    private final long timeoutMs;
    private volatile long lastFailureTime;

    public CircuitBreaker(int threshold, long timeoutMs) {
        this.threshold = threshold;
        this.timeoutMs = timeoutMs;
    }

    public <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (state.get() == State.OPEN) {
            if (System.currentTimeMillis() - lastFailureTime > timeoutMs) {
                // Timeout elapsed: let a trial call through.
                state.compareAndSet(State.OPEN, State.HALF_OPEN);
            } else {
                return fallback.get();
            }
        }
        try {
            T result = operation.get();
            failureCount.set(0);
            state.set(State.CLOSED);
            return result;
        } catch (Exception e) {
            failureCount.incrementAndGet();
            lastFailureTime = System.currentTimeMillis();
            // The count is not reset while open, so a failed trial call reopens the circuit.
            if (failureCount.get() >= threshold) {
                state.set(State.OPEN);
            }
            return fallback.get();
        }
    }
}
The circuit is closed normally. When errors cross a threshold, it opens and blocks all calls. After the timeout, it goes half-open: if a trial call succeeds, it closes again; if it fails, it reopens and the timeout starts over. This protects your system from cascading failures. I always wrap remote calls like HTTP requests or database queries in a circuit breaker.
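The state transitions are easier to verify when time can be controlled. The following is a sketch of a variant, not the class above: it takes an injected clock (a hypothetical constructor parameter) so a test can advance time, and it uses synchronized methods so that only one caller becomes the trial request while the circuit is half-open.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SingleProbeCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int threshold;
    private final long timeoutMs;
    private final LongSupplier clock; // injected so tests can control time
    private State state = State.CLOSED;
    private int failures;
    private long openedAt;

    SingleProbeCircuitBreaker(int threshold, long timeoutMs, LongSupplier clock) {
        this.threshold = threshold;
        this.timeoutMs = timeoutMs;
        this.clock = clock;
    }

    synchronized State state() { return state; }

    <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (!tryPass()) {
            return fallback.get();
        }
        try {
            T result = operation.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    // Decides whether this call may reach the remote service.
    private synchronized boolean tryPass() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= timeoutMs) {
            state = State.HALF_OPEN; // exactly one caller becomes the trial request
            return true;
        }
        return false; // still OPEN, or HALF_OPEN with a trial request in flight
    }

    private synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= threshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong(); // restart the timeout
        }
    }
}
```

In production the clock would typically be System::currentTimeMillis; in a test it can be a lambda over a mutable value, which lets you step through closed, open, half-open, and closed again without sleeping.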
The sixth technique is using fallback strategies gracefully. When an operation fails and cannot retry, you should provide an alternative result that is acceptable. It might be stale data, a cached response, or a default value. The key is to degrade the experience instead of showing an error page.
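public class ProductService {
    private final ProductRepository primaryRepo;
    private final ProductRepository backupRepo; // stale but available

    public ProductService(ProductRepository primaryRepo, ProductRepository backupRepo) {
        this.primaryRepo = primaryRepo;
        this.backupRepo = backupRepo;
    }

    // Assumes ProductNotFoundException is an unchecked exception.
    public Product getProduct(Long id) {
        try {
            return primaryRepo.findById(id)
                    .orElseThrow(() -> new ProductNotFoundException(id));
        } catch (ProductNotFoundException e) {
            throw e; // a missing product is not an outage, so do not fall back
        } catch (Exception e) {
            log.warn("Primary DB failed, trying backup for product {}", id, e);
            try {
                return backupRepo.findById(id)
                        .orElseThrow(() -> new ProductNotFoundException(id));
            } catch (ProductNotFoundException notFound) {
                throw notFound;
            } catch (Exception backupException) {
                log.error("Backup also failed for product {}", id, backupException);
                // Last resort: return a static fallback
                return new Product(id, "Product temporarily unavailable", "N/A", BigDecimal.ZERO);
            }
        }
    }
}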
I make sure the fallback is clearly marked as degraded. For example, I add a header to the HTTP response like X-Content-Stale: true. The user understands that the information may not be current. This is far better than a 500 error that breaks the whole page.
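One way to carry the "degraded" marker from the service to the HTTP layer is to return the value together with a flag. This is a minimal sketch under that assumption: Sourced, fetch, and headersFor are hypothetical names, and only the X-Content-Stale header name comes from the text above.

```java
import java.util.Map;
import java.util.function.Supplier;

final class DegradedResultSketch {

    // A value plus a flag saying whether it came from the fallback source.
    record Sourced<T>(T value, boolean stale) {}

    // Tries the primary source first and marks the result as stale if the backup was used.
    static <T> Sourced<T> fetch(Supplier<T> primary, Supplier<T> backup) {
        try {
            return new Sourced<>(primary.get(), false);
        } catch (RuntimeException e) {
            return new Sourced<>(backup.get(), true);
        }
    }

    // Response headers the web layer could add for a given result.
    static Map<String, String> headersFor(Sourced<?> result) {
        if (result.stale()) {
            return Map.of("X-Content-Stale", "true");
        }
        return Map.of();
    }
}
```

Keeping the flag next to the value means the controller does not need to know which repository answered; it only checks stale() when building the response.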
The seventh technique is logging errors with structured context. A stack trace alone does not tell you which order failed or what the user was doing. I always include relevant identifiers and state at the point of failure.
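try {
    paymentGateway.charge(orderId, amount);
} catch (PaymentException e) {
    // Set the context before logging so the fields are attached to the log line.
    MDC.put("orderId", orderId.toString());
    MDC.put("transactionId", e.getTransactionId());
    try {
        log.error("Payment failed for order {} amount {} reason {}",
                orderId, amount, e.getMessage(), e);
    } finally {
        // Remove only our own keys; MDC.clear() would also wipe context set by other code.
        MDC.remove("orderId");
        MDC.remove("transactionId");
    }
}
I use MDC (Mapped Diagnostic Context) from SLF4J to attach fields to every log line written on the current thread until they are removed. The fields must be set before the log call, and removed afterwards so they do not leak into unrelated requests handled by the same pooled thread. This way, when I search my logs in Elasticsearch, I can filter by order ID or transaction ID without parsing message text. Structured logging with JSON output makes this even better.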
The eighth technique is isolating error-prone components with bulkheads. A bulkhead limits the number of concurrent calls to a failing component so it cannot exhaust your thread pool or database connections. Think of it like a ship with watertight compartments – if one compartment floods, the ship stays afloat.
public class Bulkhead {
    private final Semaphore semaphore;
    private final int maxConcurrent;

    public Bulkhead(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.semaphore = new Semaphore(maxConcurrent);
    }

    public <T> T call(Callable<T> action, long timeoutMs) throws Exception {
        if (!semaphore.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new BulkheadFullException("Too many concurrent calls");
        }
        try {
            return action.call();
        } finally {
            semaphore.release();
        }
    }
}
I use separate bulkheads for different services: one for payments, one for inventory, one for recommendations. If the payment service slows down, only callers waiting on the payment bulkhead are held up or rejected; inventory calls keep their own permits. This prevents a single bad neighbor from taking down your whole application.
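The isolation between compartments can be demonstrated in a single thread. The sketch below is a simplified variant of the bulkhead above that takes a Supplier and throws an unchecked exception, purely to keep the example short; the demo method and the service names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BulkheadDemo {

    static final class BulkheadFullException extends RuntimeException {
        BulkheadFullException() { super("Too many concurrent calls"); }
    }

    static final class Bulkhead {
        private final Semaphore permits;

        Bulkhead(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        <T> T call(Supplier<T> action, long timeoutMs) {
            boolean acquired;
            try {
                acquired = permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a permit", e);
            }
            if (!acquired) {
                throw new BulkheadFullException();
            }
            try {
                return action.get();
            } finally {
                permits.release();
            }
        }
    }

    // While one payment call holds the only payment permit, a second payment call
    // is rejected, but an inventory call still goes through.
    static String demo() {
        Bulkhead payments = new Bulkhead(1);
        Bulkhead inventory = new Bulkhead(1);
        return payments.call(() -> {
            String payment;
            try {
                payment = payments.call(() -> "charged", 10);
            } catch (BulkheadFullException full) {
                payment = "payment rejected";
            }
            String stock = inventory.call(() -> "in stock", 10);
            return payment + ", " + stock;
        }, 10);
    }
}
```

The nested call stands in for a second concurrent request; because a Semaphore is not reentrant, it cannot get a permit while the outer call is still running.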
The ninth technique is handling graceful degradation with feature flags. Feature flags let you disable problematic functionality at runtime without redeploying. Combine them with automated error detection to turn off a feature when error rates spike.
public class FeatureToggles {
    private final AtomicReference<Map<String, Boolean>> flags = new AtomicReference<>(new ConcurrentHashMap<>());
    private final int threshold; // errors tolerated per window before a feature is disabled

    public FeatureToggles(int threshold) {
        this.threshold = threshold;
    }

    // Features are enabled unless they have been switched off.
    public boolean isEnabled(String feature) {
        return flags.get().getOrDefault(feature, true);
    }

    // errorCount is the number of errors the caller observed within the given window.
    public void autoDisableOnErrors(String feature, int errorCount, Duration window) {
        if (errorCount > threshold) {
            flags.updateAndGet(map -> {
                Map<String, Boolean> updated = new ConcurrentHashMap<>(map);
                updated.put(feature, false);
                return updated;
            });
            log.warn("Auto-disabled feature {} after {} errors in {}", feature, errorCount, window);
        }
    }
}

// Usage
if (featureToggles.isEnabled("new-recommendation-engine")) {
    return recommendationService.getRecommendations(userId);
} else {
    return legacyRecommendationService.getRecommendations(userId);
}
I use this for gradual rollouts. If the new recommendation engine starts throwing exceptions, I can flip the flag and fall back to the old engine. The system keeps running while I fix the issue. Monitoring tools can automatically toggle flags based on error metrics.
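The automated detection needs something that counts errors inside a time window. A minimal sketch of such a counter is shown below; ErrorWindow and recordErrorAndCheck are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to test.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

final class ErrorWindow {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> errorTimes = new ArrayDeque<>();

    ErrorWindow(int threshold, Duration window) {
        this.threshold = threshold;
        this.windowMillis = window.toMillis();
    }

    // Records an error at the given time and reports whether the number of errors
    // inside the window now exceeds the threshold.
    synchronized boolean recordErrorAndCheck(long nowMillis) {
        errorTimes.addLast(nowMillis);
        // Drop errors that have fallen out of the window.
        while (!errorTimes.isEmpty() && nowMillis - errorTimes.peekFirst() > windowMillis) {
            errorTimes.removeFirst();
        }
        return errorTimes.size() > threshold;
    }
}
```

When recordErrorAndCheck returns true, the caller can flip the flag for the affected feature. With a threshold of 2 and a one-second window, the third error within that second trips the check, while errors spread several seconds apart never do.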
The tenth technique is conducting blameless post-mortems. Every significant error should lead to a root cause analysis. I keep a template that asks: What happened? What was the impact? What was the root cause? What changed? How was it detected? What prevented recovery? What improvements are planned? After the analysis, I update the code with new exception types, better logging, and automated recovery steps.
Resilience is not built in one day. With each error I handle, the application becomes stronger. Start with one technique, like structured exceptions, then add the global handler, then retry, and so on. Over time you will find that failures become manageable incidents rather than crises. The goal is to keep your application serving users, even when parts of the system are failing.