Java Flight Recorder Techniques for Production Profiling and Diagnostics
Java Flight Recorder provides low-overhead insights for production systems. I’ve used these ten techniques to resolve performance bottlenecks and memory issues without restarting applications.
Programmatic Recording Control
Starting recordings directly from code offers flexibility. I often use predefined configurations like profile for detailed metrics during peak loads. The default configuration works well for continuous monitoring with minimal impact.
import java.io.IOException;
import java.text.ParseException;

import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class RecordingManager {
    // Build a recording from the JDK's built-in "profile" configuration and start it
    public void beginDiagnosticCapture() throws IOException, ParseException {
        Configuration config = Configuration.getConfiguration("profile");
        Recording recording = new Recording(config);
        recording.setName("ProdDiagnostics");
        recording.start();
    }
}
This approach helps me initiate diagnostics during specific business operations. I avoid hardcoding settings by fetching configurations dynamically.
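As a minimal sketch of that dynamic lookup, the configurations shipped with the running JDK can be enumerated at runtime, so names like profile never need to be assumed (the class name here is just a throwaway helper):

import jdk.jfr.Configuration;

public class ConfigurationLister {
    // Lists every built-in .jfc configuration (typically "default" and "profile")
    public static void main(String[] args) {
        for (Configuration config : Configuration.getConfigurations()) {
            System.out.println(config.getName() + " - " + config.getDescription());
        }
    }
}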
Custom Event Tracking
Custom events reveal business-specific patterns. When optimizing payment processing, I created event markers for transaction milestones:
@Label("OrderFulfillment")
class OrderEvent extends Event {
@Label("OrderID")
String orderId;
@Label("ProcessingTime")
long durationMs;
}
public void completeOrder(Order order) {
OrderEvent event = new OrderEvent();
event.orderId = order.getId();
event.durationMs = System.currentTimeMillis() - order.getStartTime();
event.commit();
}
These events appear alongside JVM metrics in Mission Control. I discovered inconsistent database response times by correlating events with jdk.JDBCExecution recordings.
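The same events can also be analyzed outside Mission Control with the jdk.jfr.consumer API. A rough sketch, assuming the recording was dumped to a file named orders.jfr and treating anything over 500 ms as slow:

import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class OrderEventReader {
    public static void main(String[] args) throws Exception {
        // Scan the dump and report slow order fulfillments
        for (RecordedEvent event : RecordingFile.readAllEvents(Path.of("orders.jfr"))) {
            if ("OrderFulfillment".equals(event.getEventType().getLabel())
                    && event.getLong("durationMs") > 500) {
                System.out.println(event.getString("orderId") + " took "
                        + event.getLong("durationMs") + " ms");
            }
        }
    }
}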
Memory Leak Identification
Sampling allocations helps detect creeping memory consumption. I combine allocation tracking with heap statistics:
Recording memRecording = new Recording();
// Sample allocations with stack traces; the throttle setting caps samples per second
memRecording.enable("jdk.ObjectAllocationSample")
        .withStackTrace()
        .with("throttle", "150/s");
memRecording.enable("jdk.GCHeapSummary");
memRecording.start();
In one production incident, this revealed a cache implementation retaining references beyond TTL. Stack traces showed the problematic initialization path.
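To turn those samples into a leak-suspect list, I group them by allocated type after dumping the recording. A sketch under the assumption that the dump is named alloc.jfr and that the JDK in use emits jdk.ObjectAllocationSample with its objectClass and weight fields (JDK 16+):

import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class AllocationReport {
    public static void main(String[] args) throws Exception {
        Map<String, Long> bytesByType = new HashMap<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(Path.of("alloc.jfr"))) {
            if (event.getEventType().getName().equals("jdk.ObjectAllocationSample")) {
                RecordedClass type = event.getValue("objectClass");
                // "weight" is the sample's estimated allocation volume in bytes
                bytesByType.merge(type.getName(), event.getLong("weight"), Long::sum);
            }
        }
        // Print the ten heaviest allocation sources by type
        bytesByType.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getKey() + " ~ " + e.getValue() + " bytes"));
    }
}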
Lock Contention Analysis
Thread blocking significantly impacts throughput. I monitor lock acquisitions that exceed a millisecond-scale threshold:
Recording lockRecording = new Recording();
lockRecording.enable("jdk.JavaMonitorEnter")
.withThreshold(Duration.ofMillis(8));
lockRecording.enable("jdk.ThreadPark");
lockRecording.start();
Adding jdk.ThreadPark captures condition variable waits. Recently, this exposed a synchronized logger causing request pileups during peak traffic.
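For long-running services, contention can also be watched live instead of through dump files. A minimal sketch using the streaming API available since JDK 14, with the same illustrative 8 ms threshold:

import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class ContentionWatcher {
    public static void main(String[] args) {
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(8));
            rs.onEvent("jdk.JavaMonitorEnter", event ->
                    // monitorClass identifies the object whose monitor blocked the thread
                    System.out.println("Blocked " + event.getDuration().toMillis() + " ms on "
                            + event.getClass("monitorClass").getName()));
            rs.start(); // blocks; run on a dedicated thread in a real service
        }
    }
}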
Garbage Collection Inspection
GC pauses directly affect user experience. I capture unfiltered collection events:
Recording gcRecording = new Recording();
gcRecording.enable("jdk.GarbageCollection")
.withoutThreshold();
gcRecording.enable("jdk.GCPhasePause");
gcRecording.start();
Correlating these with application metrics revealed young-gen collections triggering during batch jobs. Increasing Eden space reduced pause frequency by 70%.
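The fix itself was a heap sizing change rather than a JFR setting: enlarging the young generation (and with it Eden). With the classic flags that looks roughly like the line below, where the sizes and main class are illustrative rather than a recommendation:

java -Xms4g -Xmx4g -Xmn1536m \
     -XX:StartFlightRecording:disk=true,maxage=12h \
     MyApplication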
Exception Monitoring
Tracking errors prevents silent failures. For high-volume systems, I drop per-event stack traces:
Recording errorRecording = new Recording();
// The per-throw event is jdk.JavaExceptionThrow; it is disabled in the built-in configurations
errorRecording.enable("jdk.JavaExceptionThrow")
        .withoutStackTrace();
errorRecording.enable("jdk.ExceptionStatistics");
errorRecording.start();
The statistics event provides aggregated counts without overhead. I discovered a misconfigured client throwing 500 unnecessary exceptions per second.
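When the aggregate counts spike, the individual throw events in a dump identify the offending types. The jfr tool bundled with the JDK can filter them directly (the file name here is illustrative):

jfr summary recording.jfr
jfr print --events jdk.JavaExceptionThrow recording.jfr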
HTTP Endpoint Profiling
Web service optimization requires endpoint-level visibility. I instrument handlers:
@Label("APIRequest")
class ApiEvent extends Event {
@Label("Endpoint")
String path;
@Label("Status")
int statusCode;
@Label("Latency")
long nanos;
}
public void handle(HttpServletRequest req, HttpServletResponse res) {
long start = System.nanoTime();
// Processing logic
ApiEvent event = new ApiEvent();
event.path = req.getRequestURI();
event.statusCode = res.getStatus();
event.nanos = System.nanoTime() - start;
event.commit();
}
Histograms in Mission Control showed P99 latency spikes on /search endpoints, leading to query optimization.
OutOfMemory Diagnostics
Automated dumps during memory crises capture critical evidence:
java -XX:StartFlightRecording:dumponexit=true \
     -XX:FlightRecorderOptions:memorysize=200m \
     -XX:OnOutOfMemoryError="jcmd %p JFR.dump filename=/crash-dumps/oom.jfr"
I configure larger memory buffers for complex heaps. The dump revealed a memory-mapped file library leaking native memory.
Container Configuration
In Kubernetes environments, persistent storage prevents data loss:
java -XX:StartFlightRecording:disk=true,maxsize=2G,maxage=48h \
-XX:FlightRecorderOptions:repository=/persistent/jfr-dumps \
-Djava.io.tmpdir=/scratch
Setting maxage automatically prunes old files. I mount volumes with write buffering disabled to avoid container restarts losing recordings.
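For an on-demand capture from inside a running pod, jcmd writes the active recording data straight to the mounted volume without a restart (the output file name is illustrative):

jcmd <pid> JFR.check
jcmd <pid> JFR.dump filename=/persistent/jfr-dumps/ondemand.jfr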
JMX Integration
Remote management enables dynamic control:
// Local access shown here; a remote JVM is reached through an MBeanServerConnection instead
FlightRecorderMXBean frBean =
        ManagementFactory.getPlatformMXBean(FlightRecorderMXBean.class);
long recId = frBean.newRecording();
frBean.setPredefinedConfiguration(recId, "profile");
frBean.setRecordingOptions(recId, Map.of("maxAge", "45m"));
frBean.startRecording(recId);
I integrate this with monitoring dashboards to start recordings when error rates exceed thresholds. The MBean API supports scripting for automated diagnostics.
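For the remote case, the same bean is reachable through a standard JMX connector. A rough sketch, assuming remote JMX is enabled on the target JVM; the host and port are illustrative:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import jdk.management.jfr.FlightRecorderMXBean;

public class RemoteRecorder {
    public static void main(String[] args) throws Exception {
        // Illustrative endpoint; requires com.sun.management.jmxremote.* settings on the target
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://app-host:7091/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            FlightRecorderMXBean frBean = ManagementFactory.newPlatformMXBeanProxy(
                    connection, "jdk.management.jfr:type=FlightRecorder",
                    FlightRecorderMXBean.class);
            long recId = frBean.newRecording();
            frBean.setPredefinedConfiguration(recId, "profile");
            frBean.startRecording(recId);
        } finally {
            connector.close();
        }
    }
}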
These techniques provide production-safe visibility. By combining custom events with JVM metrics, I’ve resolved complex performance issues in minutes rather than days. Start with minimal configurations and gradually add detail as needed—the key is sustainable observability without disrupting services.