I remember the first time I tried to make my Java code faster. I had a hunch that using a StringBuilder was better than plain concatenation, and I changed it everywhere. The application did not feel any faster. That was when I realized performance optimization without measurement is just guesswork. You cannot fix what you do not understand.
Java Microbenchmark Harness (JMH) changed everything for me. It is a framework that helps you write accurate benchmarks by controlling for JVM warmup, dead code elimination, and compiler optimizations. It gives you numbers you can trust. Once you have those numbers, you can apply targeted optimizations based on real data, not guesses.
I will walk you through ten strategies I use regularly with JMH. Each one comes with a story, a code example, and a lesson. Stick with me, and by the end you will have a practical toolkit for measuring and improving Java performance.
1. Setting Up a JMH Benchmark Project Correctly
You need a Maven or Gradle project. Add the JMH core dependency and the annotation processor. I use Maven because it is straightforward.
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
    <scope>provided</scope>
</dependency>
Then create a simple class.
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class StringConcatBenchmark {

    private String a = "Hello";
    private String b = "World";
    private String c = "!";

    @Benchmark
    public String plusOperator() {
        return a + " " + b + c;
    }

    @Benchmark
    public String stringBuilder() {
        return new StringBuilder(a).append(" ").append(b).append(c).toString();
    }
}
Build with mvn clean verify, then run java -jar target/benchmarks.jar (the JMH Maven archetype configures the shade plugin to produce that executable jar). The first time I did this, I saw that StringBuilder was about 20% faster than +. That was nice, but the real lesson was that the JVM already optimizes concatenation in many cases. Without JMH, I would never have known.
2. Choosing the Right Benchmark Mode
JMH offers several modes. I use AverageTime when I care about latency – how long each operation takes. I use Throughput when I care about capacity – how many operations per second the system can handle. For understanding worst-case behavior, I use SampleTime, which reports percentiles.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class MyThroughputBenchmark { ... }
If you are building a web server and care about total capacity, use Throughput. If you are writing a trading engine where every operation's latency counts, use AverageTime. I once worked on a logging library where tail latency mattered. SampleTime helped me find occasional spikes caused by garbage collection.
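The two modes report the same underlying quantity from opposite directions: ops/s is just one billion divided by ns/op. Here is a tiny helper I keep around for sanity-checking numbers between modes (the class and method names are my own, not part of JMH):

```java
public class ModeConversion {

    // Throughput and average time are two views of the same measurement:
    // ops/s = 1e9 / (ns/op). Converting between them is a quick sanity check
    // when comparing results taken in different modes.
    static double nsPerOpToOpsPerSec(double nsPerOp) {
        return 1_000_000_000.0 / nsPerOp;
    }

    public static void main(String[] args) {
        // A benchmark reporting 12.5 ns/op corresponds to 80 million ops/s.
        System.out.println(nsPerOpToOpsPerSec(12.5)); // prints 8.0E7
    }
}
```

If a throughput run and an average-time run of the same benchmark do not roughly agree under this conversion, something in the setup differs between them.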
3. Controlling Warmup and Measurement Iterations
The JVM needs time to warm up. The Just‑In‑Time compiler (JIT) needs to run a few times before code is fully optimized. You must let that happen before taking measurements.
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
public class MyBenchmark {
    // ...
}
I usually start with five warmup iterations. Then ten measurement iterations per fork, and three forks total. This gives me a reliable average and an error margin. If the error is high, I increase the measurement iterations. Never skip warmup – your numbers will be misleading.
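One consequence worth internalizing: these settings multiply. A back-of-the-envelope sketch of the wall-clock cost of a configuration (the class and method names are mine, not JMH's):

```java
public class RunTimeEstimate {

    // Each benchmark method runs for roughly
    // forks * (warmup iterations + measurement iterations) * seconds per iteration,
    // before JVM startup overhead -- and every @Param value multiplies it again.
    static int estimateSeconds(int forks, int warmupIters, int measureIters, int secPerIter) {
        return forks * (warmupIters + measureIters) * secPerIter;
    }

    public static void main(String[] args) {
        // The settings above: 3 forks * (5 + 10) iterations * 1 s each.
        System.out.println(estimateSeconds(3, 5, 10, 1)); // prints 45
    }
}
```

Forty-five seconds per benchmark method sounds cheap until you have twenty methods and three parameter values each.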
4. Avoiding Dead Code Elimination with Blackhole
The JIT is smart. If your benchmark method computes something and does not use the result, the JIT may remove the computation entirely. Your benchmark then shows zero time, which is useless.
@Benchmark
public void consumeResult(Blackhole bh) {
    int sum = expensiveComputation();
    bh.consume(sum); // tells JMH "I need this result"
}
The Blackhole object is a special JMH class that consumes values in a way the compiler cannot optimize away. If your benchmark method returns a value, JMH automatically uses a blackhole for it. For void methods, you must pass a Blackhole parameter. I always do this – it saved me from countless false positives.
5. Using @Param to Vary Inputs and Prevent Constant Folding
When you write a benchmark, the JIT knows the exact inputs if they are literals. It can fold constants – pre‑compute the result. That is not realistic. You want to test with different sizes or random data.
import java.util.Arrays;
import java.util.Random;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SortBenchmark {

    @Param({"100", "1000", "10000"})
    private int size;

    private int[] data;

    @Setup
    public void setup() {
        data = new int[size];
        Random rand = new Random(42); // fixed seed keeps runs reproducible
        Arrays.setAll(data, i -> rand.nextInt());
    }

    @Benchmark
    public int[] sort() {
        // Clone first so every invocation sorts unsorted data,
        // then return the sorted copy so the work is not dead code.
        int[] copy = data.clone();
        Arrays.sort(copy);
        return copy;
    }
}
JMH will run the benchmark three times, once for each size. I once compared sorting algorithms and found that insertion sort was faster for small arrays but fell apart at 10,000 elements. Without @Param, I would have only seen the 100‑element case and drawn the wrong conclusion.
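For context, here is the kind of insertion sort I put up against Arrays.sort in that comparison – a hypothetical reconstruction for illustration, not the original code:

```java
import java.util.Arrays;

public class InsertionSortDemo {

    // Classic insertion sort: fast on tiny arrays thanks to low overhead
    // and good cache behavior, but O(n^2) comparisons and shifts mean it
    // falls apart as @Param pushes the size toward 10,000.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j]; // shift larger elements right
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 7};
        insertionSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 5, 7, 9]
    }
}
```

Dropping this in as a second @Benchmark method next to the Arrays.sort one is exactly the crossover experiment @Param makes easy.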
6. Measuring Memory Allocation with Profilers
Execution time is not the whole story. High memory allocation can cause frequent garbage collection and slow down your application. JMH integrates with profilers like GCProfiler to report bytes allocated per operation.
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public static void main(String[] args) throws Exception {
    Options opt = new OptionsBuilder()
            .include(SortBenchmark.class.getSimpleName())
            .addProfiler(GCProfiler.class)
            .build();
    new Runner(opt).run();
}
The output will show something like ·gc.alloc.rate and ·gc.alloc.rate.norm. I once found that a simple loop over ArrayList was allocating more than expected because of autoboxing. Switching to primitive collections solved the memory problem and made the code twice as fast.
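To make the autoboxing story concrete, here is a sketch of the kind of method pair I would put under GCProfiler – the boxed loop unboxes an Integer on every iteration and allocated one per element when the list was built, while the primitive loop touches no objects at all (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class AutoboxingDemo {

    // Boxed version: each element is a heap-allocated Integer object
    // (outside the small cached range), and the loop unboxes every one.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) sum += v; // unboxing on every iteration
        return sum;
    }

    // Primitive version: a flat int[] with no per-element objects.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        int[] primitive = new int[1000];
        for (int i = 0; i < 1000; i++) {
            boxed.add(i);
            primitive[i] = i;
        }
        System.out.println(sumBoxed(boxed));         // prints 499500
        System.out.println(sumPrimitive(primitive)); // prints 499500
    }
}
```

Same answer, very different ·gc.alloc.rate.norm once you benchmark both as @Benchmark methods under the profiler.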
7. Comparing Different Implementations Fairly
When you need to choose between an ArrayList and a LinkedList, you cannot trust your gut. You need numbers. Write a single benchmark class with multiple methods, each for one variant.
@Benchmark
public List<Integer> arrayListAdd() {
    List<Integer> list = new ArrayList<>();
    for (int i = 0; i < 1000; i++) list.add(i);
    return list;
}

@Benchmark
public List<Integer> linkedListAdd() {
    List<Integer> list = new LinkedList<>();
    for (int i = 0; i < 1000; i++) list.add(i);
    return list;
}
Run with @Fork(3) to reduce random JVM effects. Look at the error bars. If the difference is smaller than the error, it is not statistically significant. I learned this the hard way when I replaced every ArrayList with a LinkedList, believing it was faster – only to find the performance was the same and the memory overhead was worse.
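My rule of thumb from that lesson, written down as a crude overlap check – this is not a proper statistical test (JMH's error column is a confidence interval), just the quick filter I apply before believing a difference:

```java
public class SignificanceCheck {

    // Treat two scores as meaningfully different only when the gap between
    // them exceeds the sum of their error margins -- i.e., the intervals
    // (score ± error) do not overlap.
    static boolean likelyDifferent(double scoreA, double errA,
                                   double scoreB, double errB) {
        return Math.abs(scoreA - scoreB) > errA + errB;
    }

    public static void main(String[] args) {
        // Tight errors, clear gap: the difference is probably real.
        System.out.println(likelyDifferent(12.345, 0.123, 10.456, 0.089)); // prints true
        // Wide errors swallow the gap: treat it as noise.
        System.out.println(likelyDifferent(12.3, 1.5, 11.8, 1.2));         // prints false
    }
}
```

When the check says false, the honest move is to increase measurement iterations or forks, not to pick the variant with the prettier number.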
8. Using @CompilerControl for Forced Inlining Decisions
Sometimes you want to understand how inlining affects performance. The JIT decides which methods to inline. You can override that decision with @CompilerControl. This is useful for diagnostic purposes, not for production.
// Put @CompilerControl on the method whose inlining you want to control --
// the helper, not the benchmark method that calls it.
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private int helperNoInline() { return helperMethod(); }

@CompilerControl(CompilerControl.Mode.INLINE)
private int helperInline() { return helperMethod(); }

@Benchmark
public int noInlineVersion() { return helperNoInline(); }

@Benchmark
public int forceInlineVersion() { return helperInline(); }
I once had a small helper method that was called inside a hot loop. The JIT was inlining it, but I suspected the inlined code was too large. By forcing DONT_INLINE, I saw that the performance dropped significantly. That told me the inlining was beneficial, and I stopped worrying.
9. Running Benchmarks from the Command Line with Defaults
You can run JMH programmatically from a simple main method. This is useful for continuous integration or when you want to pass custom parameters.
public class BenchmarkRunner {
    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
Then on the command line:
java -jar mybenchmarks.jar -wi 5 -i 10 -f 3 -bm avgt
I use this in my CI pipeline to detect performance regressions. Every commit runs the same benchmarks, and if the score drops by more than 5%, the build fails. This keeps me honest.
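The 5% gate itself is a one-liner once you have parsed the scores out of the JMH results; here is a sketch of the check my pipeline applies (the class name and the higher-is-better convention are assumptions about my setup, not JMH API):

```java
public class RegressionGate {

    // Fail the build when the current score falls more than maxDropFraction
    // below the stored baseline. Scores here are throughput-style, so higher
    // is better; for avgt scores the comparison would flip.
    static boolean regressed(double baselineScore, double currentScore,
                             double maxDropFraction) {
        return currentScore < baselineScore * (1.0 - maxDropFraction);
    }

    public static void main(String[] args) {
        System.out.println(regressed(100.0, 94.0, 0.05)); // prints true  (6% drop)
        System.out.println(regressed(100.0, 97.0, 0.05)); // prints false (3% drop)
    }
}
```

One caveat from the previous section applies here too: a drop smaller than the benchmark's error margin is noise, so the threshold should sit comfortably above the typical ± error.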
10. Analyzing Results and Making Optimization Decisions
The JMH output gives you a score with error margins. The real skill is interpreting it.
Benchmark                            Mode  Cnt   Score    Error  Units
StringConcatBenchmark.plusOperator   avgt   30  12.345 ±  0.123  ns/op
StringConcatBenchmark.stringBuilder  avgt   30  10.456 ±  0.089  ns/op
A 15% difference in nanoseconds matters only if that code runs millions of times per second. If it runs once per request, you will not notice. Always profile the whole application to see where time is actually spent. I use a profiler together with JMH. The profiler tells me which methods are hot. JMH tells me exactly how hot they are and how much I can improve them.
That is how I approach performance now. I do not guess. I measure. JMH gives me reliable measurements. I apply optimizations only where the numbers say they matter. The result is code that is fast, but not prematurely optimized. You can do the same. Start with one small benchmark. Run it. See what happens. Then improve.
Remember, the goal is not to make every line faster. It is to make the code that runs the most and consumes the most resources as efficient as possible. JMH shows you the way.