When I first started optimizing Java applications for production, I felt like I was adjusting knobs on a black box. The JVM seemed mysterious, but over time I learned that each knob has a clear purpose. The goal of tuning is not to make the application faster in every possible way. The goal is to make it predictable, stable, and efficient within the resources you have. Let me walk you through ten techniques that I have used in real projects, with real pain points and real fixes.
Choosing the right garbage collector is the first decision you make, and it shapes everything else. The JVM comes with several collectors, each good for a different kind of work. If your application is a web server or an API that needs low response times, you want a collector that keeps pause times short. I once ran a trading application where a pause of two hundred milliseconds was too long. We switched to ZGC, and the pauses dropped to under ten milliseconds. For a batch processing job that runs overnight, you might prefer the Parallel GC, which maximizes throughput but can pause for seconds. The default G1GC is a solid middle ground. Here is how you set them:
# Use G1GC (default since JDK 9)
java -XX:+UseG1GC -jar app.jar
# Use ZGC for sub-millisecond pauses
java -XX:+UseZGC -jar app.jar
# Use Shenandoah for low pause times with concurrent compaction
java -XX:+UseShenandoahGC -jar app.jar
# Use Parallel GC for maximum throughput in non‑interactive workloads
java -XX:+UseParallelGC -jar app.jar
I recommend you start with G1GC and only switch if you see pause times that violate your service level objectives. Monitor the GC logs to see how long each pause takes. If you see many full GC events, that is a red flag. ZGC and Shenandoah handle huge heaps better than G1GC, but they also use more CPU during concurrent phases. Test with your actual load before deciding.
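A quick way to spot that red flag is to count full-GC pauses in the unified log. The sketch below fabricates two sample log lines (the line format is my approximation of `-Xlog:gc*` output, and `gc-sample.log` is a made-up filename) so the check is self-contained:

```shell
# Create a tiny sample in the unified -Xlog:gc* style (illustrative lines
# only; real entries come from your actual gc.log)
printf '%s\n' \
  '[1.234s][info][gc] GC(10) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 8.123ms' \
  '[9.876s][info][gc] GC(42) Pause Full (G1 Compaction Pause) 1800M->900M(2048M) 950.456ms' \
  > gc-sample.log

# Count full-GC events; anything above zero deserves investigation
grep -c 'Pause Full' gc-sample.log
```

On a real log, the same `grep` against `gc.log` tells you in one command whether full GCs are happening at all.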
Heap size is the most common tuning parameter people touch, and it is also where beginners make the biggest mistake. Setting the heap too small causes frequent garbage collections, which waste CPU and hurt performance. Setting the heap too large leads to long pause times because the collector has more memory to scan. Worse, a huge heap can cause the operating system to swap, killing performance entirely. I once saw a team set the heap to thirty gigabytes on a machine with thirty‑two gigs of physical memory. The JVM used almost all of it, leaving nothing for the OS caches and the metaspace. The application crashed within hours.
The rule of thumb I use is to allocate no more than seventy‑five percent of the available memory to the heap, and leave the rest for the metaspace, thread stacks, and operating system. You can set fixed sizes:
# Fixed heap size for traditional deployments
java -Xms2g -Xmx2g -jar app.jar
But if you run in containers, use percentage‑based flags so the JVM respects the container memory limit:
# Percentage-based heap for container environments
java -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0 -jar app.jar
The -Xms flag sets the initial heap, and -Xmx sets the maximum. If you make them equal, the JVM does not have to resize the heap during runtime. That resize operation can cause a stop‑the‑world pause, so I always set both to the same value in production. Then I watch the GC logs to see if the heap is sized correctly. If I see many minor collections or high promotion rates, I increase the heap. If the GC logs show long full GCs, I consider reducing the heap or switching to a low‑pause collector.
The young generation is where most objects die. In many applications, ninety percent of objects are garbage before the next minor GC. Tuning the young generation size affects how often these collections happen. A small young generation fills up quickly, causing frequent minor collections. A large young generation delays the promotion of objects to the old generation, which can reduce old generation pressure but also increase the time spent in minor collections.
With G1GC, you do not set the young generation size directly. Instead, you give G1 lower and upper bounds for the young generation as percentages of the heap. I use these flags to give a hint to G1:
# Set young generation bounds as a percentage of the heap (G1GC; these are experimental flags)
java -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 -XX:G1MaxNewSizePercent=30 -jar app.jar
If you use the Parallel GC, you can set the ratio between young and old generation:
# For Parallel GC, set the explicit ratio
java -XX:NewRatio=3 -jar app.jar
A NewRatio of 3 means the old generation is three times the young generation. That is a good starting point. I learned this the hard way when I worked on a message processing system. The default young generation was too small, so we saw a minor GC every two seconds. The application was spending ten percent of its time just collecting garbage. I increased the young generation by doubling the heap, and the minor GC frequency dropped to once every ten seconds. The throughput improved significantly.
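The split is easy to sanity-check: with -XX:NewRatio=N, the young generation gets 1/(N+1) of the heap. Here is the arithmetic for a 2 GB heap with NewRatio=3 (the numbers are just this example, not a recommendation):

```shell
HEAP_MB=2048   # -Xmx2g
NEW_RATIO=3    # old generation is 3x the young generation

# young = heap / (ratio + 1); old = the remainder
YOUNG_MB=$(( HEAP_MB / (NEW_RATIO + 1) ))
OLD_MB=$(( HEAP_MB - YOUNG_MB ))
echo "young=${YOUNG_MB}MB old=${OLD_MB}MB"   # young=512MB old=1536MB
```

So raising NewRatio shrinks the young generation, and lowering it grows the young generation at the old generation's expense.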
The metaspace holds class metadata. In older JVMs, this was called the permanent generation. Metaspace grows automatically by default, which can cause the JVM to grab extra memory from the OS at runtime. That growth can trigger a full GC or cause out‑of‑memory errors if the metaspace expands without bound. I set both the initial and maximum metaspace sizes to the same value so the JVM never has to resize:
# Set initial and maximum metaspace
java -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -jar app.jar
How much metaspace does your application need? If you load many classes, like a Spring Boot application with hundreds of libraries, you may need 512 MB or more. I check with jstat -gcmetacapacity while the application is under load. I look at the current metaspace usage and set the fixed size a little above the peak. That prevents unnecessary growth without wasting memory.
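In practice that check looks something like this, where `<pid>` is the JVM's process id and the interval is in milliseconds:

```
# Sample metaspace capacity every 5 seconds for a running JVM;
# the MC column shows the current metaspace capacity in KB
jstat -gcmetacapacity <pid> 5000
```

Watch the peak value under realistic load, then set the fixed metaspace size slightly above it.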
Compressed object pointers are a free performance boost. The JVM uses 64‑bit pointers by default on 64‑bit systems. But if your heap is smaller than 32 GB, the JVM can use 32‑bit pointers. This reduces the memory required for each object reference by half, which also improves cache locality. I have seen a ten to fifteen percent reduction in heap usage just by enabling compressed OOPs. It is enabled by default, but you can explicitly set it:
# Explicitly enable compressed OOPs (default)
java -XX:+UseCompressedOops -jar app.jar
If your heap exceeds 32 GB, the JVM automatically disables them. In that case, you might consider keeping the heap below 32 GB to enjoy the benefit. For example, if you need 40 GB of memory for your objects, you might actually be better off running two JVM instances each with a 20 GB heap. The compressed pointers save enough memory to offset the duplication. I ran a simulation once where a 30 GB heap with compressed OOPs used less total memory than a 35 GB heap without them. Always test.
Each thread in a Java application has its own stack. The default stack size on most platforms is 1 MB. If you have thousands of threads, that consumes gigabytes of memory. For applications that are I/O bound, you can often reduce the stack size to 256 KB without causing errors. I worked on a high‑concurrency web server where we had two thousand threads. Reducing the stack from 1 MB to 256 KB saved 1.5 GB of memory. The trick is to find the minimum safe value.
# Reduce thread stack size to 256KB
java -Xss256k -jar app.jar
I test by driving the application through its deepest call paths. I wrote a small script that triggers deep recursion in the code paths that use the most stack. If I see a StackOverflowError, I increase the stack size; otherwise, I keep it low. Some frameworks, such as Jakarta EE servers, and some libraries assume a certain stack depth, so benchmark with real workloads before committing to a smaller value.
GC logs are the stethoscope for the JVM. Without them, you are guessing. I enable unified logging in all production applications, with file rotation so logs do not grow forever:
# Unified logging (JDK 9+)
java -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=10m -jar app.jar
The logs show you the cause of each GC, how long it took, and how much memory was freed. I look at four things: the frequency of young GCs, the duration of each pause, the promotion rate, and whether full GCs happen. If the promotion rate increases over time, that is a sign of a memory leak. If full GCs happen often, either the heap is too small, or the concurrent mark phase in G1GC cannot keep up. I feed the logs into a free tool like GCeasy, which plots the data and highlights anomalies. That visualization saved me hours of manual inspection.
G1GC has several knobs that control its behavior. The most important is the pause time goal:
# Set a pause time goal (default 200ms)
java -XX:MaxGCPauseMillis=100 -jar app.jar
Setting this too low, like 50 ms, forces G1 to do more small collections, which can increase overhead and might cause more full GCs because the concurrent marking cannot finish. I usually start with the default 200 ms and only lower it if the application needs lower latency. I also adjust the number of background threads:
# Adjust the number of parallel GC threads
java -XX:ParallelGCThreads=4 -jar app.jar
On machines with many cores, the default number of GC threads tracks the CPU count: all cores up to eight, then roughly five‑eighths of the cores beyond that. That can still cause CPU contention if the application is already CPU‑bound. I cap it at four or eight threads to leave room for application work. The region size matters too:
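You can estimate the default before capping it. This sketch encodes my reading of the HotSpot heuristic (all cores up to eight, plus about five‑eighths of the rest); verify the real value on your machine with -XX:+PrintFlagsFinal:

```shell
CORES=32

# HotSpot heuristic (approximate): min(cores, 8) + 5/8 of the cores beyond eight
if [ "$CORES" -le 8 ]; then
  THREADS=$CORES
else
  THREADS=$(( 8 + (CORES - 8) * 5 / 8 ))
fi
echo "ParallelGCThreads default ~ ${THREADS}"   # ~23 for 32 cores
```

On a 32-core box that default of roughly 23 GC threads is exactly the kind of value I cap down to four or eight.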
# Set the region size (power of two, 1MB to 64MB)
java -XX:G1HeapRegionSize=4m -jar app.jar
Larger regions are good for bigger heaps because they reduce the number of regions and the overhead of tracking them. Smaller regions allow more fine‑grained collection. I pick a region size so that there are at least two thousand regions. For a 16 GB heap, 8 MB regions give exactly 2,048 regions: total heap divided by region size.
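The calculation is one line of arithmetic. Here it is for the 16 GB example, alongside what 4 MB regions would give:

```shell
HEAP_MB=$(( 16 * 1024 ))   # 16 GB heap expressed in MB

# regions = heap / region size; aim for at least ~2000 regions
echo "8MB regions: $(( HEAP_MB / 8 ))"   # 2048
echo "4MB regions: $(( HEAP_MB / 4 ))"   # 4096
```

Both sizes clear the two-thousand-region threshold here; on a smaller heap, the same division tells you when a region size becomes too coarse.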
Java Flight Recorder (JFR) is a low‑overhead profiling tool built into the JVM. I use it to diagnose runtime issues without external agents. You can start a recording when the application starts:
# Start a JFR recording when launching the application
java -XX:StartFlightRecording=filename=recording.jfr,duration=60s,settings=profile -jar app.jar
The recording contains data on method profiling, lock contention, garbage collection, and memory leaks. I open the .jfr file in JDK Mission Control. I look at the allocation rate. If the application allocates hundreds of megabytes per second, I find the hot methods and optimize them. I once found a library that was creating a new String object for every HTTP header. A small caching change reduced allocation by eighty percent and cut GC pauses in half. JFR also shows thread blocking events. If many threads are stuck on a single lock, that is a bottleneck you need to refactor.
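If the application is already running, you can drive JFR with jcmd instead of a startup flag. Replace `<pid>` with the JVM's process id; the recording name `adhoc` and filename here are my own choices:

```
# Start a named recording on a live JVM with the profiling preset
jcmd <pid> JFR.start name=adhoc settings=profile

# ...let it run under load, then dump what has been captured so far
jcmd <pid> JFR.dump name=adhoc filename=adhoc.jfr

# Stop the recording when you are done
jcmd <pid> JFR.stop name=adhoc
```

This is handy for diagnosing an incident in progress without restarting the process.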
The last technique is monitoring native memory. The heap is not the only memory user. Libraries, threads, the JVM internals, and native code all allocate memory outside the heap. If this grows without control, you can hit an out‑of‑memory error even when the heap is fine. Native Memory Tracking (NMT) helps. Enable it:
# Enable NMT with tracking level
java -XX:NativeMemoryTracking=summary -jar app.jar
Then you can use jcmd to see the breakdown:
jcmd <pid> VM.native_memory summary
I set a baseline at application startup and check again after a few hours of load. If I see significant growth in the “Thread” or “Internal” categories, it could mean a thread leak or a native allocation leak from a library. I also look at the “Metaspace” and “GC” categories. If the native memory keeps growing, I investigate the library that is doing the allocation. I once fixed a bug in a Netty‑based application where a Netty allocator was not returning memory to the OS. NMT made the symptom visible immediately.
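NMT has a built-in baseline-and-diff workflow that matches this process: snapshot at startup, then diff after soak time. Again, `<pid>` is the JVM's process id:

```
# Record the current native memory usage as the baseline
jcmd <pid> VM.native_memory baseline

# ...hours later, show per-category growth relative to that baseline
jcmd <pid> VM.native_memory summary.diff
```

The diff output flags each category's change directly, so you do not have to eyeball two summaries side by side.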
These ten techniques are not a checklist you apply blindly. Each one should be tested in a staging environment first. I always change one parameter at a time, run a load test, and compare the GC logs. The JVM is complex, but you do not need to understand every internal detail. You need a systematic approach: measure, hypothesize, change, measure again. The result is a production system that behaves consistently under load, without surprises. That consistency is what your users experience as reliability. And that is the real goal of tuning.