Let’s talk about running Java applications in containers. It’s a common setup now, but if you just throw your old JAR file into a Docker container and run it, you might face some frustrating problems. The application could be slow to start, use too much memory, or get killed unexpectedly by the system. I’ve seen this happen, and the fix usually isn’t about changing the application code. It’s about tuning the interface between the Java Virtual Machine and its new, constrained container environment.
The JVM was designed in an era of dedicated physical servers. It expects to see all the memory and CPU cores on a machine. In a container, those resources are just illusions—limits set by the runtime. If the JVM doesn’t know about these limits, it will make bad decisions. The first and most crucial step is to tell the JVM to look at the container’s limits, not the host’s.
You do this by enabling container support. For modern JVMs (Java 8 update 191 and later, and all Java 10+), this is often automatic, but being explicit is good practice. More importantly, you need to define how much of the container’s memory the heap should use. You don’t want it to use 100%, as the JVM needs memory for other things too.
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar
This command tells the JVM to respect container boundaries and to set the maximum heap size to 75% of the container’s total memory limit. If your container is allowed 1GB of RAM, the heap will aim for about 750MB. The remaining 25% is left for stack memory, metaspace (where class definitions live), and direct buffers. This prevents the classic “Out Of Memory” kill from the container runtime, which happens when the total JVM process size exceeds the limit.
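You can verify what the JVM actually computed from inside the container. A minimal sketch (the class name is mine):

```java
// Prints the heap ceiling the JVM derived from its environment.
// Run inside the container to confirm the cgroup limit was picked up.
public class HeapCheck {
    public static void main(String[] args) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxHeapBytes / (1024 * 1024));
    }
}
```

With a 1GB container limit and -XX:MaxRAMPercentage=75.0, the printed value should land near the 750MB figure above; if it instead reflects the host's RAM, container support isn't active.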
Choosing where your application lives starts with the base Docker image. It’s tempting to use openjdk:latest and move on, but that image is large. It contains a full operating system and a complete JDK, most of which you don’t need to run your application. A large image is slow to pull from a registry and takes up more space on every node in your cluster.
Instead, opt for a minimal image. My personal preference is to use a multi-stage Docker build. I compile and package my application in a full JDK image, then copy the results into a tiny runtime image.
# First stage: build
FROM eclipse-temurin:17-jdk-alpine AS builder
WORKDIR /app
COPY . .
RUN ./gradlew bootJar
# Second stage: runtime
FROM eclipse-temurin:17-jre-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/build/libs/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-jar", "/app/app.jar"]
Here, the final image is based on eclipse-temurin:17-jre-alpine: an Alpine Linux base with only a Java Runtime Environment on top. It's significantly smaller and more secure because it carries fewer unnecessary packages. The JRE is all you need to run a pre-compiled JAR file.
Once you have a small image, you can make builds faster. Every time you change a single line of code, rebuilding your Docker image typically means Docker has to re-download all your application dependencies. This is slow. The solution is to structure your application JAR in a way that plays nicely with Docker’s layer caching.
Modern tools like Spring Boot create “layered JARs.” Think of it like a cake with separate layers: one for dependencies, one for your own code, one for resources. In your Dockerfile, you can extract these layers individually.
FROM eclipse-temurin:17-jre-alpine AS builder
WORKDIR /app
COPY build/libs/application.jar app.jar
RUN java -Djarmode=layertools -jar app.jar extract
FROM eclipse-temurin:17-jre-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/dependencies/ ./
COPY --from=builder /app/spring-boot-loader/ ./
COPY --from=builder /app/snapshot-dependencies/ ./
COPY --from=builder /app/application/ ./
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "org.springframework.boot.loader.JarLauncher"]
This technique uses Spring Boot’s tooling to split the JAR. The key benefit is Docker caching. The layer containing your third-party dependencies (like Spring, Apache Commons, etc.) is huge but changes rarely. The layer with your application classes is small and changes often. Docker can cache the dependency layer. When you change your code and rebuild, it reuses the cached dependency layer and only rebuilds the small application layer. This turns a 3-minute build into a 30-second one.
Performance tuning inside a container is different. The garbage collector is a key component. In a large server, you might optimize for maximum throughput. In a container, especially one with limited CPU, you often need to optimize for predictable pause times to keep your application responsive.
The G1 Garbage Collector is a good default for containerized workloads. You can give it a target for maximum pause time.
java -XX:+UseContainerSupport -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar
This tells G1 to try hard to keep any individual garbage collection pause under 200 milliseconds. For very low-latency applications, you might consider more advanced collectors like ZGC or Shenandoah. They can achieve pause times of just a few milliseconds, but they use more CPU. This is a trade-off. In a container with a strict CPU limit, using ZGC might leave less CPU for your actual application logic. You have to test and see what works for your specific workload and resource limits.
Speaking of CPU, this is another area where the JVM can be fooled. The JVM looks at the host machine to determine the number of available processors. It uses this number to size internal thread pools, like those for the Just-In-Time compiler and the garbage collector. In a container, you might only be allowed to use one or two cores.
If you don’t specify a CPU limit, the JVM might create dozens of compiler threads, wasting memory and causing contention. The fix is to always set explicit CPU limits in your Kubernetes pod spec or Docker run command. This lets the underlying cgroups system report the correct number of available CPUs to the JVM.
# In your Kubernetes deployment.yaml
spec:
  containers:
  - name: java-app
    resources:
      limits:
        cpu: "1.5"
        memory: "1Gi"
      requests:
        cpu: "0.5"
        memory: "512Mi"
With this configuration, the cgroup quota of 1.5 CPUs is visible to the JVM, which rounds the fraction up and reports two available processors, keeping its internal thread pools appropriately small. The requests field tells the scheduler what you need at a minimum to run.
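You can confirm what the scheduler handed you by asking the JVM directly from inside the container (the class name is mine):

```java
// Reports the processor count the JVM will use to size its thread pools.
public class CpuCheck {
    public static void main(String[] args) {
        // With a cgroup quota of 1.5 CPUs this typically prints 2:
        // HotSpot rounds a fractional quota up to a whole processor count.
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
    }
}
```

If the detected value looks wrong for your workload, the -XX:ActiveProcessorCount=N flag overrides it explicitly.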
Memory management goes beyond the heap. The JVM uses “native memory” for many things: the metaspace for class metadata, memory allocated by thread stacks, and direct byte buffers used by libraries like Netty for network operations. If your heap is fine but the total container memory is exceeded by these other areas, your container will still be killed.
To see what’s happening, enable Native Memory Tracking. It adds a small overhead, so you might not run it in production all the time, but it’s invaluable for diagnosis.
java -XX:+UseContainerSupport -XX:NativeMemoryTracking=summary -jar app.jar
You can then connect to the running container and ask for a report.
# Find the PID (it's often 1 in a container)
jcmd 1 VM.native_memory summary
This will print a breakdown. You might see that your metaspace is growing because of dynamic class generation, or that you’ve allocated 500MB of direct buffers. This insight guides your next move—perhaps increasing the container memory limit, tuning metaspace size with -XX:MaxMetaspaceSize, or investigating a library that’s allocating too many direct buffers.
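To see why heap limits alone aren't enough, note that direct buffers live entirely outside the heap: they never appear in heap usage, yet they count fully against the container's memory limit. A small demonstration (the buffer size is arbitrary):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        long heapBefore = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        // 64 MB of native memory: invisible to heap accounting,
        // but fully visible to the container's memory limit.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        long heapAfter = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        System.out.println("Direct buffer capacity: " + direct.capacity());
        // Heap growth is tiny: only the small ByteBuffer wrapper object lives on the heap.
        System.out.println("Heap growth (bytes): " + (heapAfter - heapBefore));
    }
}
```

If NMT shows this pool growing, -XX:MaxDirectMemorySize puts a hard cap on it.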
Containers are managed by orchestrators like Kubernetes. They need to know the health of your application. Is it alive? Is it ready to receive traffic? If your application takes 30 seconds to warm up its cache, you don’t want Kubernetes to send it user requests before it’s done.
You need to expose health endpoints. In Spring Boot, this is simple.
# application.properties
management.endpoints.web.exposure.include=health,info
management.endpoint.health.probes.enabled=true
# Use a separate port for the management endpoints
management.server.port=8081
This creates two endpoints: /actuator/health/liveness and /actuator/health/readiness. Liveness tells Kubernetes if the process is running. Readiness tells Kubernetes if the application is initialized and healthy enough to handle HTTP requests.
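If you aren't on Spring Boot, the same idea takes a few lines with the JDK's built-in HTTP server. A sketch, with paths and port chosen to mirror the probes below (everything here is illustrative, not a production server):

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

public class HealthServer {
    // Flip to true once caches are warm and connections are open.
    static final AtomicBoolean ready = new AtomicBoolean(false);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        // Liveness: the process is up and able to answer.
        server.createContext("/health/liveness",
                ex -> respond(ex, 200, "UP"));
        // Readiness: only report UP after initialization finishes.
        server.createContext("/health/readiness", ex -> {
            boolean ok = ready.get();
            respond(ex, ok ? 200 : 503, ok ? "UP" : "OUT_OF_SERVICE");
        });
        server.start();
        // ... application initialization happens here ...
        ready.set(true);
    }

    private static void respond(HttpExchange ex, int status, String body)
            throws IOException {
        byte[] bytes = body.getBytes();
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = ex.getResponseBody()) {
            os.write(bytes);
        }
    }
}
```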
You then configure these in your Kubernetes manifest.
spec:
  containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8081
      initialDelaySeconds: 90
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8081
      initialDelaySeconds: 30
The initialDelaySeconds is critical. It gives your Java application time to start before Kubernetes starts poking it. Without this, Kubernetes might kill your pod during a slow startup, thinking it’s dead.
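On recent Kubernetes versions, a startupProbe is often a cleaner alternative to a large initialDelaySeconds for slow-starting JVMs: liveness checks are held off until the startup probe succeeds. A sketch, assuming the same management port as above:

```yaml
# Gives the app up to 30 × 10s to come up before liveness checks begin
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  failureThreshold: 30
  periodSeconds: 10
```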
Startup speed matters more than ever with patterns like serverless or jobs that run in ephemeral containers. You want your container to be ready in seconds, not minutes. There are several JVM-level tricks.
You can disable the highest levels of Just-In-Time compilation initially. The JVM uses tiered compilation: it starts by quickly compiling code with minimal optimizations (Level 1, 2, 3) and later recompiles hot methods with aggressive optimizations (Level 4). That final level is costly.
java -XX:+UseContainerSupport -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -jar app.jar
This command tells the JVM to stop at compilation level 1. The code will run a bit slower initially, but the application will start much faster. The JVM can later recompile hot methods if the application runs long enough, but for a short-lived job, you trade peak performance for rapid startup.
Application design plays a role too. Avoid scanning large parts of the classpath on startup. Use lazy initialization for non-critical components. I once worked on an app that loaded a massive XML configuration file on startup. Moving that to a lazy load saved nearly 10 seconds.
Logging in containers follows a different philosophy. In a traditional server, you write logs to files on a disk. In a container, the filesystem is usually ephemeral. When the container stops, the logs are gone. The standard practice is to write all logs to the standard output (stdout) and standard error (stderr) streams.
The container runtime captures these streams. You should configure your logging framework, like Logback or Log4j2, to output to the console in a structured format like JSON.
# logback-spring.xml example for Spring Boot
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="info">
    <appender-ref ref="CONSOLE" />
  </root>
</configuration>
Never use System.out.println. Use your logging framework. This ensures your logs have timestamps, log levels, and can be easily parsed and shipped to a central system like the ELK stack or Datadog by your cluster’s log collector.
Finally, containers are meant to be stopped. They are terminated during scaling events, deployments, and node maintenance. If your application is killed mid-request, you can corrupt data or drop important messages.
You must handle shutdown signals gracefully. When Kubernetes decides to terminate a pod, it sends a SIGTERM signal to the process. Your application has a short period (default 30 seconds) to shut down cleanly before it gets a SIGKILL.
In a Spring Boot application, enabling graceful shutdown is often built-in. But you should also ensure your own resources are closed. Register a shutdown hook if you manage connections directly.
@Bean
public DisposableBean myResourceCleanup() {
    return () -> {
        // Close custom network connections
        // Flush in-memory buffers to disk
        // Send a final metric to your monitoring system
        log.info("Performing graceful shutdown..."); // use your logging framework, not System.out
    };
}
The goal is to finish processing current requests, close database connections, and deregister from service discovery (like Eureka or Consul). This ensures a user doesn’t get a connection error because they were routed to a pod that’s halfway dead.
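Outside of Spring, the JVM-level mechanism is a shutdown hook, which runs when the process receives SIGTERM. A plain-Java sketch (the cleanup steps are placeholders):

```java
public class GracefulShutdown {
    public static void main(String[] args) throws InterruptedException {
        // Runs on SIGTERM or normal exit, but never on SIGKILL --
        // so finish well within the orchestrator's grace period.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("SIGTERM received, draining in-flight work...");
            // close connection pools, flush buffers, deregister from discovery
        }));

        System.out.println("Application running; send SIGTERM to stop.");
        Thread.currentThread().join();  // block forever, as a server would
    }
}
```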
Pulling this all together, running Java in containers is a shift in mindset. It’s not about maxing out a single machine’s performance. It’s about predictability, density, and working within declarative limits. You configure the JVM to understand its sandbox. You build small, layered images. You expose your health clearly. You log to the streams. You shut down politely.
These adjustments help your Java applications fit into the cloud-native world. They start fast, use resources efficiently, and behave in a predictable way that orchestrators can manage. It turns a traditional Java monolith into a good container citizen, ready to scale up and down with demand. The work happens in the configuration and the build process, not necessarily the business logic, but the impact on stability and cost is real.