10 Java Virtual Thread Patterns That Simplify High-Concurrency Server Development
Master Java virtual threads with 10 proven concurrency patterns. Learn to avoid thread pinning, use structured concurrency, and build high-throughput apps.
I remember the first time I tried to write a high‑concurrency server in Java. I had a fixed thread pool, maybe fifty threads, and each request blocked on a database call. When traffic spiked, the threads would sit idle waiting for the database, and new requests would queue up. I switched to an asynchronous framework and my code turned into a maze of callbacks and futures. It worked, but it was hard to read and harder to debug. Virtual threads changed that.
Virtual threads are lightweight threads managed by the Java runtime, not the operating system. You can create millions of them without crashing your machine. They suspend and resume automatically whenever you do something that blocks, like reading from a socket or waiting for a database query. The code looks like traditional synchronous code, but the runtime handles the multiplexing behind the scenes. This means you can write simple, linear logic and still get the throughput of reactive systems. In this article I will walk through ten patterns I use every day to build high‑concurrency applications with virtual threads. I will show you the code, explain the traps, and tell you why each pattern matters.
Using Virtual Threads for Request‑Per‑Thread Web Servers
The simplest pattern is the one that makes the biggest difference: serving each incoming request on its own virtual thread. Before virtual threads, we had to choose between a fixed thread pool (which limits concurrency) and an asynchronous framework (which complicates code). With virtual threads, you get the best of both worlds.
You can use Executors.newVirtualThreadPerTaskExecutor(). It creates a new virtual thread for every task you submit. Inside that virtual thread, you handle the entire request from start to finish. Blocking calls – like reading the request body, querying a database, calling a remote service – will unmount the virtual thread from the carrier OS thread. The carrier thread can then pick up another virtual thread. When your blocking operation completes, the virtual thread is mounted again and continues where it left off.
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    while (true) {
        Socket socket = serverSocket.accept(); // blocking, but that's fine
        executor.submit(() -> handle(socket));
    }
}
Every handle(socket) runs on a fresh virtual thread. The accept() call itself is blocking, but that’s acceptable because there is only one thread waiting for new connections. The real magic is inside handle(): suppose you parse an HTTP request, look up a user in the database, and return a response. All those blocking calls are serial in your code, but the runtime lets other virtual threads run while each one is blocked.
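To see the effect without a real server, here is a minimal sketch that replaces the socket with a simulated blocking call. The class name, the handle(int) method and the 50 ms delay are all hypothetical stand-ins, not part of any framework.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingHandlersDemo {

    // Hypothetical stand-in for handle(socket): blocks for a while, as a
    // database or HTTP call would, then reports what kind of thread ran it.
    static boolean handle(int requestId) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50));
        return Thread.currentThread().isVirtual();
    }

    // Submits one task per simulated request and returns how many of them
    // ran on a virtual thread.
    static int countVirtualHandlers(int requests) {
        List<Future<Boolean>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                int requestId = i;
                futures.add(executor.submit(() -> handle(requestId)));
            }
        } // close() waits until every submitted task has finished

        int onVirtual = 0;
        for (Future<Boolean> future : futures) {
            if (future.resultNow()) {
                onVirtual++;
            }
        }
        return onVirtual;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int count = countVirtualHandlers(2_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " handlers ran on virtual threads in " + elapsedMs + " ms");
    }
}
```

Run serially, two thousand 50 ms waits would take well over a minute. Because each sleeping handler releases its carrier, the elapsed time printed here should be far closer to the length of a single wait, though the exact figure depends on your machine.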
I have seen Spring Boot applications switch to virtual threads with a single configuration property (spring.threads.virtual.enabled=true, available from Spring Boot 3.2 on Java 21). Tomcat then runs each request on a virtual thread. The same synchronous controllers you wrote years ago now handle thousands of concurrent requests without any rewrites. That is the promise, and it works.
Avoiding Thread Pinning with Synchronized Blocks
Here is where many people get stuck. Virtual threads are designed to be unmounted when they block. However, on JDK 21 to 23, a virtual thread that blocks inside a synchronized block or method is pinned to its carrier thread. Pinning means the runtime cannot unmount that virtual thread until it exits the synchronized section, so the carrier sits idle along with it. If enough virtual threads are pinned at once, every carrier is tied up and your fancy concurrent system suddenly behaves like it has a few platform threads. JDK 24 removes this limitation for synchronized (JEP 491), which leaves pinning mainly to native frames.
On the affected releases the fix is simple: use ReentrantLock instead of synchronized. The java.util.concurrent locks do not cause pinning because they block by parking the thread, and a parked virtual thread is unmounted. So when a virtual thread waits in lock.lock(), its carrier is free to run something else.
private final Lock lock = new ReentrantLock();

public void updateBalance(long accountId, double amount) {
    lock.lock();
    try {
        Account account = findAccount(accountId);
        account.setBalance(account.getBalance() + amount);
        saveAccount(account);
    } finally {
        lock.unlock();
    }
}
If you absolutely must use synchronized (for example, because you are working with a third‑party library that uses it internally), keep the critical section as short as possible. Do not call any blocking I/O inside a synchronized block. That includes database calls, HTTP requests, or even waiting on a CountDownLatch. The moment you block inside a pinned section, you tie up a carrier thread for the whole duration. On JDK 21 to 23, run your application with the JVM flag -Djdk.tracePinnedThreads=short to find the offending code quickly. It prints a stack trace whenever a virtual thread blocks while pinned; use full instead of short to get the complete trace.
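The lock pattern can be exercised end to end with a small, self-contained sketch. LockedAccount and its methods are hypothetical names, the 1 ms sleep stands in for a blocking store round trip, and the balance is kept in whole cents as a long rather than the double used above, so the final sum can be checked exactly.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class LockedAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private long balanceCents; // guarded by lock

    void deposit(long cents) {
        lock.lock(); // a waiting virtual thread parks and frees its carrier
        try {
            long current = balanceCents;
            pause(); // simulated blocking I/O inside the critical section
            balanceCents = current + cents;
        } finally {
            lock.unlock();
        }
    }

    long balanceCents() {
        lock.lock();
        try {
            return balanceCents;
        } finally {
            lock.unlock();
        }
    }

    private static void pause() {
        try {
            Thread.sleep(Duration.ofMillis(1));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
        }
    }

    // Runs one deposit per virtual thread and returns the final balance.
    static long depositConcurrently(int deposits, long centsEach) {
        LockedAccount account = new LockedAccount();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < deposits; i++) {
                executor.submit(() -> account.deposit(centsEach));
            }
        } // close() waits for all deposits to finish
        return account.balanceCents();
    }

    public static void main(String[] args) {
        System.out.println("Final balance in cents: " + depositConcurrently(200, 5));
    }
}
```

The read-pause-write sequence inside deposit() is deliberately racy without the lock, so a lost update would show up as a balance below deposits × centsEach.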
Using Structured Concurrency for Scoped Tasks
One of the hardest things about managing many threads is making sure they all finish gracefully when something goes wrong. Suppose you start three parallel tasks: fetch user details, fetch the user’s orders, and fetch recommendations. If one of those tasks throws an exception, you want to cancel the other two immediately so they do not waste resources. Before structured concurrency, you had to manage Future objects manually, cancel them, and handle the complexity.
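StructuredTaskScope changes that. You create a scope, fork tasks inside it, and the scope itself manages the lifecycle. When the scope closes (usually via try‑with‑resources), all forked tasks that are still running get cancelled. The ShutdownOnFailure policy means the scope shuts down as soon as any subtask fails, and throwIfFailed() re‑throws that exception. Keep in mind that in JDK 21 this is a preview API (compile and run with --enable-preview) and that its shape has changed between releases, so check the Javadoc of the JDK you target.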
// JDK 21 preview API; Subtask is StructuredTaskScope.Subtask
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> user = scope.fork(() -> fetchUser(userId));
    Subtask<Order> order = scope.fork(() -> fetchOrder(orderId));
    scope.join();          // wait for both, or until one fails
    scope.throwIfFailed(); // propagate the first failure
    return new Response(user.get(), order.get());
}
This pattern is especially powerful in serverless or request‑scoped contexts. The scope’s lifetime matches the HTTP request. If the request times out or the client disconnects, you can close the scope and all in‑flight work stops. I use this in every service that calls multiple downstream APIs. It prevents orphan threads and makes error handling predictable. The code reads like a sequential script, but underneath it runs the fetches in parallel, and any failure cancels the others automatically.
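For comparison, this is roughly what the manual approach looks like when you cannot enable preview features. It is a sketch built only on final APIs (a virtual-thread executor plus ExecutorCompletionService), not a drop-in replacement for StructuredTaskScope. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

class FailFastFanOut {

    // Runs every task on its own virtual thread. Returns the results in
    // submission order, or cancels the remaining tasks on the first failure.
    static <T> List<T> allOrFail(List<Callable<T>> tasks) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var completed = new ExecutorCompletionService<T>(executor);
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(completed.submit(task));
            }
            try {
                for (int i = 0; i < futures.size(); i++) {
                    completed.take().get(); // throws if the task that just finished failed
                }
                List<T> results = new ArrayList<>();
                for (Future<T> future : futures) {
                    results.add(future.resultNow());
                }
                return results;
            } catch (ExecutionException e) {
                futures.forEach(f -> f.cancel(true)); // interrupt the siblings
                throw new IllegalStateException("subtask failed", e.getCause());
            } catch (InterruptedException e) {
                futures.forEach(f -> f.cancel(true));
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        } // close() waits for cancelled tasks to unwind
    }

    // Demo: one task fails while the other is still blocked. Returns true
    // if the blocked task observed the interrupt.
    static boolean failureInterruptsSibling() {
        var siblingStarted = new CountDownLatch(1);
        var interrupted = new AtomicBoolean(false);
        Callable<String> slow = () -> {
            siblingStarted.countDown();
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                interrupted.set(true);
                throw e;
            }
            return "orders";
        };
        Callable<String> failing = () -> {
            siblingStarted.await(); // fail only once the slow task is running
            throw new IllegalStateException("user service down");
        };
        try {
            allOrFail(List.of(slow, failing));
        } catch (IllegalStateException expected) {
            // allOrFail rethrows the failure after cancelling the slow task
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        List<Callable<String>> ok = List.of(() -> "user", () -> "orders");
        System.out.println(allOrFail(ok));
        System.out.println("sibling interrupted: " + failureInterruptsSibling());
    }
}
```

The bookkeeping in allOrFail (tracking futures, cancelling on two different exception paths) is exactly what the scope takes off your hands.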
Combining Virtual Threads with CompletableFuture
Even though virtual threads let you write blocking code, there are times when you need the composition power of CompletableFuture. For example, you might want to call two services and combine their results, but also apply a timeout or a fallback. You can still use CompletableFuture and run its tasks on a virtual‑thread executor.
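CompletableFuture.supplyAsync(Supplier, Executor) runs the supplier on the given executor. If you pass an executor that creates virtual threads, each supplier runs on its own virtual thread and can block freely. Dependent stages added with non‑async methods such as thenApply or thenCombine run on whichever thread completes the previous stage (or on the calling thread if that stage is already complete). orTimeout does not occupy a thread while it waits: it completes the future exceptionally with a TimeoutException after the delay. It does not cancel the supplier that is still running, so a slow call keeps going in the background unless you stop it yourself.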
var executor = Executors.newVirtualThreadPerTaskExecutor();
CompletableFuture<String> future = CompletableFuture
    .supplyAsync(() -> fetchFromServiceA(), executor)
    .thenCombine(
        CompletableFuture.supplyAsync(() -> fetchFromServiceB(), executor),
        (a, b) -> merge(a, b)
    )
    .orTimeout(3, TimeUnit.SECONDS)
    .exceptionally(ex -> fallbackValue());
You can also start a virtual thread and then block on a CompletableFuture. That blocking call will unmount the virtual thread until the future completes. This hybrid approach works well when integrating with libraries that are not yet virtual‑thread‑aware but provide asynchronous APIs.
One word of caution: avoid wrapping simple blocking calls in CompletableFuture just to pretend you are asynchronous. Virtual threads already handle blocking efficiently. Only reach for CompletableFuture when you need composition, timeouts, or error recovery that goes beyond what a simple try‑catch can provide.
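A runnable version of the combine-with-timeout chain might look like the sketch below. The two services are simulated with sleeps, and the names (CombineWithTimeout, slowCall, fetchBoth) and the "fallback" value are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CombineWithTimeout {

    // Hypothetical stand-in for a remote call: blocks, then returns a value.
    static String slowCall(String value, long delayMillis) {
        try {
            Thread.sleep(Duration.ofMillis(delayMillis));
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    // Calls both simulated services on virtual threads and merges the
    // results, or returns "fallback" if the pair takes longer than the timeout.
    static String fetchBoth(long delayA, long delayB, long timeoutMillis) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletableFuture<String> a =
                    CompletableFuture.supplyAsync(() -> slowCall("A", delayA), executor);
            CompletableFuture<String> b =
                    CompletableFuture.supplyAsync(() -> slowCall("B", delayB), executor);
            return a.thenCombine(b, (x, y) -> x + "+" + y)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .exceptionally(ex -> "fallback")
                    .join();
        } // close() still waits for both calls: orTimeout does not cancel them
    }

    public static void main(String[] args) {
        System.out.println(fetchBoth(10, 10, 2_000)); // both calls beat the timeout
        System.out.println(fetchBoth(10, 400, 50));   // the second call is too slow
    }
}
```

Note the comment on the closing brace: the caller gets its fallback as soon as the timeout fires, but the try-with-resources block only exits once the slow call has finished. In a real service you would usually share a longer-lived executor instead of closing one per request.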
Setting a Reasonable Limit on Virtual Thread Creation
Creating millions of virtual threads is technically possible, but that does not mean you should do it without bound. Each virtual thread consumes memory for its stack. Unlike a platform thread, which reserves a fixed stack (typically around 1 MB on 64‑bit Linux), a virtual thread keeps its frames on the heap as stack chunk objects that grow and shrink with the call depth. A parked thread with a shallow stack may cost only a few kilobytes, but a million threads that each hold deep call chains or large buffers still add up to gigabytes of heap.
The trick is that virtual threads are cheap only when they are short‑lived or mostly blocked. If you have a long‑running task that rarely blocks, it will hold onto that stack memory for a long time. To protect yourself, use a semaphore or a bounded executor to limit the number of concurrent virtual threads that access a specific resource.
var semaphore = new Semaphore(200);
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (Task task : allTasks) {
        executor.submit(() -> {
            semaphore.acquire(); // blocks here, virtual thread unmounts
            try {
                task.process();
            } finally {
                semaphore.release();
            }
            return null; // makes this a Callable, so acquire() may throw InterruptedException
        });
    }
} // close() waits for all submitted tasks
Each virtual thread will block on semaphore.acquire() until a permit is available. While blocked, the virtual thread is unmounted, so it uses almost no CPU. This pattern lets you queue many tasks without exhausting memory or overwhelming a downstream database that can only handle 200 connections at once.
I always use this when processing bulk files or batch jobs. Without the semaphore, the application would try to open 10,000 database connections simultaneously. With the semaphore, only 200 are active at any time, and the rest are waiting politely.
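To convince yourself that the semaphore really caps concurrency, you can measure the peak number of tasks inside the guarded section. The sketch below does that with simulated work. BoundedFanOut, Outcome and the 5 ms delay are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedFanOut {

    record Outcome(int completed, int peakConcurrency) {}

    // Starts one virtual thread per task but lets at most `permits` of them
    // into the guarded section at the same time.
    static Outcome run(int taskCount, int permits) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    semaphore.acquire(); // parks the virtual thread until a permit is free
                    try {
                        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                        Thread.sleep(Duration.ofMillis(5)); // simulated use of the limited resource
                        completed.incrementAndGet();
                    } finally {
                        active.decrementAndGet();
                        semaphore.release();
                    }
                    return null; // Callable, so checked exceptions are allowed
                });
            }
        } // close() waits for all tasks
        return new Outcome(completed.get(), peak.get());
    }

    public static void main(String[] args) {
        System.out.println(run(1_000, 20));
    }
}
```

All tasks exist as virtual threads from the start; only the number doing guarded work is bounded. If the mere number of waiting threads becomes a memory concern, limit submission itself as well.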
Choosing the Right Stack Size for Virtual Threads
Virtual threads do not have a fixed stack that you size up front. The JVM stores their frames on the heap as stack chunk objects that grow and shrink with the actual call depth, up to the platform thread stack size configured for the JVM (the -Xss setting). I am not aware of a documented HotSpot option named -XX:VirtualThreadStackSize, so check any such flag against the output of -XX:+PrintFlagsFinal on your JDK before putting it in a startup script; an unrecognised -XX option stops the JVM from starting.
What you can control is how deep and how heavy the stack is at the moment a virtual thread blocks, because those are the frames that end up on the heap. Simple request handlers with shallow call chains stay small. Deep recursion or large amounts of live local state make every parked thread more expensive, and recursion that exceeds the maximum depth still ends in StackOverflowError. Test with realistic workloads and use JFR (JDK Flight Recorder) to see how much heap your application holds while many threads are parked.
# Example: list the stack-related flags your JDK actually supports
java -XX:+PrintFlagsFinal -version | grep -i stacksize
For web applications I leave the defaults alone and raise -Xss only if I see stack overflow errors in code that recurses deeply (like traversing a tree). Remember that virtual threads are not zero‑cost; they are just much cheaper than platform threads. Keeping call stacks shallow at blocking points is an easy way to reduce memory pressure when you have millions of concurrent threads.
Testing Virtual Thread Behaviour with Realistic Concurrency
Testing with virtual threads requires a different mindset. A test that spawns 10,000 platform threads may fail because the OS cannot create that many. The same test with virtual threads runs instantly. This is great, but it also means you need to test for race conditions that only appear under high concurrency.
Write a test that creates many virtual threads that all access a shared resource. Use a thread‑safe data structure (like ConcurrentLinkedQueue) and a CountDownLatch to ensure all threads complete before checking the results.
@Test
void testConcurrentAccess() throws Exception {
    var executor = Executors.newVirtualThreadPerTaskExecutor();
    var results = new ConcurrentLinkedQueue<String>();
    int threadCount = 100_000;
    var latch = new CountDownLatch(threadCount);
    try {
        for (int i = 0; i < threadCount; i++) {
            int finalI = i;
            executor.submit(() -> {
                results.add("task-" + finalI);
                latch.countDown();
            });
        }
        // Bounded wait: fail instead of hanging if some tasks never finish
        boolean completed = latch.await(30, TimeUnit.SECONDS);
        assertTrue(completed);
        assertEquals(threadCount, results.size());
    } finally {
        executor.shutdownNow(); // interrupt stragglers so the test does not leak threads
    }
}
This test verifies that all threads ran and produced a result. But it does not prove that your code is free of race conditions. Add explicit stress tests in which many virtual threads read and update the shared state your production code actually touches, so that missing synchronisation shows up as a broken invariant. Use the jdk.tracePinnedThreads flag to detect pinning during tests. I also run the same tests with platform threads to compare behaviour.
Another important test: simulate a situation where a virtual thread is pinned and then blocked on I/O. Verify that the carrier thread does not get stuck and that the system still makes progress. This kind of testing catches the subtle scalability bugs that only appear under load.
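One way to raise the odds of hitting a race in such a stress test is to hold every thread at a start gate and release them together. The helper below is a sketch of that idea. StartGateStress and hammer are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class StartGateStress {

    // Starts `threads` virtual threads, holds them at a gate, then releases
    // them together so they run `action` at roughly the same moment.
    static void hammer(int threads, Runnable action) {
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch go = new CountDownLatch(1);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < threads; i++) {
                executor.submit(() -> {
                    ready.countDown();
                    go.await(); // park until the gate opens
                    action.run();
                    return null;
                });
            }
            try {
                ready.await(); // wait until every thread has reached the gate
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            go.countDown(); // open the gate even if we were interrupted
        } // close() waits for all actions to finish
    }

    // A correctly synchronised counter must see every increment.
    static int countWithAtomic(int threads) {
        AtomicInteger counter = new AtomicInteger();
        hammer(threads, counter::incrementAndGet);
        return counter.get();
    }

    public static void main(String[] args) {
        int threads = 50_000;
        int[] unsafe = new int[1];
        hammer(threads, () -> unsafe[0]++); // deliberately unsynchronised
        System.out.println("atomic counter: " + countWithAtomic(threads));
        System.out.println("unsafe counter: " + unsafe[0] + " (may be lower than " + threads + ")");
    }
}
```

A lost update on the unsafe counter is likely but not guaranteed on any single run, which is why such tests should assert on the synchronised path and be repeated many times.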
Using Virtual Threads with JDBC and Blocking Drivers
A common worry is that JDBC is blocking, so it will ruin the benefits of virtual threads. The truth is exactly the opposite: JDBC blocking is perfect for virtual threads because the runtime unmounts the virtual thread while waiting for the database. The carrier thread is free to run other virtual threads. The database query is still synchronous in your code, and the connection pool still limits the number of physical connections.
public User findUser(long id) {
    return entityManager.find(User.class, id); // This blocks, but that's fine
}
The catch is connection pool size. If you have 1,000 virtual threads executing the above method at the same time, they all need a JDBC connection. If your pool only has 50 connections, 950 virtual threads will be blocked on pool.getConnection(). That is fine – they will be unmounted while waiting. But if your queries are slow (seconds each), those 50 connections may not be enough. You might need to increase the pool size, or optimise your queries.
Never wrap a blocking JDBC call in a CompletableFuture or use a separate thread pool for it. That adds unnecessary complexity. Just let the virtual thread block naturally. The only place you need to be careful is inside a synchronized block – if you call JDBC there, the virtual thread will be pinned and the carrier thread will be wasted. So keep your locks out of database access paths.
Managing ThreadLocals Carefully
ThreadLocals have always been tricky. With virtual threads, they become even more subtle because a virtual thread is a single Thread object. When you set a ThreadLocal inside a virtual thread, that value stays with that virtual thread for its entire lifetime. That seems correct. But many frameworks (like logging MDC, transaction context, or security context) rely on ThreadLocal to propagate values from a parent thread to child tasks.
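With platform threads, you could use InheritableThreadLocal. It works with virtual threads too: by default a new virtual thread inherits the inheritable thread‑local values of the thread that creates it, and it never sees the thread locals of its carrier. What changes is the cost. Copying a context map into every one of a million short‑lived threads adds up, and libraries that cache expensive objects in a ThreadLocal on the assumption that threads are pooled get no reuse when every task runs on a fresh thread. That is rarely what you want.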
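Inheritance behaviour is easy to check on your own JDK with a few lines. The sketch below sets a plain and an inheritable thread local on the calling thread and reads both from a freshly started virtual thread. The class and method names are hypothetical.

```java
class ThreadLocalInheritance {
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITABLE = new InheritableThreadLocal<>();

    // Sets both variables on the calling thread, then reads them from a new
    // virtual thread. Returns {plain, inheritable} as seen by the child.
    static String[] seenByChild(String value) {
        PLAIN.set(value);
        INHERITABLE.set(value);
        String[] seen = new String[2];
        try {
            Thread child = Thread.ofVirtual().start(() -> {
                seen[0] = PLAIN.get();
                seen[1] = INHERITABLE.get();
            });
            child.join(); // join() makes the child's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            PLAIN.remove();
            INHERITABLE.remove();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = seenByChild("req-42");
        System.out.println("plain ThreadLocal in child:       " + seen[0]);
        System.out.println("InheritableThreadLocal in child:  " + seen[1]);
    }
}
```

Per the Thread.Builder documentation, inheriting is the default, so the child should see the inheritable value and nothing for the plain one. Thread.ofVirtual().inheritInheritableThreadLocals(false) turns the copying off when you do not want it.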
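ScopedValue is a better alternative. It was incubating in JDK 20, is a preview API in JDK 21 (enable it with --enable-preview), and was finalised in JDK 25. A ScopedValue is bound for the duration of a call such as run(), and the binding cannot be changed inside that scope. Threads you start yourself do not see it, but subtasks forked through StructuredTaskScope inherit the bindings of the parent. This avoids the confusion of inherited thread locals.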
private static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

public void handleRequest(Request request) {
    ScopedValue.where(REQUEST_ID, request.id())
        .run(() -> {
            // Inside this runnable, REQUEST_ID.get() returns the request.id
            process();
        });
}
If you need to pass context to virtual threads you create, pass it as explicit method arguments or capture it in the task object you submit. For logging, a ThreadLocal‑based MDC (as in Log4j 2 or SLF4J) works inside a virtual thread, because each virtual thread has its own thread locals that disappear when the thread ends. Whether the context reaches threads you fork depends on the framework and its configuration, so test your framework’s behaviour before relying on it.
Profiling and Monitoring Virtual Threads in Production
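Virtual threads change the way you look at performance. On JDK 21, a classic thread dump from jstack or jcmd Thread.print lists platform threads, including the carriers, but not the unmounted virtual threads. The newer dump formats do include them, and they will show hundreds of virtual threads with the same stack trace, all waiting on the same database call. That is not a problem: they are unmounted. But a few virtual threads that are pinned will show up as active on carrier threads. The JDK provides tools to examine this.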
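Start a JFR recording. Both the default and the profile settings include the jdk.VirtualThreadPinned event, which by default is recorded when a virtual thread stays pinned for longer than 20 ms. Open the recording in JDK Mission Control (JMC) and look the event up in the Event Browser, or print it with jfr print --events jdk.VirtualThreadPinned recording.jfr. Each event carries a stack trace and a duration, so it tells you exactly where pinning occurred and how long it lasted. The jdk.VirtualThreadStart and jdk.VirtualThreadEnd events exist as well but are disabled by default.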
java -XX:StartFlightRecording=filename=recording.jfr,settings=profile -jar app.jar
You can also use the jcmd command to write a thread dump that includes virtual threads to a file:
jcmd <pid> Thread.dump_to_file -format=json threads.json
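A thread dump that includes virtual threads can also be triggered from inside the process, which is handy for an admin endpoint or a test. The sketch below uses HotSpotDiagnosticMXBean.dumpThreads, available since JDK 21. The class name, the temp-directory location and the file name are assumptions made for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.HotSpotDiagnosticMXBean.ThreadDumpFormat;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class VirtualThreadDump {

    // Writes a JSON thread dump to a new file in a fresh temp directory
    // and returns its path.
    static Path dumpThreads() {
        try {
            Path file = Files.createTempDirectory("vthread-dump")
                    .toAbsolutePath()
                    .resolve("threads.json"); // must be absolute and must not exist yet
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpThreads(file.toString(), ThreadDumpFormat.JSON);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for callers that only want to know the dump was written.
    static long dumpSizeBytes() {
        try {
            return Files.size(dumpThreads());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            try {
                for (int i = 0; i < 100; i++) {
                    executor.submit(() -> {
                        release.await(); // keep the virtual threads parked during the dump
                        return null;
                    });
                }
                System.out.println("Wrote " + dumpSizeBytes() + " bytes of thread dump");
            } finally {
                release.countDown(); // always let the parked threads finish
            }
        }
    }
}
```

ThreadDumpFormat.TEXT_PLAIN produces a human-readable dump instead of JSON if you prefer to read the file directly.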
Keep an eye on the carrier threads in production. By default the scheduler uses as many carriers as there are available processors (configurable with the jdk.virtualThreadScheduler.parallelism system property), and at most that many virtual threads can be mounted at once. A healthy system under load shows carriers that are busy while virtual threads keep making progress. If throughput stalls while the carriers are occupied by threads that are blocked rather than computing, those threads are pinned and not yielding the carrier. That is a sign you need to review your synchronized blocks or native code.
Another metric: the time virtual threads spend blocked. If your database query takes 10 milliseconds and there are 1,000 concurrent queries, that is fine. But if the query takes 10 seconds and you have only 50 database connections, you will see many virtual threads waiting for the connection pool. That is also fine, but you may want to increase the pool or optimise the query.
Virtual threads are not magic. They make blocking cheap, but they do not make slow I/O fast. Profile your system, find the real bottlenecks, and fix them. The monitoring tools are your friends.
I have used these ten patterns in production systems for over a year. They have made my code simpler and my services more resilient. The key is to remember that virtual threads are a tool, not a silver bullet. Avoid pinning, use structured concurrency for scoped tasks, bound your resource usage, and monitor what is happening. If you do that, you can write high‑concurrency applications that are easy to understand and easy to maintain. And that is the whole point.