Java 21 Virtual Threads and Structured Concurrency: Write Simple Code That Handles Millions of Requests

Learn how Java 21 virtual threads and structured concurrency simplify concurrent programming. Write clean, sequential code that handles millions of tasks efficiently.

I spent years wrestling with thread pools, async callbacks, and race conditions. Every time I wanted to scale an application to handle thousands of concurrent requests, I had to choose between writing simple code that blocked or writing complex async code that was hard to debug. Java 21 changed that with virtual threads and structured concurrency. These two features let me write straightforward, sequential code that can handle millions of simultaneous tasks without breaking a sweat.

Let me walk you through the patterns I use every day. I’ll explain them the way I wish someone had explained them to me: slowly, with code you can copy and run, and with the mistakes I made along the way so you can avoid them.

When you create a platform thread, the operating system allocates a stack of about one megabyte and a kernel thread for each one. That limits you to a few thousand threads on most machines. Virtual threads are different. They live entirely inside the JVM. A virtual thread only consumes a small object and a tiny stack that can grow and shrink as needed. The JVM mounts thousands of virtual threads onto a small pool of carrier threads (typically the same number as CPU cores). When a virtual thread does something that blocks — like reading from a socket or waiting for a database query — the JVM unmounts it from the carrier thread and mounts another one. This unmounting is almost free.

The simplest way to create a virtual thread is using Thread.ofVirtual(). Here’s how I start a single virtual thread that runs a blocking I/O operation:

Thread vt = Thread.ofVirtual()
    .name("check-order-status")
    .start(() -> {
        String status = callExternalApi(orderId);
        updateDatabase(orderId, status);
    });

vt.join();  // Wait for it to finish

Notice I called join(). Even though I only have one virtual thread, I often use join() to make my program’s flow predictable. The important thing is that the blocking inside callExternalApi and updateDatabase doesn’t waste a whole operating system thread. The carrier thread just picks up another virtual thread while this one waits for the network.
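To see how cheap blocking becomes, here is a minimal sketch that starts many virtual threads and has each one block for a moment. The class name, the thread names, and the use of Thread.sleep as a stand-in for a network or database call are assumptions for illustration only.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ManyBlockingThreads {

    /** Starts {@code count} virtual threads that each block briefly; returns how many finished. */
    static int runBlockingTasks(int count, Duration blockFor) throws InterruptedException {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(Thread.ofVirtual()
                .name("blocking-task-" + i)
                .start(() -> {
                    try {
                        Thread.sleep(blockFor);  // stands in for a network or database call
                        finished.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
        }
        for (Thread t : threads) {
            t.join();  // wait for every virtual thread
        }
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(200));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms");
    }
}
```

Because the sleeping threads are unmounted while they wait, the total elapsed time should be much closer to one sleep interval than to the sum of all of them. The exact figure depends on the machine.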

Now, if you have hundreds of these tasks, you don’t want to manage them one by one. That’s where Executors.newVirtualThreadPerTaskExecutor() comes in. It’s an executor that creates a brand new virtual thread for every task you submit. No more sizing thread pools. No more worrying about tasks queuing up behind a pool that is too small.

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Order> orders = fetchPendingOrders();
    List<Future<Boolean>> futures = new ArrayList<>();
    for (Order order : orders) {
        futures.add(executor.submit(() -> processOrder(order)));
    }
    // Now collect results
    for (Future<Boolean> future : futures) {
        boolean success = future.get();
        logResult(success);
    }
}

This code looks almost identical to a traditional thread pool, but it can comfortably handle ten thousand orders, because each virtual thread is cheap. The try-with-resources block calls close() on the executor, which waits for all submitted tasks to finish before the block exits. That is a nice improvement over the old shutdown() and awaitTermination() dance.

But there’s a catch. When you use Future.get(), you still have to wait for each task in the order you submitted them. If the first task takes ten seconds and the second finishes in one second, you’ll wait ten seconds before you see the second result. That’s not ideal. Structured concurrency fixes this by treating a group of related tasks as a single unit of work.
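The waiting problem is easy to reproduce. The sketch below, with made-up task names and sleep times, records the order in which two tasks complete and the order in which Future.get() hands back their results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class GetOrderDemo {

    /** Returns two lists: the order tasks completed in, then the order results were read in. */
    static List<List<String>> run() throws Exception {
        ConcurrentLinkedQueue<String> completed = new ConcurrentLinkedQueue<>();
        List<String> retrieved = new ArrayList<>();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> slow = executor.submit(() -> {
                Thread.sleep(500);
                completed.add("slow");
                return "slow";
            });
            Future<String> fast = executor.submit(() -> {
                Thread.sleep(10);
                completed.add("fast");
                return "fast";
            });
            retrieved.add(slow.get());  // blocks until the slow task is done
            retrieved.add(fast.get());  // already finished, returns at once
        }
        return List.of(List.copyOf(completed), retrieved);
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> orders = run();
        System.out.println("completed: " + orders.get(0));
        System.out.println("retrieved: " + orders.get(1));
    }
}
```

The fast task finishes first, but its result is only read after the slow one, because the loop over futures follows submission order.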

Let’s look at StructuredTaskScope, which is a preview API in Java 21 (compile and run with --enable-preview). Think of it as a try block for concurrent tasks. You fork a few subtasks inside the scope, then join them. With the ShutdownOnFailure policy, if any subtask throws an exception, the scope automatically cancels the remaining ones. The scope ensures that by the time you exit the try block, all subtasks have finished (or been cancelled).

// In Java 21, fork() returns a StructuredTaskScope.Subtask, not a Future
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<String> userName = scope.fork(() -> fetchUserName(userId));
    Subtask<List<Order>> orders = scope.fork(() -> fetchOrders(userId));

    scope.join();              // Wait for both
    scope.throwIfFailed();     // If either failed, propagate the exception

    // get() is safe here because both subtasks completed successfully
    return new UserProfile(userName.get(), orders.get());
}

This is so much cleaner than CompletableFuture chains or manual CountDownLatch management. I can see exactly where I wait and where errors propagate. The scope also shuts down any tasks that are still running after the first failure, which prevents wasted work.

Sometimes you need the fastest response from multiple sources. For example, you call three different payment gateways and take the first successful reply. StructuredTaskScope.ShutdownOnSuccess is made for that.

try (var scope = new StructuredTaskScope.ShutdownOnSuccess<PaymentResult>()) {
    scope.fork(() -> chargeGatewayA(order));
    scope.fork(() -> chargeGatewayB(order));
    scope.fork(() -> chargeGatewayC(order));

    // joinUntil() takes a deadline (an Instant), not a Duration
    scope.joinUntil(Instant.now().plusSeconds(2));
    return scope.result();  // throws ExecutionException if every gateway failed
} catch (TimeoutException e) {
    return fallbackPayment();
}

The scope cancels the other gateways as soon as one succeeds. joinUntil() takes a deadline, which acts as a total timeout: if no gateway responds within two seconds, we get a TimeoutException and fall back. This pattern is simple and safe, because the scope ensures all forked tasks stop when the scope closes.
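StructuredTaskScope is a preview API in Java 21, so it needs --enable-preview. Where that is not an option, ExecutorService.invokeAny on a virtual-thread executor gives comparable first-success behavior using only final APIs. This is a substitute technique, not the scope API itself, and the class name, method name, and gateway lambdas below are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FirstSuccess {

    /**
     * Runs every attempt on its own virtual thread and returns the first
     * successful result, or the fallback if none succeeds in time.
     */
    static <T> T firstSuccessful(List<Callable<T>> attempts, long timeoutMillis, T fallback)
            throws InterruptedException {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // invokeAny cancels unfinished attempts when it returns or throws
            return executor.invokeAny(attempts, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Timed out, or every attempt failed
            return fallback;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<String>> gateways = List.of(
            () -> { throw new IllegalStateException("gateway A is down"); },
            () -> { Thread.sleep(50); return "charged via B"; },
            () -> { Thread.sleep(5_000); return "charged via C"; });
        System.out.println(firstSuccessful(gateways, 2_000, "fallback"));
    }
}
```

Unlike the scope version, this treats "all attempts failed" and "timed out" the same way. Split the catch clauses if the two cases need different handling.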

Virtual threads shine when the underlying libraries release the carrier thread during blocking. Most modern Java libraries do this automatically for network calls and database queries. File I/O is a partial exception in Java 21: many file operations still hold the carrier thread while they block. But watch out for code that uses synchronized blocks. In Java 21, a virtual thread that blocks inside a synchronized block or method is “pinned” to its carrier thread. It cannot unmount, so the carrier thread cannot be reused for another virtual thread until the blocking call returns. If you have a long synchronized method with a blocking operation inside, you lose the scalability benefit.

I once saw a legacy application that wrapped every database call inside a synchronized method to “ensure thread safety.” That killed virtual thread performance. If you must synchronize, keep the lock scope tiny and avoid blocking inside it. Better yet, use ReentrantLock or atomic classes.

// Bad: synchronized method with long I/O
public synchronized void updateAccount(Account a) {
    database.save(a);  // This blocks and pins the carrier thread
}

// Better: guard only the in-memory critical section and do the I/O outside the lock
private final Lock lock = new ReentrantLock();

public void updateAccount(Account a) {
    boolean needsSave;
    lock.lock();
    try {
        // only protect the critical section
        Account existing = cache.get(a.id());
        needsSave = (existing == null);
        if (!needsSave) {
            // update cache logic...
        }
    } finally {
        lock.unlock();  // always released exactly once
    }
    if (needsSave) {
        // Do I/O outside lock
        database.save(a);
    }
}
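For state as simple as a counter or a running total, the atomic classes mentioned earlier remove the lock altogether. A minimal sketch, assuming a hypothetical AtomicBalance class that tracks an amount in cents:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

final class AtomicBalance {

    private final AtomicLong cents = new AtomicLong();

    /** Adds to the balance atomically; no lock, so nothing can pin a carrier thread. */
    void deposit(long amount) {
        cents.addAndGet(amount);
    }

    long balance() {
        return cents.get();
    }

    /** Makes {@code tasks} deposits from separate virtual threads; returns the final balance. */
    static long depositConcurrently(int tasks, long amountEach) {
        AtomicBalance account = new AtomicBalance();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> account.deposit(amountEach));
            }
        } // close() waits for every deposit to finish
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println("final balance: " + depositConcurrently(10_000, 5));
    }
}
```

This only works when the shared state fits in a single atomic variable. Anything that must update several fields together still needs a lock around the in-memory part.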

If you can’t change the synchronized code, you can still use virtual threads, but you might need to increase the number of carrier threads using -Djdk.virtualThreadScheduler.parallelism=64. That’s a workaround, not a fix.

What about reactive code? I know teams that use Project Reactor or RxJava for high throughput. You can absolutely mix virtual threads with reactive streams. The trick is to run blocking operations on a scheduler backed by virtual threads. I create a small utility:

var virtualThreadScheduler = Schedulers.fromExecutor(
    Executors.newVirtualThreadPerTaskExecutor()
);

Mono<String> result = Mono.fromCallable(() -> legacyBlockingDbCall())
    .subscribeOn(virtualThreadScheduler);

Now the blocking call runs on a virtual thread, and the rest of your reactive pipeline stays non-blocking. This lets you migrate gradually without rewriting your whole reactive stack.

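Thread locals need extra care with virtual threads. Each virtual thread has its own thread-local storage, and that storage travels with it from one carrier thread to another, so a value set in one virtual thread is never visible in another. The trouble is that a lot of thread-local code assumes a small set of pooled, long-lived threads. Virtual threads are never pooled: newVirtualThreadPerTaskExecutor() creates a fresh thread per task, so a “cache one expensive object per thread” pattern ends up creating one object per task, and nothing closes a resource just because the thread that stored it has finished. I once had a memory leak because each virtual thread stored a large database connection in a thread local and the connection was never closed.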
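If a thread local has to stay for now, one defensive sketch is to pair every set() with a try/finally that removes the value and closes the resource. FakeConnection, withConnection, and the open-connection counter below are hypothetical stand-ins for illustration, not part of any real connection pool API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalCleanup {

    /** Counts connections that are currently open, so leaks are easy to spot. */
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    /** Hypothetical stand-in for a database connection. */
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() { OPEN_CONNECTIONS.incrementAndGet(); }
        @Override public void close() { OPEN_CONNECTIONS.decrementAndGet(); }
    }

    private static final ThreadLocal<FakeConnection> CURRENT = new ThreadLocal<>();

    static FakeConnection current() {
        return CURRENT.get();
    }

    /** Binds a connection for the duration of the work, then always cleans up. */
    static void withConnection(Runnable work) {
        FakeConnection conn = new FakeConnection();
        CURRENT.set(conn);
        try {
            work.run();
        } finally {
            CURRENT.remove();  // drop the reference held by this thread
            conn.close();      // release the resource itself
        }
    }

    /** Runs the given number of tasks on virtual threads; returns connections left open. */
    static int runTasks(int tasks) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> withConnection(() -> {
                    if (current() == null) {
                        throw new IllegalStateException("no connection bound");
                    }
                }));
            }
        } // close() waits for every task to finish
        return OPEN_CONNECTIONS.get();
    }

    public static void main(String[] args) {
        System.out.println("connections left open: " + runTasks(10_000));
    }
}
```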

The safer alternative is ScopedValue, still a preview feature in Java 21 (so it also needs --enable-preview). Scoped values let you bind a value to a specific scope, and the binding is removed automatically when the scope exits.

private static final ScopedValue<Connection> CONN = ScopedValue.newInstance();

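void handleRequest() throws SQLException {
    // The scoped value removes the binding; try-with-resources closes the connection
    try (Connection conn = pool.getConnection()) {
        ScopedValue.where(CONN, conn)
            .run(() -> {
                // Inside this lambda, CONN.get() returns the connection
                doWork();
            });
    }
    // Outside, no value exists
}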

I use scoped values for passing request context, database connections, and authentication tokens. They are immutable and don’t leak.

Testing concurrent code used to be painful. With structured concurrency, I can write deterministic tests because the subtasks run inside a controlled scope. I mock the services and verify that the scope cancels tasks correctly.

@Test
void testFirstSuccess() throws Exception {
    try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
        scope.fork(() -> {
            Thread.sleep(100);  // slow task
            return "slow";
        });
        scope.fork(() -> "fast");

        scope.join();
        assertEquals("fast", scope.result());
    }
}

The test typically completes in well under 100 milliseconds: the fast task finishes first, the scope shuts down and interrupts the slow task while it is sleeping, and we assert the correct result.

In production, I monitor virtual threads using JFR (Java Flight Recorder). I enable it on startup:

java -XX:StartFlightRecording=filename=vt.jfr,settings=profile -jar myapp.jar

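Then I can inspect the recording with JDK Mission Control. The jdk.VirtualThreadPinned event shows where a virtual thread blocked while pinned and for how long. The jdk.VirtualThreadStart and jdk.VirtualThreadEnd events, which may need to be enabled explicitly, show how many virtual threads were created. To look at live threads, I use jcmd to write a thread dump that includes virtual threads:

jcmd <pid> Thread.dump_to_file -format=json threads.json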

A high pinning count tells me to find and fix synchronized blocks that block for too long.

Finally, migrating existing code. I don’t rewrite everything at once. I start by changing new Thread(() -> ...).start() to Thread.ofVirtual().start(...). Then I replace Executors.newFixedThreadPool(n) with Executors.newVirtualThreadPerTaskExecutor(). The rest of the code stays the same. The difference in throughput is immediate. On one web server, I replaced a thread pool of 200 platform threads with virtual threads and saw a 20x increase in supported concurrent connections without adding more memory.

One thing I learned the hard way: don’t use virtual threads for CPU‑intensive tasks. Virtual threads are meant for I/O‑bound work where they spend most of their time waiting. If you have a task that does heavy computation, it will occupy the carrier thread anyway. Use platform threads or a traditional thread pool for CPU‑bound tasks.
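One way to apply this is to keep two executors side by side and route work by type. A minimal sketch, assuming a hypothetical WorkloadRouting helper; sizing the fixed pool to the number of available processors is a common starting point, not a rule.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WorkloadRouting implements AutoCloseable {

    // One virtual thread per blocking task
    private final ExecutorService ioExecutor = Executors.newVirtualThreadPerTaskExecutor();

    // A bounded pool of platform threads for computation
    private final ExecutorService cpuExecutor =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    <T> Future<T> submitIo(Callable<T> task) {
        return ioExecutor.submit(task);
    }

    <T> Future<T> submitCpu(Callable<T> task) {
        return cpuExecutor.submit(task);
    }

    /** Waits for submitted tasks to finish, then shuts both executors down. */
    @Override
    public void close() {
        ioExecutor.close();
        cpuExecutor.close();
    }

    public static void main(String[] args) throws Exception {
        try (var routing = new WorkloadRouting()) {
            Future<Long> sum = routing.submitCpu(() -> {
                long total = 0;
                for (long i = 1; i <= 1_000_000; i++) {
                    total += i * i;
                }
                return total;
            });
            Future<Boolean> onVirtual =
                routing.submitIo(() -> Thread.currentThread().isVirtual());
            System.out.println("sum=" + sum.get()
                + ", I/O task ran on a virtual thread: " + onVirtual.get());
        }
    }
}
```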

I also recommend setting a limit on the number of virtual threads you create per scope using a Semaphore if you’re worried about memory. For instance, in a web application that accepts thousands of requests, each request might create dozens of subtasks. Without any limit, you could create millions of virtual threads in seconds, and while each is small, they still consume memory for stack frames. A Semaphore or a rate limiter in front of fork() can help.

Here’s a pattern I use for rate‑limited fork:

private static final Semaphore semaphore = new Semaphore(5000);

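void processBatch(List<Item> items) throws InterruptedException, ExecutionException {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        for (Item item : items) {
            semaphore.acquire();  // blocks until below limit
            if (scope.isShutdown()) {
                // A subtask already failed. fork() would not run the task, so its
                // finally block would never release this permit. This check is best
                // effort: a shutdown between here and fork() can still strand a permit.
                semaphore.release();
                break;
            }
            scope.fork(() -> {
                try {
                    return handleItem(item);
                } finally {
                    semaphore.release();
                }
            });
        }
        scope.join();
        scope.throwIfFailed();  // throws ExecutionException if a subtask failed
    }
}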

This keeps the number of in‑flight subtasks at or below 5000 across all batches, because the semaphore is static.

Virtual threads and structured concurrency have made my life easier. I no longer dread writing concurrent code. The code is simple, the performance is great, and I can debug it like a sequential program. If you haven’t tried them yet, start with a small service that calls a few APIs or databases. Replace your executor service with newVirtualThreadPerTaskExecutor() and see how many more requests your server can handle. Then add structured concurrency to groups of related tasks. You’ll wonder why you ever did it the old way.

