JEP proposed to target JDK 19: 425: Virtual Threads (Preview)

dang · on May 2, 2022

Recent and related:

Achieving 5M persistent connections with Project Loom virtual threads - https://news.ycombinator.com/item?id=31214253 - April 2022 (142 comments)

nu11ptr · on May 2, 2022

From http://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1.h...:

"Whereas the OS can support up to a few thousand active threads, the Java runtime can support millions of virtual threads. Every unit of concurrency in the application domain can be represented by its own thread, making programming concurrent applications easier. Forget about thread-pools, just spawn a new thread, one per task. You’ve already spawned a new virtual thread to handle an incoming HTTP request, but now, in the course of handling the request, you want to simultaneously query a database and issue outgoing requests to three other services? No problem — spawn more threads. You need to wait for something to happen without wasting precious resources? Forget about callbacks or reactive stream chaining — just block. Write straightforward, boring code. All the benefits threads give us — control flow, exception context, debugging flow, profiling organization — are preserved by virtual threads; only the runtime cost in footprint and performance is gone. There is no loss in flexibility compared to asynchronous programming because, as we’ll see, we have not ceded fine-grained control over scheduling."

This seems so obvious in hindsight as the "right way". I wonder why we went down that whole async/await craze with so many languages?

pron · on May 2, 2022

> I wonder why we went down that whole async/await craze with so many languages?

We addressed this, albeit very briefly, in the Alternatives section of the JEP: https://openjdk.java.net/jeps/425

There are multiple reasons:

1. Languages that don't already have threads have an implicit assumption built into all existing code that program state cannot change concurrently. This is why scheduling points need to be marked explicitly, and why adding threads — whether user-mode or OS — might break existing code in some very tricky ways. That's the case of JavaScript.

2. Some languages target an IR without control over its backend, making an efficient implementation of user-mode threads difficult, if not impossible. async/await requires only changes to the frontend compiler. That's the case of Kotlin, and, perhaps to a lesser extent, Rust.

3. Some languages have technical features that make implementing user-mode threads efficiently more difficult than in others. Pointers into the stack and careful control over memory allocation make this more challenging in languages like C++ and Rust than in Java.

signal11 · on May 2, 2022

Ron, thank you for your work on Loom, it’s very exciting and I’m looking forward to using it in production code!

As an aside, Ron had a JUG talk 6 months ago which I found really helpful where he went into more detail about why they chose this approach: https://youtu.be/KmMU5Y_r0Uk (27m20s mark, and from 2m50 there’s a more general introduction to Loom).

I’m sure there are other videos/papers as well, but this was a pretty good overview of Java vs other languages’ approach to async.

kevincox · on May 2, 2022

I think you are missing a key benefit of async/await. It can be implemented incredibly efficiently. This is because it is "stackless". Or in other words you know the exact amount of "stack" space required and you can allocate this exactly instead of allocating a real stack which can be very much larger.

For example if I want to implement a "sleep" with async/await I probably only need to store the wake time as state, if I want to make a virtual thread to do the same I likely need to allocate a large stack just in case I use it.

Of course this can be mitigated with stack caching, small stacks, segmented stacks or other tricks. But doing this is still more expensive than knowing how much "stack" you need up-front and allocating only that.

pron · on May 2, 2022

That's not a benefit of async/await as the same could be done with user-mode threads. In fact, that's what we do with virtual threads. But it might be a benefit of async/await in some particular languages.

kibwen · on May 2, 2022

> But it might be a benefit of async/await in some particular languages.

Rather than saying it's a benefit for particular languages, I'd say it's a benefit in particular contexts, e.g. in contexts where you don't have a heap. Of course it's true that some (most) languages don't support such contexts at all (for a host of good reasons), but the languages that do are shaped by that decision.

pron · on May 2, 2022

The use case of interest here is having many concurrent operations (hundreds of thousands or millions). If you don't have a heap, where do you store the (unbounded number of) async/await frames? There are other use-cases where stackless coroutines are useful without being plentiful — e.g. generators — but that's not the use-case we're targeting here (and is probably a use-case of lower importance in general).

Many languages/runtimes want just a single coroutine/continuation construct to cover both concurrency and generators — which is a good idea in principle — but then they, especially low-level languages, optimise for the less useful of the two. I've seen some very cool demos of C++ coroutines that are useful for very narrow domains, and yet they offer a single construct that sacrifices the more common, more useful, usage for the less common one.

There was one particular presentation about context-switching coroutines in the shadow of cache misses. It was extremely impressive, yet amounted to little more than a party trick. For one, it was extremely sensitive to precise sizing of the coroutine frames, which goes against the point of having a simple, transparent language construct, and for another, it simplifies small code that has to be very carefully written and optimised to the instruction level even after the simplification.

kibwen · on May 2, 2022

Yes, I am (perhaps a bit sloppily) using "particular contexts" to refer to particular use cases. And while your use case is the C5M problem, since we're bringing up other languages (which optimize for different contexts) I think it's worth emphasizing that these features also lend themselves to other use cases. Here's an example of using Rust's async/await on embedded devices, for reasons other than serving millions of concurrent connections: https://ferrous-systems.com/blog/async-on-embedded/

> Many languages/runtimes want just a single coroutine/continuation construct to cover both concurrency and generators — which is a good idea in principle — but then they, especially low-level languages, optimise for the less useful of the two.

Notably Rust appears to be the opposite here, as it is first focusing on providing higher-level async/await support rather than providing general coroutine support, but its async/await is implemented atop a coroutine abstraction which it does hope to expose directly someday.

I'm sure you don't need to be told most of this, but I bring all this up to help answer the more general question of why not every language builds in a green thread runtime, and why one approach is not necessarily strictly superior to another.

pron · on May 2, 2022

If generators or embedded devices that don't have threads are indeed the reason for picking one design over the other, the question then becomes why did some languages prioritise those domains over more common ones, even for them?

kibwen · on May 2, 2022

Indeed, to which the answer is: it's a dirty job, but somebody's got to do it. :) As long as C exists, it's worth trying to improve on what C does without giving up on C's use cases. Of course, that doesn't mean that all use cases are equivalently common, nor does it mean that a language like Rust will ever be as widely used as Java, nor does it mean that Java was wrong for integrating virtual threads (I think they're probably the right solution for a language in Java's domain).

lumost · on May 3, 2022

A common theme in rust development is the notion that no one could produce more optimal code by hand. This is a great feature, but in the case of async/await we are sacrificing a lot to get it. To the extent that a user trying to make their first http request with reqwest will now get conflicting documentation and guidance on whether they need tokio and other packages to pull in async.

kevincox · on May 2, 2022

Can you explain how this is done? Is the current stack copied onto the heap (to the size it currently is)? How are new frames allocated once a thread is suspended?

pron · on May 2, 2022

A portion of the stack is copied to the heap when the virtual thread is suspended, and upon successive yields those "stack chunks" are either reused or new ones allocated and form a linked list. When resuming a virtual thread, however, we don't copy its entire stack back from the heap to the stack, but we do it lazily, by installing a "return barrier" by patching the return address, so as you return from a method, its caller (or several callers) is lazily "thawed" back from the heap. This copying of small chunks of memory into a region that's likely in the cache is very efficient.

The entire mechanism is rather efficient because in Java we don't have pointers into the stack, so we don't need to pin anything to a specific address, and stacks can be freely moved around.

SemanticStrengh · on May 2, 2022

I wonder the implications and opportunities for https://github.com/microsoft/openjdk-proposals/blob/main/sta...

U1F984 · on May 3, 2022

Apparently it's already being taken into consideration:

> The optimization should work with Project Loom when it becomes available.

dboreham · on May 2, 2022

> It can be implemented incredibly efficiently.

At the cost of breaking : conceptual model of concurrency ; debugging ; performance analysis ; tracing ; logging

but yeah...great stuff.

Matthias247 · on May 3, 2022

This is a reason that is brought up once in a while - but even after working 10 years in the domain of high-concurrency services I've never seen compelling data that shows clearly whether stackless coroutines are more efficient than stackful ones. People unfortunately rarely write applications in both approaches to tell.

While the "stack is optimally sized" argument exists, it might not always be true: E.g. implementations could require far more memory being required for the "virtual" stack than what is actually required due to implementation challenges. That for example applies in various situations in Rust. Then a more classical stackless implementation which allocates state for each callback on the heap (like if you manually write boost asio code) which have quite some allocation and memcpy churn. And besides that a "virtual stack" might be more fragmented and less cache friendly than a contiguous stack, which also impacts efficiency.

vbezhenar · on May 2, 2022

OS virtual memory ensures that overhead will not be that big. OS will allocate memory page by page as software touches corresponding virtual addresses. So thread stack will use only as much memory as its maximum stack usage requires (rounded by page size). Async/await of course is more efficient, but in real world native stacks might be good enough, especially when RAM is not very expensive.

addaon · on May 2, 2022

"Rounded by page size" is a pretty huge caveat here, though, no? With a 4 kB minimum page size on most platforms, 5M threads is 20 GB of stack virtual mappings, minimum. And cycling through those threads even once will make every page of that 20 GB resident.

Matthias247 · on May 3, 2022

Maybe. Realistically it won't matter, because any real world server would either need a lot more memory anyway to actually handle application-specific concerns or support a much lower number of clients. Keep in mind that even with a tiny and receive buffer of 16kB plus maybe some TLS state of > 30kB per connection the baseline memory usage of doing anything useful is already much higher than 4kB - unless the only thing you want to do is building a large-scale TCP ping service.

lumost · on May 3, 2022

That still means I can effectively use 5 million threads on a small server. Which is effectively 3 orders of magnitude more threads than I can currently run with Java.

astrange · on May 3, 2022

macOS/iOS is a popular platform where the page size is 16KB and RAM is moderately-to-very expensive.

It might be interesting to try something like Mesh (https://github.com/plasma-umass/Mesh) to share pages.

toast0 · on May 3, 2022

macOS/iOS aren't a realistic server platform for high loads. They don't even have syncookies, so anything TCP is out.

closeparen · on May 2, 2022

How does async/await mitigate #1? Interleaved execution is enough to give you data races; you don't need actual parallelism.

pron · on May 2, 2022

Yes, but it requires a special call site (transitively, all the way up the stack) that permits the interleaving, and so cannot sneak into existing code that might implicitly assume no interleaving.

gpderetta · on May 3, 2022

but good old callback-based code still allows for interleaving and AFAIK JS doesn't require any callsite allocation for that.

pron · on May 3, 2022

It does not allow for interleaving. Interleaving means that state can change in the same subroutine.

gpderetta · on May 4, 2022

What I mean is that subroutine can observe its own state being changed even after a call of a non-sync marked function if that function directly or indirectly calls into a closure closing over that subroutine state.

I.e. IMHO async offers very weak reentrancy guarantees that are better enforced via other means (rust-like lifetimes, immutability annotations, atomic contructs, etc).

SemanticStrengh · on May 2, 2022

A major issue with loom is that it consume much more %CPU https://github.com/ebarlas/project-loom-comparison/blob/main... Edit no it is more efficient although it consume surprisingly high gpus at higher throughput than the others.

didibus · on May 2, 2022

Wouldn't that be expected when it also delivers more throughput and better latencies? It's handling more requests concurrently, so I'd expect the CPU usage to be higher, how else could it serve more requests faster otherwise?

SemanticStrengh · on May 2, 2022

Yes indeed I just find the consumption increase a but abrupt after 10K

brokencode · on May 2, 2022

Looking at the graphs, it uses less CPU for a given throughout, so it’s actually more efficient for CPU. It also provides lower latency and higher max throughout. It does seem to require more memory, though.

pron · on May 2, 2022

We expect to improve the memory consumption significantly in future releases. Some things had to be cut to make this release.

SemanticStrengh · on May 2, 2022

The existing data already looks excellent. I wonder if you could leverage SIMD/the vector api for speeding up some things. Or if value types will have an impact.

noncoml · on May 2, 2022

All these are implementation details.

The programmers should be seen as “users” of the language.

What you give here is a list of excuses on why system X doesn’t do what is best for its users.

ecnahc515 · on May 2, 2022

Implementation details are often also defacto features, because the behavior may be relied upon by users. Unless you designed the language up front to consider these things, it's often very much a challenge to tell your users that their code is broken, especially if you did not have a language specification clarifying it.

For languages like Python for example, this is a big issue, and reason why alternative concurrency patterns to async/await haven't made much progress.

noncoml · on May 2, 2022

Totally agree. But that was not the point of GP

vbezhenar · on May 2, 2022

As I grow older, implementation details are all that matters to me in the end. I hate go language, IMO it's ugly and terrible to work with. But its compiler and toolset are golden and I'll use it just because of its implementation details. I don't have time to wait until language developers will implement implementation details I need, if ever. I need to ship software tomorrow.

rileyphone · on May 2, 2022

The compiler and toolset are the user facing aspects of a language like go, how the parser works or internal functions in the standard library would be the implementation details.

noncoml · on May 2, 2022

Hmmm. Implementation details are the stuff that are not visible to you as a “user” of the language and the toolset.

I don’t know why people confuse this so much

Hercuros · on May 3, 2022

I would say that implementation details could be visible to you as a user, but should not be relied upon because they are not part of the documented API.

E.g. it might be visible to you that a certain operation runs quickly on certain inputs, or that a particular output is chosen for a particular input, even though the documentation does not specify the exact output.

pie_flavor · on May 2, 2022

No, explicit pointers are not an implementation detail. Nor is knowing when you are on an OS thread in a language where you are likely to integrate with OS functions that depend on what thread you are on. Different languages exist for different purposes requiring them to solve problems in different ways.

noncoml · on May 2, 2022

Two out of three arguments of the GP are implementation details. You can tell that they are implementation details by the way the language is used.

“It’s difficult to do A given B”

As a user I don’t care about B. I just want A.

For your example, the concept of explicit pointers are orthogonal to threads. That’s why you can have OS threads with explicit pointers.

Just because it’s difficult to make them work together doesn’t mean they are incompatible as concepts.

It’s still an implementation detail.

For example, during Win95 one could argue it’s impossible for a crash program not to crash the whole system.

As a user I don’t care what’s going under the hood. I just don’t like it when my Windows 95 app can crash the system. It is an implementation detail

noncoml · on May 2, 2022

Damn folks. Downvoting is not meant for showing your disagreement

Targeting a specific IR is an implementation detail.

Having explicit pointers and thread are not mutually exclusive concepts. It’s the implementation details that make them difficult.

HN can be so annoying sometimes

yCombLinks · on May 2, 2022

It's not about disagreement with your opinion, it's about your rudeness presenting it.

kjeetgill · on May 2, 2022

Choices come with trade-offs and in turn make languages suitable for different use-cases and users. There's no uniform "best for it's users" choices that apply to all languages.

Well, ironically, the exception that applies to all languages being: "does my code still work?" choices... which is what pron was addressing.

noncoml · on May 2, 2022

I can buy the argument that async/await is a design choice. In which case this is what is “best if the user”

But the GP replied to a comment that was claiming threads of execution are better than async. So in the context of the reply the threads are the “best for the the user”

pkulak · on May 2, 2022

Well, this kinda thing isn't free. You need a rather large and complicated runtime to make it work. Fine for the JVM, which is already a large and complicated runtime, but it turns out that making a runtime like this optional is not really possible. So languages that don't have a runtime are stuck with async/await, which can be zero-cost.

And, to be fair, languages like Erlang and Go went this route LONG ago.

moonchild · on May 3, 2022

> languages like Erlang and Go went this route LONG ago

So did java! It originally used green threads, before switching to exclusively kernel threads.

SemanticStrengh · on May 2, 2022

Except GraalVM bring support to almost any language

Comevius · on May 2, 2022

Garbage collectors and preemptively scheduled virtual threads require a runtime. Even GraalVM Native Image has a large runtime for this reason. There is no free lunch here.

pjmlp · on May 2, 2022

All high level languages have a runtime, the only difference among them is how big it is in practice.

Even Assembly can be considered to have one, in case of microcoded CPUs.

SemanticStrengh · on May 2, 2022

I'm just saying the cost of developing the runtime can now easily be mutualized between multiple languages

thfuran · on May 2, 2022

That's some hefty sleight of hand.

kodablah · on May 2, 2022

> I wonder why we went down that whole async/await craze with so many languages?

For single-threaded event loops because most of those languages did not want to put the concept of thread safety onto the developer (e.g. JS and Python). And even the ones that do concern the user w/ thread safety don't have an intermediate representation and VM to automatically sequence instructions (e.g. Rust).

merb · on May 2, 2022

because green threads and async/await have both upsides and downsides. if it would've been as easy as the comment would've described it, they would've shipped it in jdk 8, but they didn't. they ship a preview in jdk 19.

rust's rfc explains some of the drawbacks: https://github.com/rust-lang/rfcs/blob/master/text/0230-remo... (which basically explains why a preview took until jdk 19)

// Edit: also:

    void handle(Request request, Response response) {
    var url1 = ...
    var url2 = ...
 
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        var future1 = executor.submit(() -> fetchURL(url1));
        var future2 = executor.submit(() -> fetchURL(url2));
        response.send(future1.get() + future2.get());
    } catch (ExecutionException | InterruptedException e) {
        response.fail(e);
    }
    }

is not better or worse than:

    async Task<IActionResult> Handle()
    {
        var url1 = ...
        var url2 = ...

        try {
            var requestTask1 = FetchURL(url1);
            var requestTask2 = FetchURL(url2);
            return Ok((await requestTask1) + (await requestTask2));
        }
        catch (Exception ex)
        {
            return BadRequest();
        }
    }

of course one uses "colored" functions the other not, but both have one thing in common: a bad programmer can make serious mistakes with both.

pron · on May 2, 2022

> is not better or worse than

It is better for two reasons:

1. Virtual threads are threads as far as the Java platform is concerned, meaning you get the same troubleshooting (stack traces), debugging, and profiling support by the runtime and its tools as for platform (OS-backed) threads.

2. There is no split-world of APIs, with separate and largely incompatible flavours, that need to be developed for two constructs that are semantically (almost) equivalent.

You are correct, however, that implementing user-mode threads is harder than implementing async/await. While the latter requires only changes to the frontend compiler (until you want to add tooling support), the former requires deep changes to the backend.

merb · on May 2, 2022

> 1. Virtual threads are threads, meaning you get the same troubleshooting (stack traces), debugging, and profiling support by the runtime and its tools as for "platform" (i.e. OS-backed) threads.

I never had any problem with debugging and profiling Futures/async/await. Neither in Rust/C#/Scala. And stack traces like Exceptions were never a problem, in fact go panics were way more troublesome... (and still are, most often you dont run into them...) Most of the time depending how "concurrent" your program is, it also makes just no sense to debug them anyway, if you deal with tons of concurrent code you should learn to make good log statements, it's bread and butter, especially once your in production.

> 2. There is no split-world of APIs, with separate and largely incompatible flavours, that need to be developed for two constructs that are semantically (almost) equivalent.

well yeah... but the question is why do you try to hide it? that programmers don't see that the code might run concurrent? most goroutine bugs I've seen happened because of the hidden concurrency. and it's odd that an important concept gets hidden, for the sake of split-world apis. heck thats exactly why there is a type system, to express the meaning of code. with green threads it's not really obvious on first sight what the code exactly expresses.

(P.S.: I switched pretty early in my career to Scala, now to Rust/Golang/C# so maybe I'm more biased when it comes to a strong type system and I prefer a strong typesystem with all it's quirks, i've seen nasty bugs with goroutines and with async/await, so I doubt there is any real world benefit over the two worlds)

pron · on May 2, 2022

> I never had any problem with debugging and profiling Futures/async/await.

When you profile a server written in the asynchornous style (with, say, JFR), the server can be under heavy load and yet the profile will only show idle thread pools. I don't know what you mean by having no problem profiling, but the Java platform offers no mechanism to profile asynchronous code.

> but the question is why do you try to hide it?

We don't. Virtual threads are threads, and just like threads today Java or Scala or Rust, you need some explicit operation to perform a concurrent task on some other thread.

> and I prefer a strong typesystem

Great, but that has nothing to do with threads. Rust and Scala support threads, too, and do so without any indication in the subroutine's type for blocking operations.

pie_flavor · on May 2, 2022

If you assume it's not concurrent when it is, that's the programmer's fault. This is true of both goroutines and threads in equal capacity; distinguishing between the two won't stop that from happening. If you assume it is concurrent when it isn't, nothing bad happens. The point of 'function color' is that hiding it implies it needs to be shown, which it doesn't. Rust hides from you which register a local variable is stored in, because it is thoroughly irrelevant to you and exposing it would require you to care when Rust is perfectly capable of managing it for you.

There are legitimate reasons to prefer a colored approach to asynchrony. But the two points are good points and you should prefer the colored approach only when said legitimate reasons outweigh those points given the language's design.

Thaxll · on May 2, 2022

Troubleshooting async is a pain in every language I worked with , that being node or Java ( vertx ). It's just terrible to get the real error, stack trace that don't even show the real issue, in Java I remember many times getting 150 lines of useles stacktraces, still figuring out from where the error was thrown.

A panic in Go will always show you exactly where the problem is in the first lines of the panic.

nu11ptr · on May 2, 2022

To be fair, the comment above is about how easy it is for the _user_. Having read docs on go's scheduler design I don't think many would argue it is easy to write such a runtime for the core devs.

merb · on May 2, 2022

I added a snippt which also explains the user side, I doubt either has a benefit. both can be misued. I already seen bad go code because of this and I've also seen bad C# code. In fact go code is harder to reason because of this, since you do not know exactly when a function spawns a new goroutine.

nu11ptr · on May 2, 2022

I don't think it is a huge difference, but I still prefer the straight threading model. It uses less keywords, looks like regular blocking code, and most importantly, there is exactly "one" world - not a sync vs. async world that have to coexist. I'm speaking generically, not specific to Java (which I have not written in many years).

> In fact go code is harder to reason because of this, since you do not know exactly when a function spawns a new goroutine.

This is no different in the async/await world, as you never know what function will launch a new "task" (this concept exists in Rust/Python and I assume others). If you don't need a new "thread", both models allow waiting for multiple things simultaneously.

noncoml · on May 2, 2022

> This seems so obvious in hindsight as the "right way". I wonder why we went down that whole async/await craze with so many languages?

Hindsight? There are many of us that have been advocating against the async/await madness.

I said it before and I will say it again. Async is today what OOP used to be in the nineties.

kjeetgill · on May 2, 2022

OOP isn't that bad in moderation! My personal proprietary blend looks like 5%/20%/35%/40% declarative/functional/OOP/imperative these days. I'm definitely wary of letting too much functional or OOP zealotry. I try to avoid all the monad/applicative/functor stuff and too many abstract base classes, excessive interface indirection.

But I am a Java pleb.

zmmmmm · on May 2, 2022

OOP is bad in the way that many magnificently successful things are bad. People only work up the energy to fully critique things when they are so pervasive and ubiquitous. At which point they are entirely associated with their flaws and none of their positive attributes which are all treated as orthodoxy.

jayd16 · on May 2, 2022

Having more threads doesn't solve inter-thread concurrency like async/await does. With async/await you can achieve concurrency while ensuring there parallelism as well as is no parallelism (to avoid explicit critical sections).

I don't think you could, for example, use this model of many virtual threads with a traditional GUI framework as designs are still predominately single threaded by nature.

Said another way, cooperative scheduling has its place and will continue to.

I'm excited to see some threaded UI designs shake out of this, though.

kjeetgill · on May 2, 2022

Yea, it won't solve those things. I think the appeal for languages that already used locks/queues/atomics with their existing threads don't find this to be too, too disruptive.

Spivak · on May 3, 2022

You can’t get parallelism with just async/await. Concurrency yes, but as soon as you want to run two async jobs at the same time you take on all the complexities of threads again (because that’s how you actually get parallelism).

jayd16 · on May 3, 2022

>you take on all the complexities of threads

Well no, not _all_ of the complexities because you have await and explicit yielding. It's much easier to pass data from one asynchronous task to another using await than it is to, say, manually code the locks or state machine necessary for the message passing and/or callbacks.

Spivak · on May 3, 2022

Await and explicit yielding do nothing as soon as you want to actually run multiple tasks at the same time in parallel instead of sequentially but interleaved.

jayd16 · on May 3, 2022

You can just start the tasks and then await all? At least in C# it also provides a clear system on how to switch scheduling contexts at an await.

gorjusborg · on May 2, 2022

async/await doesn't solve anything with concurrency, working on a single thread does.

you can do that as a design choice in languages that aren't exclusively single threaded.

jayd16 · on May 3, 2022

What do you mean by this? Async/await provides a standard way to handle message passing between threads, a cleaner syntax than nested callbacks and a lot of other niceties over raw threading.

Async/await is in many single and multi-threaded languages so I'm not clear on what you're saying.

gorjusborg · on May 4, 2022

What I mean is: async/await is just syntactic sugar over promises/futures and those just buy you a way to schedule future computation. The js runtimes schedule all your code on a single thread. It's the fact your code only runs on a single thread that 'solves concurrency' because data races can't happen if one thread of code is allowed to access memory at a time.

You can set up the same limitation in a threaded program/language, and people have (see vert.x, clojure STM, and others).

gpderetta · on May 3, 2022

If you are using threads you wouldn't use nested callbacks in the first place.

In a proper threaded model you can use exactly the same syntax that you would use with async/await (futures, future combinators, message passing what have you) except you do not have to randomly annotate your code with awaits.

stonemetal12 · on May 2, 2022

>Every unit of concurrency in the application domain can be represented by its own thread, making programming concurrent applications easier.

Funny, I have never heard anyone describe threads as the thing that made concurrent programming easier, or more threads would make it easier.

What makes concurrent programming easier is thread interaction models that make it hard to lock the process.

async\await, give you concurrency without the need to for the developer to explicitly write lock code. Much easier than having to manually handle that interaction.

cogman10 · on May 2, 2022

> async\await, give you concurrency without the need to for the developer to explicitly write lock code.

async\await in no way relieve the developer from needing to worry about locks. If you have shared mutable memory, you have locks. If you aren't considering that, then you've got broken code.

> What makes concurrent programming easier is thread interaction models that make it hard to lock the process.

Yes, the way you do that is by focusing on message passing and immutable data when you can get away with it and, ideally, prebuilt thread safe datastructures when you can't.

What async\await buys is lightweight threading when lightweight threading isn't available. It allows you to have millions of concurrent processes running at the same time. HOWEVER, the cost of this is the "colored function" problem. You HAVE to mark up your code to let the compiler/framework know "this is code that can block and thus needs to be able to give up control and resume". This problem is difficult because you can't simply call an async function from a non-async function. You have to do some wrapping/juggling to make everything play nice.

The reason lightweight threading is nice is because you no longer have the colored function problem. From any reference or context you can say `CompletableFuture.supplyAsync(()->calculateValue())` and have a new concurrent action spawned. If that action does IO or whatever, it doesn't hog a thread, it just yields (like async/await does) and lets another task move forward.

The only reason for async/await is OS threads are expensive and have pretty large negative implications on operating systems.

jayd16 · on May 2, 2022

>async\await in no way relieve the developer from needing to worry about locks. If you have shared mutable memory, you have locks.

This isn't really true. For example, UI code often uses async/await for the concurrency but a single UI thread to prevent locking. Because this design is only concurrent and not parallel you don't need to worry about thread safety while still getting a way to yield execution to other code.

pmoleri · on May 3, 2022

I think it depends on the language. What you say is true for JS because it's single threaded but .Net has a thread pool, that means that you can have concurrent and even parallel tasks attending different events from your single thread UI.

jayd16 · on May 3, 2022

You cannot use a single UI thread in a parallel way. Its a contradictory statement. Parallel is the opposite of single threaded.

C# does have a thread pool that you can use with async/await with multiple threads but that would be different from a UI thread or its synchronization context.

pmoleri · on May 3, 2022

It's not contradictory, the UI is single threaded but on the dispatched events you can perfectly have long running tasks that can perform parallel work and affefct common memory.

It's also my understanding that if you stick to async/await and don't create your own tasks, the synchronization context will be the one of the UI thread thus never run in parallel. However, this doesn't prevent you from doing so.

jayd16 · on May 3, 2022

Ah I see what you mean. Sure, you can do that if you desire. My point was simply that the system allows you to not do that. It's easy to get concurrency without parallelism. And it's also easy to get parallelism. But yes, In C# you're only guarded by self control and the slightly cumbersome calls it takes to jump threading context. It's not hard to stay in a single context but it's not enforced.

gpderetta · on May 3, 2022

But you can do the same with threads: pin them to a single cpu/scheduler and use a non-preemptive scheduling strategy.

At some point you realize that explicit continuation passing, threads, coroutines and async/await are all the same thing, and the only thing that async/await gives you is a static bound on your suspended stack size (by enforcing suspension of a single activation frame).

Nullabillity · on May 2, 2022

> async\await in no way relieve the developer from needing to worry about locks. If you have shared mutable memory, you have locks. If you aren't considering that, then you've got broken code.

The huge difference is that futures (and thus async/await) change the primary concurrency concept you think about into "wait until this value is available" rather than managing the locks and shared memory yourself.

The latter is still available (and still as footgunny as ever) but aren't as much of a problem when better options are more ergonomic.

Futures also compose far better than threads do, especially when you want to wait for "any of X" rather than "all of X".

cogman10 · on May 2, 2022

> The huge difference is that futures (and thus async/await) change the primary concurrency concept you think about into "wait until this value is available" rather than managing the locks and shared memory yourself.

Java has had Futures for a while (and pretty good ones since Java 8's addition of "CompletableFuture")

That being said, the biggest issue with Java Futures has not been the futures themselves, but rather the management of the ThreadPools for when you want to do IO in the futures.

As pron points out, virtual threads make that WAY better to work with.

pron · on May 2, 2022

I don't understand the distinction you make between futures and threads. Virtual threads make working with futures more pleasant.

SemanticStrengh · on May 2, 2022

JS async is lock free though.

cogman10 · on May 2, 2022

JS has an uber lock. JS has a strict requirement that only one thread can run javascript code. That means that whenever you get to that `await` block, you've effectively established a whole application lock which prevents other awaits from proceeding (until your code yields). In otherwords JS isn't lock free, it's got 1 really big lock that covers all user code.

If you really wanted to, you could simulate this behavior in threaded languages by having a global lock you acquire before you run anything.

It's the python GIL problem [1]

Languages like kotlin, rust, C++, or C# with async/await MUST concern themselves with locks because they don't acquire whole application locks anytime they run a sliver of code.

[1] https://realpython.com/python-gil/

SemanticStrengh · on May 2, 2022

Great explanation in general except web workers are a thing and sharedarraybuffer allow shared memory between them. Thats why js has atomics. Although while js can have conçurent races via shared memory, it doesn't expose locks IIRC? If so that's bad

cogman10 · on May 2, 2022

Web workers are certainly weird... and it's generally best to ignore them :D

But if you don't, then yeah, you sort of run right into the need for an actual lock since you can have shared memory in concurrently executing interpreters. Entire: Web Lock [1] Specifically designed to resolve this issue.

In typical JS programming this is not an issue, but if you are trying to really abuse that javascript VM then this is the way you'd do it.

[1] https://developer.mozilla.org/en-US/docs/Web/API/Web_Locks_A...

SemanticStrengh · on May 2, 2022

Oh thank u web locks are a thing then. Well in all seriousness there are many valid uses for web workers, e.g. For offline processing of localstorage

ackfoobar · on May 2, 2022

You will need locks if you need mutual exclusion across an `await`.

Matthias247 · on May 3, 2022

> This seems so obvious in hindsight as the "right way". I wonder why we went down that whole async/await craze with so many languages?

My theory on is that it evolved as a consequence of people trying to solve problems on a level they were familiar and comfortable with. E.g. let's say there were C and C++ application programmers who wanted to scale their applications to handle more clients than a OS supported at that point in time. Their solution was to use event-driven OS APIs, and build abstractions on top of that (libuv, boost) that they found comfortable to use. Then we went one level further, and compiler and programming language experts got aware about the problem, which tried to solve it with language extensions (async/await).

I fully confess I'm guilty of this too, by having used/promoted/evolved various async frameworks in the last 10 years. It's a super interesting problem to work on, a bit on the research side, and it feels fulfilling to design some elegant solutions on top of the "i have to async paradigms" problem.

But ultimately the compiler magic on top of callbacks on top of event-driven OS APIs workarounds definitely doesn't seem to be best solution for regular application developers, since the abstractions are very leaky and now application developers need to be aware about how all those things work and work together.

I really like the Loom solution and look forward to it, since it means the application space doesn't have to deal with colored functions anymore. But ultimately I'm wondering in the meantime whether we should rather try to find an OS level fix for those OS level problems instead of trying to work around them on a higher level. There's probably 100x the amount of code written on async runtimes than for the kernel schedulers and IO subsystems themselves, which feels wrong.

zozbot234 · on May 3, 2022

Funny thing, we used to call those "userland threads" or "green threads", and they were even a selling point for Java before 1-to-1 mapping with system threads became more popular. How long before we insist again on having special OS-scheduler support for those "virtual" threads?'

sporkland · on May 2, 2022

Eric Brewer (CAP theorem) wrote about it in 2003: https://web.stanford.edu/class/cs240e/papers/threads-hotos-2...

dboreham · on May 2, 2022

Big thank you from me -- so good to see someone pushing back hard against the async insanity.

jpgvm · on May 2, 2022

It's much easier to implement async/await, especially in languages that don't have as pervasive a runtime as JVM and/or have substantial amounts of native code extensions that can't be managed, i.e Python/JS/Ruby.

fabioz · on May 2, 2022

Well,

https://github.com/python-greenlet/greenlet

has been available for quite some time in Python (gevent probably being its most used flavor).

Note that it it actually predates the async/await approach which was incorporated into the Python language (so, in Python it was implemented as a third-party library -- even async/await had an implementation based on Python 2 using yield and some decorators: https://pypi.org/project/trollius/).

jpgvm · on May 2, 2022

Yeah I am familiar but using gevent w/monkey-patching is nowhere near the same experience as using Loom. Not to mention that if you patch threading to make it more gevent friendly you can also run into all sorts of fun with locking that wasn't designed for it etc.

I really do think Python should have instead adopted gevent as it's async approach instead of asyncio and async/await etc.

fabioz · on May 2, 2022

I must say that I definitely don't like the gevent monkey-patching myself -- I prefer to just use the different APIs, although yes, that can end up in a blocking call when there should be none, but that hasn't been a problem I've been bitten by so far -- although it's been a while since I had to use it too ;)

nu11ptr · on May 2, 2022

Yeah I would imagine anything blocking would need special support, which is definitely a downside. Async/await is no different in that regard, but since explicitely marked, you know anything that isn't "awaited" and yet blocks is gonna break things, so might be easier to avoid mishaps in practice.

RandomBK · on May 3, 2022

There was an article making the rounds a few years ago [0] that made some rather blunt and pointed arguments _against_ stackful concurrency. I'm not an expert in this field, but it would seem to me that while some of these arguments don't apply in the JVM world, many others do.

Does anyone know how Project Loom has gotten around these constraints?

[0] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...

gpderetta · on May 3, 2022

There were WG21 papers arguing exactly to opposite of that, so eh.

One big (and IMHO the only) advantage of the async/await model is that the stack size can be bounded. But if you control the VM like in the Loom case, you can expand and shrink stacks as needed, so it is much less of an issue.

olliej · on May 2, 2022

> This seems so obvious in hindsight as the "right way". I wonder why we went down that whole async/await craze with so many languages?

That’s basically what async swift does?

It is also orthogonal to async/await.

Async/await is a mechanism a programming language uses to allow you to write straight line code without having to write blocking code. You could resolve that by creating a hulking thread, but that just means you need to await a thread result :)

Also JS isn’t going anywhere, and it seems unlikely it will ever get multiple execution threads

pmoleri · on May 3, 2022

async/await reveals to the consumer that at some point it will exit the code realm for some I/O task and that the result will be asynchronous, you like it or not, this is what really happens. Threads approach conceals that fact and wait for the return without you noticing. So, one disadvantage I see, is that you'll need to guess where the time consuming tasks will be done and create threads for them. The most perfomant code will be the one written by who knows better the underlying code.

akvadrako · on May 2, 2022

That quote is wrong. Linux can support over a million threads. Of course they can't all be "active" since there isn't more than a few hundred hardware threads, but that's also true of green threads.

chrisseaton · on May 2, 2022

> Linux can support over a million threads.

How many servers have a spare half-terabyte of RAM to do that though?

((512 * 1024 * 1_000_000) / 1024 / 1024 / 1024 = ~500)

akvadrako · on May 2, 2022

First of all, 512GB of RAM isn't that unusual. If you are supporting a million clients, most applications will need a beefy server. If only 0.1% of your clients are active, that's 1000 cores.

Secondly, you don't need 512kb per thread. The kernel will only use pages that are actually used, so by default new threads will use 4kb.

chrisseaton · on May 2, 2022

I think, but can't prove right now, that Java pre-commits the stack so it has enough head-room to deal with exceptions. Maybe it shouldn't, but it currently does.

And the point still-stands - hardware threads are heavier than virtual threads.

ec109685 · on May 2, 2022

It allocates the entire stack, but it doesn't write into it, so it doesn't take up resident memory until used.

chrisseaton · on May 2, 2022

I'm not a HotSpot expert, but my understanding is it puts the stack through add_reserved_region to commit it.

CraigJPerry · on May 2, 2022

In large envs, 512gb ram servers were circa 2015. The servers being racked in those envs today can have > 10Tb and it would be unlikely to rack one with less than 2Tb, that would be a specialised node for some specific workload.

chrisseaton · on May 2, 2022

If you have a 2 TB server, which most cloud instances for example are not, that's still 1/4 of your RAM... just for stacks. That can't be an efficient use of RAM.

CraigJPerry · on May 2, 2022

Cells in hardware RAM are only used to hold actual data (and some are wasted to alignment). I may claim to have reserved 512gb of ram for stacks, but unless I’m actually using the space (in which case there’s no issue) then physical ram is not being consumed.

chrisseaton · on May 2, 2022

Doesn't Java pre-commit stacks?

And even if it doesn't... those TLB entries are also not free.

CraigJPerry · on May 2, 2022

As in write to each requested memory location to guarantee physical allocation? No, that would be really slow. It’s not like heap memory where you probably want to memset it first, with stack you have a stack pointer so you know whats been written by you and what’s random garbage.

Java just requests the memory and linux just smirks and always says “yes sure, whatever you need” - it can do that even while the OOM killer is trying to decide which process looks like it has completed the most useful work so far :-)

TLB entries dont really come into it (the size isnt an issue from our stack usage but even if it were, we could use larger pages).

TLB flushes could be an issue with that many threads floating around but that’s a different story and not really a memory usage so much as a performance and latency issue.

chrisseaton · on May 2, 2022

> As in write to each requested memory location to guarantee physical allocation?

No you use the advisory bits when you map the memory.

It does this so it isn't unable to commit more memory while trying to handle an exception.

> Java just requests the memory and linux just smirks and always says “yes sure, whatever you need”

Linux doesn't ignore even LOCKED does it? And it certainly can't ignore actual writes to pages, which is what for example pretouch_memory does.

CraigJPerry · on May 2, 2022

>> No you use the advisory bits when you map the memory.

Is that really what’s happening when we create another thread? That doesn’t fit with my understanding here. When i create a new thread, i expect its stack to be allocated before the break so mlock wouldn’t come into it. Even on a jvm with Xms smaller than Xmx (i.e. so the jvm could have to request to move the break and grow the heap to accomodate my new threads private jvm stack), I’m expecting you would have to explicitly request this behaviour somehow (because it’s quite unneighbourly and probably counter productive for most apps). I don’t know how you would do that, maybe there’s a call available under the unsafe packages?

I’m conscious I’m saying the jvm - i mean hotspot.

>> so it isn't unable to commit more memory while trying to handle an exception

Again this doesn’t fit with my understanding - you can receive a further OOME while handling an exception and that’s just bad luck.

>> Linux doesn't ignore even LOCKED does it? And it certainly can't ignore actual writes to pages

Sure but I’m going on the expectation that doesn’t apply here

chrisseaton · on May 2, 2022

> When i create a new thread, i expect its stack to be allocated before the break

Why? Stacks are just memory like any other. You can put them wherever you want.

> so the jvm could have to request to move the break and grow the heap

The JVM allocates heap space using mmap, not by moving the break.

> you can receive a further OOME while handling an exception and that’s just bad luck

If you're being reasonable you don't - the JVM maintains emergency heap storage (pre-committed!) space so it can keep allocating even when out-of-memory.

CraigJPerry · on May 2, 2022

P.s is that you up ticking all my comments? Now i feel un-neighbourly not having done it back… fixing now

rossjudson · on May 2, 2022

Last time I looked at this in detail:

The JVM commits a suitable number of pages for the stack. Stack growth can mean new page commits to expand the stack. After a thread is idle for some defined period of time, the JVM may release back some of the pages used to expand the stack.

If you actually want to take advantage of this behavior, you have to pay attention to which threads are being scheduled to do work -- because if you randomly choose a thread from a pool they're all doing work, and the JVM thinks none of them are idle. You need to focus on particular threads.

And at the end of the day if you're allocating a certain number of threads to do "task-like" work, you need to be prepared to deal with the expansion of thread stack/overhead.

So overall...I like this new proposal. Building cooperative scheduling deep into the libraries makes sense, as does breaking the 1:1 ratio to OS threads.

In other cooperative scheduling libraries I've seen, that 1:1 ratio is retained, and this inevitably leads to a schism in programming: Those who believe that the "lightweight" thread truly is, and those who have learned otherwise.

_old_dude_ · on May 2, 2022

The problem is not Linux per se, it's the fact that pthread_create [1] reserves 2M for the stack, so the process runs out of memory if you have 1 million threads (1M * 2M == 2T)

[1] https://man7.org/linux/man-pages/man3/pthread_create.3.html

akvadrako · on May 2, 2022

It only reserves virtual memory.

cogman10 · on May 2, 2022

Which has a real nasty habit of becoming real memory as applications process stuff.

akira2501 · on May 2, 2022

Well.. by default. You can set it to the minimum if you like, which is 16384 bytes.

PaulDavisThe1st · on May 3, 2022

The stack size is configurable ahead of time. It's not fixed at 2M.

ec109685 · on May 2, 2022

There's OS level context switching as threads move on and off the CPU that you don't have to pay with the green threads approach.

davidgay · on May 3, 2022

> I wonder why we went down that whole async/await craze with so many languages?

Because it's a useful abstraction for managing N concurrent activities. In the threaded world, it's known as fork-join.

moomin · on May 3, 2022

A more interesting question for me is: has this been tried before? In Java? And why did it fail?

orasis · on May 2, 2022

async/await is FAR less error prone than explicit threaded concurrency.

ferdowsi · on May 2, 2022

Congratulations to everyone who thought this through and implemented it. I'll go so far as to say that Project Loom makes Java a more desirable web programming language to me than any language that has colored asynchronousity, like Rust, C#, Python, etc. Async/await is one of the worst design patterns to emerge in the last few years and I'm glad that the tech world is coming to its senses.

bestinterest · on May 2, 2022

Java will join the (imo) better concurrency path of higher level languages with green like threads, Golang (goroutine), Erlang, Java (Loom, virtual threads) & Ruby (soon under Ractors?).

I really think C# & Python will be jealous of the languages mentioned and puts them in an odd spot from language design perspective.

fabioz · on May 2, 2022

Well,

https://github.com/python-greenlet/greenlet

has been available for quite some time in Python (gevent probably being its most used flavor).

i.e.: just because it's not in the standard library doesn't mean it's not available (so, you can actually choose whether you'd like to use async/await or greenlets).

azinman2 · on May 2, 2022

Because it’s not in the standard library means there are various issues with compatibility. Everything needs to be designed/used for greenlets in mind. It’s not quite the same as the VM itself saying we’ve sorted it all out.

fabioz · on May 2, 2022

Well, whenever you add threads to the mix (native or virtual threads) if you don't design for it, I don't think things pan out properly (but I guess that I agree that if someone designed with native threads going for virtual threads where the same APIs can now make IO async switch threads it's probably easier to go that route).

I must say that the main usage I personally had of greenlets didn't have it in mind initially (it was used in an existing application where making the coloring wasn't really feasible as it was a huge app already and adopting green threads was much less work).

nu11ptr · on May 2, 2022

As much as I love Rust for serious code and Python for scripting, I do much prefer the green thread model to async/await in general.

abledon · on May 2, 2022

for anyone interested in Rust's choice to not use the green thread model:

https://www.reddit.com/r/rust/comments/7x0icm/regarding_gree...

https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...

> Initially, Rust supported only the green threading model. Later, native threading was added and ultimately became the default.

nu11ptr · on May 2, 2022

I get why Rust opted for native threads vs. green threads as a default, esp. when they want to promote zero cost abstractions. What the above links don't cover, and what I'm interested in, is why async/await instead of M:N/green threads for the secondary "light weight" threading model. Was async/await easier to make "zero cost"? Or what was the reason?

amaranth · on May 2, 2022

In theory async/await can be compiled to something as efficient as if you wrote all the non-blocking logic by hand so yeah, it's a "zero cost" abstraction.

For green threads the stdlib would have to be designed around them (again) or they'd feel like something bolted on the side with two sets of APIs for everything. This is also the case for async/await but that means green threads don't have an advantage there. If you have to worry about Task vs Thread, make sure you all the right APIs from each, figure out what to do for TLS in Tasks, etc then you may as well just use the one that is more efficient.

kmac_ · on May 2, 2022

Green threads and async/await are orthogonal concepts. Both can be used at the same time and separately. Jep 425 brigns VM threads, but the API remains same as the old one. Without deep language extensions any concurrency improvements, in terms of usage, aren't possible. C#, Rust, JS made the successful leap, and imo Go route is very interesting.

aardvark179 · on May 3, 2022

For Ruby I’d say the closest equivalent would be Fibers with the scheduler support introduced at 3.0. They take a different approach with pros and cons compared to virtual threads, but they are similar.

SemanticStrengh · on May 2, 2022

Kotlin state of the art coroutines should be mentioned too

deepsun · on May 2, 2022

Isn't Kotlin coroutines have the same coloring problem (functions must be marked with "suspend").l?

Although the coloring problem is only a problem if red functions are harder to use. Not sure marking all functions with "suspend" would make it any worse, besides infecting the codebase.

ackfoobar · on May 2, 2022

Here the Kotlin lead argued that it is not a "problem".

https://elizarov.medium.com/how-do-you-color-your-functions-...

vips7L · on May 2, 2022

I don't really see how that's not a problem. While not having to specifically await you still need to mark functions as suspend and only call suspending functions from other suspending functions.

I am also not sure how this explicit marking would work with interfaces. Can you create an interface that can be implemented by both suspending and synchronous functions?

ackfoobar · on May 2, 2022

> you still need to mark functions as suspend and only call suspending functions from other suspending functions.

You restated the idea of function colouring without further elaborating why it is a problem.

If your thread can afford to block and want to call a suspend function, use `runBlocking`. If your thread cannot block, having `suspend` in the type just saves you from a bug.

vips7L · on May 2, 2022

> You restated the idea of function colouring without further elaborating why it is a problem

Did you not read my second paragraph?

ackfoobar · on May 2, 2022

> I am also not sure

It felt like a question more than a complaint.

> create an interface that can be implemented by both suspending and synchronous functions

If interface has a suspending method, the implementations will also be suspending. But you can choose not to make any suspend calls in the method body. Not being able to say "this particular implementation of a red interface is blue" has never bothered me.

vips7L · on May 2, 2022

I guess my complaint is that it doesn’t work with object oriented programming. It’s the same reason why I don’t like checked exceptions, checked nulls, or Result types.

SemanticStrengh · on May 2, 2022

runBlocking can't run regular functions right? Othwerwise what's preventing you from making the main() runblocking and have all functions under a context that allow both types?

If runblocking can't allow non-suspending function then you got a problem because that mean a method implementing an interface can either be suspending or not and therefore for one of two implementation, fail.

ackfoobar · on May 2, 2022

You can absolutely do this.

`fun main() = runBlocking<Unit> {`

https://kotlinlang.org/docs/composing-suspending-functions.h...

SemanticStrengh · on May 2, 2022

Then you'll have to explain why that isn't the default main behavior then. Synchronous functions cannot call async functions (without defining N runblocking scopes) but a runblocking main would make kotlin seems color-less and should be seen as best practice. What's the catch?

ackfoobar · on May 2, 2022

> would make kotlin seems color-less

Quoting the post:

> Having to mark asynchronous functions with suspend modifier is a small price to pay, but in return you get better insight into your code.

Also you can do `suspend fun main()`

SemanticStrengh · on May 2, 2022

> Can you create an interface that can be implemented by both suspending and synchronous functions? I don't remember.

I do know you can at least specify suspend in an interface method to enforce its implementers to be suspending too. Note that this is e.g impossible in typescript..

SemanticStrengh · on May 2, 2022

The main coloring problem beyond async is the wrapping of return types in futures, which Kotlin groundbreakingly make transparent.

criddell · on May 2, 2022

If it’s a better model, why wouldn’t C# and Python adopt it?

jayd16 · on May 2, 2022

Yeah I really don't see an issue with C# building a virtual thread system if the paradigm really shifts that way. Its aggressively pragmatic in that way.

In theory, such a system could be implemented on C# with the same sort of gotchas as Java virtual threads. I don't think there's a fundamental design conflict.

mike_hearn · on May 3, 2022

There's no design conflict I'm aware of but you don't "just" implement virtual threads. The PR to do it in HotSpot is notoriously huge:

https://github.com/openjdk/jdk/pull/8166

1,140 files, (+98,553 −9,862 LOC) and those lines of code are mostly horribly fiddly low level assembly/compiler hacking. This is partly why it took years of development.

Most people don't realize this, especially because in the later stages of course many others contributed, but Loom is in some sense one man's journey. Before he worked at Oracle Ron Pressler spent years writing a library called Quasar which implemented fibers on top of the JVM using bytecode rewriting and some low level hackery with internal APIs. At some point he became available for hiring and Oracle brought him on board, as by that point he was not only an expert in fiber implementations but also the JVM. That was the genesis of Loom.

Something I've learned from following the intricacies of VM development is that what we get is very much a result of hidden human stories as well as technical decisions. Features happen or don't happen on the basis of who was available to be hired at the time, as much as cold calculations of performance impacts. In turn that depends heavily on the vagaries of personal lives. The skills needed to do this work aren't that easy to find on the open market and training takes a long time.

For other VM implementors to do this, and realistically only .NET has the sort of languages where it makes sense (JS doesn't), well, it'd take a long time even if they start today and there's no guarantee of success.

RcouF1uZ4gsC · on May 2, 2022

Although I am not a Java user, this is great news. Asynchronous code, even with a await, adds a lot of complexity. It leads to the “color” problem of functions, as well as increased complexity in languages without garbage collection (my understanding is that async is what drove a lot of the requirement for Pin in Rust.

There are also advantages in debug ability and performance tracing for threads.

It would be nice to have (virtual) threads that are lightweight and efficient enough that we did not need async for writing highly concurrent software.

I wonder if 20 years from now, programmers will look at async, kind of like current programmers look at segmented memory or manual register allocation —- something that was necessary for performance in a bygone era, but is now not needed.

pjmlp · on May 2, 2022

Lets not forget that Modula-2 already had co-routines, Ada tasks, Active Oberon active objects, Concurrent Pascal, and plenty of other examples.

If anything, this is yet another example of the decades how old concepts still take to become widespread across all major mainstream languages.

reggieband · on May 2, 2022

> When code running in a virtual thread calls a blocking I/O operation in the java.* API, the runtime performs a non-blocking OS call and automatically suspends the virtual thread until it can be resumed later.

How does it know that an operation is blocking on I/O? Is this limited to stuff baked into the language or standard library?

What is the mechanism of suspension and resumption? They mention elsewhere that any platform thread could pick up any virtual thread so I assume they must be storing the stack somewhere. Is there a cost transferring stacks on virtual threads across platform threads? Does this introduce new security implications if there are less OS level restrictions on memory access between platform threads?

What happens in cases where an application has user defined platform threads? How does the system determine what platform threads are available to the virtual threading system?

I think this is probably the right decision for Java and I agree with all of their motivations. I personally like the explicitness of async/await, however I assume Java devs are very familiar with threading in general. I believe this allows an easier path to migrating existing Java code.

jeremyjh · on May 2, 2022

> How does it know that an operation is blocking on I/O? Is this limited to stuff baked into the language or standard library?

Most libraries use the I/O primitives from the platform standard libraries, so the behaviour is going to trickle out from there. If you aren't using those libraries, the only other way to do I/O would be to use native code such as via JNI, and the runtime would schedule that on a thread pool and so it would tie up an OS thread for the duration of a function invocation.

zmmmmm · on May 2, 2022

thanks, this explains the mysterious magic that I think is probably bugging a lot of people.

It will be interesting to see how much of a coloring problem this creates, if any. I guess the great thing about having it be standardised and baked into the language / VM is it will quickly become de facto best practice to make any such library code compliant with virtual threads. And for all its various downsides, the java ecosystem has always leaned away from native code and treated it as a last resort rather than leaning into it like Python etc. So likely it will not be a huge issue the way the coloring problem is in the async/await situation.

samus · on May 5, 2022

As a preparation, some parts of the standard library indeed had to be rewritten in pure Java to be compatible with Loom, for example the socket implementations.

https://openjdk.java.net/jeps/373

http://openjdk.java.net/jeps/353

jeremyjh · on May 2, 2022

Yes in other languages with this baked into the platform - such as Haskell, Go and Erlang - it just works and there is no color problem.

kevincox · on May 2, 2022

> Is there a cost transferring stacks on virtual threads across platform threads?

You shouldn't need to "transfer" the stack. There isn't really a "platform stack", that is just whatever your stack pointer is pointing at. So it is perfectly fine to allocate many stacks and switch between them within one OS thread just by changing the stack pointer.

Of course if you are allocating a "full stack" then the main question is what is the point? IIUC the biggest cost of threads is the memory allocated to the stack. So unless you are doing something clever you don't get much benefit. There are many approaches here but I guess it is up to the runtime to pick one.

aardvark179 · on May 3, 2022

The largest cost is that you are allocating a fixed size stack, and that usually has to be big enough for whatever thread you might run. Virtual threads only have a stack as big as they require, and that’s likely to be fairly small. A lot of thought went into the foot print of virtual threads, we had long discussions about individual fields!

aardvark179 · on May 3, 2022

> What is the mechanism of suspension and resumption? They mention elsewhere that any platform thread could pick up any virtual thread so I assume they must be storing the stack somewhere.

Virtual threads are built on top of underlying delimited one shot continuations, and those store the stack. A large part of the engineering effort has been in making this as efficient as possible.

> Is there a cost transferring stacks on virtual threads across platform threads?

A stack always has to be at least partially copied to the carrier thread’s stack, and it doesn’t really matter which OS thread that is. I say partially because initially only the current stack frame will be copied as most threads will have a deep stack compared to the number of frames active in any operation likely to yield.

> Does this introduce new security implications if there are less OS level restrictions on memory access between platform threads?

The security model remains intact. A virtual thread performing some privileged operation will be privileged no matter which OS thread it is run on, and one which is not privileged will not be no matter the OS thread it runs on.

> What happens in cases where an application has user defined platform threads? How does the system determine what platform threads are available to the virtual threading system?

Virtual threads don’t just run on a random OS level thread. They run on a scheduler (commonly a fork join pool) and are effectively a series of tasks fed to that scheduler.

samus · on May 5, 2022

When the security manager is removed, the issue of privileged threads will become moot.

ackfoobar · on May 3, 2022

> How does it know that an operation is blocking on I/O?

https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1....

"All Your Blocking Are Belong to Us"

Not sure how updated the document is. But the section title says it all. (It's a reference to a meme.)

pgh · on May 2, 2022

This is very reminiscent of the m:n thread support of Java on Solaris. https://docs.oracle.com/cd/E19620-01/805-4031/6j3qv1oej/inde...

SemanticStrengh · on May 2, 2022

What is old is new

twic · on May 2, 2022

One question: virtual threads are lightweight, but are the executors which create them?

That is, should i organise my code so i create a single executor and use it across many operations, or is it okay to create and destroy executors all over the place?

And are the executors threadsafe - both for use from virtual threads, and from native ones?

As an aside, these are things which is important to know, but which are very rarely documented. We had some headaches a while ago because we wrote code which treated the new HttpClient as lightweight when it is very much not.

kjeetgill · on May 2, 2022

The heaviest part of executors are the task queues and the threads themselves. If you're using 2 or 3 executors over a single one, you're presumably using the same amount of space in the queue and number of threads, so the "weight" doesn't change except that the queues are concurrent data structures so can be sources of contention ... in really odd hyperactive scenarios.

Executors tend to be units of management (await()/shutdown()) etc. and monitoring, so that'll be your main reason to use more than just one for everything.

mike_hearn · on May 2, 2022

What this means is: Loom ships in preview form in September, and if all goes well will probably graduate to a fully supported feature a year later. So it'll be September 2023 at the earliest before this can used. During that period you have to pass extra command line arguments to opt-in.

The good news is that long term supported releases got more frequent in recent times and the next one is due in ... September 2023. So there hopefully won't be any delays for people who have to wait for an LTS release.

pron · on May 2, 2022

Preview features [1] are fully supported — and are part of the official Java SE specification — they're just not finalised or "permanent" (and are, therefore, disabled by default). I.e. the API might experience some changes due to feedback, so those who wish to use it acknowledge, by means of the --enable-preview flag, that they accept the possibility of the API changing. However, an API is marked as Preview (as opposed to incubator[2]) only when it is close to being finalised, meaning that we believe it can be finalised in two releases (one year).

[1]: https://openjdk.java.net/jeps/12

[2]: https://openjdk.java.net/jeps/11

binarynate · on May 2, 2022

With Loom, what's the equivalent of the following Promise concepts?

1. Promise constructor (to turn an existing callback-based or async operation into one that can be blocked with Loom). JS example:

  await new Promise(resolve => someApiThatTakesACallback(resolve));

2. Methods like Promise.all() and Promise.any() to run multiple async operations concurrently. JS example:

  const [result1, result2] = await Promise.all([doSomething1(), doSomething2()]);

twic · on May 2, 2022

Here are my guesses.

1. Same as with normal threads:

  CompletableFuture<Foo> future = new CompletableFuture<>();
  someApiThatTakesACallback(future::complete);
  Foo result = future.get(); // this can take a timeout

2. If you mean that the operations are blocking and should be called in parallel using virtual threads, the equivalent of Promise.any::

  ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
  Future<Foo> future1 = executor.submit(this::doSomething1);
  Future<Bar> future2 = executor.submit(this::doSomething2);
  Foo result1 = future1.get();
  Bar result2 = future2.get();

There is a version of that using ExecutorService::invokeAll, but i'm not sure it's any more concise, because you still need to unpack the futures one by one.

For Promise.any:

  ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
  Foo result = executor.invokeAny(List.of(this::doSomething1, this::doSomething2));

The asymmetry between invokeAny and invokeAll here is slightly grating, but makes sense.

Note that in both cases, you probably already have an executor lying around that you can use for this.

samus · on May 5, 2022

Do note that the `Executor` interface now extends `AutoCloseable` and can now be used with the `try-with-resources`-statement. This is the most important building block for the new Structured Concurrency approach!

binarynate · on May 2, 2022

Thanks! Your examples are super helpful.

ackfoobar · on May 2, 2022

1. Callback interop

I was expecting something like call/cc, or `suspend(Cancellable)Coroutine` in Kotlin. But judging from the list of preview items[1] it seems the only way is to have the callback write the result to a Future and blocking wait on it.

2.1 Promise.all() can be done with `.map(future => future.get())` But it is a bit more complicated than that. [2]

2.2 Promise.any()

This is somewhat relevant http://mail.openjdk.java.net/pipermail/loom-dev/2020-Februar...

[1] https://download.java.net/java/early_access/loom/docs/api/pr...

[2] https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutin... "This function is not equivalent to deferreds.map { it.await() } which fails only when it sequentially gets to wait for the failing deferred, while this awaitAll fails immediately as soon as any of the deferreds fail."

binarynate · on May 2, 2022

Thanks!

papercrane · on May 3, 2022

In addition to the other comments, the other way to achieve this with loom is to use the new structured concurrency API.

This is overkill for a single task, but for your second example you could do a an invokeAny with something like this:

    try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {

        scope.fork(() -> doSomething1());
        scope.fork(() -> doSomething2());

        scope.join(); // or if you wanted a 5 second timeout .joinUntil(Instant.now().plusSeconds(5))

        String result = scope.result(); // This will return the result of either doSomething1() or doSomething2(), whichever won the race to finish first.
        ...
    }

https://download.java.net/java/early_access/loom/docs/api/jd...

teh64 · on May 2, 2022

For the first one, I believe this is the correct method [2], which takes a Callable (the other submit methods take a Runnable, which does not seem to be generic).

For the second one, you can use an implementation of ExecutorService[1], which has the methods invokeAll and invokeAny.

Futures in Java are similar to Promises in Javascript, in that they represent a calculation that might not be completed yet.

1. https://docs.oracle.com/en/java/javase/17/docs/api/java.base... 2. https://docs.oracle.com/en/java/javase/17/docs/api/java.base...

ackfoobar · on May 2, 2022

The parent comment meant `callbackTakingFunction: ((Res, Throwable) -> ()) -> ()`, not `blockingFunction: () -> Res`

`ExecutorService#submit` works with the latter.

binarynate · on May 2, 2022

Thank you!

christophilus · on May 2, 2022

My guess for the latter is something like goroutines and channels, though I’m not sure. So: spin up a thread per “doSomething” and then join those in your main thread.

exabrial · on May 2, 2022

I think Java has been working on getting the API "correct" for a number of years. I appreciate the pace at which they are moving! Once this is released, we're going to have it for a very long time

jpgvm · on May 2, 2022

Rate of new features out of the Java team has been incredibly impressive.

djanogo · on May 2, 2022

The world biggest deployment of Java API is android, I wish Google had just bought out Java instead of copying it's API's, fighting it in court, and now using kotlin (with it's _IMO_ ugly syntactic sugar) to avoid legal issues.

pron · on May 2, 2022

> The world biggest deployment of Java API is android

Not even close. The entire mobile space — Android and iOS combined — is drastically smaller than just Java alone (not counting Android). You can confirm this by going to job websites that allow you to analyse their postings. Java, together with its sister leading languages, JavaScript and Python, is so popular that it's in a completely different ballgame than most languages.

SemanticStrengh · on May 2, 2022

It's only a matter of time before the amount of bullshit at Google reach a too strong threshold and are obligated to switch to full openjdk vs their abandonware. Unfortunately they are so mediocre they have made an incompatible bytecode so they will need some patches/translation layers atop jdk. BTW they are claiming JDK 11 support for Android 13. Of course this is a lie that only apply to the stdlib and not runtime support such as e.g constant dynamics.

uluyol · on May 2, 2022

Why would Oracle sell Java to Google? I'm pretty sure that a main attraction for them in buying it was the possibility of collecting $$$ from Google.

vips7L · on May 2, 2022

I believe OP means that Google had the opportunity to buy Sun but didn't.

homarp · on May 2, 2022

see https://news.ycombinator.com/item?id=31214253 "Achieving 5M persistent connections with Project Loom virtual threads"

theptip · on May 2, 2022

I found the JEP to be very well-written. A good summarizing paragraph:

> Virtual threads are a lightweight implementation of threads that is provided by the JDK rather than the OS. They are a form of user-mode threads, which have been successful in other multithreaded languages (e.g., goroutines in Go and processes in Erlang). User-mode threads even featured as so-called "green threads" in early versions of Java, when OS threads were not yet mature and widespread. However, Java's green threads all shared one OS thread (M:1 scheduling) and were eventually outperformed by platform threads, implemented as wrappers for OS threads (1:1 scheduling). Virtual threads employ M:N scheduling, where a large number (M) of virtual threads is scheduled to run on a smaller number (N) of OS threads

olliej · on May 2, 2022

Are these like “fibers” or whatever that weird not-a-full thread thing on windows?

I genuinely don’t know as I have never actually seen them being used, but they always seem to be described as a “lightweight” thread