> Green threads are different. The memory of a green thread is allocated on the heap. But all of this comes with a cost: As they aren't managed by the OS, they can't take advantage of multiple cores inherently. But for I/O-bound operations, they are a good fit.
This is clearly not true? Am I missing some nuance here, as I'm sure the author knows what they're talking about?
Green threads can totally run on a multi-threaded runtime, as e.g. Go does, and it works just fine. The main hurdle with them is arguably FFI.
What this likely means is that, to take advantage of the underlying runtime multiplexing green threads over multiple physical threads running on multiple cores, you need to explicitly fork the execution flow.
This could be as simple as a web server firing off a new green thread or a goroutine for an incoming request, or as contrived as doing so manually within a function scope.
In practice, there really isn't much difference from async/await. "Green threads" is a combination of implementation details and a subset of what async/await abstractions achieve.
Effectively, Goroutines are in many ways similar to C# Task<T>s. The difference is that in Go you are expected to explicitly send the result via a channel or some other data structure and then synchronize on the completion of the execution, whereas with tasks you simply await that.
There could be an argument made about a preference for implicit suspend (Go, Java, BEAM family) over explicit suspend (C#/F#, Rust, JS, Python, C++ co_await, Swift), but for practical purposes invoking a function with the 'go' keyword in Golang is very similar to firing off a synchronous method with Task.Run in C#, or calling an asynchronous method (with a sufficiently short body before the first yield) and not immediately awaiting it.
As I usually post it on HN, tasks make the following patterns trivial:
using var http = new HttpClient {
    BaseAddress = new("https://news.ycombinator.com/")
};
// not immediately awaited requests are executed in parallel
var frontPage = http.GetStringAsync("news?p=1");
var secondPage = http.GetStringAsync("news?p=2");
Console.WriteLine($"{await frontPage}\n\n{await secondPage}");
> The difference is that in Go you are expected to explicitly send the result via a channel or some other data structure and then synchronize on the completion of the execution, whereas with tasks you simply await that.
That may be the case in Go but it's not an inherent property of green threads. See, for example, Gleam Tasks [0] which are based on green threads and provide the syntactic convenience of being able to await the result rather than receiving a message:
let task = task.async(fn() { do_some_work() })
let value = do_some_other_work()
value + task.await(task, 100)
They do so without the disadvantage of bifurcating the code base into sync and async functions.
The discussion regarding Goroutines is to highlight that, despite prevalent claims to the contrary, they are not doing something unique, and to developers who are used to languages with powerful concurrency primitives they look like an incomplete task abstraction. "Green threads" really is an implementation detail, in many ways orthogonal to the pros/cons of implicit and explicit suspend points.
I hope your opinion about C#'s task system has improved since the last time[0], given what Gleam (and, in many ways, Elixir) does looks practically identical :)
>I hope your opinion about C#'s task system has improved since the last time[0], given what Gleam (and, in many ways, Elixir) does looks practically identical :)
Well no, not really I'm afraid. My reservation has always been with the codebase bifurcation per my previous post. Gleam/BEAM languages, Go, and now Java, don't have async and sync functions. They have one kind of function which can be called either synchronously or asynchronously. The difference is in who decides: the function caller or the function implementer. That a sync function can't call an async one amplifies the problem.
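To make that concrete, here is a minimal Java sketch (virtual threads, JDK 21+; the names are mine, purely for illustration): one plain blocking function, and the caller decides whether to invoke it synchronously or asynchronously.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallerDecides {
    // One plain, blocking function; there is no separate async variant.
    static String fetchReport() throws Exception {
        Thread.sleep(100); // stand-in for blocking I/O
        return "report";
    }

    public static void main(String[] args) throws Exception {
        String now = fetchReport(); // called synchronously: just call it

        // Called asynchronously: the caller's decision, not the function author's.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> later = exec.submit(CallerDecides::fetchReport);
            System.out.println(now + " / " + later.get());
        }
    }
}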
I know you dislike the "coloured function" metaphor [0] but for me it's a significant issue. I look at lots of C# and Python code, and see libraries now encumbered with both sync and async function variants (e.g. [1] [2]). That, to me, is a significant downside to async/await as implemented in those languages.
The Gleam example has all the convenience and readability of its C#/Python counterpart - but without the downsides.
Does anyone actually do anything other than immediately await the async thing? But all callers need to wrap everything in Task<> and awaits and async and whatnot...
If you want to do some parallel processing on a collection, I'm sure we could find a way to do that instead of adding all the clutter we have now.
> The Gleam example has all the convenience and readability of its C#/Python counterpart - but without the downsides.
This was mentioned in the write-up, but the big downside is interop. Green threads have a significant downside when execution moves across OS threads.
This is the same reason why Rust ended up with async. Async is basically the cost you pay for C interop. However, C# runtime-async will likely be much simpler than Rust async since ownership is GC-managed and doesn't need to be transferred across threads.
All that said, I'm also not convinced the codebase bifurcation is a bad thing. Async ~= I/O. As a regular C# user, I'm not particularly unhappy about splitting my app into "I/O things" and "not I/O" things.
Assuming it is unlikely that Gleam does something that makes it outperform Erlang, which it compiles to, it comes down to just how fast the BEAM VM is. But there is nonetheless an argument to be made: C# is a close-to-the-metal programming language where your choices directly affect what happens under the hood, including the use of async/await. For some it may be an undesirable trait, but it's precisely what makes it so fast in domains where languages that are "more abstracted away" used to historically struggle and continue to have a lower performance ceiling.
As usual, the "coloring" point misses the patterns that async/await enables, and that it is in many ways an "I/O Monad". Still, mixing the faux concept of "differently colored" threads in .NET does not come with the same degree of pain it does in Rust or elsewhere (and there are good reasons for that).
You can block threads if you have to (which includes synchronously waiting on tasks), and the ThreadPool is designed to deal with that appropriately, increasing or decreasing the worker thread count to maintain optimal throughput. You just don't have to pay for it always, and as the multitude of alternate implementations suggests, there is no free lunch, as usual.
Also, the sync and async overloads, as I previously discussed, often do I/O in a completely different way. Other languages sell it with loud buzzwords like "NIO", while .NET keeps it boring - the workload will scale without your explicit effort, never throttling independent execution flows.
I continue to be convinced of the sheer degree of harm done by "that one article", and applying C# in practice cures this perception once you stop worrying and love the easy concurrency and parallelism that come with it.
you've made the performance argument before and I don't refute that C# shows up better in benchmarks than BEAM languages. Or many others for that matter (including Python).
The reductive argument there is that if performance is the sole priority then write machine code. That's extreme. A more robust one is that, according to the same benchmarks you reference, Rust is meaningfully faster than C# and C faster still. So if performance is the overriding objective then use one of those.
You'll justifiably push back on that and raise other factors in favour of .NET. And that's the point: it's about trade-offs and preferences.
For the apps I've built and been involved with, real world performance has been within commercial tolerance using languages that, at least according to benchmarks, are slower than the top performers. In teams of moderate size and above, managing codebase size and evolution is usually a bigger challenge. Requiring sync and async variants of functions detracts from that: not to mention the overhead of ensuring some level of consistency in when to use each form.
> I continue to be convinced of the sheer degree of harm done by "that one article"
We'll just have to disagree agreeably on that one. I see the coloured function metaphor as an elegant articulation of an important limitation, and it has served the community well in describing the problem.
> applying C# in practice cures this perception once you stop worrying and love the easy concurrency and parallelism that come with it.
Another disagree agreeably. In isolation yes, but with a non-trivial cost in bifurcation and the need for async/sync variants.
The claim was not that the performance is absolute. Instead, I'm poking at the assertion that Gleam's implementation does not have downsides, which is rather silly in the context of our discussion, is it not? (also async in C# and in Python are very different)
Not to mention, in the original reply this was raised as a discussion of implicit vs explicit suspend points as means of achieving asynchrony and M:N threading, and their trade-offs. Instead, you felt like reframing this as purely Language A vs Language B. I too am guilty of this, but I try to do better. In either case it tends to be less productive and derails the discussion, and is just not very nice.
On the "bifurcation of sync/async in .NET" question, which seems to be what this somewhat confused argument revolves around, I have written a long-form post and extracted it into a gist to avoid polluting the discussion: https://gist.github.com/neon-sunset/640a38f9f2af73ad888cb5b0...
Still, this subject deserves a better, proper, much more information-dense overview, ideally accessible to people unfamiliar with the details of async/await and tasks/futures or of implicit suspension, and how they relate to the implementation strategies available for each.
Unfortunately, I only have so much time and can spend only so much effort on this, nor am I sure whether there's value in it - I'm getting the impression that these replies come from a place focused on confirming a point of view and singing praises to how Erlang and its derivatives are the one and only approach, rather than on understanding what drives the different design decisions for achieving concurrency/parallelism across programming languages.
Also, a big reason why C++ co-routines are the way they are is that they were originally modeled on C# async/await, as per Microsoft's design in C++/CX, before submission to the WG21 process.
With the big difference that all those magic classes, which also exist in a similar form in .NET, do have support in the Windows Concurrency Runtime and later in C++/WinRT.
However, since WG21 left the runtime part as an exercise for the reader, we have the current mess of C++ co-routine talks at each conference, and even so, not everyone gets them.
I don't agree with OP about I/O-bound ops. I think if you're looking to green threads, you've taken the wrong approach.
> [0] the Task.Run method offloads the provided action to the thread pool, and the await keyword yields control back to the caller until the task completes.
All async code must be in an async call stack; virtual threads are 100% transparent because it's the runtime scheduling them, so you get a bit more control than relying on the yield of dotnet, at least as I see it.
Again, I don't see the huge demand for it personally, but I don't touch dotnet too often, so take this with a grain of salt.
> I don't agree with OP about I/O-bound ops. I think if you're looking to green threads, you've taken the wrong approach.
It depends on the implementation. In Go, for example, all I/O is async and suspends your green thread, replacing it with another runnable green thread.
This works the same as if you managed an event loop of your own for the purpose of I/O, which is the best way to handle I/O in regular user-space code. It's just automatic, with your code resembling a simple, blocking scenario.
OP's note on threading would be C#- or runtime-specific - green threads have no problem with parallelism, with runtimes commonly having a thread per core (or more) and having them all run green threads in parallel.
They will never be transparently/fundamentally managed by the OS alone. The runtime will need to determine how to juggle green threads across multiple OS threads. In that way, this mapping is not inherent.
It can be designed around but that itself is a runtime design decision and I would not say it's akin to default vs custom.
With respect, it's not particularly relevant how you use "inherent". It's a standard usage. Rather than asking the whole rest of the world to change, you should probably learn the definition.
“Inherently” means “intrinsically”, meaning it’s a characteristic that can’t be changed without changing the nature of the thing. It doesn’t mean “by default”.
Presumably, it just means there needs to be explicit forking of the green thread for CPU-bound operations, otherwise everything will run synchronously (because there's no point where the green thread is paused to wait for an I/O IRQ).
That is unless your compiler or JIT injects occasional yields into your synchronous code!
The efficiency and complexity of user mode threads heavily depend on constraints imposed by the particular language. E.g. if the language supports pointers into the stack, user mode threads would be less efficient; if the language is largely dependent on manual memory management -- user mode threads would be more expensive; if the language already has some other concurrency primitives (like async/await) -- user mode threads will be more expensive (although in this case in terms of complexity rather than runtime efficiency). Because Java exposes relatively little of its implementation details, we've been able to implement efficient user mode threads even without any FFI overhead.
The cost for exposing very little tends to be that marshaling costs more due to the requirement that values be copied between domains rather than shared.
Calling a C function in a shared library (dll, so) from Java using the new FFM API has the same overhead as calling such a function from C++ (the overhead is higher if the called function upcalls into Java again, though that is relatively rare, or if the function blocks, but then the blocking itself makes the additional overhead negligible). But the FFM API does not directly expose Java objects to native code at all, although it does allow Java code to access and mutate "off-heap" native memory (C data) from Java code as efficiently as accessing and mutating Java heap memory. So if your goal is to expose Java objects to native code, then yes, that would require marshalling (although ideally you should do the opposite and expose native memory to Java code through a Java interface, which would have no overhead).
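For reference, a minimal downcall with FFM looks roughly like this (JDK 22+; using libc's strlen purely as an illustration):

import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

class StrlenDemo {
    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // Bind libc's strlen: size_t strlen(const char *s)
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // A confined arena frees everything allocated from it when it closes.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom("hello, ffm"); // off-heap, NUL-terminated
            System.out.println((long) strlen.invokeExact(cString));   // prints 10
        }
    }
}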
However, relying on FFI in Java is far less common than in Python, Rust, or even C# or Go, and in the rare cases where it's done, it's easy to do cheaply as I described. So I guess it's true to say that if you wanted FFI to work in the same manner it is employed in those other languages then yes, it would be more expensive as it would require marshalling, but that's just not the case in Java given the combination of Java's performance and the size of its library ecosystem.
Languages with worse performance or with smaller ecosystems do need to rely much more heavily on FFI and so they often choose to sacrifice the flexibility of their implementation in favour of a more direct flavour of FFI.
I agree with your general point, that it depends on your specific problem how difficult this is, but I disagree about how common it is and how easy it is to work around.
Regarding
> But the FFM API does not directly expose Java objects to native code at all, although it does allow Java code to access and mutate "off-heap" native memory (C data) from Java code as efficiently as accessing and mutating Java heap memory
I just don’t buy it. First, I think it’s very common to want to expose managed memory to native. In fact, it might be the dominant case. If I want to call out to perform a crypto operation on a block of bytes I got from a Java operation, I don’t want to copy them first.
Second, I think you're missing the use case for manipulating system APIs. If you want to perform some system call and the call requires setting up some structures as arguments, that's going to be pretty expensive in Java. For things that are called a lot it can add up. For example, Windows has a profiling and eventing system called ETW. To use it you create a set of events and call the system. It's not uncommon to do this for thousands or millions of events per second. The way C# handles this is stack-allocating an event blob and calling directly. I can't imagine a Java workaround that would be as fast or simple. It seems like you'd have to pool a native event blob allocation and fill it in from Java.
It’s true that most Java programmers aren’t blocked by this but I think that’s because many Java programmers don’t try to use Java for these tasks. They don’t write systems software in Java and they don’t embed into big, performance-sensitive native apps, like games.
> First, I think it’s very common to want to expose managed memory to native. In fact, it might be the dominant case. If I want to call out to perform a crypto operation on a block of bytes I got from a Java operation, I don’t want to copy them first.
Doing it this way is not so common in Java anyway. First, primitive operations for crypto are intrinsics in Java and operate without FFI at all. Second, IO input and output buffers in high-performance applications are typically in off-heap buffers anyway (i.e. you serialize data to an off-heap buffer and then do crypto and then send it over the wire, or you receive data in an off-heap buffer, do crypto, and then deserialize).
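To illustrate that pattern (just a sketch, not anything from the thread): JCE ciphers accept direct ByteBuffers, so the bytes being encrypted never have to be copied onto the Java heap before heading out over native I/O.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class OffHeapCrypto {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

        // Serialize into an off-heap buffer, encrypt into another off-heap buffer...
        ByteBuffer plain = ByteBuffer.allocateDirect(1024);
        ByteBuffer encrypted = ByteBuffer.allocateDirect(1024 + 16);
        plain.put("payload serialized straight into native memory".getBytes(StandardCharsets.UTF_8)).flip();
        aes.doFinal(plain, encrypted);
        encrypted.flip();
        // ...and 'encrypted' can now go straight to a SocketChannel.write() with no heap copy.
    }
}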
> Second, I think you’re missing the use case for manipulating system APIs. If you want to perform some system call and the call requires setting up some structures as arguments, that’s going to be pretty expensive in Java.
It's not, because FFM allows you to manipulate native structs with no overhead. You get this efficient, stack-like allocation of native structures with FFM's Arenas and SegmentAllocator (https://docs.oracle.com/en/java/javase/22/docs/api/java.base...)
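A rough sketch of what that looks like (JDK 22+; the struct layout is invented here for illustration, not any real ETW structure):

import java.lang.foreign.*;
import java.lang.invoke.VarHandle;
import static java.lang.foreign.MemoryLayout.PathElement.groupElement;

class EventDemo {
    // Hypothetical native struct: struct event { int32_t id; int64_t timestamp; }
    static final StructLayout EVENT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("id"),
            MemoryLayout.paddingLayout(4),
            ValueLayout.JAVA_LONG.withName("timestamp"));
    static final VarHandle ID = EVENT.varHandle(groupElement("id"));
    static final VarHandle TS = EVENT.varHandle(groupElement("timestamp"));

    public static void main(String[] args) {
        // A confined arena behaves much like stack allocation: cheap to create,
        // and everything allocated from it is freed at once when it closes.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment event = arena.allocate(EVENT);
            ID.set(event, 0L, 42);
            TS.set(event, 0L, System.nanoTime());
            // 'event' can now be passed to a downcall handle that expects a pointer.
        }
    }
}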
> They don’t write systems software in Java and they don’t embed into big, performance-sensitive native apps, like games.
It's true low-level programs are typically not written in Java, but the applications programming market is bigger. I wouldn't be at all surprised if applications written in Java alone comprise a bigger market than all intrinsically low-level applications combined. As for embedding in another application, there is no intrinsic reason not to do it in Java, but 1. traditionally and for "environmental" reasons Java hasn't been huge in the games space (except for Minecraft, of course) and 2. it's been less than six months since FFM became a permanent feature in the JDK; JNI, the FFI mechanism that preceded FFM was really quite cumbersome to use so it's not surprising people opted for more convenient FFI.
> First, primitive operations for crypto are intrinsics in Java and operate without FFI at all.
This is a pretty strange assertion given that I didn’t specify the crypto operation I wanted to perform. Is XAES-256-GCM available in the Java standard library?
> Doing it this way is not so common in Java anyway
Sure, because doing it the other way would be very expensive. But that doesn’t mean applications which can’t front or backload native processing don’t exist, it just means they will have slower throughput in Java.
It’s fine for a language to make that tradeoff, but it is a tradeoff
> Is XAES-256-GCM available in the Java standard library?
No (is it in any language's standard library?) but everything you need to implement it in Java is available.
> But that doesn’t mean applications which can’t front or backload native processing don’t exist, it just means they will have slower throughput in Java.
They won't, because working with native memory is just as efficient as working with heap memory. You store your bytes in a MemorySegment and you don't care if it's backed by an on- or off-heap buffer. I guess you could say, oh, but when working with FFI in Java you may need to keep some buffers off-heap if you don't want to copy bytes, but that's common practice in Java since JDK 1.4 (2002).
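Concretely, something along these lines (a sketch; the names are mine):

import java.lang.foreign.*;

class SegmentDemo {
    // Works the same whether the segment is a view over a Java byte[] or native memory.
    static long sum(MemorySegment bytes) {
        long total = 0;
        for (long i = 0; i < bytes.byteSize(); i++) total += bytes.get(ValueLayout.JAVA_BYTE, i);
        return total;
    }

    public static void main(String[] args) {
        MemorySegment onHeap = MemorySegment.ofArray(new byte[] {1, 2, 3}); // heap-backed view
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment offHeap = arena.allocate(3).copyFrom(onHeap);     // native memory, FFI-ready
            System.out.println(sum(onHeap) + " == " + sum(offHeap));
        }
    }
}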
> It’s fine for a language to make that tradeoff, but it is a tradeoff
There is a tradeoff, but it's not on performance. Rather than expose Java heap objects directly to native code (which is possible with the old JNI, but not the recommended approach), Java says keep the bytes that you want to efficiently pass to native code off-heap and makes it easy to do (through the same interface for on- and off-heap data).
Rather than constrain the implementation, which would have performance implications at all times, Java gives you the choice to have no FFI overhead at the cost of a tiny bit of convenience when doing FFI. Given how rare FFI is in Java compared to many other languages, that is obviously the right design decision and it helps performance rather than harms it. So there is a tradeoff, but you're clearly trading away less than you would have if FFI were more common and the core implementation were impacted by it.
Ultimately, the question of "is it better to sacrifice language performance and flexibility in exchange for doing X (without significant performance overhead) in 3 lines instead of 30" depends entirely on the answer to the question how often users of the language need to do X. If the language is Java and X is FFI, the answer is "rarely" and so you're paying a small cost for a large gain. The tradeoff between the convenience of low/no-overhead FFI and language performance and flexibility becomes much more difficult and impactful in languages where FFI is more common.
I'm not sure that the original description is precisely correct, but yours isn't correct either.
Basically, you can't treat green threads just like "a multi-threaded runtime" and have it just work. That is, a 1:1 mapping between green threads and OS threads is just OS threads.
So fundamentally if you bounce your green stacks off of the actual stack they're going to need to go somewhere... and that place must be the heap.
There are pluses and minuses to this implementation, but the biggest minus is that it makes FFI very complicated. C# has an extremely rich native-interop history (having historically been used to integrate closely with Windows C++ applications) and therefore this approach raised some serious challenges.
In some sense, async is the cost for clean interop with the C/system ABI. Transition across OS threads requires something like async.
I meant that you can have a multi-threaded runtime that will be executing your green threads in a multi-threaded fashion. Like in Go you have (by default) as many worker OS threads as CPUs, and the Go runtime will take care of scheduling your green threads on those worker OS threads (+ creating threads as needed for blocking syscalls, if I remember correctly, but that's getting way too deep into the details). And this will, in fact, "just work" from the user's perspective.
And yes, as you said, and as I said at the end of my previous comment, the main hurdle of green threads imo is FFI, but it's not what the article mentions, which is what surprised me.
Ah, I see. You were saying that green threads can usually be scheduled on multiple os threads and take advantage of parallelism. Yup, I agree. Apologies for the confusion.