enkiTS: A C and C++ Task Scheduler for creating parallel programs

exDM69 · on April 4, 2021

I hate to voice a negative opinion but there are multiple red flags in this project.

There are lots of user space spinlocks in the code, for the scheduler and the mcsp queue. Questionable use of atomics and memory barrier orderings. NIH futex using atomics and a WINAPI semaphore for backoff. No cpu_relax hints. Not great test coverage.

This all could be achieved using mutexes and condition variables and it would have equally good best case performance (when using a futex based mutex), and it would be easier to test for reliability and it would have more predictable performance characteristics under high contention.

To be fair, this seems to be targeted at games for Windows and Xbox and I'm not sure how good the concurrency primitives (mutex, condvar) are for those platforms. Maybe it makes sense to spin wait on a gaming console. Or maybe the typical intended use case never hits the high contention cases or it doesn't matter if a thread is pre-empted when holding a spin lock.

I am not claiming this code is buggy, but 15 minutes of reading the code and tests does not convince me that it is not.

I've written and tested a lot of this kind of code and it is not straightforward. Sometimes I hit a problem after running a stress test on all CPU cores for 10 minutes. The tests/examples in this project run 10, 100 or 1000 times in a loop. That is inadequate to hit the pathological thread scheduling to to trigger race conditions with a high probability.

flohofwoe · on April 4, 2021

On the other hand, the project is in continuous development since 2015, created by a (game-)industry veteran and used in a (or at least one) released game. IME such projects born out of real-world requirements are much more robust (and easier to integrate) than the typical 'academic' or (worse) 'Google-scale' framework.

exDM69 · on April 4, 2021

Yeah, under the kind of typical workload I imagine this goes through in a gamedev environment, I don't expect it to ever hit the contention cases.

But it could as well use mutex/condvar because it is unlikely to be contended.

If there is a bug in the atomics code, it will manifest as sporadic performance. They are not easy to debug.

DSingularity · on April 4, 2021

I agree. Absent contention those primitive are manipulated via user-space operations anyway. The condition variable even has some optimistic spinning in the NPTL implementation iirc. So basically, he is reinventing the wheel.

pjmlp · on April 4, 2021

From that point of view, without judging anything about enkiTS and playing devil's advocate, there are lots of software proving their value in production, without having one single line of code subjected to the all best practices that are discussed here or in conferences.

So we should put aside all those YAGNI, TDD, endless review processes, and just focus on delivering actual product value.

gpderetta · on April 4, 2021

I think that this days windows has keyed events which are the equivalent of a futex.

Const-me · on April 4, 2021

Windows includes reasonably good thread pools as a part of the OS, the API is SubmitThreadpoolWork.

macOS too, the API is dispatch_async.

Const-me · on April 4, 2021

For completeness, modern C and C++ compilers ship yet another thread pool, in OpenMP language extension. I sometimes write my own threading support for unusual use cases, but that’s hard to do correctly even with experience. For this reason, in 80% of cases I pick an off-the-shelf implementation instead of doing my own.

posharma · on April 4, 2021

1. How is this different from Intel TBB? 2. Does it have support for pipeline parallelism?

jcelerier · on April 4, 2021

Same questions for https://github.com/taskflow/taskflow

stephc_int13 · on April 4, 2021

I checked Intel TBB and I have not found a way to see the source code, the licence or even something to download and test without buying or downloading the demo of a commercial product.

So far I only had bad experiences with Intel libraries, huge, opaque, bloated and difficult or impossible to use on different architectures.

They usually have very high performance code under the hood so this is a shame.

jcelerier · on April 4, 2021

https://github.com/intel/tbb

marco_craveiro · on April 4, 2021

Whats the difference between "oneTBB" and regular TBB, do you know? Cheers

jcelerier · on April 4, 2021

oneTBB is the continuation of TBB. Intel took all their OSS projects and put them under the "OneAPI" umbrella last year: https://www.oneapi.com/open-source/

MKL, etc..

brandmeyer · on April 4, 2021

`apt-get install libtbb-dev libtbb-doc`

`apt-get source libtbb-dev`

eps · on April 4, 2021

This appears to be basically a callback scheduler with support for callback threading and sequencing. A fairly simple thing to code.

The catch here is that if you ever end up needing something like this, it's far more prudent to write your own version than to learn someone else's solution.