I didn't use atomic ops in Java or C. It still works. The risk is that a thread ...

mattgreenrocks · on Oct 1, 2022

The data race on the volatile version of the C code is technically UB, so you are at the mercy of the architecture, optimization settings and phase of the moon.

This is a great answer for more detail: https://stackoverflow.com/a/60482370
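For comparison, here is a minimal sketch of a counter using C11 atomics instead of `volatile`, which has defined behavior under concurrent increments. The names `run_atomic_counter`, `counter_worker`, `NTHREADS` and `ITERS` are made up for illustration and are not from the code being discussed. It assumes a toolchain with `<stdatomic.h>` and POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4       /* hypothetical thread count */
#define ITERS    100000  /* hypothetical increments per thread */

/* Shared counter. atomic_long makes concurrent increments well defined,
   unlike a volatile long, where the same access pattern is a data race. */
static atomic_long shared_counter;

static void *counter_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        /* Relaxed ordering: the increment is atomic, but it does not
           order other memory accesses around it. */
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts NTHREADS workers, waits for them, and returns the final count. */
long run_atomic_counter(void) {
    pthread_t threads[NTHREADS];
    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, counter_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&shared_counter);
}
```

Calling `run_atomic_counter()` should always return `NTHREADS * ITERS`. The `volatile` version may lose increments, and the standard gives no guarantee about its result at all.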

samsquire · on Oct 1, 2022

Thanks Matt. I shall need to do more testing.

The right answer is probably to use atomics, but that has a performance cost.

I am still trying to find out whether atomics suffer under contention.

My perfect scenario is that the atomic increment is as cheap as the non-atomic increment.

But I think the bus locking and cache coherence protocols mean that the data is flushed from the store buffer to memory, which is slow. I don't know whether it acts as a lock, or whether it has a cheaper uncontended path.
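One way to explore the contention question is to give each thread its own atomic counter on its own cache line and sum them at the end, so that no two threads write to the same line. The sketch below is only an illustration under assumptions: the 64-byte `CACHE_LINE` is a guess that holds for many x86 parts but not all hardware, and `run_sharded_counter`, `shard_worker`, `SHARD_THREADS` and `SHARD_ITERS` are hypothetical names. Even an uncontended atomic read-modify-write is usually more expensive than a plain increment, so this reduces the cost without removing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* All three values are assumptions chosen for illustration. */
enum { SHARD_THREADS = 4, SHARD_ITERS = 100000, CACHE_LINE = 64 };

/* One counter per thread, aligned so that each sits on its own
   (assumed 64-byte) cache line and the threads do not share lines. */
struct padded_counter {
    _Alignas(CACHE_LINE) atomic_long value;
};

static struct padded_counter shards[SHARD_THREADS];

static void *shard_worker(void *arg) {
    atomic_long *slot = arg;  /* this thread's private shard */
    for (long i = 0; i < SHARD_ITERS; i++) {
        atomic_fetch_add_explicit(slot, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs one worker per shard and returns the sum of all shards. */
long run_sharded_counter(void) {
    pthread_t threads[SHARD_THREADS];
    for (int i = 0; i < SHARD_THREADS; i++) {
        atomic_store(&shards[i].value, 0);
        int rc = pthread_create(&threads[i], NULL, shard_worker,
                                &shards[i].value);
        assert(rc == 0);
        (void)rc;
    }
    long total = 0;
    for (int i = 0; i < SHARD_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += atomic_load(&shards[i].value);
    }
    return total;
}
```

Timing `run_sharded_counter()` against a version where every thread increments one shared `atomic_long` would show how much of the cost comes from contention on your particular machine. The totals are the same either way; only the cache line traffic differs.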