Huh. My experience has been that AMD wins there unless your application is so small that it fits into Intel's smaller cache. And I'd have thought AMD's new 3D V-Cache parts would make your developers drool, letting them actually inline everything instead of being scared of building apps too big to fit in cache.
Not my experience at all, and I work across different teams that own different latency-sensitive apps. Most of them have unhygienically huge working sets.
Do you know this for a fact? I've done some work in the industry where I needed to make fast software, but never the sub-microsecond tick-to-trade kind of fast, so I really don't know.
There was a great presentation from 2017 about some of Optiver's low-latency techniques[1]. I had assumed they released it because they had since obviated all of them by switching to FPGAs, but I don't know. Either way, the speaker suggested that if you ever needed to ping main memory for anything, you'd already lost. So I wouldn't have thought DDIO plays into their thinking much.
The idea is precisely that you want to avoid pinging main memory at all, which is possible (in the happy case) if you do things correctly with DDIO. Not everything is done in hardware where I am. I am wary of saying much because my employer frowns on it, and admittedly I work on the software more than the hardware, but DDIO is certainly important to us.
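Speaking only in generalities from public material: here's a minimal sketch of what the happy case can look like on the software side, assuming the NIC DMAs packets into a ring in host memory. The slot layout, sequence/epoch protocol, and poll_ring function are illustrative assumptions, not any real NIC's API; the point is just that with DDIO the freshly written lines land in L3, so the polling loads below hit cache rather than DRAM.

    /* Illustrative sketch only: a pinned thread busy-polls a ring that
     * the NIC DMAs into. With DDIO, the DMA'd lines land in L3, so the
     * spin loop hits cache, not DRAM. Slot layout and the seq/epoch
     * protocol are made up for the example, not any real NIC's API. */
    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    #define RING_SLOTS 1024

    struct slot {
        _Atomic uint32_t seq;   /* published last by the device/driver */
        uint32_t len;
        uint8_t payload[120];
    } __attribute__((aligned(64))); /* 128 bytes: two cache lines per slot */

    static struct slot ring[RING_SLOTS];

    void poll_ring(void (*handle)(const uint8_t *, uint32_t))
    {
        uint32_t expected = 1; /* epoch counter, bumped each ring pass */
        for (size_t i = 0;; i = (i + 1) % RING_SLOTS) {
            /* Spin until the device publishes this slot; in the happy
             * case the load resolves in L3 where DDIO deposited it. */
            while (atomic_load_explicit(&ring[i].seq, memory_order_acquire)
                   != expected)
                ;
            handle(ring[i].payload, ring[i].len);
            if (i == RING_SLOTS - 1)
                expected++; /* epoch advances as the ring wraps */
        }
    }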
DDIO operates mostly transparently to software, with the I/O controller feeding DMAs into a slice of L3. Hardware can opt out by setting PCIe TLP header hints, and you have some system-wide configurability via MSRs, but it's not something a userspace application can take into its own hands.
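To make the MSR part concrete, here's a rough sketch of reading that knob, assuming the IIO_LLC_WAYS register at 0xC8B that public research on Xeon-SP parts commonly cites. That address is an assumption, not architectural; it varies by platform, and reading it needs root plus the msr kernel module, which is exactly why a userspace app can't manage this per-process.

    /* Minimal sketch (root only): read the DDIO LLC-way mask via the
     * msr kernel module. 0xC8B (often called IIO_LLC_WAYS) is an
     * assumed, non-architectural address from public Xeon-SP research;
     * check your platform's docs before trusting it. */
    #include <fcntl.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    #define IIO_LLC_WAYS 0xC8B /* assumed MSR address; platform-specific */

    int main(void)
    {
        /* /dev/cpu/0/msr is exposed by `modprobe msr`; reads need root. */
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        uint64_t mask;
        /* pread's file offset selects which MSR to read. */
        if (pread(fd, &mask, sizeof mask, IIO_LLC_WAYS) != sizeof mask) {
            perror("pread");
            close(fd);
            return 1;
        }
        close(fd);

        /* Each set bit is an L3 way that inbound DMA may allocate into;
         * the default is reportedly just two ways on most parts. */
        printf("IIO LLC way mask: 0x%" PRIx64 "\n", mask);
        return 0;
    }

Writing the mask is the same dance with pwrite, and it's system-wide, which is the point: it's a platform tuning decision, not an application one.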
I don’t know that this definitively answers that question. It’s possible to standardize on a different architecture for cost/performance and keep a small population of Intel machines in service because you want access to their superior PMUs. Most of what you learn on the Intel boxes would still apply to the rest of the fleet.