(Disclaimer: I don't design ISAs) This goes over a decent amount of nuance that is often lost in internet arguments over ISA design. I frequently hear (on other sites of course, nobody here would be stupid enough to argue this) people going "RISC-V is the best because it has the simplest decode" or "x86-64 has an easy way to do this kind of conditional branch", and everyone is just talking about some facet of the problem without really thinking about the broader picture. Or, worse, they actually have no idea what the problems are for other people, so they'll wave away serious problems in areas they're not familiar with: "oh we can just fuse the uops on big cores", "decoders don't actually matter these days", "nobody will emit that sequence so it doesn't matter if it's fast". To be fair, a lot of the actual practical results are not disclosed widely by the people who make the decisions, so it's easy to fool yourself into whatever you want to believe without the numbers to back it up.
Instruction fusion is a perfectly cromulent approach. These days RISC-V extensions are even written with instruction fusion in mind, such as the recently proposed Zicond, which just adds a couple of "conditionally move value or zero" three-register insns. It turns out that this is enough to support lots of "conditional insn" patterns that other ISAs have to encode explicitly.
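For the curious, here's a minimal sketch of the pattern Zicond targets (the C is just one way to coax this out of a compiler; the asm in the comment is illustrative, with hand-picked registers, not any particular compiler's output):

    /* Branchless conditional select: result = cond ? a : b.
       With Zicond this can lower to three ordinary instructions,
       which a core is free to treat as one logical conditional move:
           czero.eqz t0, a1, a0    # t0 = cond ? a : 0
           czero.nez t1, a2, a0    # t1 = cond ? 0 : b
           or        a0, t0, t1    # result = cond ? a : b */
    #include <stdint.h>

    uint64_t cond_select(uint64_t cond, uint64_t a, uint64_t b) {
        return cond ? a : b;
    }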
Look, I didn't want to name names, but if you willingly volunteer to serve as an example of what I was talking about, I am more than happy to let you do that.
...RISC-V spec literally says things like "we define the canonical sequence to be MULH/MUL, in this order. Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies".
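Concretely, that's the full-width multiply case. A hedged sketch (function name mine, using the GCC/Clang __int128 extension):

    /* Both halves of a 64x64 -> 128-bit product. The spec's
       canonical sequence puts the high half first:
           mulhu a2, a0, a1    # upper 64 bits of a * b
           mul   a3, a0, a1    # lower 64 bits of a * b
       so a core that spots the adjacent pair with matching
       operands can do one multiply instead of two. */
    #include <stdint.h>

    void mul_full(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
        unsigned __int128 p = (unsigned __int128)a * b;
        *hi = (uint64_t)(p >> 64);
        *lo = (uint64_t)p;
    }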
I'm not going to respond to your specific claim directly, because I just said I wouldn't do that above. If you really want to hear my opinions, bring this up in some other thread, or just wait for someone else to make the argument ;) But I would like to ask you whether what you're saying is validated by actual silicon, and if so, under which constraints. Does "microarchitectures can then fuse these" actually pan out? Do the (size, mainly?) savings actually help in the contexts it's claimed to target (embedded?)? How do other contexts (server, desktop) feel about this? Is it useful for them? Perhaps it is actively harmful for what they want to do?
Not even necessarily fuse in that case: just cache the operands and result in internal registers and don't run the multiply again if the operands are the same. Same for DIV/REM.
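For illustration, the quotient/remainder pair this refers to (function name mine; note RISC-V defines division by zero but C does not, so this sketch assumes b != 0):

    /* DIV and REM of the same operands, back to back:
           div a4, a0, a1      # quotient
           rem a5, a0, a1      # remainder
       A divider that remembers its last operands and result can
       answer the second instruction without dividing again. */
    #include <stdint.h>

    void divmod(int64_t a, int64_t b, int64_t *q, int64_t *r) {
        *q = a / b;   /* assumes b != 0 */
        *r = a % b;
    }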
Designing for fusion is valid, but RISC-V has a lot of cases that boil down to "fuse a 12-byte, three-instruction sequence where other architectures do it in a single 4-byte instruction".
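The usual example of this (mine, not necessarily the parent's) is an indexed load from an array of 64-bit elements:

    /* a[i] on base rv64g (no Zba, no fusion): scale, add, load,
       i.e. three 4-byte instructions:
           slli t0, a1, 3      # index * 8
           add  t0, a0, t0     # base + offset
           ld   a0, 0(t0)      # load
       versus one 4-byte instruction on AArch64:
           ldr  x0, [x0, x1, lsl #3] */
    #include <stdint.h>

    uint64_t index_load(const uint64_t *a, uint64_t i) {
        return a[i];
    }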
The complaining is about the number of dynamic instructions executed ("path length"), which can hit you if you don't fuse. To be clear, that's a runtime count, not code size: a 3-instruction loop body run a million times adds three million to path length. Of course, path length might not actually be the bottleneck for raw performance, but it's an easy metric to argue about, so a lot of people latch on to it.
Dunno about "great" - "For 6 out of 10 mini-app+compiler pairs, Arm has a shorter path length, with the overall average difference when weighting each benchmark equally being 2.3% longer for RISC-V."
No, winning 4 and losing 6, by a small margin, isn't "being worse than Arm". The paper's authors even explicitly conclude it is not losing to Arm.
This is even ignoring whether code is inside or outside loops, counting fuseable instructions as always non-fused, and not considering any instructions from extensions beyond rv64g as ratified in 2019 (actually unchanged since 2017)... accounting for any of those would have a favorable effect on RISC-V.
This is an excellent result for RISC-V, one that clears any doubts in terms of path length. On top of what we already know about RISC-V leading in code density among 64-bit ISAs.
Might not be "worse" (I'd definitely agree that the difference is small enough to be considered equal within error bounds), but it's certainly not something worthy of RISC-V being noted as doing "great" either.
Excluding extensions is perhaps a significant question, but, for example, Debian RISC-V currently targets rv64gc, which should have the same dynamic instruction counts as rv64g (the C extension only shrinks encodings, it doesn't remove instructions), so software compiled for Debian can't use the later extensions for most code anyway. (Never mind that ARMv8 also had extensions excluded, namely NEON, which is always present on ARMv8 and is not designed to be ignored.)
And, of course, even being better than ARM is not equivalent to being the best it could be; ARMv8 isn't some attempt at a magically optimal instruction set, it's designed for whatever ARM needed, and that includes being able to efficiently share hardware with ARMv7 for backwards compatibility.
Also, the difference in number of instructions on real programs is in the 10% range, which could well be compensated for by other factors. For example, keeping to simpler instructions might well allow a 10% higher clock speed, and lower silicon area too, equalising matters if not gaining an advantage.
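For the back-of-envelope behind that claim: execution time is roughly

    time = (instructions executed x cycles per instruction) / clock frequency

so 1.10x the instructions at 1.10x the clock is a wash, assuming CPI holds steady (which simpler instructions don't guarantee, but that's the shape of the argument).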