I don't think flops/$ is enough to really capture the difference here.
You couldn't replicate the scale of compute this allows no matter how many V100s you had.
A huge amount of the cost here is embodied in the networking and memory.
If you designed a chip that cared only about flops/$, ignoring all of the interconnect and memory, then the 4090 would be a much fairer comparison, and even that card isn't designed purely to optimise flops/$.