By the looks of it, it will take a couple more follow-up PRs to clean things up a bit and get the most performance out of MTP. I hope that by that point it will be easier to add more spec decoding types.
In the meantime, I've benchmarked Orthrus some more and got quite promising results. So I'd be glad if my prediction that it may take some time to land in llama.cpp turns out to be wrong.