Hacker News

I am running 70B models on an M2 Max with 96 GB of RAM and it works very well. As hardware evolves, this will become standard.


Out of curiosity, what degree of quantization are you applying to these 70B models?


Q4_K_S. While not as good as top commercial models like ChatGPT, they are still quite capable, and I like that there are also uncensored/abliterated models like Dolphin.
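A rough back-of-the-envelope sketch of why Q4_K_S makes a 70B model fit in 96 GB of unified memory (the ~4.5 bits-per-weight figure for Q4_K_S is an approximation, and this ignores KV-cache and runtime overhead):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size of a model in gigabytes."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

# FP16 weights: 70e9 params * 2 bytes = ~140 GB, far over 96 GB of RAM.
fp16_gb = model_size_gb(70, 16)

# Q4_K_S at roughly 4.5 bits/weight: ~39 GB, leaving headroom
# for the KV cache, the OS, and other processes.
q4ks_gb = model_size_gb(70, 4.5)

print(f"FP16:   {fp16_gb:.0f} GB")
print(f"Q4_K_S: {q4ks_gb:.0f} GB")
```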



