Hacker News

Has anyone successfully run this on a Mac? The installation instructions appear to assume an NVIDIA GPU (CUDA, FlashAttention), and I’m not sure whether it works with PyTorch’s Metal/MPS backend.
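A quick way to check whether your PyTorch build has a usable Metal/MPS backend at all: `torch.backends.mps.is_available()` is the real API; the import guard here is just so the snippet runs even on machines without PyTorch installed.

```python
# Probe for PyTorch's Metal (MPS) backend on macOS.
try:
    import torch
    # True on Apple-silicon Macs with a recent torch build and macOS version.
    has_mps = torch.backends.mps.is_available()
except ImportError:
    has_mps = None  # PyTorch isn't installed at all
print(has_mps)
```

Note that MPS support is a separate question from FlashAttention: even with MPS working, any CUDA-only dependency still needs its own fallback path.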




FWIW you can run the demo without FlashAttention using the --no-flash-attn command-line parameter; I do that since I'm on Windows and haven't gotten FlashAttention 2 to work.
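Worth noting why a flag like that can exist: FlashAttention is only a faster kernel, and mathematically it computes the same scaled dot-product attention as the naive formulation, so a fallback path just uses the standard version. A dependency-free sketch of that standard formulation for a single head (toy values, not the repo's actual code):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # FlashAttention computes the same result, just tiled so the full
    # score matrix is never materialized in memory.
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))
```

So disabling FlashAttention costs speed and memory, not correctness: the outputs match up to floating-point error.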

It seems to depend on FlashAttention, so the short answer is no. Hopefully someone does the work of porting the inference code over!


Thanks! Simon's example uses the custom voice model (creating a voice from instructions). But that comment led me eventually to this page, which shows how to use mlx-audio for custom voices:

https://huggingface.co/mlx-community/Qwen3-TTS-12Hz-0.6B-Bas...

  uv tool install --force git+https://github.com/Blaizzy/mlx-audio.git --prerelease=allow
    
  python -m mlx_audio.tts.generate --model mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16 --text "Hello, this is a test." --ref_audio path_to_audio.wav --ref_text "Transcript of the reference audio." --play

I recommend using Modal for renting the metal.



