I love the idea and I would like to build something like this. But the few attempts I have made using Whisper locally have so far been underwhelming. Has anyone gotten results with small Whisper models that are good enough for a use case like this?
Yeah, I would definitely double-check your setup. At work we use Whisper to live-transcribe-and-translate all-hands meetings and it works exceptionally well.
+1 this. Whisper works insanely well. I've been using the medium model as it has yet to mistranscribe anything noticeable, and it's very lightweight. I even converted it to a Core ML model so it runs accelerated on Apple silicon. It doesn't run *that* much faster than before… but it ran really fast to begin with. For anyone tinkering, I've had much success with whisper.cpp.
I'd agree with your experience. I simply sit my phone (~200 dollar Motorola, cheap phone) in the centre of the room, split the voice file into chunks using voiceprints/IDs I get from a voice embedding model I trained, then feed the labelled chunks through Whisper, and get a nice transcript of everything said. I combine that with my handwritten notes (as an image, which I get a VLM to transcribe) and the agenda, and I get out really nice meeting minutes as a LaTeX document. Works a charm and has turned an hour or two of work per meeting into maybe 30 minutes (proofing what was written).
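For anyone curious, the chunk-labelling step in a pipeline like that might look roughly like this. This is just a sketch with made-up names: the segments would come from whatever diarization/voice-embedding model you use, and the `text` fields would be filled in by running each chunk through Whisper.

```python
# Hypothetical sketch: merge adjacent same-speaker segments from a
# diarization pass, then format the per-chunk transcriptions into a
# speaker-labelled transcript. Segment/speaker names are illustrative.
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str   # label from the voice-ID model
    start: float   # seconds
    end: float     # seconds
    text: str = "" # filled in by Whisper for that chunk

def merge_segments(segments, max_gap=1.0):
    """Merge consecutive segments by the same speaker when the
    silence between them is shorter than max_gap seconds."""
    merged = []
    for seg in segments:
        if (merged
                and merged[-1].speaker == seg.speaker
                and seg.start - merged[-1].end <= max_gap):
            merged[-1].end = seg.end
            merged[-1].text = (merged[-1].text + " " + seg.text).strip()
        else:
            merged.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return merged

def format_transcript(segments):
    """One '[speaker] text' line per merged chunk."""
    return "\n".join(f"[{s.speaker}] {s.text}" for s in segments)
```

The merged, labelled transcript (plus the transcribed notes and the agenda) is then what you'd hand to whatever writes up the minutes.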
Which model do you use? I use large usually, on a GPU. It's fast and works really well. Be aware though that it can only recognise one language at a time. It will autodetect if you don't specify one.
Of course the smaller models don't work nearly as well and they are often restricted to English. Large works great for me though it does require GPU hardware to be responsive enough, even with faster-whisper or insanely-fast-whisper.
Maybe I've just had a bad microphone.