Do you have a write up of the tech stack and setup? Or willing to give the gist here?
I'd like to make a private Qwen or similar for my kids to prompt with a button and voice control. It doesn't need vision... Although eventually that'd be very cool.
I also ran across an interesting robot toy demo today that had voice built in. it was whimsical and seemed like it was aimed towards primary education and kids. Someone here might know the name.
You can use Ollama or LM Studio, both in API mode, to return the responses. I believe they offer audio support, but I'm not entirely sure.
However, if you're looking for instruction following (like an agent), I've tried to implement my own agent and have lost faith. Even GPT-4.1 will regularly gaslight me that no, it definitely ran the tool call to add the event to my calendar, when it just didn't. I can't get any more adherence out of it.
I'd like to make a private Qwen or similar for my kids to prompt with a button and voice control. It doesn't need vision... Although eventually that'd be very cool.
Siri just sucks.
We might not be there yet...