I wish they still had the voice mode that was _only_ text-to-speech and speech-to-text. It didn't sound as good, but it was as smart as the underlying model. The advanced voice mode regularly goes off the rails for me, makes the same mistake repeatedly, and does other things that the text versions of advanced LLMs haven't done for months now.
It comes with LLaMA by default, but you can point it to any compatible endpoint, including self-hosted (I personally use Gemini Flash via an AI Studio API key because it's free).
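For anyone curious, here's roughly what that swap looks like: a minimal sketch assuming the app talks to any OpenAI-compatible endpoint. The base URL and model name come from Google's OpenAI-compatibility layer for the Gemini API; the key placeholder is whatever you generate in AI Studio.

```python
# Minimal sketch: pointing an OpenAI-compatible client at Gemini Flash
# using an AI Studio API key. Any app that lets you override the base
# URL and model name should work the same way.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_AI_STUDIO_KEY",  # hypothetical placeholder; generate one in AI Studio
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

resp = client.chat.completions.create(
    model="gemini-2.0-flash",  # any Gemini model available to your key
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```

Self-hosted works the same way: point `base_url` at your local server (llama.cpp, Ollama, vLLM, etc., all expose the same API shape) and pass whatever model name it serves.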
I echo your comments about advanced voice mode. It's like a completely different, less "intelligent" model than the text-mode ones. It behaves as if it has an incredibly short context window or something, and it does a lousy job of following your prompt.
As with all things LLM… everybody’s experience will be different. I’m sure there are plenty of people who manage to make it work.