
On a 1080 Ti (a six-year-old GPU), I found that the Whisper large models take roughly as long to transcribe audio as the audio itself lasts, i.e. a realtime factor of about 1.
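For anyone who wants to reproduce that kind of measurement, here's a rough Python sketch using the openai-whisper package (the file name is a placeholder; adjust model and device to taste):

    import time
    import whisper  # openai-whisper package

    AUDIO = "clip.wav"  # placeholder test file

    model = whisper.load_model("large", device="cuda")

    start = time.perf_counter()
    result = model.transcribe(AUDIO)
    elapsed = time.perf_counter() - start

    # Approximate the audio duration from the last decoded segment's end time.
    duration = result["segments"][-1]["end"]
    print(f"{duration:.0f}s of audio in {elapsed:.0f}s "
          f"(realtime factor {elapsed / duration:.2f})")

A realtime factor around 1.0 matches what I'm seeing on the 1080 Ti.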


That's very similar to what modern CPUs achieve with parallelization! With whisper.cpp, the "small" model tends to run slightly faster than realtime, and "base" and "tiny" run much faster.
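If you want to time whisper.cpp itself, a wrapper like this works; the binary and model paths are placeholders (newer builds name the example binary whisper-cli rather than main):

    import subprocess
    import time

    WHISPER_BIN = "./main"            # whisper.cpp example binary (path is a guess)
    MODEL = "models/ggml-small.bin"   # a downloaded ggml model
    AUDIO = "clip.wav"                # 16 kHz mono WAV
    THREADS = "8"                     # roughly your physical core count

    start = time.perf_counter()
    subprocess.run([WHISPER_BIN, "-m", MODEL, "-f", AUDIO, "-t", THREADS],
                   check=True)
    print(f"wall time: {time.perf_counter() - start:.1f}s")

Compare the wall time against the clip length to get the realtime factor.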


It doesn't even have to be that modern; my Ivy Bridge CPU already achieves faster-than-realtime performance. That makes me wonder whether the GPU-based solution has some startup cost, so that it would only outperform the CPU on longer clips.
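One way to test the startup-cost hypothesis: time the model load and the transcription separately on each device. A minimal sketch with openai-whisper (the file name is a placeholder):

    import time
    import whisper

    AUDIO = "clip.wav"  # placeholder clip

    for device in ("cpu", "cuda"):
        t0 = time.perf_counter()
        model = whisper.load_model("small", device=device)
        t1 = time.perf_counter()
        model.transcribe(AUDIO)
        t2 = time.perf_counter()
        print(f"{device}: load {t1 - t0:.1f}s, transcribe {t2 - t1:.1f}s")

If the GPU's load time dominates on short clips while its per-second transcription time is lower, the crossover should show up as you feed in longer audio.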


Try quantised models. They perform reasonably well, though you'll want to run your own benchmarks to verify both speed and accuracy on your hardware.
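For example, with the faster-whisper package (CTranslate2 backend) you can compare an int8-quantised model against float32 on the same clip; the file name and model size here are just placeholders:

    import time
    from faster_whisper import WhisperModel

    AUDIO = "clip.wav"  # placeholder clip

    for compute_type in ("float32", "int8"):
        model = WhisperModel("small", device="cpu", compute_type=compute_type)
        start = time.perf_counter()
        segments, info = model.transcribe(AUDIO)
        text = " ".join(s.text for s in segments)  # segments are lazy; force the decode
        elapsed = time.perf_counter() - start
        print(f"{compute_type}: {elapsed:.1f}s for {info.duration:.0f}s of audio")

Check the transcripts too, not just the timings, since quantisation can cost a little accuracy.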



