On a 1080 Ti (a 6-year-old GPU) I found Whisper large models to take around as much...

inciampati · on March 30, 2023

That's very similar to CPU-based performance with modern CPUs and parallelization! Frankly, with whisper.cpp, transcription tends to take a little less than the length of the audio for the "small" model, and much less for "base" and "tiny".

pantalaimon · on March 30, 2023

It doesn't even have to be that modern: my Ivy Bridge CPU already achieves faster-than-real-time performance. That makes me wonder whether the GPU-based solution has some startup cost, so that it would outperform the CPU only on longer clips.
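
To make the fixed-overhead idea in the comment above concrete, here is a minimal sketch. It assumes a simple linear model (a one-time cost plus a constant processing rate per second of audio), which is an assumption for illustration, not a measured property of Whisper or whisper.cpp. The function names and the numbers in the usage lines are hypothetical.

```python
def total_time(audio_s, startup_s, rtf):
    """Wall-clock seconds to transcribe `audio_s` seconds of audio.

    startup_s: assumed one-time cost (model load, device init).
    rtf: assumed real-time factor, i.e. processing seconds per audio second.
    """
    return startup_s + rtf * audio_s


def break_even_audio_s(cpu_startup_s, cpu_rtf, gpu_startup_s, gpu_rtf):
    """Clip length (seconds) above which the GPU finishes first.

    Returns None when the GPU's per-second rate is no better than the
    CPU's, in which case a longer clip never helps the GPU catch up.
    """
    if gpu_rtf >= cpu_rtf:
        return None
    extra_startup = gpu_startup_s - cpu_startup_s
    return max(0.0, extra_startup / (cpu_rtf - gpu_rtf))


# Made-up numbers: CPU runs at exactly real time with no startup cost,
# GPU runs at twice real time but pays 10 s up front.
clip = break_even_audio_s(cpu_startup_s=0.0, cpu_rtf=1.0,
                          gpu_startup_s=10.0, gpu_rtf=0.5)
print(clip)
```

Under this model, any clip shorter than the break-even length finishes sooner on the CPU, which would match the observation only if the GPU path really does carry a large fixed cost.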

selfhoster11 · on March 30, 2023

Try quantised models. They perform reasonably well, although you should probably benchmark them on your own hardware and audio if you want to do this properly.
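
For the benchmarking part, a rough sketch of a timing harness follows. It measures the real-time factor (processing time divided by audio length) for any callable you pass in. The `transcribe` argument is a hypothetical placeholder for whatever you are testing (a Python binding, a subprocess call to a CLI, and so on); nothing here is an actual Whisper or whisper.cpp API, and the audio duration is supplied by you rather than read from the file.

```python
import time


def real_time_factor(transcribe, audio_path, audio_seconds, runs=3):
    """Return the best-of-`runs` real-time factor for `transcribe`.

    A value below 1.0 means faster than real time.
    `transcribe` is a caller-supplied function taking a file path.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)
    # Use the fastest run to reduce noise from caching and other processes.
    return min(timings) / audio_seconds


if __name__ == "__main__":
    # Stand-in workload so the sketch runs on its own; replace with a real call.
    def fake_transcribe(path):
        time.sleep(0.01)

    rtf = real_time_factor(fake_transcribe, "sample.wav", audio_seconds=1.0)
    print(f"real-time factor: {rtf:.3f}")
```

Running the same harness against the full-precision and quantised variants of a model, on the same clip, gives a like-for-like speed comparison; accuracy would need a separate check against a reference transcript.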