I tried using
Vosk's vosk-model-en-us-0.42-gigaspeech model for transcription. Last time I tried, I got nothing. This time I realized that I forgot to make sure the wav file was 16-bit, and I got stuff.
It certainly wasn't faster than Whisper. Forty minutes in, I killed it off because it still wasn't done. Then I noticed the Node API has an async method, and figured transcription could get a lot faster.
Unfortunately, it crashes right away if you try to transcribe more than one segment at a time. (If you do one segment at a time with the async method — which is pointless — it runs out of memory and gets killed.)
Issues like this make me realize that the Node bindings are slapped together:
I have been learning the hard way that the acceptWaveformAsync() method is a very dangerous beast, and calling free() in the middle of its processing is not the only issue with it. It cannot be simply used like typical Node.js single-threaded, asynchronous code style.
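The crash mode described above is overlapping in-flight calls. The generic Node defense is to serialize the calls yourself with a promise chain, so at most one is ever running. To be clear, this is my own sketch, not anything from Vosk's bindings, and per the memory issue above it would not have saved me here; it only addresses the concurrency crash.

```javascript
// Generic sketch: force async tasks to run strictly one at a time by
// chaining each new task onto the previous one. Useful when a native
// binding crashes if two calls are in flight simultaneously.
function makeSerializer() {
  let tail = Promise.resolve();
  return function serialize(task) {
    // Start this task only after everything already queued has settled.
    const result = tail.then(() => task());
    // Keep the chain alive even if a task rejects; each caller still
    // sees its own task's result or rejection via `result`.
    tail = result.catch(() => {});
    return result;
  };
}
```

Usage would look something like `const run = makeSerializer();` and then `run(() => rec.acceptWaveformAsync(segment))` per segment, which turns the concurrent calls back into a queue.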
I don't blame Vosk; they probably didn't know what was involved in making Node bindings.
So, I could try one of their smaller models via the sync method, but I wouldn't be surprised if I still hit some instability.