I am experimenting with Whisper speech-to-text (specifically, I am using the whisper.cpp port right now) and am trying to optimize its performance, but I have discovered something odd: although I use exactly the same settings every time, varying only the number of threads, the result is not identical. In other words, the transcription varies depending on the number of threads.

Is this expected? If so, why does it happen?

d-b

1 Answer

Yes. See this comment from the discussion here:

https://github.com/openai/whisper/discussions/81

This happens when the model is unsure about the output (as judged by the compression_ratio_threshold and logprob_threshold settings). The most common failure mode is that it falls into a repeat loop, which typically trips the compression_ratio_threshold. The default setting tries temperatures 0, 0.2, 0.4, 0.6, 0.8, 1.0 until it gives up, at which point it is less likely to be in a repeat loop but is also less likely to be correct.
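The retry behaviour described above can be sketched in a few lines. This is a simplified illustration, not the actual implementation: `decode` is a hypothetical callable standing in for Whisper's decoder, returning a transcript and its average log-probability, while the temperature schedule and thresholds match the open-source library's documented defaults.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Ratio of raw size to zlib-compressed size; repeat loops compress very well."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def decode_with_fallback(decode, audio,
                         temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                         compression_ratio_threshold=2.4,
                         logprob_threshold=-1.0):
    """Retry decoding at increasing temperatures until the result looks sane.

    `decode` is a stand-in (not a real Whisper API) that takes the audio and a
    temperature and returns (text, avg_logprob).
    """
    result = None
    for t in temperatures:
        result = decode(audio, temperature=t)
        text, avg_logprob = result
        # A high compression ratio suggests a repeat loop; a low average
        # log-probability suggests the model is effectively guessing.
        if (compression_ratio(text) <= compression_ratio_threshold
                and avg_logprob >= logprob_threshold):
            return text, t
    # Every temperature failed the checks; give up and keep the last attempt.
    return result[0], temperatures[-1]
```

Because any temperature above 0 samples randomly, a run that falls into this fallback path can produce a different transcript each time, even with identical input and settings.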

I've also seen non-deterministic results on audio that was perfectly clear; more than that, it sometimes generates unrelated output. With the hosted Whisper API you cannot adjust these parameters, but if you are using the open-source models you can tune them a little to perform better.

MikeL