In a speech recognition system, I want to train a GMM with the Baum-Welch algorithm. To do this, the GMM's parameters first need initial values.
How do I do this initialization?
Do I need audio recordings of a specific sound (phone), from which I extract features to initialize the GMM, so that I can then continue training it on words/sentences with the Baum-Welch algorithm?
Or is it possible to somehow set the initialization without audio recordings of a specific sound? If so, how?
What is the usual way to initialize and train GMMs in speech recognition systems?