I am implementing an HMM-GMM speech recognition model.
Right now I am stuck on the problem described below.
Given phone-level HMMs A and B, build a word-level HMM C. In this question, let's assume that, according to the lexicon file, I need to build C from A and B, where A is followed by B. Is building word models this way common practice?
States of HMM A: a1, a2, a3
States of HMM B: b1, b2, b3
Let the transition matrices for A and B be as follows:

As far as I understand, C consists of the states of A followed by the states of B.
So the states of HMM C are: a1, a2, a3, b1, b2, b3.
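The state list of C can be sketched as a simple lookup: take the word's pronunciation from the lexicon and append each phone model's states in order. The lexicon entry `WORD_C`, the dictionaries `LEXICON` and `PHONE_STATES`, and the function `word_states` are hypothetical names for illustration only, not part of any toolkit.

```python
from typing import Dict, List

# Hypothetical lexicon: word -> sequence of phones (A is followed by B).
LEXICON: Dict[str, List[str]] = {"WORD_C": ["A", "B"]}

# Hypothetical phone inventory: phone -> its emitting states.
PHONE_STATES: Dict[str, List[str]] = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
}


def word_states(word: str) -> List[str]:
    """Concatenate the emitting states of each phone in lexicon order."""
    states: List[str] = []
    for phone in LEXICON[word]:
        states.extend(PHONE_STATES[phone])
    return states


print(word_states("WORD_C"))  # ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
```

The word model only references the phone states, so a1..b3 keep the GMMs of their phone models; the open question is how to connect them, which is what the transition matrix has to encode.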
But what about the transition matrix?
But this doesn't seem like a legit solution.
What is the algorithm for concatenating such matrices? Or perhaps I am missing something. A link to a good article would be highly appreciated.
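One way to concatenate the matrices is sketched below. It assumes an HTK-like layout where every phone HMM has a non-emitting entry state (index 0) and a non-emitting exit state (last index), so a 3-state phone has a 5x5 matrix. To glue A to B, the probability of A's states moving to A's exit is redistributed over B's entry distribution. The A-to-B block of C is then the outer product of A's exit column and B's entry row, rather than zeros as in a plain block-diagonal matrix. The function name `concat_hmms` and the numbers in the usage part are made up, since the actual matrices from the question are not shown.

```python
from typing import List

Matrix = List[List[float]]


def concat_hmms(a: Matrix, b: Matrix) -> Matrix:
    """Concatenate two HMM transition matrices so that A is followed by B.

    Assumed layout (HTK-like): index 0 is a non-emitting entry state, the
    last index is a non-emitting exit state, everything in between emits.
    """
    na = len(a) - 2            # emitting states in A
    nb = len(b) - 2            # emitting states in B
    a_exit, b_exit = na + 1, nb + 1
    size = na + nb + 2         # entry + A's states + B's states + exit
    c_exit = size - 1
    c = [[0.0] * size for _ in range(size)]

    # Rows 0..na of C: C's entry state (row 0) and A's emitting states.
    for i in range(na + 1):
        # Transitions that stay inside A keep their probability.
        for j in range(1, na + 1):
            c[i][j] = a[i][j]
        # Mass that would leave A is spread over B's entry distribution.
        leave = a[i][a_exit]
        for k in range(1, nb + 1):
            c[i][na + k] = leave * b[0][k]
        # If B has a tee (entry -> exit) transition, C can be left directly.
        c[i][c_exit] = leave * b[0][b_exit]

    # Rows for B's emitting states: B's block shifted down and right by na.
    for i in range(1, nb + 1):
        for j in range(1, nb + 2):   # B's emitting states and B's exit
            c[na + i][na + j] = b[i][j]
    return c


# Hypothetical 3-state left-to-right phone models (entry, s1, s2, s3, exit).
a = [[0, 1.0, 0, 0, 0],
     [0, 0.6, 0.4, 0, 0],
     [0, 0, 0.7, 0.3, 0],
     [0, 0, 0, 0.8, 0.2],
     [0, 0, 0, 0, 0]]
b = [[0, 1.0, 0, 0, 0],
     [0, 0.5, 0.5, 0, 0],
     [0, 0, 0.9, 0.1, 0],
     [0, 0, 0, 0.6, 0.4],
     [0, 0, 0, 0, 0]]

c = concat_hmms(a, b)    # 8x8: entry, a1..a3, b1..b3, exit
print(c[3][4])           # a3 -> b1: 0.2
```

Every row except the exit row still sums to 1, because the mass A's states sent to A's exit is now split over B's entry row, which itself sums to 1. The same function can be applied repeatedly to chain more than two phones.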
