In his article *word2vec Parameter Learning Explained*, Xin Rong writes (page 7):
Each output is computed using the same hidden&rarr;output matrix: $$ p(w_{c,j} = w_{O,c}|w_I)=y_{c,j}=\frac{\exp(u_{c,j})}{\sum_{j'=1}^{V}\exp(u_{j'})} \ \ \ \ (25) $$
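To make my reading of equation (25) concrete, here is a small numpy sketch of how I understand the computation, assuming a shared hidden&rarr;output matrix (the sizes `V`, `N`, `C` are made up for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical sizes: vocabulary, hidden layer, number of context "panels".
V, N, C = 10, 4, 3
rng = np.random.default_rng(0)

W_prime = rng.standard_normal((N, V))  # the single shared hidden->output matrix
h = rng.standard_normal(N)             # hidden layer = input word's vector

u = h @ W_prime                        # scores u_j; identical for every panel c
y = np.exp(u) / np.exp(u).sum()        # softmax over the vocabulary: y_{c,j}

# Since every panel uses the same W_prime, the distribution y is the
# same for all C panels:
Y = np.tile(y, (C, 1))                 # C identical rows of probabilities
```

If this sketch is right, then `Y[0]`, `Y[1]`, … are all equal, which is exactly what puzzles me about the "panels" wording.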
Looking into the word2vec source code, I don't see any "panels" or "output layers". Because of this, most of the terms in the equation above are totally unclear to me. Could you please help me understand how this mathematical description is intended to work? (I understand the source code, but the mathematical description is another matter.)
Am I missing something here? Given that, according to the article, "the output layer panels share the same weights", how could the results on the output panels differ, even if such panels existed?