I am trying to build a multimodal architecture for speech emotion recognition (SER). For the audio branch, I need to extract features from the Emotion2Vec model. After reading the paper and going through the GitHub codebase, I could not work out exactly how to extract them. I'm quite new to this, so any help would be appreciated.

Here is the link to the paper: [1]

Here is the link to the codebase: [2]
