Questions tagged [speech-synthesis]

For questions related to the synthesis of speech, not to be confused with synthesizing text, formal-language expressions, or expressions in context-free grammars. Speech in this context is a sequence of audio samples, a sequence of spectral representations in the frequency domain, or a sequence of phonetic symbols representing natural speech.

17 questions
4
votes
3 answers

Open-source vocal cloning (speech-to-speech neural style transfer)

I want to program and train a voice cloner, in part to learn about this area of AI, and in part to use as a prototype of audio for testing and getting feedback from early adopters before recording in a studio with voice actors. For the prototype, I…
3
votes
0 answers

Can computers recognise "grouping" from voice tonality?

In human communication, tonality or tonal language carries much complex information, including emotions and motives. But setting such complex aspects aside, tonality also serves a very basic "grouping" or "taking common" purpose, such as: The…
user27217
3
votes
2 answers

What is the difference between automatic transcription and automatic speech recognition?

What is the difference between automatic transcription and automatic speech recognition? Are they the same? Is my following interpretation correct? Automatic transcription: it converts the speech to text by looking at the whole spoken input…
2
votes
1 answer

How to measure the similarity of the pronunciation of two words?

I would like to know how I could measure the similarity of the pronunciation of two words. These two words are quite similar and differ only in one vowel. I know there are, e.g., the Hamming distance and the Levenshtein distance, but they measure the "general"…
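One common approach to this kind of question is to apply edit distance to phoneme sequences rather than spellings, so that a one-vowel difference counts as a single substitution. Below is a minimal sketch of the Levenshtein distance; the ARPAbet transcriptions for "bit" and "bet" are illustrative examples, not part of the original question.

```python
def levenshtein(a, b):
    """Edit distance between two sequences: minimum number of
    insertions, deletions, and substitutions (each cost 1)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # distance from a[:0] to each prefix of b
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # deletion
                        dp[j - 1] + 1,                  # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution / match
            prev = cur
    return dp[n]

# Comparing phoneme sequences (ARPAbet) instead of letter strings:
bit = ["B", "IH", "T"]
bet = ["B", "EH", "T"]
print(levenshtein(bit, bet))  # 1: the words differ in a single vowel
```

Because the function accepts any sequences, the same code works on raw strings (`levenshtein("kitten", "sitting")` gives 3); weighting substitutions by phonetic similarity of the phonemes would be the natural next refinement.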
2
votes
0 answers

Model for direct audio-to-audio speech re-encoding

There are many resources available for text-to-audio (or vice versa) synthesis, for example Google's WaveNet. These tools do not allow the finer degree of control that may be required over the degree of inflection / tonality retained in…
2
votes
0 answers

How do I train a multiple-speaker model (speech synthesis) based on Tacotron 2 and espnet?

I'm new to speech synthesis and deep learning. Recently, I was given the task described below: I have a problem training a multi-speaker model, which should be built with Tacotron 2. I was told I could get some ideas from espnet, which is an end-to-end…
2
votes
0 answers

What is the State-of-the-Art open source Voice Cloning tool right now?

I would like to clone a voice as precisely as possible. Lately, impressive models have been released that only need about 10 s of voice input (cf. https://github.com/CorentinJ/Real-Time-Voice-Cloning), but I would like to go beyond that and clone a…
Remind
  • 21
  • 1
1
vote
0 answers

How to achieve Voice Conversion Using Voice Samples of a Specific Person using any voice as input?

I'm working on a project involving voice conversion, aiming to transform a voice to sound like a specific person speaking Darija (a Moroccan Arabic dialect). I have collected a set of voice samples from the target person and prepared them in a…
1
vote
4 answers

What is the best Text-to-speech model available open-source?

I tried a couple of different websites and libraries. Also found this topic from 3.5 years ago - What are the current open source text-to-audio libraries? It looks like nobody published anything in the last couple of years and most solutions are…
1
vote
0 answers

Is speech-to-speech conversion, changing the voice to a given target voice, possible?

Background: I am working on a research project to use (demonstrate) the possibilities of Machine Learning and AI in artistic projects. One thing we are exploring is demonstrating deep fakes on stage. Of course, a deep fake is not easy to make.…
1
vote
0 answers

How many spectrogram frames per input character does text-to-speech (TTS) system Tacotron-2 generate?

I've been reading about Tacotron-2, a text-to-speech system that generates speech nearly indistinguishable from a human's, using the GitHub repository https://github.com/Rayhane-mamah/Tacotron-2. I'm very confused about a simple aspect of text-to-speech…
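Tacotron-2 does not emit a fixed number of mel-spectrogram frames per character; its attention mechanism decides when to advance and when to stop. A rough average can still be estimated from typical hyperparameters. The sample rate, hop length, and speaking rate below are common assumed values, not figures from the question.

```python
# Back-of-the-envelope estimate of mel frames per input character.
sample_rate = 22050   # Hz (common Tacotron-2 setting; assumption)
hop_length = 256      # samples between successive mel frames (assumption)

frames_per_second = sample_rate / hop_length   # about 86 frames of audio per second
chars_per_second = 15                          # rough English speaking rate (assumption)

avg_frames_per_char = frames_per_second / chars_per_second
print(round(avg_frames_per_char, 1))  # roughly 5-6 frames per character on average
```

The actual ratio varies per character (a long vowel gets many more frames than a silent letter), which is exactly what the learned attention alignment captures.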
1
vote
0 answers

Can't figure out what's going wrong with my dataset construction for multivariate regression

TL;DR: I can't figure out why my neural network won't give me a sensible output. I assume it's something to do with how I'm presenting the input data to it, but I have no idea how to fix it. Background: I am using matched pairs of speech samples to…
1
vote
0 answers

Improving the performance of a DNN model

I have been running the open-source text-to-speech system Ossian. It uses feed-forward DNNs for its acoustic modeling. The error graph I got after training the acoustic model looks like this: Here is some relevant information: Size of data: 7…
Arif Ahmad
  • 111
  • 1
0
votes
1 answer

Adding voices to voice synthesis corpuses

If one uses one of the open-source implementations of the WaveNet generative speech synthesis design, such as https://r9y9.github.io/wavenet_vocoder/, and trains using something like CMU's ARCTIC corpus, how can one add a voice that sounds…
Douglas Daseeco
  • 7,543
  • 1
  • 28
  • 63
0
votes
0 answers

How to resize the time-frequency spectrum of a 1D signal so that image classification models can be used?

1D signals generally have on the order of thousands of sample points per trial, especially biomedical signals. The time-frequency spectrum can then have a shape like [256, 1000] or [50, 1000], etc. However, most popular image classification…
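A common way to feed such a spectrum to a fixed-input image classifier is to resample the 2-D array to the expected input size. The sketch below uses plain nearest-neighbor index mapping in NumPy; the shapes (50×1000 in, 224×224 out) are illustrative assumptions, and in practice a library resize with bilinear interpolation would usually be preferred.

```python
import numpy as np

def resize_spectrogram(spec, out_shape):
    """Nearest-neighbor resize of a 2-D time-frequency array to out_shape."""
    rows = np.arange(out_shape[0]) * spec.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * spec.shape[1] // out_shape[1]
    return spec[np.ix_(rows, cols)]

spec = np.random.rand(50, 1000)              # 50 frequency bins x 1000 time steps
img = resize_spectrogram(spec, (224, 224))   # typical CNN input size
print(img.shape)  # (224, 224)
```

Note that stretching 50 frequency bins to 224 rows duplicates information rather than creating it; an alternative is to keep the native resolution and use a model with global pooling, which tolerates variable input sizes.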