Highest Voted 'audio-processing' Questions - Artificial Intelligence Stack Exchange

9

votes

1 answer

Is it possible to clean up an audio recording of a lecture using some type of AI system?

Is it possible to clean up an audio recording of a lecture from a smartphone (i.e. remove the background noise) using some type of AI system?

asked Dec 12 '18 at 15:15

Thibault Molleman


5

votes

1 answer

How can I find a specific word in an audio file?

I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio clip 2-3 seconds long, and the neural network must determine whether the input audio (the voice of a person)…

neural-networks machine-learning deep-learning python audio-processing

asked Aug 03 '20 at 09:28

Ali.kavari76


3

votes

1 answer

Can I filter barking sounds on the television?

My dog goes bonkers every time the sound of a barking dog is heard on a television program. I never noticed this before but literally every movie or show with an outdoors setting eventually includes the sound of a barking dog. Is it possible to…

audio-processing

asked Dec 21 '18 at 12:21

AlanD


3

votes

1 answer

What are the differences between RVC and SO-VITS-SVC models?

I'm trying to decide which one to use for my project but I can't find anywhere specific differences or comparisons of the models.

audio-processing

asked Jun 22 '23 at 13:16

Hiperfly


3

votes

2 answers

Can AI be used to reverse engineer a black box?

A while back I posted on the Reverse Engineering site about an audio DSP system whose designer had passed away and whose manufacturer no longer had source code (but the question was deleted). Basically, the audio filter settings are passed from a…

ai-design audio-processing signal-processing

asked Aug 12 '19 at 06:25

chmedly


2

votes

1 answer

Difficulty understanding Keras LSTM fitting data

I'm trying to train an RNN with a chunk of audio data, where X and Y are two audio channels loaded into numpy arrays. The objective is to experiment with different NN designs to train them to transform single-channel (mono) audio into a two channel…

keras long-short-term-memory audio-processing

asked Oct 12 '18 at 06:43

Dmitry


2

votes

0 answers

How to prepare audio data for deep learning?

Audio data is typically an array with the waveform represented by values from -1 to 1. There are two issues with that: if all values are inverted, e.g. -1 becomes 1 and 1 becomes -1, the audio doesn't change. But if for example I need to find…

data-preprocessing gradient audio-processing spectral-analysis

asked Feb 07 '23 at 14:10

Ford F150 Gaming

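The polarity observation in the question above (flipping the sign of every sample does not change how the audio sounds) can be checked numerically. The following is a minimal sketch, not taken from the question: the 440 Hz test tone, the 16 kHz sample rate, and all variable names are assumptions chosen for illustration. It shows that the magnitude spectrum of a waveform and of its inverted copy are identical, which is one reason spectral features are often preferred over raw samples.

```python
import numpy as np

# Assumed test signal: 1 second of a 440 Hz sine at 16 kHz, values within [-1, 1].
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
waveform = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Polarity inversion: -1 becomes 1 and 1 becomes -1.
inverted = -waveform

# Magnitude spectra of the original and the inverted signal.
mag_original = np.abs(np.fft.rfft(waveform))
mag_inverted = np.abs(np.fft.rfft(inverted))

# The raw samples differ, but the magnitude spectra match.
samples_differ = not np.allclose(waveform, inverted)
same_magnitude = np.allclose(mag_original, mag_inverted)

# Frequency (in Hz) of the strongest spectral component.
freqs = np.fft.rfftfreq(len(waveform), d=1 / sample_rate)
peak_hz = freqs[np.argmax(mag_original)]

print(samples_differ, same_magnitude, peak_hz)
```

Only the phase of each frequency component changes under inversion (by 180 degrees), so any representation built from magnitudes alone, such as a magnitude spectrogram, is unaffected by it.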

2

votes

0 answers

Understanding gumbel-softmax backpropagation in Wav2Vec papers

I'm studying the series of Wav2Vec papers, in particular, the vq-wav2vec and wav2vec 2.0, and have a problem understanding some details about the quantization procedure. The broader context is this: they use raw audio and first convert it to…

deep-learning papers audio-processing

asked Oct 19 '21 at 11:37

Peter Franek


2

votes

2 answers

Is it realistic to train a transformer-based model (e.g. GPT) in a self-supervised way directly on the Mel spectrogram?

In music information retrieval, one usually converts an audio signal into some kind of "sequence of frequency-vectors", such as an STFT or Mel-spectrogram. I'm wondering if it is a good idea to use the transformer architecture in a self-supervised manner…

transformer gpt audio-processing embeddings self-supervised-learning

asked May 24 '21 at 21:31

Peter Franek


2

votes

0 answers

Model for direct audio-to-audio speech re-encoding

There are many resources available for text-to-audio (or vice versa) synthesis, for example Google's 'Wavenet'. These tools do not allow the finer degree of control that may be required regarding the degree of inflections / tonality retained in…

audio-processing model-request speech-synthesis

asked May 19 '21 at 10:09

NeverWasMyRealName


2

votes

1 answer

I want to determine how similar a given song is to Queen's songs. Am I headed in the right direction?

I've asked this question before (@ Reddit) and people suggested CNNs on a mel spectrogram more than anything else. This is great. But I'm sort of stuck at: label some music data as "queen" and "not queen" and have this be the training set. Like,…

convolutional-neural-networks audio-processing

asked Apr 03 '21 at 05:24

Mike Johnson Jr


2

votes

1 answer

How to get more accuracy of the logistic regression model?

I am working on a Baby Crying Detection model using logistic regression. Out of $581$ audio clips, $222$ are of a baby crying. Each clip is $5$ seconds long. What I have done is convert each clip into numbers, and those numbers go into a .csv file. So…

regression audio-processing binary-classification logistic-regression

asked Mar 27 '21 at 17:54

Muhammad Waqar Anwar

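Using only the class counts quoted in the question above (581 clips, 222 of them crying), a useful reference point before tuning the model is the accuracy of a classifier that always predicts the majority class. The sketch below is plain arithmetic on those two numbers; the variable names are hypothetical and nothing here reflects the asker's actual features or model.

```python
# Class counts as stated in the question.
total_clips = 581
crying_clips = 222
not_crying_clips = total_clips - crying_clips

# Accuracy of always predicting the larger class ("not crying").
majority_baseline = max(crying_clips, not_crying_clips) / total_clips

# Share of positive (crying) examples, a quick measure of class imbalance.
positive_rate = crying_clips / total_clips

print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"positive class share: {positive_rate:.3f}")
```

A logistic regression model is only adding value if its held-out accuracy clearly exceeds this baseline; with an imbalance like this one, precision and recall on the crying class are usually more informative than accuracy alone.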

2

votes

0 answers

Is there an AI that can complete Deezer Spleeter work?

I have used Deezer Spleeter, but it produces echoes alongside the stems, so I wonder if there is already an AI that removes echo noise.

audio-processing

asked Feb 16 '20 at 15:56

Mohammed Mehdi TBER


2

votes

0 answers

How do I train a multiple-speaker model (speech synthesis) based on Tacotron 2 and espnet?

I'm new to speech synthesis and deep learning. Recently, I got a task as described below: I have a problem training a multi-speaker model, which should be created with Tacotron2. I was told I can get some ideas from espnet, which is an end-to-end…

deep-learning recurrent-neural-networks audio-processing speech-recognition speech-synthesis

asked Feb 06 '20 at 04:13

Envelo Lee


2

votes

0 answers

State of the art in voice recognition

In the media there's a lot of talk about face recognition, mainly with respect to identifying faces (= assigning to persons). Less attention is paid to the recognition of facially expressed emotions but there's a lot of research done into this…

computer-vision emotional-intelligence facial-recognition voice-recognition audio-processing

asked Jan 22 '20 at 15:11

Hans-Peter Stricker

