Quick way to extract a shopping list from audio

Question

I want to write code to extract a list of parts from an audio file. I'm not sure how to proceed, and I would like to hear if people have any suggestions.

The audio file contains a voice saying e.g. "We should get four 16'' wheels, each with a tyre, then 2 copper pipes 5' long and 2'' wide...", and the output list should look something like

4 XWB16
4 XWB16T
2 PCPR52

The first step would be to get a transcription of the audio file, following the ASR tutorial from Hugging Face. Then I could write a simple script that reads the transcript word by word and, every time it finds something that matches one of the items stored in a config file, adds an element to the list.
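To make the idea concrete, here is a rough sketch of the matching step only. It assumes the transcript is already available as a plain string (the ASR step is left out), and the names `PART_PATTERNS`, `NUMBER_WORDS`, `parse_quantity` and `extract_parts` are made up for illustration; the patterns stand in for whatever the config file would contain.

```python
import re

# Hypothetical lookup for spoken quantities; digits are handled separately.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Hypothetical config: part code -> regex describing how the item is spoken.
PART_PATTERNS = {
    "XWB16": r"16'' wheels?",
    "PCPR52": r"copper pipes? 5' long and 2'' wide",
}


def parse_quantity(token):
    """Return the integer for a digit string or a known number word, else None."""
    token = token.lower()
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def extract_parts(transcript):
    """Collect (quantity, part code) pairs for every configured pattern found."""
    parts = []
    for code, pattern in PART_PATTERNS.items():
        # Look for a single word or number directly before the description.
        for match in re.finditer(r"(\w+) " + pattern, transcript, flags=re.IGNORECASE):
            quantity = parse_quantity(match.group(1))
            if quantity is not None:
                parts.append((quantity, code))
    return parts


transcript = (
    "We should get four 16'' wheels, each with a tyre, "
    "then 2 copper pipes 5' long and 2'' wide"
)
for quantity, code in extract_parts(transcript):
    print(quantity, code)
```

On the example sentence this finds the wheels and the pipes, but it misses the tyres (XWB16T) entirely, because "each with a tyre" carries no explicit quantity of its own.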

However, this approach would require hand-coding every potential variation of each item, like "copper pipes" and "pipes made of copper", plus something to handle cases like the one in my example above ("4 wheels, and a tyre for each wheel").

Is there a way to just feed the part codes with each item's descriptions to a machine learning model, and receive the parts list as output?
