ESpeak and MBROLA?

MBROLA is a speech synthesizer based on the concatenation of diphones. It takes a list of phonemes as input, together with prosodic information (duration of phonemes and a piecewise linear description of pitch), and produces speech samples on 16 bits (linear), at the sampling frequency of the diphone database.
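To make that input format concrete, here is a minimal sketch that writes a phoneme list in MBROLA's `.pho` text layout: one phoneme per line, its duration in milliseconds, then optional pairs of (position in percent, pitch in Hz). The `Phoneme` class and the helper names are my own for illustration, the phoneme symbols are placeholders (valid symbols depend on the diphone database you load), and the 16000 Hz sample rate is an assumption, not a property of every voice.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    """One line of a .pho file (hypothetical helper type, not part of MBROLA)."""
    symbol: str        # phoneme name; "_" is silence, other names depend on the voice
    duration_ms: int   # how long the phoneme lasts
    # Piecewise linear pitch: (position in % of the phoneme, pitch in Hz)
    pitch_points: list[tuple[int, int]] = field(default_factory=list)


def to_pho(phonemes):
    """Render phonemes as .pho text: 'symbol duration [pos pitch]...' per line."""
    lines = []
    for p in phonemes:
        parts = [p.symbol, str(p.duration_ms)]
        for position, pitch_hz in p.pitch_points:
            parts += [str(position), str(pitch_hz)]
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


def pcm_bytes(phonemes, sample_rate_hz):
    """Size of the resulting 16-bit mono output, from the durations alone."""
    total_ms = sum(p.duration_ms for p in phonemes)
    return total_ms * sample_rate_hz * 2 // 1000  # 2 bytes per 16-bit sample


# Illustrative utterance; the symbols are placeholders for a real voice's set.
HELLO = [
    Phoneme("_", 100),
    Phoneme("h", 80),
    Phoneme("@", 60, [(0, 120)]),
    Phoneme("l", 70),
    Phoneme("@U", 200, [(50, 130), (100, 100)]),
    Phoneme("_", 100),
]

if __name__ == "__main__":
    print(to_pho(HELLO), end="")
    # Assumed sample rate of 16000 Hz; the real value comes from the voice database.
    print(pcm_bytes(HELLO, 16000), "bytes of PCM")
```

The generated text is what you would feed to the synthesizer together with a voice database. The output audio itself is small; it is the database that is large.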

Today I was playing around with MBROLA and was impressed by the quality of the speech. ESpeak provides an integration with the MBROLA speech synthesizer, so I considered adding this integration to my library as well.

But then I realised that the corresponding voices each need roughly 5 to 20 MB of memory. Unfortunately this is much more than we have available on any microcontroller!

Bad luck…

Published by pschatzmann on 19. November 2022

