There are quite a few popular TTS libraries for Arduino, but most of them suffer from the same problem: the TTS function is not properly separated from the output function.

Architecture

In a properly architected solution we would have

  • A TTS function which produces platform-independent PCM data
  • An output function or library to process the PCM data

In Arduino we have the abstract Print class, which is implemented by everything that you can output data to, and the abstract Stream class, which inherits from Print and adds the functionality to read data. Good examples of Streams are the File and the HardwareSerial classes (the latter is the class of the Serial object).

So a flexible and portable way to define the output of a TTS library is to hand it an instance of a class that implements Print, to which it writes the generated PCM samples. This supports e.g. the output to

  • Serial
  • Files
  • I2S (if your platform supports it)
  • and many more
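To make the separation concrete, here is a minimal sketch that compiles outside of Arduino. The Print class below is only a stand-in that mirrors the two write() methods of the real Arduino class; MemorySink and emitSamples() are hypothetical names invented for this illustration and are not part of any of the libraries mentioned here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for Arduino's Print so that this compiles off-device.
// On Arduino you would use the real Print class from the core instead.
class Print {
 public:
  virtual ~Print() = default;
  virtual size_t write(uint8_t b) = 0;
  virtual size_t write(const uint8_t* data, size_t len) {
    size_t n = 0;
    while (n < len && write(data[n]) == 1) ++n;
    return n;
  }
};

// Hypothetical sink that simply collects all bytes in memory.
class MemorySink : public Print {
 public:
  using Print::write;  // keep the buffer overload visible
  size_t write(uint8_t b) override {
    bytes.push_back(b);
    return 1;
  }
  std::vector<uint8_t> bytes;
};

// Hypothetical generator stage: it only knows about Print, not about
// the hardware. Each 16 bit mono sample is written once per channel.
// Returns the number of bytes written.
size_t emitSamples(Print& out, const int16_t* samples, size_t count,
                   int channels) {
  size_t written = 0;
  for (size_t i = 0; i < count; ++i) {
    for (int ch = 0; ch < channels; ++ch) {
      written += out.write(reinterpret_cast<const uint8_t*>(&samples[i]),
                           sizeof(int16_t));
    }
  }
  return written;
}
```

Because emitSamples() depends only on the Print interface, the same generator code could write to Serial, a File or an audio sink without any change.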

My AudioTools library is the perfect fit for the second part. You can e.g. output the audio

  • with the help of PWM
  • to the internal DAC
  • to an external DAC
  • to a Bluetooth Speaker
  • to Serial as CSV or hex data
  • to the network using different protocols
  • and many more

The Talkie Library

I wanted to extend the quite popular Talkie TTS library to provide PCM data, so that we can send it e.g. to a Bluetooth speaker.

Unfortunately I was looking at a big mess of #ifdefs all over the place that was impossible to untangle. So I decided to go back to the original Talkie library from going-digital: here there was at least a chance to understand what’s going on, because only one platform was supported and the code was quite well structured. Even so, it took me far too long to figure out how the timer callback and the sample generation work together.

After an embarrassingly long time, I finally managed to grok the inner workings, and after restructuring the code a bit I got my PCM generation working.

The generated audio has 16 bits per sample and a sampling rate of 8000 Hz, and you can define how many channels you want to generate. E.g. for I2S, which is a stereo protocol, you can just generate the data on 2 channels.
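As a quick sanity check of what this format means for the data rate, the arithmetic can be sketched as follows. PcmFormat and the helper functions are hypothetical names used only for this illustration; they are not part of TalkiePCM or AudioTools.

```cpp
#include <cstdint>

// Hypothetical description of a PCM format (illustration only).
struct PcmFormat {
  uint32_t sampleRate;     // frames per second, e.g. 8000
  uint16_t channels;       // 1 = mono, 2 = stereo
  uint16_t bitsPerSample;  // e.g. 16
};

// Bytes needed for one frame (one sample for every channel).
constexpr uint32_t bytesPerFrame(const PcmFormat& f) {
  return f.channels * (f.bitsPerSample / 8u);
}

// Bytes that the output has to consume per second.
constexpr uint32_t bytesPerSecond(const PcmFormat& f) {
  return f.sampleRate * bytesPerFrame(f);
}

// Bytes needed for a given duration in milliseconds.
constexpr uint32_t bytesForMillis(const PcmFormat& f, uint32_t ms) {
  return static_cast<uint32_t>(
      static_cast<uint64_t>(bytesPerSecond(f)) * ms / 1000u);
}
```

For 8000 Hz, 16 bits and 2 channels this gives 4 bytes per frame and 32000 bytes per second; with a single channel it is 16000 bytes per second.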

I decided to roll my own version of the library and called it TalkiePCM to avoid any naming conflicts with the existing libraries. I also added a CMakeLists.txt to make it usable outside of Arduino.

An Example Arduino Sketch

Here is a simple example sketch:

#include "AudioTools.h"
#include "AudioTools/AudioLibs/AudioBoardStream.h" 
#include "TalkiePCM.h" 
#include "Vocab_US_Large.h"

const AudioInfo info(8000, 2, 16);
AudioBoardStream out(AudioKitEs8388V1);  // Audio sink
//CsvOutput<int16_t> out(Serial); // output on screen
TalkiePCM voice(out, info.channels);

void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Info);
  // setup AudioKit
  auto cfg = out.defaultConfig();
  cfg.copyFrom(info);
  out.begin(cfg);

  Serial.println("Talking...");
}

void loop() {
  voice.say(sp2_DANGER);
  voice.say(sp2_DANGER);
  voice.say(sp2_RED);
  voice.say(sp2_ALERT);
  voice.say(sp2_MOTOR);
  voice.say(sp2_IS);
  voice.say(sp2_ON);
  voice.say(sp2_FIRE);
  voice.silence(1000);
}

1) We define the output object out as an AudioBoardStream that uses the AudioKitEs8388V1 driver; you can replace it with any other supported audio sink class (e.g. I2SStream).
2) We define a TalkiePCM object, passing it the output object above and the number of channels from info, so that it generates stereo audio.
3) In setup() we start the AudioBoardStream with the sample rate, channels and bits per sample defined in info.
4) In loop() we generate the PCM data with the help of the say() method, which automatically writes the rendered samples to the assigned AudioBoardStream.

Dependencies

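For this example, you need to have the following libraries installed:

  • AudioTools (arduino-audio-tools)
  • the audio driver library (arduino-audio-driver) that AudioBoardStream relies on
  • TalkiePCM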

Further Reading

I am providing quite a few other text-to-speech libraries that I have covered in the past.

