TTS with Pre-Recorded Audio: Building a Talking Clock

Last year I looked into Arduino-based TTS solutions and concluded that the available engines do not produce good-quality audio. I therefore recommended an approach based on recorded audio samples instead.

Today I took the opportunity to create the new arduino-simple-tts project, which is based on this approach. As a proof of concept I have implemented speaking numbers and announcing the time. All the relevant words have been recorded as MP3 files and are stored in program memory.

Arduino Sketch

Here is an example sketch that implements a speaking clock:

#include "SimpleTTS.h"
#include "AudioCodecs/CodecMP3Helix.h"
#include "AudioLibs/AudioKit.h"
#include "TimeInfo.h"

// Output
TimeToText ttt;
AudioKitStream i2s; // replace with an alternative audio sink if needed: AnalogAudioStream, I2SStream etc.
MP3DecoderHelix mp3;
AudioDictionary dictionary(ExampleAudioDictionaryValues);
TextToSpeech tts(ttt, i2s, mp3, dictionary);

// Determine Time
TimeInfo timeInfo;
const char* ssid = "SSID";
const char* password = "password";

void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Info);

  // setup i2s
  auto cfg = i2s.defaultConfig();
  cfg.sample_rate = 24000;
  cfg.channels = 1;
  i2s.begin(cfg);

  // We announce the time only every 5 minutes
  timeInfo.setEveryMinutes(5);
  // start WIFI and time
  timeInfo.begin(ssid, password);
  ttt.say(timeInfo.time());
}

void loop() {
  // speech output
  if (timeInfo.update()){
    ttt.say(timeInfo.time());
  }
}

The TimeToText class translates the time into words, which are the input to the TextToSpeech class that handles the audio. TextToSpeech is based on my Arduino Audio Tools library, so we need to feed it an output sink and an MP3 decoder. The audio samples are looked up with the help of the AudioDictionary. As part of the sketch I have implemented the TimeInfo class, which just retrieves the time from a time server and determines whether we need to announce a new time.
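To make the "time into words" step more concrete, here is a minimal sketch in plain C++. It is not the TimeToText implementation from the library: the function names `numberToWords` and `timeToWords`, the 24-hour wording and the word spellings are all assumptions chosen for illustration. The idea is simply that every word in the result corresponds to one recorded sample that a dictionary can look up.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of arduino-simple-tts):
// spells out a number in the range 0..59 as a list of words.
std::vector<std::string> numberToWords(int n) {
  static const char* const ones[] = {
      "zero",    "one",     "two",       "three",    "four",
      "five",    "six",     "seven",     "eight",    "nine",
      "ten",     "eleven",  "twelve",    "thirteen", "fourteen",
      "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
  static const char* const tens[] = {"", "", "twenty", "thirty", "forty", "fifty"};

  std::vector<std::string> words;
  if (n < 0 || n > 59) return words;  // out of range: nothing to say
  if (n < 20) {
    words.push_back(ones[n]);
  } else {
    words.push_back(tens[n / 10]);
    if (n % 10 != 0) words.push_back(ones[n % 10]);
  }
  return words;
}

// Hypothetical helper: turns a 24-hour time into the words to be spoken.
// Returns an empty list for an invalid time.
std::vector<std::string> timeToWords(int hour, int minute) {
  std::vector<std::string> words;
  if (hour < 0 || hour > 23 || minute < 0 || minute > 59) return words;

  words = numberToWords(hour);
  if (minute == 0) {
    words.push_back("o'clock");
  } else {
    if (minute < 10) words.push_back("oh");  // e.g. "fourteen oh five"
    for (const std::string& w : numberToWords(minute)) words.push_back(w);
  }
  return words;
}
```

With this sketch, `timeToWords(14, 5)` yields the words "fourteen", "oh", "five", so only a few dozen distinct recordings are needed to cover every time of day.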

The full source code is available on GitHub.

Memory Requirements

The sketch, including the audio data, uses only 37% of the program storage:

Sketch uses 1171698 bytes (37%) of program storage space. Maximum is 3145728 bytes.
Global variables use 48232 bytes (14%) of dynamic memory, leaving 279448 bytes for

I think this is quite impressive: we have plenty of headroom before we need to resort to storing the samples on an SD card.
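The headroom can be put into numbers with a little arithmetic on the figures reported above. The following is only an illustrative sketch: the `FlashUsage` struct and the helper names are made up for this example, and the average clip size passed to `clipsThatFit` is a value you would have to measure for your own recordings.

```cpp
#include <cstdint>

// Hypothetical helper type: program storage figures as reported by the build.
struct FlashUsage {
  uint32_t usedBytes;
  uint32_t capacityBytes;
};

// Bytes of program storage that are still free.
uint32_t freeBytes(const FlashUsage& u) {
  return u.capacityBytes - u.usedBytes;
}

// Used share of program storage in percent.
double usedPercent(const FlashUsage& u) {
  return 100.0 * u.usedBytes / u.capacityBytes;
}

// Rough estimate of how many additional clips of a given average size
// would still fit. avgClipBytes is an assumption supplied by the caller.
uint32_t clipsThatFit(const FlashUsage& u, uint32_t avgClipBytes) {
  if (avgClipBytes == 0) return 0;
  return freeBytes(u) / avgClipBytes;
}

// Figures from the build output of the speaking clock sketch.
const FlashUsage speakingClock = {1171698, 3145728};
```

For the build above, `freeBytes(speakingClock)` is 1974030 bytes, so close to 2 MB of program storage remain for further words.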

Next Steps

I see three things that could be improved:

There are some unnaturally long gaps between some numbers: we could filter them out.
We need some functionality that helps us to record the text input.
It would be cool to extend the example to support speech recognition, so that it could reply to the request: “what’s the time?”
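For the first point, one possible way to remove the gaps is to trim the silence at the start and end of each clip. The sketch below assumes that we work on 16-bit PCM samples (for example the recordings before they are encoded to MP3, or the decoder output) and that "silence" means an absolute amplitude at or below a threshold. The function name `trimSilence` and the threshold are hypothetical; this is not something the library provides today.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: removes leading and trailing samples whose absolute
// amplitude is at or below the threshold. Quiet samples in the middle of
// the clip are kept, so pauses inside a word are not touched.
std::vector<int16_t> trimSilence(const std::vector<int16_t>& pcm,
                                 int16_t threshold) {
  // Widen to int first so that -32768 does not overflow in abs().
  auto isLoud = [threshold](int16_t s) {
    return std::abs(static_cast<int>(s)) > threshold;
  };

  // Index of the first loud sample.
  std::size_t first = 0;
  while (first < pcm.size() && !isLoud(pcm[first])) ++first;

  // One past the index of the last loud sample.
  std::size_t last = pcm.size();
  while (last > first && !isLoud(pcm[last - 1])) --last;

  return std::vector<int16_t>(pcm.begin() + first, pcm.begin() + last);
}
```

A clip that contains only silence comes back empty. A threshold that is too high would cut off soft word endings, so the value needs to be tuned by ear against the actual recordings.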

Published by pschatzmann on 16. February 2022
