The starting point for doing speech recognition on an Arduino-based board is TensorFlow Lite for Microcontrollers with the example sketch called micro_speech!
I presented a DRAFT proposal for a Micro Speech API in one of my last posts. In the meantime I have finally managed to adapt the MicroSpeech example from TensorFlow Lite to follow the philosophy of my Arduino Audio Tools Library. The example uses a TensorFlow model which can recognise the words ‘yes’ and ‘no’. The output stream class is TfLiteAudioStream. In the example I am using an AudioKit, but you can replace this with any type of microphone.
The Arduino Sketch
Here is the complete Arduino sketch. The audio source is the microphone of the AudioKit and the audio sink is the TfLiteAudioStream class, which classifies the audio into 4 different output labels.
#include "AudioTools.h"
#include "AudioLibs/AudioKit.h"
#include "AudioLibs/TfLiteAudioStream.h"
#include "model.h" // tensorflow model
AudioKitStream kit; // Audio source
TfLiteAudioStream tfl; // Audio sink
const char* kCategoryLabels[4] = {
"silence",
"unknown",
"yes",
"no",
};
StreamCopy copier(tfl, kit); // copy mic to tfl
int channels = 1;
int samples_per_second = 16000;
void respondToCommand(const char* found_command, uint8_t score,
bool is_new_command) {
if (is_new_command) {
char buffer[80];
sprintf(buffer, "Result: %s, score: %d, is_new: %s", found_command, score,
is_new_command ? "true" : "false");
Serial.println(buffer);
}
}
void setup() {
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Warning);
// setup Audiokit Microphone
auto cfg = kit.defaultConfig(RX_MODE);
cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
cfg.channels = channels;
cfg.sample_rate = samples_per_second;
cfg.use_apll = false;
cfg.auto_clear = true;
cfg.buffer_size = 512;
cfg.buffer_count = 16;
kit.begin(cfg);
// Setup tensorflow output
auto tcfg = tfl.defaultConfig();
tcfg.setCategories(kCategoryLabels);
tcfg.channels = channels;
tcfg.sample_rate = samples_per_second;
tcfg.kTensorArenaSize = 10 * 1024;
tcfg.respondToCommand = respondToCommand;
tcfg.model = g_model;
tfl.begin(tcfg);
}
void loop() { copier.copy(); }
The key information that needs to be provided as configuration for TensorFlow is:
- number of channels
- sample rate
- kTensorArenaSize
- a callback for handling the responses (respondToCommand)
- the model
- the labels
As in any other audio sketch, we just need to copy the data from the input to the output class.
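If you do not have an AudioKit board, only the input side needs to change. The following is a hedged fragment that swaps in the library's I2SStream with an external I2S microphone (e.g. an INMP441); the pin numbers are assumptions for an ESP32 and must be adapted to your wiring, and the TensorFlow setup from the sketch above stays unchanged.

I2SStream i2s;                // audio source: an I2S microphone instead of the AudioKit
StreamCopy copier(tfl, i2s);  // copy mic to tfl

void setupMicrophone() {
  auto cfg = i2s.defaultConfig(RX_MODE);
  cfg.channels = channels;
  cfg.sample_rate = samples_per_second;
  cfg.bits_per_sample = 16;
  // assumed ESP32 wiring - adjust to your board
  cfg.pin_bck = 14;
  cfg.pin_ws = 15;
  cfg.pin_data = 32;
  i2s.begin(cfg);
}

Call setupMicrophone() from setup() in place of the AudioKit configuration block.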
Overall Processing Logic
The TfLiteAudioStream uses the Fast Fourier Transform (FFT) to calculate an FFT result (of length kFeatureSliceSize) over slices of audio data (defined by kFeatureSliceStrideMs and kFeatureSliceDurationMs). This result is used to update a spectrogram (of size kFeatureSliceSize x kFeatureSliceCount). After 2 (kSlicesToProcess) new FFT results have been appended at the end, we let TensorFlow evaluate the updated spectrogram to calculate the classification result. These results are post-processed (in the TfLiteRecognizeCommands class) to make sure that the result is stable.
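To make this concrete, here is a conceptual sketch (not the library code) of how such a sliding spectrogram update works. The constant values are the typical micro_speech ones, and the fft() helper is a placeholder assumption standing in for the real feature extraction.

#include <cstdint>
#include <cstring>

const int kFeatureSliceSize = 40;   // FFT features per slice (typical micro_speech value)
const int kFeatureSliceCount = 49;  // slices kept in the spectrogram (typical micro_speech value)

int8_t spectrogram[kFeatureSliceCount * kFeatureSliceSize];

// placeholder: a real implementation computes kFeatureSliceSize
// frequency features from one slice of audio samples
void fft(const int16_t* audio_slice, int8_t* features) {
  memset(features, 0, kFeatureSliceSize);
}

void addSlice(const int16_t* audio_slice) {
  // shift the existing slices one position to the left ...
  memmove(spectrogram, spectrogram + kFeatureSliceSize,
          (kFeatureSliceCount - 1) * kFeatureSliceSize);
  // ... and append the newest FFT result at the end
  fft(audio_slice, spectrogram + (kFeatureSliceCount - 1) * kFeatureSliceSize);
}

After kSlicesToProcess new slices have been added this way, the complete spectrogram is handed to the TensorFlow interpreter as the model input.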
Dependencies
- Arduino Audio Tools
- tflite-micro-arduino-examples
- arduino-audiokit (optional, only needed if you use an AudioKit board)
Building the TensorFlow Model
Here is the relevant Jupyter notebook. I am also providing the necessary files to run it in Docker: just execute docker-compose up and connect to http://localhost:8888.