An Introduction to Speech Recognition with Arduino

The starting point for doing speech recognition on an Arduino-based board is TensorFlow Lite for Microcontrollers with the example sketch called micro_speech!

There are quite a few alternative Arduino libraries for TensorFlow Lite, but honestly the situation is rather dire, and I could not find any version that meets my requirements:

support for multiple microcontrollers (ESP32, RP2040 and Arduino Mbed-based boards, e.g. the Nano BLE Sense) without compile errors
no further dependencies

It took me quite some time to find the official repository on GitHub, which is called tflite-micro-arduino-examples, but the only board that seems to work out of the box is the Nano BLE Sense.

So I cloned the repository to make some corrections for the RP2040 and ESP32. The updated repository can be found here!

Corrected Issues:

std::fmax and std::fmin do not exist on ESP32
TF_LITE_REMOVE_VIRTUAL_DELETE is not public for GreedyMemoryPlanner and MicroErrorReporter on ESP32
RingBufferN is not available for all microcontrollers
Missing implementation for DebugLog, SerialReadLine and SerialWrite for all processors except ARDUINO_ARDUINO_NANO33BLE
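
To give an idea of what such corrections can look like, here is a minimal sketch for the first and third item. It is not the code from the repository: the names portable_fmax, portable_fmin and SimpleRingBuffer are made up for this illustration, and the ring buffer is only loosely modeled on the idea behind RingBufferN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for std::fmax / std::fmin on toolchains that lack them.
// Like the standard functions, they return the other argument if one is NaN.
template <typename T>
inline T portable_fmax(T a, T b) {
  if (a != a) return b;  // a is NaN
  if (b != b) return a;  // b is NaN
  return a > b ? a : b;
}

template <typename T>
inline T portable_fmin(T a, T b) {
  if (a != a) return b;  // a is NaN
  if (b != b) return a;  // b is NaN
  return a < b ? a : b;
}

// Hypothetical fixed-capacity byte ring buffer (assumption: not the Arduino class).
template <size_t N>
class SimpleRingBuffer {
 public:
  static_assert(N > 0, "capacity must be positive");

  // Stores one byte; returns false and drops the byte when the buffer is full.
  bool store(uint8_t c) {
    assert(count_ <= N);  // invariant: never more than N bytes stored
    if (count_ == N) return false;
    data_[head_] = c;
    head_ = (head_ + 1) % N;
    ++count_;
    return true;
  }

  // Returns the oldest byte, or -1 when the buffer is empty.
  int read() {
    if (count_ == 0) return -1;
    uint8_t c = data_[tail_];
    tail_ = (tail_ + 1) % N;
    --count_;
    return c;
  }

  size_t available() const { return count_; }
  size_t availableForStore() const { return N - count_; }
  bool isFull() const { return count_ == N; }
  void clear() { head_ = tail_ = count_ = 0; }

 private:
  uint8_t data_[N] = {};
  size_t head_ = 0;   // next write position
  size_t tail_ = 0;   // next read position
  size_t count_ = 0;  // number of bytes currently stored
};
```

Since neither helper depends on anything outside the C++ core language, the same code should compile on any of the target boards, which is the whole point of the exercise.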

The next challenge was to understand the example, and well, it is also quite confusing. Please note that the provided examples still do not compile because the audio provider has not been implemented. And honestly, I think it is not worth the effort!

Instead, I decided to go for an alternative design that follows the Arduino style of doing things. In my vision, it should be very easy to implement a new micro-speech sketch, so a sketch might look as follows:

#include "AudioTools.h"
#include "AudioLibs/AudioKit.h"
#include "AudioLibs/TfLiteAudioOutput.h"
#include "model.h" // tensorflow model

AudioKitStream kit; //Audio source
TfLiteAudioFeatureProvider fp;
TfLiteAudioOutput<4> tfl; // Audio sink
const char* kCategoryLabels[4] = {
    "silence",
    "unknown",
    "yes",
    "no",
};
StreamCopy copier(tfl, kit); // copy mic to tfl
int channels = 2;
int samples_per_second = 16000;

void setup() {
    Serial.begin(115200);
    AudioLogger::instance().begin(Serial, AudioLogger::Warning);

    // setup Audiokit
    auto cfg = kit.defaultConfig(RX_MODE);
    cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
    cfg.channels = channels;
    cfg.sample_rate = samples_per_second;
    kit.begin(cfg);

    // Setup tensorflow 
    fp.kAudioChannels = channels;
    fp.kAudioSampleFrequency = samples_per_second;
    tfl.begin(g_model, fp, kCategoryLabels, 10 * 1024);
}

void loop() {
    copier.copy();
}

In a nutshell, the confusing logic spread over 17 different files is gone; what remains is a single, simple Arduino sketch with just a few lines of code. The TfLiteAudioOutput would implement the Arduino Print interface, so that audio data can be written to it from any audio source, and the begin method receives all the information that is needed for the processing!
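
The idea of an audio sink that is fed through write() calls can be sketched in plain C++ as follows. This is only an illustration under my own assumptions and not the TfLiteAudioOutput implementation: ByteSink is a stand-in for the write() part of Arduino's Print class, and WindowedAudioSink with its callback is a hypothetical name. It assumes little-endian 16-bit PCM and hands each full window of samples to a callback, which is where feature extraction and inference would start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal stand-in for the write() part of Arduino's Print interface,
// so that this sketch compiles without the Arduino core.
class ByteSink {
 public:
  virtual ~ByteSink() = default;
  virtual size_t write(const uint8_t* data, size_t len) = 0;
};

// Hypothetical sink: reassembles little-endian 16-bit PCM samples from a byte
// stream and passes every full window of kWindow samples to a callback.
template <size_t kWindow>
class WindowedAudioSink : public ByteSink {
 public:
  using WindowCallback = void (*)(const int16_t* samples, size_t count,
                                  void* context);

  // Registers the callback and resets the internal state.
  void begin(WindowCallback callback, void* context = nullptr) {
    assert(callback != nullptr);
    callback_ = callback;
    context_ = context;
    filled_ = 0;
    has_low_byte_ = false;
  }

  // Consumes all bytes; an odd trailing byte is kept until the next call.
  size_t write(const uint8_t* data, size_t len) override {
    for (size_t i = 0; i < len; ++i) {
      if (!has_low_byte_) {
        low_byte_ = data[i];
        has_low_byte_ = true;
        continue;
      }
      // Combine low and high byte into one signed sample.
      uint16_t raw = static_cast<uint16_t>(
          low_byte_ | (static_cast<uint16_t>(data[i]) << 8));
      window_[filled_++] = static_cast<int16_t>(raw);
      has_low_byte_ = false;
      if (filled_ == kWindow) {
        if (callback_ != nullptr) callback_(window_, kWindow, context_);
        filled_ = 0;
      }
    }
    return len;
  }

 private:
  int16_t window_[kWindow] = {};
  size_t filled_ = 0;
  uint8_t low_byte_ = 0;
  bool has_low_byte_ = false;
  WindowCallback callback_ = nullptr;
  void* context_ = nullptr;
};
```

Because the sink only exposes write(), any source that produces bytes can feed it, which is what makes a generic copy from the microphone to the recognizer possible.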

Stay tuned…

Published by pschatzmann on 27. February 2022
