The starting point for doing speech recognition on an Arduino-based board is TensorFlow Lite for Microcontrollers with its example sketch called micro_speech!
There are quite a few alternative Arduino libraries for TensorFlow Lite, but honestly the situation is rather dire, and I could not find any version that would fit my requirements:
- support for multiple microcontrollers (ESP32, RP2040 and Arduino-based MBED implementations, e.g. the Nano BLE Sense) without compile errors
- no further dependencies
It took me quite some time to find the official repository on GitHub, which is called tflite-micro-arduino-examples, but the only processor that seems to work with it out of the box is the Nano BLE Sense.
So I cloned the repository to make some corrections for the RP2040 and ESP32. The updated repository can be found here!
Corrected Issues:
- std::fmax and std::fmin do not exist on ESP32
- TF_LITE_REMOVE_VIRTUAL_DELETE is not public for GreedyMemoryPlanner and MicroErrorReporter on ESP32
- RingBufferN is not available for all microcontrollers
- Missing implementations of DebugLog, SerialReadLine and SerialWrite for all processors except ARDUINO_ARDUINO_NANO33BLE
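None of these fixes is complicated. Purely as an illustration (this is not the code from the updated repository), here is a minimal portable stand-in for std::fmax/std::fmin with the same NaN handling, and a tiny fixed-size byte ring buffer that reuses the method names of RingBufferN from ArduinoCore-API, as far as I know them (store_char, read_char, peek, available, availableForStore, isFull, clear). The namespace compat is a hypothetical name.

```cpp
// Hypothetical portable helpers for boards where std::fmax/std::fmin or
// RingBufferN are missing. A sketch, not the repository's actual fix.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

namespace compat {

// Like std::fmax: if exactly one argument is NaN, the other one is returned.
inline float fmax(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}

// Like std::fmin, with the same NaN handling.
inline float fmin(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a < b ? a : b;
}

// Fixed-size byte FIFO using the method names of ArduinoCore-API's RingBufferN<N>.
template <size_t N>
class RingBufferN {
  static_assert(N > 0, "buffer size must be positive");

 public:
  // Stores one byte; the byte is silently dropped when the buffer is full.
  void store_char(uint8_t c) {
    if (isFull()) return;
    buffer_[head_] = c;
    head_ = (head_ + 1) % N;
    ++count_;
  }
  // Removes and returns the oldest byte, or -1 when empty.
  int read_char() {
    if (count_ == 0) return -1;
    uint8_t c = buffer_[tail_];
    tail_ = (tail_ + 1) % N;
    --count_;
    return c;
  }
  // Returns the oldest byte without removing it, or -1 when empty.
  int peek() const { return count_ == 0 ? -1 : buffer_[tail_]; }
  int available() const { return static_cast<int>(count_); }
  int availableForStore() const { return static_cast<int>(N - count_); }
  bool isFull() const { return count_ == N; }
  void clear() { head_ = tail_ = count_ = 0; }

 private:
  uint8_t buffer_[N] = {};
  size_t head_ = 0;   // next write position
  size_t tail_ = 0;   // next read position
  size_t count_ = 0;  // number of stored bytes
};

}  // namespace compat
```

On the microcontroller one would then call compat::fmax where the example used std::fmax, and include such a header only on boards where RingBufferN is not provided by the core.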
The next challenge was to understand the example, and, well, it is also quite confusing. Please note that the provided examples still do not compile because the audio provider has not been implemented. And honestly, I think it is not worth the effort!
Instead, I decided to go for an alternative design that follows the Arduino way of doing things: in my vision it should be very easy to implement a new micro_speech sketch. Such a sketch might look as follows:
#include "AudioTools.h"
#include "AudioLibs/AudioKit.h"
#include "AudioLibs/TfLiteAudioOutput.h"
#include "model.h" // tensorflow model
AudioKitStream kit;              // audio source
TfLiteAudioFeatureProvider fp;
TfLiteAudioOutput<4> tfl;        // audio sink
const char* kCategoryLabels[4] = {
  "silence",
  "unknown",
  "yes",
  "no",
};
StreamCopy copier(tfl, kit);     // copy mic to tfl
int channels = 2;
int samples_per_second = 16000;  // micro_speech works on 16 kHz audio
void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Warning);
  // setup AudioKit
  auto cfg = kit.defaultConfig(RX_MODE);
  cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
  cfg.channels = channels;
  cfg.sample_rate = samples_per_second;
  kit.begin(cfg);
  // setup TensorFlow
  fp.kAudioChannels = channels;
  fp.kAudioSampleFrequency = samples_per_second;
  tfl.begin(g_model, fp, kCategoryLabels, 10 * 1024);
}
void loop() {
  copier.copy();
}
In a nutshell, there is no more confusing logic distributed over 17 different files, just a single, simple Arduino sketch with a few lines of code. TfLiteAudioOutput would implement the Arduino Print interface, so that we can write audio data to it from any audio source, and with the begin method we pass all the information that is needed for the processing!
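To make the Print-based idea a bit more concrete, here is a minimal host-side sketch (plain C++, no Arduino headers) of what such a sink has to do with the incoming bytes: reassemble interleaved 16-bit PCM frames, downmix them to mono and hand overlapping windows to the feature extraction. The class WindowingAudioSink and its callback are hypothetical names, not the TfLiteAudioOutput implementation; its write(const uint8_t*, size_t) only has the same shape as Arduino's Print::write. The 30 ms window with a 20 ms stride is an assumption based on my reading of the micro_speech feature settings.

```cpp
// Hypothetical host-side sketch of a Print-like sink for micro_speech style input.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

class WindowingAudioSink {
 public:
  // Called with one complete mono window of 16-bit samples.
  using WindowCallback = std::function<void(const int16_t* samples, size_t count)>;

  WindowingAudioSink(int channels, int sampleRate, int windowMs, int strideMs,
                     WindowCallback cb)
      : channels_(channels),
        windowSamples_(static_cast<size_t>(sampleRate) * windowMs / 1000),
        strideSamples_(static_cast<size_t>(sampleRate) * strideMs / 1000),
        callback_(std::move(cb)) {
    assert(channels_ > 0 && channels_ <= 8);  // pending_ holds at most 8 channels
    assert(strideSamples_ > 0 && strideSamples_ <= windowSamples_);
  }

  // Same shape as Arduino's Print::write(const uint8_t*, size_t): interleaved
  // 16-bit PCM bytes (host byte order) in, number of bytes consumed out.
  size_t write(const uint8_t* data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
      pending_[pendingBytes_++] = data[i];
      if (pendingBytes_ == frameBytes()) {  // one sample for every channel
        addFrame();
        pendingBytes_ = 0;
      }
    }
    return len;
  }

 private:
  size_t frameBytes() const { return static_cast<size_t>(channels_) * sizeof(int16_t); }

  // Downmixes one interleaved frame to mono and emits a window when it is full.
  void addFrame() {
    int32_t sum = 0;
    for (int ch = 0; ch < channels_; ++ch) {
      int16_t sample;
      std::memcpy(&sample, pending_ + ch * sizeof(int16_t), sizeof(int16_t));
      sum += sample;
    }
    mono_.push_back(static_cast<int16_t>(sum / channels_));
    if (mono_.size() == windowSamples_) {
      callback_(mono_.data(), mono_.size());
      // Keep the overlap: drop only the first stride worth of samples.
      mono_.erase(mono_.begin(), mono_.begin() + static_cast<std::ptrdiff_t>(strideSamples_));
    }
  }

  int channels_;
  size_t windowSamples_;
  size_t strideSamples_;
  WindowCallback callback_;
  std::vector<int16_t> mono_;
  uint8_t pending_[16] = {};  // bytes of the frame being assembled
  size_t pendingBytes_ = 0;
};
```

With one second of 16 kHz stereo audio this yields 49 windows of 480 samples (a new window every 320 samples after the first 480), which lines up with the 49 feature slices the micro_speech model consumes, if I read the example correctly. On a board the class would derive from Print (also overriding write(uint8_t)) so that StreamCopy can write to it.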
Stay tuned…