The starting point for doing speech recognition on an Arduino-based board is TensorFlow Lite for Microcontrollers with the example sketch called micro_speech!
I presented a DRAFT proposal for a Micro Speech API in one of my last posts. In the meantime I have finally managed to adapt the MicroSpeech example from TensorFlow Lite to follow the philosophy of my Arduino Audio Tools Library. The example uses a TensorFlow model which can recognise the words ‘yes’ and ‘no’. The output stream class is TfLiteAudioStream. In the example I am using an AudioKit, but you can replace this with any type of microphone.
The Arduino Sketch
Here is the complete Arduino sketch. The audio source is the microphone of the AudioKit and the audio sink is the TfLiteAudioStream class, which classifies the audio into 4 different output labels.
#include "AudioTools.h"
#include "AudioLibs/AudioKit.h"
#include "AudioLibs/TfLiteAudioStream.h"
#include "model.h" // tensorflow model
AudioKitStream kit; // Audio source
TfLiteAudioStream tfl; // Audio sink
const char* kCategoryLabels[4] = {
"silence",
"unknown",
"yes",
"no",
};
StreamCopy copier(tfl, kit); // copy mic to tfl
int channels = 1;
int samples_per_second = 16000;
void respondToCommand(const char* found_command, uint8_t score,
bool is_new_command) {
if (is_new_command) {
char buffer[80];
sprintf(buffer, "Result: %s, score: %d, is_new: %s", found_command, score,
is_new_command ? "true" : "false");
Serial.println(buffer);
}
}
void setup() {
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Warning);
// setup Audiokit Microphone
auto cfg = kit.defaultConfig(RX_MODE);
cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
cfg.channels = channels;
cfg.sample_rate = samples_per_second;
cfg.use_apll = false;
cfg.auto_clear = true;
cfg.buffer_size = 512;
cfg.buffer_count = 16;
kit.begin(cfg);
// Setup tensorflow output
auto tcfg = tfl.defaultConfig();
tcfg.setCategories(kCategoryLabels);
tcfg.channels = channels;
tcfg.sample_rate = samples_per_second;
tcfg.kTensorArenaSize = 10 * 1024;
tcfg.respondToCommand = respondToCommand;
tcfg.model = g_model;
tfl.begin(tcfg);
}
void loop() { copier.copy(); }
The key information that needs to be provided as configuration for TensorFlow is:
- number of channels
- sample rate
- kTensorArenaSize
- a callback for handling the responses (respondToCommand)
- the model
- the labels
As in any other audio sketch, we just need to copy the data from the input to the output class.
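If you do not have an AudioKit board, only the input side needs to change. The following is a hedged fragment that swaps in the library's I2SStream with an external I2S microphone (e.g. an INMP441); the pin numbers are assumptions for an ESP32 and must be adapted to your wiring, and the TensorFlow setup from the sketch above stays unchanged.

I2SStream i2s;                // audio source: an I2S microphone instead of the AudioKit
StreamCopy copier(tfl, i2s);  // copy mic to tfl

void setupMicrophone() {
  auto cfg = i2s.defaultConfig(RX_MODE);
  cfg.channels = channels;
  cfg.sample_rate = samples_per_second;
  cfg.bits_per_sample = 16;
  // assumed ESP32 wiring - adjust to your board
  cfg.pin_bck = 14;
  cfg.pin_ws = 15;
  cfg.pin_data = 32;
  i2s.begin(cfg);
}

Call setupMicrophone() from setup() in place of the AudioKit configuration block.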
Overall Processing Logic
The TfLiteAudioStream uses the Fast Fourier Transform (FFT) to calculate an FFT result (of length kFeatureSliceSize) over slices of audio data (defined by kFeatureSliceStrideMs and kFeatureSliceDurationMs). This result is used to update a spectrogram (of size kFeatureSliceSize x kFeatureSliceCount). After 2 (kSlicesToProcess) new FFT results have been appended at the end, we let TensorFlow evaluate the updated spectrogram to calculate the classification result. These results are post-processed (in the TfLiteRecognizeCommands class) to make sure that the result is stable.
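To make this concrete, here is a conceptual sketch (not the library code) of how such a sliding spectrogram update works. The constant values are the typical micro_speech ones, and the fft() helper is a placeholder assumption standing in for the real feature extraction.

#include <cstdint>
#include <cstring>

const int kFeatureSliceSize = 40;   // FFT features per slice (typical micro_speech value)
const int kFeatureSliceCount = 49;  // slices kept in the spectrogram (typical micro_speech value)

int8_t spectrogram[kFeatureSliceCount * kFeatureSliceSize];

// placeholder: a real implementation computes kFeatureSliceSize
// frequency features from one slice of audio samples
void fft(const int16_t* audio_slice, int8_t* features) {
  memset(features, 0, kFeatureSliceSize);
}

void addSlice(const int16_t* audio_slice) {
  // shift the existing slices one position to the left ...
  memmove(spectrogram, spectrogram + kFeatureSliceSize,
          (kFeatureSliceCount - 1) * kFeatureSliceSize);
  // ... and append the newest FFT result at the end
  fft(audio_slice, spectrogram + (kFeatureSliceCount - 1) * kFeatureSliceSize);
}

After kSlicesToProcess new slices have been added this way, the complete spectrogram is handed to the TensorFlow interpreter as the model input.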
Dependencies
- Arduino Audio Tools
- tflite-micro-arduino-examples
- arduino-audiokit (optional, only needed if you use an AudioKit board)
Building the TensorFlow Model
Here is the relevant Jupyter notebook. I am also providing the necessary files to run it in Docker: just execute docker-compose up and connect to http://localhost:8888.