In one of my recent blog posts I described what an easy-to-use API for micro_speech should look like. However, I was struggling to make it work, so I decided to take some baby steps and first make the existing example work with my audio-tools library.
We need to implement the (missing) functions declared in audio_provider.h:
- InitAudioRecording
- GetAudioSamples
- LatestAudioTimestamp
InitAudioRecording
Initialization is easy: we just need to start I2S in receive mode, configured for mono audio at 16 kHz:
```cpp
TfLiteStatus InitAudioRecording(tflite::ErrorReporter* error_reporter) {
  // Send the audio-tools log output to Serial
  AudioLogger::instance().begin(Serial, AudioLogger::Info);
  if (!g_is_audio_initialized) {
    g_error_reporter = error_reporter;
    g_latest_audio_timestamp = millis();
    // Start listening for audio: MONO @ 16 kHz
    auto cfg = i2s.defaultConfig(RX_MODE);
    cfg.channels = 1;
    cfg.use_apll = false;
    cfg.auto_clear = false;
    cfg.sample_rate = kAudioSampleFrequency;
    cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;  // select the ADC input line
    i2s.begin(cfg);
    g_is_audio_initialized = true;
    g_start_time = millis();
  }
  return kTfLiteOk;
}
```
GetAudioSamples
This is easy as well: we just need to read the requested number of samples from I2S into a buffer and return a pointer to it.
```cpp
TfLiteStatus GetAudioSamples(tflite::ErrorReporter* error_reporter,
                             int start_ms, int duration_ms,
                             int* audio_samples_size, int16_t** audio_samples) {
  // Note: start_ms is not used; we always return the next block of audio from I2S.
  // Determine how many samples we want in total (2 bytes per int16_t sample)
  const int duration_sample_count = duration_ms * (kAudioSampleFrequency / 1000);
  const int duration_byte_count = duration_sample_count * 2;
  // Blocking read to provide the requested data
  int num_read = i2s.readBytes((uint8_t*)g_audio_output_buffer, duration_byte_count);
  if (num_read != duration_byte_count) {
    LOGE("readBytes: %d->%d", duration_byte_count, num_read);
    return kTfLiteError;
  }
  // Set pointers to provide access to the audio
  *audio_samples_size = duration_sample_count;
  *audio_samples = g_audio_output_buffer;
  return kTfLiteOk;
}
```
LatestAudioTimestamp
This is the one I struggled with most. I started by returning the milliseconds since the start, but this led to an ever-increasing number of slices per loop. On the BLE Sense we process 3-4 slices in each loop. Things started to work when I simply returned a timestamp that advances by the time of 3 slices (60 ms) on each call.
```cpp
// Returning millis() (or the milliseconds since start) led to an ever-growing
// number of slices per loop, so we advance the timestamp by a fixed step instead.
int32_t LatestAudioTimestamp() {
  g_latest_audio_timestamp += 60;  // time for 3 slices (3 x 20 ms stride)
  return g_latest_audio_timestamp;
}
```
Things are not working perfectly yet, but at least the board is starting to recognize some "yes" and "no" commands, which is a big step forward…
The full source code can be found on GitHub.
If you compare this implementation with the original one, you will notice that the complexity has been reduced a lot!