In one of my latest blog posts I described what an easy-to-use API for the micro_speech example should look like. However, I was struggling to make it work, so I decided to take some baby steps and first get the existing example working with my audio-tools library.

We need to implement the (missing) methods defined in audio_provider.h:

  • InitAudioRecording
  • GetAudioSamples
  • LatestAudioTimestamp

InitAudioRecording

Init is easy: we just need to start I2S:

TfLiteStatus InitAudioRecording(tflite::ErrorReporter* error_reporter) {
  AudioLogger::instance().begin(Serial, AudioLogger::Info);
  if (!g_is_audio_initialized) {
    g_error_reporter = error_reporter;
    g_latest_audio_timestamp = millis();
    // Start listening for audio: MONO @ 16KHz
    auto cfg = i2s.defaultConfig(RX_MODE);
    cfg.channels = 1;
    cfg.use_apll = false;
    cfg.auto_clear = false;
    cfg.sample_rate = kAudioSampleFrequency;
    cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
    i2s.begin(cfg);
    g_is_audio_initialized = true;
    g_start_time = millis();
  }
  return kTfLiteOk;
}

GetAudioSamples

This is easy as well, because we just need to provide the requested number of samples:

TfLiteStatus GetAudioSamples(tflite::ErrorReporter* error_reporter,
                             int start_ms, int duration_ms,
                             int* audio_samples_size, int16_t** audio_samples) {
  // Determine how many samples we want in total
  const int duration_sample_count = duration_ms * (kAudioSampleFrequency / 1000);
  const int duration_byte_count = duration_sample_count * 2;
  // blocking read to provide the requested data
  int num_read = i2s.readBytes((uint8_t*)g_audio_output_buffer, duration_byte_count);
  if (num_read != duration_byte_count) {
    LOGE("readBytes: %d->%d", duration_byte_count, num_read);
    return kTfLiteError;
  }
  // Set pointers to provide access to the audio
  *audio_samples_size = duration_sample_count;
  *audio_samples = g_audio_output_buffer; 
  return kTfLiteOk;
}
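To make the size arithmetic concrete, here is a minimal sketch of the sample and byte counts for a given duration. It assumes the 16 kHz mono, 16-bit setup used above; the two helper names are hypothetical and only serve the illustration.

```cpp
#include <cstdint>

// Assumption: 16 kHz mono audio with int16_t samples, as configured above.
constexpr int kAudioSampleFrequency = 16000;

// Hypothetical helper: number of samples that cover duration_ms of audio.
constexpr int SampleCountForDuration(int duration_ms) {
  return duration_ms * (kAudioSampleFrequency / 1000);
}

// Hypothetical helper: number of bytes to read for that many samples.
constexpr int ByteCountForDuration(int duration_ms) {
  return SampleCountForDuration(duration_ms) *
         static_cast<int>(sizeof(int16_t));
}

// A 30 ms window at 16 kHz is 480 samples, i.e. 960 bytes.
static_assert(SampleCountForDuration(30) == 480, "30 ms -> 480 samples");
static_assert(ByteCountForDuration(30) == 960, "30 ms -> 960 bytes");
```

Whatever buffer is handed back through audio_samples needs room for at least this many samples, otherwise the blocking read would write past its end.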

LatestAudioTimestamp

This is the one I struggled with most. I started by returning the milliseconds since the start, but this led to ever-increasing slice counts. On the BLE Sense, 3-4 slices are processed in each loop. Things started to work when I simply advanced the timestamp by the expected time for 3 frames on each call.

// We cannot return millis() directly because the result type is int32_t.
// Instead of the real elapsed time, each call advances the timestamp by a fixed step.
int32_t LatestAudioTimestamp() {
  g_latest_audio_timestamp += 60; // time for 3-4 frames
  return g_latest_audio_timestamp;
}
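Why a fixed step helps can be sketched as follows. As far as I understand the upstream feature provider, it derives the number of new slices from the difference between two timestamps divided by the slice stride. The stride of 20 ms is an assumption taken from the upstream micro_speech settings, and SlicesNeeded is a hypothetical helper, not a function of the example itself.

```cpp
#include <cstdint>

// Assumption: stride between feature slices, as in the upstream settings.
constexpr int kFeatureSliceStrideMs = 20;

// Hypothetical helper: how many new slices lie between two timestamps.
constexpr int SlicesNeeded(int32_t last_time_ms, int32_t time_ms) {
  return (time_ms / kFeatureSliceStrideMs) -
         (last_time_ms / kFeatureSliceStrideMs);
}

// Advancing the timestamp by 60 ms per call always yields 3 slices.
static_assert(SlicesNeeded(0, 60) == 3, "first call");
static_assert(SlicesNeeded(60, 120) == 3, "second call");

// A wall-clock timestamp that jumped by 250 ms would ask for 12 slices.
static_assert(SlicesNeeded(60, 310) == 12, "slow loop");
```

With a real clock, every slow loop iteration increases the number of slices requested in the next one, which matches the growing slice counts described above. A constant increment keeps the workload per loop fixed.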

Things are not working perfectly yet, but at least the board is starting to recognize some "yes" and "no" utterances, which is a big step forward.
The full source code can be found on GitHub.

If you compare this implementation with the original one, you will notice that the complexity has been reduced a lot!

