When I was looking for an alternative open source MP3 decoder, I found the minimp3 project from lieff.
Arduino Support
Yesterday I made another attempt to get it working on an Arduino ESP32. Making the project Arduino-compatible was easy: I just had to add library.properties and update the Readme.md with the instructions.
Running the code on the ESP32, however, turned out to be not so easy: the sketches were crashing without any proper error messages or stack traces. Reviewing the code, I found two quite large data structures that were allocated on the stack:
- mp3dec_scratch_t
- L12_scale_info
Most of the other data was allocated on the heap as part of mp3dec_t, so the solution was simple: I moved these variables off the stack and made them part of mp3dec_t as well. To my surprise, this resolved all the issues.
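The idea can be sketched roughly as follows. This is only an illustration of the pattern, not the actual minimp3 code: all demo_* names, the struct fields, and the buffer sizes are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the large work buffers; sizes are illustrative. */
typedef struct {
    float samples[2][576];
    unsigned char bits[2304];
} demo_scratch_t;

typedef struct {
    float scf[3 * 64];
    unsigned char bitalloc[64];
} demo_scale_info_t;

/* Decoder state: allocated on the heap, and now also owns the work buffers. */
typedef struct {
    int frames_decoded;
    demo_scratch_t scratch;       /* previously a local variable */
    demo_scale_info_t scale_info; /* previously a local variable */
} demo_decoder_t;

/* Allocate the whole decoder, including the work buffers, on the heap. */
demo_decoder_t *demo_decoder_new(void) {
    return (demo_decoder_t *)calloc(1, sizeof(demo_decoder_t));
}

/* Before the change, this function would have declared both buffers as
 * locals, putting several kilobytes on the (small) task stack. Now it
 * only takes pointers into the heap-allocated decoder. */
int demo_decode_frame(demo_decoder_t *dec) {
    demo_scratch_t *scratch = &dec->scratch;
    demo_scale_info_t *sci = &dec->scale_info;
    memset(scratch, 0, sizeof *scratch);
    memset(sci, 0, sizeof *sci);
    /* ... the actual decoding work would go here ... */
    return ++dec->frames_decoded;
}

void demo_decoder_free(demo_decoder_t *dec) {
    free(dec);
}
```

The trade-off is that the decoder object gets bigger and the buffers stay allocated between calls, but the stack usage per decode call drops to a few pointers.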
The Result
Unfortunately, the result is disappointing: the decoder is too slow (e.g. on an ESP32) to decode audio with a sample rate of 32000 samples per second or more, and the audio quality is quite poor.
On other platforms this library can use SIMD to improve performance. Unfortunately, neither the ESP32 nor the RP2040 supports this. Without SIMD, the library relies heavily on floating-point operations, which are a known weak spot of all the Espressif microcontrollers, so any alternative with decent floating-point performance will perform much better.
The Source Code
The adapted Arduino library can be found on GitHub.
3 Comments
Bromium · 22. January 2023 at 22:35
Look at it this way, the data to the DAC is 16 (or 24) bits. So why do you ever need to store huge or tiny numbers with exponents 2^128? 2^128 ~= 10^33. What kind of operation are you doing on a 16 bit number that needs the range of 10^33 or 10^-33, but only with the precision of 24 bits instead of 32 bits?
I think the reason that floating point is used is that trigonometric and logarithmic operation libraries are all in floating point, and not rational numbers. If rational operations, which would be a lot faster and offer a lot higher resolution, were available, nobody in their right mind would use floats, unless they wanted to deal with huge or tiny numbers like in the sciences.
Best regards
Bromium · 11. January 2023 at 4:07
Hi Phil. I am actually surprised that an encoder/decoder would use floating point. Is this true for all or most coders that you have seen? Fractional numbers should always be represented as a rational number, i.e. a numerator and a denominator. Multiplication and division, for example, become two integer multiplications, one for the numerator and the other for the denominator, and should be a lot faster than a floating point operation. If an overflow condition is about to develop, then both numerator and denominator should be shifted to the right by the same amount, prior to the operation, to prevent the overflow. At the very last moment, when the data is to be sent to the DAC or off-board, the actual division (with rounding) should be performed and the data reduced to one long (or short). Floating point is just a rational number where the denominator must be a power of 10. The range of a long float is larger than that of two rational longs, but rational longs are far more accurate (higher resolution). In audio work, we don't need that kind of range, but we could use more resolution. What do you think?
Thanks for this wonderful blog; you are covering areas that I have found no one else to cover.
Regards,
Bromium
pschatzmann · 11. January 2023 at 13:41
I am not aware of any MP3 or AAC decoder that relies on integer math, so I am not sure if this is even possible.