Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RMT Buffer Allocation Fix (Thread-Safe) for Issue #375 #394

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

teknynja
Copy link

NOTE: This is a thread-safe version of pull-request #392: It is essentially the same code with a mutex added to protect the code from multiple threads.

This pull request addresses issue #375 [pixel.show() crash with more than 73 pixel on ESP32s3]

After reviewing the code and also getting some helpful feedback from the Espressif Forums (https://www.esp32.com/viewtopic.php?f=13&t=40270) and @robertlipe it was determined that the code in esp.c for handling the RMT item buffers when using the IDF v5 framework was allocating too much space on the stack when using more than 70ish pixels.

This pull request addresses that issue by allocating the RMT buffers from the heap instead. It will attempt to allocate a single block of memory to accommodate the largest configured instance (sharing the buffer between instances is fine, as the buffer is completely populated each each time the espShow() method is called).

I also took the time to improve the channel allocation management, previously the RMT channels were initialized on each call to espShow(), now the RMT channels are only de-initialized and re-initialized whenever the output pin is changed.

Finally, I was concerned about problems that may be caused by allocating large buffers on the heap without giving the user any way to free that memory, so the code allows a user to free that memory (and also release the RMT channels) by setting the number of pixels to zero using .updateLength(0) and then calling .show(). This will de-allocate the heap memory used for the RMT buffers and release the RMT channels held by driver. They will automatically be re-allocated if needed when setting the number of pixels back to a non-zero value.

@teknynja
Copy link
Author

Updated code to move espShow()'s mutex initialization to the Adafruit_NeoPixel constructor, allowing users to avoid a race condition.

@robertlipe
Copy link

robertlipe commented Jun 16, 2024 via email

Copy link

@egnor egnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE NOTE NOTE I AM ALSO JUST A KIBITZER DO NOT ASSUME I KNOW WHAT I AM DOING

I think this change can be simplified a great deal, and doing so will make it much easier to review and merge. See the individual comments, but in short

  • remove spurious whitespace changes
  • use a function-local static to initialize the mutex
  • simplify the interaction with rmtInit() and rmtDeinit()

@@ -92,6 +92,11 @@ Adafruit_NeoPixel::Adafruit_NeoPixel(uint16_t n, int16_t p, neoPixelType t)
}
init = true;
#endif
#if defined(ESP32)
#if ESP_IDF_VERSION >= ESP_IDF_VERSION_VAL(5, 0, 0)
espInit();
Copy link

@egnor egnor Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] I would actually have espInit() called unconditionally for #if defined(ESP32), leave the ESP-IDF version dependencies in esp.c? BUT see comments below, I think we can actually remove espInit() entirely?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here was to minimize the changes to the library's binary code flow when not compiling for ESP32/IDF5. The reason I'm initializing the mutex in the constructor is for thread-safety - as long as all instances of Adafruit_NeoPixel are constructed on the same (usually "main") thread, there's no chance for a race condition. Trying to do it in espShow() when running each instance on different threads creates a race condition when initializing the mutex.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This was actually the very problem in the original version of this PR that we (he) sat out to solve in this version.

@@ -234,6 +239,7 @@ extern "C" IRAM_ATTR void espShow(uint16_t pin, uint8_t *pixels,
#elif defined(ESP32)
extern "C" void espShow(uint16_t pin, uint8_t *pixels, uint32_t numBytes,
uint8_t type);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Probably remove this extra blank line, just to keep the PR minimal?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my bad - I have to learn to reign in my tendency to add blank lines ;-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had 'ignore whitespace' turned off. I'm so used to submits just fixing such things autoamaticaly that I dont' get much wound up about them. But egnor is right that it's best to not leave a wake as you boat through code.

@@ -3553,4 +3559,4 @@ neoPixelType Adafruit_NeoPixel::str2order(const char *v) {
}
if (w < 0) w = r; // If 'w' not specified, duplicate r bits
return (w << 6) | (r << 4) | ((g & 3) << 2) | (b & 3);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Probably remove this whitespace change, just to keep the PR minimal?

for specific hardware/library versions
*/
#if defined(ESP32)
#if ESP_IDF_VERSION >= ESP_IDF_VERSION_VAL(5, 0, 0)
Copy link

@egnor egnor Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Again, maybe always have an espInit() for ESP32, just have it do nothing as appropriate based on ESP-IDF version? BUT see comments below, I think we can actually remove espInit() entirely?

@@ -31,40 +31,93 @@
#endif



Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Probably remove this extra blank line, just to keep the PR minimal?

// Adafruit_NeoPixel must be constructed before launching and child threads
void espInit() {
if (!show_mutex) {
show_mutex = xSemaphoreCreateMutex();
Copy link

@egnor egnor Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONSIDER instead of initializing the mutex like this, doing the following WITHIN espShow(...):

static SemaphoreHandle_t show_mutex = xSemaphoreCreateMutex();

per C++ standard, this will be a thread-safe runs-once initializer that will create the mutex on first entry.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a perfect solution if it compiled; unfortunately this isn't a c++ module so initializing statics with non-constants is not supported.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All correct. Just as an educational footnote, FreeRTOS, like several locking thingies, actually uses different primitives for a static and dynamic sema's.

xSemaphoreCreateMutexStatic

vs

xSemaphoreCreateMutex

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uff, I forgot this was a .c file.

The xSemaphoreCreateMutexStatic is only different in that it uses a user-provided buffer instead of allocating a buffer, I don't think it's really "static" in the sense of being able to do static initalization unfortunately.

@@ -219,6 +272,6 @@ void espShow(uint8_t pin, uint8_t *pixels, uint32_t numBytes, boolean is800KHz)
}

#endif // ifndef IDF5

Copy link

@egnor egnor Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Probably undo this line removal, just to keep the PR minimal?

static uint32_t led_data_size = 0;
static int rmtPin = -1;

if (show_mutex && xSemaphoreTake(show_mutex, SEMAPHORE_TIMEOUT_MS / portTICK_PERIOD_MS) == pdTRUE) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better diagnostics, I might say something like

if (!show_mutex) {
  log_e("NeoPixel ESP32 mutex uninitialized, did you call show() without begin()?");
  return;
}
if (xSemaphoreTake(show_mutex, SEMAPHORE_TIMEOUT_MS / portTICK_PERIOD_MS) != pdTRUE) {
  log_e("NeoPixel ESP32 mutex acquisition failed, long show() in progress?");
  return;
}

... then you can continue on knowing that the mutex is held.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of adding more feedback for users. I was trying to minimize the amount of code/data I added, to avoid bloating the size of the library.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hard problem in a library. You usually don't know what environment you're running in. Even in ESP32-land alone, you might be in Nuttx or Arduino or Zephyr or whatever and the log primitives might be different. There might be some constraints on where this library is supported (I'm a kibbitzer, too.)

As a practical matter, I suppose, if we're calling xSemaphoreTake and friends, we're already planting a flag (intentionally or otherwise) that this code is for something on FreeRTOS which is probably on ESP-IDF. So if you're calling this before the system is fully up, do you want to conditionally call https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/log.html#c.ESP_EARLY_LOGE ? There's the messiness that logging might be on UART or virtualized USB pretend UARTs, so getting the EARLY stuff right matters.

HOWEVER, this library doesn't seem to really support all those cases anyway. It's pretty married to the ESP-IDF system and we're inside a esp.c. If we were to do anything here, I'd use ESP_LOGE("adafruit_neopixel", ...) - and fix the other occurrence already inthe code to do the same. We can probably get away with this inside esp.c in a way we couldn't in the caller.

Copy link

@egnor egnor Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think in this library we can 100% assume Arduino-on-ESP-IDF. It uses the RMT library from arduino-esp32 which in turn depends on the RMT driver in ESP-IDF, and we are ourselves using FreeRTOS mutexes, so there's no reason to overthink things.

And by the time anyone's user code is running, and they're calling show(), I think all the regular logging methods should be just fine. I'd follow existing Adafruit convention for whether to use log_e or ESP_LOGE.

And I think for Adafruit's mission, being "friendly" with error messages supercedes absolute minimum code size (especially since this is for the ESP32-- might feel differently on an 8 bit AVR).

if (show_mutex && xSemaphoreTake(show_mutex, SEMAPHORE_TIMEOUT_MS / portTICK_PERIOD_MS) == pdTRUE) {
uint32_t requiredSize = numBytes * 8;
if (requiredSize > led_data_size) {
free(led_data);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONSIDER instead, for diagnostics and conciseness:

const int requiredBytes = requiredSize * sizeof(rmt_data_t);
led_data = (rmt_data_t *)realloc(led_data, requiredBytes);
if (!led_data) {
  log_e("NeoPixel RMT buffer allocation (%d bytes) failed", requiredBytes);
  xSemaphoreGive(show_mutex);
  return;
}

log_e("Failed to init RMT TX mode on pin %d", pin);
return;
}
if (led_data_size > 0 && requiredSize <= led_data_size) {
Copy link

@egnor egnor Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rmtInit() does several things

rmtDeinit() just does one thing

Based on that, you can do just like previous code

  • always call rmtInit()
  • never call rmtDeinit()

That will keep RMT initialized when it should be, let it be de-initialized as needed (if the pin is repurposed with pinMode() or whatnot), and simplify the code a bunch. Assuming you agree (PLEASE CHECK MY LOGIC), you can

  • remove rmtPin as a variable
  • remove all the checks against rmtPin and calls to rmtDeinit(rmtPin)
  • call rmtInit(...) unconditionally on entry, precisely as the previous code did

That would simplify this change to JUST changing the allocation semantics (with a mutex just in case), which should be a lot easier for everyone to review.

Copy link
Author

@teknynja teknynja Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code handles more than just the caller changing the pin - since all instances of Adafruit_NeoPixel are sharing the RMT API (and each instance defines its own output pin), this code handles re-initializing the RMT each time a different instance's espShow() is called. As noted in the comments, it's not very efficient but seems to work ok in practice.

The multi-threaded scenario I'm targeting is creating multiple instances of Adafruit_NeoPixel on the main thread, then launching individual threads to run each instance (and call show()). It would be nice to be able to handle the mutex initialization on the individual threads, but AFAICT there isn't a way to initialize the mutex that way without introducing a race condition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about being able to unconditionally call rmtInit() - I didn't dig into the Espressif implementation and treated it like a black box, assuming I would need to let the API know when the pin changed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think being "dumb" will be no less efficient (the underlying library already bails out in the no-change case), and that approach was blessed by the author of arduino-esp32 @me-no-dev in reviewing @ladyada's original change, and the "dumb" approach seems worth the rather large gain in simplicity and in keeping this change small. IMVHO

Copy link
Author

@teknynja teknynja Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like the way forward is to leave the espInit() logic in place, remove the conditional code around rmtInit(), add some additional logging as indicated above, (and see if I can figure out how to clean up my whitespace issues 😜). Does that sound about right?

Its been a minute, so it's gonna take a bit for me to setup my environment (and hardware) and get back into the groove on this issue.

Copy link

@egnor egnor Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me BUT we're a bunch of kibitzers guessing at what Adafruit would actually take, so if Adafruit folks jump in they should override any of our peanut-gallery comments. (OR they might just end up rolling their own fix??)

@robertlipe
Copy link

It's unfortunate that the inertia behind this (much needed) PR seems to have been lost...again.

Is there anyone with commit privileges that can deliver specific reasons this shouldn't be applied or that can produce a list of required changes to get it applied? I don't particularly have a direct interest in this project, but if it "just" needs a programmer to jump through acceptance hoops and confirm that a specific test case works, I'm willing to build up a test fixture and help. Crashing at 70-odd pixels on a robust SOC is kinda embarrassing.

I just hate to see this languish for another couple of quarters. How can we move this PR toward that that "merge" button?

I'm willing to commit to (probably)more-than-moderately-qualified brain (and equipment) cycles to moving this PR forward. What will it take? Let's get the kibitizers (?) and the approvers together and get some action on this, please.

I'd like to see someone with submit approval comment on what it would take to move this to completion. Whether Tekninja or another of us takes the PR and moves it to the goal, the ESP32 world really needs to see this landed.

How can we all help get this submitted?

@ednieuw
Copy link

ednieuw commented Sep 19, 2024

I forked your changes and tried it with an Arduino Nano ESP32 with ESP32 board 3.0.4
But the Arduino Nano ESP32 crashes with my program.
When I use the Arduino board 2.0.13 the modification does not crash the Arduino. But with the original Neopixel also works fine.
So there is still a problem with the Neopixel library on ESP32 board 3.0.4

@teknynja
Copy link
Author

@ednieuw - The only thing I'm noticing when doing a quick look at your code is that you are creating three instances of the NeoPixel driver. The first one isn't really allocating much memory because it's using default constructor, but the other two are allocating space for 256 pixels each at 3 & 4 bytes per pixel. That's probably not too much memory for the driver's normal pixel buffers, but it does end up trying to allocate 32768 bytes from the heap for the RMT buffer to be shared by both instances - I don't know if that's a problem for the underlying libraries or not, but you might try paring back the number of pixels to the required minimum to see if that helps.

Digging into this problem did reveal another problem in my changes though (although it's likely not related to your issue). If a user creates all instances of the NeoPixel driver using the default constructor and then configures/uses those instance(s) later on in the code, the mutex object is not instantiated, which would probably negatively impact the operation of the esp32 RMT driver. The state of my patch seems to be in limbo at this point 🤷, so unless somebody hits this particular code path, I'm probably just going to leave things as they are for now.

@ednieuw
Copy link

ednieuw commented Sep 19, 2024

I found out the size of the RMT-buffer is large enough for 512 LEDs.

Espressif has a nice example with WS2812 LED strips:
https://docs.espressif.com/projects/arduino-esp32/en/latest/api/rmt.html

I tested it with a with 256 and 512 WS2812 LEDs and that works.
It takes 8 sec to write 512 LEDs 512 times. so apprx 16 ms to write 512 LEDs.
The strip is connected to Arduino pin D5 = GPIO8. Compile with pin numbering: By GPIO numbering

My effort to adapt the coding for a SK6812 strip needs some more time. The timing is not correct and it still write 3 bytes instead of four. But it is not on my priority list yet.
https://github.com/ednieuw/RMT_WS2812-SK6812-ESP32

// Copyright 2024 Espressif Systems (Shanghai) PTE LTD
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/**
 * @brief This example demonstrates usage of RGB LED driven by RMT
 *
 * The output is a visual WS2812 RGB LED color moving in a 8 x 4 LED matrix
 * Parameters can be changed by the user. In a single LED circuit, it will just blink.
 */
#define PIN_LED_RGB 8

#define NR_OF_LEDS     16 * 16
#define NR_OF_ALL_BITS 24 * NR_OF_LEDS

//
// Note: This example uses a board with 32 WS2812b LEDs chained one
//      after another, each RGB LED has its 24 bit value
//      for color configuration (8b for each color)
//
//      Bits encoded as pulses as follows:
//
//      "0":
//         +-------+              +--
//         |       |              |
//         |       |              |
//         |       |              |
//      ---|       |--------------|
//         +       +              +
//         | 0.4us |   0.85 0us   |
//
//      "1":
//         +-------------+       +--
//         |             |       |
//         |             |       |
//         |             |       |
//         |             |       |
//      ---+             +-------+
//         |    0.8us    | 0.4us |

rmt_data_t led_data[NR_OF_ALL_BITS];

void setup() {
  Serial.begin(115200);
  if (!rmtInit(PIN_LED_RGB, RMT_TX_MODE, RMT_MEM_NUM_BLOCKS_1, 10000000)) {
    Serial.println("init sender failed\n");
  }
  Serial.println("real tick set to: 100ns");
}

int color[] = {0x11, 0x99, 0x11};  // Green Red Blue values
int led_index = 0;

void loop() 
{
  // Init data with only one led ON
  int led, col, bit;
  int i = 0;
  for (led = 0; led < NR_OF_LEDS; led++) {
    for (col = 0; col < 3; col++) {
      for (bit = 0; bit < 8; bit++) {
        if ((color[col] & (1 << (7 - bit))) && (led == led_index)) {
          led_data[i].level0 = 1;
          led_data[i].duration0 = 8;
          led_data[i].level1 = 0;
          led_data[i].duration1 = 4;
        } else {
          led_data[i].level0 = 1;
          led_data[i].duration0 = 4;
          led_data[i].level1 = 0;
          led_data[i].duration1 = 8;
        }
        i++;
      }
    }
  }
  // make the led travel in the panel
  if ((++led_index) >= NR_OF_LEDS) {
    led_index = 0;
  }
  // Send the data and wait until it is done
  rmtWrite(PIN_LED_RGB, led_data, NR_OF_ALL_BITS, RMT_WAIT_FOR_EVER);
//  delay(1);
}

@ednieuw
Copy link

ednieuw commented Sep 23, 2024

Not willing to wait long for an working update of the Neopixel library I wrote a library for WS2812 and SK6812 LEDs to be used with an Arduino Nano ESP32 with Espressif board ESP32 V3.0.
It uses the RMT driver and It will probably also work with ESP32-S3 and other ESP32's
https://github.com/ednieuw/EdSoftLED
It is crude version 1.0.0

@robertlipe
Copy link

robertlipe commented Sep 23, 2024 via email

@bobdoah
Copy link

bobdoah commented Nov 28, 2024

Thanks @teknynja, I was struggling with my build of https://github.com/BikeBeamer/BikeBeamer because it uses 256 WS2812B LEDs. This fixed my issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants