Why almost all Raspberry Pi Pico (RP2040) Neopixel code is wrong


Bold title, but stay with me. Neopixels, based on the WS2812B, are great. You can string together a bunch of them with only 3 wires, and make the whole string blinky blinky. The three connections are power, ground, and “data”, which in this case is a custom protocol that uses an 800 kHz square wave signal, tweaked so that the positive pulse is either shorter (logical 0) or longer (logical 1). There are multiple conflicting datasheets out there. The one hosted by Digikey seems pretty definitive, and defines the timing thusly:

From datasheet. T0H is 0.4us ±150ns.

Dig around a bit and you’ll find a datasheet for the same part number, labeled v2. It has almost the same timing requirements, but tweaked by 0.05 microseconds in a few places.

From v2 datasheet. T0H is 0.4us ±150ns.

Either way, the one we’re interested in, T0H, is the same in both. 0.4us. 400 nanoseconds. If you are writing an embedded driver for this chipset, when you need to send a 0, you want that high voltage time to be as humanly (machinely?) close to 400ns as possible. There is quite a bit of slop in the specification at ±150ns, but when working with consumer grade hardware (and engineers!) it’s not wise to push the tolerances too much.
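To make that tolerance concrete, here is a minimal sketch of the T0H window implied by the figures above. The constant and function names are my own, not anything from the datasheet.

```python
# T0H figures quoted from both datasheets: 0.4 us +/- 150 ns.
T0H_NOMINAL_NS = 400
T0H_TOLERANCE_NS = 150

# Narrowest and widest logical-0 high pulse the spec accepts.
T0H_MIN_NS = T0H_NOMINAL_NS - T0H_TOLERANCE_NS
T0H_MAX_NS = T0H_NOMINAL_NS + T0H_TOLERANCE_NS


def t0h_in_spec(pulse_ns):
    """Return True if a logical-0 high pulse of this width is inside the window."""
    return T0H_MIN_NS <= pulse_ns <= T0H_MAX_NS


print(T0H_MIN_NS, T0H_MAX_NS)  # 250 550
```

So anything from 250 ns to 550 ns is technically legal, with 400 ns sitting in the middle of the window.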

Next up for discussion, the Raspberry Pi Pico, based on the new RP2040 microcontroller. A fantastic little piece of hardware, available for $1 in small quantities, or splurge $4 for an entire board. In addition to dual cores, it has built-in hardware for programmable IO, or PIO, where you configure an adorable little state machine to process inputs or generate outputs at up to the full 125+ MHz clock speed. PIO is programmed with an equally adorable mini machine language with 0x8 instructions. It also offers a “side set” feature, where changing the state of a pin doesn’t even require its own instruction: it just happens as a side effect of executing other instructions. And every instruction executes in one clock cycle. Fast fast.

A fantastic tutorial on PIO lives in the official documentation for the C/C++ SDK, but don’t run off. It’s a great read even if all you’re interested in is MicroPython. It includes the same WS2812 driver code, available on GitHub, that I find all over the web with minor modifications. There’s an assembler for PIO, but the Python syntax has a smart way to represent ‘inline’ ‘assembly’. The entire driver follows. I am making Fair Use here for criticism and commentary, but the code is available for most any purpose under a BSD license.

@rp2.asm_pio(sideset_init=rp2.PIO.OUT_LOW, out_shiftdir=rp2.PIO.SHIFT_LEFT, autopull=True, pull_thresh=24)
def ws2812():
    T1 = 2
    T2 = 5
    T3 = 3
    wrap_target()
    label("bitloop")
    out(x, 1)               .side(0)    [T3 - 1]
    jmp(not_x, "do_zero")   .side(1)    [T1 - 1]
    jmp("bitloop")          .side(1)    [T2 - 1]
    label("do_zero")
    nop()                   .side(0)    [T2 - 1]
    wrap()

Pretty readable as-is. But T1, T2, and T3 are timing delays, in units of instruction cycles that in this case are clipping along at 8 MHz. The side(0) and side(1) markers are what actually set the output pin to a 0 or 1. And the square-bracketed trailing terms apply the explicit delays for timing purposes, minus 1 to account for the time taken by the instruction itself. You may see where this is going…
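Where does 8 MHz come from? A quick sketch of the arithmetic, assuming the state machine is clocked so that one bit takes exactly one 800 kHz period. The variable names are mine.

```python
# Timing constants copied from the driver above.
T1, T2, T3 = 2, 5, 3

BIT_RATE_HZ = 800_000  # WS2812 data rate

# Either path through the loop (0 bit or 1 bit) costs T3 + T1 + T2 cycles.
cycles_per_bit = T1 + T2 + T3

# State machine clock needed to fit those cycles into one bit period.
sm_freq_hz = BIT_RATE_HZ * cycles_per_bit

# Duration of one cycle ("tick") in nanoseconds.
tick_ns = 1_000_000_000 / sm_freq_hz

print(cycles_per_bit, sm_freq_hz, tick_ns)  # 10 8000000 125.0
```

Ten cycles per bit at 800 kHz gives the 8 MHz state machine clock, and each cycle lasts 125 ns.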

The state machine hangs out in the 0 state, waiting for the main microcontroller to FIFO some bits over. When bits arrive, the out(x, 1) instruction grabs a bit and puts it into the x register, and the jmp that follows decides what to do with it. The result is either a narrow pulse of duration T1, or a longer pulse of T1 + T2, before returning to 0. The nice tutorial shows it like this:

Timing diagram. Note: time lengths NOT to scale based on accompanying code!

With time along the bottom axis, this chart is NOT to scale! In the code, T1 is only 2 ticks, while T2 is a whopping 5 ticks. Each tick at 8 MHz is 125 ns. In the 0 condition, the positive pulse is only 250 ns wide, which is exactly the 400 − 150 = 250 ns minimum. This is at the extreme bitter edge of what the datasheet allows. If the frequency is fast by 1 Hz, or if the particular Neopixel being addressed is 0.01% out of spec, or if there’s a rounding error just about anywhere, this technically won’t work.
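If you want to see the waveform to scale, here is a rough pure-Python model of the PIO loop. It is only a sketch: it runs on any Python, not on the PIO hardware, and the function names are hypothetical. It emits one pin level per state machine cycle.

```python
TICK_NS = 125  # one state machine cycle at 8 MHz


def bit_waveform(bit, t1=2, t2=5, t3=3):
    """Return the pin level during each cycle spent sending one bit."""
    levels = [0] * t3                  # out(x, 1)           .side(0) [T3 - 1]
    levels += [1] * t1                 # jmp(not_x, ...)     .side(1) [T1 - 1]
    levels += [1 if bit else 0] * t2   # jmp/nop, .side(1) for a 1, .side(0) for a 0
    return levels


def high_time_ns(bit):
    """Width of the positive pulse for this bit, in nanoseconds."""
    return sum(bit_waveform(bit)) * TICK_NS


print(bit_waveform(0))  # [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(bit_waveform(1))  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Slack between the 0-bit pulse and the 400 - 150 = 250 ns datasheet minimum.
slack_ns = high_time_ns(0) - (400 - 150)
print(high_time_ns(0), slack_ns)  # 250 0
```

Two high ticks out of ten for a 0 bit, and zero nanoseconds of slack against the lower limit.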

Now, in practice it does work. Worldwide, there are probably a million WS2812s running off this code and its numerous copypasta variants. But this is relying on the good graces of the engineers behind these LEDs, to say nothing of manufacturing tolerances. Depending on undocumented behavior causes your feet to smell bad, your printer to spontaneously ignite, and your line at the shopping center to always run the slowest.

To Adafruit’s credit, their independent implementation by Scott Shawcroft doesn’t have this same quirk. That one divides the cycle time exactly into thirds, which seems eminently reasonable. A 3, 4, 3 configuration of the example code would work pretty well, too.
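A small comparison of the three splits, as a sketch. I am assuming each one is clocked so that a full bit still takes 1250 ns (800 kHz), and I am modeling “thirds” as a 1, 1, 1 split, which is my simplification and not a copy of Adafruit’s code.

```python
BIT_PERIOD_NS = 1250   # one bit at 800 kHz
T0H_NOMINAL_NS = 400   # datasheet target for the logical-0 high pulse


def t0h_ns(t1, t2, t3):
    """High-pulse width of a 0 bit when T1 + T2 + T3 cycles fill one bit period."""
    tick_ns = BIT_PERIOD_NS / (t1 + t2 + t3)
    return t1 * tick_ns


configs = {
    "2, 5, 3 (example code)": (2, 5, 3),
    "3, 4, 3": (3, 4, 3),
    "equal thirds (modeled as 1, 1, 1)": (1, 1, 1),
}

for name, (t1, t2, t3) in configs.items():
    pulse = t0h_ns(t1, t2, t3)
    offset = pulse - T0H_NOMINAL_NS
    print(f"{name}: T0H = {pulse:.0f} ns ({offset:+.0f} ns from nominal)")
```

The 2, 5, 3 split lands at 250 ns, a full 150 ns short of nominal. The 3, 4, 3 split gives 375 ns, and equal thirds gives roughly 417 ns, both comfortably inside the window.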

So where did the 2, 5, 3 values come from? I’d love to hear your theories!

For more ramblings about my creative projects, hop on over to the Micahcosmos.
