Experiences with Zephyr

In mid November 2018 I’d gotten nrfcxx to the point where it met my first-level needs: low-power Nordic nRF5-based Bluetooth beacons providing sensor data at 1 Hz acquisitions from ambient and enclosed environments including HVAC systems.

nrfcxx has two significant weaknesses, though:

  • There’s no support for over-the-air firmware updates;
  • There’s no support for Bluetooth central/peripheral roles that would allow configuration at the device level (e.g. to provide calibration values).

Around this time I discovered the Zephyr Project, a real-time operating system that emerged from Intel’s Open Source Technology Center and was subsequently adopted by the Linux Foundation. Many of the major silicon providers for IoT applications are members of this project. Best of all, it included not only support for a boot loader, but also a complete Bluetooth stack contributed by Nordic Semiconductor which included support for Bluetooth Mesh.

So I decided to devote up to three months full-time effort to a deep dive into Zephyr, to see if it was a better path forward for me than nrfcxx.

I submitted my first patch 2018-11-18. Over the next three months I opened 45 issues and submitted 28 pull requests, of which 24 were merged.

Things got a little rocky from the start. I2C on Nordic didn’t work because the API specified a behavior that the driver didn’t support. The kernel command to wait for short periods (measured in microseconds) was horribly inaccurate on Nordic hardware. The system timer implementation was broken too, in an unrelated way.

All this in the first two weeks. This set the stage for the next ten weeks. Some things worked. More didn’t. Ultimately by early February I stopped actively contributing, primarily due to what I perceived as governance failures.

This is what exhausted my patience: Early on, after discussion with and general approval from participants in the weekly API telecon, I submitted a solution for gaps in GPIO configuration that added the features I needed without changing existing behavior. It took weeks and prodding before reviews came in; among several that were positive were a couple that could uncharitably be described as “I want it done this other way instead”. While both approaches could be justified, I wasn’t willing to relax the requirement for full backwards compatibility in the solution I provided, and the other parties weren’t willing to accept my work as a step towards a more significant rework by somebody else in the future. Deadlock. Two months after submission and ongoing back-and-forth it ended up in the Technical Steering Committee which chose to discard both proposals and revisit the issue for the next stable release.

Let me be clear: I don’t have a problem with that as a resolution. I do have a problem that the pull request sat for nine weeks, with ongoing discussion, and had to escalate to the highest decision-making body in the project before anybody could decide what should be done.

Linux has been a success in large part because it had a strong architectural lead who decided the bounds of acceptable technical solutions. Zephyr doesn’t have this: it depends on a contributor-based consensus model where merges require at least one approving review, no requests-for-change, and somebody who has commit privileges being interested enough to perform the merge. There is nobody with both the authority and the responsibility to ensure that architectural and process decisions are made in a timely manner.

There are other, related concerns. Several companies have employees who are paid to work on Zephyr, generally providing new features. The review process and lack of project-level architectural oversight has allowed solutions to be proposed, accepted, and merged entirely based on one perspective. There is little evidence of managed early coordination on core technologies like power management, so people can spend a lot of time working on a solution that meets their needs, before finding out that it’s completely unsuitable for other applications. This is exacerbated by the wide range of target platforms, from multi-core DSP engines and X86_64 processors all the way down to ARM Cortex-M1 devices. The needs of a mains-connected table-top voice-activated personal assistant are vastly different from those of a low-cost battery-powered temperature sensor with minimum five-year expected lifespan bolted into ductwork in the ceiling.

At the end I stepped away, having concluded that key Zephyr capabilities such as GPIO, I2C, SPI, timers, and sensors were functionally incomplete, unpleasant to use, or so abstracted their overhead made them unsuitable for use in ultra-low-power wireless sensors.

All that said: I really want to use Bluetooth mesh.

Zephyr is a “too big to fail” project: it’s got strong backing, and is around for the long haul. It really doesn’t have any viable competitors. I’m also a strong believer that, when there’s an existing solution to the problem you have, you need a rock solid reason why you won’t use it. And “I just don’t like it” isn’t good enough.

So, six weeks later I’m coming back to Zephyr. But the project as it stands still fails to meet my needs with core capabilities such as timers, GPIO, I2C, sensors, introspection of system status, and other functionality necessary for robust low-power sensors. I’ll submit PRs for small things, but I’m not motivated to speculatively contribute the more significant changes, so I’m doing work in my Zephyr fork.

Some of the patches on those branches are worth discussing, so as time permits I may describe them here, and maybe somebody will be interested enough to assist in getting them into Zephyr.