The biggest psychological impediment to my work on wireless sensor network frameworks is data. What do I collect? What do I do with it? In short, why do I even want a WSN? (If you can answer that, but want help making it happen, please get in touch through one of the options on my about page.)
Collecting data that’s never looked at is a waste of time, so good tool support for viewing and analyzing time-series data is important. The pieces underlying this are the databases and the graphs.
The traditional database approach is RRDTool. Been around forever, does the job very well.
I’m not in love with its API, which involves providing text representations of the observations through an argc/argv command-line interface even from within C, which on the surface introduces a lot of overhead. But “I just don’t like it” is an inadequate reason to reject a time-proven solution. In fact, a text-based API has certain benefits: I recently got a patch merged into RRDTool that significantly simplifies specification of archive retention, and any tool that passes its text arguments straight through can use the new syntax immediately. collectd can’t, because it stores the retention parameters as integer counts that it sorts and reformats into RRD arguments, and the new parameters are no longer integer counts.
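For reference, RRDTool’s classic archive definition is `RRA:CF:xff:steps:rows`, where `steps` is the number of primary data points consolidated into each row and `rows` is how many rows are kept. Turning “one-hour averages for a year” into that form is the arithmetic a front end like collectd has to do itself. A minimal sketch, assuming all durations are given in seconds; the helper name `rra` is made up for illustration:

```python
def rra(cf: str, base_step: int, resolution: int, retention: int,
        xff: float = 0.5) -> str:
    """Build a classic RRA definition from durations given in seconds.

    Hypothetical helper: base_step is the RRD's --step, resolution is the
    width of one consolidated row, retention is how long rows are kept.
    """
    if resolution % base_step or retention % resolution:
        raise ValueError("resolution must be a multiple of the step, "
                         "and retention a multiple of the resolution")
    steps = resolution // base_step  # primary data points per row
    rows = retention // resolution   # rows kept in the archive
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


# One-hour averages kept for a year, in an RRD updated every 10 seconds.
print(rra("AVERAGE", 10, 3600, 365 * 86400))  # RRA:AVERAGE:0.5:360:8760
```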
Where RRDTool falls down is in the display of the data. Well, really, in constructing the specification that’s used to define the graphs. The rrdgraph tool that comes with it actually makes some very powerful graphs, but it’s in desperate need of a GUI to help select among data sets, combine them, change the time bounds, etc. There’s cacti, but it wants to do data collection too, I don’t like kitchen-sink solutions, and it’s not as good as collectd. The days of the fat client are past; there are some web apps like Collectd Graph Panel that make it a bit easier, but not enough to really be nice to use.
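To show what “constructing the specification” means in practice: every rrdgraph graph is a list of `DEF` arguments that pull data sources into variables, followed by graph elements such as `LINE1` that draw them. A sketch that assembles those arguments for a list of series; the file paths, data-source names, and color palette are hypothetical:

```python
from typing import List, Sequence, Tuple

PALETTE = ["#FF0000", "#0000FF", "#00A000", "#FF8000"]


def rrdgraph_args(image: str, series: Sequence[Tuple[str, str, str]],
                  start: str = "-1d", cf: str = "AVERAGE") -> List[str]:
    """Arguments for `rrdtool graph`, one line per (rrd file, DS name, legend)."""
    args = ["graph", image, "--start", start]
    for i, (rrd_file, ds_name, legend) in enumerate(series):
        vname = f"v{i}"
        # DEF:<vname>=<rrdfile>:<ds-name>:<CF> loads one data source.
        args.append(f"DEF:{vname}={rrd_file}:{ds_name}:{cf}")
        # LINE1:<vname><#color>:<legend> draws it as a 1-pixel line.
        args.append(f"LINE1:{vname}{PALETTE[i % len(PALETTE)]}:{legend}")
    return args


args = rrdgraph_args("house.png", [
    ("/var/lib/rrd/ted5000.rrd", "watts", "Whole-house power"),
    ("/var/lib/rrd/climate.rrd", "temp", "Interior temperature"),
])
print(" ".join(args))
```

The list could be handed to `subprocess.run(["rrdtool", *args], check=True)`. Paths or legends containing colons would need escaping as `\:`, which is one of the details that make hand-built specs tedious and a GUI attractive.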
Or so I thought until I read this blog and found out about statsd, which led me to graphite.
This is what I’m talking about: dashboards with pre-configured graphs showing the information I want to see, automatically updated as the data comes in. A web application to dynamically construct graphs, adding and removing from all available data sets, applying transformations to each source in turn, etc.
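Graphite’s web front end also exposes graphs through a render URL, where each `target` can be a metric path or a metric path wrapped in transformation functions. A sketch of building such a URL; the host name is a placeholder, and the parameters shown (`target`, `from`, `width`, `height`, `format`) are only a subset of what the render API accepts:

```python
from typing import Sequence
from urllib.parse import urlencode


def render_url(base: str, targets: Sequence[str], start: str = "-24h",
               width: int = 800, height: int = 400) -> str:
    """Build a graphite-web /render URL with one target parameter per series."""
    params = [("target", t) for t in targets]
    params += [("from", start), ("width", str(width)),
               ("height", str(height)), ("format", "png")]
    # urlencode percent-escapes the parentheses and commas in function targets.
    return f"{base}/render?{urlencode(params)}"


url = render_url("http://graphite.example.com", [
    "collectd.server1.cpu0.load",
    "movingAverage(collectd.server1.cpu0.load,10)",
])
print(url)
```

Adding or removing a series, or wrapping one in another function, is just a change to the target list, which is what makes the interactive composer pleasant to use.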
Turns out the graphite project provides three layers:
- whisper and ceres are back-end databases for time series data, similar to RRDTool
- carbon is a daemon infrastructure that receives samples over a network connection and dispatches them to whisper or ceres based on a text name, which by convention encodes a hierarchy such as collectd.server1.cpu0.load.
- graphite-web is what generates the graphs; it can read data from whisper, ceres, and RRD databases.
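Carbon’s plaintext protocol is about as simple as a network protocol gets: one line per sample, `metric.path value timestamp`, sent over TCP (port 2003 by default). A minimal sketch of a sender, assuming a carbon-cache listening on localhost with the default plaintext port; `plaintext_line` and `send_samples` are illustrative names, not part of any graphite library:

```python
import socket
import time
from typing import Iterable, Optional, Tuple


def plaintext_line(path: str, value: float,
                   timestamp: Optional[int] = None) -> str:
    """Format one sample for carbon's plaintext protocol."""
    if any(ch.isspace() for ch in path):
        raise ValueError("metric path must not contain whitespace")
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_samples(samples: Iterable[Tuple[str, float, Optional[int]]],
                 host: str = "localhost", port: int = 2003) -> None:
    """Open a TCP connection to carbon and send the samples in one batch."""
    payload = "".join(plaintext_line(p, v, t) for p, v, t in samples)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


print(plaintext_line("collectd.server1.cpu0.load", 0.25, 1400000000), end="")
# collectd.server1.cpu0.load 0.25 1400000000
```

Passing `None` as the timestamp, as in `send_samples([("collectd.server1.cpu0.load", 0.25, None)])`, stamps the sample with the sender’s current time.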
It’s all written in Python, and graphite-web is a django application that runs as an Apache virtual host, so it’s pretty easy to configure on Ubuntu 12.04 or 14.04. Even getting django to run in Apache isn’t nearly as hard as everybody makes it out to be.
Graphite’s capabilities are awesome.
Carbon has some really neat architectural features including automated creation of databases as metrics arrive (with customized retentions based on the metric name), centralized aggregation of metrics, and daemons to perform meta-aggregation and relaying to other servers.
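The per-name retentions work by checking each new metric name against the patterns in carbon’s `storage-schemas.conf`, in order, and using the first one that matches. A sketch of that matching logic, not carbon’s implementation; the rules in `SCHEMAS` are hypothetical, and the parser only handles unit-suffixed durations like `10s` or `30d` (whisper also accepts bare numbers, which this sketch doesn’t handle):

```python
import re
from typing import List, Sequence, Tuple

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400,
                "w": 7 * 86400, "y": 365 * 86400}

# Hypothetical rules in the spirit of storage-schemas.conf: (name, regex, retentions).
SCHEMAS = [
    ("sensors", r"^sensors\.", "10s:1d,1m:30d,1h:5y"),
    ("default", r".*", "1m:1d"),
]


def to_seconds(spec: str) -> int:
    """'10s' -> 10, '30d' -> 2592000; only unit-suffixed values are handled."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(text: str) -> List[Tuple[int, int]]:
    """'10s:1d,1m:30d' -> [(seconds per point, number of points), ...]."""
    archives = []
    for archive in text.split(","):
        precision, keep = (to_seconds(s) for s in archive.strip().split(":"))
        archives.append((precision, keep // precision))
    return archives


def archives_for(metric: str,
                 schemas: Sequence[Tuple[str, str, str]] = SCHEMAS
                 ) -> Tuple[str, List[Tuple[int, int]]]:
    """Return the first schema whose pattern matches the metric name."""
    for name, pattern, retentions in schemas:
        if re.search(pattern, metric):
            return name, parse_retentions(retentions)
    raise LookupError(f"no storage schema matches {metric!r}")


print(archives_for("sensors.kitchen.temperature"))
# ('sensors', [(10, 8640), (60, 43200), (3600, 43800)])
print(archives_for("collectd.server1.cpu0.load"))
# ('default', [(60, 1440)])
```

Because the first match wins, the catch-all `.*` rule has to come last.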
Whisper is…inadequate.
Now whisper exists only because of a couple of issues that the graphite developers had with RRDTool. Paraphrasing the explanation:
- can’t back-fill data that wasn’t available in a timely manner. Here I have some sympathy. This limitation on RRDTool may explain why collectd’s architecture can’t handle plugins that record multiple observations within a sampling interval. Fixing this in RRDTool would be tough. Probably the only viable solution is to integrate the capability into rrdcached, and enhance rrdcached so it acts as a fetch server without first flushing everything to disk.
- not designed for irregular updates. In short, incoming data to RRD gets interpolated to align with the primary data point timestamps. If updates arrive further apart than the data source’s heartbeat, RRDTool can’t do that interpolation, and the interval is recorded as unknown, so the data is effectively dropped. I’m sympathetic to this issue, though it doesn’t affect my use cases as much as it does, say, StatsD and other system-monitoring applications.
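Both complaints can be made concrete with a toy model of the update rules. This is not RRDTool’s actual algorithm: it rejects any update that isn’t newer than the last one (so no back-fill), treats each GAUGE-style value as covering the time since the previous update, and marks that time unknown when the gap exceeds the heartbeat. It tracks individual seconds for simplicity and calls a whole step unknown if any second in it is; RRDTool’s real rules differ in detail.

```python
from typing import Dict, Optional


class ToyRRD:
    """Toy GAUGE data source: strictly increasing updates plus a heartbeat."""

    def __init__(self, start: int, step: int, heartbeat: int) -> None:
        self.step = step
        self.heartbeat = heartbeat
        self.last_update = start
        self.seconds: Dict[int, Optional[float]] = {}  # second -> value or unknown

    def update(self, timestamp: int, value: float) -> None:
        # No back-fill: anything at or before the last update is rejected.
        if timestamp <= self.last_update:
            raise ValueError(f"update at {timestamp} is not after "
                             f"last update {self.last_update}")
        # The value covers the time since the previous update, unless that
        # gap is longer than the heartbeat, in which case it is unknown.
        gap = timestamp - self.last_update
        fill = value if gap <= self.heartbeat else None
        for sec in range(self.last_update + 1, timestamp + 1):
            self.seconds[sec] = fill
        self.last_update = timestamp

    def pdp(self, step_end: int) -> Optional[float]:
        """Average over (step_end - step, step_end]; None if any second is unknown."""
        window = [self.seconds.get(s)
                  for s in range(step_end - self.step + 1, step_end + 1)]
        if any(v is None for v in window):
            return None
        return sum(window) / len(window)


rrd = ToyRRD(start=0, step=10, heartbeat=20)
rrd.update(5, 1.0)
rrd.update(10, 3.0)
print(rrd.pdp(10))    # 2.0, time-weighted across the two updates
rrd.update(50, 4.0)   # 40 s gap exceeds the 20 s heartbeat
print(rrd.pdp(20))    # None
try:
    rrd.update(45, 2.5)  # a late sample is rejected, not back-filled
except ValueError as err:
    print(err)
```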
On the other hand, whisper has a few problems of its own:
- Each file stores only one metric, which wastes storage when a sensor provides multiple metrics (e.g. temperature, humidity, pressure, wind direction and speed) and multiple consolidations (AVERAGE, MAX, MIN over various retentions).
- As a consequence, when aggregating for lower-resolution periods only one consolidation function is allowed (normally “average”). You lose the extreme values (such as daily highs and lows) unless you configure carbon to create separate databases for those. (Carbon does support this if carbon-aggregator is used, but that’s another daemon/point-of-failure.)
- In my case, sensors have a limited ability to store data locally, so if the off-sensor database stops functioning and I’m told about it in time, I can restart things and back-fill the missing material. This is exactly what nagios is for, but figuring out when the last update was received by a whisper database requires an O(n) search because the value isn’t stored in the database header, even though a related problem was a motivation for rejecting RRDTool!
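Two of these points are easy to see in toy form. `downsample` consolidates samples with a single function, as whisper does when rolling data into a lower-resolution archive, so averaging hides the daily high. `ToyWhisperArchive` is loosely modeled on whisper’s fixed-slot layout, not its real file format: a write lands in the slot for its interval, which makes back-fill trivial, but since nothing records the newest write, finding the last update means scanning every slot. Both names are made up for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def downsample(values: Sequence[float], per_bucket: int,
               cf: Callable[[Sequence[float]], float]) -> List[float]:
    """Consolidate each run of per_bucket samples with a single function."""
    return [cf(values[i:i + per_bucket])
            for i in range(0, len(values), per_bucket)]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


class ToyWhisperArchive:
    """Fixed number of (timestamp, value) slots; no last-update field."""

    def __init__(self, step: int, points: int) -> None:
        self.step = step
        self.points = points
        self.slots: List[Tuple[int, float]] = [(0, 0.0)] * points

    def update(self, timestamp: int, value: float) -> None:
        interval = timestamp - timestamp % self.step
        # An older timestamp just lands in its own slot, so back-fill works.
        # Toy only: no retention check; real whisper rejects points too old.
        self.slots[(interval // self.step) % self.points] = (interval, value)

    def last_update(self) -> Optional[int]:
        # Nothing in the "header" records the newest write: scan all slots, O(n).
        newest = max(ts for ts, _ in self.slots)
        return newest if newest > 0 else None  # 0 marks a never-written slot


temps = [10.0, 12.0, 30.0, 14.0, 8.0, 9.0, 11.0, 12.0]
print(downsample(temps, 4, mean))  # [16.5, 10.0]: the 30.0 high is gone
print(downsample(temps, 4, max))   # [30.0, 12.0]: needs a separate archive

archive = ToyWhisperArchive(step=60, points=100)
archive.update(600, 1.0)
archive.update(1205, 2.0)
archive.update(900, 1.5)       # back-filled after a newer sample
print(archive.last_update())   # 1200
```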
The biggest problem with carbon, and the graphite project as a whole, is lack of leadership and active management. Carbon has over two hundred open issues and pull requests, some of which have apparently been addressed but left open. There was a major rework called “megacarbon” but that’s been dormant for six months. There are incompatible changes being made on the 0.9.x maintenance branch relative to master. whisper is supposed to be superseded by ceres, but requests for information on project status and schedule are left unanswered for months. If you noticed that the link I gave above for graphite points to an outdated page, it’s because the new one doesn’t have the FAQ or any of the pieces that tell people why they should even care about the project.
This is an unfortunate but common failing with open source projects, where nobody’s compensated for their effort and maintenance naturally drops when the original developer can only respond “it works for me” or “I don’t even use that software anymore”.
Regardless of all that, RRDTool is a robust tool that’s worked for years and continues to be maintained: a dozen enhancement patches I submitted were promptly integrated for the next release. As an open source solution for web access to display real-time data I don’t think graphite-web would survive a serious challenge from an actively-managed alternative, but it works well enough that there’s really no motivation to develop a competitor.
I’ve set up permanently running rrdtool, graphite, and collectd systems on both my stable and development servers. They’re already recording whole-house power consumption at 1 Hz from a TED 5000, interior temperature and humidity from a daemon running on a Raspberry Pi (stored in RRD databases because I care about that data), and collectd statistics from internal hosts (stored in whisper databases because it’s easier to customize the retention periods and I don’t care about that data).
Since I finally have a way to visualize the data, and I have all these wireless microcontrollers and sensors, it’s probably time to start collecting more.