Python 2/3 Compatible Source and PyXB

I started work on PyXB just over five years ago. At the time, Python 3.0 had just come out, but was far too new to hassle with, so I made Python 2.4 the minimum required version.

In September 2011 people started to hint they’d like Python 3 support, but it looked like it’d be an awful lot of work, and nobody asked officially, so I just kept it in the back of my mind. In June 2012 the noise was getting harder to ignore, so I logged the request but didn’t take it further.

Over the next year or so PyXB’s unicode support got stronger, and I started understanding exactly how much easier it’d be to do XML with a proper distinction between text (i.e., unicode) and data (i.e, octet sequences). Python 2 did this poorly, but the difference is deeply embedded in Python 3. In September 2013 I finally created a branch for Python 3 off the 1.2.3 release. This involved running 2to3 over the source then running a second script to fix the resulting errors. This was good enough to make available for folks who could build from the repository, but couldn’t support packaging a version because converting the source was too complex to run on an end-user’s machine.

While investigating an installation problem that ultimately turned out to be a bug in pip I discovered six. Six is a single module, released under the MIT license, that can be integrated into a Python package to allow the same source code to work under both Python 2 and Python 3. No more running 2to3. No more fixing up the mess 2to3 makes when it changes pyxb.utils.unicode to pyxb.utils.str.

As of today, the next branch of PyXB passes all tests using Python version 2.6 up through 3.4.0rc1 without source-code changes. Well, ok, some unit tests fail because whitespace in formatted XML changed in 2.7; the unittest.TestCase.assertRaisescontext manager feature isn’t handled in 2.6, 3.0, or 3.1; and I haven’t tested 3.0.1 because hacking its configure script so it can build a functional hashlib module on Ubuntu 12.04 isn’t worth the effort. Nonetheless, PyXB itself works fine.

There’s more work to be done. A packaged PyXB includes generated bindings for about 186 namespaces. When building from the repository those can be generated with the same Python that’ll be running them, so they might include Unicode literals which aren’t going to work across the gap where Python 3 didn’t support the unicode prefix ( u'text') until version 3.3. But the big hurdle has been overcome, and the next PyXB release should support all Python versions from 2.6 onward.

Editing XML schemas on Emacs with nXML

While looking into DocBook recently, I discovered that GNU emacs finally has a high-quality XML editing mode that includes validation of the documents. nXML mode is integrated into emacs23, and it comes with RELAX NG grammars to support docbook editing, though only for DocBook 4.2.

For work on PyXB, though, I really need something that handles XML Schema Definition (XSD) documents. emacs nXML doesn’t come with XSD support, but the RELAX NG homepage points to Jeni Tennison’s schema as a candidate.

This is a great start, but when I tried using it with xmllint from libxml2 to validate some schemas supported by PyXB it said they were invalid. There are a variety of subtle issues the original version didn’t quite get right (and a few cases where my example schema were wrong). I’ve updated the schema to fix those issues, and made it available on github.

emacs nXML comes with an XSLT RELAX NG schema, but only for version 1.0. As XSLT 3 is nearly complete at the time I’m writing this, I was hoping to find support to validate against other XSLT versions as well. Turns out Norman Walsh has provided a unified solution for XSLT 1.0, 2.0, and 3.0 on github.

So: To support XSD and XSLT editing with nXML in Emacs 23, I put this in my .emacs file:

I copied the original schema from the rng4xsd and xslt-relax-ng repositories, and used Trang to convert from the standard RELAX NG XML syntax to the compact syntax used by nXML. Then the following goes into ~/.emacs.d/nxml-schemas/schemas.xml:

Now when I write my XSD schemas in emacs, the mode line tells me when they’re invalid, and I can use C-c C-n to jump to the error, with the explanation placed in the message line. Very nice.

Ultimate Overloads

Generic algorithms in C++ operate by substituting specific types into templates that use features of the underlying types. Optimized implementations of algorithms can be selected when the type parameters satisfy certain constraints. A standard technique for this is overload selection via tag dispatching, but maybe you want a more abstract solution.

As usual, code examples are available in a github gist.

As shown previously, std::enable_if can be used to select an override if types are assignable. The condition can be arbitrarily complex; taking advantage of decltype and std::declval you could even check whether a particular member function can be called and will return a particular type:

The problem is that you need something to distinguish and prioritize the different overloads, so when multiple candidates are possible the best one is selected. In this case we used int and long. The blog post Remastered enable_if covers the options well, but the final word is in a followup post Beating overload resolution into submission which presents a solution with unbounded extensibility. It’s the one at the bottom of that post, if you don’t want to read the whole thing (though you should).

All the heavy lifting was done in those posts. I want to add a little value because some things about the solution leave me discomforted:

  1. There’s a magic number 10 used to ground the template recursion, based on an assumption that no more than 10 overloads are necessary;
  2. There’s a distinct type that’s used for the “none-of-the-above” situation, which in my view is non-orthogonal;
  3. The example doesn’t work with clang: it compiles without warning, but prints “fizzbuzz” for every N.

Let’s deal with the last one first. Each overload is of the form:

but it’s clear the underlying std::enable_if hidden inside the EnableIf isn’t doing its job.

I reduced this to LLVM bug 18677, which was promptly marked as a duplicate of LLVM bug 11723, showing that the bug has been present for about two years, so it’s clearly not a priority. In fact, it’s mentioned in the fine print of Remastered. The upshot is: using a variadic parameter pack to make signatures distinct without requiring an instance is a neat trick, but if you want to be portable to clang you can’t use it.

Fortunately, the solution in Beating shows us how to get an unbounded number of unique types that form an overload hierarchy without caring whether the template parameter is unique, so all we need do is change the canonical form back to the more traditional default template parameter:

This has the added benefit (IMO) of removing the custom template alias and directly using only constructs for which the meaning is well-defined and in the standard namespace.

The other two concerns are dealt with simply by inverting the priority indicated by the unsigned template parameter: let 0 be the lowest priority, which becomes what you use for the default case. To start the recursion, just keep track of the maximum overload count required for the particular function you need to support. Here’s the whole solution:

A trivial change, but this allows us to put the overload_weight template into a library and use it for multiple functions without having to change it if somebody adds more alternatives than were anticipated.

Diagnostics for Template Meta-Programming in C++

At C++Now 2012 Marshall Clow presented Generic Programming in C++: A Real World Example which addressed the addition of a hex/unhex pair of functions to Boost.Utility. A future post may address why I think the design for this specific feature took a wrong turn right at the start, but as a pedagogical example of intermediate C++ generic programming it’s worth viewing.

The design includes an algorithm which expects a template parameter to provide certain capabilities. The original solution used std::enable_if to disable the definition when those requirements were not met. Around 00:45:00, Stephan T. Lavavej pointed out that disabling unacceptable overloads with std::enable_if produces obscure errors uninterpretable by mortal users because the compiler won’t find a match, and that a cleaner solution is an outer function with a static_assert that invokes an inner function that implements the algorithm. After a very inconvenient interruption, comment from somebody I didn’t recognize at 00:47:40 pointed out that not all compilers terminate template expansion on the static_assert failure, so using this approach you get the static assert diagnostic followed by the no-matching-function diagnostics. The commenter went on to propose a workaround where the inner function takes a bool argument, constructed in the outer function from the std::enable_if calculation, which bypasses the body if the expansion is not valid. Unfortunately the audio is unintelligible and I can’t figure out what technique was being recommended (did he say “ mpl:bool_”, “ template bool”; is the flag a template parameter or a function parameter; …).

All that’s the topic of this post. You can get the full source for the examples at this github gist.

So let’s start with a simple example. Here’s a generic algorithm that assigns one value to another:

Here’s code that invokes it, but with types that don’t satisfy the expectations of the algorithm:

And here’s the noise that GCC 4.9.0 produces in response:

That’s not something I want my users to have to cope with. Sure, it says what the problem is, but there’s a lot of detail that’s just distracting, and it’d be a lot worse with more complex types in a more complex algorithm.

So: Assume we take the original approach from the talk and disable the generic algorithm when the types are not assignable:

What that produces is not really better:

The diagnostic is shorter, and somewhat helpful because the conditional is so simple, but still obscure and indirect.

What STL appeared to propose was to add a static assert which verifies the expectations of the parameter and emits a diagnostic when they aren’t satisfied, then delegates to the original version:

This is the same technique addressed in this blog post. And, just as the anonymous commenter in the video warned, the static assert failure didn’t prevent gcc from going on to produce the non-helpful cascading SFINAE errors:

Not better.

I don’t know what the unrecognized commenter intended as the solution, but my reconstruction is the following: put the static assert in the user-called function, then delegate to a hidden overloaded implementation that provides the working algorithm only when the constraints are met, and provides a stub with no errors when they aren’t:

Walking through, in line 21 we alias template_types_ok to a type that’s equivalent to either std::true_type or std::false_type depending on whether or not the algorithm requirements are satisfied by the type parameters. In line 22 we check the satisfiability at compile-time and provide a user-level description of any failed expectation. Then line 23 we use the type that represents the satisfiability to select an implementation that won’t have compile-time errors. That one of the implementations wouldn’t work at runtime is irrelevant because it’s selected only when the static assert prevents compilation from succeeding.

Here’s what this tells the user:

Now that’s what I want my users to see if they misuse my algorithms: a clear description of what they did wrong so they can fix things.