It's a funny thing: after neglecting your project for a year, you get a question about whether it's orphaned, and suddenly you find yourself hacking on it for a few days straight… Knowing that your work is needed and appreciated is the greatest motivator!
Originally, ijson was a ctypes wrapper around yajl, which some time ago reached its next major version, introducing incompatible API changes. One option was to simply switch ijson to the new yajl 2.x API, but I wanted to keep it working on current Ubuntu systems, which ship only with yajl 1.x. Instead, I refactored the library to have several backends so it can support both versions of yajl. The backend system has also neatly accommodated the experimental pure Python parser that used to live in a separate branch, lost and forgotten.
To use a specific backend, you import it explicitly:

```python
import ijson.backends.yajl as ijson

ijson.parse(...)
```
You can also still just do:

```python
import ijson
```

This should intelligently find the best backend for the current environment, trying "yajl2", "yajl" and "python" in that order. It isn't implemented yet, though, so a plain import ijson currently just defaults to yajl 1.x (and fails on Ubuntu 12.10 Beta, which ships yajl 2 by default).
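The fallback described above could look something like the following sketch. This is not ijson's actual code; the helper name, its signature, and the error message are all hypothetical:

```python
import importlib

def find_backend(candidates=('yajl2', 'yajl', 'python'), package='ijson.backends'):
    """Return the first backend module from `candidates` that imports cleanly.

    A sketch of the "try yajl2, then yajl, then python" fallback; the
    function and its parameters are illustrative, not ijson's real API.
    """
    for name in candidates:
        try:
            return importlib.import_module(package + '.' + name)
        except ImportError:
            continue  # this backend's C library (or module) is missing
    raise ImportError('no usable backend found among %r' % (candidates,))
```

The point of going through importlib rather than plain import statements is that the candidate list stays data, so adding a backend means adding a module, not editing the selection logic.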
Tweaking the old pure Python branch into a backend inspired me to rerun a performance test I did a year ago. Since this time I used a larger data sample and a modified test script, the results aren't directly comparable to the old ones. I was interested in one thing in particular: how the pure Python parser running under PyPy compares to the yajl-based parser running under CPython, the latter being the most obvious setup today.
A year ago they were on par. Now, running the same old code under the new PyPy 1.9 turns out to be significantly faster:
Then I spent some quality time with the parser:
- \u-encoded non-English text
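The gist of the \u-escape issue can be sketched like this. The helper below is a hypothetical illustration, not the parser's real code, and it deliberately ignores surrogate pairs (characters outside the BMP) for simplicity:

```python
import re

# Matches a JSON \uXXXX escape: a literal backslash-u followed by 4 hex digits.
ESCAPE_U = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_unicode_escapes(s):
    """Replace each \\uXXXX sequence in s with the character it encodes."""
    return ESCAPE_U.sub(lambda m: chr(int(m.group(1), 16)), s)

print(decode_unicode_escapes(r'caf\u00e9'))  # -> café
```

Getting this right matters precisely for non-English text, where practically every character of a string value arrives as such an escape.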
The main result is that the code is now much simpler, and under PyPy it grinds through those 20000 objects in almost half the time of the C library: