I've finally scraped together some time to finish and release a new version of ijson: 1.0.

New stuff:

Parsing improvements

I already posted about this in detail not long ago. To summarize: ijson now works with both existing major versions of YAJL (1.x and 2.x) and also has a pure Python parser, which is probably the fastest among pure Python JSON parsers and runs faster than YAJL under PyPy.
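With several backends around, a program usually wants the fastest one that is actually installed. One common way to get that is a fallback import. The sketch below is a minimal illustration under assumptions: the module names in CANDIDATES are hypothetical stand-ins for the YAJL 2, YAJL 1 and pure Python backends, and this is not necessarily how ijson itself picks a backend.

```python
import importlib

# Hypothetical backend module names, fastest first. These are assumptions
# for illustration; check the installed package for its real layout.
CANDIDATES = [
    "ijson.backends.yajl2",
    "ijson.backends.yajl",
    "ijson.backends.python",
]


def first_available(names):
    """Import and return the first module in `names` that imports cleanly."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Missing package or missing C library: try the next candidate.
            continue
    raise ImportError(
        "none of the candidate modules could be imported: %s" % ", ".join(names)
    )
```

Usage would be something like backend = first_available(CANDIDATES). Ordering the list fastest-first means the pure Python parser is only used when neither YAJL library is available; under PyPy, where the pure Python parser is the faster one, the order would make sense reversed.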

One interesting thing missing from that post is how to test all those parsing backends with the same set of tests.
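A common way to run one set of tests against several backends is a test mixin: the test methods live in a plain class that reads self.backend, and each backend gets a tiny unittest.TestCase subclass that only sets that attribute. The sketch assumes a hypothetical backend interface (a parse(f) function returning a Python value) and uses two stdlib-based stand-ins instead of ijson's real parsers; it shows the pattern, not ijson's actual tests.py.

```python
import io
import json
import unittest


class LoadBackend:
    """Stand-in backend #1 (hypothetical): parses a file object with json.load."""
    @staticmethod
    def parse(f):
        return json.load(f)


class DecoderBackend:
    """Stand-in backend #2 (hypothetical): uses JSONDecoder.raw_decode."""
    @staticmethod
    def parse(f):
        value, _end = json.JSONDecoder().raw_decode(f.read())
        return value


class ParseTests:
    """Backend-agnostic tests. Deliberately NOT a TestCase, so the runner
    only collects the concrete subclasses that bind a backend."""
    backend = None  # set by each concrete subclass

    def test_object(self):
        doc = io.StringIO('{"key": [1, 2.5, null, true]}')
        self.assertEqual(self.backend.parse(doc), {"key": [1, 2.5, None, True]})

    def test_string_escapes(self):
        doc = io.StringIO('"caf\\u00e9"')
        self.assertEqual(self.backend.parse(doc), "caf\u00e9")


class LoadBackendTests(ParseTests, unittest.TestCase):
    backend = LoadBackend


class DecoderBackendTests(ParseTests, unittest.TestCase):
    backend = DecoderBackend
```

Running the module with python -m unittest collects LoadBackendTests and DecoderBackendTests, and each runs every test in ParseTests against its own backend, so adding a backend costs two lines. Keeping ParseTests out of the TestCase hierarchy is what stops the runner from also running it with backend = None.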

Bilinguality

I believe we're already past the point where there's any question that all new Python code should work with Python 3. However, since I hadn't released anything for a while, this was actually my first encounter with the problem. Having looked at what others are doing (namely, requests and Django), I gathered that the original idea of a one-time code conversion using 2to3 doesn't work for libraries: you have to support both versions for some time. From what little I know, this can be done in basically two ways:

• keep a single codebase that runs unmodified on both Python 2 and Python 3;
• keep writing Python 2 code and convert it automatically with 2to3 at install time.

After pondering it for some time, I decided that I hate the first approach less. Ijson is not particularly big, and it's all temporary anyway. I hope that in a year or so, when most Python code has switched and Ubuntu ships Python 3 by default, I'll be able to drop 2.x support (and also use the new "yield from" syntax from Python 3.3!).

The whole bilinguality patch turned out to be quite manageable in size and consists mostly of bytes/unicode type conversions. All the helper functions are neatly collected in a single module, compat.py, an approach borrowed from Kenneth Reitz's requests.
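To make "bytes/unicode type conversions" concrete, here is a guess at the shape of such a compat module, modeled on the six library rather than on ijson's actual compat.py. The helper names b(), u() and s() are the wrappers mentioned in the comments; their bodies are assumptions.

```python
import sys

# Hypothetical compat module sketch; not ijson's real compat.py.
PY3 = sys.version_info[0] >= 3

if PY3:
    def b(text):
        """Byte-string literal helper, e.g. b('abc') == b'abc'."""
        return text.encode("latin-1")

    def u(text):
        """Text literal helper: plain literals are already text on 3.x."""
        return text

    def s(value):
        """Coerce to the native str type (text on 3.x)."""
        return value.decode("utf-8") if isinstance(value, bytes) else value
else:
    def b(text):
        # Plain literals are already bytes on 2.x.
        return text

    def u(text):
        # Decode a UTF-8 encoded 2.x literal into unicode.
        return text.decode("utf-8")

    def s(value):
        """Coerce to the native str type (bytes on 2.x)."""
        return value.encode("utf-8") if isinstance(value, unicode) else value  # noqa: F821
```

The cost of this style is that every wrapped literal goes through a function call at runtime, which is what the first comment on this post suggests eliminating.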

The good thing is that the compatibility code doesn't measurably affect performance. However, the pure Python parser is a bit (about 6%) slower under Python 3 than under Python 2 (which, as I understand it, is the usual story).

Comments: 2

  1. kmike.ru

    I think it's possible to remove the "b", "s" and "u" wrappers (they are IMHO ugly, and they have an overhead) by dropping Python 2.5 support:

    • add from __future__ import unicode_literals to the top of the file;
    • "b" becomes a b"bytes constant" literal;
    • "s" becomes str("constant") (by the way, are these even necessary?);
    • "u" is simply removed, because "" is unicode with the unicode_literals future import.
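    Applied to a module, the suggestion looks roughly like this. A minimal sketch, assuming Python 2.6 or later (both b"" literals and the unicode_literals import appeared in 2.6); the names are purely illustrative.

```python
from __future__ import unicode_literals  # must precede all other statements

# Illustrative names only. With the import above, plain literals are text
# on 2.6+ just as on 3.x, so the u("...") wrapper is no longer needed.
KEY = "name"          # was u("name"): text on both versions
RAW = b"\x00\xff"     # was b("..."): a real bytes literal on both
NATIVE = str("name")  # was s("name"): bytes on 2.x, text on 3.x


def is_text(value):
    """True for text values on both versions (unicode on 2.x, str on 3.x)."""
    return isinstance(value, type(""))
```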
  2. Ivan Sagalaev

    Indeed! I totally forgot about unicode_literals. I shall try that.

    And you're right about the s() function too. I started this refactoring with tests.py and ran into an issue with key names that had to be bytes in Python 2 and unicode in Python 3, while the yajl backend returned only bytes. So I worked around it with s(), which converts keys to the appropriate type in the tests. This is of course wrong, since I also fixed the backend itself to return the correct type in each case, so s() is no longer needed.

    Thanks a bunch!
