Last Monday, at the Moscow web developer conference DevConf, I gave a master class where I presented a Python library for streaming XML generation. Having received the feedback I'd hoped for, today I'd like to formally present the project and its first release.

Summary

The name is a play on "ElementTree", hinting that "elementflow" doesn't create an actual tree.

Some existing XML-producing libraries (like ElementTree and lxml) build a whole XML tree in memory and then serialize it. This can be inefficient for moderately large XML payloads (think of a content-oriented Web service producing lots of XML output). On the other hand, Python's built-in xml.sax.saxutils.XMLGenerator is very low-level and requires closing elements by hand.
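
For contrast, here's what the tree-building approach looks like with the standard ElementTree; nothing reaches the output until the very last line:

import xml.etree.ElementTree as ET

root = ET.Element('contacts')
for i in range(40000):
    person = ET.SubElement(root, 'person', id=str(i))
    ET.SubElement(person, 'name').text = 'John & Smith'
# The whole 40000-person tree sits in memory until this point
ET.ElementTree(root).write(open('/dev/null', 'w'), encoding='utf-8')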

Also, most XML libraries, to be honest, suck when dealing with namespaces.

Usage

Basic example:

import elementflow
file = open('text.xml', 'w') # can be any object with .write() method

with elementflow.xml(file, u'root') as xml:
    xml.element(u'item', attrs={u'key': u'value'}, text=u'text')
    with xml.container(u'container', attrs={u'key': u'value'}):
        xml.text(u'text')
        xml.element(u'subelement', text=u'subelement text')

Using with is required to properly close container elements. The library expects unicode strings on input and produces utf-8 encoded output (you may omit the "u"s for pure ASCII strings if you want; Python will convert them automatically).
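
For the record, the example above produces output along these lines (a single line, since pretty-printing is off by default):

<?xml version="1.0" encoding="utf-8"?><root><item key="value">text</item><container key="value">text<subelement>subelement text</subelement></container></root>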

For further instructions on using namespaces, formatting output and turning the generating code into an iterator, refer to the README.
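
To give a taste of the iterator mode, here's a minimal sketch with a hand-rolled buffer; the Buffer class is purely illustrative (elementflow only needs an object with a .write() method), the README shows the proper way:

import elementflow

class Buffer(object):
    # Illustrative helper, not part of elementflow: collects written
    # chunks and hands them out on demand
    def __init__(self):
        self.chunks = []
        self.size = 0

    def write(self, data):
        self.chunks.append(data)
        self.size += len(data)

    def pop(self):
        data = ''.join(self.chunks)
        self.chunks, self.size = [], 0
        return data

def xml_chunks(items, bufsize=4096):
    buffer = Buffer()
    with elementflow.xml(buffer, u'root') as xml:
        for item in items:
            xml.element(u'item', text=item)
            if buffer.size >= bufsize:
                yield buffer.pop()
    yield buffer.pop()  # the tail, including the closing </root>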

Performance

All tests were performed using the following XML:

<contacts>
  <person id="0">
    <name>John &amp; Smith</name>
    <email>john.smith@megacorp.com</email>
    <phones>
      <phone type="work">123456</phone>
      <phone type="home">123456</phone>
    </phones>
  </person>

  <!-- repeat <person> .. </person> a couple of thousand times -->

</contacts>

The records were generated in a dumb loop, so no time was lost on data fetching. The times are purely for generating the result and sending it to /dev/null. The hardware specs of the testing machine don't matter; it's a fairly average desktop box.
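
Schematically, the elementflow version of the loop looked like this (a reconstruction using the API shown above, not the exact benchmark code):

import elementflow

def generate(out, count):
    with elementflow.xml(out, u'contacts') as xml:
        for i in range(count):
            with xml.container(u'person', attrs={u'id': unicode(i)}):
                xml.element(u'name', text=u'John & Smith')  # "&" gets escaped on output
                xml.element(u'email', text=u'john.smith@megacorp.com')
                with xml.container(u'phones'):
                    xml.element(u'phone', attrs={u'type': u'work'}, text=u'123456')
                    xml.element(u'phone', attrs={u'type': u'home'}, text=u'123456')

generate(open('/dev/null', 'w'), 40000)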

Here are the results of a single test producing 40000 records:

ElementTree     10.7 secs
cElementTree     6.2 secs
lxml.etree       0.9 secs
elementflow      2.3 secs

I'm really puzzled by the poor performance of ElementTree; I couldn't figure out why it is that slow, even the C version. The high speed of lxml was, on the contrary, quite expected. The test itself, however, is not very telling, because I was much more interested in exploring the behavior of a Web service generating XML for several concurrent connections.

During the master class I did two different tests, using Tornado and Django, in which I compared two models of generating XML:

- in-memory: the whole response is generated first and sent at once (the "/memory" URL in the logs below);
- streaming: the response is sent out in chunks as it's being generated ("/stream").

In both cases generation was done with elementflow. The Web services were load-tested with "siege" using a mix of two kinds of requests: 80% small and fast ones ("Hello, World!") and 20% producing big XML. The Tornado service was tested with 10 concurrent requests, the Django one with 20.

The code of both test servers is available in the project's Bazaar history.

Tornado

In-memory generation immediately revealed this nasty behavior:

HTTP/1.1 200  16.18 secs: 3548949 bytes ==> /memory?count=20000
HTTP/1.1 200  16.21 secs: 3548949 bytes ==> /memory?count=20000
HTTP/1.1 200  10.34 secs:      13 bytes ==> /
HTTP/1.1 200  16.25 secs: 3548949 bytes ==> /memory?count=20000
HTTP/1.1 200   9.32 secs:      13 bytes ==> /
HTTP/1.1 200   8.64 secs:      13 bytes ==> /
HTTP/1.1 200   9.42 secs:      13 bytes ==> /
HTTP/1.1 200   0.11 secs:      13 bytes ==> /
HTTP/1.1 200   0.18 secs:      13 bytes ==> /
HTTP/1.1 200   0.00 secs:      13 bytes ==> /

Those "hello-world" responses that are requested alongside heavy XML responses are forced to wait for them which results in very long response time for them. Under usual conditions they are returned almost instantly.

Streaming generation removes the effect altogether:

HTTP/1.1 200   3.69 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.69 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   0.02 secs:      13 bytes ==> /
HTTP/1.1 200   3.33 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.33 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.33 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.33 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   0.02 secs:      13 bytes ==> /
HTTP/1.1 200   0.02 secs:      13 bytes ==> /

XML responses are also generated faster.

Django

Django shouldn't suffer from the nasty effect shown by the single-threaded Tornado server because its processes don't interfere with each other. But it has a problem of its own: in-memory generation keeps too many forked processes busy at once, which greatly increases the machine's load average (LA) and slows it down significantly. This was apparent even when the load wasn't high enough to consume all the available memory.

Here's the report for in-memory generation:

HTTP/1.1 200   9.05 secs:    5620 bytes ==> /memory?count=20000
HTTP/1.1 200   9.39 secs:    5620 bytes ==> /memory?count=20000
HTTP/1.1 200   0.00 secs:      13 bytes ==> /
HTTP/1.1 200   0.00 secs:      13 bytes ==> /
HTTP/1.1 200   0.29 secs:      13 bytes ==> /
HTTP/1.1 200   0.00 secs:      13 bytes ==> /
HTTP/1.1 200  10.51 secs:    5620 bytes ==> /memory?count=20000
HTTP/1.1 200   0.00 secs:      13 bytes ==> /
HTTP/1.1 200   0.01 secs:      13 bytes ==> /

LA: 5.9; average speed: 6.81 rps

Streaming generation looks much better:

HTTP/1.1 200   3.83 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.84 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.86 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.86 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   3.98 secs: 1086426 bytes ==> /stream?count=20000&bufsize=4096
HTTP/1.1 200   0.48 secs:      13 bytes ==> /
HTTP/1.1 200   0.00 secs:      13 bytes ==> /
HTTP/1.1 200   0.00 secs:      13 bytes ==> /
HTTP/1.1 200   0.00 secs:      13 bytes ==> /

LA: 2.7; average speed: 13.18 rps
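
For reference, a streaming Django view boils down to feeding a generator to HttpResponse, which Django then streams chunk by chunk (a simplified sketch reusing the xml_chunks generator from the Usage section; the actual test code is in the repository):

from django.http import HttpResponse

def stream(request):
    count = int(request.GET.get('count', '10000'))
    bufsize = int(request.GET.get('bufsize', '4096'))
    items = (u'record %d' % i for i in xrange(count))  # stand-in data source
    # HttpResponse iterates over the generator, sending chunks as they're ready
    return HttpResponse(xml_chunks(items, bufsize), mimetype='text/xml')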

Conclusion

While not as fast as the famous lxml, I believe this library is quite useful in certain situations. In the near future I plan to start a branch for a Python 3 version. And I certainly hope to get more feedback and contributions.
