Dissecting fetchers

This is the first installment of my diaries on refactoring python3-openid. The post is turning out pretty big so may be I should try doing them more often.

Warm-up

I started with fixing failing tests, because you can't do refactoring without tests.

The root cause of errors was somewhere inside pycurl which seemed to refuse to accept custom headers for some reason. Instead of fixing that I decided to drop pycurl altogether in favor of the standard urllib. And luckily, it turned out that the module "fetchers" that does all HTTP work in python3-openid had in fact three separate implementations based on pycurl, urllib and httplib2 for good measure. So fixing this bug was a simple matter of switching to urllib unconditionally.

The next bug was from my favorite category: unicode vs bytes. The fetcher — basically, a wrapper making an HTTP request with some pre- and post-processing — was also trying to decode bytes received from a socket into a string and returning it. This is usually a bad idea because a caller has a better understanding of the nature of the data it requests and is in a better position to decide if and how it should be decoded, or parsed, or stored unmolested. In this particular case an XML parser (rightfully) refused to parse a decoded string with an XML encoding PI (<?xml .. encoding .. ?>) in it.

Removing early decoding from the fetcher has resulted in more broken test that were implicitly relying on the wrong behavior, so the rest of the diff is dedicated to adjusting the system to the new order of plain-bytes-from-source.

Fetchers

After spending some time in the fetchers module it only seemed natural to dissect it further.

First easy step was removing two extra fetcher implementations leaving only the urllib one and then ditching what appeared to be an abstract base class defining the "fetcher interface":

class HTTPFetcher(object):
    """
    This class is the interface for openid HTTP fetchers.  This
    interface is only important if you need to write a new fetcher for
    some reason.
    """

    def fetch(self, url, body=None, headers=None):
        """
        This performs an HTTP POST or GET, following redirects along
        the way. If a body is specified, then the request will be a
        POST. Otherwise, it will be a GET.


        @param headers: HTTP headers to include with the request
        @type headers: {str:str}

        @return: An object representing the server's HTTP response. If
            there are network or protocol errors, an exception will be
            raised. HTTP error responses, like 404 or 500, do not
            cause exceptions.

        @rtype: L{HTTPResponse}

        @raise Exception: Different implementations will raise
            different errors based on the underlying HTTP library.
        """
        raise NotImplementedError

I'm showing it here in full glory because this fragment is very representative of the code. It's a page worth of text that does absolutely nothing. For starters, we don't need to define class interfaces in Python thanks to duck typing. Then, this method's docstring is a lie because the method doesn't actually do anything it says. And also, polluting docstrings with this sort of formal markup is not only useless for documentation it also makes looking through code of the library really hard: you basically never have a logically complete piece of code before your eyes as it tends to be spread over pages and files intermingled with seas of plain text.

Custom exception

Next piece to remove was a custom exception that the fetcher was using instead of propagating exceptions from an underlying library. Masking out exceptions is a well-known anti-pattern because it doesn't make a system safer, it just makes errors less informative. And the implementation in this case is particularly noteworthy:

try:
    # fetch(...)
except (SystemExit, KeyboardInterrupt, MemoryError):
    raise
except Exception as why:
    raise HTTPFetchingError(why=why)

I tries to exclude some common non-HTTP exception from masking, but such a list is a maintenance nightmare and is never complete (what about ValueError, TypeError, RuntimeError?) Also, when trying to preserve the original exception in the general case it nonetheless loses the original traceback.

To be honest when there were three different fetcher implementations this idea might even have been defensible. But even then the implementation should have used a white list of HTTP-related exceptions and it should have preserved the original exception using raise ... from ....

Anyway, replacing the custom exception with the standard urllib.error.URLError lead to:

removal of the whole fetcher wrapper with the sole purpose of this exception masking,
removal of a bunch of tests dedicated to asserting correctness of raised exceptions,
removal of a parameter responsible for choosing a kind of fetcher to create — masking or non-masking.

That last item is especially interesting. Apparently you could create a non-wrapped fetcher. But why would you? Fetchers is a utility intended only for internal use within the library and the library didn't even use that ability. This is a good example of "accidental complexity" — when you write unnecessarily flexible code "just in case" and it ends up creating more complexity elsewhere.

Attempt to remove fetchers

If the only thing that a fetcher does is calling urlopen why do we need this wrapper at all? Indeed, my gut feeling from the beginning was that it might be possible to get rid of the whole module altogether.

It might have been tempting at this point to simply delete the thing, replace all calls to fetchers.fetch with urlopen and then fix all the tests. But the discipline of refactoring insisted on doing small incremental changes, so I obliged.

In a series of boring commits I turned the Urlllib2Fetcher class into a single function fetch() getting rid of the infrastructure supporting a singleton-like behavior for instances of that class.
Adopted a response model of urllib that raises all HTTP errors (status ≥ 400) as exceptions. The fetcher were returning them as regular responses which is not very convenient: you end up with two distinct error handling paths in your code, one for non-successful HTTP responses and the other for non-HTTP exceptions.
Cleaned up fetch() some more.
Killed custom HTTPResponse class in favor of returning the result of urlopen directly. This actually proved to be the most difficult change of the whole refactoring to date. There was a lot of subtle breakage because instead of having a response body as a string attribute callers now had to call .read() on a response and that works only once!

The last change also made me realize that by exposing raw read-able object I'm losing a useful feature of the fetcher: imposing a maximum limit on the amount of bytes read from the socket. Which, if dropped, can actually present a nice DDOS attack vector. I'm now figuring out a least intrusive way to introduce it back.

So ultimately I didn't succeed in killing fetchers module (at least, not yet). Apart from the temporary lost read limiting it also does a couple of other minor things like providing a custom User-agent header and hiding the verbosity of urllib when requesting POSTs.

The real lesson here is that doing refactoring gradually lets you deal with this kind of problems. If I tried to replace the whole thing at once I'd either had lost much functionality or would've been buried under a pile of broken code and had to reset all of it back.

Mocks and test generation

Tests in python3-openid are, to say the least, inconsistent. Applying mock objects, test discovery, data-driven test generation varies not only from module to module but from test to test. It looks like most of the (many) people who contributed test code didn't really tried to refactor it or even adhere to any particular style. Here's one particularly impressive change I made to a piece of testing code:

-            msg = 'Identity URL mismatch: actual = %r, expected = %r' % (
-                result.normalized_uri, expected.normalized_uri)
-            self.assertEqual(
-                expected.normalized_uri, result.normalized_uri, msg)
-
-            msg = 'Content mismatch: actual = %r, expected = %r' % (
-                result.response_text, expected.response_text)
-            self.assertEqual(
-                expected.response_text, result.response_text, msg)
-
-            expected_keys = dir(expected)
-            expected_keys.sort()
-            actual_keys = dir(result)
-            actual_keys.sort()
-            self.assertEqual(actual_keys, expected_keys)
-
-            for k in dir(expected):
-                if k.startswith('__') and k.endswith('__'):
-                    continue
-                exp_v = getattr(expected, k)
-                if isinstance(exp_v, types.MethodType):
-                    continue
-                act_v = getattr(result, k)
-                assert act_v == exp_v, (k, exp_v, act_v)
+            self.assertEqual(result.__dict__, expected.__dict__)

First two 4-line blocks of removed code do nothing more than testing equality of particular attributes of two objects that are later being compared attribute by attribute anyway. And that itself is done in a very elaborate manner.

The good thing is that I finally had a chance to learn about unittest.mock, particularly its versatile "patch" contraption (I can't really call it a "decorator", or a "function", or a "context manager" as it is in fact all of those things). It allowed for some nice code reductions.

I also invented my own class decorator for generating separate test methods for every set of test data. I have a strong suspicion that this problem is also already solved by many people before me and there's a better or more canonical way to do it. Any pointers?

Further plans

Before all else, I plan to keep on simplifying tests as fixing them again and again is the main time sink right now. I hope this will speed up future refactoring of the actual code.

I'm about to drop the examples directory from the library (and its tests) as I don't see any real value in them. They are big elaborate pieces of code that contain logic from many domains and they don't really help in understanding how to use the library. I think they just encourage mindless copy-pasting assisted by "forum driven development". (Also, having the whole of Django as a testing dependency is plain crazy!)

I'm also thinking about splitting the library into separate server and consumer and concentrating on the latter. Consumer is what most people need anyway and not having two different things in one library should result in a simpler API. What do you think?

Comments: 2

yuppy99.ya.ru

Description of refactoring process is rare genre. Impressive work.

About class decorator with set of data. pytest uses different approach to this problem with parametrizing functions: http://pytest.org/latest/parametrize.html#pytest-mark-parametrize-parametrizing-test-functions

I've also left some comments at GitHub.
Ivan Sagalaev

Thanks!

Looking at pytest parametrizing, I'd say it looks about the same with the obvious difference that it doesn't use classes which are a given in unittest. Thanks for the reference anyway.