<h1>Maniacal weblog » Refactoring python-openid</h1>
<p>Maniac: Ivan Sagalaev on programming and web development. <a href="https://softwaremaniacs.org/blog/category/openid-refactor/">https://softwaremaniacs.org/blog/category/openid-refactor/</a>, updated 2014-09-26T21:53:59.143000-07:00.</p>
<h1>Refactoring discovery protocol</h1>
<p>2014-09-26T21:53:59.143000-07:00, <a href="https://softwaremaniacs.org/blog/2014/09/26/refactoring-discovery-protocol/">https://softwaremaniacs.org/blog/2014/09/26/refactoring-discovery-protocol/</a></p>
<p>It's been a while since my last update on the <a href="http://softwaremaniacs.org/blog/category/openid-refactor/en/">python3-openid refactoring</a>. Though I still work on it pretty actively, I totally failed at documenting the process as I planned in the beginning. So I came up with a new plan.</p>
<p><a name=more></a></p>
<h2>New plan</h2>
<p>First of all, I admit to gravely underestimating the sheer size of the task. The library is <strong>huge</strong>. And not only because of the enterprise-grade complexity painstakingly cultivated in the code but also because it contains a lot of things that have little to do with the actual OpenID protocol. Since I don't want to do it forever I decided to limit it in two ways:</p>
<ul>
<li>
<p>I plan to stop active refactoring work on November 1st and publish whatever I have at the moment with the new name. It should be usable (as it is from day one, modulo small bugs) and immensely easier to modify.</p>
</li>
<li>
<p>I want to get rid of some functionality. The whole <em>server</em> part is definitely going to be cut, so it'll be just a consumer, which is what everyone should need anyway. Probably something else too, I'm not sure yet.</p>
</li>
</ul>
<p>As for this little "diary", the library proved to be bad material for a refactoring tutorial: instead of providing a few good real-world examples it mostly repeats the same mistakes over and over (and over) again, and it doesn't make sense to write about them in detail. So I'll probably just write a summary at the end.</p>
<h2>Progress so far</h2>
<p>All this time I was working on OpenID discovery: the part that takes a URL from the user and figures out what to do with it. It was a very simple affair in version 1 of the protocol: just parse two HTML <code><link></code>s with hard-coded <code>rel</code> attributes and you're done. In OpenID 2 this functionality was extended, partly legitimately, but at some point all hell broke loose: discovery alone has sprouted not one but two(!) separate specifications, <a href="http://openid.net/specs/yadis-v1.0.pdf">Yadis</a> and <a href="http://docs.oasis-open.org/xri/2.0/specs/xri-resolution-V2.0.html">XRI</a>, both written with thoroughbred Enterprise™ spirit. Accordingly, the discovery mechanism in python-openid consisted of three generalized sub-libraries implementing everything about those specs.</p>
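<p>To show how little the version 1 discovery involves, here is a minimal sketch using only the standard library. The <code>rel</code> values <code>openid.server</code> and <code>openid.delegate</code> come from OpenID 1.x; the class and function names, as well as the sample page, are made up for this illustration and are not taken from python-openid.</p>

```python
from html.parser import HTMLParser

# rel values used by OpenID 1.x HTML-based discovery
OPENID1_RELS = ("openid.server", "openid.delegate")


class LinkCollector(HTMLParser):
    """Collects href values of <link> tags carrying OpenID 1 rel values."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens
        for rel in (attrs.get("rel") or "").lower().split():
            if rel in OPENID1_RELS and attrs.get("href"):
                # keep the first occurrence of each rel
                self.found.setdefault(rel, attrs["href"])


def discover_openid1(html):
    """Hypothetical helper: return {rel: href} for the OpenID 1 links."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.found


# Made-up sample page with placeholder URLs
page = """
<html><head>
<link rel="openid.server" href="http://provider.example/server">
<link rel="openid.delegate" href="http://provider.example/user">
</head><body></body></html>
"""
links = discover_openid1(page)
```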
<p>Here's a <em>high-level overview</em> of the discovery process on an HTTP URL in the original code, as best as I remember it:</p>
<ol>
<li>
<p>The URL content is fetched with <code>yadis.discover.discover</code> that does one or two HTTP requests depending on certain HTTP headers and wraps it into a result object along with some metadata (that is never used). The data is presumably an XRDS document describing OpenID services on that URL.</p>
</li>
<li>
<p>Data is handed over to a constructor method of a class representing a discovered service. I'll call it a "service class" from now on, though it is called <code>OpenIDServiceEndpoint</code> in the library.</p>
</li>
<li>
<p>The constructor is a one-liner calling a module-level function <code>extractServices</code> which is actually an import alias for <code>yadis.services.applyFilter</code> (don't ask why "extracting services" is the same thing as "applying a filter").</p>
<p>The class passes itself as the "filter" parameter to the function.</p>
</li>
<li>
<p>The "filter" parameter is not really a filter but something that can be turned into a filter. This something can be: </p>
<ul>
<li>a callable, </li>
<li>a class with a certain method,</li>
<li>another already constructed filter, or</li>
<li>a list of any of those things.</li>
</ul>
<p>With the help of quite a few <code>hasattr</code>, <code>isinstance</code> and the <a href="http://en.wikipedia.org/wiki/Composite_pattern">Composite design pattern</a> all of this is turned into a proper filter — another class with its own interface consisting of a single method.</p>
<p>Here's the deal though: of all the various argument types only one ever gets passed to the function in practice: a service class from step 3. So it effectively simply gets wrapped into a different kind of class-with-a-method.</p>
</li>
<li>
<p>The filtering function then parses the XRDS document (finally!) with <code>parseXRDS</code> from another module that returns a list of XML service elements that it passes back to the filter (the wrapper from step 4).</p>
</li>
<li>
<p>The filtering wrapper doesn't do any filtering though. First, it expands each XML service element into <em>several</em> objects of yet another type that are destined to finally become discovered services. This expansion happens because a service element may contain several service URIs. Here's the funny part: the Yadis spec doesn't require a client to treat it this way: you can assume that a service element represents one service and use any of the URIs you like. But who would miss the opportunity to write more code, right?</p>
</li>
<li>
<p>Okay, the service proto-objects are finally passed back into the <code>discover</code> module where the service class actually constructs itself from them, leaving out those whose type it doesn't recognize. Now <em>this</em> is what "filtering" actually is: ignoring service elements with non-OpenID types.</p>
</li>
</ol>
<p>The main hindrance in grasping all this for me was that the code relies heavily on <a href="http://en.wikipedia.org/wiki/Inversion_of_control">inversions of control</a>: functions accept callbacks disguised as "interfaces", and to track a single call stack you have to jump back and forth between a handful of modules. It's far from being unidirectional.</p>
<p>Anyway, I killed all of it.</p>
<p>The <a href="https://github.com/isagalaev/python3-openid/blob/bfd260923e9c9ead166f2eecf3f41dcc5e9e1e4a/openid/yadis/__init__.py"><code>yadis</code> module</a> now only does HTTP requests, taking care of HTTP headers. The <a href="https://github.com/isagalaev/python3-openid/blob/bfd260923e9c9ead166f2eecf3f41dcc5e9e1e4a/openid/xrds.py"><code>xrds</code> module</a> only parses XRDS, dealing with the ugliness of working with namespaces in ElementTree and providing basic services like sorting entries by priority and filtering.</p>
<p>The discovery process now looks like this (I omit XRI identifiers and HTML fallback):</p>
<ol>
<li>The URL content is fetched with <code>yadis.fetch_data</code>.</li>
<li>Data is parsed with <code>xrds.get_services</code> that returns a list of service elements corresponding to types passed in an argument.</li>
<li>XML elements are mapped into Service objects by <code>discover.parse_services</code>.</li>
</ol>
<p>No callbacks or meta-programming is involved at any point.</p>
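<p>Steps 2 and 3 above can be sketched roughly like this. The function names match the ones mentioned in the list, but their signatures and bodies here are assumptions made for illustration, not the code from the repository. The <code>Service</code> tuple and the sample document are also invented, and step 1 (the HTTP fetch) is replaced by an inline XRDS document.</p>

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

XRD_NS = "xri://$xrd*($v*2.0)"  # XRD 2.0 namespace
OPENID_2_0_TYPE = "http://specs.openid.net/auth/2.0/signon"

# Hypothetical plain data object for a discovered service
Service = namedtuple("Service", "types uri")


def q(tag):
    """Qualify a tag name with the XRD namespace for ElementTree."""
    return "{%s}%s" % (XRD_NS, tag)


def get_services(data, types):
    """Step 2: return Service elements having one of `types`, by priority."""
    root = ET.fromstring(data)
    elements = [
        el for el in root.iter(q("Service"))
        if any(t.text in types for t in el.findall(q("Type")))
    ]
    # A lower number means a higher priority; no attribute sorts last.
    return sorted(elements, key=lambda el: int(el.get("priority", 2**31)))


def parse_services(elements):
    """Step 3: map XML elements to plain Service objects."""
    return [
        Service(
            types=[t.text for t in el.findall(q("Type"))],
            uri=el.findtext(q("URI")),
        )
        for el in elements
    ]


# Step 1 stand-in: a made-up XRDS document instead of fetched data
XRDS = b"""<?xml version="1.0" encoding="UTF-8"?>
<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <Service priority="20">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>http://provider.example/low</URI>
    </Service>
    <Service priority="10">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>http://provider.example/high</URI>
    </Service>
    <Service>
      <Type>http://example.com/unrelated</Type>
      <URI>http://provider.example/other</URI>
    </Service>
  </XRD>
</xrds:XRDS>"""

services = parse_services(get_services(XRDS, [OPENID_2_0_TYPE]))
```

<p>Every call goes in one direction and returns plain data, so the whole flow can be read top to bottom.</p>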
<h2>What's next</h2>
<p>After having dealt with the discovery process I have now started working on <em>the</em> main thing: the <a href="https://github.com/isagalaev/python3-openid/blob/bfd260923e9c9ead166f2eecf3f41dcc5e9e1e4a/openid/consumer/consumer.py"><code>consumer</code></a>. It is also big, unwieldy, carpet-tested, blah-blah... nothing new. Let's see if I can manage to crack it by next Thursday.</p>
<h1>Carpet testing</h1>
<p>2014-08-07T21:23:24.078000-07:00, <a href="https://softwaremaniacs.org/blog/2014/08/07/carpet-testing/">https://softwaremaniacs.org/blog/2014/08/07/carpet-testing/</a></p>
<p><a href="http://en.wikipedia.org/wiki/Carpet_bombing">Carpet bombing</a> is a "large aerial bombing done in a progressive manner to inflict damage in every part of a selected area of land." Similarly, carpet <em>testing</em> is done by progressively tossing random data samples at your code without regard for its internal structure, hoping that a sufficient amount of data will eventually cover it all.</p>
<p>Or at least, this is the analogy that kept floating up in my mind while I was refactoring discovery testing in python-openid. This post is a part of <a href="http://softwaremaniacs.org/blog/category/openid-refactor/">the series</a>.</p>
<p><a name=more></a></p>
<p>Examples here are rather long to better show the point I want to make. But at the same time they aren't complicated and aren't supposed to be followed line by line anyway.</p>
<h2>Simple example</h2>
<pre><code>class TestIsOPIdentifier(unittest.TestCase):
    def setUp(self):
        self.endpoint = discover.OpenIDServiceEndpoint()

    def test_none(self):
        self.assertFalse(self.endpoint.isOPIdentifier())

    def test_openid1_0(self):
        self.endpoint.type_uris = [discover.OPENID_1_0_TYPE]
        self.assertFalse(self.endpoint.isOPIdentifier())

    def test_openid1_1(self):
        self.endpoint.type_uris = [discover.OPENID_1_1_TYPE]
        self.assertFalse(self.endpoint.isOPIdentifier())

    def test_openid2(self):
        self.endpoint.type_uris = [discover.OPENID_2_0_TYPE]
        self.assertFalse(self.endpoint.isOPIdentifier())

    def test_openid2OP(self):
        self.endpoint.type_uris = [discover.OPENID_IDP_2_0_TYPE]
        self.assertTrue(self.endpoint.isOPIdentifier())

    def test_multipleMissing(self):
        self.endpoint.type_uris = [discover.OPENID_2_0_TYPE,
                                   discover.OPENID_1_0_TYPE]
        self.assertFalse(self.endpoint.isOPIdentifier())

    def test_multiplePresent(self):
        self.endpoint.type_uris = [discover.OPENID_2_0_TYPE,
                                   discover.OPENID_1_0_TYPE,
                                   discover.OPENID_IDP_2_0_TYPE]
        self.assertTrue(self.endpoint.isOPIdentifier())
</code></pre>
<p>This whole test case is dedicated to this one method:</p>
<pre><code>def isOPIdentifier(self):
    return OPENID_IDP_2_0_TYPE in self.type_uris
</code></pre>
<p>That's right. Seven tests to test an <code>in</code> operation on a list. This might be justifiable for a "black box" testing of unknown code. But here it shouldn't be more complicated than testing just two states in a single test.</p>
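<p>For comparison, a single test covering both states might look like the sketch below. The <code>Endpoint</code> class is a stand-in written for this example so that it runs on its own. It is not the library's <code>OpenIDServiceEndpoint</code>.</p>

```python
import unittest

OPENID_2_0_TYPE = "http://specs.openid.net/auth/2.0/signon"
OPENID_IDP_2_0_TYPE = "http://specs.openid.net/auth/2.0/server"


class Endpoint:
    """Stand-in for the service class; holds only what the test needs."""

    def __init__(self, type_uris=()):
        self.type_uris = list(type_uris)

    def isOPIdentifier(self):
        return OPENID_IDP_2_0_TYPE in self.type_uris


class TestIsOPIdentifier(unittest.TestCase):
    def test_is_op_identifier(self):
        # State 1: the OP identifier type is absent
        self.assertFalse(Endpoint([OPENID_2_0_TYPE]).isOPIdentifier())
        # State 2: the OP identifier type is present among others
        self.assertTrue(
            Endpoint([OPENID_2_0_TYPE, OPENID_IDP_2_0_TYPE]).isOPIdentifier()
        )
```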
<h2>Generated example</h2>
<p>This code is more OpenID-specific but the problem is still the same. As soon as you start writing too much of not too different code you want to automate it and naturally come to <em>generated tests</em>:</p>
<pre><code>@gentests
class Discover(unittest.TestCase):
    data = [
        ("equiv", (True, "equiv", "equiv", "xrds")),
        ("header", (True, "header", "header", "xrds")),
        ("lowercase_header", (True, "lowercase_header", "lowercase_header", "xrds")),
        ("xrds", (True, "xrds", "xrds", "xrds")),
        ("xrds_ctparam", (True, "xrds_ctparam", "xrds_ctparam", "xrds_ctparam")),
        ("xrds_ctcase", (True, "xrds_ctcase", "xrds_ctcase", "xrds_ctcase")),
        ("xrds_html", (False, "xrds_html", "xrds_html", "xrds_html")),
        ("redir_equiv", (True, "redir_equiv", "equiv", "xrds")),
        ("redir_header", (True, "redir_header", "header", "xrds")),
        ("redir_xrds", (True, "redir_xrds", "xrds", "xrds")),
        ("redir_xrds_html", (False, "redir_xrds_html", "xrds_html", "xrds_html")),
        ("redir_redir_equiv", (True, "redir_redir_equiv", "equiv", "xrds")),
        ("404_server_response", (False, "404_server_response", None, None)),
        ("404_with_header", (False, "404_with_header", None, None)),
        ("404_with_meta", (False, "404_with_meta", None, None)),
        ("500_server_response", (False, "500_server_response", None, None)),
    ]

    @mock.patch('openid.fetchers.fetch', fetch)
    def _test(self, success, input_name, id_name, result_name):
        input_url, expected = discoverdata.generateResult(
            BASE_URL,
            input_name,
            id_name,
            result_name,
            success,
        )
        if expected is None:
            self.assertRaises(urllib.error.HTTPError, discover, input_url)
        else:
            result = discover(input_url)
            self.assertEqual(input_url, result.request_uri)
            self.assertEqual(result.__dict__, expected.__dict__)
</code></pre>
<p>Actual test methods are generated from <code>self.data</code>, each calling <code>self._test</code> with provided arguments. The test function compares a generated expected result with a result returned from a real <code>discover</code> function which is being tested.</p>
<p>It seems reasonable at first, but there are quite a few things wrong with it that <strong>nobody can see</strong>:</p>
<ul>
<li>
<p>For starters, there are two distinct code paths (for successes and failures) that share nothing in common. Yes, even the seemingly "common" call to <code>generateResult</code> actually <a href="https://github.com/isagalaev/python3-openid/blob/ae2a56ec2238d3b6de05124019a9bec80bf31d36/openid/test/discoverdata.py#L110">has its own <code>if</code></a> that returns a completely different result for failures.</p>
</li>
<li>
<p>Though these tests are grouped together, they in fact test very different parts of OpenID discovery process: request headers, response headers, HTML <code><meta></code> overrides, etc. But instead of being asserted directly those are implied by the success of the overall discovery process. Which makes debugging more complicated: instead of saying: "you don't recognize this content type", your test will say: "uhm... the discovery is broken somewhere".</p>
</li>
<li>
<p>A few of those tests are completely useless — guess which ones? Redirects. Redirects are never even exposed to the client code, they're handled by the HTTP library. So what we're testing here is our own HTTP mock, not the discovery process.</p>
</li>
<li>
<p>Some of those tests hide actual bugs. For example tests 3, 4 and 5 should test, in theory, that we recognize an XRDS-formatted response even with slight variations in its Content-type header. But they don't even call the method <code>isXRDS()</code> responsible for this; they implicitly compare the <code>content_type</code> attributes of the expected and the actual results that (and here's the best part) <em>are being generated from the same test sample</em>!</p>
</li>
</ul>
<p>This last bit, by the way, makes <em>all</em> of those tests rather useless. Probably out of a desire to reuse more code, both <code>generateResult()</code> and the mock <code>fetch()</code> ultimately read the same data file.</p>
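<p>A direct test of content type recognition avoids both problems: it asserts the one thing it is about, and its expected values are written by hand rather than derived from the same samples. The sketch below assumes a hypothetical helper <code>is_xrds()</code> that takes a header value. It is not the library's <code>isXRDS()</code> method.</p>

```python
import unittest

XRDS_CONTENT_TYPE = "application/xrds+xml"


def is_xrds(content_type):
    """Hypothetical helper: does a Content-Type value denote XRDS?"""
    # Drop parameters such as "; charset=UTF-8" and ignore case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == XRDS_CONTENT_TYPE


class TestContentType(unittest.TestCase):
    def test_is_xrds(self):
        # Expected values are spelled out here, independent of any sample file.
        cases = [
            ("application/xrds+xml", True),
            ("application/xrds+xml; charset=UTF-8", True),
            ("Application/XRDS+XML", True),
            ("text/html", False),
        ]
        for header, expected in cases:
            self.assertEqual(is_xrds(header), expected, header)
```

<p>When this fails, the message names the offending header value instead of saying that discovery is broken somewhere.</p>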
<h2>Killer example</h2>
<p>I won't even paste it here as it is too big.</p>
<p>It's a hierarchy of a base class and three descendants accompanied by two mock fetchers. Test methods assert many different things but use a single complicated generalized test function from the base. You probably want to have a look at this test function <a href="https://github.com/isagalaev/python3-openid/blob/ae2a56ec2238d3b6de05124019a9bec80bf31d36/openid/test/test_discover.py#L78">_checkService()</a> and at <em>The Boss</em> itself, the class called <a href="https://github.com/isagalaev/python3-openid/blob/ae2a56ec2238d3b6de05124019a9bec80bf31d36/openid/test/test_discover.py#L151">TestDiscovery</a>.</p>
<p>All tests work approximately the same way:</p>
<ul>
<li>call <code>discover</code> (with additional checks)</li>
<li>feed the result to <code>self._checkService</code> with a whole lot of control values that tell it which code path to take</li>
</ul>
<p>In addition to all the problems that I already mentioned it has another rather obvious one: it's <strong>big</strong>. Apparently, because of the many subtle differences in individual tests, the generalized testing code was becoming too complicated, which made it impossible to convert this test case into a generated one. (This is just my hypothesis.)</p>
<p class=strong><strong>Don't do carpet testing. It doesn't make tests "more complete" but makes it harder to reason about them, leading to hidden bugs.</strong></p>
<h2>Refactoring it</h2>
<p>BOOOOORING!!!! It took me several days and a lot of patience :-).</p>
<p>Deconstruction of such a thing may seem an insurmountable task at first. And I made a few false starts trying to do too much too soon.</p>
<p>My general approach is to start reading through the file looking for obvious, easy to fix code smells, like <a href="https://github.com/isagalaev/python3-openid/commit/7f13874778a3d01ece26939179dc8d4369848cf2">moving a repeated function call into one place</a> or <a href="https://github.com/isagalaev/python3-openid/commit/641937d5bb8647a4ba7d8b44cc109f2e8a41ce84">killing a class attribute that does the job of a local variable</a>. Dealing with those removes some code but more importantly gives you a better understanding of its scope and intentions.</p>
<p>Then you start noticing corner cases that differ from the general shape the most, <a href="https://github.com/isagalaev/python3-openid/commit/49eb05b98b9e013eb45a2101f2bdeee082cb59a3">remove</a> <a href="https://github.com/isagalaev/python3-openid/commit/3300a33e1b2f29ba54e30dfc574abe998588d05c">their</a> <a href="https://github.com/isagalaev/python3-openid/commit/35095a52c9a57ba342572cf341b4859d8e27c0ed">dependency</a> on the common parts and then <a href="https://github.com/isagalaev/python3-openid/commit/a2e26612a800be992b5486cc87fd6484dedc7eab">extract them</a>. This groups uniform bits of code together and removes code dealing with the differences.</p>
<p>And then at some magical point you suddenly realize that all similar looking code is completely identical and then you just remove all the repetitions <a href="https://github.com/isagalaev/python3-openid/commit/c3dd27f3cf9e26855acae3f3c520579073427a1a">in one big swoop</a>.</p>
<p>And never, ever try to "just rewrite" the <a href="https://github.com/isagalaev/python3-openid/compare/isagalaev:7f13874...204a764b">entire thing</a> from scratch!</p>
<h2>urlopen mock</h2>
<p>One thing that allowed me to kill a lot of custom mocking code is the <a href="https://github.com/isagalaev/python3-openid/blob/8e086303de55a2c7267833bdc0ee86b68a8a3c1e/openid/test/support.py#L129">generalized mock for <code>urlopen()</code></a> that I now use in all the tests. It can serve regular files from the designated test directory with the correct <code>Content-type</code> determined from extensions. You can also use query parameters to ask it for a specific status code or a response header:</p>
<pre><code>from urllib.parse import urlencode

query = {'status': 400, 'header': 'X-XRDS-Location: http://...'}
url = 'http://unittest/test-sample.html?' + urlencode(query)
</code></pre>
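<p>The idea behind such a mock can be sketched as follows. This is not the code from <code>support.py</code>: the function name <code>fake_urlopen</code>, the in-memory <code>FILES</code> dictionary standing in for the test directory, and the sample URLs are all assumptions made to keep the example self-contained.</p>

```python
import io
import mimetypes
import urllib.error
import urllib.response
from email.message import Message
from urllib.parse import urlsplit, parse_qs, urlencode

# In-memory stand-in for the directory of test samples (hypothetical)
FILES = {"/test-sample.html": b"<html></html>"}


def fake_urlopen(url, data=None):
    """Serve FILES by path; honor ?status=... and ?header=... parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    status = int(query.get("status", ["200"])[0])
    if parts.path not in FILES:
        status = 404

    headers = Message()
    # Content type is guessed from the file extension.
    guessed = mimetypes.guess_type(parts.path)[0]
    headers["Content-Type"] = guessed or "application/octet-stream"
    # Each "header" parameter looks like "Name: value".
    for line in query.get("header", []):
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()

    body = io.BytesIO(FILES.get(parts.path, b""))
    if status >= 400:
        # Mimic urllib, which raises HTTP errors as exceptions.
        raise urllib.error.HTTPError(url, status, "mock error", headers, body)
    return urllib.response.addinfourl(body, headers, url, status)


# Placeholder header value, chosen for this example only
query = {"header": "X-XRDS-Location: http://unittest/xrds.xml"}
response = fake_urlopen("http://unittest/test-sample.html?" + urlencode(query))
```

<p>In tests such a function would typically be put in place of the real <code>urlopen</code> with <code>mock.patch</code>.</p>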
<p>I wonder if someone has already done that before?</p>
<h1>Dissecting fetchers</h1>
<p>2014-08-07T21:00:46.270000-07:00, <a href="https://softwaremaniacs.org/blog/2014/07/20/dissecting-fetchers/">https://softwaremaniacs.org/blog/2014/07/20/dissecting-fetchers/</a></p>
<p>This is the first installment of my diaries on <a href="http://softwaremaniacs.org/blog/2014/07/15/python3-openid-fork/">refactoring python3-openid</a>. The post is turning out pretty big so maybe I should try doing them more often.</p>
<p><a name=more></a></p>
<h2>Warm-up</h2>
<p>I started with <a href="https://github.com/isagalaev/python3-openid/commit/020c1e137253a8a9b4dde947af3a635e2e5e16f5">fixing failing tests</a>, because you can't do refactoring without tests.</p>
<p>The root cause of errors was somewhere inside pycurl which seemed to refuse to accept custom headers for some reason. Instead of fixing that I decided to drop pycurl altogether in favor of the standard urllib. And luckily, it turned out that the module "fetchers" that does all HTTP work in python3-openid had in fact three separate implementations based on pycurl, urllib and httplib2 for good measure. So fixing this bug was a simple matter of switching to urllib unconditionally.</p>
<p>The next bug was from my favorite category: unicode vs bytes. The fetcher — basically, a wrapper making an HTTP request with some pre- and post-processing — was also trying to decode bytes received from a socket into a string and returning it. This is usually a bad idea because a caller has a better understanding of the nature of the data it requests and is in a better position to decide if and how it should be decoded, or parsed, or stored unmolested. In this particular case an XML parser (rightfully) refused to parse a decoded string with an XML encoding PI (<code><?xml .. encoding .. ?></code>) in it.</p>
<p>Removing early decoding from the fetcher resulted in more broken tests that were implicitly relying on the wrong behavior, so the rest of the diff is dedicated to adjusting the system to the new order of plain-bytes-from-source.</p>
<h2>Fetchers</h2>
<p>After spending some time in the fetchers module it only seemed natural to dissect it further.</p>
<p>The first easy step was <a href="https://github.com/isagalaev/python3-openid/commit/802bc92d9050809a52deae98f1e841f660823caa">removing two extra fetcher implementations</a>, leaving only the urllib one, and then <a href="https://github.com/isagalaev/python3-openid/commit/73ed24dddaac7bd800400c0dbae22a3a32a5f9e9">ditching</a> what appeared to be an abstract base class defining the "fetcher interface":</p>
<pre><code>class HTTPFetcher(object):
    """
    This class is the interface for openid HTTP fetchers. This
    interface is only important if you need to write a new fetcher for
    some reason.
    """

    def fetch(self, url, body=None, headers=None):
        """
        This performs an HTTP POST or GET, following redirects along
        the way. If a body is specified, then the request will be a
        POST. Otherwise, it will be a GET.

        @param headers: HTTP headers to include with the request
        @type headers: {str:str}

        @return: An object representing the server's HTTP response. If
            there are network or protocol errors, an exception will be
            raised. HTTP error responses, like 404 or 500, do not
            cause exceptions.

        @rtype: L{HTTPResponse}

        @raise Exception: Different implementations will raise
            different errors based on the underlying HTTP library.
        """
        raise NotImplementedError
</code></pre>
<p>I'm showing it here in full glory because this fragment is very representative of the code. It's a page worth of text that does <strong>absolutely nothing</strong>. For starters, we don't need to define class interfaces in Python thanks to duck typing. Then, this method's docstring is a lie because the method doesn't actually do anything it says. And also, polluting docstrings with this sort of formal markup is not only useless for documentation, it also makes looking through the code of the library really hard: you basically never have a logically complete piece of code before your eyes as it tends to be spread over pages and files intermingled with seas of plain text.</p>
<h2>Custom exception</h2>
<p>Next piece to remove was <a href="https://github.com/isagalaev/python3-openid/commit/8fb46f3c492febe50df759d9c5169b8a5a86f97e">a custom exception</a> that the fetcher was using instead of propagating exceptions from an underlying library. Masking out exceptions is a well-known anti-pattern because it doesn't make a system safer, it just makes errors less informative. And the implementation in this case is particularly noteworthy:</p>
<pre><code>try:
    # fetch(...)
except (SystemExit, KeyboardInterrupt, MemoryError):
    raise
except Exception as why:
    raise HTTPFetchingError(why=why)
</code></pre>
<p>It tries to exclude some common non-HTTP exceptions from masking, but such a list is a maintenance nightmare and is never complete (what about ValueError, TypeError, RuntimeError?). Also, while trying to preserve the original exception in the general case, it nonetheless loses the original traceback.</p>
<p>To be honest when there were three different fetcher implementations this idea might even have been defensible. But even then the implementation should have used a white list of HTTP-related exceptions and it should have preserved the original exception using <a href="http://legacy.python.org/dev/peps/pep-3134/"><code>raise ... from ...</code></a>.</p>
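<p>A version along those lines might look like the sketch below. The exact contents of the white list and the <code>opener</code> parameter are assumptions for the sake of a runnable example. Only the exception name is borrowed from the original code.</p>

```python
import urllib.error


class HTTPFetchingError(Exception):
    """Library-level error wrapping a transport failure."""


# Explicit white list: only these get wrapped, everything else propagates.
TRANSPORT_ERRORS = (urllib.error.URLError, ConnectionError, TimeoutError)


def fetch(url, opener):
    """Call opener(url); `opener` is a hypothetical stand-in for urlopen."""
    try:
        return opener(url)
    except TRANSPORT_ERRORS as error:
        # "from" stores the original exception, with its traceback,
        # in the __cause__ attribute of the new one.
        raise HTTPFetchingError("failed to fetch %s" % url) from error


def failing_opener(url):
    raise urllib.error.URLError("connection refused")


def buggy_opener(url):
    raise TypeError("a programming error, not a network one")


try:
    fetch("http://unittest/", failing_opener)
except HTTPFetchingError as error:
    cause = error.__cause__  # the original URLError is still reachable
```

<p>With a white list, a <code>TypeError</code> from <code>buggy_opener</code> is not masked and surfaces as is.</p>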
<p>Anyway, replacing the custom exception with the standard <code>urllib.error.URLError</code> led to:</p>
<ul>
<li>removal of the whole fetcher wrapper with the sole purpose of this exception masking,</li>
<li>removal of a bunch of tests dedicated to asserting correctness of raised exceptions,</li>
<li>removal of a parameter responsible for choosing a kind of fetcher to create — masking or non-masking.</li>
</ul>
<p>That last item is especially interesting. Apparently you <em>could</em> create a non-wrapped fetcher. But why would you? Fetchers is a utility intended only for internal use within the library and the library didn't even use that ability. This is a good example of "accidental complexity" — when you write unnecessarily flexible code "just in case" and it ends up creating <em>more</em> complexity elsewhere.</p>
<h2>Attempt to remove fetchers</h2>
<p>If the only thing that a fetcher does is calling urlopen why do we need this wrapper at all? Indeed, my gut feeling from the beginning was that it might be possible to get rid of the whole module altogether.</p>
<p>It might have been tempting at this point to simply delete the thing, replace all calls to fetchers.fetch with urlopen and then fix all the tests. But the discipline of refactoring insisted on doing small incremental changes, so I obliged.</p>
<ul>
<li>
<p>In a <a href="https://github.com/isagalaev/python3-openid/compare/isagalaev:4d6ef88...7a9e4b">series of boring commits</a> I turned the Urllib2Fetcher class into a single function fetch(), getting rid of the infrastructure supporting a singleton-like behavior for instances of that class.</p>
</li>
<li>
<p><a href="https://github.com/isagalaev/python3-openid/commit/e847c09106e1e36e01f71dd871c55e07ac536341">Adopted a response model of urllib</a> that raises all HTTP errors (status ≥ 400) as exceptions. The fetchers were returning them as regular responses, which is not very convenient: you end up with two distinct error handling paths in your code, one for non-successful HTTP responses and the other for non-HTTP exceptions.</p>
</li>
<li>
<p>Cleaned up fetch() <a href="https://github.com/isagalaev/python3-openid/commit/59c63d4083fb69a416e03c682a54b4b7418e0376">some</a> <a href="https://github.com/isagalaev/python3-openid/commit/9fb1ad348c14a84398747400a38b19227c87a0d1">more</a>.</p>
</li>
<li>
<p><a href="https://github.com/isagalaev/python3-openid/commit/3e9e5744626f8c90eb4d15be241b2eb2640ce07a">Killed custom HTTPResponse</a> class in favor of returning the result of urlopen directly. This actually proved to be the most difficult change of the whole refactoring to date. There was a lot of subtle breakage because instead of having a response body as a string attribute callers now had to call <code>.read()</code> on a response and that works <em>only once</em>!</p>
</li>
</ul>
<p>The last change also made me realize that by exposing a raw readable object I'm losing a useful feature of the fetcher: imposing a maximum limit on the number of bytes read from the socket. Dropping it can actually present a nice DDOS attack vector. I'm now figuring out the least intrusive way to introduce it back.</p>
<p>So ultimately I didn't succeed in killing the fetchers module (at least, not yet). Apart from the temporarily lost read limiting it also does a couple of other minor things like providing a custom User-agent header and hiding the verbosity of urllib when requesting POSTs.</p>
<p>The real lesson here is that doing refactoring gradually lets you deal with these kinds of problems. If I had tried to replace the whole thing at once I'd either have lost much functionality or been buried under a pile of broken code and had to reset all of it.</p>
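<p>One possible shape for such a limit is a small helper around <code>.read()</code>, sketched below. The names, the default limit and the exception class are assumptions, not what the library ended up with. The sketch also assumes that <code>read(n)</code> on the response returns fewer than <code>n</code> bytes only when the body has ended.</p>

```python
import io

MAX_RESPONSE_BYTES = 1024 * 1024  # assumed default, pick your own


class ResponseTooLarge(Exception):
    """Raised when a response body exceeds the allowed size."""


def read_limited(response, limit=MAX_RESPONSE_BYTES):
    """Read at most `limit` bytes from a file-like response."""
    # Ask for one byte more than allowed to detect an oversized body
    # without reading all of it.
    data = response.read(limit + 1)
    if len(data) > limit:
        raise ResponseTooLarge("response exceeds %d bytes" % limit)
    return data


# io.BytesIO stands in for the object returned by urlopen()
body = read_limited(io.BytesIO(b"small body"), limit=100)
```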
<h2>Mocks and test generation</h2>
<p>Tests in python3-openid are, to say the least, inconsistent. The use of mock objects, test discovery and data-driven test generation varies not only from module to module but from test to test. It looks like most of the (many) people who contributed test code didn't really try to refactor it or even adhere to any particular style. Here's one particularly impressive <a href="https://github.com/isagalaev/python3-openid/commit/d28a7f9c456062764af6a9e5381707b7a5093975">change</a> I made to a piece of testing code:</p>
<pre><code>-        msg = 'Identity URL mismatch: actual = %r, expected = %r' % (
-            result.normalized_uri, expected.normalized_uri)
-        self.assertEqual(
-            expected.normalized_uri, result.normalized_uri, msg)
-
-        msg = 'Content mismatch: actual = %r, expected = %r' % (
-            result.response_text, expected.response_text)
-        self.assertEqual(
-            expected.response_text, result.response_text, msg)
-
-        expected_keys = dir(expected)
-        expected_keys.sort()
-        actual_keys = dir(result)
-        actual_keys.sort()
-        self.assertEqual(actual_keys, expected_keys)
-
-        for k in dir(expected):
-            if k.startswith('__') and k.endswith('__'):
-                continue
-            exp_v = getattr(expected, k)
-            if isinstance(exp_v, types.MethodType):
-                continue
-            act_v = getattr(result, k)
-            assert act_v == exp_v, (k, exp_v, act_v)
+        self.assertEqual(result.__dict__, expected.__dict__)
</code></pre>
<p>First two 4-line blocks of removed code do nothing more than testing equality of particular attributes of two objects that are later being compared attribute by attribute anyway. And that itself is done in a very elaborate manner.</p>
<p>The good thing is that I finally had a chance to learn about <a href="https://docs.python.org/dev/library/unittest.mock-examples.html">unittest.mock</a>, particularly its versatile "patch" contraption (I can't really call it a "decorator", or a "function", or a "context manager" as it is in fact <em>all</em> of those things). It allowed for <a href="https://github.com/isagalaev/python3-openid/commit/9b55915fb2ac3858ebc17f287fabdf639c43a154">some</a> <a href="https://github.com/isagalaev/python3-openid/commit/b1fed82a90b29aaa11d5008db2fd0b5865e4ed08">nice</a> code reductions.</p>
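<p>To illustrate why "patch" is hard to name, here is the same replacement done three ways. The fake function and the tiny <code>fetch()</code> wrapper are invented for this example. The <code>mock.patch</code> calls themselves are standard library usage.</p>

```python
import urllib.request
from unittest import mock


def fake_urlopen(url, *args, **kwargs):
    return "fake response for " + url


def fetch(url):
    # The name is looked up on the module at call time,
    # so patching the module attribute takes effect here.
    return urllib.request.urlopen(url)


# 1. As a decorator
@mock.patch("urllib.request.urlopen", fake_urlopen)
def fetch_decorated():
    return fetch("http://unittest/a")


# 2. As a context manager
def fetch_in_context():
    with mock.patch("urllib.request.urlopen", fake_urlopen):
        return fetch("http://unittest/b")


# 3. As a plain object with explicit start() and stop()
def fetch_manually():
    patcher = mock.patch("urllib.request.urlopen", fake_urlopen)
    patcher.start()
    try:
        return fetch("http://unittest/c")
    finally:
        patcher.stop()
```

<p>In all three cases the original <code>urlopen</code> is restored afterwards, even if the code under test raises.</p>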
<p>I also invented my own <a href="https://github.com/isagalaev/python3-openid/blob/4f27c35f052bcb30258df8e86641551dd506c341/openid/test/support.py#L90">class decorator</a> for generating separate test methods for every set of test data. I have a strong suspicion that this problem has also already been solved by many people before me and there's a better or more canonical way to do it. Any pointers?</p>
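<p>The general idea of such a decorator can be sketched in a few lines. This is a guess at the shape of the thing, not the code from <code>support.py</code>: it assumes the class has a <code>data</code> list of <code>(name, args)</code> pairs and a <code>_test</code> method, as in the generated example from the carpet testing post. The <code>TestSquares</code> class is made up.</p>

```python
import unittest


def gentests(cls):
    """Add one test_<name> method per entry in cls.data (a sketch)."""
    for name, args in cls.data:
        # The default argument binds the current args to this function;
        # a plain closure would see only the last loop value.
        def test(self, args=args):
            self._test(*args)
        test.__name__ = "test_" + name
        setattr(cls, test.__name__, test)
    return cls


@gentests
class TestSquares(unittest.TestCase):
    data = [
        ("two", (2, 4)),
        ("three", (3, 9)),
    ]

    def _test(self, value, expected):
        self.assertEqual(value * value, expected)
```

<p>Each data entry then shows up as its own test in the runner's output. The standard library's <code>subTest</code> context manager, added in Python 3.4, covers a similar need without generating methods.</p>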
<h2>Further plans</h2>
<p>Before all else, I plan to keep on simplifying tests as fixing them again and again is the main time sink right now. I hope this will speed up future refactoring of the actual code.</p>
<p>I'm about to drop the <a href="https://github.com/isagalaev/python3-openid/tree/master/examples">examples</a> directory from the library (and its tests) as I don't see any real value in them. They are big elaborate pieces of code that contain logic from many domains and they don't really help in understanding how to use the library. I think they just encourage mindless copy-pasting assisted by "forum driven development". (Also, having the whole of Django as a testing dependency is plain crazy!)</p>
<p>I'm also thinking about splitting the library into separate server and consumer and concentrating on the latter. Consumer is what most people need anyway and not having two different things in one library should result in a simpler API. What do you think?</p>
<h1>OpenID library for Python 3</h1>
<p>2014-08-07T21:01:07.653000-07:00, <a href="https://softwaremaniacs.org/blog/2014/07/15/python3-openid-fork/">https://softwaremaniacs.org/blog/2014/07/15/python3-openid-fork/</a></p>
<p>Apparently, there is no obvious choice for an OpenID library in Python 3. In Python 2 there's <a href="https://github.com/openid/python-openid">python-openid</a> which, despite being an ugly over-engineered mess of code, <em>works</em> and conforms to the spec. Unfortunately, the same can't be said about its independent port to Python 3, <a href="https://github.com/necaris/python3-openid">python3-openid</a>, which doesn't.</p>
<p>I am now in the process of converting this site to Python 3 and OpenID is one of the technologies that I depend on. So for the past few days I was considering two alternatives: either 1) drop OpenID and replace the whole comment system with something else or 2) fix python3-openid.</p>
<p><a name=more></a></p>
<p>In fact, neither choice was leaving me completely at peace with my inner demons, so after all I decided to take it a step further. I'm going to not simply fix the library but to fork it and rewrite it into something much leaner, more maintainable, more idiomatic and with an API designed for humans.</p>
<p>Because, seriously, if what you have to <em>start with</em> is this:</p>
<pre><code>from openid.consumer.consumer import Consumer
</code></pre>
<p>… something is definitely wrong! Python deserves better.</p>
<p>Why not try to write a new library from scratch? Because aside from Python 3 related bugs the code is a rigorously tested and correct implementation of the spec, and if I don't screw up my refactoring too much I'll have a working implementation pretty early in the process. Besides, I just love refactoring more than writing new code. Yeah, weird, I know :-)</p>
<p>Here's the (as yet empty) fork: <a href="https://github.com/isagalaev/python3-openid">https://github.com/isagalaev/python3-openid</a></p>
<p>I'll try my best to document the process on this blog, both for educational purposes and for getting early feedback. First question: what would be a nice name for the library, one that doesn't resemble "python-openid", since it won't be compatible? Any ideas?</p>