Refactoring discovery protocol

It's been a while since my last update on the python3-openid refactoring. Though I still work on it pretty actively, I totally failed at documenting the process as I planned in the beginning. So I came up with a new plan.

New plan

First of all, I admit to gravely underestimating the sheer size of the task. The library is huge. And not only because of the enterprise-grade complexity painstakingly cultivated in the code but also because it contains a lot of things that have little to do with the actual OpenID protocol. Since I don't want to do it forever I decided to limit it in two ways:

- I plan to stop active refactoring work on November 1st and publish whatever I have at the moment with the new name. It should be usable (as it is from day one, modulo small bugs) and immensely easier to modify.
- I want to get rid of some functionality. The whole server part is definitely going to be cut, so it'll be just a consumer, which is what everyone should need anyway. Probably something else too, I'm not sure yet.

As for this little "diary", the library proved to be bad material for a refactoring tutorial: instead of providing a few good real-world examples it mostly repeats all the same mistakes over and over (and over) again, and it doesn't make sense to write about them in detail. So I'll probably just write some summary at the end.

Progress so far

All this time I was working on OpenID discovery: the part that takes a URL from the user and figures out what to do with it. It was a very simple affair in version 1 of the protocol: just parse two HTML <link>s with hard-coded rel attributes and you're done. In OpenID 2 this functionality was extended, partly legitimately, but at some point all hell broke loose: discovery alone sprouted not one but two(!) separate specifications, Yadis and XRI, both written in thoroughbred Enterprise™ spirit. Accordingly, the discovery mechanism in python-openid consisted of three generalized sub-libraries implementing everything about those specs.
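
To show how small the version 1 case is, here is a minimal sketch using only the standard library. The rel values openid.server and openid.delegate are the ones defined by OpenID 1.x; the class and function names are made up for this example and are not part of python-openid.

```python
from html.parser import HTMLParser


class OpenID1LinkParser(HTMLParser):
    """Collects href values of <link> tags with OpenID 1 rel attributes."""

    RELS = ("openid.server", "openid.delegate")

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # A rel attribute may hold several space-separated values.
        for rel in (attrs.get("rel") or "").lower().split():
            # Keep the first occurrence of each rel value.
            if rel in self.RELS and href and rel not in self.found:
                self.found[rel] = href


def discover_openid1(html):
    """Return (server_url, delegate_url); either may be None."""
    parser = OpenID1LinkParser()
    parser.feed(html)
    return parser.found.get("openid.server"), parser.found.get("openid.delegate")


page = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="openid.server" href="https://provider.example/server">
  <link rel="openid.delegate" href="https://provider.example/user">
</head><body></body></html>
"""
print(discover_openid1(page))
```

The sketch does not restrict itself to the <head> element and does no URL validation, which a real consumer would want.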

Here's a high-level overview of the discovery process on an HTTP URL in the original code, as best as I remember it:

1. The URL content is fetched with yadis.discover.discover, which makes one or two HTTP requests depending on certain HTTP headers and wraps the response into a result object along with some metadata (that is never used). The data is presumably an XRDS document describing OpenID services on that URL.
2. The data is handed over to a constructor method of a class representing a discovered service. I'll call it a "service class" from now on, though it is called OpenIDServiceEndpoint in the library.
3. The constructor is a one-liner calling a module-level function extractServices, which is actually an import alias for yadis.services.applyFilter (don't ask why "extracting services" is the same thing as "applying a filter"). The class passes itself as the "filter" parameter to the function.
4. The "filter" parameter is not really a filter but something that can be turned into a filter. This something can be:
   - a callable,
   - a class with a certain method,
   - another already constructed filter, or
   - a list of any of those things.
   With the help of quite a few hasattr and isinstance checks and the Composite design pattern, all of this is turned into a proper filter: another class with its own interface consisting of a single method.
   Here's the deal though: of all the various argument types only one ever gets passed to the function in practice: a service class from step 3. So it effectively just gets wrapped into a different kind of class-with-a-method.
5. The filtering function then parses the XRDS document (finally!) with parseXRDS from another module, which returns a list of XML service elements that it passes back to the filter (the wrapper from step 4).
6. The filtering wrapper doesn't do any filtering though. First, it expands each XML service element into several objects of yet another type that are destined to finally become discovered services. This expansion happens because a service element may contain several service URIs. Here's the funny part: the Yadis spec doesn't require a client to treat it this way, you can assume that a service element represents one service and use any of the URIs you like. But who would miss the opportunity to write more code, right?
7. Okay, the service proto-objects are finally passed back into the discover module where the service class actually constructs itself from them, leaving out those whose type it doesn't recognize. Now this is what "filtering" actually is: ignoring service elements with non-OpenID types.
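
To give a feel for the "something that can be turned into a filter" step, here is a rough sketch of that kind of coercion code. It is an illustration of the pattern only, not the library's actual code: every name in it (make_filter, CallableFilter, CompoundFilter, from_element, get_service_endpoints) is hypothetical, and plain dicts stand in for XML service elements.

```python
class CallableFilter:
    """Wraps a plain callable into an object with a single method."""

    def __init__(self, func):
        self.func = func

    def get_service_endpoints(self, elements):
        # Apply the callable to every element and drop the rejected ones.
        return [r for r in map(self.func, elements) if r is not None]


class CompoundFilter:
    """Composite: runs several filters and concatenates their results."""

    def __init__(self, filters):
        self.filters = filters

    def get_service_endpoints(self, elements):
        results = []
        for f in self.filters:
            results.extend(f.get_service_endpoints(elements))
        return results


def make_filter(thing):
    """Turn 'something that can become a filter' into a filter."""
    if hasattr(thing, "get_service_endpoints"):
        return thing  # already a constructed filter
    if hasattr(thing, "from_element"):
        return CallableFilter(thing.from_element)  # class with a certain method
    if callable(thing):
        return CallableFilter(thing)
    if isinstance(thing, (list, tuple)):
        return CompoundFilter([make_filter(t) for t in thing])
    raise TypeError("cannot make a filter from %r" % (thing,))


class Endpoint:
    """Hypothetical stand-in for the service class."""

    def __init__(self, uri):
        self.uri = uri

    @classmethod
    def from_element(cls, element):
        return cls(element["uri"]) if element.get("type") == "openid" else None


flt = make_filter(Endpoint)
endpoints = flt.get_service_endpoints([
    {"type": "openid", "uri": "https://a.example/"},
    {"type": "other", "uri": "https://b.example/"},
])
print([e.uri for e in endpoints])
```

When only the second branch of make_filter is ever exercised, the whole apparatus amounts to calling Endpoint.from_element in a loop.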

The main hindrance in grasping all this for me was that the code relies heavily on inversion of control: functions accept callbacks disguised as "interfaces", and to track a single call stack you have to jump back and forth between a handful of modules. It's far from being unidirectional.

Anyway, I killed all of it.

The yadis module now only does HTTP requests, taking care of HTTP headers. The xrds module only parses XRDS, dealing with the ugliness of working with namespaces in ElementTree and providing basic services like sorting entries by priority and filtering.
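
As an illustration of the "one or two HTTP requests" logic, here is a minimal sketch of a Yadis-style fetch with urllib. It is not the refactored module's code: the signature and return value of fetch_data are assumptions, and xrds_location is a hypothetical helper. Only the X-XRDS-Location response header is handled; the equivalent <meta http-equiv> tag in HTML is left out.

```python
import urllib.request

XRDS_CONTENT_TYPE = "application/xrds+xml"
XRDS_LOCATION_HEADER = "X-XRDS-Location"


def xrds_location(url, headers):
    """Return the URL for a second request, or None if one request is enough.

    `headers` is any object with a .get() method, such as the .headers
    attribute of a urllib response.
    """
    content_type = (headers.get("Content-Type") or "").split(";")[0].strip().lower()
    if content_type == XRDS_CONTENT_TYPE:
        return None  # the first response already is the XRDS document
    location = headers.get(XRDS_LOCATION_HEADER)
    if location and location != url:
        return location
    return None


def fetch_data(url):
    """Fetch `url`, following X-XRDS-Location at most once; return the body."""
    request = urllib.request.Request(url, headers={"Accept": XRDS_CONTENT_TYPE})
    with urllib.request.urlopen(request) as response:
        body = response.read()
        location = xrds_location(url, response.headers)
    if location is None:
        return body
    request = urllib.request.Request(location, headers={"Accept": XRDS_CONTENT_TYPE})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Keeping the header decision in a separate pure function means it can be checked without any network access.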

The discovery process now looks like this (I omit XRI identifiers and HTML fallback):

1. The URL content is fetched with yadis.fetch_data.
2. Data is parsed with xrds.get_services that returns a list of service elements corresponding to types passed in an argument.
3. XML elements are mapped into Service objects by discover.parse_services.
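
The parsing and mapping steps can be sketched with plain functions over ElementTree. This is a guess at the shape, not the actual refactored code: the function names come from the list above, but their signatures, the Service fields and the sample document are assumptions. Each service element becomes exactly one Service, using its highest-priority URI.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace of the XRD elements, in ElementTree's {namespace}tag notation.
XRD = "{xri://$xrd*($v*2.0)}"
OPENID_2_TYPES = [
    "http://specs.openid.net/auth/2.0/server",
    "http://specs.openid.net/auth/2.0/signon",
]


@dataclass
class Service:
    types: list
    uri: str


def _priority(element):
    # Elements without a priority attribute sort last.
    value = element.get("priority")
    return int(value) if value is not None else float("inf")


def get_services(xrds_text, types):
    """Return <Service> elements having one of `types`, sorted by priority."""
    root = ET.fromstring(xrds_text)
    xrds = root.findall(XRD + "XRD")
    if not xrds:
        return []
    # Only the last XRD element in the document is consulted.
    services = [
        s for s in xrds[-1].findall(XRD + "Service")
        if any((t.text or "").strip() in types for t in s.findall(XRD + "Type"))
    ]
    return sorted(services, key=_priority)


def parse_services(elements):
    """Map each service element to one Service, using its best URI."""
    result = []
    for element in elements:
        uris = sorted(element.findall(XRD + "URI"), key=_priority)
        if uris:
            result.append(Service(
                types=[(t.text or "").strip() for t in element.findall(XRD + "Type")],
                uri=(uris[0].text or "").strip(),
            ))
    return result


DOC = """<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <Service priority="10">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>https://provider.example/fallback</URI>
    </Service>
    <Service priority="0">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI priority="2">https://provider.example/b</URI>
      <URI priority="1">https://provider.example/a</URI>
    </Service>
    <Service>
      <Type>http://example.com/unrelated</Type>
      <URI>https://provider.example/other</URI>
    </Service>
  </XRD>
</xrds:XRDS>"""

services = parse_services(get_services(DOC, OPENID_2_TYPES))
print([s.uri for s in services])
```

Data flows in one direction here: text in, elements out, then elements in, Service objects out.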

No callbacks or meta-programming is involved at any point.

What's next

After having dealt with the discovery process I have now started working on the main thing: the consumer. It is also big, unwieldy, carpet-tested, blah-blah... nothing new. Let's see if I can manage to crack it by next Thursday.
