Back in January I quietly replaced the WordPress installation here with custom blog software. I didn't write anything about it then, partly out of laziness and partly because the software itself was quite banal.
However, one of the advantages of having a custom solution is the ability to implement new features exactly the way you want them. So recently, after implementing support for bilingual content, I decided that it now deserves a technical overview.
Let me clarify that this software isn't intended for general consumption. And though it is written in a stand-alone pluggable fashion, its feature set is not likely to fit most bloggers' needs.
Code
The code is available in a branch at Launchpad. It's not big, just about 860 lines, and in my opinion quite readable. What's interesting is that the bilingual stuff accounts for about 250 of those lines. The blog itself is so small because many things are extracted into separate libraries:
- pingdjack is used to send and accept pingbacks
- scipio covers OpenID authentication in comments and implements an anti-spam pipeline
- subhub implements a personal PSHB hub which allows feed readers to receive my posts and their updates instantly (though some of them haven't caught up yet... Google Reader, I'm looking at you)
Features
The blog doesn't have much that is special. I've implemented everything I was actually using in WP, including some things covered by plugins. This is why nobody noticed the switch: everything works the same, except for small bugs that were fixed along the way: typography is no longer applied to code snippets, navigational links don't lose page numbers, etc.
All editing is done in Django admin. The interface is radically simpler than the old WP-style "we-have-the-best-blogging-system-look-how-many-things-we-got" dashboard. The only interesting admin customization is a FilterSpec for fields of type "boolean time". Essentially they are ordinary DateTimeField(null=True) fields that work both as flags and as time values. A typical example is the "published" field of an article: while it is None the article is considered a draft, and once it gets a specific time value the article becomes published.
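To make the "boolean time" idea concrete, here is a minimal sketch in plain Python. The Article class below is only a stand-in for a model with a DateTimeField(null=True), and the is_draft and publish helpers are hypothetical names introduced for illustration, not part of the blog's code.

```python
from datetime import datetime
from typing import Optional


class Article:
    """Plain stand-in for a model with published = DateTimeField(null=True)."""

    def __init__(self, published: Optional[datetime] = None):
        # None means "draft"; any datetime means "published at that moment".
        self.published = published

    def is_draft(self) -> bool:
        # The flag reading of the field: is a time value present at all?
        return self.published is None

    def publish(self, when: datetime) -> None:
        # The time reading of the field: remember when it happened.
        self.published = when


article = Article()
was_draft = article.is_draft()
article.publish(datetime(2010, 5, 12, 10, 30))
```

With a real Django model the flag reading would typically map to the published__isnull lookup, so one column answers both "is it published?" and "when?".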
Feeds were the hardest thing to implement. Though Django does support them out of the box, it doesn't do so in exactly the way I need. I plan to write another post about it, as there are a couple of things to grumble about :-).
Bilingual internationalization
As the first principle of my bilingual design I decided not to implement abstract universal support for arbitrary content in multiple languages. This is quite hard to implement and even harder to implement efficiently. Instead I've just framed my problem as two limited use cases and kept them in mind all the time:
- The current audience of this blog can read Russian and, to a lesser extent, English, and should get posts in both languages with a preference for the Russian version.
- The English-speaking audience generally has no use for Russian content, so there should be an English-only version of the blog.
Obviously not all readers fall into these categories, but since all the content is available anyway, I decided not to complicate either the code or the UI with support for corner cases.
Models
Models that work with bilingual content store it explicitly in separate fields for both languages:
from django.db import models

class Article(models.Model):
    title_ru = models.CharField(max_length=255, blank=True)
    text_ru = models.TextField(blank=True)
    title_en = models.CharField(max_length=255, blank=True)
    text_en = models.TextField(blank=True)
    # ...
Granted, working with those fields directly wouldn't be very convenient because it would require conditions all over the code checking which of the fields are available. This is solved as follows.
A model has methods that return content depending on the language passed into them:
class Article(models.Model):
    def title(self, language=None):
        ...  # return title_ru or title_en
    title.needs_language = True

    def html(self, language=None):
        ...  # format text_ru or text_en
    html.needs_language = True

    def get_absolute_url(self, language=None):
        ...  # generate URL depending on the language
    get_absolute_url.needs_language = True

    # etc.
I'll explain the implementation logic of these methods later. What's interesting now is the attribute "needs_language", which is used by a specialized proxy, Translation. Its job is to implicitly pass the language of translation into all methods that require it. The proxy wraps all bilingual objects before they are used in Python or template code, which makes it possible to write just {{ article.title }} and have it automatically translated into an article.title(language) call.
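# curry is assumed to come from django.utils.functional
# (removed in Django 3.0; functools.partial is the modern equivalent)
from django.utils.functional import curry

class Translation(object):
    def __init__(self, obj, language):
        super(Translation, self).__init__()
        self.obj = obj
        self.language = language

    def __getattr__(self, name):
        # Delegate to the wrapped object; bind the language where it is needed
        attr = getattr(self.obj, name)
        if getattr(attr, 'needs_language', None):
            attr = curry(attr, self.language)
        return attr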
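The proxy's behaviour can be tried outside Django. The sketch below is a standalone approximation, not the blog's actual code: functools.partial stands in for curry, Article is a plain class rather than a model, and its title body is deliberately simplified so that only the language handoff is visible.

```python
from functools import partial


class Article:
    """Plain stand-in for the model (an assumption for this sketch)."""

    slug = "marcus"  # ordinary attribute that needs no language

    def title(self, language=None):
        # Simplified body: just report which language was passed in.
        return "title requested in %r" % (language,)
    title.needs_language = True


class Translation:
    def __init__(self, obj, language):
        self.obj = obj
        self.language = language

    def __getattr__(self, name):
        # Called only for names not found on the proxy itself.
        attr = getattr(self.obj, name)
        if getattr(attr, "needs_language", None):
            # Bind the language as the first positional argument.
            attr = partial(attr, self.language)
        return attr


english = Translation(Article(), "en")
default = Translation(Article(), None)

english_title = english.title()   # language supplied by the proxy
default_title = default.title()   # language is None here
plain_attribute = english.slug    # passed through untouched
```

Because Django templates call callables without arguments, {{ article.title }} on a wrapped object ends up invoking the bound call with the proxy's language already filled in.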
To make it possible to "translate" objects (i.e. wrap them in proxies) right in template code I've also made a filter "translate". It is smart enough to accept single objects, flat sequences of objects and even trees represented as nested sequences:
{% with comment.article|translate:language as a %}
...
{% for cat in article.categories.all|translate:language %}
...
{% tree object_list|astree:"parent"|translate:language %}
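A rough sketch of how such a filter might handle all three shapes is a short recursive function. Everything here is an assumption made for illustration: the function body, the choice to treat only lists and tuples as sequences (a real filter would also have to handle querysets), and the nested-list shape of the tree. The Translation class is reduced to a bare holder to keep the example self-contained.

```python
class Translation:
    """Bare stand-in for the proxy: remembers the object and the language."""

    def __init__(self, obj, language):
        self.obj = obj
        self.language = language


def translate(value, language):
    """Wrap a single object, a flat sequence, or a nested sequence (tree)."""
    if isinstance(value, (list, tuple)):
        # Recurse so that nesting of any depth keeps its shape.
        return [translate(item, language) for item in value]
    return Translation(value, language)


single = translate("article", "en")
flat = translate(["cat1", "cat2"], "en")
tree = translate(["root", ["child1", "child2"]], None)
```

In Django the function would be registered as a filter on a template.Library instance; that registration is left out here so the sketch runs without a configured project.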
URLs and language selection
Having divided the audience into two categories, I needed different URL schemes to address them. I ended up with a simple solution where URLs for English content get an "en/" part at the end. In this case Russian content isn't shown at all. Without the "en/" all the content is shown, with a preference for the Russian version when it's available. As a side effect I've also got working URLs ending with "ru/" which yield only Russian content, but they are not advertised anywhere in the UI.
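The suffix scheme can be illustrated with a small pure-Python helper. This is not the blog's URL configuration (which would normally live in a Django URLconf); split_language, the regular expression and the sample paths are all hypothetical.

```python
import re

# Optional trailing "en/" or "ru/" segment. The rest of the path must end
# with a slash so that a slug such as "garden/" is not mistaken for "en/".
LANGUAGE_SUFFIX = re.compile(r"^(?P<rest>.*?/)(?P<language>en|ru)/$")


def split_language(path):
    """Return (path without the language suffix, language or None)."""
    match = LANGUAGE_SUFFIX.match(path)
    if match:
        return match.group("rest"), match.group("language")
    return path, None


english = split_language("/2010/5/12/marcus/en/")
russian = split_language("/2010/5/12/marcus/ru/")
default = split_language("/2010/5/12/marcus/")
tricky = split_language("/garden/")
```

The three possible outcomes for the language ("en", "ru" and None) are exactly the values the views then pass on to the language-aware methods.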
This, by the way, is where the decision to keep language-specific fields inside models instead of in separate tables turned out to be a good one: queries tend to be very simple. For example, querying the articles that have English content is as simple as filtering them without any additional joins:
Article.objects.exclude(text_en='')
Such URLs provide three different values for the language in which a user requests the content: "en", "ru" and None. The content itself can also be available in three variants: in Russian, in English and in both languages (technically there's a fourth variant, when content isn't available in any language, but it's not of much practical interest :-) ). Therefore all language-aware methods should return the proper content depending on these two parameters.
The logic is not complicated but requires a bit of attention in corner cases. For example, the method Article.title looks like this:
def title(self, language=None):
    if language:
        return self.title_en if language == 'en' else self.title_ru
    else:
        return self.title_ru or self.title_en
When a caller requests an English title of a purely Russian article, it gets an empty string. I decided not to raise any special exception in this case because the situation is prevented at the application level, where objects that don't have the required content are filtered out early.
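To see every combination at once, the method can be exercised on a plain class. This is a sketch under assumptions: Article here is not a Django model, the sample titles are made up, and with_content is a hypothetical helper that merely imitates the early filtering by checking the title rather than the text fields.

```python
class Article:
    """Plain stand-in for the model, carrying only the two title fields."""

    def __init__(self, title_ru="", title_en=""):
        self.title_ru = title_ru
        self.title_en = title_en

    def title(self, language=None):
        if language:
            return self.title_en if language == "en" else self.title_ru
        return self.title_ru or self.title_en


def with_content(articles, language):
    """Hypothetical early filter: drop articles with nothing to show."""
    return [a for a in articles if a.title(language)]


russian_only = Article(title_ru="Заметка")
english_only = Article(title_en="Note")
both = Article(title_ru="Заметка", title_en="Note")

samples = [("ru", russian_only), ("en", english_only), ("both", both)]
results = {
    (name, language): article.title(language)
    for name, article in samples
    for language in ("ru", "en", None)
}
english_version = with_content([russian_only, english_only, both], "en")
```

The empty strings appear only in the two mismatched cells, (Russian-only article, "en") and (English-only article, "ru"), which are the cases the early filtering removes before any title is rendered.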
Translation of user interface
All the infrastructure for translating the user interface is available in Django's internationalization system. The one part of it I didn't use is the middleware that detects the language based on user settings. Since I get the language from URLs, I initialize translation manually in each view function that works with languages:
translation.activate(language or 'ru')
The or 'ru' part means that by default the language of the user interface is switched to Russian.
Because of the concept of a default language I couldn't use Django's built-in variable LANGUAGE_CODE in templates, since it is always set to some particular language. Instead I pass my own language variable into templates wherever it's needed.
Some people may point out that repeating all these lines in several views violates DRY (shockingly!). And that's true. But here I deliberately resisted the temptation to write some generalized code because I suspect that all the fuss with middleware and context processors just wouldn't be worth the benefit.
Name
I named this blog software "Marcus" after Marcus Antonius (orator). This is not the famous general who had an affair with Cleopatra. This one was known for his good memory, which allowed him to memorize thoroughly prepared speeches and deliver them in court as if he were speaking impromptu.
I think that this is the most suitable Ancient Roman name for a blog :-).