Marcus: a bilingual blog

Back in January I quietly replaced the WordPress installation here with custom blog software. I didn't write anything about it then, partly out of laziness and partly because the software itself was quite banal.

However, one of the advantages of having a custom solution is the ability to implement new features in exactly the way you want them. So recently, after implementing support for bilingual content, I decided that it's now worth giving it a technical overview.

Let me clarify that this software isn't intended for general consumption. And though it is written in a stand-alone pluggable fashion, its feature set is not likely to fit most bloggers' needs.

Code

The code is available in a branch at Launchpad. It's not big, just about 860 lines, and in my opinion it is quite readable. What's interesting is that the bilingual stuff accounts for about 250 of those lines. Anyway, the blog itself is so small because many things are extracted into separate libraries:

pingdjack is used to send and accept pingbacks
scipio covers OpenID authentication in comments and implements anti-spam pipeline
subhub implements a personal PSHB hub which allows feed readers to receive and update my posts instantly (though some of them haven't caught up yet... Google Reader, I'm looking at you)

Features

The blog doesn't have much special in it. I've implemented everything I was actually using in WP, including some things covered by plugins. This is why nobody has noticed the switch: everything works the same. The only exceptions are small bugs that were fixed along the way: typography is no longer applied to code snippets, navigational links don't lose page numbers, etc.

All editing is done in the Django admin. The interface is radically simpler than the old WP-style "we-have-the-best-blogging-system-look-how-many-things-we-got" dashboard. The only interesting admin customization is a FilterSpec for fields of the "boolean time" type. Essentially these are ordinary DateTimeField(null=True) fields that work both as flags and as time values. A typical example is the "published" field of an article: while it is None the article is considered a draft, and once it gets a specific time value the article becomes published.
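The "boolean time" idea can be sketched without Django. This is only an illustration: the plain class below stands in for a model with a DateTimeField(null=True), and the names is_draft and publish are hypothetical, not taken from the blog's code.

```python
from datetime import datetime
from typing import Optional


class Article:
    """Plain-Python stand-in for a model with a nullable 'published' timestamp."""

    def __init__(self, published: Optional[datetime] = None):
        # None means the flag is off (draft); a datetime means "on since then".
        self.published = published

    @property
    def is_draft(self) -> bool:  # hypothetical helper
        return self.published is None

    def publish(self, when: Optional[datetime] = None) -> None:  # hypothetical helper
        # Setting the timestamp both flips the flag and records the moment.
        self.published = when or datetime.now()


draft = Article()
draft_before = draft.is_draft
draft.publish(datetime(2010, 5, 1, 12, 0))  # arbitrary example date
```

One field thus answers two questions: "is it published?" and "since when?".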

The hardest thing to implement was feeds. Though Django does support them out of the box, it doesn't do so in exactly the way I need. I plan to write another post about it; there are a couple of things to grumble about :-).

Bilingual internationalization

As the first principle of my bilingual design, I decided not to implement abstract universal support for arbitrary content in multiple languages. This is quite hard to implement and even harder to implement efficiently. Instead I've just framed my problem as two limited use cases and kept them in mind all the time:

The current audience of this blog can read Russian and, to a lesser extent, English, and should get posts in both languages with a preference for the Russian version.
The English-speaking audience generally has no use for Russian content, so there should be an English-only version of the blog.

Obviously not all readers fall into these categories, but since all the content is available anyway, I decided not to complicate either the code or the UI with support for corner cases.

Models

Models that work with bilingual content store it explicitly in separate fields for both languages:

class Article(models.Model):
    title_ru = models.CharField(max_length=255, blank=True)
    text_ru = models.TextField(blank=True)
    title_en = models.CharField(max_length=255, blank=True)
    text_en = models.TextField(blank=True)
    # ...

Granted, it wouldn't be very convenient to work with those fields directly, because that would require conditions all over the code checking which of the fields are available. This is addressed as follows.

A model has methods that return content depending on the language passed into them:

class Article(models.Model):

    def title(self, language=None):
        # return title_ru or title_en
        ...
    title.needs_language = True

    def html(self, language=None):
        # format text_ru or text_en
        ...
    html.needs_language = True

    def get_absolute_url(self, language=None):
        # generate URL depending on the language
        ...
    get_absolute_url.needs_language = True

    # etc.

I'll explain the implementation logic of these methods later. What's interesting now is the "needs_language" attribute, which is used by a specialized proxy, Translation. Its job is to implicitly pass the translation language into all methods that require it. The proxy is used to wrap all bilingual objects before they are used in Python or template code, which makes it possible to write just {{ article.title }} and have it automatically translated into an article.title(language) call.

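from django.utils.functional import curry  # gone from newer Django versions; functools.partial is the stdlib equivalent

class Translation(object):
    def __init__(self, obj, language):
        super(Translation, self).__init__()
        self.obj = obj
        self.language = language

    def __getattr__(self, name):
        # Delegate lookups of attributes the proxy itself doesn't have to the wrapped object
        attr = getattr(self.obj, name)
        # Methods marked with needs_language get the language bound as their first argument
        if getattr(attr, 'needs_language', None):
            attr = curry(attr, self.language)
        return attr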
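To see the proxy at work outside Django, here is a small self-contained sketch. It assumes functools.partial in place of curry, and its Article is a minimal stand-in with made-up titles, not the blog's real model.

```python
from functools import partial


class Article:
    """Minimal stand-in for the bilingual model (not the blog's real class)."""

    def __init__(self, title_ru="", title_en=""):
        self.title_ru = title_ru
        self.title_en = title_en

    def title(self, language=None):
        if language:
            return self.title_en if language == "en" else self.title_ru
        return self.title_ru or self.title_en
    title.needs_language = True


class Translation:
    def __init__(self, obj, language):
        self.obj = obj
        self.language = language

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself.
        attr = getattr(self.obj, name)
        if getattr(attr, "needs_language", None):
            # Bind the language so callers can invoke the method with no arguments.
            attr = partial(attr, self.language)
        return attr


article = Article(title_ru="Russian title", title_en="English title")
english = Translation(article, "en")
default = Translation(article, None)
```

Here english.title() needs no arguments, and plain attributes such as english.title_en pass through untouched. Django templates call callables without arguments when resolving a variable, which is why {{ article.title }} works on a wrapped object.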

To make it possible to "translate" objects (i.e. wrap them in proxies) right in template code, I've also made a "translate" filter. It is smart enough to accept single objects, flat sequences of objects and even trees represented as nested sequences:

{% with comment.article|translate:language as a %}
...
{% for cat in article.categories.all|translate:language %}
...
{% tree object_list|astree:"parent"|translate:language %}
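The filter's source isn't shown here, so the following is only a guess at its shape: a recursive function that wraps single objects and walks nested lists. The bare-bones Translation class and the list-based tree representation are assumptions made for this sketch. A real filter would also have to be registered with Django's template library and accept querysets.

```python
class Translation:
    """Bare-bones proxy: remembers the object and the language."""

    def __init__(self, obj, language):
        self.obj = obj
        self.language = language


def translate(value, language):
    # Lists and tuples are walked recursively, so nested trees keep their shape.
    if isinstance(value, (list, tuple)):
        return [translate(item, language) for item in value]
    # Anything else is treated as a single object and wrapped.
    return Translation(value, language)


single = translate("a", "en")
flat = translate(["a", "b"], "en")
tree = translate(["a", ["b", "c"]], None)
```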

URLs and language selection

Having divided the audience into two categories, I needed different URL schemes to address them. I ended up with a simple solution where URLs for English content got an "en/" part at the end. In this case Russian content isn't shown at all. Without the "en/", all the content is shown, with a preference for the Russian version when it's available. As a side effect I've also got working URLs ending with "ru/" which yield only Russian content. But they are not advertised anywhere in the UI.
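The URL scheme boils down to looking at the last path segment. The blog itself presumably does this in its Django URL patterns, which aren't shown in this post, so the split_language helper below is purely hypothetical and only illustrates the three possible outcomes.

```python
LANGUAGES = ("en", "ru")


def split_language(path):
    """Return (base_path, language) where language is 'en', 'ru' or None."""
    parts = path.strip("/").split("/")
    # Only a whole trailing segment counts, so "garden/" is not mistaken for "en/".
    if parts[-1] in LANGUAGES:
        return "/".join(parts[:-1]), parts[-1]
    return "/".join(parts), None


english_only = split_language("blog/category/en/")
russian_only = split_language("blog/category/ru/")
mixed = split_language("blog/category/")
```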

This, by the way, is where the decision to keep language-specific fields inside models instead of having them in separate tables turned out to be good: queries tend to be very simple. For example, querying the articles for the English-only version (those that have English text) is as simple as filtering them without any additional joins:

Article.objects.exclude(text_en='')

Such URLs provide three different values for the language in which a user requests the content: "en", "ru" and None. The content itself can also be available in three variants: in Russian, in English and in both languages (technically there's a fourth variant, when content isn't available in any language, but it's not of much practical interest :-) ). Therefore all language-aware methods should return the proper content depending on these two parameters.

The logic is not complicated but requires a bit of attention in corner cases. For example, the Article.title method looks like this:

def title(self, language=None):
    if language:
        return self.title_en if language == 'en' else self.title_ru
    else:
        return self.title_ru or self.title_en

When a caller requests an English title of a purely Russian article, it will get an empty string. I decided not to raise any special Exception in this case because this situation is prevented at the application level, where objects that don't have the required content are filtered out early.
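The combinations of requested language and available content can be checked with a small sketch. The Article class is a minimal stand-in, and the available function is a hypothetical version of the early filtering mentioned above, written over plain lists rather than querysets.

```python
class Article:
    """Minimal stand-in holding only the two title fields."""

    def __init__(self, title_ru="", title_en=""):
        self.title_ru = title_ru
        self.title_en = title_en

    def title(self, language=None):
        if language:
            return self.title_en if language == "en" else self.title_ru
        return self.title_ru or self.title_en


def available(articles, language):
    """Hypothetical early filter: drop articles lacking the requested language."""
    if language is None:
        return [a for a in articles if a.title_ru or a.title_en]
    return [a for a in articles if getattr(a, "title_" + language)]


articles = [
    Article(title_ru="RU only"),
    Article(title_en="EN only"),
    Article(title_ru="RU both", title_en="EN both"),
]

# Titles each audience would see after filtering.
titles = {
    language: [a.title(language) for a in available(articles, language)]
    for language in ("en", "ru", None)
}

# Without the filter, the corner case yields an empty string.
unfiltered = articles[0].title("en")
```

With the filter in place, the empty-string case never reaches the templates.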

Translation of user interface

All the infrastructure for translating the user interface is available in Django's internationalization system. However, the one thing I didn't use from it was the middleware that detects the language based on user settings. Since I get the language from URLs, I initialize translation manually in each view function that works with languages:

translation.activate(language or 'ru')

The or 'ru' part means that by default the language of user interface is switched to Russian.

Because of the concept of a default language I couldn't use Django's built-in LANGUAGE_CODE variable in templates, because it is always set to some particular language. Instead I pass my own language variable into templates wherever it's needed.

Some people may point out that repeating all these lines in several views violates DRY (shockingly!). And that's true. But here I deliberately resisted the temptation to write some generalized code because I suspect that all the fuss with middleware and context processors just won't be worth the benefit.

Name

I named this blog software "Marcus" after Marcus Antonius (orator). This is not the famous general who had an affair with Cleopatra. This one was known for his good memory, which allowed him to memorize thoroughly prepared speeches and deliver them in court so that they still appeared as if he spoke impromptu.

I think that this is the most suitable Ancient Roman name for a blog :-).

Comments: 6

dgl

I'm not a native speaker either, but here is what I found:

I decided that it's now worth giving it a technical overview. -> I decided that it's now worth giving a technical overview.

The blog doesn't have much special in it. -> There is nothing special about the blog.

navigational links don't loose page numbers, etc. -> loose - to release, to unchain; lose - to misplace, to be deprived of

when it gets a concrete time value -> when it gets a specific time value

grammatical and spelling errors! - grammatical and spelling mistakes!
Ivan Sagalaev

navigational links don't loose page numbers, etc. -> loose - to release, to unchain; lose - to misplace, to be deprived of

when it gets a concrete time value -> when it gets a specific time value

Fixed those (the first one is a typo). Thanks!

As for others I don't agree that they were incorrect, sorry :-)
Случайный прохожий
1. English version shows "Иван Сагалаев" instead of "Ivan Sagalaev" for comments.
2. "Обязательное поле" ("Required field") — error message for English version.
3. Legend image http://softwaremaniacs.org/media/style/markdown-legend.png in Russian.
Случайный прохожий

"Other topics" shows much more articles for English version then you actually have:

http://softwaremaniacs.org/blog/category/en/
Ivan Sagalaev
1. "Обязательное поле" ("Required field") — error message for English version.
2. "Other topics" shows much more articles for English version then you actually have:
Fixed these, thanks!

Legend image http://softwaremaniacs.org/media/style/markdown-legend.png in Russian.

This is a known thing. Gotta pull myself together to remake this both in Russian and in English.

English version shows "Иван Сагалаев" instead of "Ivan Sagalaev" for comments.

And this one is a much deeper problem than translation. User names are from another app and it's tricky.
Ivan Sagalaev

A while ago I reported on switching this blog to custom software named Marcus. Despite its source code being available in the open, I didn't intend to develop it into a full-blown project for two reasons: a) maintaining it would have taken much more time than I could afford and b) being completely anal about my own blog software I didn't want to piss off contributors by constantly rejecting all the features they would propose. Anyway, if someone felt so compelled they could take the code and start developing it on their own.