Malcolm Tredinnick asked me to translate into English my Friday post about our experience of launching a Django project under high load. So here it is; I hope it will be useful :-)
I believe some context is also required. Yandex is the biggest Russian search engine and service portal, and it started looking into building its services with Django last year. I work there as the lead architect of an interesting project called "Where everyone goes". It's a social network for people organizing to spend some leisure time together: movies, art galleries, concerts, clubs, etc. It is by far our biggest Django project at Yandex. Last Thursday we crashed under a stream of traffic coming from a teaser on our mothership's index page. Now read on for the rest of the "war story"!
As many attentive readers of this blog noticed :-), the morning after we launched the teaser our wonderful service showed a strange face and refused to work until the teaser was removed. So yesterday and today we were trying to figure out what went wrong, and it looks like we've succeeded :-). I'm now in the interesting mental state of wondering why everything was so bad and hopeless on Thursday when now it all seems so simple and obvious... However, only the next teaser can prove us right.
First of all, some numbers for the sake of statistics. Though, as you'll see later, they don't mean that much.
We're running on a cluster of 4 machines (CPU Xeon, 4 cores, 2.3 GHz), each with lighttpd, Django and memcached. Behind them is a single DB server running MySQL. We handled our traffic well through the night and most of the morning, at about 55 requests/second per host. Then, between 10 and 11 a.m., things started to go downhill as the traffic grew to more than 300 requests/second per host, after which we've... oh, well :-(
Throughout Thursday the many participants in the process had many different ideas about the cause. But by midday today one of them had crystallized, and now I'm pretty sure it was the main reason for such bad performance.
The problem was with sessions (not with sessions per se, but with the way we used them). These are the perfectly standard sessions that ship with Django. And Django's sessions are generally made smart: although they are stored in the DB, they don't touch it at all unless your application writes something into them. Unfortunately, we used them in an interesting manner. We stored one-off user messages there, shown once and then removed. For that we had this code in a context processor:
```python
messages = request.session['messages']
request.session['messages'] = []  # ← a killer line!!!
```
The second line clears the messages by effectively writing to the session. And since it sits in a context processor, it was executed on every request.
Well, not exactly on every request, but only on those coming from a new user. However, since we had a teaser on Yandex's index page, practically all of our users were new.
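For comparison, here is a sketch of a write-avoiding variant of such a context processor (the names are illustrative, not our exact code): the session is only written on the rare requests that actually have pending messages, so ordinary requests never hit the DB.

```python
# A context processor that reads one-off messages from the session but
# only writes back (and thus touches the DB) when there was actually
# something to clear. Illustrative sketch, not our production code.
def messages(request):
    msgs = request.session.get('messages', [])
    if msgs:
        # The session write happens only when messages existed,
        # i.e. almost never, instead of on every request.
        request.session['messages'] = []
    return {'messages': msgs}
```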
One may, nevertheless, suppose that even at a rate of 1200 requests/second MySQL on good hardware should handle the task. But two particular features of the session table didn't let that happen:
If I understand it right, InnoDB physically stores table rows in primary-key order (the table is clustered on its primary key). This is evidently why lookups by key are fast. In the usual case of an auto-incremented integer key this causes no problems: a new record is appended at the end, right where it belongs anyway. But our key is random, so each new record was inserted somewhere in the middle, forcing the table to be restructured. The table is stored in pages, and such "chaotic" writes lead to page splits and fragmentation, which slows writing down more and more. Our admins say that at the worst moments a single write to the session table took about 6-7 seconds!
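A toy model (not InnoDB itself, just the idea) shows why random keys hurt so much: with rows kept in key order, sequential keys always append at the end, while random keys almost always land in the middle, where existing rows have to be shifted or split.

```python
import bisect
import random

def middle_inserts(keys):
    """Count inserts that do NOT land at the current end of key-ordered
    storage, i.e. that would force moving/splitting existing rows."""
    data = []
    middle = 0
    for key in keys:
        if bisect.bisect(data, key) != len(data):
            middle += 1          # insert falls somewhere in the middle
        bisect.insort(data, key)
    return middle

# Auto-increment style keys: every insert is a cheap append.
print(middle_inserts(range(1000)))  # → 0

# Random keys (like session hashes): nearly every insert lands mid-table.
random.seed(1)
print(middle_inserts(random.sample(range(10**6), 1000)))
```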
Finally, at some point new requests started arriving faster than we could handle them. All attempts to revive the service were therefore futile: the further we went, the slower it ran. This is what I meant by saying that the performance numbers don't mean much: it's useless to measure speed with such an anchor attached.
So we took two lessons from this (surely pretty obvious to many):

- don't write to the database on (almost) every request;
- don't use a random value as the primary key of an InnoDB table that takes heavy writes.
By the way... The bitter irony of the story is that we don't actually use this messages subsystem. The service was killed by a feature that didn't exist.
We brought the service back to life by moving sessions into an in-memory table (and also by removing the teaser :-) ). But the in-memory table didn't let sessions live long either: it quickly outgrew its storage limit, and in the evening the service crashed again. So we moved sessions back to disk. And now we've simply gotten rid of that "killer line" altogether.
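Later Django versions can also keep session data out of the relational tables entirely by backing sessions with the cache. A sketch of the settings for a memcached setup like ours (the address and engine choice are illustrative):

```python
# settings.py sketch: store sessions in memcached instead of the DB.
# 'cached_db' would be the variant that keeps a DB copy for persistence.
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
```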
But the overloaded database let us see other problems. They aren't nearly as severe as the first one, but we'll fix them anyway, because sooner or later we'd run into them as our audience grows.
The problems we found were:
The "dog-pile" effect. Our index page consists mostly of cached blocks that are produced by relatively heavy queries. When a cache record goes stale, the next request triggers its regeneration. But until that regeneration completes, all subsequent requests execute the same heavy queries for the same regeneration. If the regeneration takes long enough and the requests are numerous enough, they create extra load that slows the regeneration down further, making the situation progressively worse.
We can't yet read from DB replicas, which would otherwise have relieved the master, busy as it was writing sessions, and would have let us hold out a little longer :-).
And one of the queries on the index page involved a join of four tables, one of which is the largest table in the database :-).
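The dog-pile effect described above is usually mitigated by making sure only one client regenerates a stale entry while everyone else keeps serving the stale value. Here is a sketch of that idea (not what we run in production) using an `add()`-based lock; the dictionary-backed cache class is just a stand-in for memcached-style `get`/`set`/`add`/`delete` semantics.

```python
import time

class DictCache:
    """Minimal in-process stand-in for a memcached-style cache."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        value, expires = self._data.get(key, (None, 0))
        return value if expires > time.time() else None
    def set(self, key, value, timeout):
        self._data[key] = (value, time.time() + timeout)
    def add(self, key, value, timeout):
        # Set only if absent; atomic in real memcached, which is what
        # makes it usable as a lock.
        if self.get(key) is None:
            self.set(key, value, timeout)
            return True
        return False
    def delete(self, key):
        self._data.pop(key, None)

cache = DictCache()

def cached(key, timeout, compute, grace=60):
    """Return a cached value, letting only one caller regenerate it.

    Entries are stored with a grace period past their soft expiry, so
    that while one caller recomputes, others serve the stale value
    instead of dog-piling onto the same heavy queries.
    """
    entry = cache.get(key)
    now = time.time()
    if entry is not None:
        value, soft_expiry = entry
        if now < soft_expiry:
            return value              # still fresh
        if not cache.add(key + ':lock', 1, 30):
            return value              # stale, but someone else is
                                      # already regenerating it
    # We hold the lock, or the cache was cold (a cold cache can still
    # stampede; a real implementation would lock here too).
    value = compute()
    cache.set(key, (value, now + timeout), timeout + grace)
    cache.delete(key + ':lock')
    return value
```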
Yesterday everything looked very bad: we were limited by the performance of database writes, which in the world of relational DBMSs means we can't scale out (i.e. by adding machines). We would have to rewrite things and artificially partition the database somehow, and that is hard and painful. But today, just by removing the session writes, we managed to turn the situation upside down. On a test stand of one frontend and one DB backend we can load the frontend up to a load average of 80 while the DB's load average stays at 2-2.5. That means the database is no longer our bottleneck: we can just add frontends, and I don't think we'll overload the database any time soon. We'll try to evaluate this more precisely on Tuesday, when we plan to stress-test a system of two frontends and one database.
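As for reading from replicas, the capability we were missing: Django later grew first-class support for this in the form of database routers. A hedged sketch of the idea (the alias names are illustrative and must match the `DATABASES` setting):

```python
import random

class ReplicaRouter:
    """Route reads to replicas and writes to the master.

    Illustrative sketch: 'replica1', 'replica2' and 'default' are
    assumed database aliases, not anything from our actual setup.
    """
    def db_for_read(self, model, **hints):
        return random.choice(['replica1', 'replica2'])

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return True  # all aliases point at the same data
```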
To address these second-tier problems we plan, in particular, to:

- prevent the dog-pile effect on the heavy cached blocks, so that a stale entry is regenerated by only one process;
- set up reading from DB replicas to take load off the master;
- optimize the index-page query that joins four tables.
Watch for new episodes!