SM.Org software update 2012

Over the course of a few recent weeks I updated this site to a more modern software and revised some previously made choices. This one was loooong overdue considering that I still ran Ubuntu 9.10 before the update meaning that the system was almost 3 years old.

Here are some mostly useless but probably fascinating notes about it.

Core system upgrades

Running a comparatively low-load site I had a luxury not to plan for complex procedures minimizing downtime. What I did was just SSHing on the host and running do-release-upgrade under "screen" five times in a row. Linode's upgrade docs were holding my hand during the process.

Currently the site runs on Ubuntu 12.04 "Precise" and 3.0 Linux kernel. It's good to have modern packages!

Notable moments (read: long downtimes) during the upgrades:

Ubuntu's integrated mail stack was good back then and these days it's just wonderful! Upon installing a single package "mail-stack-delivery" you get yourself a completely set up mail server that can send mail, accept mail from outside and that provides an IMAP interface to your mailbox. It even does necessary SSL key generation magic for me, so I can do authorization from my mail client securely.
What I struggled with during all the mail upgrades was Yandex' anti-spam software that relies on manual creative editing of /etc/postfix/master.cf. I still use it simply because it works but I should probably look for some more actively supported solutions eventually.
Postgres updated to 9.x. Database conversion went without a hitch. The only problem was that now Ubuntu seems to be not as conservative in memory settings as it was before so Postgres refused to run due to low memory on my machine (756M) until I reduced the amount of "buffers" (I have no idea what it is). Seems to run pretty smooth so far :-).
The default Python is now 2.7. The change reminded me harshly that I was stupid enough to install some non-debianized Python packages into the system directory of which Ubuntu hadn't a faintest idea. As a result new Python didn't see them. I'm still trying to figure out how best to solve this (more on that later).
The longest downtime was caused by upgrading Django to the newest trunk version. I neglected it for the time long enough for many DeprecationWarnings to become exceptions. Spent some quality time refreshing my code base. One thing remaining unresolved is that now Django insist on MEDIA_URL and STATIC_URL to have different values. I have no idea why it has to be a hard requirement. My problem is that I don't use uploaded media and hence probably shouldn't use MEDIA_URL at all. Except that my forum mutants are generated on the server using an ImageField which does use MEDIA_URL for storage. Still, nothing what I can't deal with in due time, just have to figure out the best way.

New web stack

I moved from the old "lighttpd + FastCGI with flup" setup to the new one with "nginx + uwsgi". For some personal, completely subjective reason which I don't even remember anymore I always preferred lighttpd over nginx. However this was the case where the best maintainer won. It seems that nginx is more actively developed and uwsgi has most mind share. I also suspect all this is old news to everyone except me :-). But what actually bought me was the built-in support for uwsgi in nginx. I love integrated solutions! It means that I will have to write less of stupid glue-code in files that I will later forget.

The config file for nginx turned out to be much simpler than the one for lighttpd. It looks like it was specifically designed for the kind of tasks that web server admins do rather than being not-exactly-Turing-complete Perl-like code that happen to cover most use-cases with the enough amount of regexps.

OK, I just have to show you an example. Here's what I needed to do to handle some legacy redirects with lighttpd (heavily stripped and simplified):

fastcgi.server = (
  "/fcgi" =>  (
    (
      "socket" => "/var/run/sm_org/fcgi.socket" ,
      "check-local" => "disable",
    )
  )
)

url.redirect = (
  "^/soft/tags/(.*)" => "/soft/tagsfield/$1",
)

url.rewrite-once = (
  "^(/soft/tags/)(.*)" => "$1$2",
  "^/(.*)$" => "/fcgi/$1",
)

I don't even mind the infamous rewrite hack to connect the FastCGI backend (got used to it). But having a redirect and a corresponding no-op rewrite to make the former work… Seriously? (Before you ask, there is a reason why those redirects are not handled by Django code.)

Here's the nginx version that simply looks like a thing like that should look:

server {
  server_name softwaremaniacs.org;

  rewrite ^/soft/tags/(.*) /soft/tagsfield/$1 permanent;

  location / {
    include uwsgi_params;
    uwsgi_pass unix:/var/run/uwsgi/sm_org.socket;
  }

}

The uwsgi part of the story however wasn't that bright. What inspired my to try it out was this article about running multiple sites under uwsgi "Emperor mode". But since I don't run multiple sites I decided to run it first in a simple way.

With only one site to run I ditched the idea of having a separate upstart config for the master uwsgi process and a separate config for a site. Instead I put all the parameters in the upstart script itself as arguments to the uwsgi command.

Then I spent some time figuring out why it simply didn't run. Turns out that uwsgi no longer supports --module argument contrary to the statement that the config file keys are equivalent the command line arguments. The fact that Django uwsgi doc also refers to --module didn't help. Neither the fact that upstart has no diagnostics whatsoever (or I couldn't find one).

So I reconfigured everything toward the Emperor mode.

Then I spent some time trying to convince nginx running under "www-data" to talk to uwsgi running under my local user. Yes, this is uncommon, but solving that problem was way out of scope of my intentions. Anyway, apparently there's the wonderful chmod-socket option in uwsgi that solved it. Also I could probably use a TCP socket (by the way, does anyone know what's the practical difference and why everyone seems to prefer using unix sockets?)

Then I spent some time looking helplessly at nginx complaining that it doesn't get data from the backend and uwsgi writing some logs that didn't seem to have anything to do with it. Apparently the important line in those logs was:

-- unavailable modifier requested: 0 --

… which means "you don't have uwsgi-plugin-python installed". Obvious, right? :-)

Now, I don't exactly blame Ubuntu (or Debian?) maintainers who, after splitting uwsgi functionality into plugins, didn't think it necessary to include or recommend a single one of them to make uwsgi, you know, useful. Neither do I blame uwsgi maintainers for this cryptic error message. And neither do I blame myself for overlooking a warning in the uwsgi Quickstart guide that would solve my problem.

What I blame is the whole way of setting up computer software that we established over the last century. The culture of making every imaginable little thing configurable first, then forcing multitude of users to solve a few similar configuration task a million times over… But that's just frustration so please don't mind me!

OK, here are my uwsgi configs in case you wondered (but please use official docs whenever you can).

The upstart script responsible for running the uwsgi Emperor /etc/init/uwsgi.conf:

start on runlevel [2345]
stop on runlevel [06]

exec /usr/bin/uwsgi \
  --emperor /etc/uwsgi/apps-enabled \
  --uid maniac \
  --gid maniac

You do want to run the Emperor mode even if you have only one backend simply because then you can restart it gracefully without using sudo. Just chmod your site config to own it yourself.
You'll probably want to use "www-data" for uid and gid, though I personally find it more convenient to run my code under my local user.

The config describingn instance of the site backend /etc/uwsgi/apps-available/sm_org.ini:

[uwsgi]
master = 1
chdir = /home/maniac/sm_org
module = wsgi
processes = 5
max-requests = 1000
plugins = python
socket = /var/run/uwsgi/sm_org.socket
chmod-socket = 777

I have no idea why you need the "master" option but it seems that everyone uses it. Let's keep everyone happy!
chdir into the project's directory is needed because my code does relative imports of project apps located in that directory. Yours probably does it too.
module = wsgi means the wsgi.py in the project directory that exports Django wsgi app instance. In your case it might be called projectname.wsgi if you follow current Django project layout.
plugins = python is what makes uwsgi actually run Python code. You can skip "http" that some docs suggest here if you don't use your uwsgi backend as an HTTP server.
chmod-socket = 777 is what allows my "www-data" owned nginx talk to the "maniac" owned uwsgi backend. You might not need it if you run them both under the same user or use a TCP socket.

virtualenv

I have one unresolved question right now. OK, there's actually more than one but this one occupies me most: to use or not to use virtualenv.

Here are my thoughts:

I'm not a hosting company working for external clients so I don't run multiple sites with different set of packages. This is not my use-case for virtualenv. I'm quite happy maintaining my whole Django codebase using the trunk version of Django.
I cannot rely only on system Python packages simply because some of them Ubuntu doesn't provide. PyPI is the place where all new packages live now and I want to use it conveniently.
Installing packages with pip into the system site-packages is broken and out of the question. This is where the idea of using virtualenv comes up. But probably I could just tell pip to use a specific installation directory? I couldn't find the option for it.
What I don't like about virtualenv is that it makes my life harder. I should either "activate" my single environment all the time or use explicit paths everywhere to run commands. If this is how everyone does this then the world definitely gone mad :-). I'd rather keep my current way with tweaking PYTHONPATH . But then I'll still have the problem of pip trying to install everything in site-packages :-(.

What a man to do?

Comments: 22

Сергей Петров

Man should write simple service scripts for his own needs. Virtualenv is nice way to go, and you can just create one or two alias scripts to make your life with it easy.

Danila

I should either "activate" my single environment all the time

Yeah, in your shell login script. :)

Kirill

But probably I could just tell pip to use a specific installation directory?

-E DIR, --environment=DIR virtualenv environment to run pip in (either give the interpreter or the environment base directory)

vlasovskikh

Even if you don't need to maintain different environments, you can still get some value out of virtualenv. It provides a good alternative to user site-packages and manual PYTHONPATH modifications, as you can use packages only from a virtualenv, ignoring system-level site pacakges.

It automates the following activities:

Creating a new isolated environment (useful for deployment and development of several projects on a single machine). pip and distribute are installed into the new virtualenv automatically
```
$ python virtualenv.py /path/to/venv
$ . /path/to/venv/bin/activate
$ pip install -r requirements.txt
```

Customizing PYTHONPATH in order to get an isolated environment. Activating environment is easy from both the console and your WSGI application

# Your WSGI script
path = '/path/to/venv/bin/activate_this.py'
execfile(path, {'__file__': path})
application = ... # E.g. Django WSGI handler

To summarize, virtualenv does only two things (create and activate an isolated environment) and does them well.

Powerman

(by the way, does anyone know what's the practical difference and why everyone seems to prefer using unix sockets?)

AFAIR Stevens in one of his books (APUE, etc.) says UNIX sockets are faster TCP. Not sure this is still actual on Linux, but that one of reasons why UNIX sockets become preferable way for local communications.
UNIX sockets allow you to easily tune access permission to your FastCGI/etc. service using file-level permissions.
With TCP you'll have to invent port number between 1024 and 32768, hardcode it in two places, that number must be unique between all TCP services on this server, etc. With noticeable amount of FastCGI/etc. websites it quickly become administration headache.
With TCP there is a chance your FastCGI/etc. service will be accessible from internet. This usually happens unintentionally, just because you didn't mind use 127.0.0.1 for binding socket, and don't close access to that port using firewall. This can became security issue, because: it make it easier to DoS your service; create whole new class of security attacks (on FastCGI/etc. protocol implementation in your service); allow access to your site ignoring any limitations/logs/setup in your webserver; etc.
With TCP your service always accessible by anyone from localhost.

Probably there are other reasons which I don't remember right now. So, only real case when you wanna use TCP instead of UNIX sockets is when you run FastCGI/etc. service on different server than your web server, which usually happens when you've cluster with FastCGI/etc. services - and in this case all TCP downsides listed above usually not actual anymore.

rnd

To add to what Powerman already said on the matter, the reason that unix sockets are faster is precisely because it does not use TCP.

When you initiate a TCP connection first the IP layer resolves the MAC address of the destination (I believe that this is a no-op when the destination is 127.0.0.1 however there is still the cost of making the call to the kernel just to get a no-op call. Then the TCP layer establishes a connection using a "3-way handshake" where the client sends the server a connection request, the server sends the client an acknowledgment to accept the request and the client sends the server an acknowledgment confirming that it is still listening thereby establishing the connection. That's 6 more kernel level calls, 3 for the client and 3 for the server. Once the connection is opened every packet that is received by either side has to be acknowledged by the receiver which is what makes TCP fault tolerant unlike UDP.

When you create a unix socket it simply creates a special 'socket' file on the system. Instead of initialization of a possibly remote connection with a 3-way handshake all you do is open the socket file's file descriptor. Because the data written to the socket is held in memory which is being accessed via the file descriptor by both the client and the server there is no need for acknowledging every write to that file descriptor. Just like with any file, if a file that is open in one process is written to by another process then the second process will receive a kernel event notifying the process of the change allowing for bi-directional communication.

Also I wanted to note that an easy way to fix your socket permissions between maniac and www-data would be to add the maniac user as a member of the group www-data, then set the socket to be owned by user maniac and group www-data and set the permissions to 660. Right now with permissions set to 777 it means that any user can both write to and read from that socket so if someone breaks in to the system they can pound uWSGI with requests regardless of any Nginx rate limiting, even just as the user 'nobody'.

Finally, I personally would suggest you use gUnicorn (optionally with Gevent workers) over uWSGI, and the gUnicorn documentation is great!

Ivan Sagalaev

OK, I don't want to get rid of comments on this blog anymore, thanks guys! Some answers in bulk:

Сергей Петров:

Man should write simple service scripts for his own needs

Setting up a Python environment for a single-task single-user server seems to be the kind of thing that shouldn't require writing any custom glue code. I was thinking of virtualenv as a sort of solution that will help me to get rid it.

Danila:

I should either "activate" my single environment all the time

Yeah, in your shell login script. :)

This is effectively the same as setting PYTHONPATH in the same shell script, no? What's the point then? :-)

vlasovskikh:

It provides a good alternative to user site-packages and manual PYTHONPATH modifications, as you can use packages only from a virtualenv, ignoring system-level site pacakges.

WHY??? Seriously, what's wrong with using system packages? If some of them gets obsoleted by a new version it's solved simply by installing the new version locally. But why would I want to actually disable access to system packages?

# Your WSGI script
path = '/path/to/venv/bin/activate_this.py'
execfile(path, {'__file__': path})
application = ... # E.g. Django WSGI handler

Well, that's what I wanted to avoid: having to patch my wsgi.py and manage.py and have another few lines of easily forgettable code to maintain.

OK, I'm slowly coming to a realization that virtualenv is simply not the thing I thought it was. I will have to use it because of pip and it looks that I will just have to trade my current set of PYTHONPATH tricks for a set of different tricks and get used to them.

Technology sucks!

P.S. Powerman, rnd, thanks for the explanation about sockets!

vlasovskikh

But why would I want to actually disable access to system packages?

In order to be sure that your project contains all the necessary dependencies in its requirements.txt so it will work on other machines after deployment.

I forgot to update deployment scripts a couple of times when I started using new dependencies locally just by importing them. Maybe it was just my weak memory, but IMHO it's better to catch all the import errors while running tests locally.

ndru

About virtualenv - you can try to use it with virtualenvwrapper, that helps a lot (http://www.doughellmann.com/projects/virtualenvwrapper/).

uptimebox.myopenid.com

First of all, I should ask what setuptools version do you have? Default installation prefix was changed to /usr/local in Debian years ago. Not sure about Ubuntu though.

Anyway you can provide easy_install options to pip like this:

pip install --install-option="--prefix=/tmp" django

Django will be installed to /tmp prefix, e.g. /tmp/bin and /tmp/lib/python2.7/site-packages directories will be created. The same way you can provide pip with any easy_install options including --install-dir.

rnd

Your welcome for the explanation. As for Virtualenv I wanted to note that your argument of updating a package once updates it for everything is part of the reason you should use virtualenv. Say you have a Site that runs on Django 1.1 cdoe and you've missed or been ignoring the deprecation warnings. Now you want to make a new Django site using some new feature of Django 1.4 and install it system wide. The next time your Django 1.1 site reloads Django code (probably when a uWsgi worker is restarted) it will pick up the new Django 1.4 code where all those deprecation warnings become show stopping exceptions and now rather than working on your cool new site you have to perform maintenance on the old site just to get it running again (see: https://docs.djangoproject.com/en/dev/internals/release-process/#minor-releases). If you had been using isolated virtaulenvs then both virtualenvs would have their own copy of Django and this wouldn't have happened. That's a good example why virtualenv isolation is to your benefit.

I second the recommendation that you should use virtaulenvwrapper. Here is how I usually setup my Python environment (the first command is specific to Debian/Ubuntu linux):

$ sudo apt-get install python-setuptools
$ sudo easy_install pip
$ sudo pip install virtualenvwrapper
$ mkdir -p ~/.pip/cache ~/Projects

Add the following to the bottom of your ~/.bashrc file:

if [ -x `which virtualenvwrapper.sh` ]; then
    PIP_DOWNLOAD_CACHE='$HOME/.pip/cache' #Optional, but speeds pip up.
    PROJECT_HOME='$HOME/Projects' # Required to use mkproject
    . `which virtualenvwrapper.sh`
fi

Now just run '. ~/.bashrc' (or just close the terminal and start a new session.) The first time you source .bashrc after this it will be a bit noisy while virtualenvwrapper sets up the files and folders it uses, but it only happens once. Now all the virtaulenvwrapper commands are available to you, make a virtualenv:

$ mkproject --no-site-package mysite

which will build and activate your virtualenv, to deactivate the virtualenv:

$ deactivate

And to go back to it just use:

$ workon mysite

virtualenvwrapper modifies your shell's $PATH (amongst other things) so you don't have to use absolute paths for python executables to run in the virtualenv:

$ deactivate
$ which pip
/usr/local/bin/pip
$ workon mysite
$ which pip
/home/{your_username}/.virtualenvs/mysite/bin/pip

If you do a lot of PYTHON_PATH hackery you'll probably find add2virtualenv useful. I usually add my project directory to the python path:

$ cdproject
$ add2virtualenv .

There are a handful of other handy commands virtualenvwrapper provides for you, check out the documentation: http://www.doughellmann.com/projects/virtualenvwrapper/

Ivan Sagalaev

The next time your Django 1.1 site reloads Django code (probably when a uWsgi worker is restarted) it will pick up the new Django 1.4 code where all those deprecation warnings become show stopping exceptions and now rather than working on your cool new site you have to perform maintenance on the old site just to get it running again

Yes, I know how this stuff works in general :-). However this is not my case because Django upgrades don't happen to me unexpectedly — I do them whenever I feel like it. And when it does happen and Django does have some backwards-incompatible changes I usually don't care about the site being broken for some time. I know, this may sound shocking for some people :-). But I really do think that all this rage about 24/7/365 uptime is a little bit overrated when we're talking about personal sites.

But even if I decide to be bothered with a smooth upgrade I will create another copy of Django and another branch of my code and run a separate site instance just in time for the upgrade. I may or may not use virtualenv for this temporary operation. But my point is that it doesn't have anything to do with running the site under virtualenv all the time. It just doesn't add any value.

Roberto De Ioris

About virtualenv, as others have already suggested, they are a great thing, but please do not modify your wsgi.py file or make incredible complex shell script to activate them. uWSGI has a simple --virtualenv option, allowing you to use them without headaches.

And yes, the perceived complexity of deployments (independently by the language/platform used) is still a problem for programmers, this is why PaaS will probably make a lot of money in the near future :)

Ivan Sagalaev

uWSGI has a simple --virtualenv option, allowing you to use them without headaches.

Yeah, I noticed that one :-). But I'd argue that it doesn't really solve the problem, it merely moves this piece of configuration from one file to another. If I move this project to another server with a different path to an environment then chances of me forgetting to alter this path are equal regardless of it being in wsgi.py or appname.ini. The only way to remove an explicit configuration is to replace it with an implicit conventions.

Another problem is that there's one more entry point to my code — the manage.py script. I use it from shell and it is used to run periodic tasks. And I have to setup the environment for this script too. From this point of view having two very different ways of setting up the environment might be even worse than having two of them similar.

But anyway, I've already solved my problem. I should probably blog about it :-).

Roberto De Ioris

I think you should invest a little bit more time in uWSGI specific features (if you use it only as a boring WSGI gateway you will find there are a lot more easier choice out there)

Your sentence about virtualenv made me thing about that trick (i do not directly use it because i prefer to define my cron tasks directly in the app with the @cron decorator):

http://projects.unbit.it/uwsgi/wiki/TipsAndTricks#Sharingvirtualenvwithyourappandcrontasks

Ivan Sagalaev

Thanks Roberto, I'll look into this!

Andrey Popp

I should either "activate" my single environment all the time or use explicit paths everywhere to run commands.

I have these linkes in my .profile

function activate() {
  if [ -s $HOME/.virtualenvs/$1/bin/activate ]; then
    . $HOME/.virtualenvs/$1/bin/activate
  else
    echo "No such env $1 in $HOME/.virtualenvs"
  fi
}

[ -s $HOME/.virtualenvs/default ] && activate default

Also I use /usr/bin/env python in shebangs in my Python scripts.

Leonid Borisenko

Now, I don't exactly blame Ubuntu (or Debian?) maintainers who, after splitting uwsgi functionality into plugins, didn't think it necessary to include or recommend a single one of them to make uwsgi, you know, useful.

Debian/Ubuntu uwsgi-core package (which is providing uWSGI binary) Suggests uwsgi-plugins-all metapackage, so don't say that uWSGI package maintainer didn't made a note about plugins' necessity.

uWSGI binary could be useful as-is, without any plugin, so there is soft Suggests dependency on plugins metapackage, not Depends or Recommends. Anyway, even if uwsgi-core package will Depends on uwsgi-plugin-python, it will not help so much as user will have to enable plugin explicitly in configuration file ('plugins = python' line).

uWSGI isn't a simple WSGI server (anymore). It's a modular general application server.

However, I can understand frustration of a man/woman who installed the binary of server named uWSGI and found that it's not serving WSGI application as-is. Sorry about it.

Also uwsgi package (the package named 'uwsgi') provides infrastructure for automatic starting of uWSGI instances (init.d script, common location for sockets and logs). If you are curious, take a look at extensive documentation at /usr/share/doc/uwsgi/README.Debian.gz

Ivan Sagalaev

Leonid, thanks for weighing in!

You're right that "Suggests" and various docs are there. Technically. And I would gladly accept the blame of not noticing all this if I were the only one having this problem. But the mere fact that uwsgi wiki does have this warning about missing Python plugin and the fact that I did found the solution in a mailing list where someone else was having the same problem tells otherwise.

I believe it can be solved with two things:

Fix the error message. I suppose the situation of someone trying to serve a particular protocol from a bare-bones server is detectable, and in this case a server could simply log an explicit warning about it. That would've saved a lot of trouble.
Have documentation directed to popular deployment environments. If many people run uwsgi on Ubuntu (and I believe it's true) then there should be a prominent document about running uwsgi on Ubuntu. If it turns out that many people try to run Django projects on uwsgi then make a patch for Django docs so they aren't misleading.

Oh, and please don't take it as a go-ahead-and-do-it request! This is just my outsider's view, nothing more.

Ivan Sagalaev

Oh, and thanks for pointing me to the Debian docs on init.d scripts ant other stuff. I'll look into it!

m0use.openid.com

Why don't you use mod_wsgi+httpd instead of uwsgi? Is there any reasons? For me httpd+mod_wsgi solution looks simplier, because of:

HTTP between frontend and backed
mod_msgi can be configured for running different sites with different users (uwsgi too, but we should write init.d scripts manually)
httpd looks more stable and customizable solution

What do you think about that?

Ivan Sagalaev

The first and foremost reason to not use Apache *) is that its multiplication model based on processes and threads is way more resource hungry for the given task than the combination of an async proxy + dynamic backend. I wouldn't go into details in this comment, this is a very well researched topic.

Also some things you mentioned do not exactly look as advantages to me:

uwsgi protocol was specifically designed to be more efficient than FastCGI or HTTP between a frontend and a backend. Though I don't know how well it fullfills this goal. But anyway, this is not a bottleneck in my case, I'll choose smaller memory footprint over faster protocol any day.
User configuration in uwsgi is also per-site, you don't have to touch the master daemon for this. Also, nobody writes init.d scripts manually these days, you either use the already provided one or write a simple upstart config.
And "more stable" claim is just too vague to be taken seriously, sorry.

*) I know that "httpd" is more correct but I prefer to use the more widely accepted name

SM.Org software update 2012

Core system upgrades

New web stack

virtualenv

Comments: 22

Add comment