Over the course of a few recent weeks I updated this site to more modern software and revisited some choices I had made earlier. This was loooong overdue considering that before the update I was still running Ubuntu 9.10, meaning the system was almost three years old.

Here are some mostly useless but probably fascinating notes about it.

Core system upgrades

Running a comparatively low-load site, I had the luxury of not planning any complex downtime-minimizing procedures. What I did was just SSH into the host and run do-release-upgrade under "screen" five times in a row. Linode's upgrade docs were holding my hand during the process.
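
In terminal terms the whole procedure boiled down to roughly this (the host name is made up; the last command was repeated once per intermediate release, five times in total):

ssh maniac@my-linode-host
screen
sudo do-release-upgrade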

Currently the site runs on Ubuntu 12.04 "Precise" and a 3.0 Linux kernel. It's good to have modern packages!

Notable moments (read: long downtimes) during the upgrades:

New web stack

I moved from the old "lighttpd + FastCGI with flup" setup to a new one with "nginx + uwsgi". For some personal, completely subjective reason which I don't even remember anymore I always preferred lighttpd over nginx. However, this was a case where the better-maintained project won: nginx seems to be more actively developed, and uwsgi has the most mind share. I also suspect all this is old news to everyone except me :-). But what actually sold me was the built-in support for uwsgi in nginx. I love integrated solutions! It means I will have to write less stupid glue code in files that I will later forget about.

The config file for nginx turned out to be much simpler than the one for lighttpd. It looks like it was specifically designed for the kind of tasks that web server admins actually do, rather than being not-exactly-Turing-complete Perl-like code that happens to cover most use cases given enough regexps.

OK, I just have to show you an example. Here's what I needed to do to handle some legacy redirects with lighttpd (heavily stripped and simplified):

# hook up the FastCGI backend
fastcgi.server = (
  "/fcgi" =>  (
    (
      "socket" => "/var/run/sm_org/fcgi.socket",
      "check-local" => "disable",
    )
  )
)

# the legacy redirect
url.redirect = (
  "^/soft/tags/(.*)" => "/soft/tagsfield/$1",
)

# a no-op rewrite needed to make the redirect above actually fire,
# plus the rewrite routing everything else into the FastCGI backend
url.rewrite-once = (
  "^(/soft/tags/)(.*)" => "$1$2",
  "^/(.*)$" => "/fcgi/$1",
)

I don't even mind the infamous rewrite hack to connect the FastCGI backend (got used to it). But having a redirect and a corresponding no-op rewrite to make the former work… Seriously? (Before you ask, there is a reason why those redirects are not handled by Django code.)

Here's the nginx version, which simply looks the way a thing like that should look:

server {
  server_name softwaremaniacs.org;

  # the same legacy redirect, in one line
  rewrite ^/soft/tags/(.*) /soft/tagsfield/$1 permanent;

  # everything else goes to the uwsgi backend
  location / {
    include uwsgi_params;
    uwsgi_pass unix:/var/run/uwsgi/sm_org.socket;
  }

}

The uwsgi part of the story, however, wasn't that bright. What inspired me to try it out was this article about running multiple sites under uwsgi "Emperor mode". But since I don't run multiple sites I decided to first run it in a simpler way.

With only one site to run I ditched the idea of having a separate upstart config for the master uwsgi process and a separate config for the site. Instead I put all the parameters in the upstart script itself, as arguments to the uwsgi command.
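
For the record, that single-file attempt looked roughly like this (simplified, with the same paths as in the final configs below):

start on runlevel [2345]
stop on runlevel [06]

exec /usr/bin/uwsgi \
  --uid maniac \
  --master \
  --processes 5 \
  --chdir /home/maniac/sm_org \
  --socket /var/run/uwsgi/sm_org.socket \
  --module wsgi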

Then I spent some time figuring out why it simply didn't run. It turns out that uwsgi no longer supports the --module argument, contrary to the statement that the config file keys are equivalent to the command line arguments. The fact that the Django uwsgi doc also refers to --module didn't help. Neither did the fact that upstart has no diagnostics whatsoever (or at least I couldn't find any).

So I reconfigured everything toward the Emperor mode.

Then I spent some time trying to convince nginx running under "www-data" to talk to uwsgi running under my local user. Yes, this is uncommon, but solving that problem was way outside the scope of my intentions. Anyway, it turns out there's a wonderful chmod-socket option in uwsgi that solved it. I could also probably have used a TCP socket (by the way, does anyone know what the practical difference is, and why everyone seems to prefer unix sockets?)
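
For comparison, the TCP variant would look something like this (3031 is just an arbitrary port choice):

# in the nginx config:
uwsgi_pass 127.0.0.1:3031;

# in the uwsgi config:
socket = 127.0.0.1:3031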

Then I spent some time looking helplessly at nginx complaining that it wasn't getting data from the backend, and at uwsgi writing logs that didn't seem to have anything to do with it. Apparently the important line in those logs was:

-- unavailable modifier requested: 0 --

… which means "you don't have uwsgi-plugin-python installed". Obvious, right? :-)
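
The actual fix, once decoded, is a one-liner (plus the "plugins = python" line you'll see in the config below):

sudo apt-get install uwsgi-plugin-python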

Now, I don't exactly blame Ubuntu (or Debian?) maintainers who, after splitting uwsgi functionality into plugins, didn't think it necessary to include or recommend a single one of them to make uwsgi, you know, useful. Neither do I blame uwsgi maintainers for this cryptic error message. And neither do I blame myself for overlooking a warning in the uwsgi Quickstart guide that would have solved my problem.

What I blame is the whole way of setting up computer software that we've established over the last century. The culture of making every imaginable little thing configurable first, then forcing a multitude of users to solve the same few configuration tasks a million times over… But that's just frustration, so please don't mind me!

OK, here are my uwsgi configs in case you wondered (but please use official docs whenever you can).

The upstart script responsible for running the uwsgi Emperor, /etc/init/uwsgi.conf:

start on runlevel [2345]
stop on runlevel [06]

# the Emperor watches this directory and runs an instance per config file in it
exec /usr/bin/uwsgi \
  --emperor /etc/uwsgi/apps-enabled \
  --uid maniac \
  --gid maniac

The config describing an instance of the site backend, /etc/uwsgi/apps-available/sm_org.ini:

[uwsgi]
master = 1
chdir = /home/maniac/sm_org
module = wsgi
processes = 5
max-requests = 1000
# the line whose absence cost me all that head-scratching
plugins = python
socket = /var/run/uwsgi/sm_org.socket
# lets nginx running under www-data talk to the socket
chmod-socket = 777
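
To actually enable the app I just made sure the socket directory exists and symlinked the config into the directory the Emperor watches, with commands roughly like these:

sudo mkdir -p /var/run/uwsgi
sudo chown maniac: /var/run/uwsgi
sudo ln -s /etc/uwsgi/apps-available/sm_org.ini /etc/uwsgi/apps-enabled/
sudo start uwsgi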

virtualenv

I have one unresolved question right now. OK, there's actually more than one, but this one occupies me the most: to use or not to use virtualenv.

Here are my thoughts:

What's a man to do?

Comments: 22

  1. Сергей Петров

    Man should write simple service scripts for his own needs. Virtualenv is a nice way to go, and you can just create one or two alias scripts to make your life with it easy.

  2. Danila

    I should either "activate" my single environment all the time

    Yeah, in your shell login script. :)

  3. Kirill

    But probably I could just tell pip to use a specific installation directory?

    -E DIR, --environment=DIR virtualenv environment to run pip in (either give the interpreter or the environment base directory)

  4. vlasovskikh

    Even if you don't need to maintain different environments, you can still get some value out of virtualenv. It provides a good alternative to user site-packages and manual PYTHONPATH modifications, as you can use packages only from a virtualenv, ignoring system-level site packages.

    It automates the following activities:

    • Creating a new isolated environment (useful for deployment and development of several projects on a single machine). pip and distribute are installed into the new virtualenv automatically

      $ python virtualenv.py /path/to/venv
      $ . /path/to/venv/bin/activate
      $ pip install -r requirements.txt
      
    • Customizing PYTHONPATH in order to get an isolated environment. Activating an environment is easy from both the console and your WSGI application

      # Your WSGI script
      path = '/path/to/venv/bin/activate_this.py'
      execfile(path, {'__file__': path})
      application = ... # E.g. Django WSGI handler
      

    To summarize, virtualenv does only two things (create and activate an isolated environment) and does them well.

  5. Powerman

    (by the way, does anyone know what's the practical difference and why everyone seems to prefer using unix sockets?)

    • AFAIR Stevens in one of his books (APUE, etc.) says UNIX sockets are faster than TCP. Not sure whether this is still true on Linux, but that's one of the reasons why UNIX sockets became the preferred way for local communications.

    • UNIX sockets allow you to easily tune access permissions to your FastCGI/etc. service using file-level permissions.

    • With TCP you'll have to invent a port number between 1024 and 32768, hardcode it in two places, make sure that number is unique among all TCP services on this server, etc. With a noticeable number of FastCGI/etc. websites this quickly becomes an administration headache.

    • With TCP there is a chance your FastCGI/etc. service will be accessible from the internet. This usually happens unintentionally, just because you didn't think to bind the socket to 127.0.0.1 and didn't close access to that port with a firewall. This can become a security issue, because it makes it easier to DoS your service; it creates a whole new class of security attacks (on the FastCGI/etc. protocol implementation in your service); it allows access to your site, ignoring any limitations/logs/setup in your web server; etc.

    • With TCP your service is always accessible by anyone from localhost.

    Probably there are other reasons which I don't remember right now. So the only real case where you want to use TCP instead of UNIX sockets is when you run the FastCGI/etc. service on a different server than your web server, which usually happens when you have a cluster of FastCGI/etc. services; in that case all the TCP downsides listed above usually don't apply anymore.

  6. rnd

    To add to what Powerman already said on the matter, the reason that unix sockets are faster is precisely that they don't use TCP.

    When you initiate a TCP connection, first the IP layer resolves the MAC address of the destination (I believe this is a no-op when the destination is 127.0.0.1, but there is still the cost of making the call to the kernel just to get a no-op). Then the TCP layer establishes a connection using a "3-way handshake": the client sends the server a connection request, the server sends the client an acknowledgment accepting the request, and the client sends the server an acknowledgment confirming that it is still listening, thereby establishing the connection. That's 6 more kernel-level calls, 3 for the client and 3 for the server. Once the connection is open, every packet received by either side has to be acknowledged by the receiver, which is what makes TCP fault tolerant, unlike UDP.

    When you create a unix socket it simply creates a special 'socket' file on the system. Instead of initiating a possibly remote connection with a 3-way handshake, all you do is open the socket file's file descriptor. Because the data written to the socket is held in memory which is accessed via the file descriptor by both the client and the server, there is no need to acknowledge every write to that file descriptor. Just like with any file, if a file that is open in one process is written to by another process, then the second process will receive a kernel event notifying it of the change, allowing for bi-directional communication.

    Also I wanted to note that an easy way to fix your socket permissions between maniac and www-data would be to add the maniac user as a member of the group www-data, then set the socket to be owned by user maniac and group www-data and set the permissions to 660. Right now, with permissions set to 777, any user can both write to and read from that socket, so if someone breaks into the system they can pound uWSGI with requests regardless of any Nginx rate limiting, even just as the user 'nobody'.
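
    Something along these lines should do it (assuming uwsgi's chown-socket option is available in your build and the process picks up the new group membership):

    # let the maniac user assign the www-data group to files it owns
    sudo usermod -a -G www-data maniac

    # then in sm_org.ini, instead of chmod-socket = 777:
    chown-socket = maniac:www-data
    chmod-socket = 660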

    Finally, I personally would suggest you use Gunicorn (optionally with gevent workers) over uWSGI, and the Gunicorn documentation is great!

  7. Ivan Sagalaev

    OK, I don't want to get rid of comments on this blog anymore, thanks guys! Some answers in bulk:

    Сергей Петров:

    Man should write simple service scripts for his own needs

    Setting up a Python environment for a single-task, single-user server seems to be the kind of thing that shouldn't require writing any custom glue code. I was thinking of virtualenv as a sort of solution that would help me get rid of it.

    Danila:

    I should either "activate" my single environment all the time

    Yeah, in your shell login script. :)

    This is effectively the same as setting PYTHONPATH in the same shell script, no? What's the point then? :-)

    vlasovskikh:

    It provides a good alternative to user site-packages and manual PYTHONPATH modifications, as you can use packages only from a virtualenv, ignoring system-level site packages.

    WHY??? Seriously, what's wrong with using system packages? If one of them gets obsoleted by a new version, that's solved simply by installing the new version locally. But why would I want to actually disable access to system packages?

    # Your WSGI script
    path = '/path/to/venv/bin/activate_this.py'
    execfile(path, {'__file__': path})
    application = ... # E.g. Django WSGI handler
    

    Well, that's what I wanted to avoid: having to patch my wsgi.py and manage.py and having another few lines of easily forgettable code to maintain.

    OK, I'm slowly coming to the realization that virtualenv is simply not the thing I thought it was. I will have to use it because of pip, and it looks like I will just have to trade my current set of PYTHONPATH tricks for a set of different tricks and get used to them.

    Technology sucks!

    P.S. Powerman, rnd, thanks for the explanation about sockets!

  8. vlasovskikh

    But why would I want to actually disable access to system packages?

    In order to be sure that your project contains all the necessary dependencies in its requirements.txt so it will work on other machines after deployment.

    I forgot to update deployment scripts a couple of times when I started using new dependencies locally just by importing them. Maybe it was just my weak memory, but IMHO it's better to catch all the import errors while running tests locally.

  9. ndru

    About virtualenv - you can try to use it with virtualenvwrapper, which helps a lot (http://www.doughellmann.com/projects/virtualenvwrapper/).

  10. uptimebox.myopenid.com

    First of all, I should ask what setuptools version you have. The default installation prefix was changed to /usr/local in Debian years ago. Not sure about Ubuntu though.

    Anyway you can provide easy_install options to pip like this:

    pip install --install-option="--prefix=/tmp" django
    

    Django will be installed under the /tmp prefix, e.g. the /tmp/bin and /tmp/lib/python2.7/site-packages directories will be created. In the same way you can provide pip with any easy_install option, including --install-dir.

  11. rnd

    You're welcome for the explanation. As for virtualenv, I wanted to note that your argument that updating a package once updates it for everything is part of the reason you should use virtualenv. Say you have a site that runs on Django 1.1 code and you've missed or been ignoring the deprecation warnings. Now you want to make a new Django site using some new feature of Django 1.4 and install it system-wide. The next time your Django 1.1 site reloads Django code (probably when a uWSGI worker is restarted) it will pick up the new Django 1.4 code where all those deprecation warnings become show-stopping exceptions, and now, rather than working on your cool new site, you have to perform maintenance on the old site just to get it running again (see: https://docs.djangoproject.com/en/dev/internals/release-process/#minor-releases). If you had been using isolated virtualenvs then both virtualenvs would have their own copy of Django and this wouldn't have happened. That's a good example of why virtualenv isolation is to your benefit.

    I second the recommendation that you should use virtualenvwrapper. Here is how I usually set up my Python environment (the first command is specific to Debian/Ubuntu Linux):

    $ sudo apt-get install python-setuptools
    $ sudo easy_install pip
    $ sudo pip install virtualenvwrapper
    $ mkdir -p ~/.pip/cache ~/Projects
    

    Add the following to the bottom of your ~/.bashrc file:

    if [ -x "`which virtualenvwrapper.sh`" ]; then
        export PIP_DOWNLOAD_CACHE="$HOME/.pip/cache" # Optional, but speeds pip up.
        export PROJECT_HOME="$HOME/Projects" # Required to use mkproject
        . `which virtualenvwrapper.sh`
    fi
    

    Now just run '. ~/.bashrc' (or just close the terminal and start a new session). The first time you source .bashrc after this it will be a bit noisy while virtualenvwrapper sets up the files and folders it uses, but that only happens once. Now all the virtualenvwrapper commands are available to you; make a virtualenv:

    $ mkproject --no-site-packages mysite
    

    which will build and activate your virtualenv. To deactivate the virtualenv:

    $ deactivate
    

    And to go back to it just use:

    $ workon mysite
    

    virtualenvwrapper modifies your shell's $PATH (amongst other things) so you don't have to use absolute paths for python executables to run in the virtualenv:

    $ deactivate
    $ which pip
    /usr/local/bin/pip
    $ workon mysite
    $ which pip
    /home/{your_username}/.virtualenvs/mysite/bin/pip
    

    If you do a lot of PYTHONPATH hackery you'll probably find add2virtualenv useful. I usually add my project directory to the Python path:

    $ cdproject
    $ add2virtualenv .
    

    There are a handful of other handy commands virtualenvwrapper provides for you, check out the documentation: http://www.doughellmann.com/projects/virtualenvwrapper/

  12. Ivan Sagalaev

    The next time your Django 1.1 site reloads Django code (probably when a uWSGI worker is restarted) it will pick up the new Django 1.4 code where all those deprecation warnings become show-stopping exceptions, and now, rather than working on your cool new site, you have to perform maintenance on the old site just to get it running again

    Yes, I know how this stuff works in general :-). However this is not my case because Django upgrades don't happen to me unexpectedly — I do them whenever I feel like it. And when it does happen and Django does have some backwards-incompatible changes I usually don't care about the site being broken for some time. I know, this may sound shocking for some people :-). But I really do think that all this rage about 24/7/365 uptime is a little bit overrated when we're talking about personal sites.

    But even if I decide to be bothered with a smooth upgrade I will create another copy of Django and another branch of my code and run a separate site instance just in time for the upgrade. I may or may not use virtualenv for this temporary operation. But my point is that it doesn't have anything to do with running the site under virtualenv all the time. It just doesn't add any value.

  13. Roberto De Ioris

    About virtualenvs: as others have already suggested, they are a great thing, but please do not modify your wsgi.py file or make incredibly complex shell scripts to activate them. uWSGI has a simple --virtualenv option, allowing you to use them without headaches.
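
    In the ini config that's just one extra line (the path here is of course a placeholder):

    virtualenv = /path/to/venv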

    And yes, the perceived complexity of deployments (independently of the language/platform used) is still a problem for programmers, this is why PaaS will probably make a lot of money in the near future :)

  14. Ivan Sagalaev

    uWSGI has a simple --virtualenv option, allowing you to use them without headaches.

    Yeah, I noticed that one :-). But I'd argue that it doesn't really solve the problem, it merely moves this piece of configuration from one file to another. If I move this project to another server with a different path to the environment, then the chances of me forgetting to alter this path are equal regardless of whether it lives in wsgi.py or appname.ini. The only way to remove an explicit configuration is to replace it with an implicit convention.

    Another problem is that there's one more entry point to my code: the manage.py script. I use it from the shell and it is used to run periodic tasks. And I have to set up the environment for this script too. From this point of view, having two very different ways of setting up the environment might be even worse than having two similar ones.

    But anyway, I've already solved my problem. I should probably blog about it :-).

  15. Roberto De Ioris

    I think you should invest a little bit more time in uWSGI-specific features (if you use it only as a boring WSGI gateway you will find there are a lot of easier choices out there).

    Your sentence about virtualenv made me think of this trick (I do not use it directly because I prefer to define my cron tasks right in the app with the @cron decorator):

    http://projects.unbit.it/uwsgi/wiki/TipsAndTricks#Sharingvirtualenvwithyourappandcrontasks

  16. Ivan Sagalaev

    Thanks Roberto, I'll look into this!

  17. Andrey Popp

    I should either "activate" my single environment all the time or use explicit paths everywhere to run commands.

    I have these lines in my .profile:

    function activate() {
      if [ -s $HOME/.virtualenvs/$1/bin/activate ]; then
        . $HOME/.virtualenvs/$1/bin/activate
      else
        echo "No such env $1 in $HOME/.virtualenvs"
      fi
    }
    
    [ -s $HOME/.virtualenvs/default ] && activate default
    

    Also I use /usr/bin/env python in shebangs in my Python scripts.

  18. Leonid Borisenko

    Now, I don't exactly blame Ubuntu (or Debian?) maintainers who, after splitting uwsgi functionality into plugins, didn't think it necessary to include or recommend a single one of them to make uwsgi, you know, useful.

    The Debian/Ubuntu uwsgi-core package (which provides the uWSGI binary) Suggests the uwsgi-plugins-all metapackage, so don't say that the uWSGI package maintainer didn't make a note about the plugins' necessity.

    The uWSGI binary can be useful as-is, without any plugin, so there is a soft Suggests dependency on the plugins metapackage, not Depends or Recommends. Anyway, even if the uwsgi-core package did Depend on uwsgi-plugin-python, it would not help that much, as the user would still have to enable the plugin explicitly in the configuration file (the 'plugins = python' line).

    uWSGI isn't a simple WSGI server (anymore). It's a modular general application server.

    However, I can understand the frustration of a man/woman who installed the binary of a server named uWSGI and found that it doesn't serve WSGI applications as-is. Sorry about that.

    Also, the uwsgi package (the package named 'uwsgi') provides infrastructure for automatic starting of uWSGI instances (an init.d script, a common location for sockets and logs). If you are curious, take a look at the extensive documentation at /usr/share/doc/uwsgi/README.Debian.gz.

  19. Ivan Sagalaev

    Leonid, thanks for weighing in!

    You're right that "Suggests" and various docs are there. Technically. And I would gladly accept the blame for not noticing all this if I were the only one having this problem. But the mere fact that the uwsgi wiki does have this warning about the missing Python plugin, and the fact that I did find the solution in a mailing list where someone else was having the same problem, tells otherwise.

    I believe it can be solved with two things:

    • Fix the error message. I suppose the situation of someone trying to serve a particular protocol from a bare-bones server is detectable, and in this case the server could simply log an explicit warning about it. That would've saved a lot of trouble.

    • Have documentation targeted at popular deployment environments. If many people run uwsgi on Ubuntu (and I believe it's true) then there should be a prominent document about running uwsgi on Ubuntu. If it turns out that many people try to run Django projects on uwsgi then make a patch for the Django docs so they aren't misleading.

    Oh, and please don't take it as a go-ahead-and-do-it request! This is just my outsider's view, nothing more.

  20. Ivan Sagalaev

    Oh, and thanks for pointing me to the Debian docs on init.d scripts and other stuff. I'll look into it!

  21. m0use.openid.com

    Why don't you use mod_wsgi+httpd instead of uwsgi? Are there any reasons? To me the httpd+mod_wsgi solution looks simpler, because:

    • HTTP between the frontend and the backend
    • mod_wsgi can be configured to run different sites under different users (uwsgi too, but then we have to write init.d scripts manually)
    • httpd looks like a more stable and customizable solution

    What do you think about that?

  22. Ivan Sagalaev

    The first and foremost reason not to use Apache *) is that its concurrency model, based on spawning processes and threads, is way more resource-hungry for the given task than the combination of an async proxy + dynamic backend. I won't go into details in this comment, this is a very well researched topic.

    Also some things you mentioned do not exactly look as advantages to me:

    • The uwsgi protocol was specifically designed to be more efficient than FastCGI or HTTP between a frontend and a backend, though I don't know how well it fulfills this goal. But anyway, this is not a bottleneck in my case; I'll choose a smaller memory footprint over a faster protocol any day.

    • User configuration in uwsgi is also per-site; you don't have to touch the master daemon for it. Also, nobody writes init.d scripts manually these days: you either use the one already provided or write a simple upstart config.

    • And the "more stable" claim is just too vague to be taken seriously, sorry.

    *) I know that "httpd" is more correct but I prefer to use the more widely accepted name
