Streaming PHP Output with FPM and NGINX
Sun 25 October 2015
Problem: We had a PHP report that took around 45 seconds to render. To give the user feedback that their report was actually being generated, I wanted to change our PHP installation to deliver at least some output to the client as soon as possible, rather than waiting until the entire page had been rendered.
PHP and NGINX both work hard to buffer output, so this was mostly an exercise in fighting with PHP and NGINX to make them stop doing optimizations that do, in general, make sense. It turned out that we had to make changes at three levels of our software stack: in our PHP code, in our PHP configuration settings, and in our NGINX configuration.
I added this at the top of the file in question:
ob_implicit_flush(1);
This tells PHP to simulate calling flush() after every output block.
I added a new .ini file in /etc/php5/fpm/conf.d with the following setting:
output_buffering = Off
This tells PHP not to buffer output.
In the .conf file for the reporting site, I added
fastcgi_keep_conn on;
gzip off;
in the location block, and
ssl_buffer_size 1k;
in the server block. This last one took me a while to figure out: it defaults to 16k, and even with all the other changes in place, NGINX will still buffer your output when the site is accessed via HTTPS.
This Stack Overflow question got me pointed in the right direction.
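To confirm that output really is being streamed end to end, I find it easiest to watch when the bytes arrive on the client. Here is a rough sketch using the Python requests library; the URL is a placeholder for your own report page, and requests must be installed.

# Rough client-side check that output really is streamed: print when each
# chunk arrives relative to the start of the request.
# Assumes the 'requests' package is installed; the URL is a placeholder.
import time

import requests

URL = 'https://example.com/report.php'   # hypothetical report URL

start = time.time()
response = requests.get(URL, stream=True)
for chunk in response.iter_content(chunk_size=1024):
    print('%6.1fs: received %d bytes' % (time.time() - start, len(chunk)))

If the buffering has really been disabled at every layer, the first chunk should be printed within a second or two rather than after the full 45 seconds.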
Configuring Systems with Fabric and Cuisine
Thu 01 January 2015
I've mentioned Fabric before on this blog. Because so much of my development time is spent in Python, it makes sense for me to look for system administration tools that are also written in Python. Fabric fits the bill perfectly, and allows me to run tasks remotely on multiple machines simultaneously.
A useful addition to Fabric is Cuisine. Cuisine is a small set of functions that sit on top of Fabric, to abstract common administration operations such as file/dir operations, user/group creation, package install/upgrade, making it easier to write portable administration and deployment scripts.
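To give a flavour of what that looks like, here is a minimal sketch of a Cuisine-based task. The package, user and directory names are placeholders, and I'm relying on Cuisine's package_ensure, user_ensure and dir_ensure helpers, which (as far as I recall) wrap the corresponding shell commands idempotently.

# A hypothetical provisioning task using Fabric + Cuisine.
# The package, user and directory names below are placeholders.
from fabric.api import task

import cuisine

@task
def prepare_server():
    cuisine.package_ensure('nginx')     # install the package only if it's missing
    cuisine.user_ensure('deploy')       # create the user if it doesn't exist
    cuisine.dir_ensure('/srv/myapp')    # create the directory if needed

Run it against a host with fab -H somehost prepare_server; because the helpers only act when something is missing, the task is safe to run repeatedly.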
Managing Log Files with Logrotate
Thu 01 January 2015
I have a large, complex mailing system that processes a significant amount of data every hour. While I'm developing it, I want to know what it's doing, and whether it's having any problems. So I use the excellent python logging library to produce comprehensive monitoring data.
The only problem is that these log files can get pretty big. And because I don't know ahead of time when I'm going to need to hunt through them, I tend to leave the logging system in a fairly verbose state.
Enter logrotate. This is a standard service on Ubuntu that regularly rotates your log files, throwing away old data, compressing middle-age data, and leaving young log files fresh and accessible. Thus you are protected from runaway log file growth and nasty calls in the middle of the night from your monitoring service telling you that your server just died because the hard drives were full.
A default Ubuntu installation comes with logrotate already set up for various services. If you don't have it, install it with apt-get install logrotate, and then it's mostly just a question of copying a file from /etc/logrotate.d/ and modifying it according to your needs.
vi /etc/logrotate.d/myservice
/var/log/myservice/*.log {
    rotate 7
    daily
    compress
    missingok
    notifempty
}
And that's it! The actual invocation of the logrotate command will get triggered regularly by a script in /etc/cron.daily.
You can also force a rotation, a useful option when testing out a new configuration, via
logrotate -f /etc/logrotate.d/myservice
One quick word of warning: if you're using the python logging library, then you'll want to use the WatchedFileHandler class. If the logfile gets rotated out while it's in use, WatchedFileHandler will notice this, close the file stream and open a new one.
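Setting that up only takes a couple of lines. Here is a minimal sketch, with a made-up logger name and the log path from the example above:

# Minimal logging setup that plays nicely with logrotate.
# WatchedFileHandler re-opens the log file when it notices the file on disk
# has been rotated away.
import logging
from logging.handlers import WatchedFileHandler

logger = logging.getLogger('myservice')   # hypothetical logger name
handler = WatchedFileHandler('/var/log/myservice/myservice.log')
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info('myservice started')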
Hosting Python Applications with Nginx and uWSGI Emperor
Thu 01 January 2015
The documentation for Nginx and uWSGI is long and complex, but with Ubuntu 14 it's actually pretty straightforward to get them up and running.
I present here a setup that uses nginx and uwsgi-emperor to host multiple python web applications simultaneously on an Ubuntu 14 machine.
First, the packages:
$ sudo apt-get install nginx
$ sudo apt-get install uwsgi uwsgi-emperor uwsgi-plugin-python
Our configuration files will now be under /etc/nginx and /etc/uwsgi-emperor.
You can start, stop, and reload nginx as follows:
$ sudo service nginx start
$ sudo service nginx stop
$ sudo service nginx reload
The last command is useful when changing configuration settings.
Now set up a site by creating a file in /etc/nginx/sites-available:
#/etc/nginx/sites-available/mysite
server {
    server_name your_host_name;

    location /app1 {
        uwsgi_pass unix:/tmp/app1.socket;
        include uwsgi_params;
    }

    location /app2 {
        uwsgi_pass unix:/tmp/app2.socket;
        include uwsgi_params;
    }
}
Then,
$ sudo ln -s /etc/nginx/sites-available/mysite /etc/nginx/sites-enabled
Warning
A previous version of this tutorial had the sockets placed in /run/uwsgi. This was a mistake, because under Ubuntu /run is mounted as a tmpfs, and its contents will be deleted on reboot: your uwsgi sub-directory will vanish and the uwsgi services will not restart.
Next, set up your 'vassals' (http://uwsgi-docs.readthedocs.org/en/latest/Emperor.html)
Create /etc/uwsgi-emperor/vassals/app1.ini as follows:
[uwsgi]
plugin = python
processes = 2
socket = /tmp/app1.socket
chmod-socket = 666
chdir = /srv/app1
wsgi-file = /srv/app1/main.py
uid = www-data
gid = www-data
And for your second application, create /etc/uwsgi-emperor/vassals/app2.ini similarly:
[uwsgi]
plugin = python
processes = 2
socket = /tmp/app2.socket
chmod-socket = 666
chdir = /srv/app2
wsgi-file = /srv/app2/main.py
uid = www-data
gid = www-data
The simple act of creating or touching a .ini file in /etc/uwsgi-emperor/vassals will cause the emperor process to try to restart your application.
Of course, your applications don't exist yet, so let's create them. The simplest wsgi application can be only a few lines long:
Create /srv/app1/main.py:
def application(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello World, I am app1"]
And /srv/app2/main.py:
def application(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["I, however, am app2."]
And that's it!
Visiting http://your_host_name/app1 or http://your_host_name/app2 should return the text you put in the python files.
Understanding Postfix
Thu 01 January 2015
Postfix is a ghastly horror that really should be quietly eliminated. But that truism hides a deeper issue - email itself is a ghastly horror, the result of 30 years of hacks, edge-cases, non-conformant implementations and competing design constraints, that only persists because we still haven't come up with anything better.
Take the simple question 'what is a valid email address?'. I have a couple of standard regexes I use to validate email addresses, such as ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$, but having spent years building mailing systems, the best answer I can come up with is: a valid email address is one that gets delivered to its destination.
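For what it's worth, here is roughly how that regex gets used in practice; a minimal sketch with Python's re module (note the IGNORECASE flag, since the pattern only lists upper-case ranges):

# Quick-and-dirty validation with the regex above. It will reject plenty of
# technically-valid addresses and happily accept undeliverable ones - hence
# the 'delivered to its destination' caveat.
import re

EMAIL_RE = re.compile(r'^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$', re.IGNORECASE)

def looks_like_email(address):
    return bool(EMAIL_RE.match(address))

print(looks_like_email('someone@example.com'))   # True
print(looks_like_email('not-an-address'))        # False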
So, Postfix. If you ever need a taste of purgatory, take some time to browse through the source code, a terrifying mix of old-school C and Perl. I once needed to create a utility to monitor the state of the Postfix deferred queue, and found it was vastly easier to write my own queue parser in Python than to understand the source of the existing qshape utility. Whoever wrote that clearly has an aversion to variable names that are more than one character long.
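I won't reproduce that parser here, but the core idea is straightforward: the deferred queue is just a hashed directory tree of message files under /var/spool/postfix/deferred, so even counting those files gives you a useful health metric. Something like this sketch (not the original utility, and it needs to run as root to read the spool) is enough for basic monitoring:

# Count the messages sitting in the Postfix deferred queue. Each queued
# message is a single file in a hashed directory tree, so walking the
# directory is enough for a simple monitoring check.
import os

DEFERRED_DIR = '/var/spool/postfix/deferred'

def deferred_count():
    total = 0
    for dirpath, dirnames, filenames in os.walk(DEFERRED_DIR):
        total += len(filenames)
    return total

if __name__ == '__main__':
    print('%d messages in the deferred queue' % deferred_count())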
Actually configuring a functional Postfix system requires committing yourself to a long pilgrimage across the net, picking up scattered bits of wisdom from the hardy travellers who have passed this way before and recorded their insights in ancient forgotten blog posts and wiki pages.
Documents such as http://www.howtoforge.com/virtual-users-domains-postfix-courier-mysql-squirrelmail-ubuntu-10.04 include gems like 'this howto is meant as a practical guide; it does not cover the theoretical backgrounds'.
You have been warned, it seems to be saying. Configuring this system really requires at the very least a Master's degree in Advanced Email Hackery.
This article at Linux Journal has a helpful architecture diagram, and is actually a pretty good and comprehensive introduction to the architecture of Postfix.
The key insight that everything else hangs on is that Postfix is not a program. Postfix is a large collection of programs: some interact with the user, while a large number run in the background and perform all the various tasks of gathering, processing and delivering email.
These programs, together with a rather complex set of folders under /var/spool/postfix that store messages as they work their way through the system, and another set of rather complex configuration files under /etc/postfix, are what comprise the complete mail delivery system.
To add to the fun, many Postfix configuration settings can be stored in MySQL, and it's possible to run multiple Postfix instances in parallel with each other.
$ pstree -a
├─master
│ ├─anvil -l -t unix -u -c
│ ├─pickup -l -t unix -u -c
│ ├─qmgr -l -t unix -u
│ ├─smtpd -n smtp -t inet -u -c -o stress= -s 2
│ ├─smtpd -n smtp -t inet -u -c -o stress= -s 2
│ └─tlsmgr -l -t unix -u -c
├─master
│ ├─anvil -l -t unix -u -c
│ ├─pickup -l -t unix -u -c
│ ├─qmgr -l -t unix -u
│ ├─smtpd -n smtp -t inet -u -c -o stress=
│ └─smtpd -n smtp -t inet -u -c -o stress=
├─master
│ ├─anvil -l -t unix -u -c
│ ├─pickup -l -t unix -u -c
│ ├─qmgr -l -t unix -u
│ ├─smtpd -n smtp -t inet -u -c -o stress=
│ └─smtpd -n smtp -t inet -u -c -o stress=
├─master
│ ├─anvil -l -t unix -u -c
│ ├─pickup -l -t unix -u -c
│ ├─qmgr -l -t unix -u
│ ├─smtpd -n smtp -t inet -u -c -o stress=
│ └─smtpd -n smtp -t inet -u -c -o stress=
├─master
│ ├─anvil -l -t unix -u -c
│ ├─pickup -l -t unix -u -c
│ ├─qmgr -l -t unix -u
│ ├─smtpd -n smtp -t inet -u -c -o stress=
│ └─smtpd -n smtp -t inet -u -c -o stress=
That's a lot of processes....
And yet, despite this, Postfix seems to be about the best there is. Over the last few years I've built and maintained a massively parallel mail delivery, management and monitoring system on top of Postfix that, at the last count, had successfully delivered almost 10 million messages for clients of Nooro Online Research.
Remote Systems Administration with Fabric
Thu 01 January 2015
A tool I'm finding myself using more and more these days is Fabric.
Fabric is a python utility and library for streamlining systems administration tasks on multiple machines.
Although I'm primarily a developer and systems architect, I find that as our company grows I keep getting drawn into sysadmin and devops tasks. We now have quite a number of servers which I need to monitor and manage.
Now, as Larry Wall so profoundly said, one of the great virtues of a programmer is laziness, so if there's a way I can perform the same task on multiple machines without having to manually type out the commands dozens of times, then I'm all for it.
Fabric does precisely that. It's written in Python, and its configuration files are Python scripts, so there's no need to learn yet another domain-specific language.
I have a file called fabfile.py on my local machine that contains a growing collection of little recipes, and with this I can interact with all the servers in our infrastructure.
So for example, if I want to see at a glance which version of linux I'm running on all my servers, I have a task set up in my fabfile like this:
from fabric.api import parallel, run

@parallel
def lsb_release():
    run('lsb_release -a')
When I invoke this via:
$ fab lsb_release
I get a nice little print-out of the current version of all my servers. Fabric runs the task in parallel against every host in the env.hosts variable, which can be set at the command line or in the fabfile.
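Setting the host list is a one-liner, either with -H on the command line or near the top of the fabfile; the host names here are, of course, placeholders:

# At the top of fabfile.py: a default host list applied to all tasks.
# These host names are placeholders for your own servers.
from fabric.api import env

env.hosts = ['web1.example.com', 'web2.example.com', 'db1.example.com']

Alternatively, fab -H web1.example.com,web2.example.com lsb_release overrides the list for a single invocation.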
The following example allows me to see a list of every database running on every database server.
from fabric.api import hosts, run, warn_only

DATABASE_HOST_MACHINES = ['dbserver1', 'dbserver2', ...]

@hosts(DATABASE_HOST_MACHINES)
def databases():
    with warn_only():
        run('psql -c "select datname from pg_database;"')
And voila, a call to
$ fab databases
gives me a company-wide view of all our databases.
One further demonstration - this blog itself is generated using Fabric! For details, see the fabfile in my GitHub repository.