In a recent article, I explained the Benefits of Squid Caching to Accelerate CGI Proxy. In that article, you saw that the particulars of CGI Proxy lend themselves well to acceleration with squid. The main benefit is lower RAM usage, with some tangible reductions in CPU usage as well. Today's article follows up on that information and explains how to actually get this done. With squid you're in for a wild ride. But with all that this flexible tool has to offer, it will open up a world of new possibilities in scalable, high-performance web solutions.
I've found that one of the hardest things about squid is making sense of the documentation. Most of it is out of date or incomplete, and squid was not exactly designed with this purpose at the front of its mind. As for versions, Squid 2.6 is the latest stable release. It departs significantly from the syntax and capabilities of 2.5 and earlier, which is what most of the documentation was written for. Furthermore, since 3.0 is not yet advisable for production use, 2.6 is really the best choice. 2.6 gets most of its syntax from the 3.0 branch, but does not support all of 3.0's features. If you must venture into 2.6 without knowing what you are doing, the best source of information is the 3.0 documentation. The best source of know-how on squid is the #squid channel on irc.freenode.net. In my experience, the people in that channel are helpful and knowledgeable.
There are a few main hurdles you'll need to get over if you want to convert your CGI Proxy setup to run behind squid. Many of them may seem rather basic, and the end result is not that impressive from a code standpoint, but I have spent nearly a year working out solutions to these issues. Since I run proxies myself, I questioned whether it was prudent to give away this competitive advantage, especially when it took so much work to get here. I guess what I'm saying is: be thankful for what I'm giving you. There's no autoinstall script, so doing this will require reading, learning, experimentation and some skill. But if you're willing to put in a few hours of effort, you will get the benefit of nearly a year of my experience with squid.
The tasks ahead of you are:
* Compile and Install Squid
* Configure apache to listen on an alternate port (in our examples, 82)
* Configure Squid in web accelerator mode
* Modify CGI Proxy to think that incoming connections on our alternate port (82) are really coming from standard port 80.
* Deal with those elusive friends, file descriptors (or lack thereof)
* Keep your logfile from becoming too big
* Make a script to keep squid / apache up, in the likely case something goes wrong
* Set up SPRI so that squid gets the higher priority it needs
* A few thoughts on abuse and logging
Advanced topics that I may (or may not) cover in a later article include:
* Supporting SSL sites via squid acceleration
* Using the URL_Rewriter function in squid to keep your cache namespace merged when it would normally splinter and become inefficient
* Modify CGI Proxy to support the above goal more effectively
* Explain various tweakable aspects of the squid configuration, and discuss some good potential optimizations
* Explain some clustering concepts I’ve tried with squid, and why to avoid them (I lovingly refer to this as the ‘cluster-fuck’)
Further improvements that I have not solved yet include:
* Avoid compile-permissions-hell when tweaking the URL_Rewriter function to be faster using inline C.
* Figure out how to get the best performance from the disk storage systems available to squid, while avoiding known issues
* Merge cache spaces across servers, and take advantage of clustering, without causing a “cluster-fuck”.
* Figure out how to avoid severe performance degradation and RAM leakage when hosting multiple proxy domains on the same server (this is a mod_perl / apache problem).
So without further ado, let's get down to the dirty work:
* Compile and install squid
Squid is a little bit tricky. It wants its own user and group, tries to put things in weird places that don't always make sense, and has some performance issues with the default limit of 1024 file descriptors. To get around this, we will configure squid at compile time to use more descriptors and to install to /home/squid. We will also create the correct folders and permissions.
As of this writing, the newest version of squid is 2.6.STABLE14. When I said I wouldn't be giving you an easy autoinstall, that was only half true: this portion of installing and configuring squid is best done automatically.
From a shell on your server, run the following commands as root:
wget http://www.squid-cache.org/Versions/v2/2.6/squid-2.6.STABLE14.tar.gz
adduser squid
tar --ungzip -xf squid-2.6.STABLE14.tar.gz
cd squid-2.6.STABLE14
ulimit -HSn 8192
./configure --prefix=/home/squid --enable-carp --enable-storeio=diskd,ufs --enable-ssl --enable-snmp
make
make install
chmod 777 /home/squid/var/
chmod 777 /home/squid/var/logs
/home/squid/sbin/squid -z
We can't run it yet, but squid is now installed and ready to be configured. First we downloaded squid, added the user (squid) that we will later tell squid to run as, and unpacked the archive. We then raised the shell's file descriptor limit to 8192 (plenty for all intents and purposes) immediately before compiling squid. If you don't do this at this point, squid will not be able to use more descriptors later without being recompiled. We then configured squid to install everything to /home/squid, and enabled some features that you may or may not end up using. Finally, we set the permissions on the logging and caching directories so that squid won't complain and die. Then we ran squid with the -z option to create the cache directories. All in 11 lines of shell.
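You may want those steps to stop at the first command that fails, since a failed download or configure would otherwise let the later commands run against nothing. If so, they can be wrapped in a bash function. This is a hedged sketch rather than a tested installer. The run helper and the DRY_RUN variable are hypothetical additions that let you print the commands instead of executing them.

```bash
#!/bin/bash
# Sketch: the 11 install steps, run in order and aborted at the first
# failure. run() and DRY_RUN are hypothetical helpers, not part of squid.
SQUID_VER="squid-2.6.STABLE14"
SQUID_URL="http://www.squid-cache.org/Versions/v2/2.6/${SQUID_VER}.tar.gz"

# Print the command when DRY_RUN=1, otherwise execute it in the current shell
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The parentheses run the body in a subshell, so "set -e", "cd" and
# "ulimit" do not leak into the shell that calls the function.
install_squid() (
  set -e
  run wget "$SQUID_URL"
  run adduser squid
  run tar -xzf "${SQUID_VER}.tar.gz"
  run cd "$SQUID_VER"
  # Raise the descriptor limit before configure so the build picks it up
  run ulimit -HSn 8192
  run ./configure --prefix=/home/squid --enable-carp \
    --enable-storeio=diskd,ufs --enable-ssl --enable-snmp
  run make
  run make install
  run chmod 777 /home/squid/var/
  run chmod 777 /home/squid/var/logs
  # Create the cache directory structure
  run /home/squid/sbin/squid -z
)
```

Run it as root with install_squid, or preview the commands first with DRY_RUN=1 install_squid. Because the body runs in a subshell, the cd and the raised limit only apply to the install itself. You still need ulimit -HSn 8192 again before starting squid.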
Now that squid is installed, let’s work on the next problem:
* Configure apache to listen on an alternate port (in our examples, 82)
The file you will be editing most often is httpd.conf, usually located at /usr/local/apache/conf/httpd.conf. Your favorite control panel may have insulated you from it until now. Your favorite command-line editor will help you here. I use the Joe editor; nano is also a good editor and comes preinstalled on most Linux distros.
In any case, open up your httpd.conf. There are a few things to look out for that may tell apache to open port 80 when it shouldn't.
Listen directives:
Listen 80
Listen 12.34.56.78:80
<IfDefine SSL>
Listen 80
Listen 443
</IfDefine>
Search your httpd.conf for these kinds of entries. If you have a Listen with an IP but no port, that's fine (though I don't know if that's considered valid in the first place). If you have a Listen with 80 as its port, with or without an IP in front of it, change the port to 82.
Port directive:
Port 80
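This is the main port that apache listens on. With this set, apache will grab port 80 on every IP. Change it to 82. (The Port directive exists only in Apache 1.3. Apache 2 dropped it, so there you only need to fix the Listen lines.)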
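The Listen and Port edits are simple enough to script. Here is a minimal sketch. It assumes GNU sed (for -i) and that the port sits at the end of the line, as in the examples. The function name move_apache_to_82 is hypothetical. The function keeps a .bak copy so you can diff the result before restarting apache.

```bash
#!/bin/bash
# Sketch: switch "Listen 80", "Listen IP:80" and "Port 80" lines in
# httpd.conf to port 82. Hypothetical helper; review the result and the
# .bak copy before restarting apache.
move_apache_to_82() {
  local conf="$1"
  cp -p "$conf" "$conf.bak"
  # \1 is the captured prefix ("Listen ", "Listen 12.34.56.78:", "Port ");
  # the "82" right after it is literal text, since sed backreferences
  # are single digits.
  sed -i -E \
    -e 's/^([[:space:]]*Listen[[:space:]]+([0-9.]+:)?)80[[:space:]]*$/\182/' \
    -e 's/^([[:space:]]*Port[[:space:]]+)80[[:space:]]*$/\182/' \
    "$conf"
}
```

For example, run move_apache_to_82 /usr/local/apache/conf/httpd.conf and then diff the file against httpd.conf.bak. Lines for other ports, such as Listen 443 or Listen 8080, are left alone.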
Virtualhosts:
NameVirtualHost 67.159.44.88:80
<VirtualHost 67.159.44.88:80>
ServerName fdc4.freeproxies.org
DocumentRoot /usr/local/apache/htdocs
</VirtualHost>
Unless you're running SSL, your virtualhosts don't need to be tied to a particular port. In the example above, just remove the :80 entirely; don't bother replacing it with :82. Make sure you don't have duplicate NameVirtualHost entries with the same IP and port (or lack of port). Sometimes cPanel will create more than one of these entries for no particular reason.
That's all you need to do to move apache onto the alternate port. Still, I suspect it sometimes grabs port 80 anyway. I have a workaround for that, which we will discuss later.
Moving on…
* Configure Squid in web accelerator mode
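The virtualhost cleanup can be sketched the same way. This assumes GNU sed and numeric IPs, as in the example above. strip_vhost_port80 and duplicate_namevhosts are hypothetical names. Only the first address on each line is touched, and :443 entries are left alone for SSL.

```bash
#!/bin/bash
# Sketch: drop ":80" from NameVirtualHost and <VirtualHost> lines.
# Hypothetical helper; keeps a .bak copy. Only the first IP on a line
# is handled, and other ports (e.g. :443) are not touched.
strip_vhost_port80() {
  local conf="$1"
  cp -p "$conf" "$conf.bak"
  # \1 = directive plus IP, \3 = whatever followed ":80" (space, ">" or end)
  sed -i -E 's/^([[:space:]]*(NameVirtualHost|<VirtualHost)[[:space:]]+[0-9.]+):80([[:space:]>]|$)/\1\3/' "$conf"
}

# Print each NameVirtualHost argument that occurs more than once
duplicate_namevhosts() {
  grep -E '^[[:space:]]*NameVirtualHost[[:space:]]' "$1" | awk '{print $2}' | sort | uniq -d
}
```

If duplicate_namevhosts prints anything after the cleanup, remove the extra NameVirtualHost lines by hand.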
It is not entirely obvious from the docs how to configure squid for this purpose. What follows is what I put near the top of my squid.conf. Keep in mind I'm not 100% sure all of this is required; it's just what I happen to have:
http_port __ipaddress__:80 vhost
cache_peer __ipaddress__ parent 82 0 no-query allow-miss originserver
acl notssl myport 80
cache_peer_access __ipaddress__ allow notssl
acl porteighty port 80
acl porteighty port 82
acl alldest dst 0.0.0.0/0
acl acceld dst __ipaddress__/32
never_direct deny acceld
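The peer name in cache_peer_access has to match the host given on the cache_peer line. Because of that, it can help to generate this block from a single IP rather than filling in each __ipaddress__ by hand. Below is a sketch; squid_accel_conf is a hypothetical function name, and it prints the same directives as above.

```bash
#!/bin/bash
# Sketch: print the accelerator section of squid.conf for one IP, so every
# __ipaddress__ placeholder (including the cache_peer_access peer name)
# gets the same value. Hypothetical helper; paste its output by hand.
squid_accel_conf() {
  local ip="$1"
  cat <<EOF
http_port ${ip}:80 vhost
cache_peer ${ip} parent 82 0 no-query allow-miss originserver
acl notssl myport 80
cache_peer_access ${ip} allow notssl
acl porteighty port 80
acl porteighty port 82
acl alldest dst 0.0.0.0/0
acl acceld dst ${ip}/32
never_direct deny acceld
EOF
}
```

For example, squid_accel_conf 12.34.56.78 prints the block for that address. Paste it near the top of squid.conf. You can then have squid check the file with /home/squid/sbin/squid -k parse before restarting it.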
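Farther down in the file, you'll need to edit your access rules. You don't want your copy of squid acting as a spam relay! Basically, I took the rules that were already there, commented out some, and added a couple. Using cache_peer with allow-miss and originserver, along with never_direct deny, gets rid of a lot of security problems anyway, because every request should be force-fed to your apache server. Keep in mind that squid checks http_access rules from the top down and acts on the first one that matches. So in the list below, http_access allow all lets through everything that reaches it, and the two lines after it never come into play.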
http_access allow manager localhost
http_access deny manager
#http_access allow manager
# Deny requests to unknown ports
http_access deny !Safe_ports
# Deny CONNECT to other than SSL ports
#http_access deny CONNECT !SSL_ports
http_access deny CONNECT
http_access allow all
http_access allow acceld
http_access deny alldest
You should also configure your cache directory. I use the UFS storage type, because I had some stability issues with diskd. Here's the line I've used on a VPS I have; it gives me a 20 GB cache (squid takes the size in megabytes). You can modify this to suit your needs:
cache_dir ufs /home/squid/var/cache 20000 16 256
If you have multiple drives, or want to put the cache somewhere other than where you initially installed squid, you'll need to 'chown squid' the directory you'd like to use for caching. After creating and chowning the directory, run squid with the -z option so it creates the necessary directory structure, e.g. /home/squid/sbin/squid -z
Keep in mind that too large a cache will increase I/O wait on your server. On many of my servers, I have four 500 GB drives for caching, and I find it works well to use 100 GB of cache space on each. This gives about a 40% cache hit rate when used in conjunction with various CGI Proxy modifications that help merge the cache namespace.
There are a lot of other things you can tweak in squid, but that's the basic install, and it should serve you well.
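For a multi-drive setup, the cache_dir lines can be generated per drive. This sketch uses hypothetical names: cache_dir_lines, and mount points like /cache1. Squid reads the size in megabytes, and this follows the 20000-for-20gb convention above.

```bash
#!/bin/bash
# Sketch: print one cache_dir line per directory. Size is given in GB and
# converted to squid's megabytes using 1 GB = 1000 MB, like the 20000
# example. Hypothetical helper and paths.
cache_dir_lines() {
  local size_gb="$1"; shift
  local dir
  for dir in "$@"; do
    echo "cache_dir ufs ${dir} $((size_gb * 1000)) 16 256"
  done
}
```

For example, cache_dir_lines 100 /cache1 /cache2 /cache3 /cache4 prints four lines of 100000 MB each. Each directory still has to exist and be chowned to squid before you run /home/squid/sbin/squid -z.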
* Modify CGI Proxy to not mind running on port 82
CGI Proxy is no dummy. It tries to figure out which port it is listening on and points all the links it rewrites at that same port. In order to make our accelerator work, we have to stop this.
Originally, you’ll have some code like this in CGI Proxy:
# If $RUNNING_ON_SSL_SERVER is '', then guess based on SERVER_PORT.
$ENV_SERVER_PORT= $ENV{SERVER_PORT} ;
$RUNNING_ON_SSL_SERVER= ($ENV_SERVER_PORT==443) if $RUNNING_ON_SSL_SERVER eq '' ;
You need to change it to this:
# If $RUNNING_ON_SSL_SERVER is '', then guess based on SERVER_PORT.
$ENV_SERVER_PORT= $ENV{SERVER_PORT} ;
# Squid forwards to apache on port 82; report it as 80 so links use the public port
if ($ENV_SERVER_PORT == 82) { $ENV_SERVER_PORT=80; }
$RUNNING_ON_SSL_SERVER= ($ENV_SERVER_PORT==443) if $RUNNING_ON_SSL_SERVER eq '' ;
That’s pretty much it.
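If you run several copies of the script, the same edit can be applied with sed. This sketch makes three assumptions: you have GNU sed, the script is CGIProxy's nph-proxy.cgi, and the assignment line is spelled exactly as shown above, spacing included. patch_cgiproxy_port is a hypothetical name. It skips files that already have the override.

```bash
#!/bin/bash
# Sketch: insert the port-82 override right after the SERVER_PORT
# assignment. Assumes the line reads exactly
# "$ENV_SERVER_PORT= $ENV{SERVER_PORT} ;". Hypothetical helper.
patch_cgiproxy_port() {
  local script="$1"
  # Do nothing if the override is already there
  if grep -qF 'if ($ENV_SERVER_PORT == 82)' "$script"; then
    return 0
  fi
  cp -p "$script" "$script.bak"
  # GNU sed one-line "a" command: append the text after the matching line
  sed -i '/^\$ENV_SERVER_PORT= \$ENV{SERVER_PORT} ;/a if ($ENV_SERVER_PORT == 82) { $ENV_SERVER_PORT=80; }' "$script"
}
```

Point patch_cgiproxy_port at your copy of nph-proxy.cgi and check the result by hand. A .bak copy of the original is kept.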
* Deal with those elusive friends, file descriptors (or lack thereof)
Apache, in its usual prefork setup, spreads requests across many child processes. Squid instead runs as a single main process that uses many file descriptors to handle all of its communication. It needs a descriptor for each file it has open, for each client connected to it, and for each connection it makes to the outside world (here, to apache). With this in mind, you obviously don't want to run out of file descriptors.
The install commands I gave you earlier matter because they include a line that sets the maximum number of file descriptors the software can use. You have to set this both at compile time and at runtime. If you compile squid with a lower (or system default) limit, it will never use more than that until you recompile it with the higher limit. Likewise, if you start squid without first raising the limit, it will run with the lower value.
To get 8192 file descriptors, use the following command before compiling and before running squid:
ulimit -HSn 8192
Aside from the occasional bug that may cause you to run out of file descriptors no matter how many you have, this will be plenty for any usage. You may be running out of descriptors if you have a hard time connecting to your server even though it doesn't seem heavily loaded. The /home/squid/var/logs/cache.log file will contain a warning if this happens.
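Two quick checks follow from this. Both function names below are hypothetical. fd_limit_ok tests whether the current shell's soft and hard limits are at least a given number; squid inherits these limits if you start it from that shell. fd_warnings counts descriptor warnings in cache.log. Its grep pattern assumes squid's warning contains the words "running out of filedescriptors", so check the wording in your own cache.log.

```bash
#!/bin/bash
# Sketch: descriptor checks. Hypothetical helpers.

# Succeed if this shell's soft and hard descriptor limits are both >= $1
fd_limit_ok() {
  local want="$1" soft hard v
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  for v in "$soft" "$hard"; do
    if [ "$v" != "unlimited" ] && [ "$v" -lt "$want" ]; then
      return 1
    fi
  done
  return 0
}

# Count "running out of filedescriptors" warnings in a cache.log
fd_warnings() {
  grep -c 'running out of filedescriptors' "${1:-/home/squid/var/logs/cache.log}"
}
```

For example, in a start script you might run ulimit -HSn 8192 and then fd_limit_ok 8192 || echo "descriptor limit is still too low" before launching squid.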
* Keep your logfile from becoming too big
The default build of squid can only handle log files up to 2 GB. You could recompile it to allow larger files, but a logfile over 2 GB is pretty useless anyway, since it takes too long to open and edit. To keep squid from crashing, you have to keep the logfiles smaller than that.
I run a cron job to rotate the squid logs every hour, which is a good way to deal with this problem. Unfortunately, squid sometimes doesn't believe it's running even when it is, so it won't rotate the logs. When that happens, the logs keep growing until squid eventually crashes. I know there's a better solution out there, but I have a clunky workaround. A second cron job checks whether squid is running. If it's not, the job stops apache, deletes the current logfile, and restarts squid and apache.
Here’s the short version of the rotate squid cron (should run hourly):
#!/bin/sh
/home/squid/sbin/squid -k rotate
Just put that in a text file, chmod 755 it, and add it to your crontab with a line like this:
0 * * * * /home/funky/rotatesquid
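Because rotation can silently stop, it's also worth checking the size of access.log directly. This is a sketch. log_too_big and its default threshold are hypothetical choices. The default is 1.5 GB (1500000000 bytes), comfortably under the 2 GB limit.

```bash
#!/bin/bash
# Sketch: succeed when a file has reached the given size in bytes
# (default 1.5 GB). Fails if the file does not exist. Hypothetical helper.
log_too_big() {
  local file="$1" limit="${2:-1500000000}" size
  [ -f "$file" ] || return 1
  # Arithmetic expansion strips any padding some wc versions add
  size=$(( $(wc -c < "$file") ))
  [ "$size" -ge "$limit" ]
}
```

Something like log_too_big /home/squid/var/logs/access.log && /home/squid/sbin/squid -k rotate could go in a cron job. However, if squid refuses to rotate, you are back to the workaround described above.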
Now for the second cron. I call this one squidup, because it helps keep squid up and running in the couple of cases where it might die:
#!/bin/bash
# Restart apache if no httpd process is running
this=$(ps aux | grep httpd | awk '{print $11}' | grep httpd | head -1)
if [ "$this" = "/usr/local/apache/bin/httpd" ]; then
  echo "httpd was found"
else
  echo "httpd was not found"
  /etc/init.d/httpd restart
fi
# Look for a process running as the squid user (column 1 of ps aux)
thing=$(ps aux | grep squid | awk '{print $1}' | grep squid | head -1)
if [ "$thing" = "squid" ]; then
  echo "squid was found"
else
  echo "squid not found"
  # Stop apache completely so it can't be holding port 80
  /etc/init.d/httpd stop
  sleep 4
  killall httpd
  sleep 4
  killall -KILL httpd
  sleep 4
  # Discard the oversized log so squid can start cleanly
  rm -f /home/squid/var/logs/access.log
  # Raise the descriptor limit before starting squid
  ulimit -HSn 8192
  /home/squid/sbin/squid
  sleep 4
  /etc/init.d/httpd restart
fi
Again, put that in a text file, chmod 755 it, and set it to run as a cron job. I run mine every two minutes:
*/2 * * * * /home/funky/squidup
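Matching on ps output depends on process names and on the user column. Another way to tell whether squid is alive is the pid file it writes. With this install prefix, that file should be /home/squid/var/logs/squid.pid unless you changed pid_filename. Squid's own squid -k check is a further option. The sketch below uses a hypothetical function name, pidfile_alive, and uses kill -0 to test whether the recorded process still exists.

```bash
#!/bin/bash
# Sketch: succeed if the pid file names a live process. kill -0 sends no
# signal; it only checks that the process exists and can be signalled.
# Hypothetical helper.
pidfile_alive() {
  local pidfile="${1:-/home/squid/var/logs/squid.pid}" pid
  [ -r "$pidfile" ] || return 1
  pid=$(head -n 1 "$pidfile")
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;   # empty or not a number
  esac
  kill -0 "$pid" 2>/dev/null
}
```

In squidup, the squid test could then become if pidfile_alive; then ... fi. kill -0 fails with a permission error for processes you don't own, so run it from root's crontab.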
* Set up SPRI so that squid gets the higher priority it needs
SPRI is a script that automatically adjusts the priority of running processes. It was originally available from RFX Networks. However, the script has a bug they haven't fixed, and they apparently aren't updating it anymore. Marco, aka TheWird, tracked down the bug and fixed it. Here is a copy of the fixed script that you can download.
To install spri, copy the attached file to your server, then execute the following commands:
tar --ungzip -xf spri-wird.tar.gz
cd spri-0.5/
./install.sh
After this, you need to set the priorities. In your favorite text editor, edit /usr/local/spri/prios/med-high and remove the line that says "squid". Next, edit /usr/local/spri/prios/rt and add a line that says "squid". This way squid will run with higher priority. There are many copies of apache running, and all of them are being fed by squid. If squid is not given higher priority, it will lag, and your server will handle nowhere near as much traffic as it could.
Now you need to run spri as a cron job. The line below is in /etc/crontab format: the "root" field is the user to run as, so leave it out if you add the job with crontab -e. Note that */45 in the minute field runs spri at :00 and :45 of every hour rather than at a true 45-minute interval. You can run it more often if you'd like:
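The med-high/rt edit can also be scripted, which helps if you reinstall spri on several servers. This sketch assumes each list file holds one process name per line, as the steps above imply. spri_squid_to_rt is a hypothetical name, and running it twice does no harm.

```bash
#!/bin/bash
# Sketch: move "squid" from spri's med-high list to its rt list.
# Assumes one process name per line. Hypothetical helper; the default
# directory is the one named in the article.
spri_squid_to_rt() {
  local dir="${1:-/usr/local/spri/prios}"
  local medhigh="$dir/med-high" rt="$dir/rt"
  # Drop any line that is exactly "squid" from med-high
  if [ -f "$medhigh" ]; then
    grep -vx 'squid' "$medhigh" > "$medhigh.tmp" || true
    mv "$medhigh.tmp" "$medhigh"
  fi
  # Add "squid" to rt unless it is already listed
  grep -qx 'squid' "$rt" 2>/dev/null || echo 'squid' >> "$rt"
}
```

Run spri_squid_to_rt as root after installing spri, then check both files.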
*/45 * * * * root /usr/local/sbin/spri -q >> /dev/null 2>&1
I should be editing this blog entry further, but given how long everyone has waited, this should be good enough to get you up and running. Enjoy!