Previously I wrote an article on setting up squid, and after that wrote a few things about the various disk storage methods squid makes available. I also touched upon how squid was not always reliable, and how I made some scripts to deal with that. I have since refined these scripts, and now have very few squid-related problems that lead to downtime. In this article I share these scripts with you. For disk storage, I said that COSS was the clear winner, and though this is generally true, there are a few more things to keep in mind.
COSS is great if you need to cache a large number of small files that do not exceed a certain total size. There are a few reasons for this. First of all, COSS can only address a certain number of total objects per storage location. In order to have a large cache size, you need to increase the maximum allowed size of each object, and increase the size of your stripes. This necessarily increases RAM usage. There is also a hard upper limit: a single COSS partition cannot grow much beyond about 100 GB (I forget the exact value).
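As a rough illustration of where a ceiling like this can come from: to my understanding of the squid.conf documentation for Squid 2.6, COSS addresses its storage with 24-bit block numbers, so the largest possible cache_dir is the block size multiplied by 2^24. The helper name coss_max_gb is made up for this sketch, and 8192 is only an example of a larger block-size value, not a recommendation.

```shell
# Hypothetical helper: largest COSS cache_dir, in GB, for a given
# block size in bytes, assuming 24-bit block numbers.
coss_max_gb() {
    echo $(( ($1 << 24) >> 30 ))
}

coss_max_gb 512    # the documented default block size
coss_max_gb 8192   # an example of a larger block size
```

With the default block size this works out to 8 GB, and with the example value to 128 GB, which is in the same range as the figure I remembered above.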
Furthermore, when you start or restart squid, it must rebuild all your cache directories. For the AUFS or UFS storage methods, this happens reasonably fast. For DISKD and COSS, it does not. For COSS in particular, to rebuild the cache information, squid has to read the entirety of the storage space allotted to it. If you have slow disks or a large amount of cache space in use, this will take an unreasonable length of time. For these reasons, I typically allot only 50 GB for COSS on each drive in my system.
The reason COSS is so fast is that everything operates upon a single file. There is no file system overhead from creating or deleting each file that is being stored. Turning off journaling and last accessed time can reduce these overheads in the other storage methods, but COSS is still the winner here. The more numerous and the smaller your files are, the more overhead the non-COSS storage methods incur.
Conversely, if you have a smaller number of large files, the advantages of using COSS are minimized. Additionally, the inflexibility of COSS for files of widely varying or large size is a big disadvantage in this case. In the case that you are commonly caching files of 1 megabyte or greater, or you want to use hundreds of gigabytes on each disk for squid caching, AUFS with journaling and last-accessed-times turned off is your best bet.
Another issue with rebuilding the squid cache comes in during log file rotation. Normally, you can continue to use squid even if it is rebuilding its cache directories, albeit with somewhat slower performance. If you have a script rotate your squid logs (as you should), and the squid cache is rebuilding when you rotate your logs, squid will not accept any more connections until it has finished rebuilding the storage. This is a big problem if your squid needs to rebuild periodically (due to squid crashing or restarting, or even just a reboot of the server). Because it can take 20 minutes or so to rebuild a 50 GB cache, the chances of the log rotation overlapping with your rebuild are fairly high.
For these reasons, I have created a script that, when rotating your squid logs, will check to see if a rebuild is in progress before rotating the logs. If a rebuild is in progress, it will not rotate the logs.
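The check itself can be sketched in a few lines. This is not a copy of my rotatesquid script, only a minimal illustration of the idea. It assumes that cache.log contains a "Rebuilding storage in" line when a rebuild starts and a "Finished rebuilding storage" line when it ends, and that the squid binary is on the PATH; the function names and the SQUID_BIN and LOG_DIR variables are made up for the example.

```shell
# Sketch only: log messages, paths and names below are assumptions.
LOG_DIR="${LOG_DIR:-/home/squid/var/logs}"
SQUID_BIN="${SQUID_BIN:-squid}"

# Succeeds (exit 0) if the most recent rebuild marker in the given
# cache.log says a rebuild started but has not finished yet.
rebuild_in_progress() {
    last=$(grep -E 'Rebuilding storage in|Finished rebuilding storage' "$1" 2>/dev/null | tail -n 1) || true
    case "$last" in
        *"Rebuilding storage in"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Rotate the logs only when no rebuild is running.
rotate_if_safe() {
    if rebuild_in_progress "$LOG_DIR/cache.log"; then
        echo "rebuild in progress, skipping rotation"
        return 1
    fi
    "$SQUID_BIN" -k rotate
}
```

Calling rotate_if_safe from cron instead of calling squid -k rotate directly means a skipped rotation is simply retried on the next run.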
A further problem I ran into is that squid will not rotate the logs if it cannot find its PID file. For some reason this file will sometimes get deleted. Therefore, as part of rotating the logs, the script now rebuilds the PID file.
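Rebuilding the PID file boils down to something like the following. Again this is a hedged sketch rather than the script itself: restore_pid_file is a hypothetical name, and how you find the PID of the running squid process is left to the caller.

```shell
# Writes PID $2 into pid file $1 if the file is missing or empty and
# the process is alive. Returns non-zero if there is nothing usable.
restore_pid_file() {
    pid_file=$1
    pid=$2
    [ -s "$pid_file" ] && return 0           # file is already there
    [ -n "$pid" ] || return 1                # no PID supplied
    kill -0 "$pid" 2>/dev/null || return 1   # process is not running
    echo "$pid" > "$pid_file"
}

# Example call (pid file location and process lookup are assumptions):
#   restore_pid_file /home/squid/var/logs/squid.pid "$(pgrep -n -x squid)"
```

The kill -0 test sends no signal; it only checks that the process exists, so a stale PID is never written to the file.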
A copy of the new rotatesquid script is available here: http://pastebin.ca/836062
Obviously you will have to change the directory referenced if you have your squid logs somewhere other than /home/squid/var/logs
This much improved squid log rotation script will save you from losing logfiles. In our previous method, we had a script that would, if it detected squid was not running, delete the last logfile (because a common cause of squid crashes was a log file over 2 GB), and then restart squid. By rebuilding the PID file, we are able to successfully rotate the squid logs all of the time, reducing crashes and logfile deletions. Furthermore, COSS requires you to build squid with the ability to handle file sizes greater than 2 GB. With that support built in, squid no longer stops at the 2 GB mark, so if the PID file was missing and rotation kept failing, the log file could eventually reach 30 GB or more. This is definitely not something we want.
Also, it occurred to me that deleting the logfile when squid had crashed was not the best way to deal with things. I have changed the 'squidup' script to rename the active logfile instead of deleting it, as part of the process of bringing up squid when it is down.
The new version of squidup is here: http://pastebin.ca/836071
This particular script makes sure both apache and squid are running. If squid is not running, it stops apache, renames the logfile, deletes the logfile (if renaming it failed), starts squid, and starts apache. If your squid logs are somewhere other than /home/squid/var/logs, then change the script accordingly.
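The rename-then-delete step can be sketched as a small function. The name set_aside_log and the ".crashed." suffix with a timestamp are choices made for this example and are not necessarily what squidup uses.

```shell
# Move a log file out of the way under a timestamped name; if the
# rename fails for any reason, fall back to deleting the file.
set_aside_log() {
    log=$1
    [ -e "$log" ] || return 0   # nothing to do
    mv "$log" "$log.crashed.$(date +%Y%m%d%H%M%S)" 2>/dev/null || rm -f "$log"
}

# Example call (path is the default log location used in this article):
#   set_aside_log /home/squid/var/logs/access.log
```

Either way the active logfile is gone before squid starts again, so squid comes up with a fresh log.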
Finally, one failure condition that had plagued me for a while was a steady loss of file descriptors. On some of my servers, over time, file descriptors will leak. Once file descriptors reach a critically low level, squid will become unresponsive and this may even take down the entire server. I wrote a script that will deal with this issue by checking to see how many squid file descriptors are available, and, if squid has not been restarted too recently, and the descriptors are too low, it will restart squid.
In order for this script to work you need a few things. First of all, you will need snmp and snmp-tools (the net-snmp and net-snmp-utils packages) installed on your server. You will also need squid set up to listen to snmp requests. Finally, you need to create a file with world read/write permissions so that the checking script can check and update the last time squid was restarted.
So here we go:
yum install net-snmp
yum install net-snmp-utils
mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.huge
Create a new /etc/snmp/snmpd.conf and insert the following line (changing the password to whatever you want):
proxy -v 1 -c password localhost:3401 .1.3.6.1.4.1.3495.1
Start snmp and configure it to run as a service:
service snmpd start
chkconfig --level 2345 snmpd on
Squid needs to be configured for snmp as well, and Squid must be compiled with the --enable-snmp switch.
In your squid.conf file, include the following:
acl snmpcommunity snmp_community password
snmp_access allow snmpcommunity localhost
snmp_access deny all
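Once squid has been restarted with these settings, a query along the following lines should return the number of unused file descriptors. Treat the details as assumptions to check against the mib.txt shipped with your squid: the OID is what I understand to be cacheCurrentUnusedFDescrCnt, the function name free_descriptors and the variables are made up for this sketch, and SNMP_HOST may need to be localhost:3401 if you want to ask squid directly rather than going through snmpd.

```shell
# Sketch only: community, host and OID are assumptions to verify.
SNMP_COMMUNITY="${SNMP_COMMUNITY:-password}"
SNMP_HOST="${SNMP_HOST:-localhost}"
FD_OID="${FD_OID:-.1.3.6.1.4.1.3495.1.3.1.10.0}"

# Prints the number of unused squid file descriptors.
# -Oqv makes snmpget print only the value, without the OID or type.
free_descriptors() {
    snmpget -v 1 -c "$SNMP_COMMUNITY" -Oqv "$SNMP_HOST" "$FD_OID"
}
```

If this prints a plain number, snmp is working end to end and the checking script has what it needs.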
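If your firewall is for some reason blocking snmp, you could resolve that with the following. Be aware that this moves the entire saved iptables ruleset out of the way, so it affects every firewall rule and not just the ones for snmp: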
mv /etc/sysconfig/iptables /etc/sysconfig/iptables.save
Finally, you need to create the file “lastrestart” in the same directory as the checking script and chmod it to 666.
Now you should be ready to run the script. The checking script, which I call “descriptorcheck” is here: http://pastebin.ca/836083
For descriptorcheck to restart squid, you need a script located at /root/restart with the following:
The above script assumes you have the “squidup” script located at /home/funky/squidup
In my copy of the descriptorcheck script, I set the limit for descriptors as 4000, and the earliest it can restart squid is 20 minutes after the last time it restarted it. In the case of a date rollover, where the last restart time is in the future of the current time, we assume that the last restart was more than 20 minutes ago. If you want to change the lower limit for descriptors or the maximum restart frequency, you can do that in the script.
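The decision described above can be written as a small function. This is a sketch of the logic, not the descriptorcheck script itself: should_restart and the two variable names are hypothetical, and it assumes the times are stored as seconds (for example from date +%s), which may differ from how my script stores them.

```shell
MIN_FREE_FDS=4000       # restart only below this many free descriptors
MIN_RESTART_GAP=1200    # 20 minutes, in seconds

# should_restart FREE_FDS LAST_RESTART NOW
# Succeeds (exit 0) when squid should be restarted.
should_restart() {
    free=$1
    last=$2
    now=$3
    # Plenty of descriptors left: nothing to do.
    [ "$free" -lt "$MIN_FREE_FDS" ] || return 1
    # Last restart appears to be in the future (rollover):
    # assume it was more than 20 minutes ago.
    [ "$last" -gt "$now" ] && return 0
    # Otherwise require the minimum gap since the last restart.
    [ $((now - last)) -ge "$MIN_RESTART_GAP" ]
}

# Example call (reads the timestamp from the lastrestart file):
#   should_restart "$free" "$(cat lastrestart)" "$(date +%s)" && /root/restart
```

Keeping the two limits in variables at the top makes it easy to change the descriptor floor or the restart frequency.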
Points to remember:
AUFS is good for few but large files, or very large cache directories.
COSS is good for many small files, and cache directories of no more than 100 GB.
Rotating your squid logs is important, but make sure not to rotate them if the cache storage is rebuilding.
If you run out of descriptors you are headed for trouble. Sometimes descriptors leak and the only way to fix this is to restart squid. Having a script do this for you automatically will save you downtime and hassle.
All scripts that run should have permissions (chmod) of 755.