In my last article, I explained how to use Squid as part of a high-performance CGI proxy hosting platform. At one point in that article, I suggested you use the UFS file system because I hadn’t yet resolved issues with the other file systems available in Squid. I’ve since spent a few weeks tweaking disk performance in Squid and have some useful suggestions now. One thing that gave me helpful insight was the book Squid: The Definitive Guide.
What I found is that I was hitting an I/O bottleneck on my servers. Many of my servers have the following disk setup:
2x 500GB Samsung SATA
2x 500GB Western Digital SATA
The Samsung drives just weren’t fast enough, and though my Western Digitals were faster, they were still causing issues. I could mostly ignore this on my lower-end servers, but on my new Intel Core 2 Quad machines I would hit an obvious disk bottleneck at about 2/3 of what the server could handle CPU-wise. I didn’t want to give up caching, so I had to investigate a few things.
Before I go further, I should mention that I now have a couple of servers with 750GB Western Digital drives, and those drives run much, much faster. You’ll still want to tweak your disk systems, but if I could do it all over again, I would put 2x 750GB WD drives in my servers instead of 4x 500GB drives. Performance would easily be double.
First of all, I found that reducing the disk cache size freed up a lot of RAM and also reduced disk I/O substantially. I had previously used 100GB of each 500GB disk (400GB total) for caching. Squid has to keep a bit of information about each cached object in RAM, and with this setup that amounted to 2GB. Luckily I had 8GB of RAM in each server, but this is still more usage than I would like. Reducing the cache size means fewer files to shuffle into and out of the cache, and it increases the likelihood that a disk object is already cached in memory by the file system. A 20GB Squid cache on a server whose operating system has allocated 4-6GB for file caching will see very little I/O wait.
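For a concrete sense of scale, a reduced cache looks like this in squid.conf (the path and sizes here are just an illustration, not my exact config; 16 and 256 are the usual first- and second-level directory counts):

```
# 20GB cache on one dedicated drive instead of 100GB
cache_dir aufs /sdb/squidcache 20000 16 256
```
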
Since substantially reducing the disk cache wasn’t really my goal, I decided to dig deeper. The book I mentioned earlier helped here, in three major ways. First, it explained how the available Squid file systems work. Second, it gave some tips on optimizing Linux file system options. Third, and most important, it showed performance benchmarks for these different options.
First, I’ll explain the four Squid file systems and some of my experience with them.
1) UFS. This is the tried-and-true file system for Squid. It is also the slowest. It runs inside the main Squid process, so while I/O operations are performed, no other work can happen. That also means even if you have multiple hard drives, only one can be accessed at a time. This is obviously bad, but surprisingly, it can sometimes work to your benefit if your disk is especially slow at servicing multiple simultaneous requests, as is the case with many ATA drives and some poorly designed RAID 5 configurations.
2) AUFS. The A stands for Asynchronous UFS. Instead of running inside the main Squid thread, it spawns extra threads to take care of disk access. On some operating systems, the functionality to make this possible does not exist or is not enabled by default; in my Fedora 6/7 installs, this has not been an issue. Because disk access happens in separate threads, Squid can still handle requests and other work while it is waiting on the disk. This makes cache misses in particular process faster. If Squid’s single-threaded CPU usage is the bottleneck on your server, you can squeeze out a little more this way on a multi-core CPU.
If you decide to use AUFS, you should tweak the number of threads used for disk access when you compile Squid. By default, Squid will spawn quite a few threads for AUFS, which in some cases may degrade performance: with too many threads, you are in effect telling the hard drive to do a bunch of things at once. Linux file systems mitigate this problem somewhat by ordering disk accesses sensibly. However, ATA drives and many SATA drives handle requests more effectively in a more serial fashion. It should be mentioned that SCSI drives, and to a lesser extent SATA drives with NCQ, do not suffer as acutely from this problem.
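The thread count is set at build time. A sketch of what that looks like, assuming Squid’s `--with-aufs-threads` configure option (check `./configure --help` on your version to confirm it is available; the count of 8 is just an example, not a recommendation):

```shell
# build Squid with a fixed, modest AUFS thread count
# instead of the auto-calculated default
./configure --enable-storeio=aufs --with-aufs-threads=8
make
make install
```
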
3) DiskD. DiskD is basically the same as AUFS, except that it spawns a separate process for disk access, one process per cache directory, instead of threads. This is more compatible on some operating systems, but does not perform quite as well. In particular, I find rebuilding the Squid cache takes forever with DiskD, whereas AUFS does not have this problem. Squid must rebuild the disk cache whenever it starts or restarts, so this can cause really poor performance for an extended period of time. For this reason, and because AUFS performs better, I cannot recommend DiskD.
4) COSS. COSS is the shining star here. It is a little more complicated than the other disk systems, and harder to understand, but definitely worth it. Basically, instead of keeping a big batch of files, each one requiring separate file I/O operations, it puts your entire cache in a single file that it navigates in a circular fashion. If you want to know more about this file system, I suggest reading the book I mentioned or looking it up online. Because COSS does not have to delete files, or constantly open and close file handles, it is miles ahead of any of the other file systems. I have found that I can get half the disk I/O wait from COSS as from the best-tuned alternative. The worst setup I had was many times slower than COSS.
The downside is that you have to do things a bit differently for COSS. First, the default maximum object size is 1 megabyte. This can be changed at compile time, or possibly at runtime, but I haven’t gotten that far yet. Also, COSS has to create one big file ahead of time, so “creating the cache directories” will take a long time. Don’t stop this process or your cache won’t be the right size. Also, importantly, COSS only has a 24-bit counter for blocks inside the cache file, so the size of each cache is limited. You can work with this by changing the block size. I use a block size of 4096 bytes, which allows somewhere around a 60GB cache. You can always use multiple COSS cache_dirs, even on the same hard drive, to work around this issue.
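The size limit is easy to work out yourself: a 24-bit counter addresses 2^24 blocks, so with 4096-byte blocks a single COSS cache tops out at 64GB, in line with the rough 60GB figure above.

```shell
# 2^24 addressable blocks * 4096 bytes per block, expressed in GB
echo $(( (1 << 24) * 4096 / 1024 / 1024 / 1024 ))
```

Larger block sizes raise the ceiling proportionally, at the cost of wasting more space on small objects.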
An example configuration line I use for COSS is as follows:
cache_dir coss /sdd/squidcache/mycossd 50000 max-size=1000000 maxfullbufs=4 membufs=20 block-size=4096
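To get past the per-cache size limit on a single large drive, you can simply list more than one COSS cache_dir on it; something along these lines (the paths here are hypothetical):

```
cache_dir coss /sdd/squidcache/coss1 50000 max-size=1000000 block-size=4096
cache_dir coss /sdd/squidcache/coss2 50000 max-size=1000000 block-size=4096
```
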
Two more important things. First, COSS doesn’t use an actual cache directory; it uses a cache file. This is important because the cache swap log would normally be stored inside the cache directory. To solve this, I have a line like this:
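(The original line was lost in formatting; what Squid 2.x expects here is a cache_swap_log directive pointing somewhere outside the cache file. The path below is only an example.)

```
# keep the swap log outside the COSS cache file
cache_swap_log /var/spool/squid/coss-swap.state
```
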
Secondly, COSS requires large-file support in Squid, which is enabled at compile time. When you run ./configure while compiling Squid, here are the extra options you need:
./configure --enable-storeio=coss --with-large-files
If you want to support other file systems, use a line like this:
./configure --enable-storeio=diskd,aufs,ufs,coss --with-large-files
Although using COSS will give a big performance boost, it is also helpful to optimize your Linux file system.
There are two basic things you can do to increase performance on your Linux system. One is to disable journaling (using the ext2 file system instead of ext3 or a more exotic file system). The other is to stop the file system from updating the last-accessed times on your files.
Here is a link to instructions on how to disable journaling to convert ext3 to ext2:
It should be noted that the last step there, deleting .journal, has been unnecessary in my experience. It is also important to note that for these changes to work, you also need to edit the /etc/fstab file and change the mount instructions to “ext2” instead of “ext3”. That file is also where you can disable access time updates. Your fstab file might have a line like this for your main disk drive:
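Since the linked instructions may not survive, the conversion itself boils down to a couple of commands. This is a sketch: /dev/sdb1 and /sdb stand in for your own device and mount point, and the file system must be unmounted first.

```shell
# unmount, then strip the ext3 journal to get ext2 back
umount /sdb
tune2fs -O ^has_journal /dev/sdb1
# force a full check so the file system is left consistent
e2fsck -f /dev/sdb1
```
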
/dev/VolGroup00/LogVol00 / ext3 defaults,usrquota 1 1
On my systems, I also have some extra lines for my extra drives:
/dev/sdb1 /sdb ext3 defaults 1 2
/dev/sdc1 /sdc ext3 defaults 1 2
/dev/sdd1 /sdd ext3 defaults 1 2
After you remove journaling, edit those lines that say ext3 to say ext2. You can also add the noatime option to disable access time updates:
/dev/sdb1 /sdb ext2 defaults,noatime 1 2
/dev/sdc1 /sdc ext2 defaults,noatime 1 2
/dev/sdd1 /sdd ext2 defaults,noatime 1 2
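Editing fstab alone changes nothing until the file systems are mounted again, and a type change from ext3 to ext2 needs a full unmount and mount rather than a remount. For example (mount point illustrative):

```shell
# re-mount so the new type and noatime option from /etc/fstab take effect
umount /sdb
mount /sdb
```
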
Between those two changes you will see a tangible increase in hard disk performance with Squid, regardless of which Squid file system you choose. If you don’t want to disable journaling or access times on your primary hard drive partition, you should consider creating a partition exclusively for the Squid cache. For various file system optimization reasons, you should make your partition 20-30% bigger than the amount of disk space you actually intend to use. At the very least, leave 10% of the partition free.
Although this article is rather rough and could do with better formatting and editing, I hope the information contained in it has been useful for your caching squid server.