I’ve been saying all along that if we want our children to be educated, offering internet access without filters is the way to do it. Those who have internet access at home, and who can therefore reach such “time wasting” websites as MySpace, Facebook, and others, are better equipped to be successful citizens in the digital age. Now it seems a study from the University of Minnesota agrees with me.

Quote:

“What we found was that students using social networking sites are actually practicing the kinds of 21st century skills we want them to develop to be successful today,”

“Students are developing a positive attitude towards using technology systems, editing and customizing content and thinking about online design and layout,” Greenhow continued.

“They’re also sharing creative original work like poetry and film and practicing safe and responsible use of information and technology.”

Original Story

With the recent blockage of youtube.com, which has thus far lasted nearly two weeks, many residents of and visitors to Turkey have been unable to access this very popular site. Thankfully, administrators at freeproxies.org were able to handle the massive influx of Turkish visitors looking to access youtube.com any way that they could.

In just one short week, access to youtube.com from Turkish visitors has more than doubled our total traffic and more than tripled our total bandwidth usage. Whereas youtube support typically peaks our usage at about 150 megabits per second, in recent days it has been using over 800 megabits per second. Luckily our servers and our service provider have been up to the task of ramping up our service very quickly. This, combined with spare capacity we had on hand to deal with such an event, has led to the site running more or less as expected, despite serving over 25 youtube videos per second during peak hours.

As others have reported, censorship on the internet is not possible in the same way it had been with older forms of media. Thanks to proxy sites like beatfiltering.com and vtunnel.com, and proxy lists like freeproxies.org and proxy.org, web visitors can quickly circumvent any attempted blockage with ease. With Google having recently removed the videos that caused Turkey to block youtube in the first place, it is expected that access to the site will be restored soon. In the likely case that Turkey or another country decides to block access to this or another site again, freeproxies will be here to help those affected.

Because SSL websites are very hard to block, SSL Access to Vtunnel.com has always been popular, and now makes up about 50% of traffic to that site.

In case Vtunnel.com is blocked anyway, however, we now offer SSL access to several other web based proxies that we host:

https://www.btunnel.com
https://www.ctunnel.com
https://www.dtunnel.com
https://www.vtunnel.com
https://www.ztunnel.com
https://www.polysolve.com
https://www.beatfiltering.com

Enjoy!

There were no weird problems when enabling javascript support, so I’ve decided to extend this to all the proxies hosted by freeproxies.org. Enjoy!

It’s been a long time coming, but, now Vtunnel.com supports javascript.

As some of you may know, I’ve been at the forefront of the proxy industry, starting with a base of CGI Proxy, and working from there to add advanced features that improve the user experience. Unfortunately, some of these features have been at odds with implementing the javascript support that comes ready to go in CGI Proxy.

I have now come up with a solution that allows javascript to work, so long as html obfuscation is disabled. So, now you can enable javascripts on websites like myspace that benefit greatly from that, and, when javascripts are disabled, still enjoy the enhanced unblock protection allowed by html obfuscation.

If all goes well with javascript support in vtunnel, I will enable support on all the other proxies I host as well. Enjoy!

There were some issues with meta refresh encoding that were making some sites, such as gmail and ebuddy, not work. This has been resolved, and many more sites that have not worked recently should work now.

Please let me know if you have any issues with getting booted to the homepage during a browsing session.

Previously I wrote an article on setting up squid, and after that wrote a few things about the various disk storage methods squid makes available. I also touched upon how squid was not always reliable, and how I made some scripts to deal with that. I have since refined these scripts, and now enjoy very few squid related problems leading to downtime. In this article I share these scripts with you. For disk storage, I said that COSS was the clear winner, and though this is generally true, there are a few more things to keep in mind.

COSS is great if you need to cache a large number of small files that do not exceed a certain total size. There are a few reasons for this. First of all, COSS can only access a certain number of total objects per storage location. In order to have a large cache size, you need to increase the maximum allowed size of each object, and increase the size of your stripes. This necessarily increases ram usage. There is also a hard limit: you cannot really have a COSS partition bigger than about 100gb (I forget the exact value).

Furthermore, when you start or restart squid, you must rebuild all your cache directories. For file storage methods of AUFS or UFS, this happens reasonably fast. For DISKD and COSS, it does not. For COSS in particular, to rebuild the cache information, it has to read the entirety of the storage space allotted to it. If you have slow disks or you have a large amount of cache space used, this will take an unreasonable length of time. For these reasons, I typically allot only 50gb for COSS on each drive in my system.

The reason COSS is so fast is that everything operates upon a single file. There is no file system overhead from creating or deleting each file that is being stored. Turning off journaling and last accessed time can reduce these overheads in the other file system methods, but COSS is still the winner here. The smaller and more numerous your files, the more overhead the non-COSS storage methods incur.

Conversely, if you have a smaller number of large files, the advantages of using COSS are minimized. Additionally, the inflexibility of COSS for files of widely varying or large size is a big disadvantage in this case. In the case that you are commonly caching files of 1 megabyte or greater, or you want to use hundreds of gigabytes on each disk for squid caching, AUFS with journaling and last-accessed-times turned off is your best bet.

Another issue with rebuilding the squid cache comes up during log file rotation. Normally, you can continue to use squid even while it is rebuilding its cache directories, albeit with somewhat slower performance. If you have a script rotate your squid logs (as you should), and the squid cache is rebuilding when the logs are rotated, squid will not accept any more connections until it has finished rebuilding the storage. This is a big problem if your squid needs to rebuild periodically (due to squid crashing or restarting, or even just a server reboot). Because it can take 20 minutes or so to rebuild a 50gb cache, the chance of the log rotation overlapping with a rebuild is fairly high.

For these reasons, I have created a script that, when rotating your squid logs, will check to see if a rebuild is in progress before rotating the logs. If a rebuild is in progress, it will not rotate the logs.
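To give an idea of how such a check can work, here is a minimal sketch. It is not the rotatesquid script itself. It assumes squid writes a line containing "Rebuilding storage in" to cache.log when a rebuild starts and "Finished rebuilding storage" when it ends; check those strings against your own cache.log before relying on them. The function name is made up for this example.

```shell
#!/bin/bash
# Sketch only: decide whether a squid store rebuild is still running by
# comparing the last "start" and last "finish" messages in cache.log.
# The message strings are assumptions; verify them in your own log.

rebuild_in_progress() {
    # Exit status 0 if the most recent start message comes after the most
    # recent finish message, non-zero otherwise (or if the log is unreadable).
    awk '/Rebuilding storage in/       { s = NR }
         /Finished rebuilding storage/ { f = NR }
         END { exit !(s + 0 > f + 0) }' "$1"
}

# Usage: only rotate when no rebuild is running.
# if rebuild_in_progress /home/squid/var/logs/cache.log; then
#     echo "rebuild in progress, skipping rotation"
# else
#     /home/squid/sbin/squid -k rotate
# fi
```

If the start message has already been rotated out of cache.log, the function reports that no rebuild is running, so this is best treated as a first guard rather than a guarantee.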

A further problem I ran into is that squid will not rotate the logs if it cannot find its PID file. For some reason this file will sometimes get deleted. Therefore, as part of rotating the logs, I have created a script that will rebuild the PID file.

A copy of the new rotatesquid script is available here: http://pastebin.ca/836062

Obviously you will have to change the directory referenced if you have your squid logs somewhere other than /home/squid/var/logs

This much improved squid log rotation script will save you from losing logfiles. In our previous method, we had a script that would, if it detected squid was not running, delete the last logfile (because a common squid crash was having a log file over 2gb), and then restart squid. By rebuilding the PID file, we are able to successfully rotate the squid logs all of the time, reducing crashes and logfile deletions. Furthermore, COSS requires you build squid with the ability to handle greater than 2gb file sizes. This means that if the PID file was missing, the log file could eventually reach 30gb or more. This is definitely not something we want.

Also, it occurred to me that deleting the logfile if squid had crashed was not the best way to deal with things. I have changed the ‘squidup’ script to now rename the active logfile instead of deleting it, as part of the process of bringing up squid when it is down.

The new version of squidup is here: http://pastebin.ca/836071

This particular script makes sure both apache and squid are running. If squid is not running, it stops apache, renames the logfile, deletes the logfile (if renaming it failed), starts squid, and starts apache. If your squid logs are somewhere other than /home/squid/var/logs, then change the script accordingly.
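The rename-then-delete step can be sketched like this. It is an illustration of the idea, not an excerpt from squidup; the function name and the ".crashed.<timestamp>" suffix are made up, and which logfile you pass in is up to you.

```shell
#!/bin/bash
# Sketch only: move a possibly oversized log out of the way before starting
# squid again, and delete it only if the rename fails.
# The function name and the ".crashed.<timestamp>" suffix are hypothetical.

set_aside_log() {
    local log="$1"
    [ -e "$log" ] || return 0            # nothing to set aside
    mv "$log" "$log.crashed.$(date +%Y%m%d%H%M%S)" 2>/dev/null || rm -f "$log"
}

# Usage (access.log is an assumed file name inside the article's log directory):
# set_aside_log /home/squid/var/logs/access.log
```

Keeping the old file means you can still look at what squid was doing before it went down, at the cost of having to clean up the renamed files yourself.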

Finally, one failure condition that had plagued me for a while was a steady loss of file descriptors. On some of my servers, over time, file descriptors will leak. Once file descriptors reach a critically low level, squid will become unresponsive and this may even take down the entire server. I wrote a script that will deal with this issue by checking to see how many squid file descriptors are available, and, if squid has not been restarted too recently, and the descriptors are too low, it will restart squid.

In order for this script to work you need a couple things. First of all, you will need snmp and snmp-tools installed on your server. You will also need squid set up to listen to snmp requests. Finally, you need to create a file with world read/write permissions so that the checking script can check and update the last time squid was restarted.

So here we go:

Installing snmp:

yum install net-snmp
yum install net-snmp-utils

Configuring SNMP:

mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.huge
nano /etc/snmp/snmpd.conf

Insert the following (changing the password to whatever you want)

rocommunity password
proxy -v 1 -c password localhost:3401 .1.3.6.1.4.1.3495.1

Start snmp and configure it to run as a service:

service snmpd start
chkconfig --level 2345 snmpd on

Squid needs to be configured for snmp as well, and Squid must be compiled with the --enable-snmp switch.

In your squid.conf file, include the following:

acl snmpcommunity snmp_community password
snmp_port 3401
snmp_access allow snmpcommunity localhost
snmp_access deny all

If your firewall is for some reason blocking snmp, you could resolve that with the following:

mv /etc/sysconfig/iptables /etc/sysconfig/iptables.save
iptables -F

Finally, you need to create the file “lastrestart” in the same directory as the checking script and chmod it to 666.

Now you should be ready to run the script. The checking script, which I call “descriptorcheck” is here: http://pastebin.ca/836083

For descriptorcheck to restart squid, you need a script located at /root/restart with the following:

killall squid
sleep 15
killall squid
sleep 1
/home/funky/squidup
sleep 1
/usr/local/spri/spri -v

The above script assumes you have the “squidup” script located at /home/funky/squidup

In my copy of the descriptorcheck script, I set the limit for descriptors as 4000, and the earliest it can restart squid is 20 minutes after the last time it restarted it. In the case of a date rollover, where the last restart time is in the future of the current time, we assume that the last restart was more than 20 minutes ago. If you want to change the lower limit for descriptors or the maximum restart frequency, you can do that in the script.
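The decision itself boils down to a few comparisons. Here is a minimal sketch of that logic, not the descriptorcheck script. It assumes the free descriptor count has already been fetched (for example over snmp) and that times are stored as epoch seconds, which may not be how the real script stores them; the function and variable names are made up.

```shell
#!/bin/bash
# Sketch only: the restart decision described above, with all inputs passed in.
# should_restart AVAILABLE_FDS NOW LAST_RESTART   (times in epoch seconds)
# Returns 0 (restart squid) or 1 (leave squid alone).

FD_LIMIT=4000          # restart when fewer descriptors than this are free
MIN_INTERVAL=1200      # 20 minutes, in seconds

should_restart() {
    local avail="$1" now="$2" last="$3" elapsed
    # Plenty of descriptors left: nothing to do.
    [ "$avail" -lt "$FD_LIMIT" ] || return 1
    if [ "$last" -gt "$now" ]; then
        # Recorded restart is in the future (date rollover):
        # assume the last restart was long enough ago.
        return 0
    fi
    elapsed=$(( now - last ))
    [ "$elapsed" -ge "$MIN_INTERVAL" ]
}

# Usage:
# if should_restart "$free_fds" "$(date +%s)" "$(cat lastrestart)"; then
#     date +%s > lastrestart
#     /root/restart
# fi
```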

Points to remember:

AUFS is good for large but few files, or for very large cache directories.

COSS is good for many small files, and cache directories of no more than 100gb.

Rotating your squid logs is important, but make sure not to rotate them if the cache storage is rebuilding.

If you run out of descriptors you are headed for trouble. Sometimes descriptors leak and the only way to fix this is to restart squid. Having a script do this for you automatically will save you downtime and hassle.

All scripts that run should have permissions (chmod) of 755.

Enjoy!

Every now and then, I come across a person needing help with DNS resolution for their proxy. More often than not, they don’t know that DNS is their problem, and it shows up as generally poor performance, or as an inability to load proxied pages.

I have since resolved this on my servers, but I thought it would be useful to share that information with you. The idea behind this post came from one (of many similar) threads on the proxy.org forum: http://proxy.org/forum/1179669925.html

I run a few proxies and I see that I have a problem with the loading time. It takes around 25seconds just to load google.

I have my apache optimized and also noticed that my other website and non-proxy surfing are fast.

But the proxy surfing is very slow. The server load is fairly low at around 0.50, so I don’t think this could be due to excessive load.

In a situation like this, you should look to your DNS resolvers first.

The key points of interest are:

  1. When site finally starts loading, it does so quickly.
  2. Your site’s homepage loads very quickly, but “in proxy” pages do not.
  3. Server is not under heavy load but goes very slow.
  4. You’ve taken other appropriate steps to optimize your server.

The trick here is to look at your /etc/resolv.conf file.

This file has a list of the DNS resolvers your server will use, in order. In the situation above, chances are your first resolver is bad, and all requests have to time out on that resolver before it tries another one. Eventually it finds a working resolver, and you’re in business, but meanwhile everything goes slowly.

Your resolv.conf file should look like this:

nameserver 127.0.0.1
nameserver 66.90.68.25
nameserver 66.90.68.26
nameserver 66.90.68.15
nameserver 66.90.68.16

If there’s other stuff in there, it may be causing you problems. If you’re getting a “parse error” in your resolv.conf when you do the nslookup, see the original thread for a way to fix that.

There is a command you can run from the linux command line to test each resolver:

nslookup www.google.com 127.0.0.1

In this case, we want to see what the dns server located at “127.0.0.1” thinks the address for www.google.com is. Replace 127.0.0.1 with each ip address listed in your resolv.conf. Try a few different hostnames to resolve. Note whether you get an error message, or whether the resolution takes a while. If either happens, you should remove that resolver from your list.
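If you have more than a couple of resolvers, the same test can be looped. The following is a rough sketch, with made-up function names. It assumes your nslookup returns a non-zero exit status on failure, which is not true of every version, so treat the timing as the more reliable signal.

```shell
#!/bin/bash
# Sketch only: list the resolvers in a resolv.conf-style file and time a
# lookup against each one. Function names are hypothetical.

list_nameservers() {
    # Print the address from every "nameserver <address>" line.
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

check_resolvers() {
    local host="${1:-www.google.com}" ns start
    for ns in $(list_nameservers "${2:-/etc/resolv.conf}"); do
        start=$(date +%s)
        # Assumption: nslookup exits non-zero when the lookup fails.
        if nslookup "$host" "$ns" >/dev/null 2>&1; then
            echo "$ns ok in $(( $(date +%s) - start ))s"
        else
            echo "$ns FAILED after $(( $(date +%s) - start ))s"
        fi
    done
}

# Usage: check_resolvers www.google.com /etc/resolv.conf
```

Any resolver that fails, or that takes more than a second or two, is a candidate for removal from the list.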

Ideally, for maximum performance, you should run a local dns resolver on your proxy server. This will make your server immune to issues with poorly performing upstream resolvers. In cPanel, you’d just use the “enable nameserver” option, and then make sure the first entry in your resolv.conf file is “127.0.0.1”.

In my last article, I explained how to use squid as part of a high performance CGI Proxy hosting platform. At one point in the article, I suggested you use the UFS file system because I hadn’t resolved issues with the other file systems available in squid. I’ve spent a few weeks tweaking disk performance in squid, and have some useful suggestions now. One thing that gave me helpful insight was the book Squid: The Definitive Guide.

What I found is that I was having an i/o bottleneck on my servers. Many of my servers have the following disk setup:

2x 500gb samsung sata
2x 500gb western digital sata

The samsung hard drives just weren’t fast enough, and though my western digitals were faster, they were still causing issues. Though I could tend to ignore this issue on my lower end servers, on my new core2quad intel cpus, I would hit an obvious disk bottleneck at about 2/3 of what the server could handle cpu wise. I didn’t want to give up caching, so I had to investigate a few things.

Before I go further, I should mention that I have a couple of servers now with 750gb western digital hard drives, and those drives run much much faster. Though you’ll still want to tweak your disk systems, if I could do it all over again, I would put 2 x 750gb WD drives in my servers instead of 4x 500gb drives. Performance would easily be double.

First of all, I found that reducing the disk cache size freed up a lot of ram and also reduced disk io substantially. I had previously used 100gb of each 500gb disk (400gb total) for caching. Squid has to store a bit of information about each cached item in ram, and with this setup, that amounted to 2gb. Luckily I had 8gb ram in each server, but this is still more usage than I would like. Reducing cache size means fewer files to shuffle into and out of the cache, and it increases the likelihood that a disk item is already cached in memory by the file system. A 20gb squid cache on a server whose operating system has allocated 4-6gb for file caching will see very little io wait.

Since reducing the disk cache substantially wasn’t really my goal, I decided to dig deeper. The book I mentioned earlier helped here. It covered three major things. First, it explained how the available squid file systems work. Second, it gave some tips on optimizing linux file system options. Third and most important, it showed performance benchmarks for these different options.

First I’ll explain the four squid file systems and some of my experience with them.

1) UFS. This is the tried and true file system for squid. It is also the slowest. It exists within the main squid process, so when I/O operations are performed, no other work may happen. It also means even if you have multiple hard drives, only one may be accessed at a time. This is obviously bad, but surprisingly, it can sometimes work to your benefit if your disk is especially slow servicing multiple simultaneous requests, as is the case with many ata drives and some poorly designed raid 5 configurations.

2) AUFS. Instead of existing within the main squid thread, it spawns extra threads to take care of disk accesses. The A stands for Asynchronous UFS. On some operating systems, the functionality to make this possible does not exist or is not enabled by default. In my fedora 6 / 7 installs, this has not been an issue. Because disk access happens in separate threads, squid can still handle requests and other work while it is waiting on disk accesses. This makes cache misses process faster in particular. If on your server, squid’s single thread CPU usage is the bottleneck, then you can squeeze out a little more this way on a multi core CPU.

If you decide to use AUFS you should tweak the number of threads used for disk access when you compile squid. By default, squid will spawn quite a few threads for AUFS, which in some cases may degrade performance. The reason for this is with too many threads, you are in effect telling the hard drive to do a bunch of stuff at once. Linux file systems mitigate this problem somewhat by ordering disk accesses properly. However, ATA drives and many SATA drives could more effectively handle requests in a more serial fashion. It should be mentioned, SCSI drives and to a lesser extent SATA drives with NCQ do not suffer as acutely from this problem.

3) DiskD. DiskD is basically the same as AUFS, except that instead of threads it spawns a separate process for disk access, one process per cache directory. This is more compatible on some operating systems, but does not perform quite as well. In particular, I find rebuilding the squid cache takes forever with DiskD, whereas AUFS does not have this problem. Squid must rebuild the disk cache whenever it starts or restarts, so this can cause really poor performance for an extended period of time. For this reason, and because performance of AUFS is better, I cannot recommend DiskD.

4) COSS. COSS is the shining star here. It is a little more complicated than the other disk systems, and harder to understand, but definitely worth it. Basically, instead of having a big batch of files, each one requiring separate file i/o operations, it throws your entire cache into a single file that it navigates in a circular fashion. If you want to know more about this file system, I suggest reading the book I mentioned or looking up more online. Because it does not have to delete files, or constantly open and close file handles, it is miles ahead of any of the other file systems. I have found that COSS gives me half the disk i/o wait of the best tuned alternative. The worst setup I had was many times slower than COSS.

The downside is that you have to do things just a bit differently for COSS. First, the default maximum file size is 1 megabyte. This can be changed at compile time for squid, or possibly at runtime, but I haven’t gotten that far yet. Also, COSS has to create one big file ahead of time. Therefore, “creating the cache directories” will take a long time. Don’t stop this process or your cache won’t be the right size. Also, importantly, COSS only has a 24 bit counter for blocks inside of the cache file, so the size of your caches is limited. You can tweak this by changing the block size. I use a block size of 4096 bytes, which allows somewhere around 60gb of file cache. You can always use multiple COSS cache_dirs, even on the same hard drive, to work around this issue.
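To see where that ceiling comes from, here is the arithmetic as a small sketch. It assumes the limit is simply 2^24 blocks times the block size, with nothing else eating into it; the function name is made up.

```shell
#!/bin/bash
# Sketch only: upper bound on one COSS cache file, assuming the limit is
# simply (2^24 blocks) x (block size in bytes), as described above.

coss_max_gib() {
    local block_size="$1"
    # Result is in GiB (1024^3 bytes), rounded down.
    echo $(( (1 << 24) * block_size / 1024 / 1024 / 1024 ))
}

coss_max_gib 4096    # prints 64
coss_max_gib 2048    # prints 32
```

With block-size=4096 the bound works out to 64 GiB, which is in line with the "around 60gb" figure above, and it explains why a 50gb cache_dir fits comfortably at that block size.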

An example configuration line I use for COSS is as follows:

cache_dir coss /sdd/squidcache/mycossd 50000 max-size=1000000 maxfullbufs=4 membufs=20 block-size=4096

Two more important things. First, COSS doesn’t use an actual cache directory, it uses a cache file. This is important because the cache swap log would normally be stored inside the cache directory. To solve this, I have a line like this:

cache_swap_log /home/squid/var/cache_swap_log

Secondly, COSS requires large file support in Squid. This can be put in at compile time. When you run ./configure when compiling squid, here are the extra options you need:

./configure --enable-storeio=coss --with-large-files

If you want to support other file systems, use a line like this:

./configure --enable-storeio=diskd,aufs,ufs,coss --with-large-files

Although using COSS will give a big performance boost, it is also helpful to optimize your linux file system.

There are two basic things you can do to increase performance on your linux system. One is to disable journaling (using the ext2 file system instead of ext3 or a more exotic file system). The other is to disable the file system updating the last-accessed-times on your files.

Here is a link to instructions on how to disable journaling to convert ext3 to ext2:

http://www.redhat.com/docs/manuals/linux/RHL-7.3-Manual/ref-guide/s1-filesystem-ext2-revert.html

It should be noted that the last line, which deletes .journal, has been unnecessary in my experience. It is also important to note that to make these changes work, you also need to edit the /etc/fstab file, and change the mount instructions to “ext2” instead of “ext3”. That file is also where you can disable access time updates. Your fstab file might have a line like this for your main disk drive:

/dev/VolGroup00/LogVol00 / ext3 defaults,usrquota 1 1

On my systems, I also have some extra lines for my extra drives:

/dev/sdb1 /sdb ext3 defaults 1 2
/dev/sdc1 /sdc ext3 defaults 1 2
/dev/sdd1 /sdd ext3 defaults 1 2

After you remove journaling, you need to edit those lines that say ext3 to say ext2. You can also add an option like so to remove access time updates:

/dev/sdb1 /sdb ext2 defaults,noatime 1 2
/dev/sdc1 /sdc ext2 defaults,noatime 1 2
/dev/sdd1 /sdd ext2 defaults,noatime 1 2

Between those two changes you will see a tangible increase in hard disk performance with squid, regardless of which squid file system you choose. If you don’t want to disable journaling or access times on your primary hard drive partition, you should consider creating a partition exclusively for the squid cache. For various file system optimization reasons, you should make your partition 20-30% bigger than the amount of disk space you actually intend to use. At the very least, leave 10% of the partition free.
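If you have several cache drives, the fstab edit can be scripted. What follows is only a sketch with a made-up function name. It reads fstab text on stdin and writes the changed text to stdout, so nothing is touched until you have looked at the result, and it does not remove the journal for you: that step still has to be done first.

```shell
#!/bin/bash
# Sketch only: rewrite fstab lines for the given mount points from ext3 to
# ext2 and add noatime. Reads fstab text on stdin and writes the result to
# stdout; it never edits /etc/fstab itself. The function name is hypothetical.

fstab_ext2_noatime() {
    # Arguments: mount points to convert, e.g. /sdb /sdc /sdd
    awk -v mounts="$*" '
        BEGIN { n = split(mounts, m, " "); for (i = 1; i <= n; i++) want[m[i]] = 1 }
        # Only touch uncommented ext3 lines for the requested mount points.
        ($2 in want) && $3 == "ext3" && $0 !~ /^[ \t]*#/ {
            $3 = "ext2"
            if ($4 !~ /(^|,)noatime(,|$)/) $4 = $4 ",noatime"
        }
        { print }
    '
}

# Usage:
# fstab_ext2_noatime /sdb /sdc /sdd < /etc/fstab > /tmp/fstab.new
# diff /etc/fstab /tmp/fstab.new    # review before copying it into place
```

Note that changed lines come out with single spaces between fields, which mount does not mind.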

Although this article is rather rough and could do with better formatting and editing, I hope the information contained in it has been useful for your caching squid server.

In a recent article, I explained the Benefits of Squid Caching to Accelerate CGI Proxy. In that article, you saw that the particulars of CGI Proxy lend themselves well to acceleration using squid, primarily to save on ram usage, but also with some tangible cpu usage reductions. Today’s article follows up on that information to actually explain how to get this done. With squid, you’re in for a wild ride, but, with all that this flexible tool has to offer, it will open up a world of new possibilities in scalable and high performance web solutions.

I’ve found that one of the hardest things about squid is making sense of the documentation. Most of it is out of date or incomplete, and squid was not exactly designed with this purpose at the front of its mind. As for versions, Squid 2.6 is the latest stable version available, and departs greatly from the syntax and capabilities of 2.5 and earlier, which most of the documentation is written for. Furthermore, as 3.0 is not yet advisable for production use, 2.6 is really the best choice. 2.6 gets most of its syntax from the 3.0 branch, but does not support all of its features. If you must venture into 2.6 without knowing what you are doing, the best source of information is the 3.0 documentation. The best source of knowhow on the subject of squid is the #squid channel on irc.freenode.net. From my experience, the people in that channel are helpful and knowledgeable.

There are a few main hurdles you’ll need to get over if you want to embark on converting your cgi proxy to support squid. Many of these may seem rather basic, and the end result is not that impressive from a code standpoint, but I have spent nearly a year working out solutions to these issues. Running proxies myself, I questioned whether it was prudent to give away this competitive advantage that I have, especially when it took so much work to get here. I guess I’m saying, be thankful for what I’m giving you. There’s no autoinstall script, so doing this will require reading, learning, experimentation and some skill, but if you’re willing to put in a few hours of effort, you will get the benefit of nearly a year of my experience with squid.

Without further ado, the tasks ahead of you are:

* Compile and Install Squid

* Configure apache to listen on an alternate port (in our examples, 82)

* Configure Squid in web accelerator mode

* Modify CGI Proxy to think that incoming connections on our alternate port (82) are really coming from standard port 80.

* Deal with those elusive friends, file descriptors (or lack thereof)

* Keep your logfile from becoming too big

* Make a script to keep squid / apache up, in the likely case something goes wrong

* Set up SPRI so that squid gets the higher priority it needs

* A few thoughts on abuse and logging

Advanced topics that I may (or may not) cover in a later article include:

* Supporting SSL sites via squid acceleration

* Using the URL_Rewriter function in squid to keep your cache namespace merged when it would normally splinter and become inefficient

* Modify CGI Proxy to support the above goal more effectively

* Explain various tweakable aspects of the squid configuration, and discuss some good potential optimizations

* Explain some clustering concepts I’ve tried with squid, and why to avoid them (I lovingly refer to this as the ‘cluster-fuck’)

Further room for improvements that I have not solved yet include:

* Avoid compile-permissions-hell when tweaking the URL_Rewriter function to be faster using inline C.

* Figure out how to get the best performance from the disk storage systems available to squid, while avoiding known issues

* Merge cache spaces across servers, and take advantage of clustering, without causing a “cluster-fuck”.

* Figure out how to avoid a severe performance degradation and ram leakage when hosting multiple proxy domains on the same server (this is a mod perl / apache problem).

So without further ado, let’s get down to the dirty work:

* Compile and install squid

Squid is a little bit tricky in that it wants its own user and group, tries to put things in weird places that don’t always make sense, and has some performance issues with the default 1024 file descriptors. To get around this, we will be configuring squid at compile time to use more descriptors, install to /home/squid, and create the correct folders and permissions.

As of this writing, the newest version of squid is 2.6 stable 14. When I said there is no easy autoinstall I will be giving you, that is only half true. This portion of installing and configuring squid is best done automatically.

From shell on your server, run the following commands:

wget http://www.squid-cache.org/Versions/v2/2.6/squid-2.6.STABLE14.tar.gz
adduser squid
tar --ungzip -xf squid-2.6.STABLE14.tar.gz
cd squid-2.6.STABLE14
ulimit -HSn 8192
./configure --prefix=/home/squid --enable-carp --enable-storeio=diskd,ufs --enable-ssl --enable-snmp
make
make install
chmod 777 /home/squid/var/
chmod 777 /home/squid/var/logs
/home/squid/sbin/squid -z

We can’t run it yet, but now, we have squid installed, ready to be configured. We downloaded and unzipped squid first of all, then added the user (squid) that we will be telling squid to use later. We then set the system maximum file descriptors to 8192 (plenty for all intents and purposes), immediately prior to compiling squid. If you don’t do this at this time, you cannot increase the file descriptors later. We then configured squid to install everything to /home/squid, and we enabled some things that you may or may not end up using. We then set the permissions on the logging and caching section so that squid won’t complain and die, and we ran the squid option to create the cache directories. All in 11 lines of shell.

Now that squid is installed, let’s work on the next problem:

* Configure apache to listen on an alternate port (in our examples, 82)

The file you will be configuring often, that you may have been insulated from by your favorite control panel, is httpd.conf, usually located at /usr/local/apache/conf/httpd.conf. Your favorite command line editor will help you here. I use The Joe Editor. Nano is also a good editor, and comes preinstalled in most linux distros.

In any case, open up your httpd.conf. There are a few things to look out for, that may tell your system to open port 80 when it shouldn’t.

Listen directives:

Listen 80
Listen 12.34.56.78:80

<IfDefine SSL>
Listen 80
Listen 443
</IfDefine>

Search your httpd.conf for these kinds of entries. If you have a listen with an ip, but no port, that’s fine (though I don’t know if that’s considered valid in the first place). If you have a listen with just an 80 after it, change it to 82.

Port directive:

Port 80

This is the main port that apache listens on. With this port set, it will steal port 80 on every ip. Change this to 82.

Virtualhosts:

NameVirtualHost 67.159.44.88:80

<VirtualHost 67.159.44.88:80>
ServerName fdc4.freeproxies.org
DocumentRoot /usr/local/apache/htdocs
</VirtualHost>

Unless you’re running SSL, your virtualhosts don’t need to be attached to a particular port. In the above examples, just remove the 80 entirely, don’t bother replacing it with 82. Make sure you don’t have duplicate NameVirtualHost entries with the same ip and port (or lack of port). Sometimes cpanel will make more than one of these entries for no particular reason.
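The edits above are mechanical enough to sketch with sed. This is only an illustration with a made-up function name; it reads httpd.conf text on stdin and writes the result to stdout, so you can diff it before putting it in place. One assumption beyond what I wrote above: a "Listen ip:80" line is also moved to port 82.

```shell
#!/bin/bash
# Sketch only: apply the port changes described above to httpd.conf text.
# Reads on stdin, writes to stdout; the function name is hypothetical.
# Assumption: "Listen <ip>:80" should move to port 82 as well.

move_apache_off_port_80() {
    sed -E \
        -e 's/^([[:space:]]*Listen[[:space:]]+([0-9.]+:)?)80[[:space:]]*$/\182/' \
        -e 's/^([[:space:]]*Port[[:space:]]+)80[[:space:]]*$/\182/' \
        -e 's/^([[:space:]]*NameVirtualHost[[:space:]]+[0-9.]+):80[[:space:]]*$/\1/' \
        -e 's/^([[:space:]]*<VirtualHost[[:space:]]+[0-9.]+):80>/\1>/'
}

# Usage:
# move_apache_off_port_80 < /usr/local/apache/conf/httpd.conf > /tmp/httpd.conf.new
# diff /usr/local/apache/conf/httpd.conf /tmp/httpd.conf.new
```

Lines for other ports, such as Listen 443, are left alone. Duplicate NameVirtualHost entries still have to be cleaned up by hand.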

That’s all you need to do to force apache onto the appropriate port, but I suspect it sometimes might steal port 80 anyway. I have a workaround for that, which we will discuss later.
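Before restarting apache, it can help to list every line that still mentions port 80. Here is a minimal sketch of how I might do that with grep. The function name find_port80 is my own invention, and the default path is just the usual httpd.conf location mentioned above:

```shell
#!/bin/bash
# find_port80 (hypothetical helper): print, with line numbers, every
# uncommented Listen, Port, NameVirtualHost, or <VirtualHost> line that
# still ends in port 80. Lines ending in 82, 443, 8080, etc. are ignored.
find_port80() {
    local conf="${1:-/usr/local/apache/conf/httpd.conf}"
    grep -nE '^[[:space:]]*(Listen|Port|NameVirtualHost|<VirtualHost)[[:space:]]+([^[:space:]#]*:)?80>?[[:space:]]*$' "$conf"
}
```

Run it as find_port80 with no arguments, or pass a different config file as the first argument. If it prints nothing, none of those directives still point at port 80.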

Moving on…

It is not entirely straightforward from the docs to see how to configure squid for this purpose. What follows is what I put near the top of my squid.conf configuration file. Keep in mind I’m not 100% sure all of this is required, it’s just what I happen to have:

http_port __ipaddress__:80 vhost
cache_peer __ipaddress__ parent 82 0 no-query allow-miss originserver
acl notssl myport 80
cache_peer_access __ipaddress__ allow notssl
acl porteighty port 80
acl porteighty port 82
acl alldest dst 0.0.0.0/0
acl acceld dst __ipaddress__/32
never_direct deny acceld

Farther down in the file, you’ll need to edit your security permissions. You don’t want your copy of squid acting as a spam relay! Basically, I took what rules they already had, commented out some, and added a couple. By virtue of using cache_peer with allow-miss and originserver, along with never_direct deny, you get rid of a lot of security problems anyway, because any request should be force fed to your apache server.

http_access allow manager localhost
http_access deny manager
#http_access allow manager
# Deny requests to unknown ports
http_access deny !Safe_ports
# Deny CONNECT to other than SSL ports
#http_access deny CONNECT !SSL_ports
http_access deny CONNECT
http_access allow all
http_access allow acceld
http_access deny alldest

You should also configure your cache directory. I use the UFS file system, because I had some stability issues with diskd. Here’s the line I’ve used on a vps I have. It gives me a 20gb cache. You can modify this to suit your needs:

cache_dir ufs /home/squid/var/cache 20000 16 256

If you have multiple drives, or want to put the cache somewhere other than where you initially installed, you’ll need to ‘chown squid’ the directory you’d like to use to do the caching. After creating and chowning the directory, you need to run squid with the -z option so it creates the necessary directory structure. i.e. /home/squid/sbin/squid -z

Keep in mind that having too large a cache will increase io-wait on your server. On many of my servers, I have four 500gb drives for caching, and I find it works well to use 100gb of space for cache on each. This gives about a 40% cache hit rate when used in conjunction with various cgi proxy modifications that help merge the cache namespace.
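For a multi-drive setup like that, you need one cache_dir line per drive. A small sketch that prints those lines for you follows. The function name cache_dir_lines and the mount points in the usage note are made up for illustration; the size is in megabytes, the same unit as the 20000 in the line above:

```shell
#!/bin/bash
# cache_dir_lines (hypothetical helper): print one ufs cache_dir line per
# directory, each with the same size in megabytes and the same 16/256
# directory layout used earlier in this post.
cache_dir_lines() {
    local size_mb="$1"
    shift
    local d
    for d in "$@"; do
        echo "cache_dir ufs $d $size_mb 16 256"
    done
}
```

For example, cache_dir_lines 100000 /cache1 /cache2 /cache3 /cache4 prints four lines you can paste into squid.conf, assuming those are where your drives are mounted. Each directory still needs to be chowned to squid and initialized with squid -z as described above.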

There are a lot of things you can do to tweak squid, I suppose, but that’s the basic install and it should serve you well.

* Modify CGI Proxy to not mind running on port 82:

CGI Proxy is no dummy, and it tries to figure out what port it is listening on and redirect all outgoing links to the same port. In order to make our accelerator work, we have to stop this.

Originally, you’ll have some code like this in CGI Proxy:

# If $RUNNING_ON_SSL_SERVER is '', then guess based on SERVER_PORT.
$ENV_SERVER_PORT= $ENV{SERVER_PORT} ;
$RUNNING_ON_SSL_SERVER= ($ENV_SERVER_PORT==443) if $RUNNING_ON_SSL_SERVER eq '' ;

You need to change it to this:

# If $RUNNING_ON_SSL_SERVER is '', then guess based on SERVER_PORT.
$ENV_SERVER_PORT= $ENV{SERVER_PORT} ;
if ($ENV_SERVER_PORT == 82) { $ENV_SERVER_PORT=80; }
$RUNNING_ON_SSL_SERVER= ($ENV_SERVER_PORT==443) if $RUNNING_ON_SSL_SERVER eq '' ;

That’s pretty much it.

* Deal with those elusive friends, file descriptors (or lack thereof)

Instead of being thread based like apache, squid has a single thread that uses multiple file descriptors to take care of all its communications. It needs some descriptors for each file it has open, as well as for each user connected to it and each connection it makes to the outside world. With this in mind, obviously you don’t want to run out of file descriptors.

The install script I gave you before is important because it has a line that sets the maximum file descriptors the software can use. You have to set this both at compile time and at runtime. If you compile the software with a lower (or system default) limit, it will never use the full amount until you recompile it with the higher limit. Also, if you start the software without first overriding the default, it will start the program with the lower value.

To get 8192 file descriptors, use the following command before compiling and before running squid:

ulimit -HSn 8192

Aside from the occasional bug that may cause you to run out of file descriptors (no matter how many you have), this will be plenty for any usage. If you have a hard time connecting to your server but it seems to not be heavily loaded, you may be running out of descriptors. The /home/squid/var/logs/cache.log file will have a notice of this if it happens.
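Since forgetting the ulimit before starting squid is an easy mistake to make, a quick sanity check along these lines might be worth having. This is only a sketch: the function name check_fd_limit is my own, and it only looks at the limit of the shell it runs in, which is what squid inherits when you start it from that shell:

```shell
#!/bin/bash
# check_fd_limit (hypothetical helper): succeed only if the current shell's
# soft open-file limit is at least the wanted value (8192 by default).
check_fd_limit() {
    local want="${1:-8192}"
    local have
    have=$(ulimit -n)
    if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
        echo "ok: $have file descriptors available"
    else
        echo "too low: $have file descriptors, wanted $want" >&2
        return 1
    fi
}
```

Calling check_fd_limit right after the ulimit command and before starting squid tells you whether the higher limit actually took effect.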

* Keep your logfile from becoming too big

The default installation of squid will only deal with log files 2gb in size or less. You could recompile it to allow larger files, but a logfile over 2gb is pretty useless anyway, since it will take too long to open and edit. To keep your server from crashing, you have to keep the logfile sizes below that limit.

I run a cron to rotate the squid logs every hour, which is a good way to deal with this problem. Unfortunately, squid sometimes doesn’t believe it’s running even when it is, so it won’t rotate the logs. In this case, the logs get bigger until squid eventually crashes. I know there’s a better solution out there, but I have a clunky workaround: a second cron checks whether squid is running, and if it’s not, it stops apache, deletes the current logfile, and restarts squid and apache.

Here’s the short version of the rotate squid cron (should run hourly):

/home/squid/sbin/squid -k rotate

Just put that in a text file, chmod 755 it, and add it to your crontab with a line like this:

0 * * * * /home/funky/rotatesquid
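If you want some warning before a log gets anywhere near the 2gb mark, a size check like the following could be added to a cron as well. Treat it as a sketch: the function name log_too_big and the default threshold of 1500000000 bytes are my own assumptions, not anything squid provides:

```shell
#!/bin/bash
# log_too_big (hypothetical helper): return 0 if the given file exists and
# is at least the given number of bytes, and return 1 otherwise.
log_too_big() {
    local file="$1"
    local limit="${2:-1500000000}"
    [ -f "$file" ] || return 1
    local size
    # tr strips the padding some versions of wc put around the number
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -ge "$limit" ]
}
```

For example, log_too_big /home/squid/var/logs/access.log && echo "access.log is getting large" would print a warning once the log passes the threshold.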

Now for the second cron. I call this one squidup, because it helps keep squid up and running in the couple of cases where it might die:

#!/bin/bash

# Restart apache if no httpd process is found.
this=$(ps aux | grep httpd | awk '{print $11}' | grep httpd | head -1)
if [ "$this" = "/usr/local/apache/bin/httpd" ]; then
    echo "httpd was found"
else
    echo "httpd was not found"
    /etc/init.d/httpd restart
fi

# If no process owned by the squid user is found, stop apache, delete the
# access log, raise the descriptor limit, then start squid and apache again.
thing=$(ps aux | grep squid | awk '{print $1}' | grep squid | head -1)
if [ "$thing" = "squid" ]; then
    echo "squid was found"
else
    echo "squid not found"
    /etc/init.d/httpd stop
    sleep 4
    killall httpd
    sleep 4
    killall -KILL httpd
    sleep 4
    rm /home/squid/var/logs/access.log
    ulimit -HSn 8192
    /home/squid/sbin/squid
    sleep 4
    /etc/init.d/httpd restart
fi

Again, put that in a text file, chmod 755 it, and set it to run as a cron. I run mine every two minutes:

*/2 * * * * /home/funky/squidup

* Set up SPRI so that squid gets the higher priority it needs

SPRI is a script that changes the priority of running threads automatically. It was originally available from RFX Networks, but the script has a bug that they haven’t fixed, and they apparently aren’t updating it. Marco, aka TheWird, found out what the bug was and fixed it. Here is a copy of the fixed script that you can download.

To install spri, copy the attached file to your server, then execute the following commands:

tar --ungzip -xf spri-wird.tar.gz
cd spri-0.5/
./install.sh

After this, you need to set the priorities. In your favorite text editor, edit /usr/local/spri/prios/med-high and remove the line that says “squid”. Next you need to edit /usr/local/spri/prios/rt and add a line that says “squid”. This way squid will run with higher priority. Because there are many copies of apache running, and all of them are being fed by squid, squid will lag if it is not given higher priority, and your server will not process nearly as much traffic as it could.
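If you’d rather not do that by hand on every server, the two edits can be scripted. This is a sketch under the assumption that the prios files hold one process name per line, which is what the instructions above imply. The function name promote_squid is my own:

```shell
#!/bin/bash
# promote_squid (hypothetical helper): remove the exact line "squid" from
# med-high and make sure it is listed in rt, inside the given prios directory.
promote_squid() {
    local dir="${1:-/usr/local/spri/prios}"
    [ -f "$dir/med-high" ] || return 1
    # Keep every line except the one that is exactly "squid".
    # grep exits 1 when no lines are left over, which is fine here.
    grep -vx 'squid' "$dir/med-high" > "$dir/med-high.tmp" || true
    # Write back in place so the file keeps its owner and permissions.
    cat "$dir/med-high.tmp" > "$dir/med-high"
    rm -f "$dir/med-high.tmp"
    # Append "squid" to rt only if it is not already there.
    grep -qx 'squid' "$dir/rt" 2>/dev/null || echo 'squid' >> "$dir/rt"
}
```

Running promote_squid with no arguments works on /usr/local/spri/prios, and running it twice does no harm because the rt entry is only added once.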

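Now you need to run spri as a cron. The line below includes a user field (root), so it belongs in the system crontab at /etc/crontab rather than in a per-user crontab. Note that */45 in the minute field fires at minute 0 and minute 45 of each hour, not strictly every 45 minutes (you can change this to more often if you’d like):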

*/45 * * * * root /usr/local/sbin/spri -q >> /dev/null 2>&1

I should be editing this blog entry, but, given how long everyone has waited, this should be good enough for now to get you up and running. Enjoy!
