The prosecution, in briefing me in the days before I was to testify, was apparently trying to prove things that they knew were entirely irrelevant to the case at hand. The lead prosecutor, for example, told me that the ads loaded by my site, and the ads that Yahoo Mail attempted to load while being viewed through my site, were important because “we need to show as much commerce as possible”. I can only assume he meant that he wants to convict the kid of interstate wire fraud, and that without commerce, there is no wire fraud. He particularly didn’t care that, even though my logs showed some hits to ad servers, my site is de facto not compatible with ads, and they never would have been loaded or shown on the screen. Further, it is a real stretch in the first place to claim that viewing a website with ads on it constitutes sufficient interstate commerce to allow for wire fraud. He told me, “it doesn’t matter if they loaded or not”, clearly showing that he doesn’t care whether interstate commerce actually happened, or whether the particular crimes he is prosecuting even apply here; he is only interested in reaching the maximum possible sentence.
Also disturbing: Kernell is charged with 4 counts, two of which carry 5 year maximum sentences each, and the other two, 25 year maximum sentences each. News reports say this would carry up to a 50 year maximum sentence if convicted. My fuzzy math says this would be 60 years and not 50 (2×5 plus 2×25), but what do I know? In any case, 50 years, ironically, is twice the maximum sentencing guideline for second degree murder. Second degree murder is a “level 38” federal offense, punishable by 235-293 months (19-24 years) in jail when committed by someone with little or no criminal history. The fact that the government would seek twice as much time in jail for what amounts to a prank as they could get for a cold blooded murderer is insane. That all of this would have been half as serious if he had gone out and killed someone on the street just shows what sorry shape our legal system is in.
As a result of this trial, I will be changing the logging policy for my proxy websites. Effective immediately, I will only be logging the minimum amount required by law. In the United States, this means nothing at all. For our servers in the UK (currently hosting only ktunnel.com, popular only in Turkey), we will be logging for 48 hours, as that is the relevant required logging period in that jurisdiction. Even in the UK, we will be looking into ways to log less evidence-quality information, so long as what we are logging is within our legal obligations. For the US, where almost all of our proxies are hosted, logging will only take place after-the-fact, to specifically try to log information on people who are repeatedly abusing our systems, and then, only logging what is necessary to stop a specific, repeatedly abusive user. We will no longer be proactively logging the activity of users on our US servers.
This change in policy is made possible by advancements in abuse prevention. These days, it is very rare that I get a legitimate abuse complaint, and when I do, the only needed response in the majority of cases is to disallow access to an unpopular website (usually a forum) that is being spammed or otherwise abused. By changing my logging policy, the logs will meet the (now much lower) requirement of what I need to log in order to keep my sites working properly.
As a result of this trial, and the complete lack of perspective and justice being shown by the federal government, I will be stepping up now, in an attempt to meet my moral obligations. As such, I will do whatever I legally can to protect my users, by logging as little as I am legally allowed to log while still keeping my site working properly for everyone who needs to use it. I am genuinely sorry for being an integral part of this trial, something I hope never happens again.
“What we found was that students using social networking sites are actually practicing the kinds of 21st century skills we want them to develop to be successful today,”
“Students are developing a positive attitude towards using technology systems, editing and customizing content and thinking about online design and layout,” Greenhow continued.
“They’re also sharing creative original work like poetry and film and practicing safe and responsible use of information and technology.”
Original Story
In just one short week, access to youtube.com by Turkish visitors has more than doubled our total traffic and more than tripled our total bandwidth usage. Whereas youtube support typically peaks our usage at about 150 megabits per second, in recent days it has been using over 800 megabits per second. Luckily our servers and our service provider have been up to the task of ramping up our service very quickly. This, combined with spare capacity we had on hand to deal with such an event, has led to the site running more or less as expected, despite serving over 25 youtube videos per second during peak hours.
As others have reported, censorship on the internet is not possible in the same way it was with older forms of media. Thanks to proxy sites like beatfiltering.com and vtunnel.com, and proxy lists like freeproxies.org and proxy.org, web visitors can quickly circumvent any attempted blockage with ease. With Google having recently removed the videos that caused Turkey to block youtube in the first place, it is expected that access to the site will be restored soon. In the likely case that Turkey or another country decides to block access to this or another site again, freeproxies will be here to help those affected.
In case Vtunnel.com is blocked anyway, however, we now offer SSL access to several other web based proxies that we host:
Please let me know if you have any issues with getting booted to the homepage during a browsing session.
COSS is great if you need to cache a large number of small files that do not exceed a certain total size. There are a few reasons for this. First of all, COSS can only track a certain number of total objects per storage location. In order to have a large cache size, you need to increase the maximum allowed size of each object and increase the size of your stripes, which necessarily increases ram usage. There is also a hard limit, at around 100gb (I forget the exact value), beyond which you cannot make a COSS partition any bigger.
Furthermore, when you start or restart squid, you must rebuild all your cache directories. For the file storage methods AUFS and UFS, this happens reasonably fast. For DISKD and COSS, it does not. COSS in particular has to read the entirety of the storage space allotted to it in order to rebuild the cache information. If you have slow disks or a large amount of cache space used, this will take an unreasonable length of time. For these reasons, I typically allot only 50gb for COSS on each drive in my system.
The reason COSS is so fast is that everything operates upon a single file. There is no file system overhead from creating or deleting each file that is being stored. Turning off journaling and last-accessed times can reduce these overheads in the other file system methods, but COSS is still the winner here. The smaller and more numerous your files, the more overhead the non-COSS storage methods incur.
Conversely, if you have a smaller number of large files, the advantages of using COSS are minimized. Additionally, the inflexibility of COSS for files of widely varying or large size is a big disadvantage in this case. In the case that you are commonly caching files of 1 megabyte or greater, or you want to use hundreds of gigabytes on each disk for squid caching, AUFS with journaling and last-accessed-times turned off is your best bet.
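For reference, a cache_dir line for AUFS in squid.conf looks something like the following. The path and the 100gb size are placeholders for your own setup; 16 and 256 are the customary L1/L2 subdirectory counts:

```
cache_dir aufs /sdb/squidcache 100000 16 256
```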
Another issue with rebuilding the squid cache comes in during log file rotation. Normally, you can continue to use squid even while it is rebuilding its cache directories, albeit with somewhat slower performance. But if you have a script rotate your squid logs (as you should), and the squid cache is rebuilding when you rotate your logs, squid will not accept any more connections until it has finished rebuilding the storage. This is a big problem if your squid needs to rebuild periodically (due to squid crashing or restarting, or even just rebooting the server). Because it can take 20 minutes or so to rebuild a 50gb cache, the chances of the log rotation overlapping with a rebuild are fairly high.
For these reasons, I have created a script that, when rotating your squid logs, will check to see if a rebuild is in progress before rotating the logs. If a rebuild is in progress, it will not rotate the logs.
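The core of such a check can be sketched as below. This is a hedged sketch, not the actual rotatesquid script: the cache.log messages grepped for (“Rebuilding storage”, “Finished rebuilding”) are assumptions based on squid 2.x’s typical wording, so verify them against your own logs before relying on this.

```shell
#!/bin/sh
# Sketch: decide whether a squid cache rebuild is still running by
# comparing the line number of the last "Rebuilding storage" message
# against the last "Finished rebuilding" message in cache.log.
# Message strings are assumptions; check your own cache.log wording.
rebuild_in_progress() {
    log="$1"
    start=$(grep -n "Rebuilding storage" "$log" | tail -1 | cut -d: -f1)
    fin=$(grep -n "Finished rebuilding" "$log" | tail -1 | cut -d: -f1)
    # In progress if a rebuild started and no later "finished" line exists.
    [ -n "$start" ] && [ "${fin:-0}" -lt "$start" ]
}

# Rotation wrapper: skip rotation while a rebuild is running.
# if rebuild_in_progress /home/squid/var/logs/cache.log; then
#     exit 0
# fi
# squid -k rotate
```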
A further problem I ran into is that squid will not rotate the logs if it cannot find its PID file. For some reason this file will sometimes get deleted. Therefore, as part of rotating the logs, I have created a script that will rebuild the PID file.
A copy of the new rotatesquid script is available here: http://pastebin.ca/836062
Obviously you will have to change the directory referenced if you have your squid logs somewhere other than /home/squid/var/logs
This much improved squid log rotation script will save you from losing logfiles. In our previous method, we had a script that would, if it detected squid was not running, delete the last logfile (because a common cause of squid crashes was a log file growing past 2gb), and then restart squid. By rebuilding the PID file, we are able to successfully rotate the squid logs all of the time, reducing crashes and logfile deletions. Furthermore, COSS requires that you build squid with the ability to handle file sizes greater than 2gb, which means an unrotated log no longer crashes squid at 2gb; if the PID file went missing, the log file could eventually reach 30gb or more. This is definitely not something we want.
Also, it occurred to me that deleting the logfile if squid was crashed was not the best way to deal with things. I have changed the ’squidup’ script to now rename the active logfile instead of deleting it, as part of the process of bringing up squid when it is down.
The new version of squidup is here: http://pastebin.ca/836071
This particular script makes sure both apache and squid are running. If squid is not running, it stops apache, renames the logfile, deletes the logfile (if renaming it failed), starts squid, and starts apache. If your squid logs are somewhere other than /home/squid/var/logs, then change the script accordingly.
Finally, one failure condition that had plagued me for a while was a steady loss of file descriptors. On some of my servers, over time, file descriptors will leak. Once file descriptors reach a critically low level, squid will become unresponsive and this may even take down the entire server. I wrote a script that will deal with this issue by checking to see how many squid file descriptors are available, and, if squid has not been restarted too recently, and the descriptors are too low, it will restart squid.
In order for this script to work you need a couple things. First of all, you will need snmp and snmp-tools installed on your server. You will also need squid set up to listen to snmp requests. Finally, you need to create a file with world read/write permissions so that the checking script can check and update the last time squid was restarted.
So here we go:
yum install net-snmp
yum install net-snmp-utils
mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.huge
Insert the following (changing the password to whatever you want)
proxy -v 1 -c password localhost:3401 .1.3.6.1.4.1.3495.1
Start snmp and configure it to run as a service:
service snmpd start
chkconfig --level 2345 snmpd on
Squid needs to be configured for snmp as well, and squid must be compiled with the --enable-snmp switch.
In your squid.conf file, include the following:
acl snmpcommunity snmp_community password
snmp_access allow snmpcommunity localhost
snmp_access deny all
If your firewall is for some reason blocking snmp, you could resolve that (rather bluntly, by moving the iptables rules aside and effectively disabling the firewall until you restore them) with the following:
mv /etc/sysconfig/iptables /etc/sysconfig/iptables.save
Finally, you need to create the file “lastrestart” in the same directory as the checking script and chmod it to 666.
Now you should be ready to run the script. The checking script, which I call “descriptorcheck” is here: http://pastebin.ca/836083
For descriptorcheck to restart squid, you need a script located at /root/restart with the following:
The above script assumes you have the “squidup” script located at /home/funky/squidup
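The contents of /root/restart are not shown above, so here is a minimal hedged sketch. Only the squidup path comes from the post; the shutdown step and the parameterization are assumptions, added so the logic can be exercised without a live squid.

```shell
#!/bin/sh
# Sketch of the /root/restart helper: shut squid down, then let the
# squidup watchdog script bring squid (and apache) back up.
restart_squid() {
    squid_bin="$1"   # normally "squid"
    squidup="$2"     # normally /home/funky/squidup (path from the post)
    "$squid_bin" -k shutdown
    sleep 1
    "$squidup"
}

# /root/restart itself would then just be:
# restart_squid squid /home/funky/squidup
```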
In my copy of the descriptorcheck script, I set the lower limit for descriptors at 4000, and the earliest it can restart squid is 20 minutes after the last restart. In the case of a date rollover, where the last restart time appears to be in the future of the current time, we assume that the last restart was more than 20 minutes ago. If you want to change the lower limit for descriptors or the maximum restart frequency, you can do that in the script.
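The threshold logic can be sketched as below. Treat the commented snmpget line as an assumption: the community and port match the configuration above, but the exact OID for squid’s unused-file-descriptor counter (cacheCurrentUnusedFDescrCnt) should be looked up in squid’s MIB file (squid.mib) rather than taken from here.

```shell
#!/bin/sh
# Sketch of the descriptorcheck decision: restart squid when the number
# of available file descriptors drops below a configured floor.
too_low() {
    # too_low CURRENT LIMIT -> succeeds if CURRENT descriptors < LIMIT
    [ "$1" -lt "$2" ]
}

# descriptors=$(snmpget -v 1 -c password -Oqv localhost:3401 OID_FOR_UNUSED_FDS)
# if too_low "$descriptors" 4000; then
#     /root/restart    # subject to the 20-minute minimum between restarts
# fi
```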
Points to remember:
AUFS is good for large but few files, or excessively large cache directories
COSS is good for many small files, and cache directories of no more than 100gb.
Rotating your squid logs is important, but make sure not to rotate them if the cache storage is rebuilding.
If you run out of descriptors you are headed for trouble. Sometimes descriptors leak and the only way to fix this is to restart squid. Having a script do this for you automatically will save you downtime and hassle.
All scripts that run should have permissions (chmod) of 755.
I have since resolved this on my servers, but I thought it would be useful to share that information with you. The idea behind this post came from one (of many similar) threads on the proxy.org forum: http://proxy.org/forum/1179669925.html
I run a few proxies and I see that I have a problem with the loading time. It takes around 25 seconds just to load google.
I have my apache optimized and also noticed that my other website and non-proxy surfing are fast.
But the proxy surfing is very slow. The server load is fairly low at around 0.50, so I don’t think this could be due to excessive load.
In a situation like this, you should look to your DNS resolvers first.
The trick here is to look at your /etc/resolv.conf file.
This file has a list of the DNS resolvers your server will use, in order. In the situation above, chances are your first resolver is bad, and all requests have to time out on that resolver before it tries another one. Eventually it finds a working resolver, and you’re in business, but meanwhile everything goes slowly.
Your resolv.conf file should look like this:
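For example (the 127.0.0.1 entry assumes you run a local resolver; the second address is only a placeholder example of a public upstream resolver, so substitute your host’s actual resolvers):

```
nameserver 127.0.0.1
nameserver 4.2.2.1
```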
If there’s other stuff in there, it may be causing you problems. If you’re getting a “parse error” in your resolv.conf when you do the nslookup, see the original thread for a way to fix that.
There is a command you can run from the linux command line to test each resolver:
nslookup www.google.com 127.0.0.1
In this case, we want to see what the DNS server located at “127.0.0.1” thinks the address for www.google.com is. Replace 127.0.0.1 with the IP address listed in your resolv.conf. Try a few different addresses to resolve, and note whether you get an error message, or whether the resolution takes a while. If either happens, you should remove that resolver from your list.
Ideally, for maximum performance, you should run a local dns resolver on your proxy server. This will make your server immune from issues relating to poorly performing upstream resolvers. In cPanel, you’d just use the “enable nameserver” option, and then make sure the first entry in your resolv.conf file is “127.0.0.1”.
What I found is that I was having an i/o bottleneck on my servers. Many of my servers have the following disk setup:
2x 500gb samsung sata
2x 500gb western digital sata
The Samsung hard drives just weren’t fast enough, and though my Western Digitals were faster, they were still causing issues. Though I could largely ignore this issue on my lower end servers, on my new Intel Core 2 Quad CPUs I would hit an obvious disk bottleneck at about 2/3 of what the server could handle CPU-wise. I didn’t want to give up caching, so I had to investigate a few things.
Before I go further, I should mention that I have a couple of servers now with 750gb western digital hard drives, and those drives run much much faster. Though you’ll still want to tweak your disk systems, if I could do it all over again, I would put 2 x 750gb WD drives in my servers instead of 4x 500gb drives. Performance would easily be double.
First of all, I found that reducing the disk cache size freed up a lot of ram and also reduced disk io substantially. I had previously used 100gb of each 500gb disk (400gb total) for caching. Squid has to store a bit of information about each cached item in ram, and with this setup, that amounted to 2gb. Luckily I had 8gb ram in each server, but this is still more usage than I would like. Reducing cache size means fewer files to shuffle into and out of the cache, and increases the likelihood that a disk item is already cached in memory by the file system. A 20gb squid cache on a server whose operating system has allocated 4-6gb for file caching will see very little io wait.
Since reducing the disk cache substantially wasn’t really my goal, I decided to dig deeper. The book I mentioned earlier helped here, in three major ways. First of all, it explained how the available squid file systems work. Secondly, it gave some tips on optimizing the performance of linux file system options. Third and most important, it showed performance benchmarks for these different options.
First I’ll explain the four squid file systems and some of my experience with them.
1) UFS. This is the tried and true file system for squid. It is also the slowest. It exists within the main squid process, so when I/O operations are performed, no other work may happen. It also means even if you have multiple hard drives, only one may be accessed at a time. This is obviously bad, but surprisingly, it can sometimes work to your benefit if your disk is especially slow servicing multiple simultaneous requests, as is the case with many ata drives and some poorly designed raid 5 configurations.
2) AUFS. Instead of existing within the main squid thread, it spawns extra threads to take care of disk accesses. The A stands for Asynchronous UFS. On some operating systems, the functionality to make this possible does not exist or is not enabled by default. In my fedora 6 / 7 installs, this has not been an issue. Because disk access happens in separate threads, squid can still handle requests and other work while it is waiting on disk accesses. This makes cache misses in particular process faster. If squid’s single-threaded CPU usage is the bottleneck on your server, you can squeeze out a little more this way on a multi-core CPU.
If you decide to use AUFS you should tweak the number of threads used for disk access when you compile squid. By default, squid will spawn quite a few threads for AUFS, which in some cases may degrade performance. The reason for this is that with too many threads, you are in effect telling the hard drive to do a bunch of stuff at once. Linux file systems mitigate this problem somewhat by ordering disk accesses properly. However, ATA drives and many SATA drives handle requests more effectively in a serial fashion. It should be mentioned that SCSI drives, and to a lesser extent SATA drives with NCQ, do not suffer as acutely from this problem.
3) DiskD. DiskD is basically the same as AUFS, except that it spawns a separate process for disk access, one process per cache directory, instead of threads. This is more compatible on some operating systems, but does not perform quite as well. In particular, I find rebuilding the squid cache takes forever with DiskD, whereas AUFS does not have this problem. Squid must rebuild the disk cache whenever it starts or restarts, so this can cause really poor performance for an extended period of time. For this reason, and because performance of AUFS is better, I cannot recommend DiskD.
4) COSS. COSS is the shining star here. It is a little more complicated than the other disk systems, and harder to understand, but definitely worth it. Basically, instead of having a big batch of files, each one requiring separate file i/o operations, it throws your entire cache into a single file that it navigates in a circular fashion. If you want to know more about this file system, I suggest reading the book I mentioned or looking it up online. Because it does not have to delete files, or constantly open and close file handles, it is miles ahead of any of the other file systems. I have found that I can get half the disk i/o wait from COSS as I can from the best tuned alternative. The worst setup I had was many times slower than COSS.
The downside is that you have to do things just a bit differently for COSS. First, the default maximum file size is 1 megabyte. This can be changed at compile time for squid, or possibly at runtime, but I haven’t gotten that far yet. Also, COSS has to create one big file ahead of time. Therefore, “creating the cache directories” will take a long time. Don’t stop this process or your cache won’t be the right size. Also, importantly, COSS only has a 24 bit counter for blocks inside of the cache file, so the size of your caches is limited. You can tweak this by changing the block size. I use a block size of 4096 bytes, which allows somewhere around 60gb per cache file (2^24 blocks at 4096 bytes each works out to 64gb). You can always use multiple COSS cache_dirs, even on the same hard drive, to work around this issue.
An example configuration line I use for COSS is as follows:
cache_dir coss /sdd/squidcache/mycossd 50000 max-size=1000000 maxfullbufs=4 membufs=20 block-size=4096
Two more important things. First, COSS doesn’t use an actual cache directory, it uses a cache file. This is important because the cache swap log would normally be stored inside the cache directory. To solve this, I have a line like this:
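The line itself is missing above, so here is a hedged reconstruction: in squid 2.x the relevant directive is cache_swap_log, pointed somewhere outside the COSS file. The path below is only a placeholder consistent with the cache_dir example earlier; adjust it to your own layout:

```
cache_swap_log /sdd/squidcache/swap.state
```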
Secondly, COSS requires large file support in Squid. This can be put in at compile time. When you run ./configure when compiling squid, here are the extra options you need:
./configure --enable-storeio=coss --with-large-files
If you want to support other file systems, use a line like this:
./configure --enable-storeio=diskd,aufs,ufs,coss --with-large-files
Although using COSS will give a big performance boost, it is also helpful to optimize your linux file system.
There are two basic things you can do to increase performance on your linux system. One is to disable journaling (using the ext2 file system instead of ext3 or a more exotic file system). The other is to disable the file system updating the last-accessed-times on your files.
Here is a link to instructions on how to disable journaling to convert ext3 to ext2 (in essence, running tune2fs -O ^has_journal on the unmounted device, followed by a full e2fsck -f):
It should be noted that the last step, deleting .journal, has been unnecessary in my experience. It is also important to note that to make these changes work, you also need to edit the /etc/fstab file, and change the mount instructions to “ext2” instead of “ext3”. That file is also where you can disable access time updates. Your fstab file might have a line like this for your main disk drive:
/dev/VolGroup00/LogVol00 / ext3 defaults,usrquota 1 1
On my systems, I also have some extra lines for my extra drives:
/dev/sdb1 /sdb ext3 defaults 1 2
/dev/sdc1 /sdc ext3 defaults 1 2
/dev/sdd1 /sdd ext3 defaults 1 2
After you remove journaling, you need to edit those lines that say ext3 to say ext2. You can also add an option like so to remove access time updates:
/dev/sdb1 /sdb ext2 defaults,noatime 1 2
/dev/sdc1 /sdc ext2 defaults,noatime 1 2
/dev/sdd1 /sdd ext2 defaults,noatime 1 2
Between those two changes you will see a tangible increase in hard disk performance with squid, regardless of what squid file system you choose. If you don’t want to disable journaling or access times on your primary hard drive partition, you should consider creating a partition exclusively for the squid cache. For various file system optimization reasons, you should make your partition 20-30% bigger than the amount of disk space you actually intend to use. At the very least, allow at least 10% free disk space on the partition.
Although this article is rather rough and could do with better formatting and editing, I hope the information contained in it has been useful for your caching squid server.