Many of you trying to run proxies know that it’s harder to get noticed now that everyone and his dog is running a stock copy of phproxy. If you have nothing of value to give the user, why would they use your proxy? Any advertising you do to get visitors needs to ask “what do I have to offer you?” If the answer is “not much”, you need to rethink your business strategy.

Is your proxy harder to block than other proxies? That is something valuable to give your users. It seems that 99% of the proxies out there have “proxy” in the url or in their page’s source code. If you can’t be bothered to remove that, then you obviously don’t care whether your proxy gets blocked. There are tons of ways to make your proxy harder to block. If you make no effort to find out what they are and implement them, even people who want to use your proxy won’t be able to.

Is your proxy fast and reliable? If the answer is no, it had better be because your cluster of high end servers tuned for maximum performance just can’t keep up with overwhelming demand. If the answer is no because you don’t know how to run your proxy, don’t stop abuse (hotlinking, other massive drains on resources), don’t stop bots from screwing you over, can’t or won’t optimize your server for maximum performance, or simply are unwilling to invest in appropriate levels of server power, then you aren’t offering something of value to the user.

Is your domain name easy to remember? Hard-to-remember domain names are not of value to users. They mean people have to write down the name, and they can’t easily tell their friends about it. Compare Vtunnel.com with Proxurity.org. Which do you think people are going to tell their friends about? Which do you think they will remember tomorrow? To people who don’t speak English, both names are about equally unintelligible, so visitors from places like Iran, who don’t care what the proxy is called, will use either one; the second name simply drives away English-speaking visitors. Domains that don’t end in .com are harder for people to remember, because people assume .com means website without even thinking about it. To say that all the good domains are taken is just not true. Vtunnel.com never meant anything regarding proxies until people started using it as a proxy. When you expand your search beyond “blockedwebsite-proxy.com”, you’ll find a lot of easy-to-remember .com names.

Does your proxy offer something users can’t get elsewhere? When I started Vtunnel.com, most free web proxies didn’t support form post, for whatever reason. So all I had to do was enable that option, tell people I did so, and boom, instant competitive advantage. It’s a little harder these days to get the same competitive advantage, but it is not impossible with some work. Can your proxy reach sites that others can’t? Even a stock install of cgi proxy beta 15 is usually better than 75% of the proxies out there. With a few modifications you could make yours better than 90% of the proxies available. That offers real value. When you have something worth buying, it makes it a whole lot easier to sell it and get repeat customers. Just because your product is free doesn’t mean it has no competition.

Are you advertising to people who want your service? I’ll say that Proxy.org is the most effective paid advertising available for proxies. I’ll also say that if you exclusively rely on proxy.org to promote your service, you aren’t serious about your business. Most people advertise on proxy lists, proxy topsites, proxy.org, or try to compete for keyword placement when the user searches for “proxy-something-or-other”. This may be helpful, but you’re only targeting users who already know what a proxy is and what value it has. These are people who know what proxies are, and for whatever reason, aren’t happy with the ones they’ve found before. If yours is no better, they will move on. Instead, consider that most people don’t know what a proxy is. Now consider that in the US, every school and library is mandated to filter their internet access, or they will lose federal technology funding. It seems like a real no-brainer then to try to reach those people who don’t know what a proxy is, but could get some benefit from using one. The simple reason is that there are literally millions of desirable (to US advertisers) potential customers out there who have no idea anyone offers what you offer. Instead of trying to compete for the same few people who already know about proxies (and know that yours sucks), target the people who know nothing about proxies, and will happily keep using yours until it gets blocked.

Is your site easy to use? Does it instill confidence in the user, or scream “stay away”? Proxies are often considered a shady part of the internet. The site exists for the purpose of getting around restrictions that some authority thought were a good idea to impose in the first place. Many proxy owners blatantly cheat their advertisers, violating ad network terms of service and trying to trick people into clicking on ads for unrelated products they’d never want. The url bar is often hidden, leaving users so confused that they click an ad or do a paid search because they can’t figure out how to use the service. At the same time, by virtue of what a proxy is, this same user has to trust you with their personal data: their passwords, their accounts, their browsing habits. Does anyone else here see a problem with blatantly trying to trick the user and scam your advertisers, while simultaneously running a business that requires a great deal of trust in you from your users? Be honest, and do right by the customer. Make it easy for them to figure out what the service is about, and how to use it. Make a page that explains who you are and why you run the service. Outline your privacy policy. Make the policy strong enough to protect users, but with clauses that allow you to be a good net citizen and obey the law when necessary. And then abide by your policy, whatever it may be. Don’t make your site’s layout trick the user, as those users won’t come back. Remember, the user is trusting you with their myspace / email / whatever password, which is very important to them and they probably use it for all their logins. Users who don’t trust you won’t use your service.

Always be improving. You have a good site? Great! People love it, you’ve advertised to the right crowd, it stays up, it doesn’t get blocked quickly, people trust you and your site, they can remember its name, they tell their friends about it, and it performs at least as well as the proxies run by your biggest competitors. Congratulations. But don’t stop there. Your competitors will always be trying to improve, and so should you. If your site is great today, it will be average in 6 months. If it is average now, it will be below average in 6 months. You can ride on your previous successes, but only for so long. Always try to find ways to improve. Find advertisers that pay better but are less irritating to your users. Find entirely new groups of people to advertise to. Figure out how to make your server load less with the same traffic level. Figure out how to make it harder to block your proxies, or to support more of the features and technologies that users want. Give users no reason to look for competing services.

Sometimes, the amount of work to do all of the above is just too much. It doesn’t make sense to spend the time it takes to run a truly world class service if you’re only making $20 a day. But if you don’t put in that time, or you don’t have the skill to do it, you’ll never make more than that. You can run 100 proxies and make $2 a day, or run one proxy that makes $400 / day. If you can’t be bothered to do things right, users will move on, your marketing efforts will be exponentially less successful, and you will make a LOT less money. If enough people do so, you will also collectively hurt the reputation of the proxy industry and make it that much harder for end users to find proxies that suit their needs. In short, doing things right matters.

For many of you running CGI Proxy, speed and scalability issues have always reared their ugly head, threatening to slow your service to a crawl. I have been dealing with these issues running Vtunnel.com since 2005, and have come up with some good ways to reduce the load. The first big set of changes to make was described in our previous article, the Freeproxies guide to performance tuning apache for mod perl and cgi proxy. Today we’re going to take that a step further and introduce you to the benefits of using the open source squid cache in the cgi proxy environment.

First, I would like to say that without great inspiration from the book “Scalable Internet Architectures”, it would never have occurred to me how much performance could be gained by using squid. Anyone dealing with environments that need to scale beyond a single server would benefit from reading this book.

In order to understand how squid can help you, first you have to understand how the various technologies you are using interact, and where the bottlenecks are.

So let’s first describe a cgi proxy transaction without squid.

1) The user requests a page from your server, and apache receives this request. If there is a free apache slot available, it is now allotted to this request. You need a slot for each active request you’d like to process, and each slot uses a significant amount of ram.

2) Apache now will load up cgi proxy into the apache slot if it is not already running, and run that perl script. In the process of running, cgi proxy will most likely need to resolve a domain name, connect to the remote site, download some content, and reprocess it to send to the end user. All of these things take time, during which your expensive apache slot is being held up. This process is also the most cpu intensive of the things that are done during the life of the query.

3) After CGI Proxy in your apache slot has fetched what it was looking for and modified it suitably, it is time to send it to the end user. The apache slot is still being used, and ram wasted, even though very little processing is now going on. This is what we call ‘spoon feeding’ the client their data. If network conditions between your server and the end user are poor, the time you spend spoon feeding the client will be significant.

In this case we are assuming we have plenty of bandwidth available, as our solution does not significantly help with that particular issue.

So we can see there are two bottlenecks:

1) The amount of CPU power available to process requests

2) The amount of apache slots you have available (roughly equivalent to total ram)

Now, many people may have thought of putting a squid cache on the ‘backend’, i.e., between your server and the websites it is requesting from. This will only save on incoming bandwidth, and, because the servers you are downloading from are usually pretty fast, the benefits of caching here are minimal. If you are being metered on both incoming and outgoing bandwidth, this may save you a bit of incoming bandwidth, but that’s about it.

However, we are going to take a totally different approach, and put squid on the ‘frontend’, between your clients and the apache server. We can help with both cpu usage and ram usage using squid caching in this way.

First, the obvious. Any request coming in that is cachable by your frontend squid is a request that does not have to be processed by Apache or CGI Proxy at all. This increases speed, reduces cpu usage, and reduces ram usage (since we didn’t need an apache slot). This by itself is probably good enough to consider using squid, but it is not where the most benefit lies.

The biggest performance benefit is gained from the fact that apache now has to talk to squid locally, rather than to your remote clients.

In order to see why this is important, let’s go back to our description of a transaction with your proxy:

1-> User Requests Something From Server

Apache picks a slot for the user (if available. if not available, queue the request)

Apache does a handshake negotiation with the remote user

2-> Apache starts CGI Proxy / mod perl if it is not running in the apache slot (this is incredibly cpu intensive)
CGI Proxy figures out what to do with the request

CGI Proxy does a DNS Lookup of the remote server
CGI Proxy connects to the remote server (and does a handshake)

CGI Proxy downloads some stuff from the website

CGI Proxy processes the result (the majority of cpu usage typically happens here)

3-> CGI Proxy spoon feeds the result to the client. The farther away the client is (in ping times) and the slower their internet connection, the longer this will take.

Now, let’s compare that with a transaction using squid:

1-> User requests something from server

Squid does a handshake with the user

Squid checks to see if the item is in the cache already (if it is, skip to step 3)

2-> Squid connects to apache and does a handshake locally (i.e. with near-zero latency). Chances are good this can be skipped entirely thanks to keepalives
Squid requests the item from apache

Apache assigns a slot to the request

Apache loads up cgi proxy / mod perl if it is not running in the slot (a cpu intensive thing to do). Note this is less likely to happen than before because squid is more likely to make good use of keepalives than the end user.

CGI Proxy figures out what to do with the request

CGI Proxy does a DNS lookup of the remote server

CGI Proxy connects to the remote server (and does a handshake)

CGI Proxy downloads some stuff from the website

CGI Proxy processes the result (again, the majority of cpu usage happens here)

CGI Proxy blasts the result over to squid. This takes very little time since squid is located on the same server. The apache slot is now free for additional requests.

3-> Squid spoon feeds the result to the client. This can take a long time, but squid does not significantly degrade its performance or increase its resource usage when additional clients remain connected.

As you can see, with caching, some percentage of the time most of the work (step 2) can be skipped altogether. Also, even when this is not possible, the amount of time that your apache slots need to be taken up is significantly reduced. Squid, which does not use significantly more ram when more users remain connected, is now responsible for the bulk of the time consuming work, whereas CGI Proxy and mod perl are now responsible primarily for actual processing only.

For those of you wanting a more concrete example, here are equations that relate the performance of your service:

Requests per second = Number of simultaneous requests that can be processed / Seconds to process each request

Number of simultaneous requests that can be processed = Total Ram available / Ram usage per apache slot

Seconds to process each request = (CPU work needed per request / CPU power available) + wait time overhead

When you relate these three equations, you can see the primary limiting factors of your system. By implementing squid on the front end, you reduce cpu time per request when the requests are cachable. You also get a small benefit on cpu by cycling your apache connections less often. More importantly, you significantly reduce wait time overhead by having apache talk to squid instead of to a remote client.
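To make the three equations concrete, here is a small back-of-the-envelope model. It is a sketch under stated assumptions, not a measurement: every number is hypothetical, the CPU term is expressed directly as CPU-seconds per request (folding "CPU work / CPU power" into one figure), and it adds one thing the equations above leave implicit, a ceiling of cores / CPU-seconds per request that no amount of RAM can push past.

```python
# Hypothetical model of the throughput equations above. Every number passed in
# is an assumption; replace them with measurements from your own server.

def simultaneous_requests(total_ram_mb: float, ram_per_slot_mb: float) -> int:
    """Simultaneous requests = total RAM / RAM per Apache slot (rounded down)."""
    return int(total_ram_mb // ram_per_slot_mb)


def seconds_per_request(cpu_seconds: float, wait_overhead_s: float) -> float:
    """Seconds per request = CPU time for the request + wait time overhead."""
    return cpu_seconds + wait_overhead_s


def requests_per_second(total_ram_mb: float, ram_per_slot_mb: float,
                        cpu_seconds: float, wait_overhead_s: float,
                        cpu_cores: int) -> float:
    """Throughput is the lower of the RAM (slot) limit and the CPU limit."""
    slots = simultaneous_requests(total_ram_mb, ram_per_slot_mb)
    ram_limited = slots / seconds_per_request(cpu_seconds, wait_overhead_s)
    cpu_limited = cpu_cores / cpu_seconds  # the cores can't do more work than this
    return min(ram_limited, cpu_limited)


# Hypothetical server: 2000 MB for Apache, 10 MB per slot, 8 cores,
# 0.05 CPU-seconds of CGI Proxy work per request.
before = requests_per_second(2000, 10, 0.05, wait_overhead_s=1.95, cpu_cores=8)
after = requests_per_second(2000, 10, 0.05, wait_overhead_s=0.45, cpu_cores=8)
print(f"without squid: {before:.0f} req/s, with squid in front: {after:.0f} req/s")
```

With these made-up figures, cutting the wait overhead raises the RAM-limited throughput from 100 to 400 requests per second, but the 8 cores cap it at 160. That is the intended outcome: once squid absorbs the waiting, the server becomes CPU-bound instead of slot-bound.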

All of these factors significantly reduce seconds to process each request, putting less pressure on you to increase the available maximum connections. This is good because the only way to increase the number of simultaneous connections with apache is to get more ram (which costs money), or use less ram per slot (which you’ve already done if you followed our previous article).

Just for completeness, I will mention two other factors affecting wait time overhead. One is the quality of the bandwidth at your web host. The other is the speed of your dns resolution. If your bandwidth quality is low, your server will spend a lot of time trying to download pages from the internet, tying up your apache slots. If your DNS resolution is slow, CGI Proxy will sit trying to resolve server addresses for a long time. Both of these things undermine the purpose of the optimizations we’ve just made. It is beyond the scope of this article to discuss improving these areas, but it is something you should keep in mind. It is worth mentioning that the best way to improve dns response times is to set up a local DNS resolver on your server.
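A local caching resolver helps because repeat lookups for popular sites are answered from memory instead of over the network. The sketch below only illustrates that principle in-process. It is not a replacement for a real resolver daemon, and `CachingResolver` is a hypothetical helper, not part of CGI Proxy. It simplifies by using one fixed TTL for every name, whereas a real resolver honors each DNS record's own TTL.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


class CachingResolver:
    """Hypothetical in-process DNS cache with one fixed TTL for every name."""

    def __init__(self, lookup: Callable[[str], List[str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # the slow, real lookup
        self._ttl = ttl_seconds        # simplification: real resolvers use per-record TTLs
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, hostname: str) -> List[str]:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no network round trip
        addresses = self._lookup(hostname)
        self._cache[hostname] = (now + self._ttl, addresses)
        return addresses


def system_lookup(hostname: str) -> List[str]:
    """Ask the operating system's resolver for the host's IP addresses."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})


resolver = CachingResolver(system_lookup)
# resolver.resolve("example.com") pays the lookup cost once, then answers from memory.
```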

Now that you know why you should use squid caching with cgi proxy, the obvious question is, “how do I do it?”.

Because the answer to that question is complicated and has some pitfalls, it will be the topic of our next article here on the freeproxies blog. Stay tuned.

In our efforts to continue to improve the end user experience, we came across a nifty way to support web based messengers in your browser. Now, many of you may know of sites like meebo.com that allow messenger support via browser, but, if they’re blocked, you’ve been out of luck.

Now, we found one that works via proxies. It is ebuddy mobile, and was designed for mobile phones. Because it has to fit within the design constraints of mobile phones, this service also works well with proxies, requiring only a small tweak to our meta-refresh handling to work properly.

On the home page of all of our proxies, you will now see quick links to the ebuddy mobile system for AIM, MSN and Yahoo messengers. If you’re interested in using them on other proxies, here are the direct links:

http://www.ebuddy.com/mobile/msn.php
http://www.ebuddy.com/mobile/aim.php
http://www.ebuddy.com/mobile/yahoo.php

Enjoy!

I have been hard at work the last two months realizing many improvements to the underlying CGI Proxy code that I have always wanted to do but never had time for.

As a result we now have:

  • Rudimentary Hotmail support
  • A cleaner looking interface
  • Improved RC4 url handling
  • Proper proxying of meta refresh tags
  • Various techniques to reduce abuse and thereby improve performance for everyone
  • Improved Youtube support
  • HTML obfuscation
  • Better support for SSL
  • Several CGI Proxy Bug Fixes
  • Backend caching improvements
  • Various changes that make general maintenance much easier
  • URL Expiration

All of the above things represent a major breakthrough that will allow us to provide much better performance to end users while maintaining the profitability of providing our free service. It also means that one major part of developing free proxies is done. With the backend having most of the features and bug fixes that I’ve wanted for a long time, I can now focus my efforts on building the front end to this system, which will allow me to give free proxy hosting to anyone who wants it. Our anticipated release date is early September.

One change in particular that I’m proud of is the HTML Obfuscation routine. This encodes all the html on every page you view via proxy into hex code. In this way, content based filtering solutions will be utterly frustrated when they try to categorize the content on any page you view via proxy. Your browser can decode all this content on the fly by using a simple javascript routine; web based filtering solutions, however, will be unable to do this, as they are not sophisticated enough to run javascript. This technology is optionally available on the home pages and ancillary pages on the proxies I run, both for myself and for others. I have activated obfuscation support on all the proxies I own. This means that web based filters will have to manually categorize each of my proxies, keeping them unblocked for longer periods of time and at a larger number of institutions and networks. As an unexpected side benefit, obfuscating all html code improved the performance of all my proxies by eliminating abusive bot access to them. I suspect the vast majority of bots cannot parse links that were dynamically inserted into pages with javascript. So in addition to protecting users’ privacy and making my sites harder to block, I scored a big win on performance by eliminating a huge source of automated and abusive access.
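The idea behind the obfuscation can be sketched in a few lines. This is not the actual routine used on these proxies; it assumes the "hex code" is a `%XX` escape per byte (which matches the three-characters-per-byte cost described below) and that the page is decoded with the browser's standard `decodeURIComponent()` and written out with `document.write()`. The helper names are hypothetical.

```python
def encode_payload(html: str) -> str:
    """Turn every UTF-8 byte of the page into a three-character %XX escape."""
    return "".join(f"%{byte:02X}" for byte in html.encode("utf-8"))


def wrap_in_script(payload: str) -> str:
    """Emit a page body that only a JavaScript-capable browser can read."""
    return ('<script type="text/javascript">'
            f'document.write(decodeURIComponent("{payload}"));'
            "</script>")


def decode_payload(payload: str) -> str:
    """Python equivalent of what decodeURIComponent() does to this payload."""
    return bytes.fromhex(payload.replace("%", "")).decode("utf-8")


page = "<p>Hello from the proxy</p>"
obfuscated = wrap_in_script(encode_payload(page))
print(obfuscated)
```

A keyword-matching filter or crawler that does not execute JavaScript only sees the `%XX` string, while the browser reconstructs the original markup. Because the payload contains only `%` and hex digits, it can never break out of the JavaScript string or prematurely close the `<script>` tag.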
Nothing comes for free however, and this particular technique has a couple of drawbacks.

  • First of all, because all page content is wrapped in a layer of javascript, it is nearly impossible to support proxying of javascript code via my service. Javascript does not interpret page elements that have been dynamically written to the page in the same way that it can reference elements that were hard coded into the page. As a result, most websites out there would need massive changes to their codebase in order to be compatible with the html obfuscation system. Because I am no javascript expert, and because javascript behaves inconsistently in general and especially across platforms, support for javascript proxying would almost certainly come at the cost of disabling obfuscation.
  • Secondly, wrapping all html in a hex encoding greatly increases bandwidth usage, since the majority of characters that used to take up just one byte are now represented by three characters apiece. To deal with this problem, I had to enable Gzip compression on my proxy, which necessarily increases cpu usage but offsets the huge increase in bandwidth usage from this change.
  • Third, by obfuscating the home pages of the proxies, the content of my sites becomes invisible to search engines and all bots. This means that the benefit of having my sites stay unblocked longer is offset by the corresponding reduction in traffic that search engines can send me, because there are virtually no keywords whatsoever that spiders can use to categorize my site. Users who use contextual advertising on their homepage (such as adsense) need to be careful, because enabling obfuscation on the entirety of those pages will make it impossible for the contextual ad network to properly target your ads.
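The bandwidth tradeoff in the second point can be checked with Python's standard `gzip` module. The sketch below assumes the same hypothetical `%XX` encoding (three ASCII characters per byte) and uses a made-up, highly repetitive sample page. Real pages compress less well, so treat the ratio as illustrative only.

```python
import gzip


def hex_encode(html: str) -> str:
    """Hypothetical %XX scheme: three ASCII characters per UTF-8 byte."""
    return "".join(f"%{byte:02X}" for byte in html.encode("utf-8"))


def transfer_sizes(html: str) -> dict:
    """Bytes on the wire for the raw page, the encoded page, and the gzipped encoded page."""
    raw = html.encode("utf-8")
    encoded = hex_encode(html).encode("ascii")
    return {
        "raw": len(raw),
        "hex_encoded": len(encoded),
        "hex_encoded_gzipped": len(gzip.compress(encoded)),
    }


sample_page = '<tr><td><a href="/page">row</a></td></tr>\n' * 200  # made-up sample
print(transfer_sizes(sample_page))
```

The encoded page is always exactly three times the raw size. The compressed size depends entirely on the content, and the compression work is the extra CPU cost mentioned above.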

I believe that the massive improvements in user privacy, unblock-ability, and reduction in abuse more than make up for the above tradeoffs, but I felt I should explain them since they are significant. When I launch, users will be able to use tags on their home pages to decide what parts (if any) of their home page should be obfuscated. For now, I have decided that html obfuscation is more important than javascript support, especially given that javascript support in the current version of cgi proxy is incredibly buggy to begin with.

Thank you all for visiting my site. I am very excited to be working on the free proxies project.

All the proxies hosted by freeproxies.org now have youtube support. We’re a bit late in announcing this here, simply because we wanted to make sure our servers could handle the load before making anything official. We’re proud to say it’s a great success! Now you can watch all those youtube videos at work or school.

For those of you who cannot go to freeproxies because “proxy” is in the url, we’ve pointed freetunnels.com, .net and .org to load this site as well. Enjoy!

Every now and then, someone asks me “Hey, how do I ‘tune’ my webserver to better run these proxies? I’ve heard tuning is really important but what does it mean?” or they’ll ask “why is it important to run a dedicated server just for a proxy site? Can’t I run my sites on (some cheaper hosting method)?”

Hopefully once you’re done reading this article, you’ll better understand the answers to these common questions, and come away knowing specifically how to tune Apache to run J Marshall’s CGI Proxy script. You should also be able to take the concepts herein and apply them to performance tuning Apache for any website. So without further ado, I give you:

The Freeproxies Guide to Tuning Apache To run CGI-Proxy

Optimization is a matter of using the right tool for the job, and reducing dependence on whichever resource is your current bottleneck. I find that with cgi proxy, having plenty of ram is a necessity, and that tuning your apache maxclients, read timeout, keepalive timeout, requests per child, and max keepalive requests settings is essential to a properly functioning server. To change these and all the other apache settings, edit your httpd.conf file with the text editor of your choice. All of these modifications balance end-user perceived speed against ram and cpu usage.

The default settings of apache are pretty good for lightly trafficked servers to improve the response time the end user perceives, as well as supporting all the options and features a webmaster might want to use. This is at the expense of greater ram use, and occasionally, increased cpu usage. Therefore, using a more specialized apache configuration can get you a lot more performance for the task at hand.

The settings we’ll be going for, since cgi proxy tends to be ram limited (once you’ve installed mod_perl), are designed to balance a user’s perceived site loading speed against the need to minimize the number of client slots that are being used up. So the first thing we’ll mention is the “maxclients” parameter, since it’s the simplest one to understand and also the hardest one to just give you a good value for. Moses didn’t come down from the mountain with the proper maxclients value carved in stone, so let’s explore what goes into that decision.

Maxclients, in case you’re wondering, is the maximum number of people that can be connected to your webserver at one time. We put a limit on max clients because each connection takes a certain amount of cpu and ram to process, and if we run out of either, the site will load very slowly and the server may even crash. Maxclients therefore needs to be set so that you will run out of client slots before you run out of ram.

Each client slot is another copy of apache running on your server, and depending on what modules you’ve loaded into it, will use up different amounts of ram. Your goal in general is to reduce the amount of per-client ram usage while still having the modules you need installed to run your web service properly. Since cgi-proxy uses perl, mod_perl is absolutely necessary to get any kind of reasonable performance level. mod_perl does load an entire copy of perl into each apache process, so ram usage is quite high when you use mod_perl, especially with cgi proxy. However, you gain massive savings in cpu usage, so going without mod_perl isn’t worth considering. The exact benefits and pitfalls of mod_perl are best left to another article.

At this point, we can therefore say that your total ram needs are as follows:

Ram needs = Ram usage per client slot * number of slots needed + operating system overhead.

And, to find out the maxclients value you can support, you can use the following equation:

Maxclients = (Total system ram - Operating system overhead) / Ram usage per client slot.
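The two equations translate directly into a small calculator. The figures in the example (a 2 GB server, 256 MB of operating system overhead, about 9 MB of unshared memory per mod_perl child) are hypothetical placeholders, not numbers from this article; measure your own children before trusting the result.

```python
def max_clients(total_ram_mb: float, os_overhead_mb: float,
                ram_per_slot_mb: float) -> int:
    """Maxclients = (total system RAM - OS overhead) / RAM per client slot."""
    if ram_per_slot_mb <= 0:
        raise ValueError("RAM per slot must be positive")
    usable_mb = total_ram_mb - os_overhead_mb
    return max(0, int(usable_mb // ram_per_slot_mb))  # round down: never promise RAM you lack


def ram_needed(slots: int, ram_per_slot_mb: float, os_overhead_mb: float) -> float:
    """RAM needs = RAM per client slot * number of slots + operating system overhead."""
    return ram_per_slot_mb * slots + os_overhead_mb


# Hypothetical figures: 2 GB box, 256 MB for the OS and other daemons,
# about 9 MB of unshared memory per mod_perl/CGI Proxy child.
slots = max_clients(2048, 256, 9)
print(slots, ram_needed(slots, 9, 256))
```

The result is rounded down on purpose. One slot too few costs a little latency, while one slot too many can push the server into swap.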

So our goal here is to reduce usage per client slot, and reduce the total number of slots needed to serve a given amount of traffic. Reducing operating system overhead is outside of the scope of this article. Luckily there are a few things we can do with our apache configuration to reduce the number of client slots we’ll be using and to reduce the amount of ram each one takes up. In httpd.conf, here is what we’ll be editing:

Keepalive Timeout: This should be low on any busy site so that client slots free up for new users quickly. Keepalive timeout is the number of seconds a client slot will be sitting around unused, waiting for its client to make new requests before it gives up and allows itself to be used by another client instead. The advantage to keepalives is that the same client doesn’t have to connect to your site over and over again to get multiple resources (images, web pages, etc). Performance will suffer greatly if you turn off keepalives entirely, but the default of 15 seconds is way too high for our usage. On my sites, I use a keepalive timeout of 1 second. This is long enough for a user downloading a single page to get all the page elements downloaded without having to continually reconnect to my server, but short enough to free up the slot quickly for other users.

Connect timeout (the Timeout directive in httpd.conf): For the same reasons, connect timeout should be low. The default of 300 seconds means a bad client can take up a client slot for 5 minutes without using it! Depending on my mood, I’ve set connect timeout between 5 and 15 seconds. You can start with a low number and raise it later if you notice users are getting timeout errors.

Requests per child: This should also be a low number. Other apache tuning guides recommend a number in the thousands or tens of thousands. That’s great when ram is not at a premium, or when clients tend not to use more ram as they serve more requests, as it means apache won’t spend much time killing and creating new apache instances. It is important to understand however, what effect this has on memory usage.

When a child apache process starts, most of the ram it takes up is actually shared with the parent apache process and the other children (shared pages). When a process changes the data stored in one of these shared pages, it has to make a private copy of that page just for itself. As time goes on, a process will be using more and more real memory, and less shared memory. Therefore, to free up ram (at the cost of some cpu), we will eventually kill off an apache child and replace it with a new one. The more often you do this, the less ram you use, so for our purposes, we want a low requests / child process.
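On Linux you can watch this copy-on-write growth directly. Each process's `/proc/<pid>/smaps` file lists, per memory mapping, how many kB are `Shared_Clean`, `Shared_Dirty`, `Private_Clean` and `Private_Dirty`. The sketch below totals those fields. It is Linux-specific, the pid is a placeholder you would take from `ps` for one of your apache children, and the "private" total is what steadily grows as a child serves requests.

```python
def summarize_smaps(smaps_text: str) -> dict:
    """Total the shared and private memory (kB) listed in a Linux /proc/<pid>/smaps file."""
    totals = {"shared_kb": 0, "private_kb": 0}
    for line in smaps_text.splitlines():
        parts = line.split()
        # Memory lines look like "Private_Dirty:   4 kB"; mapping headers do not end in kB.
        if len(parts) != 3 or parts[2] != "kB":
            continue
        field, value = parts[0].rstrip(":"), int(parts[1])
        if field in ("Shared_Clean", "Shared_Dirty"):
            totals["shared_kb"] += value
        elif field in ("Private_Clean", "Private_Dirty"):
            totals["private_kb"] += value
    return totals


# On a Linux server, for one apache child (pid is a placeholder):
# with open(f"/proc/{pid}/smaps") as f:
#     print(summarize_smaps(f.read()))
```

Sampling a child right after it starts and again after it has served many requests shows how quickly its private memory climbs. That growth rate is a reasonable basis for choosing requests per child.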

How low is low? For cgi proxy, anywhere from 50-150 works for me. Setting to 1000 or higher is a death wish for our proxy (but is fine, I’m told, for other kinds of websites). Keep in mind that a keepalive request does not count against this total requests / child, so you can actually get substantially more mileage out of each apache child process than this setting implies.

Max keepalive requests is a somewhat less important figure. This is the maximum number of items apache is willing to serve to a single client from a single connection. I have mine set to 100. Basically, you want to make sure that an entire pageview from a user can be served within a single http connection. This is so that the user doesn’t have to disconnect and renegotiate a new connection to your server in order to download everything needed for that pageview. Because the overhead in creating dozens of connections is so high, you will want to have keepalives turned on to 1 second instead of disabling keepalives altogether. The main reason you’d want a low keepalive requests value is to make sure no single user can monopolize a client slot and “own” it forever.

Minspare, Maxspare and Startservers don’t tend to matter as much as the other settings. Start servers should probably be pretty close to maxspare, but not more than maxspare. If startservers were higher than maxspare, then when you start apache it would create all these new apache processes to satisfy start servers, and then immediately kill them to satisfy maxspare. I personally have minspare of 20, maxspare of 100 and start servers of 100, but keep in mind these should be some reasonable proportion of your maxclients. The things to keep in mind are that you don’t want to waste ram by having too many idle clients sitting around, but you do want to have some extra ones around in case you get a burst of traffic. You don’t want min and max spare to be too close together because then you’ll be constantly creating and killing apache processes, which eats up cpu time.

I know I said before I wouldn’t be giving you maxclients values handed down on a stone tablet, but you’ve got to start somewhere, so here’s what I use:

For cgi proxy, 200 is a reasonable maxclients value to test on a server with 2gb ram. On my 8gb servers, I’m testing out 550 slots, but I figure I can probably go a bit higher. If you max out your clients and still have free ram, you can raise maxclients. If you start using swap memory excessively (or even have a server crash as you run out of ram *and* swap), then you need to reduce your maxclients.
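Collecting the values from this article in one place makes it easier to sanity-check them before pasting them into httpd.conf. The sketch below is a hypothetical helper, not part of Apache. It renders Apache 2.2-style prefork directives (Apache 2.4 renamed MaxClients to MaxRequestWorkers and MaxRequestsPerChild to MaxConnectionsPerChild but still accepts the old names) and flags the combinations warned about above. Since the article gives no numeric threshold for "too close together", the spare-server check only tests that MinSpareServers is below MaxSpareServers. One addition not mentioned above: the prefork MPM caps MaxClients at ServerLimit, which defaults to 256, so a setting like 550 also needs ServerLimit raised.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProxyApacheSettings:
    """Starting values taken from this article; adjust for your own hardware."""
    max_clients: int = 200              # ~2 GB server; the article tests 550 on 8 GB
    keepalive_timeout: int = 1          # seconds
    timeout: int = 15                   # the article uses 5-15 seconds
    max_requests_per_child: int = 100   # the article uses 50-150
    max_keepalive_requests: int = 100
    min_spare_servers: int = 20
    max_spare_servers: int = 100
    start_servers: int = 100

    def problems(self) -> List[str]:
        """Flag the combinations the article warns against."""
        issues = []
        if self.start_servers > self.max_spare_servers:
            issues.append("StartServers exceeds MaxSpareServers; extra children are killed right away")
        if self.min_spare_servers >= self.max_spare_servers:
            issues.append("MinSpareServers should be well below MaxSpareServers")
        if self.max_spare_servers > self.max_clients:
            issues.append("MaxSpareServers should be a fraction of MaxClients")
        if self.max_requests_per_child >= 1000:
            issues.append("MaxRequestsPerChild of 1000+ lets children bloat with unshared memory")
        return issues

    def to_httpd_conf(self) -> str:
        """Render Apache 2.2-style prefork directives for httpd.conf."""
        lines = [
            "KeepAlive On",
            f"KeepAliveTimeout {self.keepalive_timeout}",
            f"MaxKeepAliveRequests {self.max_keepalive_requests}",
            f"Timeout {self.timeout}",
            f"StartServers {self.start_servers}",
            f"MinSpareServers {self.min_spare_servers}",
            f"MaxSpareServers {self.max_spare_servers}",
        ]
        if self.max_clients > 256:  # prefork's default ServerLimit is 256
            lines.append(f"ServerLimit {self.max_clients}")
        lines.append(f"MaxClients {self.max_clients}")
        lines.append(f"MaxRequestsPerChild {self.max_requests_per_child}")
        return "\n".join(lines)


settings = ProxyApacheSettings()
print(settings.problems() or "no warnings")
print(settings.to_httpd_conf())
```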

If you use ssl, or you have php or other large modules compiled into apache, each apache process uses more ram, so you will likely be able to run fewer clients. For static page serving, or even php hosting, these maxclients values may seem abysmally low… and they are. But cgi proxy uses a lot of cpu, so if everything on your server is working properly, this number of clients should be enough to max out your cpu.

There are a few other things we can do to performance tune our server and get maximum mileage out of it. We’ve already discussed how many client slots you *can* have with your given ram and configuration; now we’re going to figure out how many we actually need. This can be found with the following equation:

Slots needed = (requests / second) * (seconds to process each request)

So if you want more requests / second (more traffic), you need to reduce the time apache spends serving each request, or you need to add more slots (more ram).
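Here is the same equation as a tiny Python sketch, along with the rearranged form that tells you how much traffic a fixed number of slots can handle. The traffic figures are hypothetical.

```python
import math

def slots_needed(requests_per_second: float, seconds_per_request: float) -> int:
    """Slots needed = (requests / second) * (seconds to process each request),
    rounded up because a slot is one whole apache process."""
    return math.ceil(requests_per_second * seconds_per_request)

def max_requests_per_second(slots: int, seconds_per_request: float) -> float:
    """The same equation solved for throughput: what a fixed number of slots can sustain."""
    return slots / seconds_per_request

# Hypothetical traffic of 40 requests / second
print(slots_needed(40, 5.0))              # 200 slots if each request takes 5 seconds
print(slots_needed(40, 2.5))              # 100 slots if you halve the time per request
print(max_requests_per_second(200, 5.0))  # 200 slots at 5 s/request -> 40.0 requests/second
```

Halving the time per request halves the slots you need, which is exactly why the rest of this section focuses on where that time goes.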

Interestingly, the time it takes to process a request can be broken up into a few areas (a proxy adds a couple that a normal website wouldn’t have):

1) client connection handshaking
2) dns resolution (since we’re proxying content, we need to know the ip of the server the user is requesting content from)
3) downloading the target website from the internet
4) doing cgi-proxy related processing
5) sending the request result back to the user

As you can see, there’s a lot that can conspire against you to use up valuable client slots.

Bad ping times or congested bandwidth between your server and the target website or the client will make parts 1), 3) and 5) go slower, increasing the seconds per request and requiring more client slots to fulfill the same number of requests. You may think this is not under your control, but you can take the client-facing parts, 1) and 5), off apache’s plate by setting up Squid as a reverse proxy (also known as web server acceleration mode). Doing this makes squid responsible for “spoon feeding” the cgi proxy results to your clients, so that apache doesn’t have to. Because squid uses less ram per connection than apache, our expensive apache client slots can go ahead and work on more important matters than dealing with client connections. Setting up squid is outside the scope of this article, but it is something worth investigating.

A maxed-out cpu will make part 4) go slower, so if you’re maxing out your cpu, you will end up using more slots because each request takes longer to process. With more slots you’ll be processing more requests at once, and each of those will go slower still. If you run out of real memory and start using the swapfile, the time taken for cgi-proxy processing is going to go through the roof, making the problem even worse. If you don’t have a reasonable upper bound on the number of requests your server is willing to process at once, this can cause your server to max out its ram *and* swap, and then crash.
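To see why that upper bound matters, here is a deliberately simple Python model with hypothetical numbers: each second some requests arrive, and the cpu can finish at most a fixed number of them. It ignores the extra slowdown from swapping (which would shrink the capacity even further), and it lumps together requests that would wait in apache’s listen queue and requests that would be refused, calling both “overflow”.

```python
from typing import List, Optional, Tuple

def simulate(arrivals_per_sec: float, capacity_per_sec: float, seconds: int,
             max_clients: Optional[int] = None) -> Tuple[List[float], float]:
    """Toy fluid model of in-flight requests. Returns the in-flight count at
    the end of each second and the total that found no free slot."""
    in_flight = 0.0
    overflow = 0.0
    history: List[float] = []
    for _ in range(seconds):
        in_flight += arrivals_per_sec                   # new requests grab slots
        if max_clients is not None and in_flight > max_clients:
            overflow += in_flight - max_clients         # no free slot for these
            in_flight = float(max_clients)
        in_flight -= min(in_flight, capacity_per_sec)   # what the cpu finishes
        history.append(in_flight)
    return history, overflow

# Hypothetical: 50 requests/second arrive, the cpu can only finish 40/second
uncapped, _ = simulate(50, 40, 60)
capped, waiting = simulate(50, 40, 60, max_clients=200)
print(uncapped[-1], capped[-1], waiting)  # 600.0 160.0 440.0
```

Without a cap, the number of busy processes grows by 10 every second for as long as the overload lasts, and in real life every one of them is eating ram. With maxclients at 200, the server stops growing and the excess waits or gets turned away instead of taking the box down.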

Part 2) is less obvious, but if your dns resolvers are slow or the first one in your list is down, every dns request has to time out before your server can ask the next server in the list to fulfill it. This substantially increases the time needed to serve every page. You will therefore get fewer pageviews as your users get impatient and leave, and your client slots will easily max out because it takes a lot longer to serve each page request.
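If you suspect your resolvers, a quick Python sketch like this one times lookups through the system resolver (socket.gethostbyname, which follows /etc/resolv.conf). The resolve parameter is only there so you can swap in a stub for testing; which hostnames you try is up to you.

```python
import socket
import time
from typing import Callable, Dict, Iterable

def time_lookups(hostnames: Iterable[str],
                 resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, float]:
    """Return the seconds spent resolving each hostname. Failed lookups are
    timed too, since a slow failure is exactly what we want to spot."""
    timings: Dict[str, float] = {}
    for name in hostnames:
        start = time.monotonic()
        try:
            resolve(name)
        except OSError:
            pass  # socket.gaierror is a subclass of OSError
        timings[name] = time.monotonic() - start
    return timings
```

For example, `time_lookups(["www.example.com", "www.wikipedia.org"])` returns a dict of seconds per name. Lookups that regularly take whole seconds point to a slow or dead first resolver; running the same call twice in a row is a rough way to see whether anything is caching the answers for you.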

Your goal, therefore, is to resolve dns names as quickly as possible. Your isp or datacenter should have provided the ip addresses of their own resolvers that you can use. On most linux installations you can edit your resolvers in /etc/resolv.conf. If you have Bind installed on your server, you can make the first resolver in the list 127.0.0.1, which has your server resolve hostnames locally. This can speed things up dramatically: when bind doesn’t already have a name cached, it goes and asks other servers for it, then caches the result for later use, so repeat lookups are answered locally. You should still keep your provider’s resolvers further down the list as a fallback in case bind itself stops responding.
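Here is a minimal Python sketch that reads resolv.conf-style text and flags the problems discussed above. It only understands the nameserver lines and the timeout option (which, per resolv.conf(5), is how long the resolver waits on one server before trying the next, 5 seconds by default). The 192.0.2.53 address in the sample is a placeholder from the documentation range; substitute your provider’s resolvers.

```python
from typing import List, Tuple

DEFAULT_TIMEOUT = 5  # seconds; the default documented in resolv.conf(5)

def parse_resolv_conf(text: str) -> Tuple[List[str], int]:
    """Return (nameservers in order, per-server timeout in seconds)."""
    nameservers: List[str] = []
    timeout = DEFAULT_TIMEOUT
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) > 1:
            nameservers.append(parts[1])
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("timeout:"):
                    timeout = int(opt.split(":", 1)[1])
    return nameservers, timeout

def review(text: str) -> List[str]:
    """Flag a missing local bind cache and a missing fallback resolver."""
    servers, _ = parse_resolv_conf(text)
    notes = []
    if not servers:
        notes.append("no nameservers configured")
    elif servers[0] != "127.0.0.1":
        notes.append("first nameserver is not the local bind cache")
    if len(servers) < 2:
        notes.append("no fallback resolver if the first one is down")
    return notes

SAMPLE = """\
# /etc/resolv.conf (placeholder addresses)
nameserver 127.0.0.1
nameserver 192.0.2.53
options timeout:2
"""

servers, timeout = parse_resolv_conf(SAMPLE)
print(servers, timeout, review(SAMPLE))
```

The timeout value is worth a glance too: it is roughly what each lookup costs you while the first server in the list is dead.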

Finally, to save on ram, you should disable as many apache modules as you can while still having your proxy work correctly. The modules apache loads are listed in httpd.conf (the LoadModule lines). For cgi proxy, this means that, if at all possible, you definitely don’t want php compiled into the same server you run mod_perl on. You should also turn off the ssl-related modules unless you actually use them, as they are a major ram hog as well. The rest of the common modules you can do without are fairly small, but in total, removing them can get you that last bit of power you want out of your server.

In httpd.conf, go down the list of modules and google for them. For most modules, the first google result should be the apache documentation for that module. After reading about the module, if you decide you can go without it, comment out the line, restart apache, and make sure your service still works as expected. Go through that for every module in the list, and you can improve your memory usage significantly.
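To get the list to work through, a small Python sketch can pull every LoadModule line out of httpd.conf and show whether it is currently enabled or commented out. Where your httpd.conf lives depends on your distribution, and modules compiled statically into apache won’t appear as LoadModule lines at all; `apachectl -M` lists everything that is actually loaded, including those.

```python
import re
from typing import List, Tuple

# "LoadModule name path", optionally commented out with '#';
# apache directive names are case-insensitive
LOADMODULE = re.compile(r"^\s*(#\s*)?LoadModule\s+(\S+)\s+(\S+)", re.IGNORECASE)

def list_modules(conf_text: str) -> List[Tuple[str, bool]]:
    """Return (module name, enabled?) for every LoadModule line."""
    modules = []
    for line in conf_text.splitlines():
        m = LOADMODULE.match(line)
        if m:
            modules.append((m.group(2), m.group(1) is None))
    return modules

SAMPLE = """\
LoadModule perl_module modules/mod_perl.so
#LoadModule php5_module modules/libphp5.so
LoadModule ssl_module modules/mod_ssl.so
"""

for name, enabled in list_modules(SAMPLE):
    print(("enabled " if enabled else "disabled"), name)
```

To run it against your real config, read the file (for example with `open(path).read()`) and pass the text in; then google the enabled modules one at a time as described above.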

WHEW! That was a lot to take in all at once, wasn’t it? Congratulations on making it this far.

All these modifications make for a fairly good working environment for cgi proxy, but a pretty poor working environment for a typical webserver, which is why proxies should have their own dedicated servers tuned for their specific use. Shared hosting is an especially bad place for a proxy because a shared host needs to have all kinds of modules installed to satisfy the diverse needs of its customer base, as well as provide good management and reporting tools for the hosting company. Most hosting providers do not or will not install mod_perl for ram usage and security reasons. Without mod_perl, even the smallest web proxy can bring a server to its knees.

This article only scratches the surface of the different things you can do to optimize your server and get the most out of your limited computing resources. There are other, more advanced optimizations you could do, such as setting up complicated caching architectures, or significantly modifying the proxy code to either do less work or do the same work faster.

If you’ve just finished eagerly implementing all the advice in this article and are thirsting for more great ideas, there is a book I would recommend to you. I recently finished reading it, and it is very informative about everything you would need to know (conceptually, at least) for building scalable internet architectures. Not surprisingly, the book is called “Scalable Internet Architectures”, by Theo Schlossnagle. In particular, the book goes into detail on using squid to improve serving performance, setting up failsafe hosting environments, and designing applications that scale. Anyone who thinks their web application may grow to use more than one server can benefit from the concepts detailed in this book.

The RC4 test has produced the desired results, and seems stable enough for general usage. Now, when you use the polysolve.com proxy, you have the option of whether to rc4 encode your urls or use the more standard rot13/hex encoding. This should help get around some filters that match against rot13-encoded text. In the near future, all freeproxies.org proxies will support RC4 encoding as an option.

As part of our goal to reduce the blockability of the proxies we host, we are now testing a new url encoding that uses a timestamp as the encryption key to encrypt proxied urls with RC4. You can try out a proxy using this encoding at http://test.polysolve.com.

The advantage of using RC4 encryption for urls, as opposed to the more usual rot13 or hex encoding, is that for a given website you are visiting, the url used to reach that website is not always the same. This matters because every proxied url contains an encoded version of “http”, which with hex encoding always shows up as “68747470” (and with rot13 as “uggc”). Url filters can easily string match on that, effectively blocking any proxy that uses hex / rot13 with a relatively low false positive rate.
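Here is a Python sketch that shows the difference. The rc4 function is textbook RC4; the “timestamp, dash, hex ciphertext” token layout is a hypothetical one made up for this example and is not necessarily how the polysolve.com proxy builds its urls. Note that this is obfuscation aimed at string-matching filters, not real secrecy: anyone who can read the timestamp can decrypt the url.

```python
import codecs

def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call both encrypts and decrypts."""
    # Key-scheduling algorithm
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Keystream generation, XORed into the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def hex_encode(url: str) -> str:
    return url.encode("utf-8").hex()

def rot13_encode(url: str) -> str:
    return codecs.encode(url, "rot13")

def rc4_encode(url: str, timestamp: int) -> str:
    """Hypothetical token layout: '<timestamp>-<hex ciphertext>'."""
    key = str(timestamp).encode("ascii")
    return f"{timestamp}-{rc4(key, url.encode('utf-8')).hex()}"

def rc4_decode(token: str) -> str:
    ts, cipher_hex = token.split("-", 1)
    return rc4(ts.encode("ascii"), bytes.fromhex(cipher_hex)).decode("utf-8")

url = "http://example.com/"
print(hex_encode(url))              # always starts with 68747470
print(rot13_encode(url))            # always starts with uggc
print(rc4_encode(url, 1200000000))  # ciphertext depends on the timestamp key
print(rc4_encode(url, 1200000001))
```

The hex and rot13 outputs are identical every time for the same url, while the RC4 ciphertext changes whenever the timestamp does, so there is no fixed “http” pattern for a filter to latch onto.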

In plain English, this means that it will be harder to write rules that block all proxies at once; instead, admins and censors will have to block proxies one by one, which is time consuming and error prone. Another win for anonymity and free speech!

Welcome to the freeproxies blog. Here we will be updating you on the development of the free proxy hosting service, as well as providing useful tidbits of information on administering your own proxy.
