Script help needed please

Jez Hancock jez.hancock at munk.nu
Thu Aug 14 07:44:48 PDT 2003


On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
> The above is typical of the servers in use, and with csh shells employed,
> plus IPFW.
> 
> My apologies for the length of this question, but some background, kept
> as brief as I can make it, seems necessary so the question makes sense.
> 
> The problem:
> We have several servers that provide online reading of Technical articles
> and each have several hundred MB to a GB of content.
> 
> When we started providing the articles 6-7 years ago, folks used browsers
> to read them. Now the trend is toward a lazier approach: there is
> increasing use of download utilities that can be left unattended to
> download entire web sites, taking several hours to do so. Multiply this
> by a number of simultaneous downloads and there goes the bandwidth,
> denying the normal online readers the speed needed for loading and
> browsing in the manner intended. Several hundred will be reading at a
> time, and several thousand daily.
<snip>
There is no easy solution to this, but one avenue would be to look at
bandwidth throttling via an Apache module.

One that I've used before is mod_throttle which is in the ports:

/usr/ports/www/mod_throttle

which allows you to throttle users by IP address to a certain number of
documents and/or up to a certain transfer limit.  IIRC it's fairly
limited though, in that per-IP limits can only be applied to _every_
virtual host - ie in the global httpd.conf context.
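For example, a minimal httpd.conf fragment might look something like
this - I'm writing the directive syntax from memory, so treat it as an
assumption and check the module's own documentation before using it:

```
<IfModule mod_throttle.c>
    # Assumed syntax: cap each client's transfer volume per day.
    ThrottlePolicy Volume 10M 1d
</IfModule>
```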

A more fine-grained solution (from what I've read; I haven't tried it)
is mod_bwshare - this one isn't in the ports but can be found here:

http://www.topology.org/src/bwshare/

This module overcomes some of the shortfalls of mod_throttle and gives
you finer control over who consumes how much bandwidth over what time
period.

> Now, my question: is it possible to write a script that can constantly
> scan the Apache logs for the footprints of those downloaders - perhaps
> their names, "HTTRACK" being one I see a lot? Whenever I see one of
> those sessions, I have been able to abort it by adding a firewall rule
> denying the IP address access to the server. This aborts the download,
> but I have seen the attempts continue for a day or two afterwards,
> confirming the downloads are unattended.
> 
> Thus, if the script could spot an "offender" and then perhaps make use of
> the firewall to add a rule containing the offender's IP address and then
> flush to reset the firewall, this would at least abort the download and
> free up the bandwidth (I already have a script that restarts the firewall).
> 
> Is this possible and how would I go about it....???
If you really want to go down this route, a while back I found a script
someone wrote to spot 'rude robots' in an httpd logfile; you could
perhaps adapt it to do dynamic filtering in conjunction with your
firewall:

http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html
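As a rough sketch of the idea, something like the following could be run
periodically from cron. The user-agent patterns and the rule number 2000
are placeholders for your own setup, and it only prints the ipfw
commands so you can review them before piping the output to sh:

```shell
#!/bin/sh
# print_block_rules: read an Apache access log (common/combined log
# format) on stdin and print one "ipfw add" command per distinct client
# IP whose requests carried a known downloader user-agent.
print_block_rules() {
    # Agent list and rule number 2000 are placeholders - adjust them.
    grep -Ei 'HTTrack|WebZIP|Teleport' |
        awk '{ print $1 }' |
        sort -u |
        while read ip; do
            echo "ipfw add 2000 deny ip from $ip to any"
        done
}

# Typical use (review the output before piping it to sh!):
#   print_block_rules < /var/log/httpd-access.log
```

To avoid piling up duplicate rules on repeated runs, you'd also want to
keep a file of already-blocked IPs and skip any address found in it.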

If you have any success let me know.

-- 
Jez

http://www.munk.nu/


More information about the freebsd-questions mailing list