Script help needed please
Jez Hancock
jez.hancock at munk.nu
Thu Aug 14 07:44:48 PDT 2003
On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
> The above is typical of the servers in use; the shells are csh and the
> firewall is IPFW.
>
> My apologies for the length of this question, but the background seems
> necessary as brief as I can make it so the question makes sense.
>
> The problem:
> We have several servers that provide online reading of Technical articles
> and each have several hundred MB to a GB of content.
>
> When we started providing the articles 6-7 years ago, folks used browsers
> to read the articles. Now the trend has become a lazier approach: there
> is increasing use of download utilities that can be left unattended to
> download entire web sites, taking several hours to do so. Multiply this
> by a number of simultaneous downloads and there goes the bandwidth,
> denying the normal online readers the speed needed for loading and
> browsing in the manner intended. Several hundred will be reading at a
> time, and several thousand daily.
<snip>
There is no easy solution to this, but one avenue might be to look at
bandwidth throttling in an apache module.
One that I've used before is mod_throttle which is in the ports:
/usr/ports/www/mod_throttle
which allows you to throttle users by IP address to a certain number of
documents and/or up to a certain transfer limit. IIRC it's fairly
limited, though, in that per-IP limits can only be applied to _every_
virtual host at once - i.e. in the global httpd.conf context.
A more fine-grained solution (from what I've read - I haven't tried it)
is mod_bwshare - this one isn't in the ports but can be found here:
http://www.topology.org/src/bwshare/
This module overcomes some of the shortcomings of mod_throttle and gives
you finer control over who consumes how much bandwidth over what time
period.
> Now, my question: is it possible to write a script that can constantly scan
> the Apache logs to look for certain footprints of those downloaders -
> perhaps the names, like "HTTRACK", being one I see a lot? Whenever I see
> one of those sessions, I have been able to abort it by adding a rule to
> the firewall to deny the IP address access to the server. This aborts the
> downloading, but I have seen the attempts continue constantly for a day or
> two, confirming unattended downloads.
>
> Thus, if the script could spot an "offender", make use of the firewall to
> add a rule containing the offender's IP address, and then flush/reset the
> firewall, this would at least abort the download and free up the
> bandwidth (I already have a script that restarts the firewall).
>
> Is this possible and how would I go about it....???
If you really wanted to go down this route, then have a look at a script
someone wrote a while back to find 'rude robots' in an httpd logfile,
which you could perhaps adapt to do dynamic filtering in conjunction
with your firewall:
http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html
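As a rough starting point, here's a small /bin/sh sketch of the scanning
side. It greps an access log for download-tool user agents and prints
one ipfw deny rule per offending IP. The agent list beyond HTTRACK, the
rule number 600, and the built-in sample log are my assumptions - swap
in your real /var/log/httpd-access.log and your own rule numbering:

```shell
#!/bin/sh
# Sketch: pull client IPs whose requests carry a download-tool user
# agent out of an Apache access log, and print one "ipfw add deny"
# rule per offending address.

# Build a two-line sample log (combined format) so this can be
# dry-run as-is; point LOG at your real access log instead.
LOG=$(mktemp) || exit 1
cat > "$LOG" <<'EOF'
10.0.0.1 - - [14/Aug/2003:07:00:00 -0700] "GET /a.html HTTP/1.0" 200 1234 "-" "Mozilla/4.0"
10.0.0.2 - - [14/Aug/2003:07:00:01 -0700] "GET /b.html HTTP/1.0" 200 5678 "-" "HTTrack Website Copier"
EOF

# Substrings that mark bulk downloaders; HTTRACK is from the original
# report, the others are common offenders (adjust to taste).
AGENTS='HTTrack|WebZIP|Teleport|Wget'

# Field 1 of the combined log format is the client IP.  Rules are
# echoed rather than fed straight to ipfw, so you can review them
# before piping the output to sh as root.
RULES=$(grep -E -i "$AGENTS" "$LOG" | awk '{print $1}' | sort -u |
        while read ip; do
            echo "ipfw add 600 deny ip from $ip to any"
        done)
echo "$RULES"

rm -f "$LOG"
```

You could run something like this from cron every few minutes, and
since you already have a script that restarts the firewall, append the
printed rules to its ruleset before it re-runs.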
If you have any success let me know.
--
Jez
http://www.munk.nu/
More information about the freebsd-questions mailing list