Script help needed please

Jack L. Stone jackstone at sage-one.net
Thu Aug 14 08:36:59 PDT 2003


At 03:44 PM 8.14.2003 +0100, Jez Hancock wrote:
>On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
>> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
>> The above is typical of the servers in use; the machines run csh shells
>> and IPFW.
>> 
>> My apologies for the length of this question, but some background, kept
>> as brief as I can make it, seems necessary for the question to make sense.
>> 
>> The problem:
>> We have several servers that provide online reading of technical
>> articles, and each has several hundred MB to a GB of content.
>> 
>> When we started providing the articles 6-7 years ago, folks used
>> browsers to read them. Now the trend is a lazier approach: there is
>> increasing use of download utilities that can be left unattended to
>> mirror an entire web site, taking several hours to do so. Multiply this
>> by a number of simultaneous downloads and there goes the bandwidth,
>> denying the normal online readers the speed they need to load and
>> browse the articles as intended. Several hundred will be reading at any
>> one time, and several thousand daily.
><snip>
>There is no easy solution to this, but one avenue might be bandwidth
>throttling via an Apache module.
>
>One that I've used before is mod_throttle, which is in the ports tree:
>
>/usr/ports/www/mod_throttle
>
>which allows you to throttle users by IP address to a certain number of
>documents and/or a certain transfer limit.  IIRC it's fairly limited,
>though, in that per-IP limits can only be applied to _every_ virtual
>host at once - i.e. in the global httpd.conf context.
>
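If I'm reading the mod_throttle docs correctly, a per-IP cap would look
roughly like the following in httpd.conf. This is untested here, and the
directive name, argument order, and the example limits all need checking
against the module's README:

<IfModule mod_throttle.c>
    # Track up to 1000 client addresses; cap each at 10 MB per day.
    ThrottleClientIP 1000 Volume 10M 1d
</IfModule>
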
>A more fine-grained solution (from what I've read - I haven't tried it)
>is mod_bwshare.  This one isn't in the ports tree but can be found here:
>
>http://www.topology.org/src/bwshare/
>
>This module overcomes some of mod_throttle's shortfalls and gives you
>finer control over who consumes how much bandwidth over what time
>period.
>
>> Now, my question: is it possible to write a script that constantly
>> scans the Apache logs for the footprints of those downloaders - their
>> user-agent names, "HTTRACK" being one I see a lot?  Whenever I spot one
>> of those sessions, I have been able to abort it by adding a rule to the
>> firewall denying the IP address access to the server.  This kills the
>> download, but I have seen the attempts continue for a day or two
>> afterwards, confirming that the downloads are unattended.
>> 
>> Thus, if the script could spot an "offender", use the firewall to add a
>> rule containing the offender's IP address, and then reload the firewall,
>> this would at least abort the download and free up the bandwidth (I
>> already have a script that restarts the firewall).
>> 
>> Is this possible, and how would I go about it?
>If you really want to go down this route, a while back I found a script
>someone wrote to spot 'rude robots' in an httpd logfile; you could
>perhaps adapt it to do dynamic filtering in conjunction with your
>firewall:
>
>http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html
>
>If you have any success let me know.
>
>-- 
>Jez
>

Interesting. Looks like a step in the right direction. Will weigh this one
along with the other possibilities.
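
In the meantime, something along these lines is roughly what I had in mind
for the log-watching script. This is an untested sketch - the log path,
the ipfw rule number, and matching on the HTTrack user-agent string are
all assumptions to be adjusted per server:

#!/bin/sh
# Untested sketch (must run as root): watch the Apache access log for
# a downloader user-agent and block the source address with ipfw.
# The log path, the rule number (500), and the HTTrack pattern are
# assumptions -- adjust all three for each server.
LOG=/var/log/httpd-access.log

tail -f "$LOG" | while read line; do
    case "$line" in
    *HTTrack*|*HTTRACK*)
        # first field of the common log format is the client IP
        ip=`echo "$line" | awk '{print $1}'`
        # skip addresses that are already blocked
        ipfw list | grep -F " $ip " > /dev/null && continue
        ipfw add 500 deny ip from "$ip" to any
        ;;
    esac
done

Since ipfw adds the rule to the running firewall immediately, the download
should die on the spot without a flush or restart; the existing restart
script would only be needed to make the rules survive a reboot. A cron job
that scans the recent portion of the log every few minutes would work too,
and would cope with log rotation better than tail -f.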

Many thanks...!

Best regards,
Jack L. Stone,
Administrator

SageOne Net
http://www.sage-one.net
jackstone at sage-one.net

