Script help needed please

Jack L. Stone jackstone at
Thu Aug 14 08:36:59 PDT 2003

At 03:44 PM 8.14.2003 +0100, Jez Hancock wrote:
>On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
>> Server Version: Apache/1.3.27 (Unix) FrontPage/ PHP/4.3.1
>> The above is typical of the servers in use, and with csh shells employed,
>> plus IPFW.
>> My apologies for the length of this question, but the background seems
>> necessary; I have kept it as brief as I can so the question makes sense.
>> The problem:
>> We have several servers that provide online reading of technical articles,
>> and each has several hundred MB to a GB of content.
>> When we started providing the articles 6-7 years ago, folks used browsers
>> to read the articles. Now the trend is a lazier approach: there is an
>> increasing use of those download utilities which can be left unattended to
>> download entire web sites, taking several hours to do so. Multiply this by
>> a number of simultaneous downloads and there goes the bandwidth, denying
>> the normal online readers the speed needed for loading and browsing in the
>> manner intended. Several hundred will be reading at a time, and several
>> thousand daily.
>There is no easy solution to this, but one avenue might be to look at
>bandwidth throttling in an apache module.
>One that I've used before is mod_throttle, which is in the ports:
>it allows you to throttle users by IP address to a certain number of
>documents and/or up to a certain transfer limit.  IIRC it's fairly
>limited though, in that you can only apply per-IP limits to _every_
>virtual host - ie in the global httpd.conf context.
>A more fine-grained solution (from what I've read; I haven't tried it) is
>mod_bwshare - this one isn't in the ports but can be found here:
>This module overcomes some of the shortfalls of mod_throttle and allows
>you to specify finer granularity over who consumes how much bandwidth
>over what time period.
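Not from the thread itself, but a related sketch: stock Apache 1.3 (mod_setenvif plus mod_access, both normally compiled in) can refuse such download agents outright rather than throttle them. The agent names and the directory path below are examples only, not taken from the poster's setup:

```apache
# Tag requests whose User-Agent looks like a site-download utility...
BrowserMatchNoCase "HTTrack|WebZIP|Teleport" bad_bot
# ...and refuse them access to the article tree (path is an example).
<Directory "/usr/local/www/data">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>
```

This only stops agents that announce themselves in the User-Agent header; a determined downloader can spoof it, which is why the firewall approach below still has value.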
>> Now, my question: Is it possible to write a script that can constantly scan
>> the Apache logs to look for certain footprints of those downloaders -
>> perhaps their names, "HTTRACK" being one I see a lot? Whenever I see
>> one of those sessions, I have been able to abort it by adding a rule to
>> the firewall to deny the IP address access to the server. This aborts the
>> downloading, but I have seen the attempts continue constantly for a day or
>> two, confirming unattended downloads.
>> Thus, if the script could spot an "offender" and then perhaps make use of
>> the firewall to add a rule containing the offender's IP address and then
>> flush to reset the firewall, this would at least abort the download and
>> free up the bandwidth (I already have a script that restarts the firewall).
>> Is this possible, and how would I go about it?
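The log-scanning idea above can be sketched as a small Bourne-shell helper. Everything here is an assumption to adapt: the agent list, the ipfw rule number (500), and the log format (common/combined format, with the client IP in the first field):

```shell
#!/bin/sh
# Sketch: given one Apache access-log line, print the ipfw command that
# would block the client when the User-Agent looks like an unattended
# download utility.  The agent list and rule number are illustrative.
block_cmd() {
    # Ignore lines whose User-Agent does not match a known downloader.
    echo "$1" | egrep -qi 'HTTrack|WebZIP|Teleport|WebCopier' || return 0
    # In common/combined log format the first field is the client address.
    ip=`echo "$1" | awk '{print $1}'`
    echo "ipfw add 500 deny ip from $ip to any"
}
```

Fed from `tail -f` (or run over the whole log from cron), the printed commands can be reviewed and then piped to sh once you trust them, e.g. `tail -f /var/log/httpd-access.log | while read l; do block_cmd "$l"; done | sh`. Printing rather than executing directly keeps the sketch safe to test.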
>If you really wanted to go down this route, then I found a script someone
>wrote a while back to find 'rude robots' in an httpd logfile, which you
>could perhaps adapt to do dynamic filtering in conjunction with your
>firewall. If you have any success let me know.

Interesting. Looks like a step in the right direction. Will weigh this one
along with the other possibilities.

Many thanks...!

Best regards,
Jack L. Stone,

SageOne Net
jackstone at

More information about the freebsd-questions mailing list