performance impact of large /etc/hosts files

Wed Dec 12 04:54:19 PST 2007

On Wednesday, 12 December 2007 13:38:59, you wrote:
> I want to do precisely the opposite. It should affect only a single
> machine. It would be even better if it affected only a single
> account on that machine.

Restricting the effect to a single machine or a single account has nothing to
do with whether you manage and implement it centrally; the two concepts are
orthogonal.

Basically, what I'd do in your case comes down to giving squid different rule
sets based on authentication to the proxy and/or the originating IP in your
internal network, so that it behaves differently depending on the person or
program accessing it.
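In squid itself this selection is written with `acl` lines (e.g. of type `src`
for source addresses or `proxy_auth` for authenticated users) combined with
`http_access` rules. The Python sketch below only illustrates the decision
logic, assuming that an authenticated user takes precedence over the source
subnet. All user names, subnets and rule-set names are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical mapping: authenticated proxy user -> rule set name.
USER_RULESETS = {
    "alice": "strict-adblock",
    "bob": "no-filtering",
}

# Hypothetical mapping: internal source subnet -> rule set name.
SUBNET_RULESETS = [
    (ipaddress.ip_network("192.168.10.0/24"), "strict-adblock"),
    (ipaddress.ip_network("192.168.20.0/24"), "no-filtering"),
]

DEFAULT_RULESET = "default-adblock"


def choose_ruleset(client_ip: str, proxy_user: Optional[str] = None) -> str:
    """Pick a rule set: authenticated user first, then source subnet, then default."""
    if proxy_user is not None and proxy_user in USER_RULESETS:
        return USER_RULESETS[proxy_user]
    addr = ipaddress.ip_address(client_ip)
    for network, ruleset in SUBNET_RULESETS:
        if addr in network:
            return ruleset
    return DEFAULT_RULESET


if __name__ == "__main__":
    print(choose_ruleset("192.168.20.5"))           # no-filtering (by subnet)
    print(choose_ruleset("192.168.20.5", "alice"))  # strict-adblock (user wins)
    print(choose_ruleset("10.0.0.1"))               # default-adblock
```

Checking the user before the subnet is what lets a single account on a shared
machine get its own behaviour, which an IP-only rule cannot distinguish.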

The reason I personally prefer the squid (i.e., proxy-based) approach to
ad-blocking is that if you block at a lower level than HTTP, some pages are
bound to display wrongly or break. When images can't be fetched (because
their host supposedly resolves to "localhost"), most browsers won't reserve
the space meant for them and will mess up the page layout, even when the
img tag specifies both width= _and_ height= (if only one of the attributes
or neither is given, the layout breaks anyway). Opera is my favourite
candidate for messing up page layouts in this case.

On another note, Opera has an (IMHO) huge timeout for failed (i.e., refused,
not timed out) connections to the target host. If many images point to
localhost through some DNS or hosts-file magic, this will considerably slow
down page display and buildup on non-CSS-based layouts, of which there are
sadly still plenty out there (and on some of them, such as certain German
news sites, the ad slots are an integral part of the page layout).

If you do the blocking at the topmost level (i.e., through squid or some other
HTTP proxy), the proxy can generate an empty/transparent image with the
appropriate dimensions to fill the now-empty space, which the extension I
referenced earlier does automatically for you. This doesn't stop the
connection to the ad host from happening (i.e., it isn't a traffic saver, but
who cares about that nowadays, I'd say), but it does stop the end user from
seeing the ad (and/or its content). It also gives you more fine-grained
control over which URLs to block: you don't have to filter by host only, but
can also filter by directory (which is required on some sites, where the
ads/unwanted content come from the same host as the content you're actually
interested in).
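To make the host-versus-directory point concrete, here is a minimal Python
sketch of a URL filter of the kind a proxy can apply. It assumes a
hypothetical rule list of (host, path prefix) pairs: an empty prefix blocks
the whole host, which is all a hosts-file entry can express, while a non-empty
prefix blocks only one directory on a host you otherwise still want. The host
names and paths are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical block rules: (host, path prefix).
# Empty prefix  -> block the entire host (hosts-file granularity).
# Path prefix   -> block only that directory, keep the rest of the site.
BLOCK_RULES = [
    ("ads.example.net", ""),
    ("news.example.com", "/banner/"),
]


def is_blocked(url: str) -> bool:
    """Return True if the URL matches a host rule and its path prefix."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname is already lowercased
    path = parts.path or "/"
    for rule_host, rule_prefix in BLOCK_RULES:
        # str.startswith("") is always True, so empty prefixes match the whole host.
        if host == rule_host and path.startswith(rule_prefix):
            return True
    return False


if __name__ == "__main__":
    for u in (
        "http://ads.example.net/img/ad.gif",
        "http://news.example.com/banner/top.gif",
        "http://news.example.com/articles/1.html",
    ):
        print(u, "->", "blocked" if is_blocked(u) else "allowed")
```

With a hosts file, the only way to get rid of the banners on
news.example.com would be to map the whole host away, which also kills the
articles you actually want to read.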

Basically, it's a matter of choice how much hassle you want to put the end
user through, seeing that per-user discrimination on a proxy also requires
authentication (unless you implement packet redirects on the end-user
machines to different firewall ports depending on which user originated the
outgoing packet, but in the end that is just as hard to keep synchronized).
Anyway, it's the way I'd go to achieve what you're trying to do efficiently.

Just my 5 (Euro)-cents.

-- 
Heiko Wundram
Product & Application Development