Tools to find "unlegal" files (videos, music, etc.)
luvbeastie at larseighner.com
Tue Jul 19 09:53:44 UTC 2011
On Tue, 19 Jul 2011, C. P. Ghost wrote:
> Speaking with my university sysadmin hat on: you're NOT allowed to
> peek inside personal files of your users, UNLESS the user has waived
> his/her rights to privacy by explicitly agreeing to the TOS and
> there's legal language in the TOS that allows staff to inspect files
> (and then staff needs to abide by those rules in a very strict and
> cautious manner). So unless the TOS are very explicit, a sysadmin or
> an IT head can get in deep trouble w.r.t. privacy laws.
Yes, but I am not an expert on privacy laws in France, and I suspect
you are not either. Whether examining the magic number (first four bytes)
of a file constitutes a breach of privacy is a matter for legal advice
applicable to the particular jurisdiction. You certainly can look at the
external package: file size and name.
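A minimal sketch of that "external package" check, assuming Python is available on the file server. It uses only directory metadata (name and size) and never opens a file. The extension list, the 500 MiB threshold, and the function name find_suspects are hypothetical choices for illustration, not anything proposed in this thread.

```python
import os
import stat

# Hypothetical extension list and size threshold; tune both for your site.
MEDIA_EXTENSIONS = {".iso", ".avi", ".mkv", ".mp4", ".mp3", ".rar"}
SIZE_THRESHOLD = 500 * 1024 * 1024  # 500 MiB


def find_suspects(root, threshold=SIZE_THRESHOLD, extensions=MEDIA_EXTENSIONS):
    """Yield (path, size, reason) for files flagged by size or by name only."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # metadata only; the file is never opened
            except OSError:
                continue  # entry vanished or is not accessible
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, devices
            ext = os.path.splitext(name)[1].lower()
            if info.st_size >= threshold:
                yield path, info.st_size, "large"
            elif ext in extensions:
                yield path, info.st_size, "extension"
```

Something like `for path, size, reason in find_suspects("/home"): print(reason, size, path)` would give a list to review by hand. A hit is only a hint: renaming a file defeats the extension test, and a large file may be perfectly legitimate.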
>> You may want to look for files that are unusually large.
>> They could possibly be ISOs, dvdrips, HD movie dumps...
> Not to forget encrypted RAR files (which btw. could contain anything,
> including legitimate content, so be careful here).
>> We have the same problem here with users sharing movies on the file
>> servers, and what makes it worse is some of their movie files are
>> legit because they're, for example, official trailers that are
>> reworked and redistributed to our customers.
>> You won't win this, tell your boss it can not be done.
> What can technically be done is that the copyright owner provides a
> list of hashes for his files, and requests that you traverse your
> filesystems, looking for files that match those hashes. AND, even
> then, all you can do is flag the files, and you'll have to check with
> the user that he/she doesn't own a license permitting him/her to own
> that file!
You cannot generate a hash without opening the file, at least at some
automated level. If you can do that, couldn't you also hash the first four
bytes and match the result against hashes of known magic numbers? If you
can "look" at the whole file, surely you can "look" at just the first four
bytes.
Of course software cannot determine legal issues, such as whether works are
properly licensed or are pornographic according to local legislation, etc.
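The magic-number idea can be sketched as follows, assuming Python and assuming that reading the first four bytes is permitted in your jurisdiction. The table is a small illustrative subset of well-known leading signatures, and the names MAGIC and sniff are hypothetical. For simplicity it compares the bytes directly rather than hashing them.

```python
# A few well-known leading signatures; illustrative, far from complete.
MAGIC = {
    b"Rar!": "RAR archive",
    b"PK\x03\x04": "ZIP archive",
    b"RIFF": "RIFF container (AVI, WAV)",
    b"\x1a\x45\xdf\xa3": "Matroska/WebM",
    b"OggS": "Ogg",
    b"fLaC": "FLAC",
}


def sniff(path):
    """Return a format label based on the first four bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(4)  # only four bytes of content are ever read
    return MAGIC.get(head)
```

Four bytes are not enough for every format: an MP4 file carries its "ftyp" marker at offset 4, and an ISO 9660 image has "CD001" at offset 0x8001. The file(1) utility already handles such cases and is likely the better tool in practice.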
> However, even that isn't foolproof: nothing prevents a user from
> flipping a bit or two, rescaling, resampling, splitting the files into
> multiple files in a non-obvious manner, adding random bytes at the end
> etc...: the result would still be infringing, but can't be detected
> automatically (at least not in a reasonable amount of time).
This is a bit like security. There is no absolute that can be achieved.
You don't have to be smarter than God, you just have to be smarter than the
users. Now the whole point of infringing schemes is that most dumb users
have to be able to use the files they download. They can reasonably do
things like rename the files or pass them through a commonly available
decoder. No point in trying to "file share" if users have to be the NSA to
play the music.
You can scan (where legal) for the common stuff. You can't find stuff
encoded by Dr. Evil Genius Hacker -- but neither can the party claiming to
be infringed and neither can Suzie Shebop who just wants free music.
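Scanning for "the common stuff" against a supplied list of hashes, as discussed earlier in the thread, might look like the sketch below. It assumes Python and SHA-256 hex digests. The thread does not say which hash a copyright owner would supply, so the algorithm and the names sha256_of and flag_known_files are assumptions.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_known_files(root, known_hashes):
    """Return paths under root whose digest appears in known_hashes."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if sha256_of(path) in known_hashes:
                    flagged.append(path)
            except OSError:
                continue  # unreadable file; skip rather than abort the scan
    return flagged
```

This only flags exact copies. A single changed byte produces a different digest, so it catches the lazy case and nothing more, and a flagged file still needs a human to check whether the user holds a license.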
8800 N IH35 APT 1191 AUSTIN TX 78753-5266
More information about the freebsd-questions mailing list