Tools to find "unlegal" files ( videos , music etc )

C. P. Ghost cpghost at cordula.ws
Tue Jul 19 10:08:44 UTC 2011


On Tue, Jul 19, 2011 at 11:51 AM, Lars Eighner
<luvbeastie at larseighner.com> wrote:
> On Tue, 19 Jul 2011, C. P. Ghost wrote:
>
>> Speaking with my university sysadmin hat on: you're NOT allowed to
>> peek inside personal files of your users, UNLESS the user has waived
>> his/her rights to privacy by explicitly agreeing to the TOS and
>> there's legal language in the TOS that allows staff to inspect files
>> (and then staff needs to abide by those rules in a very strict and
>> cautious manner). So unless the TOS are very explicit, a sysadmin or
>> an IT head can get in deep trouble w.r.t. privacy laws.
>
> Yes, but I am not an expert on privacy laws in France, and I suspect
> you are not either.  Whether examining the magic number (first four bytes)
> of a file constitutes a breach of privacy is a matter for legal advice
> applicable to the particular jurisdiction.  You certainly can look at the
> external package: file size and name.

Fair enough. Automatically scanning files, hashing them, etc. may or
may not run afoul of privacy laws, which vary widely from jurisdiction
to jurisdiction. And yes, I'm no expert on French privacy laws.
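
For what it's worth, the hash matching I mentioned could look roughly
like the sketch below (where legally allowable). It assumes the
copyright owner supplies SHA-256 hex digests; the names sha256_of and
flag_matches are made up for illustration, and a match is only a flag
for follow-up with the user, never proof of infringement.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_matches(root, known_hashes):
    """Walk root and return paths whose digest is in known_hashes.

    known_hashes is assumed to be a set of lowercase hex digests.
    """
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if sha256_of(path) in known_hashes:
                    flagged.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return flagged
```

Note that a file differing by a single byte from a listed one produces
a completely different digest, so this only catches exact copies.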

>> What can technically be done is that the copyright owner provides a
>> list of hashes for his files, and requests that you traverse your
>> filesystems, looking for files that match those hashes. AND, even
>> then, all you can do is flag the files, and you'll have to check with
>> the user that he/she doesn't own a license permitting him/her to own
>> that file!
>
> You cannot generate a hash without at a certain automated level opening the
> file.  If you can do that, couldn't you generate a hash of the first four
> bytes to match with hashes of known magic numbers? If you can "look" at the
> whole file, surely you can "look" at just the first four bytes.

To check the magic numbers, you don't need a hash. Just check the
magic numbers (where legally allowable). However, a magic number would
merely say: this is an MP3, this is an MPEG file, etc. It is just a
hint (and a very weak one at that) as to the type of file. You as
staff will STILL have to manually look at the file: the MP3 could
contain random noise, the MPEG could contain a private video or video
letter, etc.
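
As a rough illustration of how little a magic number check involves,
here is a minimal sketch. The signature table is deliberately tiny and
not exhaustive, and guess_type is a made-up name; file(1) does the same
job far more thoroughly.

```python
# A few well-known leading byte sequences (illustrative, not complete).
SIGNATURES = [
    (b"ID3", "MP3 (ID3v2 tag)"),
    (b"\x00\x00\x01\xba", "MPEG program stream"),
    (b"\x00\x00\x01\xb3", "MPEG video sequence"),
]


def guess_type(path):
    """Return a type hint based on the first four bytes, or None."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    # An untagged MP3 begins with a frame sync: eleven set bits.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "MPEG audio frame (possibly MP3)"
    return None
```

The frame sync test also fires on unrelated data that happens to start
with those bits (a UTF-16 byte order mark FF FE, for instance), which
shows how weak the hint is.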

So practically, you'll get a list of users owning multimedia
files. Unless your organization forbids files by content type, you
still face the problem of identifying the "infringingness" of said
files, and this can only be done reliably by manual (human)
inspection. And here we're right back deep in privacy protection
land, where things get incredibly hairy.

>> However, even that isn't foolproof: nothing prevents a user from
>> flipping a bit or two, rescaling, resampling, splitting the files into
>> multiple files in a non-obvious manner, adding random bytes at the end
>> etc...: the result would still be infringing, but can't be detected
>> automatically (at least not in a reasonable amount of time).
>
> This is a bit like security.  There is no absolute that can be achieved. You
> don't have to be smarter than God, you just have to be smarter than the
> users.  Now the whole point of infringing schemes is that most dumb users
> have to be able to use the files they download.  They can reasonably do
> things like rename the files or pass them through a commonly available
> decoder.  No point in trying to "file share" if users have to be the NSA to
> play the music.
>
> You can scan (where legal) for the common stuff.  You can't find stuff
> encoded by Dr. Evil Genius Hacker -- but neither can the party claiming to
> be infringed and neither can Suzie Shebop who just wants free music.

Yep.

But Dr. Evil Genius Hacker could write a user-friendly program that
does all this, and John Stupiduser Doe would still be able to use
it. Just think of encrypted RAR files: how many users know how
encryption works?  Yet it's the most widely used form of file
sharing nowadays, even among countless technically ignorant users.

> Lars Eighner
> http://www.larseighner.com/index.html
> 8800 N IH35 APT 1191 AUSTIN TX 78753-5266

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


More information about the freebsd-questions mailing list