'file' Command Giving False Positives

Lowell Gilbert freebsd-questions-local at be-well.ilk.org
Fri Jul 2 20:33:10 UTC 2010

Tim Daneliuk <tundra at tundraware.com> writes:

> At this point, I'm inclined to believe that 'file' alone is
> insufficient to do this and, at best - even with more tools -
> it's going to be a probabilities game - i.e. "What percentage
> of false positives is acceptable?"

file(1) is only intended to be a set of heuristics.  It has a remarkably
good set of heuristics at this point, but you're right that this cannot
be solved simply by analyzing the contents of the files.  For use in a
system that you expect to scale, you will always be better off keeping
meta-data in some other form (when you can; frequently you cannot).
If the whole data path is under your (customer's) control,
it's not so hard; you can use file names, or put every file into a tar
file along with a text file that indicates the data type, and on and on
through as many approaches as you have the time to dream up.  [If my
examples are unclear, I can expand on them to make the point better.]
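
To make the tar-file idea concrete, here is a minimal sketch in Python.
It is only an illustration, not a recommendation of a particular layout:
the member name "TYPE.txt" and the functions pack_with_type() and
read_type() are made up for this example.

```python
import io
import tarfile

# Hypothetical name for the text member that declares the data type.
TYPE_MEMBER = "TYPE.txt"


def pack_with_type(payload: bytes, payload_name: str, data_type: str) -> bytes:
    """Return a tar archive holding the payload plus a small text file
    that states what the payload is."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        members = ((payload_name, payload), (TYPE_MEMBER, data_type.encode()))
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_type(archive: bytes) -> str:
    """Recover the declared type without looking at the payload bytes."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile(TYPE_MEMBER)
        return member.read().decode()
```

The receiver never has to guess: read_type() returns whatever the sender
declared, even when the payload itself is indistinguishable from noise.
That only helps, of course, when you trust whoever built the archive.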

This is made considerably worse by the fact that your files are
encrypted.  Some forms of encryption store some meta-data at a known
place (such as the start) in the file, but generally this won't be the
case.  Now consider that there is a nonzero chance of running into a
combination of cleartext, encryption, and password such that you end up
with an encrypted file that happens to have exactly the same contents as
/bin/ls (it's vanishingly unlikely that this exact scenario would
happen, but it's a good illustration of the problem).
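
A whole-file collision is absurdly improbable, but a short signature
is another matter.  The sketch below puts rough numbers on it, under the
assumption (mine, for illustration) that the encrypted bytes look
uniformly random.  It uses the two-byte "MZ" prefix of MS-DOS
executables as the example signature; the function names are made up.

```python
import os


def false_positive_rate(signature_len: int) -> float:
    """Chance that uniformly random bytes begin with one particular
    signature of the given length (assumes each byte is independent)."""
    return 256.0 ** -signature_len


def expected_hits(signature_len: int, file_count: int) -> float:
    """Expected number of random files that match the signature."""
    return file_count * false_positive_rate(signature_len)


def count_prefix_hits(signature: bytes, trials: int) -> int:
    """Empirical check: draw random prefixes and count exact matches."""
    return sum(os.urandom(len(signature)) == signature for _ in range(trials))


if __name__ == "__main__":
    print(false_positive_rate(len(b"MZ")))     # 1 in 65536
    print(expected_hits(len(b"MZ"), 1_000_000))
    print(count_prefix_hits(b"MZ", 1_000_000))
```

So with a two-byte signature and a million random-looking files you
should expect on the order of fifteen spurious matches, and every
additional short pattern in the magic database adds its own share.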

All of which is just agreeing with your suggestion that it's a
"probabilities game" of reducing the error rate to acceptability; UNLESS
you can control some other source of information.  For an example of the
latter, I have a backup file from this morning, named
"be-well.100702._usr.l2.dump.gz.idea"; the name itself records how the
file was produced.  If the files are coming in from the outside
(untrustworthy input), you can't do this.  One thing you *could* do in
that case is use a custom magic(5) file for this application.  You may
well not care about input that really is an MS-DOS executable, so you
can remove the patterns for those.  Or AmigaOS, or laser printer
firmware, or...
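
The effect of trimming the pattern list can be sketched in a few lines
of Python.  This is a stand-in for the idea, not magic(5) syntax: the
table below is a hypothetical whitelist of the only types an application
might care about, and classify() is a made-up helper.  With file(1)
itself, an alternate magic file can be named with the -m option.

```python
# Hypothetical whitelist: only the formats this application expects.
# Longer signatures come first so the most specific match wins.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip data"),
]


def classify(header: bytes) -> str:
    """Return a label for the leading bytes, or "unknown" if no
    whitelisted signature matches."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify(b"\x1f\x8b\x08\x00"))  # a whitelisted type
    print(classify(b"MZ\x90\x00"))        # "MZ" is deliberately not listed
```

Because there is no "MZ" entry, random data that happens to start with
those two bytes is reported as unknown rather than as an MS-DOS
executable.  Fewer patterns means fewer ways to be wrong, at the price
of not recognizing types you decided you didn't care about.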

Anyway, good luck.
