bin/148150: Poor file(1) performance

Garrett Cooper yanefbsd at gmail.com
Sat Jun 26 01:00:18 UTC 2010


The following reply was made to PR bin/148150; it has been noted by GNATS.

From: Garrett Cooper <yanefbsd at gmail.com>
To: Peter Jeremy <peterjeremy at acm.org>
Cc: FreeBSD-gnats-submit at freebsd.org
Subject: Re: bin/148150: Poor file(1) performance
Date: Fri, 25 Jun 2010 17:54:37 -0700

 On Fri, Jun 25, 2010 at 3:56 PM, Peter Jeremy <peterjeremy at acm.org> wrote:
 >
 >>Number:         148150
 >>Category:       bin
 >>Synopsis:       Poor file(1) performance
 >>Confidential:   no
 >>Severity:       non-critical
 >>Priority:       low
 >>Responsible:    freebsd-bugs
 >>State:          open
 >>Quarter:
 >>Keywords:
 >>Date-Required:
 >>Class:          sw-bug
 >>Submitter-Id:   current-users
 >>Arrival-Date:   Fri Jun 25 23:00:09 UTC 2010
 >>Closed-Date:
 >>Last-Modified:
 >>Originator:     Peter Jeremy
 >>Release:        FreeBSD 8.1-PRERELEASE amd64
 >>Organization:
 > n/a
 >>Environment:
 > System: FreeBSD server.vk2pj.dyndns.org 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Sun Jun 13 09:18:30 EST 2010 root at server.vk2pj.dyndns.org:/var/obj/usr/src/sys/server amd64
 >
 > FreeBSD aspire.vk2pj.dyndns.org 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #12: Mon Jun 14 11:34:12 EST 2010     root at builder.vk2pj.dyndns.org:/obj/usr/src/sys/aspire  i386
 >
 >>Description:
 >
 >         I recently had reason to run file(1) on a large number of
 >         files and felt the performance wasn't up to par.  When I
 >         investigated further, I found that about 95% of the runtime
 >         related to the two regex's to recognize REXX files:
 >
 > # OS/2 batch files are REXX. the second regex is a bit generic, oh well
 > # the matched commands seem to be common in REXX and uncommon elsewhere
 > 100     regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
 > 100     regex/c =^[\ \t]{0,10}say\ ['"]            OS/2 REXX batch file text
 >
 >         Since REXX files are not present in my environment, I can
 >         avoid the issue by just commenting out the offending lines.
 >         Someone with more expertise in magic(5) might be able to
 >         suggest a better fix.
 >
 >         I have tried reporting this to the upstream maintainers and
 >         received a "not interested" response.
 >
 >>How-To-Repeat:
 >         Copy /usr/share/misc/magic to magic.old
 >         Apply the equivalent of the below patch to create magic.new
 >         time(1) file(1) on the same set of files using magic.old and magic.new
 >
 >         Using my home directory on my i386 netbook, I get:
 > file -m magic.new * > /dev/null  1.42s user 0.13s system 98% cpu 1.576 total
 > file -m magic.new * > /dev/null  1.35s user 0.10s system 98% cpu 1.469 total
 > file -m magic.new * > /dev/null  1.35s user 0.10s system 98% cpu 1.470 total
 > file -m magic.old * > /dev/null  33.35s user 0.11s system 98% cpu 34.055 total
 > file -m magic.old * > /dev/null  33.12s user 0.14s system 98% cpu 33.714 total
 > file -m magic.old * > /dev/null  33.08s user 0.11s system 98% cpu 33.606 total
 >
 >         Using my home directory on my amd64 desktop, I get:
 > file -m magic.new * > /dev/null  2.18s user 0.41s system 28% cpu 9.111 total
 > file -m magic.new * > /dev/null  2.11s user 0.49s system 24% cpu 10.707 total
 > file -m magic.new * > /dev/null  2.05s user 0.56s system 23% cpu 10.989 total
 > file -m magic.old * > /dev/null  28.54s user 0.51s system 78% cpu 37.088 total
 > file -m magic.old * > /dev/null  28.54s user 0.52s system 89% cpu 32.575 total
 > file -m magic.old * > /dev/null  28.71s user 0.47s system 99% cpu 29.371 total
 >
 >         The poorer wallclock performance on my amd64 is because it's
 >         running ZFS without adequate RAM whereas my netbook is UFS on SSD
 >         and the actual directory contents are completely different.
 >>Fix:
 >
 >         The following just comments out the REXX test.
 >
 > Index: Magdir/msdos
 > ===================================================================
 > RCS file: /usr/ncvs/src/contrib/file/Magdir/msdos,v
 > retrieving revision 1.3
 > diff -u -r1.3 msdos
 > --- Magdir/msdos        4 May 2009 00:37:44 -0000       1.3
 > +++ Magdir/msdos        19 Jun 2010 03:23:23 -0000
 > @@ -18,8 +18,8 @@
 >
 >  # OS/2 batch files are REXX. the second regex is a bit generic, oh well
 >  # the matched commands seem to be common in REXX and uncommon elsewhere
 > -100    regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
 > -100    regex/c =^[\ \t]{0,10}say\ ['"]      OS/2 REXX batch file text
 > +#100   regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
 > +#100   regex/c =^[\ \t]{0,10}say\ ['"]      OS/2 REXX batch file text
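
For anyone who wants to repeat the comparison without applying the
patch by hand, the How-To-Repeat steps could be scripted roughly as
follows. This is only a sketch: disable_rexx_magic is a made-up helper
name, and the sed expression assumes the two REXX entries are the only
offset-100 regex/c lines in the magic file that mention REXX.

```shell
#!/bin/sh
# Sketch only: write a copy of a magic file with the REXX regex tests
# commented out. disable_rexx_magic is a hypothetical helper, not part
# of file(1).
disable_rexx_magic() {
    src=$1   # original magic file, e.g. a copy of /usr/share/misc/magic
    dst=$2   # patched copy to create
    # Prefix '#' to offset-100 regex/c entries mentioning REXX
    # (assumption: only the two entries from the PR match this).
    sed -e '/^100[[:space:]]\{1,\}regex\/c[[:space:]].*REXX/s/^/#/' \
        "$src" > "$dst"
}

# Usage, following the How-To-Repeat section (not run here):
#   cp /usr/share/misc/magic magic.old
#   disable_rexx_magic magic.old magic.new
#   time file -m magic.old * > /dev/null
#   time file -m magic.new * > /dev/null
```

Every other line is copied through unchanged, so any timing difference
between the two runs should come from the REXX tests alone.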
 
 FWIW, I think this is more indicative of poor regex(3) performance,
 or possibly of tighter constraints placed on the regex compiler /
 parser when it parses the pattern.
 
 Not saying that what you proposed isn't valid, but it's definitely an
 interesting note that should be brought up to the upstream folks.
 
 Thanks!
 -Garrett

