Strange performance issue with grep -r -i as non-root user

Jeremy Chadwick freebsd at jdc.parodius.com
Sun Mar 6 07:56:33 UTC 2011


On Sat, Mar 05, 2011 at 09:04:50PM -1000, Clifton Royston wrote:
> On Sat, Mar 05, 2011 at 07:07:20PM -0800, Jeremy Chadwick wrote:
> ...
> > $ unset LANG
> >   - Result: still 80x slower with -i
> > $ unset LANG LC_COLLATE
> >   - Result: still 80x slower with -i
> > $ unset LANG LC_CTYPE
> >   - Result: normal/fast.
> > $ unset LC_CTYPE
> >   - Result: still 80x slower with -i
> > $ unset LC_CTYPE LC_COLLATE
> >   - Result: still 80x slower with -i
> > $ unset LC_COLLATE
> >   - Result: still 80x slower with -i
> > 
> > So the LANG + LC_CTYPE combo when used together are what cause this.
> 
>   Doesn't the above say that having either one set does it?

You're correct; I phrased that incorrectly, my apologies. Having either
LANG or LC_CTYPE set is enough to trigger the slowdown.

>   I would guess it's probably that either one requires the 8.x
> grep -i to make a conversion function call for each char (or perhaps
> line) of input to ensure the proper upper/lower case conversion rules
> are followed.

A colleague of mine (whom I wish I had asked first) already knew of this
quirk with grep. Apparently some other utilities also behave oddly with
LANG/LC_CTYPE set, and he mentioned less as another example. He said a
locale can induce very long delays like this purely because of the
processing needed to scan through lists of certain characters that are
not always in linear order, so multiple scans are needed.

With ASCII this appears to be much easier, given that uppercase letters
occupy 0x41-0x5a and lowercase letters 0x61-0x7a, so folding case is a
single range check plus an offset of 0x20.  There's significantly less
"stuff" to do in this situation.

His explanation, though light on technical detail and references, makes
sense to me.

I should also mention (I forget whether I already did) that the delays
weren't actually "in" read(2).  truss -d timestamps each syscall relative
to the start of the trace, so the gaps between consecutive calls are
visible, and the delays I saw fell *between* read(2) calls.  That is a
further indication that some code internal to grep (or libc) was
spinning/churning much harder when a locale was in use.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



More information about the freebsd-stable mailing list