Re: find(1): I18N gone wild ?
- Reply: Yuri : "Re: find(1): I18N gone wild ?"
- In reply to: Mark Millard : "Re: find(1): I18N gone wild ?"
Date: Fri, 21 Apr 2023 18:03:30 UTC
Mark Millard wrote:
> Dimitry Andric <dim_at_FreeBSD.org> wrote on
> Date: Fri, 21 Apr 2023 10:38:05 UTC :
>
>> On 21 Apr 2023, at 12:01, Ronald Klop <ronald-lists@klop.ws> wrote:
>>> From: Poul-Henning Kamp <phk@phk.freebsd.dk>
>>> Date: Monday, 17 April 2023 23:06
>>> To: current@freebsd.org
>>> Subject: find(1): I18N gone wild ?
>>> This surprised me:
>>>
>>> # mkdir /tmp/P
>>> # cd /tmp/P
>>> # touch FOO
>>> # touch bar
>>> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>>> ./FOO
>>> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>>> ./FOO
>>> ./bar
>>>
>>> Really ?!
>> ...
>>> My Mac and a Linux server only give ./FOO in both cases. Just a 2 cents remark.
>>
>> Same here. However, I have read that with unicode, you should *never*
>> use [A-Z] or [0-9], but character classes instead. That seems to give
>> both files on macOS and Linux with [[:alpha:]]:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>> ./BAR
>> ./foo
>>
>> and only the lowercase file with [[:lower:]]:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>> ./foo
>>
>> But on FreeBSD, these don't work at all:
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
>> <nothing>
>>
>> $ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
>> <nothing>
>>
>> This is an interesting rabbit hole... :)
>
> FreeBSD:
>
>      -name pattern
>              True if the last component of the pathname being examined matches
>              pattern. Special shell pattern matching characters (“[”, “]”,
>              “*”, and “?”) may be used as part of pattern. These characters
>              may be matched explicitly by escaping them with a backslash
>              (“\”).
>
> I conclude that [[:alpha:]] and [[:lower:]] were not
> considered "Special shell pattern"s. "man glob"
> indicates it is a shell specific builtin.
>
> macOS says similarly. Different shells, different
> pattern notations and capabilities? Well, "man bash"
> reports: [snip]
> Seems like: pick your shell (as shown by echo $SHELL) and
> that picks the pattern match rules used. (May be controllable
> in the specific shell.)

No, the pattern is not passed to the shell, and which shell is used
should not matter (as long as the pattern is properly quoted or escaped
so that it reaches find unchanged). The rules are here:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

...which in turn refers to the following link for bracket expressions:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

Why we don't support all of that is a different story.
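
To make the quoting point concrete, here is a minimal sketch, not
anything taken from the find(1) sources. It assumes a POSIX-ish sh
with mktemp(1) available; the variable names (dir, quoted, unquoted,
found) are made up for illustration. LC_ALL=C is forced so that the
[A-Z] range does not depend on the collation order that started this
thread.

```shell
# Throwaway directory with the two files from the original report.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch FOO bar

# Pin the locale so [A-Z] means plain ASCII uppercase here.
LC_ALL=C
export LC_ALL

# Quoted: the shell passes the pattern through untouched, so the
# command receives the literal text of the pattern.
quoted=$(printf '%s\n' '[A-Z]*')

# Unquoted: the shell expands the glob itself before the command runs,
# so the command only ever sees the resulting file names.
unquoted=$(printf '%s\n' [A-Z]*)

# With the pattern quoted, find(1) does the matching on its own.
found=$(find . -name '[A-Z]*' -print)

printf 'quoted:   %s\nunquoted: %s\nfound:    %s\n' \
    "$quoted" "$unquoted" "$found"
```

In the quoted case the shell's own glob rules never come into play,
which is why the choice of shell should not affect what find matches.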