Regex character and collation class documentation
mfv at bway.net
Sat Nov 11 15:45:53 UTC 2017
> On Fri, 2017-11-10 at 08:59 "James B. Byrne via freebsd-questions"
> <freebsd-questions at freebsd.org> wrote:
>On Thu, November 9, 2017 16:36, mfv wrote:
>>> On Wed, 2017-11-08 at 12:47 "James B. Byrne via freebsd-questions"
>>>However I see no reference to [.NUL.] anywhere. The sed man page has
>>>no reference to nul or NUL at all and tr only has this to say:
>>> The tr utility has historically not permitted the manipulation
>>> of NUL bytes in its input and, additionally, stripped NUL's from
>>> its input stream. This implementation has removed this behavior
>>> as a bug.
>>>Is there a master list of character/collation classes for FreeBSD
>>>regex? I have read the man pages for grep and re_format. In no case
>>>is the character or collation class NUL mentioned.
>>>Where is the usage of [.NUL.] documented?
>> Hello James,
>> This may help you with a bit of hacking.
>> I asked myself the same question but could not find a satisfactory
>> answer. After remembering that "man ascii" has names for all
>> non-printable ASCII characters, I placed some of these characters in
>> a text file and then removed the same characters using their name.
>> - the character ^@ was removed using [[.NUL.]]
>> - the character ^G was removed using [[.BEL.]]
>> - the character ^F was removed using [[.ACK.]]
>> - etc,
>> I did not try all non-printable characters but a large sampling
>> followed this pattern. Trying to use SP for a space produced the
>> following error:
>> sed: 1: "/[[.SP.]]/d": RE error: invalid collating element
>> Perhaps there are other exceptions similar to SP.
>> This syntax recognises printable characters as well. For
>> example, the character 'A' was removed using 's/[[.A.]]//g'.
>> I would have preferred some formal documentation on this matter but
>> like yourself am still searching.
>> Cheers ...
>Thank you. I discovered that a [.<symbol>.] collation reference
>pertains to the active LOCALE setting as defined by LC_ALL. At least
>so I find in the documentation I have read. But I would not have
>thought to look in man ascii for the answer to my question.
Thanks for this information.
As a result I did some more digging and discovered that the valid names
for [[.<name>.]] are defined in /usr/src/lib/libc/regex/cname.h. Apart
from exceptions such as SP, the names in "man ascii" are a subset of
those in cname.h.
This also explains why [[.SP.]] generates an error message: even though
SP is listed in "man ascii", it is not defined in cname.h.
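For anyone who wants to repeat the test, the following is a minimal
sketch rather than a definitive procedure. It assumes a sed built on
the libc regex code that reads cname.h, as on FreeBSD; other
implementations may reject these names. The sample file contents and
the helper name try_name are my own choices for illustration.

```shell
#!/bin/sh
# Sketch: check whether the local sed accepts a named collating element.
# Assumption: sed uses the libc regex code that reads cname.h (FreeBSD).

sample=$(mktemp) || exit 1

# Write "a", BEL (octal 007), "b" and a newline to the sample file.
printf 'a\007b\n' > "$sample"

# try_name is a hypothetical helper: $1 is the name placed inside [[. .]].
try_name() {
    if out=$(sed "s/[[.$1.]]//g" "$sample"); then
        # sed compiled the expression; dump the result byte by byte.
        printf '%s: accepted, output bytes:\n' "$1"
        printf '%s\n' "$out" | od -c
    else
        # sed refused the expression; its own error text goes to stderr.
        printf '%s: rejected by this sed\n' "$1"
    fi
}

try_name BEL
try_name ACK
try_name SP

# Record the size of the sample file, then clean up.
sample_bytes=$(wc -c < "$sample" | tr -d ' ')
rm -f "$sample"
```

On the system described above, BEL and ACK should be accepted and SP
should be rejected with the "invalid collating element" error.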