Regex character and collation class documentation

mfv mfv at bway.net
Sat Nov 11 15:45:53 UTC 2017


> On Fri, 2017-11-10 at 08:59 "James B. Byrne via freebsd-questions"
> <freebsd-questions at freebsd.org> wrote:
>
>On Thu, November 9, 2017 16:36, mfv wrote:
>>> On Wed, 2017-11-08 at 12:47 "James B. Byrne via freebsd-questions"
>>>However I see no reference to [.NUL.] anywhere.  The sed man page has
>>>no reference to nul or NUL at all and tr only has this to say:
>>>
>>>     The tr utility has historically not permitted the manipulation
>>>     of NUL bytes in its input and, additionally, stripped NUL's from
>>>     its input stream.  This implementation has removed this behavior
>>>     as a bug.
>>>
>>>
>>>Is there a master list of character/collation classes for FreeBSD
>>>regex?  I have read the man pages for grep and re_format.  In no case
>>>is the character or collation class NUL mentioned.
>>>
>>>Where is the usage of [.NUL.] documented?
>>>  
>>
>> Hello James,
>>
>> This may help you with a bit of hacking.
>>
>> I asked myself the same question but could not find a satisfactory
>> answer.  After remembering that "man ascii" has names for all
>> non-printable ASCII characters, I placed some of these characters in
>> a text file and then removed the same characters using their name.
>>
>> Thus:
>>  - the character ^@ was removed using [[.NUL.]]
>>  - the character ^G was removed using [[.BEL.]]
>>  - the character ^F was removed using [[.ACK.]]
>>  - etc.
>>
>> I did not try all non-printable characters but a large sampling
>> followed this pattern.  Trying to use SP for a space produced the
>> following error:
>>
>> sed: 1: "/[[.SP.]]/d": RE error: invalid collating element
>>
>> Perhaps there are other exceptions similar to SP.
>>
>> This syntax recognises printable characters as well.  For
>> example, the character 'A' was removed using 's/[[.A.]]//g'.
>>
>> I would have preferred some formal documentation on this matter but,
>> like you, I am still searching.
>>
>> Cheers ...
>>
>> Marek
>>
>>  
>
>Thank you.  I discovered that a [.<symbol>.] collation reference
>pertains to the active LOCALE setting as defined by LC_ALL. At least
>so I find in the documentation I have read.  But I would not have
>thought to look in man ascii for the answer to my question.
>
>

Hello James,

Thanks for this information.

As a result I did some more digging and discovered that the valid names
for [[.<name>.]] are defined in /usr/src/lib/libc/regex/cname.h.  Most
of the names in "man ascii" also appear in cname.h.

This also explains why [[.SP.]] generates an error message: even though
SP is listed in "man ascii", it is not defined in cname.h.
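
For anyone who wants to check a name without reading cname.h, here is a
minimal sketch that asks the local sed whether it accepts a given name
inside [[.<name>.]].  The function name "probe" is just something I made
up, and "space" is only a guess at a long-form name; the other names are
the ones tried earlier in this thread.  The results depend on the regex
library sed is built against, so the output on other systems may well
differ from FreeBSD's.

```shell
#!/bin/sh
# Report whether the local sed accepts NAME as a collating element.
# The regular expression is compiled when sed parses its script, so an
# unknown name shows up as a non-zero exit status (on FreeBSD the message
# is "RE error: invalid collating element").
probe() {
    if printf '' | sed -n "/[[.$1.]]/p" >/dev/null 2>&1; then
        echo "$1: accepted"
    else
        echo "$1: rejected"
    fi
}

# "space" is an assumption, the rest come from the experiments above.
for name in NUL BEL ACK SP space A; do
    probe "$name"
done
```

Each name is tested against empty input, so nothing is actually matched;
only the success or failure of compiling the expression is reported.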

Cheers ...

Marek

