Regex character and collation class documentation

mfv mfv at bway.net
Thu Nov 9 21:37:03 UTC 2017


> On Wed, 2017-11-08 at 12:47 "James B. Byrne via freebsd-questions"
> <freebsd-questions at freebsd.org> wrote:
>
>I have been perusing the available documentation respecting regex on
>FreeBSD and cannot find a reference to [.NUL.]. Everything that I have
>found points to ctype.h. The only class names I can find therein are:
>
>int     isalnum(int);   [:alnum:]
>int     isalpha(int);   [:alpha:]
>int     iscntrl(int);   [:cntrl:]
>int     isdigit(int);   [:digit:]
>int     isgraph(int);   [:graph:]
>int     islower(int);   [:lower:]
>int     isprint(int);   [:print:]
>int     ispunct(int);   [:punct:]
>int     isspace(int);   [:space:]
>int     isupper(int);   [:upper:]
>int     isxdigit(int);  [:xdigit:]
>
>From reading the reference at
>https://docs.freebsd.org/info/regex/regex.pdf and comparing it to the
>uncommented lines in ctype.h on my FreeBSD-11.1 desktop host one could
>reasonably deduce that the following should be available on FreeBSD in
>addition to the above:
>
>int     isascii(int);   [:ascii:]
>
>int     isblank(int);   [:blank:]
>
>int     ishexnumber(int); [:hexnumber:]
>int     isideogram(int);  [:ideogram:]
>int     isnumber(int);    [:number:]
>int     isphonogram(int); [:phonogram:]
>int     isrune(int);      [:rune:]
>int     isspecial(int);   [:special:]
>
>But of these only [[:blank:]] is recognized by grep; whatever else
>might employ the rest.
>
>[[:ascii:]]
>grep: Invalid character class name
>[[:hexnumber:]]
>grep: Invalid character class name
>[[:ideogram:]]
>grep: Invalid character class name
>[[:number:]]
>grep: Invalid character class name
>[[:phonogram:]]
>grep: Invalid character class name
>[[:rune:]]
>grep: Invalid character class name
>[[:special:]]
>grep: Invalid character class name
>
>
>However I see no reference to [.NUL.] anywhere.  The sed man page has
>no reference to nul or NUL at all and tr only has this to say:
>
>     The tr utility has historically not permitted the manipulation
>     of NUL bytes in its input and, additionally, stripped NUL's from
>     its input stream.  This implementation has removed this behavior
>     as a bug.
>
>
>Is there a master list of character/collation classes for FreeBSD
>regex?  I have read the man pages for grep and re_format.  In no case
>is the character or collation class NUL mentioned.
>
>Where is the usage of [.NUL.] documented?
>

Hello James,

This may help you with a bit of hacking.

I asked myself the same question but could not find a satisfactory
answer.  After remembering that "man ascii" has names for all
non-printable ASCII characters, I placed some of these characters in a
text file and then removed them with sed, using their names.

Thus:
 - the character ^@ was removed using [[.NUL.]]
 - the character ^G was removed using [[.BEL.]]
 - the character ^F was removed using [[.ACK.]]
 - etc.
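
For anyone who wants to repeat the experiment, here is a minimal sketch.
The function name strip_demo, the sample text and the fallback message
are made up for illustration.  NUL is left out because sh command
substitution cannot carry it.  On a sed that accepts these names the
output should be "onetwothree".

```sh
#!/bin/sh
# strip_demo: write a line containing BEL (octal 007) and ACK (octal 006)
# to a temporary file, then try to delete both characters by name.
strip_demo() {
    sample=$(mktemp)
    printf 'one\007two\006three\n' > "$sample"

    # sed exits non-zero if it cannot compile the RE, so the else branch
    # covers implementations that do not know these collating symbols.
    if cleaned=$(sed -e 's/[[.BEL.]]//g' -e 's/[[.ACK.]]//g' "$sample" 2>/dev/null); then
        printf '%s\n' "$cleaned"
    else
        echo "collating symbols not supported by this sed"
    fi
    rm -f "$sample"
}

strip_demo
```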

I did not try all non-printable characters but a large sampling
followed this pattern.  Trying to use SP for a space produced the
following error:

sed: 1: "/[[.SP.]]/d": RE error: invalid collating element

Perhaps there are other exceptions similar to SP.
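
One way to look for such exceptions is to let sed say which names it
accepts.  The sketch below is only that: the probe helper is a made-up
name, and "space" is a guess at a long name for the blank (the name
table in the libc regex sources, lib/libc/regex/cname.h if memory
serves, would be the place to confirm it).  It checks only whether the
RE compiles, not what it matches, and the results depend on the local
regex library.

```sh
#!/bin/sh
# probe NAME: report whether the local sed accepts [[.NAME.]].
# sed exits non-zero when the RE fails to compile; that is all we test.
probe() {
    if printf 'x\n' | sed -n "/[[.$1.]]/p" >/dev/null 2>&1; then
        echo "$1: accepted"
    else
        echo "$1: rejected"
    fi
}

# A small sample of names; extend the list with others from "man ascii".
for name in NUL BEL ACK SP space A; do
    probe "$name"
done
```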

This syntax recognises printable characters as well.  For example,
the character 'A' was removed using 's/[[.A.]]//g'.

I would have preferred some formal documentation on this matter but,
like you, I am still searching.

Cheers ...

Marek

