Regex character and collation class documentation
mfv
mfv at bway.net
Thu Nov 9 21:37:03 UTC 2017
> On Wed, 2017-11-08 at 12:47 "James B. Byrne via freebsd-questions"
> <freebsd-questions at freebsd.org> wrote:
>
>I have been perusing the available documentation respecting regex on
>FreeBSD and cannot find a reference to [.NUL.]. Everything that I have
>found points to ctype.h. The only class names I can find therein are:
>
>int isalnum(int); [:alnum:]
>int isalpha(int); [:alpha:]
>int iscntrl(int); [:cntrl:]
>int isdigit(int); [:digit:]
>int isgraph(int); [:graph:]
>int islower(int); [:lower:]
>int isprint(int); [:print:]
>int ispunct(int); [:punct:]
>int isspace(int); [:space:]
>int isupper(int); [:upper:]
>int isxdigit(int); [:xdigit:]
>
>From reading the reference at
>https://docs.freebsd.org/info/regex/regex.pdf and comparing it to the
>uncommented lines in ctype.h on my FreeBSD-11.1 desktop host one could
>reasonably deduce that the following should be available on FreeBSD in
>addition to the above:
>
>int isascii(int); [:ascii:]
>
>int isblank(int); [:blank:]
>
>int ishexnumber(int); [:hexnumber:]
>int isideogram(int); [:ideogram:]
>int isnumber(int); [:number:]
>int isphonogram(int); [:phonogram:]
>int isrune(int); [:rune:]
>int isspecial(int); [:special:]
>
>But of these only [[:blank:]] is recognized by grep; whatever else
>might employ the rest.
>
>[[:ascii:]]
>grep: Invalid character class name
>[[:hexnumber:]]
>grep: Invalid character class name
>[[:ideogram:]]
>grep: Invalid character class name
>[[:number:]]
>grep: Invalid character class name
>[[:phonogram:]]
>grep: Invalid character class name
>[[:rune:]]
>grep: Invalid character class name
>[[:special:]]
>grep: Invalid character class name
>
>
>However I see no reference to [.NUL.] anywhere. The sed man page has
>no reference to nul or NUL at all and tr only has this to say:
>
> The tr utility has historically not permitted the manipulation
> of NUL bytes in its input and, additionally, stripped NUL's from
> its input stream. This implementation has removed this behavior
> as a bug.
>
>
>Is there a master list of character/collation classes for FreeBSD
>regex? I have read the man pages for grep and re_format. In no case
>is the character or collation class NUL mentioned.
>
>Where is the usage of [.NUL.] documented?
>
Hello James,
This may help, though it takes a bit of hacking.
I asked myself the same question but could not find a satisfactory
answer. Remembering that "man ascii" lists names for all the
non-printable ASCII characters, I placed some of these characters in a
text file and then removed them with sed, referring to each one by name.
Thus:
- the character ^@ was removed using [[.NUL.]]
- the character ^G was removed using [[.BEL.]]
- the character ^F was removed using [[.ACK.]]
- etc.
I did not try every non-printable character, but a large sampling
followed this pattern. Trying to use SP for a space, however, produced
the following error:
sed: 1: "/[[.SP.]]/d": RE error: invalid collating element
Perhaps there are other exceptions similar to SP.
This syntax recognises printable characters as well. For example,
the character 'A' was removed using 's/[[.A.]]//g'.
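The name-to-byte correspondence seen in these experiments can be sketched
in Python. This is only an emulation for illustration; it does not call
the FreeBSD regex library. The table uses the standard ASCII control
character abbreviations, which I am assuming match the names that
"man ascii" prints, and the helper names NAME_TO_BYTE and strip_named are
hypothetical.

```python
# Standard ASCII abbreviations for the control characters 0x00-0x1F,
# in code-point order (assumed to match the names shown by "man ascii").
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

# Hypothetical lookup table: name -> byte value.
NAME_TO_BYTE = {name: code for code, name in enumerate(CONTROL_NAMES)}
NAME_TO_BYTE["DEL"] = 0x7F


def strip_named(data: bytes, *names: str) -> bytes:
    """Remove every byte whose control-character name is in `names`.

    Roughly what 's/[[.NAME.]]//g' did in the sed experiments.
    Raises KeyError for a name that is not in the table.
    """
    unwanted = bytes(NAME_TO_BYTE[name] for name in names)
    # With table=None, bytes.translate only deletes the listed bytes.
    return data.translate(None, unwanted)


if __name__ == "__main__":
    sample = b"a\x00b\x07c\x06d"  # contains ^@ (NUL), ^G (BEL), ^F (ACK)
    print(strip_named(sample, "NUL", "BEL", "ACK"))
```

SP is deliberately left out of the table, so strip_named(data, "SP")
raises KeyError, loosely mirroring the "invalid collating element" error
from sed.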
I would have preferred some formal documentation on this matter but,
like you, I am still searching.
Cheers ...
Marek