Regex character and collation class documentation

mfv mfv at bway.net
Mon Nov 13 19:35:41 UTC 2017


> On Mon, 2017-11-13 at 09:09 "James B. Byrne via freebsd-questions"
> <freebsd-questions at freebsd.org> wrote:
>
>On Sat, November 11, 2017 10:45, mfv wrote:
>
>> As a result I did some more digging and discovered that the valid
>> names for [[.<name>.]] are contained in
>> /usr/src/lib/libc/regex/cname.h.  Most of the names in "man ascii"
>> also appear in cname.h.
>>
>> It also explains why [[.SP.]] generates an error message.  Even
>> though SP is listed in "man ascii" it is not specified in cname.h.
>>
>> Cheers ...
>>
>> Marek
>>  
>
>A file named cname.h does not even exist on my system.  At least if it
>does then find does not report it.  On the other hand, this file:
>
>/usr/local/include/nstring.h
>
>contains this:
>
>/* The standard C library routines isdigit(), for some weird
>   historical reason, does not take a character (type 'char') as its
>   argument.  Instead it takes an integer.  When the integer is a whole
>   number, it represents a character in the obvious way using the local
>   character set encoding.  When the integer is negative, the results
>   are undefined.
>
>   Passing a character to isdigit(), which expects an integer,
>   results in isdigit() sometimes getting a negative number.
>
>   On some systems, when the integer is negative, it represents exactly
>   the character you want it to anyway (e.g. -1 is the character that
>   is encoded 0xFF).  But on others, it does not.
>
>   (The same is true of other routines like isdigit()).
>
>   Therefore, we have the substitutes for isdigit() etc. that take an
>   actual character (type 'char') as an argument.
>*/
>
>#define ISALNUM(C) (isalnum((unsigned char)(C)))
>#define ISALPHA(C) (isalpha((unsigned char)(C)))
>#define ISCNTRL(C) (iscntrl((unsigned char)(C)))
>#define ISDIGIT(C) (isdigit((unsigned char)(C)))
>#define ISGRAPH(C) (isgraph((unsigned char)(C)))
>#define ISLOWER(C) (islower((unsigned char)(C)))
>#define ISPRINT(C) (isprint((unsigned char)(C)))
>#define ISPUNCT(C) (ispunct((unsigned char)(C)))
>#define ISSPACE(C) (isspace((unsigned char)(C)))
>#define ISUPPER(C) (isupper((unsigned char)(C)))
>#define ISXDIGIT(C) (isxdigit((unsigned char)(C)))
>#define TOUPPER(C) ((char)toupper((unsigned char)(C)))
>
>But nowhere can I find 'isnul' or 'ISNUL'.
>
>
>

Hello James,

Do you have /usr/src on your system?  The directories under /usr/src
hold the source code used to build FreeBSD on one's own computer, so if
that tree is not installed, find will not turn up cname.h.

If not, here is a link to the Git repository where the source of
/usr/src/lib/libc/regex/cname.h can be viewed:

 https://github.com/freebsd/freebsd/blob/master/lib/libc/regex/cname.h

Each name in the left-hand column can be used inside [[. .]] in sed to
match the character in the right-hand column.  For example, with
extended regular expressions (sed -E), /[[.asterisk.]]{3}/ matches ***.

Some characters have two names.  For example, the control character
with octal code '\007' can be written as either 'BEL' or 'alert'.
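
To illustrate why an unlisted name such as SP is rejected while BEL and
alert both work, here is a minimal C sketch of a lookup over a table
shaped like the one in cname.h.  The three entries are a hand-copied
excerpt, and lookup_cname() is a name I made up for this example; it is
not the actual libc code.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the table in cname.h: a name and the character it stands for. */
struct cname {
    const char *name;
    char code;
};

/* Hand-picked excerpt for illustration only; the real table is much longer. */
static const struct cname cnames[] = {
    {"BEL",      '\007'},
    {"alert",    '\007'},
    {"asterisk", '*'},
    {NULL,       0}
};

/*
 * lookup_cname() is a hypothetical helper, not a libc function.
 * Returns the character code for a name, or -1 if the name is not listed.
 */
static int lookup_cname(const char *name)
{
    const struct cname *cp;

    for (cp = cnames; cp->name != NULL; cp++)
        if (strcmp(cp->name, name) == 0)
            return (unsigned char)cp->code;
    return -1;
}
```

With this excerpt, lookup_cname("BEL") and lookup_cname("alert") both
return 7, while lookup_cname("SP") returns -1 because no such entry
exists.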

I do not know the purpose of /usr/local/include/nstring.h, so I
cannot shed any light on that particular file.

Cheers ...

Marek

