Regex character and collation class documentation
mfv
mfv at bway.net
Mon Nov 13 19:35:41 UTC 2017
> On Mon, 2017-11-13 at 09:09 "James B. Byrne via freebsd-questions"
> <freebsd-questions at freebsd.org> wrote:
>
>On Sat, November 11, 2017 10:45, mfv wrote:
>
>> As a result I did some more digging and discovered that the valid
>> names for [[.<name>.]] are contained in
>> /usr/src/lib/libc/regex/cname.h. The names in "man ascii" are a
>> subset of cname.h.
>>
>> It also explains why [[.SP.]] generates an error message. Even
>> though SP is listed in "man ascii" it is not specified in cname.h.
>>
>> Cheers ...
>>
>> Marek
>>
>
>A file named cname.h does not even exist on my system. At least if it
>does then find does not report it. On the other hand, this file:
>
>/usr/local/include/nstring.h
>
>contains this:
>
>/* The standard C library routines isdigit(), for some weird
> historical reason, does not take a character (type 'char') as its
> argument. Instead it takes an integer. When the integer is a whole
> number, it represents a character in the obvious way using the local
> character set encoding. When the integer is negative, the results
> are undefined.
>
> Passing a character to isdigit(), which expects an integer,
> results in isdigit() sometimes getting a negative number.
>
> On some systems, when the integer is negative, it represents exactly
> the character you want it to anyway (e.g. -1 is the character that
> is encoded 0xFF). But on others, it does not.
>
> (The same is true of other routines like isdigit()).
>
> Therefore, we have the substitutes for isdigit() etc. that take an
> actual character (type 'char') as an argument.
>*/
>
>#define ISALNUM(C) (isalnum((unsigned char)(C)))
>#define ISALPHA(C) (isalpha((unsigned char)(C)))
>#define ISCNTRL(C) (iscntrl((unsigned char)(C)))
>#define ISDIGIT(C) (isdigit((unsigned char)(C)))
>#define ISGRAPH(C) (isgraph((unsigned char)(C)))
>#define ISLOWER(C) (islower((unsigned char)(C)))
>#define ISPRINT(C) (isprint((unsigned char)(C)))
>#define ISPUNCT(C) (ispunct((unsigned char)(C)))
>#define ISSPACE(C) (isspace((unsigned char)(C)))
>#define ISUPPER(C) (isupper((unsigned char)(C)))
>#define ISXDIGIT(C) (isxdigit((unsigned char)(C)))
>#define TOUPPER(C) ((char)toupper((unsigned char)(C)))
>
>But nowhere can I find 'isnul' or 'ISNUL'.
>
>
>
Hello James,

Do you have /usr/src on your system? The directories under /usr/src
hold the source code used to build FreeBSD on one's own computer.

If not, here is a link to the Git repository where the source code
for /usr/src/lib/libc/regex/cname.h can be seen:

https://github.com/freebsd/freebsd/blob/master/lib/libc/regex/cname.h

Each name listed on the left can be used in sed to match the character
to its right. For example, with extended regular expressions (sed -E),
/[[.asterisk.]]{3}/ matches ***.

Some of the characters have two names. For example, the control
character '\007' (octal) is represented by 'BEL' as well as 'alert'.
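
As a rough illustration of how such a table maps names to characters,
here is a minimal C sketch. It is not the contents of cname.h: the five
entries are only a small sample, and the names cname_entry,
sample_names and lookup_cname are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sample only; the real cname.h has many more entries. */
struct cname_entry {
    const char *name;   /* name usable inside [[. .]] */
    char code;          /* character the name stands for */
};

static const struct cname_entry sample_names[] = {
    { "NUL",      '\0'   },
    { "BEL",      '\007' },
    { "alert",    '\007' },   /* second name for the same character */
    { "space",    ' '    },
    { "asterisk", '*'    },
    { NULL,       0      }    /* end-of-table marker */
};

/* Return the character code for a name, or -1 if the name is not listed. */
static int lookup_cname(const char *name)
{
    const struct cname_entry *p;

    for (p = sample_names; p->name != NULL; p++) {
        if (strcmp(p->name, name) == 0)
            return (unsigned char)p->code;
    }
    return -1;
}
```

With this sample table, lookup_cname("asterisk") gives '*', "BEL" and
"alert" give the same value, and lookup_cname("SP") gives -1, which is
the situation behind the error for [[.SP.]].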

I do not know the purpose of /usr/local/include/nstring.h, so I cannot
shed any light on that particular file.
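
Going only by the comment quoted above, the macros seem meant to keep a
negative char from reaching the <ctype.h> routines. A minimal C sketch
of that idea follows; is_digit_char and count_digits are names invented
for this example and are not taken from nstring.h.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: applies the same cast as the quoted ISDIGIT(). */
static int is_digit_char(char c)
{
    /*
     * Converting to unsigned char first keeps the argument in the range
     * isdigit() accepts, even on systems where plain char is signed.
     */
    return isdigit((unsigned char)c) != 0;
}

/* Hypothetical helper: count the decimal digits in a string. */
static size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (is_digit_char(*s))
            n++;
    }
    return n;
}
```

None of this relates to the [[.<name>.]] names, which would fit with
there being no 'isnul' in that file.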

Cheers ...

Marek