svn commit: r225561 - user/gabor/tre-integration/lib/libc/regex

Ben Kaduk minimarmot at gmail.com
Wed Sep 14 21:35:41 UTC 2011


On 9/14/11, Gabor Kovesdan <gabor at freebsd.org> wrote:
> Author: gabor
> Date: Wed Sep 14 21:08:02 2011
> New Revision: 225561
> URL: http://svn.freebsd.org/changeset/base/225561
>
> Modified: user/gabor/tre-integration/lib/libc/regex/regex.3
> ==============================================================================
> --- user/gabor/tre-integration/lib/libc/regex/regex.3	Wed Sep 14 20:13:10 2011	(r225560)
> +++ user/gabor/tre-integration/lib/libc/regex/regex.3	Wed Sep 14 21:08:02 2011	(r225561)
> @@ -62,24 +96,57 @@
>  .Ft void
>  .Fn regfree "regex_t *preg"
>  .Sh DESCRIPTION
> -These routines implement
> +These routines implement pattern matchinf of

"matching"

>  .St -p1003.2
> -regular expressions
> -.Pq Do RE Dc Ns s ;
> -see
> -.Xr re_format 7 .
> +regular expressions.
> +The
> +.Xr re_format 7
> +manual can be consulted for the syntax and use of these.

s/the syntax and use of these/their syntax and usage/ is probably clearer.

> +.Pp
>  The
>  .Fn regcomp
>  function
> -compiles an RE written as a string into an internal form,
> +compiles a regular expression written as a string into an internal form.
> +The
> +.Fn regncomp
> +function works in the very same way,
> +but takes another argument to specify the length of the pattern.
> +This function can accept patterns with NUL bytes inside because.

"can accept patterns that include NUL bytes." is probably enough.
(The trailing "because" is very odd in technical writing.)

> +The
> +.Fn regwcomp
> +and
> +.Fn regwncomp
> +functions work like the two former ones but take the pattern in
> +the wide string form.
> +.Pp
> +The
>  .Fn regexec
> -matches that internal form against a string and reports results,
> -.Fn regerror
> -transforms error codes from either into human-readable messages,
> +function matches that internal form against a string and reports results.
> +The
> +.Fn regnexec
> +function works in the same way but takes another argument to specify
> +the length of the pattern,
> +allowing NUL bytes in the input string.
> +Besides,

I would probably s/Besides/Additionally/

> +for long inputs strings it is more efficient to call this function if
> +the length is already known beause it will not require the matcher to
> +calculate the length and read the input bytes one by one.
> +The
> +.Fn regwexec
>  and
> +.Fn regwnexec
> +functions work like the two former ones but take the input as a
> +wide string.
> +.Pp
> +The
> +.Fn regerror
> +function transforms error codes from the above functions into
> +human-readable messages.
> +.Pp
> +The
>  .Fn regfree
> -frees any dynamically-allocated storage used by the internal form
> -of an RE.
> +function frees any dynamically-allocated storage used by the internal form
> +of a regular expression.
>  .Pp
>  The header
>  .In regex.h
> @@ -127,31 +193,26 @@ to improve readability.
>  .It Dv REG_NOSPEC
>  Compile with recognition of all special characters turned off.
>  All characters are thus considered ordinary,
> -so the
> -.Dq RE
> -is a literal string.
> -This is an extension,
> -compatible with but not specified by
> -.St -p1003.2 ,
> -and should be used with
> -caution in software intended to be portable to other systems.
> -.Dv REG_EXTENDED
> -and
> +so the reqular expression is a literal string.
> +.It Dv REG_LITERAL
> +Synonim for

"Synonym"

> +.Dv REG_NOSPEC.
> +.It Dv REG_EXTENDED
> +may not be used together with
>  .Dv REG_NOSPEC
> -may not be used
> +or
> +.Dv REG_LITERAL
>  in the same call to
>  .Fn regcomp .
>  .It Dv REG_ICASE
>  Compile for matching that ignores upper/lower case distinctions.
> -See
> -.Xr re_format 7 .
>  .It Dv REG_NOSUB
>  Compile for matching that need only report success or failure,
>  not what was matched.
>  .It Dv REG_NEWLINE
>  Compile for newline-sensitive matching.
>  By default, newline is a completely ordinary character with no special
> -meaning in either REs or strings.
> +meaning in either regular expressins or strings.
>  With this flag,
>  .Ql [^
>  bracket expressions and
> @@ -170,66 +231,79 @@ The regular expression ends,
>  not at the first NUL,
>  but just before the character pointed to by the
>  .Va re_endp
> +or
> +.Va re_wendp
>  member of the structure pointed to by
>  .Fa preg .
> +The former is used for the functions that take a single- or multi-byte
> +string,
> +while the second is used for those taking a wide string.
>  The
>  .Va re_endp
>  member is of type
> -.Ft "const char *" .
> -This flag permits inclusion of NULs in the RE;
> +.Ft "const char *"
> +and the
> +.Va re_wendp
> +member is of type
> +.Ft "const wchar_t *" .
> +This flag permits inclusion of NULs in the regular expression;
>  they are considered ordinary characters.
> -This is an extension,
> -compatible with but not specified by
> -.St -p1003.2 ,
> -and should be used with
> -caution in software intended to be portable to other systems.
>  .El
>  .Pp
>  When successful,
> +the
>  .Fn regcomp
> -returns 0 and fills in the structure pointed to by
> +family of functions returns
> +.Dv REG_OK
> +and fills in the structure pointed to by
>  .Fa preg .
> -One member of that structure
> -(other than
> -.Va re_endp )
> -is publicized:
> +The
>  .Va re_nsub ,
> -of type
> +member of the structure of type
>  .Ft size_t ,
> -contains the number of parenthesized subexpressions within the RE
> -(except that the value of this member is undefined if the
> +contains the number of parenthesized subexpressions within the regular
> +expression (except when the
>  .Dv REG_NOSUB
> -flag was used).
> +flag was used for the compilation of the pattern).
>  If
>  .Fn regcomp
>  fails, it returns a non-zero error code;
>  see
> -.Sx DIAGNOSTICS .
> +.Sx RETURN VALUES .
>  .Pp
>  The
>  .Fn regexec
> -function
> -matches the compiled RE pointed to by
> +family of functions match the compiled regular expression  pointed to by
>  .Fa preg
>  against the
> -.Fa string ,
> +.Fa string
> +(possibly having a length of
> +.Fa len
> +when using the variants that take the input length),
>  subject to the flags in
>  .Fa eflags ,
> -and reports results using
> +and reports match through its return value.

This is not quite grammatically correct.  From just a cursory reading
of the surrounding text, I'm not sure if it should be "a match" or
"matches", though.

> +The
>  .Fa nmatch ,
>  .Fa pmatch ,
> -and the returned value.
> -The RE must have been compiled by a previous invocation of
> -.Fn regcomp .

I think the commas need to go, with an "and" added between the two
arguments?

> +arguments are also filled in to hold submatches unless the pattern was
> +compiled using the
> +.Dv REG_NOSUB
> +falg.

"flag"
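
For reference, here is a minimal sketch of the behaviour this paragraph
describes, as I read it.  It uses only the POSIX regcomp()/regexec()/regfree()
calls, not the new regn*/regw* variants from the branch, and the helper names
first_group() and matches() are made up for the illustration:

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Copy the first parenthesized submatch of `pattern` in `input` into `out`.
 * Returns 0 on a match, REG_NOMATCH when nothing matched, or another
 * regcomp()/regexec() error code.  (Hypothetical helper, illustration only.)
 */
int first_group(const char *pattern, const char *input, char *out, size_t outlen)
{
    regex_t preg;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first subexpression */
    int rc;

    assert(pattern != NULL && input != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;          /* preg is not valid, so no regfree() here */

    /* nmatch/pmatch are filled in because REG_NOSUB was not used. */
    rc = regexec(&preg, input, 2, pmatch, 0);
    if (rc == 0 && preg.re_nsub >= 1 && pmatch[1].rm_so != -1) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len >= outlen)
            len = outlen - 1;   /* truncate to fit the caller's buffer */
        memcpy(out, input + pmatch[1].rm_so, len);
        out[len] = '\0';
    }
    regfree(&preg);
    return rc;
}

/*
 * With REG_NOSUB only success or failure is reported, so nmatch is 0
 * and pmatch is NULL.  Returns 1 on a match, 0 otherwise.
 */
int matches(const char *pattern, const char *input)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    rc = regexec(&preg, input, 0, NULL, 0);
    regfree(&preg);
    return rc == 0;
}
```

Something along these lines might also be worth considering for an EXAMPLES
section, if the page does not already have one.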

> +The regular expression  must have been compiled by a previous invocation of

There's an extra space here.

> +.Fn regcomp
> +or any of its alternative forms.
>  The compiled form is not altered during execution of
> -.Fn regexec ,
> -so a single compiled RE can be used simultaneously by multiple threads.
> +.Fn regexec
> +or its alternatives,
> +so a single compiled regular expression can be used simultaneously by
> +multiple threads,
> +and it can be used with any variant of the
> +.Fn regexec
> +functions.
> +(I.e. a multi-byte pattern can be matched to wide string input and
> +vice versa.)
>  .Pp
> -By default,
> -the NUL-terminated string pointed to by
> -.Fa string
> -is considered to be the text of an entire line, minus any terminating
> -newline.
>  The
>  .Fa eflags
>  argument is the bitwise OR of zero or more of the following flags:
> @@ -278,22 +347,17 @@ does not imply
>  .Dv REG_STARTEND
>  affects only the location of the string,
>  not how it is matched.
> -.El
>  .Pp
> +The function indicates a match by returning
> +.Dv REG_OK ,
> +no match with
> +.Dv REG_NOMATCH ,
> +or returns an error code different from the above two values
> +if an error has occured during the execution.
>  See
> -.Xr re_format 7
> -for a discussion of what is matched in situations where an RE or a
> -portion thereof could match any of several substrings of
> -.Fa string .
> -.Pp
> -Normally,
> -.Fn regexec
> -returns 0 for success and the non-zero code
> -.Dv REG_NOMATCH
> -for failure.
> -Other non-zero error codes may be returned in exceptional situations;
> -see
> -.Sx DIAGNOSTICS .
> +.Sx RETURN VALUES
> +for the detailed description of error codes.

s/the/a/ would be slightly more correct.
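
To make sure I read the three outcomes correctly, here is a small sketch of
how a caller would tell them apart.  It compares against 0 rather than the
REG_OK name the patch introduces, so that it builds against a stock POSIX
<regex.h>; try_match() is a made-up helper name:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Compile `pattern` and match it against `input`.
 * Returns 0 on a match, REG_NOMATCH on no match, or another error code,
 * in which case `errbuf` receives the regerror() message.
 * (Hypothetical helper, illustration only.)
 */
int try_match(const char *pattern, const char *input, char *errbuf, size_t errlen)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && input != NULL && errbuf != NULL && errlen > 0);
    errbuf[0] = '\0';

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0) {
        /* Compilation failed: translate the code, do not call regfree(). */
        regerror(rc, &preg, errbuf, errlen);
        return rc;
    }

    rc = regexec(&preg, input, 0, NULL, 0);
    if (rc != 0 && rc != REG_NOMATCH) {
        /* Neither "match" nor "no match": a real error during execution. */
        regerror(rc, &preg, errbuf, errlen);
    }
    regfree(&preg);
    return rc;
}
```

A caller would then treat 0 as a match, REG_NOMATCH as an ordinary miss, and
anything else as an error whose text is in errbuf.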

> +.El
>  .Pp
>  If
>  .Dv REG_NOSUB
[...]
> -REs are anchors, not ordinary characters.
> -.Sh DIAGNOSTICS
> -Non-zero error codes from
> +thus all of them are thread-safe.
> +.Sh RETURN VALUES
> +Non-zero error codes from the
>  .Fn regcomp
>  and
>  .Fn regexec
> +family of functions
>  include the following:
>  .Pp
>  .Bl -tag -width REG_ECOLLATE -compact
> +.It Dv REG_OK
> +Operation successfully executed.
> +Synonim for 0,

"Synonym"

> +to provide better code readability.
>  .It Dv REG_NOMATCH
>  The
>  .Fn regexec
> -function
> -failed to match
> +functions

I think the singular "function" may actually still be right here,
since only one function returns REG_NOMATCH at a time.

> +failed to match.
>  .It Dv REG_BADPAT
> -invalid regular expression
> +Invalid regular expression.
> +This implementation only returns this code when the regular expression
> +passed to
> +.Fn regcomp
> +contains an illegal multibyte sequence.
>  .It Dv REG_ECOLLATE
> -invalid collating element
> +Invalid collating element.
> +Returned whenever equivalence classes or multicharacter collating elements
> +are used in a bracket expression.
> +.Pq They are not supported yet.
>  .It Dv REG_ECTYPE
> -invalid character class
> +Invalid character class name.
>  .It Dv REG_EESCAPE
> -.Ql \e
> -applied to unescapable character
> +The last character was a backslash.
>  .It Dv REG_ESUBREG
> -invalid backreference number
> +Invalid backreference number.
>  .It Dv REG_EBRACK
> -brackets
> +Brackets
>  .Ql "[ ]"
> -not balanced
> +not balanced.

I might do "are not balanced", to have a verb in the sentence.  The same
change would be needed in all of the following entries, too, though.

Thanks for updating the man page!

-Ben Kaduk


>  .It Dv REG_EPAREN
> -parentheses
> +Parentheses
>  .Ql "( )"
> -not balanced
> +not balanced.
>  .It Dv REG_EBRACE
> -braces
> +Braces
>  .Ql "{ }"
> -not balanced
> +not balanced.
>  .It Dv REG_BADBR
> -invalid repetition count(s) in
> -.Ql "{ }"
> +Invalid repetition count(s) in
> +.Ql "{ }" :
> +not a number, more than two numbers, first larger than second, or number too large.
>  .It Dv REG_ERANGE
> -invalid character range in
> -.Ql "[ ]"
> +Invalid character range in
> +.Ql "[ ]" ,
> +i.e. ending point is earlier in the collating order than the starting point.
>  .It Dv REG_ESPACE
> -ran out of memory
> +Out of memory.
>  .It Dv REG_BADRPT
> -.Ql ?\& ,
> -.Ql *\& ,
> -or
> -.Ql +\&
> -operand invalid
> -.It Dv REG_EMPTY
> -empty (sub)expression
> -.It Dv REG_ASSERT
> -cannot happen - you found a bug
> -.It Dv REG_INVARG
> -invalid argument, e.g.\& negative-length string
> -.It Dv REG_ILLSEQ
> -illegal byte sequence (bad multibyte character)
> +Invalid use of repetition operators: two or more repetition operators have been
> +chained in an undefined way.
>  .El
>  .Sh SEE ALSO
>  .Xr grep 1 ,

