svn commit: r267675 - head/lib/libc/regex

Bruce Evans brde at optusnet.com.au
Fri Jun 20 22:23:40 UTC 2014


> El 6/20/2014 1:23 PM, Stefan Farfeleder escribió:
>> On Fri, Jun 20, 2014 at 03:29:10PM +0000, Pedro F. Giffuni wrote:
>>> ...
>>> Log:
>>>    regex: Make use of reallocf().
>>> 
>>>    Use of reallocf is useful in libraries as we are not certain the
>>>    application will exit after NULL.
>>> ...
>>>    This somewhat reduces portability but if since you are building
>>>    this as part of libc it is likely you have our non-standard
>>>    reallocf(3) already.

It also somewhat increases namespace pollution.

>>> Modified: head/lib/libc/regex/regcomp.c
>>> ==============================================================================
>>> --- head/lib/libc/regex/regcomp.c	Fri Jun 20 13:26:49 2014 
>>> (r267674)
>>> +++ head/lib/libc/regex/regcomp.c	Fri Jun 20 15:29:09 2014 
>>> (r267675)
>>> @@ -1111,7 +1111,7 @@ allocset(struct parse *p)
>>>   {
>>>   	cset *cs, *ncs;
>>> 
>>> -	ncs = realloc(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
>>> +	ncs = reallocf(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
>>>   	if (ncs == NULL) {
>>>   		SETERROR(REG_ESPACE);
>>>   		return (NULL);
>>> @@ -1174,7 +1174,7 @@ CHadd(struct parse *p, cset *cs, wint_t
>>>   	if (ch < NC)
>>>   		cs->bmp[ch >> 3] |= 1 << (ch & 7);
>>>   	else {
>>> -		newwides = realloc(cs->wides, (cs->nwides + 1) *
>>> +		newwides = reallocf(cs->wides, (cs->nwides + 1) *
>>>   		    sizeof(*cs->wides));
>>>   		if (newwides == NULL) {
>>>   			SETERROR(REG_ESPACE);
>> 
>> I don't think these changes are OK. If reallocf() fails here, the
>> cs->wides pointer will be freed and later freeset() will call
>> free(cs->wides), probably crashing. The other cases are most probably
>> similar though I haven't examined them closely.
>> 
>
> OK ...
>
> I don't think there is any problem:
>
> If reallocf fails, newwides will be set to NULL and if free() is called it 
> doesn't do anything when the argument is NULL.
>
> Also freeset() is meant to be called to "free a now-unused set" and it is not 
> called within the library. I would think using a value when the allocation 
> has failed is a much more serious issue than attempting to fail to free it 
> after trying to use it. ;-).

It doesn't look OK to me.  It turns the careful use of newwides, etc.,
into garbage, and leaves a garbage pointer in cs->wides, etc., to cause
problems later.  It might work without this careful use of a temporary
variable, but even that is not clear.  Consider:

- start with a consistent data structure with some pointers to malloced
   storage in it; foo->p say
- simple reallocf() allocation strategy:
     foo->p = reallocf(foo->p, size)
     if (foo->p == NULL)
       return (ERROR);
   This leaves the data structure consistent if and only if the only thing
   in it relevant to p is p itself, with foo->p serving as a flag for
   validity.
- more complicated reallocf() allocation strategy:
     foo->p = reallocf(foo->p, size)
     if (foo->p == NULL) {
       foo->psize = 0;
       foo->pvalid = 0;
       foo->foovalid = 0;
       ...
       return (ERROR);
     }
   This still can't do things like cleaning up what foo->p points to, since
   reallocf() clobbered that.
This shows that reallocf() is only useful in simple cases.  In general, you
must keep the pointer valid to clean things up:
     newp = reallocf(foo->p, size)
     if (newp == NULL) {
       for (i = 0; i < foo->psize; i++)
 	free(foo->p[i]);
       foo->p = NULL;
       foo->psize = 0;
       foo->pvalid = 0;
       foo->foovalid = 0;
       ...
       return (ERROR);
     }

Of course, malloc never fails so all this error checking is more a source
of complexity and bugs than useful.

Re-quoting/recovering some context:

@ Modified: head/lib/libc/regex/regcomp.c
@ ==============================================================================
@ --- head/lib/libc/regex/regcomp.c	Fri Jun 20 13:26:49 2014	(r267674)
@ +++ head/lib/libc/regex/regcomp.c	Fri Jun 20 15:29:09 2014	(r267675)
@ @@ -1111,7 +1111,7 @@ allocset(struct parse *p)
@  {
@  	cset *cs, *ncs;
@ 
@ -	ncs = realloc(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
@ +	ncs = reallocf(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
@  	if (ncs == NULL) {
@  		SETERROR(REG_ESPACE);
@  		return (NULL);

This code shows that even Henry didn't know how to program memory allocation
in 1988.  The code was much worse in 4.4BSD-Lite2:

old@ 		if (p->g->sets == NULL)
old@ 			p->g->sets = (cset *)malloc(nc * sizeof(cset));
old@ 		else
old@ 			p->g->sets = (cset *)realloc((char *)p->g->sets,
old@ 							nc * sizeof(cset));

This just leaked memory.  It was improved by assigning to ncs and by
removing the pre-C90 and C++ casts.  Now it is unimproved by turning the
careful use of ncs into garbage and leaving garbage in *p.

old@ 		if (p->g->setbits == NULL)
old@ 			p->g->setbits = (uch *)malloc(nbytes);
old@ 		else {
old@ 			p->g->setbits = (uch *)realloc((char *)p->g->setbits,
old@ 								nbytes);
old@ 			/* xxx this isn't right if setbits is now NULL */
old@ 			for (i = 0; i < no; i++)
old@ 				p->g->sets[i].ptr = p->g->setbits + css*(i/CHAR_BIT);
old@ 		}

Honest memory mismanagement.  It just assumes that malloc() and realloc()
can't fail.  In -current, the function is much simpler and doesn't have
this code.  I think part of the simplication is to use realloc() of NULL
instead of malloc() to start up.

@ @@ -1174,7 +1174,7 @@ CHadd(struct parse *p, cset *cs, wint_t 
@  	if (ch < NC)
@  		cs->bmp[ch >> 3] |= 1 << (ch & 7);
@  	else {
@ -		newwides = realloc(cs->wides, (cs->nwides + 1) *
@ +		newwides = reallocf(cs->wides, (cs->nwides + 1) *
@  		    sizeof(*cs->wides));
@  		if (newwides == NULL) {
@  			SETERROR(REG_ESPACE);
@ @@ -1203,7 +1203,7 @@ CHaddrange(struct parse *p, cset *cs, wi
@  		CHadd(p, cs, min);
@  	if (min >= max)
@  		return;
@ -	newranges = realloc(cs->ranges, (cs->nranges + 1) *
@ +	newranges = reallocf(cs->ranges, (cs->nranges + 1) *
@  	    sizeof(*cs->ranges));
@  	if (newranges == NULL) {
@  		SETERROR(REG_ESPACE);
@ @@ -1227,7 +1227,7 @@ CHaddtype(struct parse *p, cset *cs, wct
@  	for (i = 0; i < NC; i++)
@  		if (iswctype(i, wct))
@  			CHadd(p, cs, i);
@ -	newtypes = realloc(cs->types, (cs->ntypes + 1) *
@ +	newtypes = reallocf(cs->types, (cs->ntypes + 1) *
@  	    sizeof(*cs->types));
@  	if (newtypes == NULL) {
@  		SETERROR(REG_ESPACE);
@ @@ -1350,7 +1350,7 @@ enlarge(struct parse *p, sopno size)
@  	if (p->ssize >= size)
@  		return 1;
@ 
@ -	sp = (sop *)realloc(p->strip, size*sizeof(sop));
@ +	sp = (sop *)reallocf(p->strip, size*sizeof(sop));
@  	if (sp == NULL) {
@  		SETERROR(REG_ESPACE);
@  		return 0;

Similarly in 4 more cases, except most of them weren't in old versions.

@ @@ -1368,7 +1368,7 @@ static void
@  stripsnug(struct parse *p, struct re_guts *g)
@  {
@  	g->nstates = p->slen;
@ -	g->strip = (sop *)realloc((char *)p->strip, p->slen * sizeof(sop));
@ +	g->strip = (sop *)reallocf((char *)p->strip, p->slen * sizeof(sop));
@  	if (g->strip == NULL) {
@  		SETERROR(REG_ESPACE);
@  		g->strip = p->strip;

This was the only realloc() that had not been cleaned up, so using
reallocf() has a chance of improving it.  It still has the pre-C90 and
C++ casts.  But there is an obvious new bug visible without reading more
than the patch context:
- the local variable might not be needed now, since the variable assigned
   to, g->strip, is different from the variable realloced, p->strip
- on realloc failure, p->strip becomes invalid
- the error handling is completely broken.  It points g->strip at the old
   (now deallocated) storage.  This is immediate use of the garbage pointer
   created by using reallocf().

Clearly, only realloc() code of the form "p = realloc(p, size);", where the
pointer assigned to is the same as the pointer realloced, can be improved
by blindly converting it to use reallocf().

The last function is most interesting.  It seems to be to reduce the size.
So an allocation failure is even less likely than usually, except lots of
small allocations and reallocations are more likely to waste space than
save it.  But if failure occurs it is almost harmless, and the error
handling of keeping using the previous allocation was good.  Maybe Henry
knew how to program memory allocation after all.  (The last function
survived previously-unchanged from 4.4BSDLite-2 except for removing K&R
support and 'register'.)

Bruce


More information about the svn-src-all mailing list