svn commit: r265367 - head/lib/libc/regex

Mon May 5 22:12:17 UTC 2014

On 5 May 2014, at 22:51, Andrey Chernov <ache at freebsd.org> wrote:

> For standard malloc/realloc interface it is up to the caller to check
> n*size not overflows. You must trust caller already does such check.

Do a search of the CVE database sometime to see how well placed that trust generally is.  Or even look at the code in question, where none of the realloc() or malloc() calls does overflow checking.

> Using calloc() to enforce it instead of caller is semantically wrong,

Relying on a standard function to behave according to the standard is semantically wrong?

> and especially strange when the caller is standard C library under your
> control.

I don't follow this.  If libc can't rely on standards conformance from itself then other code stands no chance.

> It was unclear what type of ckecking you mean initially

You mean when I said 'the overflow-checking behaviour of calloc'?  I'm sorry, but I'm not sure how I could have made that clearer.

> and confirm my
> statement that such code is hard to understand.

I disagree.  Favouring calloc() over malloc() unless profiling indicates that calloc() is a bottleneck has been recommended practice for a *very* long time and I'm honestly surprised to encounter C programmers who have not come across the advice.  

> Even if it is for
> arithmetic overflow, it is still semantically incorrect, see my other
> answer.

Your other answer did not say *why* you think it's 'semantically incorrect'.  The standard requires calloc() to do overflow checking and that is the reason for its use in the overwhelming number of cases.

> Main purpose of calloc is to zero memory, not to check its
> argument, so its argument checking is side effect. It should be
> implemented by the caller (as I already answer) and not by the price of
> zeroing.

It is unfortunate that the zeroing and the overflow checking were conflated in the standard, but that certainly doesn't mean that it is the only purpose of calloc.  

If you want to argue that the price of zeroing is too high, then I would like to see some profiling data to back it up.  Between the cost of performing an allocation and the cost of doing a regex search, I'd be surprised if the cost of a bzero() were not in the noise.  To offset this, you'd be increasing i-cache usage at every malloc() call site by wrapping it in an overflow check (if you want the code to be *correct* as well as fast), which is likely to be a bigger hit.  

The reason that calloc() does zeroing in the first place rather than just having malloc() followed by memset() / bzero() is that the memory that malloc() gets from the kernel is already zero'd, and so the 'price' for the zeroing is often nothing.

David

P.S. A quick look at Coverity shows 4 other bugs in this file, one of which looks like it might actually be serious.