is strlen()'s read-4-bytes-ahead a standard?

Fri Jul 16 09:27:47 UTC 2010

deeptech71 at gmail.com <deeptech71 at gmail.com> wrote:
 > Xin LI wrote:
 > > On 2010/07/15 15:38, deeptech71 at gmail.com wrote:
 > > > Some C implementations use the read-4-bytes-ahead technique to speed
 > > > up strlen(). Does the C standard state anything about strlen() being
 > > > allowed to read past the terminating zero?
 > > 
 > > It's not 4-bytes-ahead, but read a whole (aligned) word at one time.
 > > 
 > > I think C standard does not dictate in this detail.
 > 
 > OK, can anyone confirm this?

When Xin LI states it, it doesn't need confirmation.  ;-)

You can look it up yourself: it's in section 7.21.6.3
(page 333) of ISO/IEC 9899:1999, a.k.a. "C99".
It only states that "The strlen function computes the length
of the string" and "The strlen function returns the number
of characters that precede the terminating null character".
Nothing more.

 > > But why?
 > 
 > Just wondering.

There's no reason not to read the string as aligned words.
The page size is a multiple of the word size, so an aligned
word never straddles a page boundary.  That means there's no
risk of accidentally touching the next VM page after the end
of the string.
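
To illustrate the idea, here is a minimal sketch of a word-at-a-time
length scan.  It is not the actual libc code; the name word_strlen and
the choice of a 64-bit word are assumptions made for this example.
Note also that reading bytes beyond the terminator is something the
library can do because it is part of the implementation and knows the
platform; ISO C does not promise that portable user code may do so.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not the FreeBSD libc implementation. */
size_t word_strlen(const char *s)
{
    const char *p = s;

    /* Advance bytewise until p is aligned to the word size. */
    while (((uintptr_t)p % sizeof(uint64_t)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /*
     * Read one aligned word at a time.  The load may include bytes
     * after the terminator, but an aligned word lies entirely
     * within a single page (assumption: page size is a multiple
     * of the word size).
     */
    for (;;) {
        uint64_t w;

        memcpy(&w, p, sizeof w);
        /* Nonzero if and only if at least one byte of w is zero. */
        if (((w - 0x0101010101010101ULL) & ~w &
            0x8080808080808080ULL) != 0)
            break;
        p += sizeof w;
    }

    /* The terminator is somewhere in this word; find it bytewise. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The unaligned prefix and the final word are still scanned byte by
byte, so for short strings the extra setup may cost more than it
saves.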

On the other hand, I don't think it is clear that doing
this for strlen() would be a performance win in every
situation.

BTW, some languages (and also some string libraries for C)
store the length separately for every string, so you don't
have to iterate through the whole string to get its length.
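
A minimal sketch of such a counted string in C might look like the
following.  The struct and function names (lstring, lstring_init and
so on) are made up for this example and do not refer to any
particular library.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type, for illustration only. */
struct lstring {
    size_t len;   /* number of characters, excluding the terminator */
    char  *data;  /* still NUL-terminated so it works with C APIs */
};

/* Copy src into ls; returns 0 on success, -1 if allocation fails. */
int lstring_init(struct lstring *ls, const char *src)
{
    size_t n = strlen(src);   /* the string is scanned once, here */

    ls->data = malloc(n + 1);
    if (ls->data == NULL)
        return -1;
    memcpy(ls->data, src, n + 1);
    ls->len = n;
    return 0;
}

/* Constant time: no scan of the string is needed. */
size_t lstring_length(const struct lstring *ls)
{
    return ls->len;
}

void lstring_free(struct lstring *ls)
{
    free(ls->data);
    ls->data = NULL;
    ls->len = 0;
}
```

The price is that every operation that modifies the string has to
keep len up to date.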

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich registry court, HRA 74606.  Management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Munich registry
court, HRB 125758.  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

I suggested holding a "Python Object Oriented Programming Seminar",
but the acronym was unpopular.
        -- Joseph Strout