misc/66941: Unacceptable stringstream performance

Jonathan Wakely jwakely at mintel.com
Thu May 20 06:40:19 PDT 2004


>Number:         66941
>Category:       misc
>Synopsis:       Unacceptable stringstream performance
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu May 20 06:40:19 PDT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Jonathan Wakely
>Release:        4.9
>Organization:
Mintel International
>Environment:
FreeBSD cartman.mintel.co.uk 4.9-STABLE-20040212-SESNAP FreeBSD 4.9-STABLE-20040212-SESNAP #1: Mon Feb 16 10:48:30 GMT 2004     jason at cartman.mintel.co.uk:/usr/obj/usr/src/sys/CARTMAN  i386

>Description:
      The /usr/include/g++/sstream header provided with FreeBSD's GCC 2.95.4 performs very badly. Every character written to the stream makes the buffer allocate room for one extra character, copy the existing buffer contents, and append the new character. Writing n characters therefore copies on the order of n*n/2 characters in total, so once the buffer gets large the overhead of reallocating and copying for every single character becomes enormous.

The C++ standard requires that the appends happen in amortised constant time. This implies the buffer should grow exponentially, so that the cost of reallocating and copying is incurred less and less often as the buffer grows.
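
To make the difference concrete, here is a minimal sketch that counts how many characters get copied under each growth policy. It is a model only, not the library code: the function names are hypothetical, and it assumes that every reallocation copies the whole current contents and that the exponential policy starts at a capacity of 1 and doubles.

```cpp
// Model only: counts characters copied, allocates nothing.

// Characters copied while appending n characters one at a time when the
// buffer grows by exactly one character on every append.
unsigned long long copies_grow_by_one(unsigned long long n)
{
    unsigned long long copied = 0;
    for (unsigned long long size = 0; size < n; ++size)
        copied += size;   // the existing contents are copied on every append
    return copied;
}

// Characters copied for the same n appends when the capacity doubles each
// time the buffer is full (assumed starting capacity: 1).
unsigned long long copies_doubling(unsigned long long n)
{
    unsigned long long copied = 0;
    unsigned long long capacity = 1;
    for (unsigned long long size = 0; size < n; ++size)
    {
        if (size == capacity)
        {
            copied += size;   // copy only when the spare capacity runs out
            capacity *= 2;
        }
    }
    return copied;
}
```

For n = 1000000 the first function returns 499999500000 and the second returns 1048575, i.e. roughly n*n/2 copies against roughly n.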

The performance makes it impractical to use <sstream> for any sizable chunk of data, forcing you to use the unsafe <strstream> instead.

>How-To-Repeat:
      Testcase:

#include <iostream>
#include <iomanip>
#include <sstream>
#include <strstream>
#include <time.h>

template <typename SStreamT>
    clock_t
    test(unsigned count)
    {
        SStreamT s;
        const clock_t start = ::clock();
        for (unsigned i = 0; i < count; ++i)
        {
            s << ' ';
        }
        return ::clock() - start;
    }

int main()
{
    using namespace std;

    const unsigned count[] = {10000, 100000, 1000000};
    cout << setw(18) << "iterations"
        << setw(18) << count[0]
        << setw(18) << count[1]
        << setw(18) << count[2] << endl
        << setw(18) << "strstream"
        << setw(18) << test<strstream>(count[0])
        << setw(18) << test<strstream>(count[1])
        << setw(18) << test<strstream>(count[2]) << endl
        << setw(18) << "stringstream"
        << setw(18) << test<stringstream>(count[0])
        << setw(18) << test<stringstream>(count[1])
        << setw(18) << test<stringstream>(count[2]) << endl;
}

Running this on an unloaded 4-way Xeon gives (times are raw clock() ticks, not seconds):

        iterations             10000            100000           1000000
         strstream                 0                 1                 3
      stringstream                 2               503            129648

i.e. it takes roughly 1000s to write 1000000 characters to a buffer!

>Fix:
            I've been patching the file on all our development servers for months, without problems (except when OS upgrades overwrite the file with broken versions again). The patched version grows the buffer exponentially, separately tracking the unused capacity and only reallocating when that spare capacity is exhausted.
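
For readers who want the idea without reading the patch, the following is a rough sketch of a buffer that tracks its capacity separately from its size and reallocates only when the spare capacity is used up. It is not the patched <sstream> code: the class GrowingBuffer, its member names, the initial capacity of 16 and the doubling factor are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical illustration only; not the code from the patch.
class GrowingBuffer
{
public:
    GrowingBuffer() : data_(0), size_(0), capacity_(0), reallocations_(0) {}
    ~GrowingBuffer() { delete[] data_; }

    // Append one character, reallocating only if no spare capacity is left.
    void append(char c)
    {
        if (size_ == capacity_)
            grow();
        data_[size_++] = c;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    std::size_t reallocations() const { return reallocations_; }
    const char* data() const { return data_; }

private:
    GrowingBuffer(const GrowingBuffer&);            // not copyable
    GrowingBuffer& operator=(const GrowingBuffer&); // not assignable

    // Double the capacity (assumed policy; starts at 16) and copy once.
    void grow()
    {
        const std::size_t new_capacity = capacity_ ? capacity_ * 2 : 16;
        char* new_data = new char[new_capacity];
        if (size_ != 0)
            std::memcpy(new_data, data_, size_);
        delete[] data_;
        data_ = new_data;
        capacity_ = new_capacity;
        ++reallocations_;
    }

    char* data_;
    std::size_t size_;          // characters actually stored
    std::size_t capacity_;      // characters allocated
    std::size_t reallocations_; // how many times grow() has run
};
```

Under these assumptions, appending 1000000 characters triggers 17 reallocations and ends with a capacity of 1048576, instead of one reallocation per character.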

With my patch the above testcase produces, on the same system:
        iterations             10000            100000           1000000
         strstream                 0                 0                 4
      stringstream                 0                 2                15

I'll attach the patch to this PR.
>Release-Note:
>Audit-Trail:
>Unformatted:

