[Bug 224160] [patch] wc -c is slow

bugzilla-noreply at freebsd.org
Fri Dec 8 14:34:53 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224160

Conrad Meyer <cem at freebsd.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
             Status|New                         |In Progress
            Summary|wc -c is slow               |[patch] wc -c is slow
           Assignee|freebsd-bugs at FreeBSD.org    |cem at freebsd.org

--- Comment #2 from Conrad Meyer <cem at freebsd.org> ---
wc(1) reads into a stack buffer of MAXBSIZE bytes (64 kB).  Increasing its size
may help, but a much larger buffer would have to move to the heap.
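A minimal sketch of reading through a heap-allocated buffer instead of a stack
array, written as a standalone C helper rather than a change to wc.c.  The name
count_bytes() and its signature are hypothetical; the bufsize parameter just
makes it easy to compare 64 kB against 4 MB.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the bytes readable from fd using a read buffer
 * of bufsize bytes allocated on the heap instead of the stack.
 * Returns 0 on success and -1 on error (errno is preserved).
 */
int
count_bytes(int fd, size_t bufsize, unsigned long long *charct)
{
	unsigned char *buf;
	ssize_t len;
	int saved_errno;

	assert(bufsize > 0);
	*charct = 0;

	/* A multi-megabyte array is a poor fit for the stack; use malloc. */
	if ((buf = malloc(bufsize)) == NULL)
		return (-1);

	while ((len = read(fd, buf, bufsize)) != 0) {
		if (len == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal; retry */
			saved_errno = errno;
			free(buf);
			errno = saved_errno;
			return (-1);
		}
		*charct += (unsigned long long)len;
	}

	free(buf);
	return (0);
}
```

For example, count_bytes(STDIN_FILENO, 4 * 1024 * 1024, &n) counts stdin with
a 4 MB buffer; running a small driver around it under time(1) with different
sizes is one way to reproduce the kind of comparison in this comment.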

Second, there is a fast path for counting lines, and that same loop also counts
characters, but it is not used when wc is asked to count only characters!
Silly.  It is also not used when wc reads from stdin!  Stupid.
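Roughly how one read() loop can serve both cases, sketched under assumptions:
struct counts and count_fast() are hypothetical names, the newline scan uses
memchr() instead of wc's byte-by-byte loop, and longest-line tracking (llct) is
left out.  Because it only calls read(), the same loop works for STDIN_FILENO
and for pipes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct counts {
	unsigned long long lines;
	unsigned long long chars;
};

/*
 * Fast path: count bytes and, if doline is set, newlines.  Works on any
 * descriptor, including STDIN_FILENO, since it only uses read().
 * Returns 0 on success, -1 on read error.
 */
int
count_fast(int fd, unsigned char *buf, size_t bufsize, bool doline,
    struct counts *c)
{
	ssize_t len;

	assert(bufsize > 0);
	c->lines = c->chars = 0;
	while ((len = read(fd, buf, bufsize)) != 0) {
		if (len == -1) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		/* The character count is just the byte total. */
		c->chars += (unsigned long long)len;
		if (!doline)
			continue;	/* -c only: skip the newline scan */
		/* memchr() jumps from one newline to the next. */
		for (unsigned char *p = buf, *end = buf + len;
		    (p = memchr(p, '\n', (size_t)(end - p))) != NULL; p++)
			c->lines++;
	}
	return (0);
}
```

With doline false (wc -c alone) no per-byte work is done at all, which is the
case the patch routes into the fast path.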

Just enabling that fast path for stdin and for character-only counts gives much
better results, comparable to GNU wc:

 2097152000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.01s user 0.43s system 45% cpu 0.964 total

Bumping the buffer size to 4 MB roughly halves system time (3.99s to 1.90s) and
total time.  (Note that the dd input size was increased 10x for these runs.)

Before:
 20971520000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.14s user 3.99s system 42% cpu 9.653 total
After:
 20971520000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.12s user 1.90s system 40% cpu 4.954 total

GNU wc is actually worse:
20971520000
gwc -c  0.21s user 2.91s system 48% cpu 6.490 total


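Here is the proof-of-concept patch (whitespace-only changes elided with diff -w
for legibility).  Note that it leaks memory: buf is malloc()ed on every call to
cnt() and never freed, and the malloc() result is not checked.  A 4 MB buffer
may also be totally inappropriate for small devices.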

--- a/usr.bin/wc/wc.c
+++ b/usr.bin/wc/wc.c
@@ -199,15 +199,17 @@ cnt(const char *file)
        size_t clen;
        short gotsp;
        u_char *p;
-       u_char buf[MAXBSIZE];
+       u_char *buf;
        wchar_t wch;
        mbstate_t mbs;

+#define MY_BUF_SIZE (4 * 1024 * 1024)
+       buf = malloc(MY_BUF_SIZE);
+
        linect = wordct = charct = llct = tmpll = 0;
        if (file == NULL)
                fd = STDIN_FILENO;
-       else {
-               if ((fd = open(file, O_RDONLY, 0)) < 0) {
+       else if ((fd = open(file, O_RDONLY, 0)) < 0) {
                xo_warn("%s: open", file);
                return (1);
        }
@@ -218,8 +220,8 @@ cnt(const char *file)
         * lines than to get words, since the word count requires some
         * logic.
         */
-               if (doline) {
-                       while ((len = read(fd, buf, MAXBSIZE))) {
+       if (doline || dochar) {
+               while ((len = read(fd, buf, MY_BUF_SIZE))) {
                        if (len == -1) {
                                xo_warn("%s: read", file);
                                (void)close(fd);
@@ -230,6 +232,7 @@ cnt(const char *file)
                                    llct);
                        }
                        charct += len;
+                       if (doline) {
                                for (p = buf; len--; ++p)
                                        if (*p == '\n') {
                                                if (tmpll > llct)
@@ -239,7 +242,9 @@ cnt(const char *file)
                                        } else
                                                tmpll++;
                        }
+               }
                reset_siginfo();
+               if (doline)
                        tlinect += linect;
                if (dochar)
                        tcharct += charct;
@@ -270,13 +275,12 @@ cnt(const char *file)
                        return (0);
                }
        }
-       }

        /* Do it the hard way... */
 word:  gotsp = 1;
        warned = 0;
        memset(&mbs, 0, sizeof(mbs));
-       while ((len = read(fd, buf, MAXBSIZE)) != 0) {
+       while ((len = read(fd, buf, MY_BUF_SIZE)) != 0) {
                if (len == -1) {
                        xo_warn("%s: read", file != NULL ? file : "stdin");
                        (void)close(fd);
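
The two caveats above (the per-call leak and the fixed 4 MB size) could be
handled along these lines.  This is a hypothetical sketch, not part of the
posted patch: get_read_buf(), free_read_buf() and both size constants are
made-up names, and falling back only when malloc() fails is an assumption; a
smaller compile-time default may suit small devices better.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sizes: try a large buffer first, and fall back to 64 kB
 * (the old MAXBSIZE stack buffer size) if that allocation fails.
 */
#define BIG_BUF_SIZE	(4 * 1024 * 1024)
#define SMALL_BUF_SIZE	(64 * 1024)

static unsigned char *readbuf;	/* shared by every cnt()-style call */
static size_t readbuf_size;

/*
 * Return the read buffer, allocating it on first use and reusing it for
 * later files so nothing leaks per file.  Returns NULL if even the small
 * allocation fails.
 */
unsigned char *
get_read_buf(size_t *sizep)
{
	assert(sizep != NULL);
	if (readbuf == NULL) {
		readbuf_size = BIG_BUF_SIZE;
		readbuf = malloc(readbuf_size);
		if (readbuf == NULL) {
			readbuf_size = SMALL_BUF_SIZE;
			readbuf = malloc(readbuf_size);
			if (readbuf == NULL)
				return (NULL);
		}
	}
	*sizep = readbuf_size;
	return (readbuf);
}

/* Release the shared buffer, e.g. before exit. */
void
free_read_buf(void)
{
	free(readbuf);
	readbuf = NULL;
	readbuf_size = 0;
}
```

Each cnt() call would then use the returned pointer and size in place of buf
and MY_BUF_SIZE, and the program frees the buffer once at the end.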


