bin/106734: SSE2 optimization for bzip2/libbz2
Mikhail T.
mi at aldan.algebra.com
Thu Dec 14 13:53:48 PST 2006
>Number: 106734
>Category: bin
>Synopsis: SSE2 optimization for bzip2/libbz2
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: change-request
>Submitter-Id: current-users
>Arrival-Date: Thu Dec 14 21:50:06 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator: Mikhail T.
>Release: FreeBSD 6.2-PRERELEASE amd64
>Organization:
Virtual Estates, Inc.
>Environment:
Intel and AMD chips with SSE2 instructions.
>Description:
The patch below makes bzip2's blocksort routines use SSE2 registers
to compare 16 bytes at a time.
On both the i386 and amd64 machines I tested, the performance improvement
ranges from 5% for already-compressed (.gz) files to 20% for
highly compressible system logs.
The compressed files are byte-for-byte identical with those produced
by the original bzip2.
The changes are ifdef-ed by __SSE2__ and rely on the intrinsics
available in the GNU, Intel, and Microsoft compilers.
No changes to Makefile(s) are necessary -- when targeting an
SSE2-capable CPU (e.g. ``-march=opteron'' or ``-march=pentium4''),
__SSE2__ is defined by the compiler.
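For illustration only (this is not code from the patch), a 16-byte
comparison guarded by __SSE2__ might look roughly like the sketch below.
The helper name first_diff is hypothetical; it returns the index of the
first differing byte, and falls back to a plain byte loop when __SSE2__
is not defined or fewer than 16 bytes remain.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/*
 * Hypothetical helper, not part of bzip2 or of the patch: return the
 * index of the first byte at which a[0..n) and b[0..n) differ, or n if
 * the two ranges are identical.
 */
static size_t
first_diff(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;

#ifdef __SSE2__
    /* Compare 16 bytes per iteration using unaligned loads. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One mask bit per byte; a set bit means the bytes are equal. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));

        if (mask != 0xFFFFu) {
            /* Locate the lowest cleared bit, i.e. the first mismatch. */
            unsigned diff = ~mask & 0xFFFFu;
            size_t k = 0;

            while ((diff & 1u) == 0) {
                diff >>= 1;
                k++;
            }
            return i + k;
        }
    }
#endif

    /* Scalar tail, and the complete fallback without SSE2. */
    for (; i < n; i++)
        if (a[i] != b[i])
            return i;
    return n;
}
```

Because the SSE2 path and the scalar path return the same index, a
caller that then compares a[i] with b[i] gets the same ordering either
way, which is the property needed for the output to stay identical.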
>How-To-Repeat:
>Fix:
The patch is available from http://aldan.algebra.com/~mi/bz/
The patch is not FreeBSD-specific, but it was developed, tested, and timed
on FreeBSD 6.x on both i386 and amd64.
Feedback welcome.
>Release-Note:
>Audit-Trail:
>Unformatted: