bin/127912: In theory, uncompress(1) may crash and SEGV

Tue Oct 7 09:20:02 UTC 2008

>Number:         127912
>Category:       bin
>Synopsis:       In theory, uncompress(1) may crash and SEGV
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 07 09:20:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     David Jones
>Release:        
>Organization:
Ravenbrook Limited
>Environment:
>Description:
I have found a problem with the uncompress utility, but not by running it, but by examining the source code.  There seems to be a possibility that uncompress will SEGV if calloc places an allocated block in a very unfortunate place.

Follows is a backgrounder of the source code, then the problem.
Byte is assumed to be 8-bits in the following discussion.

uncompress is a standard Unix utility.  On BSD systems the command line utility is a wrapper around the zopen compressed file IO interface.  The core code is in zopen.c.

That code is available here:
http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.bin/compress/zopen.c?rev=1.12
I use that version for reference when mentioning line numbers.

We will be examining the routines responsible for compressed input.  These are zread on line 453 and getcode on line 568.  The input consists of a series of codes.  Each code is a number stored in a certain number of bits.  Initially codes are 9 bits wide, but as the table used for uncompressing grows the codes get wider too.  The variable "n_bits" specifies the number of bits used by codes at the current point in the input file.  It is between 9 and 16.  In the disk file codes are packed into bytes, the getcode routine is responsible for unpacking them, effectively converting bytes to codes.

getcode proceeds by reading into the buffer gbuf the next n_bits bytes of the input (see line 592).  Note that this buffer contains exactly 8 codes of length n_bits.  Near the beginning of getcode (after the buffer has been refilled if necessary), at line 599, the variable roffset contains the number of bits of buffer already consumed by previous calls to getcode; it is necessarily a multiple of n_bits up to 7*n_bits.  Starting at line 600 and throughout the rest of the routine the variable bits contains the number of bits yet to be consumed in constructing the next code.  A code is between 9 and 16 bits wide, it therefore requires either 2 or 3 bytes to be read to extract a code.

getcode uses the variable gcode to construct the code in either 2 or 3 parts.  The bytes from the buffer gbuf are read on lines 607, 613, 619.  The read on line 613 is conditional on there being at least 8 more bits of code that need to be read.  The 3 reads are as follows: first, read sufficient bits to take us up to the next byte boundary (between 1 and 8); second, if there are at least 8 more bits required, read the entire next byte; third, read the remaining bits (between 0 and 7).  Note the third read on line 619, it occurs even when bits is 0 and no more bits are required; in this case the gcode variable does not change (because rmask[0] is 0), but a byte is read nonetheless.

The problem

Consider the case when n_bits is 16; this is its maximum permitted value and we would expect it to be achieved for any sufficiently large file.  In this case the codes in the gbuf buffer are always byte aligned (roffset is a multiple of 16 and hence a multiple of 8).  Line 607 will read 8 bits and advance the bp pointer; line 613 will read 8 bits (because there are at least 8 bits needed) and advance the bp pointer; line 619 will read the byte at the bp pointer but throw it away.  Nonetheless the byte at the bp pointer is read.  Where does this pointer point?  For the first 7 codes read from the gbuf buffer it will simply point at the first byte of the next code.  When reading the last code from the gbuf buffer, bp will point one past the end of the buffer.  Reading one past the end of the buffer is undefined behaviour in C; however, we all know that what will happen is that simply the next byte will read and its value ignored.

Now it just so happens that this buffer, gbuf, is at the end of an object of type struct s_zstate, see lines 101 through to 143 (the buffer is declared on line 140, gbuf is aliased to it via a macro defined on line 177).  Such an object is allocated from dynamic memory on line 696 with a call to calloc.  If calloc places its returned memory block at the high-end of a page and with no mapped memory on the next page in memory then reading on past the end of the buffer will SEGV.

Conclusion

On some systems in some circumstances most invocations of compress will SEGV.  It is necessary that a particular block of memory obtained from calloc be placed so that the next byte of memory is unmapped; for example, at the highest address possible on a page with the following page left unmapped.

>How-To-Repeat:
Cannot cause it to fail once, never mind repeatedly. ;)

Is this a problem?  Is it exploitable?  I have no idea.
>Fix:
Prefix line 619 with "if(bits)"

>Release-Note:
>Audit-Trail:
>Unformatted: