Mentor for C self study wanted

Tue Oct 23 15:58:33 PDT 2007

Am Dienstag, 23. Oktober 2007 23:24:09 schrieb Harald Schmalzbauer:
> #include <stdio.h>
>
> void main()
> {
>   short nnote;
>
>   // Numerischen Notenwert einlesen
>   printf("Bitte numerischen Schulnotenwert eingeben: ");
>   scanf("%d",&nnote);

man 3 scanf (most important thing to look at with any such problem is the 
C-library documentation, which is excellent on FreeBSD) says that for "%d" 
the passed pointer has to be a pointer to "integer", which &nnote is not. 
&nnote is a pointer to short, which points to 2 bytes, whereas a pointer to 
integer is a pointer to 4 bytes of storage.

Generally, nnote is reserved by the compiler on the stack (as it's a local 
variable) with two bytes (but this depends on your platform), and &nnote 
points to the beginning of this area.

As you are probably running on a little-endian architecture, the layout that 
scanf presumes is (from low to high):

---> increasing addresses
lsbyte 2 3 msbyte
^
|-- &nnote points here

of which only the first two are interpreted as nnote by the rest of the 
program; the upper two are different stack content (probably a return address 
to the C initialization code calling main(), or a pushed stack pointer, or 
such, as your procedure defines no other locals, see below).

Now, when scanf assigns the four bytes, it'll properly enter the lower two 
bytes of the integer into "lsbyte 2" (which is nnote, in the same byte 
order), but overwrite two bytes that are above it.

When main() finishes, the (now broken) saved address (of which "3 msbyte" is 
the lower half) is popped, which leads to the SIGSEGV you're seeing.

In case you were on big-endian, the result would be different (i.e., the order 
would be reversed, so that nnote would always be zero or minus one in case 
you entered small integral values in terms of absolute value), but 
effectively, the return address would be overwritten as well, breaking it.

This is effectively what can be called a buffer-overflow.

Just to finish this: the proper format would be "%hd", for which the flag "h" 
signifies that the pointer is a pointer to a "short int", also documented in 
man 3 scanf.

Why aren't you seeing this behaviour with printf (i.e., why can you pass a 
short but still specify "%d")? Because C defines that functions that take a 
variable number of arguments (of which printf is one such) get each argument 
as type "long" (the type that's at least as big as a pointer on the current 
platform), so when passing a short as argument to a var-args function, the 
C-compiler inserts code which makes sure that the value is promoted to a long 
in the argument stack for printf. scanf is also a varargs function, but 
you're not passing the value of nnote, but rather a pointer to it, which 
(should) already be as wide as a long.

Finally, looking at (parts of) the assembly that gcc generates (on a 
little-endian i386 machine):

.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp

; Set up the pointer to the local frame (EBP on i386). All locals are
; relative to EBP in a function.
        movl    %esp, %ebp

; ECX is the first (hidden) local.
        pushl   %ecx

        subl    $20, %esp
        subl    $12, %esp
        pushl   $.LC0
        call    printf
        addl    $16, %esp
        subl    $8, %esp

; Load the effective address of EBP-6, i.e., nnote, into EAX, which
; is pushed for scanf. scanf will thus write its output on EBP-6 up to
; EBP-3, where EBP-4 and EBP-3 are part of the value that's been
; pushed in the "pushl %ecx" above.
        leal    -6(%ebp), %eax

        pushl   %eax
        pushl   $.LC1
        call    scanf

...

; Restore the value at EBP-4 (i.e., the ECX that was pushed above) into
; ECX at function exit. This value has been corrupted by the integer
; assignment due to scanf.
        movl    -4(%ebp), %ecx

        leave

; Restore the stack pointer from the (invalidated) %ecx, i.e. produce a
; bogus stack pointer.
        leal    -4(%ecx), %esp

        ret

This produces a segfault, after the return to the C initialization code, 
simply because the stack pointer is totally bogus.

> P.S.:
> I found that declaring nnote as int soleves my problem, but I couldnÄt
> understand why.

Everything clear now? ;-)

-- 
Heiko Wundram
Product & Application Development
-------------------------------------
Office Germany - EXPO PARK HANNOVER

Beenic Networks GmbH
Mailänder Straße 2
30539 Hannover

Fon        +49 511 / 590 935 - 15
Fax        +49 511 / 590 935 - 29
Mobil      +49 172 / 437 3 734
Mail       wundram at beenic.net

Beenic Networks GmbH
-------------------------------------
Sitz der Gesellschaft: Hannover
Geschäftsführer: Jorge Delgado
Registernummer: HRB 61869
Registergericht: Amtsgericht Hannover