printf behaviour with illegal or malformed format string

Bruce Evans bde at zeta.org.au
Mon Dec 12 23:53:23 PST 2005


On Mon, 12 Dec 2005, Max Laier wrote:

> On Monday 12 December 2005 13:14, Poul-Henning Kamp wrote:
>> Obligatory bikeshed avoidance notice:
>>>> Please read all the way to the bottom of this email before you reply <<
>>
>> Given that illegal or malformed format strings to the printf family
>> of functions mostly result in harmless misformatting but occasionally
>> in coredumps and maybe sometimes in security issues, what is the
>> desired behaviour of libc's printf implementation ?

It is to emit nasal demons.  Failing that, a core dump is best.

>> A very good example is
>>
>> 	printf("%hf", 1.0);
>>
>> The 'h' modifier is not legal for %f format, and it is therefore a good
>> bet that the programmer was confused and we know that the program
>> contains at least one error.
>>
>> Our first line of defence against this kind of error is compile-time
>> checking by GCC, but we cannot rely on the string being sane in libc,
>> we still need to do error checking.

There is also fmtcheck(3).
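
As a rough illustration of the kind of run-time check under discussion, the
sketch below scans a format string and rejects an 'h' or 'hh' length modifier
in front of a floating-point conversion, as in "%hf".  The function name
format_h_modifier_ok() is hypothetical; it is not part of libc and is not how
fmtcheck(3) works.  It enforces only this one rule and ignores positional
arguments such as "%1$d".

```c
#include <string.h>

/*
 * Hypothetical helper, not part of any libc: returns 1 if every conversion
 * in fmt passes one narrow check, 0 otherwise.  The only rule enforced is
 * that the 'h' and 'hh' length modifiers are not combined with a
 * floating-point conversion, as in "%hf".
 */
int
format_h_modifier_ok(const char *fmt)
{
	const char *p = fmt;

	while ((p = strchr(p, '%')) != NULL) {
		p++;				/* skip the '%' */
		if (*p == '%') {		/* "%%" is a literal percent sign */
			p++;
			continue;
		}
		/* Skip flags, field width, and precision. */
		p += strspn(p, "-+ #0");
		p += strspn(p, "0123456789*");
		if (*p == '.') {
			p++;
			p += strspn(p, "0123456789*");
		}
		/* An 'h' or 'hh' modifier must not precede a floating conversion. */
		if (*p == 'h') {
			p += strspn(p, "h");
			if (*p != '\0' && strchr("aAeEfFgG", *p) != NULL)
				return (0);
		}
	}
	return (1);
}
```

A caller could run this on a format string before handing it to printf() and
then choose between a warning and abort(), which is the policy question
raised in this thread.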

>> The context for the above is that I'm working on adding extensibility
>> to our printf, compatible with the GLIBC (see 12.13 in the glibc
>> manual).  Obviously, gcc cannot compile-time check such extensions
>> for us, and therefore the question gains a bit more relevance.
>>
>> In an ideal world, the printf family of functions would have been
>> defined to return EINVAL in this case.  Almost nobody checks the
>> return values of printf-like functions however and those few that
>> do, all presume that it is an I/O error so such an approach is
>> unlikely to gain us much if anything.

A core dump is best for these reasons.  Also, no one uses fmtcheck(3),
and there is no way for standard-conforming programs to check
for errors caused by undefined behaviour, since undefined behaviour
includes undefined errors.

I think most checking belongs in the compiler and in fmtcheck(3).
printf() itself cannot detect most type errors without compiler
support that would basically involve passing it the types of all
args so that it could call fmtcheck(3).  Extensions should rarely
be needed for printf(), and it is not unreasonable to expect
applications that use the extensions to be more careful.  Extensions
might be more needed for printf-like interfaces that aren't really
printf.

>> Another alternative is to spit out the format string unformatted,
>> possibly with an attached notice, but this doesn't really seem to
>> help anybody either, but at least indicates what the problem is.
>>
>> I'm leaning towards doing what phkmalloc has migrated to over time:
>> Make a variable which can select between "normal/paranoia" and force
>> it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>>
>> If the variable is set, a bogus format string will result in abort(2).

This sometimes breaks defined behaviour.

>> If it is not set, the format string will be output unformatted in
>> the message "WARNING: Illegal printf() format string: \"...\"".

malloc()'s messages are better ("<progname>: error ...").

> I agree in principle but would like to ask if we need to revisit some of the
> error cases.  Especially with regard to 64bit porting there are some
> "artifacts" that might cause serious pain for ported applications if the
> above is adopted.
>
> Specifically, right now the following will warn "long long int format, int64_t
> arg (arg 2)" on our 64bit architectures while it is required on - at least -
> i386
>
> 	int64_t i = 1;
> 	printf("%lld", i);

Warning is the correct behaviour for this.  Code like this isn't required
on any arch, and is just wrong on arches where long long is longer than
int64_t.  Fortunately we have some arches where int64_t has a different
type than long long, so code like this causes a warning so it doesn't
get written so often.  rms added the warning to gcc 15-18 years ago after
I complained about the corresponding bad code for ints and long:

 	int i = 1;
 	printf("%ld", i);

This is just wrong on arches where long is longer than int.

> Many other platforms allow it for 64bit architectures as well.  As for all our
> 64bit architectures sizeof(long) == sizeof(long long) (as far as I am aware),
> I am not convinced this should be a (fatal) error.  There might be other
> similar cases.

int64_t = long != long long (although all the sizes are the same) is an
artifact.  We use the artifact mainly to detect errors so that many printf
formats don't need to be "fixed" yet again for 128-bit machines.
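
The difference between "same size" and "same type" can be made visible with
C11 _Generic, which selects on type rather than size.  This is a minimal
sketch under that assumption; INT_TYPE_NAME and int64_underlying_name() are
hypothetical names introduced only for illustration.  On a platform where
int64_t is a typedef for long, as described above for the 64-bit arches, the
helper would be expected to return "long"; where int64_t is long long, as on
i386, it would return "long long".

```c
#include <stdint.h>

/* Map an expression to the name of its signed integer type via C11 _Generic. */
#define	INT_TYPE_NAME(x) _Generic((x),		\
	int:		"int",			\
	long:		"long",			\
	long long:	"long long",		\
	default:	"other")

/* Hypothetical helper: names the type that int64_t is a typedef for. */
const char *
int64_underlying_name(void)
{
	return (INT_TYPE_NAME((int64_t)0));
}

/* long long is at least 64 bits wide, so it can always hold an int64_t. */
_Static_assert(sizeof(long long) >= sizeof(int64_t),
    "long long is narrower than int64_t");
```

The compiler's format checking keys on the type that _Generic reports, not on
sizeof, which is why "%lld" with an int64_t argument draws a warning even
when both types are 64 bits wide.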

> So the question is, how strict should this check be?  Are there cases where we
> are better off with a "just do it"-solution?

Just doing it is the correct method.  It requires the compiler to rewrite
the string (or replace the call to printf by calls to special formatting
functions).  E.g.,

 	int64_t i64 = 1;
 	long long ll = 1;
 	int i = 1;
 	printf("%I %I %I\n", i64, ll, i);

should produce the same code as:

 	int64_t i64 = 1;
 	long long ll = 1;
 	int i = 1;
 	printf("%" PRIi64 " %lld %d\n", i64, ll, i);

Message catalogs should probably use something like fmtcheck() to rewrite
the strings.

> As a community service, there is a right way to do this (according to C99):
>
> 	int64_t i = 1;
> 	printf("%" PRIi64 "\n", i);

This is a very wrong way to do this.  It makes the programmer handle details
that the compiler can handle better.

>
> but it's obvious this is not going to be adopted.  The other often used

Fortunately it is so ugly that no one uses it.

> workaround is:
>
> 	int64_t i = 1;
> 	printf("%jd\n", (intmax_t)i);

This is not a workaround, but the only reasonable way to do things (unless
%I exists).  It is easier to use than PRI*, and doesn't need changing if
the type of i is expanded.
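
A minimal sketch of the cast idiom, with the PRI* form alongside only for
comparison.  The helper names format_i64_jd() and format_i64_pri() are
hypothetical; snprintf() is used instead of printf() so that the result can
be inspected.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats v in decimal using "%jd" and a cast to
 * intmax_t.  The format string stays correct if the type of v is later
 * widened, since any signed integer type converts to intmax_t without loss.
 */
int
format_i64_jd(char *buf, size_t size, int64_t v)
{
	return (snprintf(buf, size, "%jd", (intmax_t)v));
}

/* Same value via the PRI* macro; tied to v being exactly int64_t. */
int
format_i64_pri(char *buf, size_t size, int64_t v)
{
	return (snprintf(buf, size, "%" PRId64, v));
}
```

Both helpers should yield identical text for any int64_t value; the
difference is only in what must be edited when the argument type changes.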

> or:
>
> 	printf("%lld\n", (long long)i);

This is a wrong way to do this.  It uses the long long mistake, and only
works because i has type int64_t and long long is at least as long as
int64_t (the latter is standard).

> which kind of reverts the idea behind using C99-types.
>
> Note that:
>
> 	printf("%jd\n", i);
>
> seems to work as well, but I am not sure this is correct.

It is incorrect.  It assumes that int64_t has the same _size_ as intmax_t,
and would cause a compile time error if int64_t doesn't have the same
_type_ as intmax_t.

We use intmax_t == int64_t on all arches, but this is an even weirder
artifact than int64_t != long long, since it makes the "maximum" type
"smaller" (same in size, but logically shorter) than long long:

     char < short < int < long = int64_t = intmax_t ~< long long

Bruce

