Anyone object to the following change in libc?
Harti Brandt
brandt at fokus.fraunhofer.de
Thu Oct 30 03:36:57 PST 2003
On Thu, 30 Oct 2003, Terry Lambert wrote:
TL>Harti Brandt wrote:
TL>> TL>Paragraph 6 of:
TL>> TL>
TL>> TL> http://www.opengroup.org/onlinepubs/007904975/functions/sscanf.html
TL>> TL>
TL>> TL>Implies that the lack of characters in the string following the
TL>> TL>conversion, due to failure in assignment, should result in an
TL>> TL>"Input failure". Note also that stdio.h defines EOF as -1.
TL>>
TL>> I fail to locate this paragraph. This interpretation would also imply
TL>> that scanf() always needs to return -1 whenever it cannot match a format
TL>> specifier.
TL>
TL> The fscanf() functions shall execute each directive of the
TL> format in turn. If a directive fails, as detailed below, the
TL> function shall return. Failures are described as input
TL> failures (due to the unavailability of input bytes) or
TL> matching failures (due to inappropriate input).
TL>
TL>It comes down to how you interpret the NUL byte at the end of the
TL>sscanf() input string. Is it an EOF? Or is it an unavailability of
TL>input bytes? The answer to the question picks which return value
TL>is correct.
Section 7.19.6.7 of N843 states:
"Reaching the end of the string is equivalent to encountering end-of-file
for the fscanf function."
Unfortunately this sentence is missing from POSIX, but it is clearly implied
by the POSIX reference to ISO C.
The next paragraph states:
"The sscanf function returns the value of the macro EOF if an input
failure occurs before any conversion."
Again: do we have a conversion? We do. Should we return EOF? No.
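
As a minimal sketch of the rule quoted above (not taken from the libc
sources; the helper name scan_two is hypothetical), the following C
fragment covers the two uncontroversial cases: a string that ends after
one successful conversion, and a string that ends before any conversion.

```c
#include <stdio.h>

/*
 * Hypothetical helper: try to read two ints from a string.
 * Returns whatever sscanf() returns, i.e. the number of items
 * assigned, or EOF if the string ended before any conversion.
 */
int
scan_two(const char *s, int *a, int *b)
{
	return sscanf(s, "%d%d", a, b);
}
```

With this helper, scan_two("123", &a, &b) should return 1 with a set to
123, because the end of the string is reached only after the first
conversion, while scan_two("", &a, &b) should return EOF.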
TL>
TL>
TL>> TL>I think it can be interpreted either way, still.
TL>>
TL>> You miss the section about RETURN VALUE: EOF is returned on a read error.
TL>> This is not an input error.
TL>
TL>How do I distinguish a "return value is -1 as an error result" from
TL>"return value is -1 as an EOF result"?
Well, I suppose that's the intention of having scanf() set errno
when it returns -1 in POSIX. Unfortunately POSIX fails to describe
the error codes. This is possibly fodder for the aardvark.
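
For a stream, as opposed to the string case discussed here, ISO C
already provides ferror() to tell a read error from a plain end-of-file
after fscanf() returns EOF. A minimal sketch follows; the enum and the
helper name read_int are hypothetical, and none of this applies to
sscanf(), which has no stream to query.

```c
#include <stdio.h>

/* Hypothetical result codes, for illustration only. */
enum scan_status {
	SCAN_OK,		/* one int was converted and assigned */
	SCAN_MISMATCH,		/* matching failure: input is not an int */
	SCAN_EOF,		/* input failure: end of file reached */
	SCAN_READ_ERROR		/* input failure: stream error indicator set */
};

/* Hypothetical helper: read one int and classify the outcome. */
enum scan_status
read_int(FILE *fp, int *out)
{
	int n = fscanf(fp, "%d", out);

	if (n == 1)
		return SCAN_OK;
	if (n == EOF)
		return ferror(fp) ? SCAN_READ_ERROR : SCAN_EOF;
	return SCAN_MISMATCH;
}
```

A caller can then branch on the status instead of guessing what a bare
-1 means.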
TL>
TL>
TL>> You should also read the very 1st paragraph. This clearly states that
TL>> ISO is the primary source of information and the ISO text is a lot
TL>> cleaner.
TL>
TL>No, that's not what it actually states; here's the paragraph:
TL>
TL> The functionality described on this reference page is
TL> aligned with the ISO C standard. Any conflict between
TL> the requirements described here and the ISO C standard
TL> is unintentional. This volume of IEEE Std 1003.1-2001
TL> defers to the ISO C standard.
TL>
TL>It says that any conflicts are unintentional, and their intent was
TL>to use different language for no good reason, rather than just
TL>copying it verbatim and removing any doubt. It does *NOT* say
TL>that no conflicts exist.
Yes. But I take the last sentence to mean that ISO C takes precedence
if a conflict exists.
TL>
TL>Also: In this context, which is IEEE 1003.1-2001, Issue 6, "the
TL>ISO C standard" refers to "c89", which is the version of the C
TL>standard that was in effect at the time that SVID IV was defined.
Line 107 of Austin TC-1:
"The c89 utility (which specified a compiler for the C Language specified
by the ISO/IEC 9899:1990 standard) has been replaced by a c99 utility
(which specifies a compiler for the C Language specified by the
ISO/IEC 9899:1999 standard)."
TL>If you need clarification on this issue, you should download the
TL>currently available version of the NIST/PCTS, which specifically
TL>requires you to compile with a c89 compiler, not one more recent.
TL>The same is true of The Open Group test suites which are available
TL>on the Internet.
TL>
TL>The version of the ISO C standard you are quoting from is *NOT*
TL>the c89 version.
Our sscanf() claims conformance to C99. So if we change the behaviour
we have to remove this claim.
TL>This makes interpretation ambiguous, since the text you are
TL>specifically referencing to get the 0 result is text that was
TL>added to the next version of the standard to clarify it.
TL>
TL>
TL>> I think it makes no sense to classify
TL>>
TL>> sscanf("123", "%*d%d", ...
TL>>
TL>> as an error, but
TL>>
TL>> sscanf("123", "%d%d", ...
TL>>
TL>> not, does it? Also, at least Solaris 9 returns -1 but fails to set
TL>> errno, which is simply a bug.
TL>
TL>It makes no sense to do conversions without assignment in the
TL>first place (IMO).
[... Stuff about sense removed (I was talking about what return
code makes sense, not whether calling sscanf makes sense) ...]
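
The suppressed-assignment call quoted above can be tried directly. This
is a sketch only, and both helper names are hypothetical. The result for
the input "123" is exactly what is in dispute, so no value is claimed
for it here: under the C99 reading argued in this message it would be 0,
while the behaviour reported for other systems in this thread is -1.

```c
#include <stdio.h>

/*
 * Hypothetical helper: skip one int (suppressed with '*'), then
 * read a second one into *b.  Returns sscanf()'s result unchanged.
 */
int
scan_skip_then_int(const char *s, int *b)
{
	return sscanf(s, "%*d%d", b);
}

/* Hypothetical helper: name the result for a report or test log. */
const char *
describe_result(int r)
{
	if (r == EOF)
		return "EOF";
	return r == 0 ? "zero items assigned" : "items assigned";
}
```

Printing describe_result(scan_skip_then_int("123", &b)) on a given
system shows which of the two interpretations its libc follows.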
TL>In any case, we are practically guaranteed that returning -1, as
TL>all other UNIX-like OS's currently do, would result in less source
TL>code breaking.
No coder in his right mind should have written code that depends
on this behaviour, given the ambiguous wording in the classical books,
man pages and pre-C99 standards. Also note that the reason for
this change request was that configuration scripts break, not applications.
If applications break, they should be fixed.
TL>In other words, conformance level has historically been dictated
TL>by what code is not broken, not what is technically permitted by
TL>the standards, if you language-lawyer them to death.
TL>
TL>To put it in IETF terms: "Be conservative in what you generate,
TL>and generous in what you accept".
This does not apply here because you cannot return -1 and 0 at the same
time. Adhering to a cleanly written standard and breaking a handful of
badly written autoconf scripts is clearly better than adhering to
undocumented historical behaviour. What will we do if Solaris 10
returns 0 in the above case? Change our code back?
harti
--
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt at fokus.fraunhofer.de, harti at freebsd.org