C99: Suggestions for style(9)

Tue Apr 28 12:32:03 UTC 2009

I like the part about using as many variables as possible because
of documentation and performance enhancements. I tend to like
the other changes as well..

On Sun, Apr 26, 2009 at 09:02:36AM +0200, Christoph Mallon wrote:
> Hi hackers@,
> 
> as some of you may have noticed, several years ago a new millenium 
> started and a decade ago there was a new C standard. HEAD recently 
> switched to C99 as default (actually gnu99, but that's rather close). So 
> it's finally time to re-examine wether style(9) needs to be accomodated.
> The diff with the proposed changes is attached. Below are the changes 
> with some further explanations. They are in the order as they appear in 
> style(9).
> Maybe using all of these changes is a bit too big change at once, but 
> I'd like some of them applied to style(9). So, please consider the 
> proposed changes individually and not a as a all-or-nothing package.
> 
> >-Do not put declarations
> >-inside blocks unless the routine is unusually complicated.
> >+Prefer declaring loop iterators in the for-statement to limit their scope.
> > .Bd -literal
> >-	for (; cnt < 15; cnt++) {
> >+	for (int cnt = 0; cnt < 15; cnt++) {
> [...]
> >-When declaring variables in functions declare them sorted by size,
> >-then in alphabetical order; multiple ones per line are okay.
> >+When declaring variables in functions declare them sorted in alphabetical 
> >order;
> >+prefer one declaration per line, especially when pointer declarators or 
> >+initializations are involved.
> > If a line overflows reuse the type keyword.
> > .Pp
> >-Be careful to not obfuscate the code by initializing variables in
> >-the declarations.
> >-Use this feature only thoughtfully.
> >-DO NOT use function calls in initializers.
> >+Prefer initializing variables right at their declaration.
> >+When the value is not supposed to change in the function, make the 
> >variable
> >+const.
> >+This makes it easier for a reader to identify the where the value of a 
> >variable
> >+originates.
> >+Do not reuse the same variable in a different context, declare a new 
> >variable.
> > .Bd -literal
> >-	struct foo one, *two;
> >-	double three;
> >-	int *four, five;
> >-	char *six, seven, eight, nine, ten, eleven, twelve;
> >+	struct foo *bar;
> >+	struct foo baz = { 42, "qux" };
> > 
> >-	four = myfunction();
> >+	FILE *const f = fopen("name", "r");
> >+	if (f == NULL)
> >+		err("Failed to open file");
> >+	/* We can safely assume that f is not changed anymore, even if the
> >+	 * function is a hundred lines long */
> 
> This change is to reduce the scope of local variables. For reasons why 
> this does not cost anything in terms of stack space, please see the last 
> change, which adds explanations about local variables.
> Smaller scopes and distinct variables for distinct contexts make it 
> easier for readers of the code to identify the def-use-chains of 
> variables, because there are less places with assignments to a variable 
> and the scope is smaller. Also, when a variable is initialised right at 
> its declaration and is not supposed to change, you can emphasize this by 
> making it const.
> I looked at older discussions regarding this topic and identified 
> several concerns, which were raised. I'll address them below:
> 
> - It does not add anything to the language.
> Well, yes, just like if, do, for, goto, the comma operator, compound 
> assignment (+=, ...), ++/-- and many other things. All you need to be 
> Turing complete is lambda calculus - there hardly can be less syntax. 
> But, like all mentioned constructs, it makes the code more concise.
> 
> - It leads to sloppy code. You could reuse another variable or should 
> think again whether you really need this variable.
> Reusing variables in different contexts is error prone and bad for 
> maintainability. You could reuse a variable, which is not as dead as you 
> thought. More local variables cost nothing, especially no stack space, 
> see the last proposed changed below. It is good to use more local 
> variables, because it gives the reader a name, which carries 
> information. Also it makes it easier for a reader (and the compiler!) to 
> identify, which expressions are the same. You could repeat 
> foo->long_name->even_longer_name[2 * i + 1] five times or just declare a 
> local variable and cache the value to make it easier to read.
> It also enables the compiler to produce better warnings: If you declare 
> a new variable, it is not initialised and if you do not initialise it on 
> all paths before a use, the compiler will warn you. But if you reuse an 
> older variable, it is already initialised by its former uses and so you 
> get no warning, but operate on garbage values (the old value from the 
> previous use).
> So it does not lead to slopyy code, but better code, which is easier to 
> comprehend, the compiler can better help you to prevent bugs, and as a 
> side effect the compiler can produce better code (aliasing is a major 
> problem for common subexpression elimination).
> 
> - You can determine the stack usage with all the variable declarations 
> at the top.
> This is not true. There is no relation between the declared variables 
> and the stack used. Especially, more stack than suggested by the 
> declarations can be used due to various optimisations - less space can 
> be used, too, of course.
> 
> - It is harder to find the declaration of a variable.
> Every editor, which is more advanced than ed, has facilities to jump to 
> the declaration of a variable (e.g. in vim it's "gd"). And most often 
> this is not necessary, because the declaration is just a few lines above 
> instead of all the way up at the top of the function, so no jumping 
> around is required at all.
> 
> - You could accidently shadow a variable.
> All modern compilers have warnings for this. FreeBSD uses -Wshadow for 
> GCC (and all relevant compilers understand this very option).
> 
> 
> >-Values in
> >-.Ic return
> >-statements should be enclosed in parentheses.
> >-.Pp
> 
> return with parentheses:
> Removed, because it does not improve maintainability in any way. There 
> is no source for confusion here, so the rule even contradicts the rule, 
> which states not to use redundant parentheses. Maybe, decades ago it was 
> just a workaround for a broken compiler, which does not exist anymore.
> 
> 
> >-Old-style function declarations look like this:
> >-.Bd -literal
> >-static char *
> >-function(a1, a2, fl, a4)
> >-	int a1, a2;	/* Declare ints, too, don't default them. */
> >-	float fl;	/* Beware double vs. float prototype differences. */
> >-	int a4;		/* List in order declared. */
> >-{
> >-.Ed
> >-.Pp
> >-Use ANSI function declarations unless you explicitly need K&R 
> >compatibility.
> >+Use ANSI function declarations.
> 
> Old-style function declarations:
> Removed, because there is no need to use them anymore. The C99 standard 
> considers them obsolescent[1], too. All headers use prototype 
> declarations, there's no reason to handle the function definitions 
> different. Also they are a source of bugs - or did you know that the 
> following is wrong:
>     void wrong(char /* sic! */);
>     void wrong(x) char x; {}
> And this is correct[2]:
>     void right(int /* sic! */);
>     void right(x) char x; {}
> (It's a bug in GCC that it does not complain about the former.)
> 
> 
> >-
> >-static void
> >-usage()
> >-{
> >-	/* Insert an empty line if the function has no local variables. */
> 
> Empty line when there are no local variables:
> Removed, because it does not improve readability and it actually hinders 
> readability for very short functions, e.g. static inline functions, 
> which just consist of one or two lines of code without any local 
> variables except for parameters. Further, it makes even less sense with 
> the changed recommendations for declarations.
> 
> 
> >+.Sh LOCAL VARIABLES
> >+Unaliased local variables are for free on modern compilers.
> >+They do not unconditionally use stack space.
> >+In fact there is no relation between their number and the used stack 
> >space.
> >+A local variable is guaranteed to be unaliased, if its address has not 
> >been
> >+taken (e.g. &i).
> >+Do not use the same local variable in different context just because it 
> >happens
> >+that the same type is needed.
> >+It does not improve the generated code in any way, but it makes it harder 
> >for a
> >+reader to determine all definitions and uses which belong together.
> >+Further, a new variable can be given a meaningful name (even if it is very
> >+short), which automatically serves as documentation and so improves
> >+comprehensibility.
> >+Especially avoid re-using variables, whose address has been taken:
> >+.Bd -literal
> >+	int i;
> >+	foo(&i);
> >+	printf("%d\\n", i);
> >+	for (i = 0; i != 10; ++i) {
> >+		/* BAD: i will be needlessly moved to and from memory in
> >+		 * the loop, because its address was taken before. This
> >+		 * results in larger and slower code.
> >+		 * Declare a new variable for the loop instead. */
> >+		printf("%d\\n", i);
> >+	}
> >+.Ed
> >+
> 
> Last, but definitely not least, I added this paragraph about the use of 
> local variables. This is to clarify, how today's compilers handle 
> unaliased local variables. There is absolutely no need to reuse them in 
> different contexts. Trying to optimise stack usage in this way is 
> misguided and is a source of bugs, when a reused variable is not as dead 
> as thought. For more reasons, please read the quoted diff.
> Maxim, you requested this paragraph: Does this addition suit you?
> 
> 
> Hopefully at least some of these suggestions are considered improvements 
> and are applied to style(9).
> 
> Regards
> 	Christoph
> 
> 
> [1] ISO/IEC 9899:1999 (E) ?6.11.6 and ?6.11.7
> [2] ISO/IEC 9899:1999 (E) ?6.7.5.3:15