bookmarks     readme     useme
 


            Linux kernel coding style

            This  is  a  short  document describing the preferred coding
            style for the linux kernel.  Coding style is very  personal,
            and  I  won't  _force_ my views on anybody, but this is what
            goes for anything that I have to be able  to  maintain,  and
            I'd  prefer  it  for most other things too.  Please at least
            consider the points made here.

            First off, I'd suggest printing out a copy of the GNU coding
            standards,  and  NOT  read it.  Burn them, it's a great sym-
            bolic gesture.

            Anyway, here goes:


            Chapter 1: Indentation

            Tabs are 8 characters, and  thus  indentations  are  also  8
            characters.   There  are  heretic movements that try to make
            indentations 4 (or even 2!)  characters deep,  and  that  is
            akin to trying to define the value of PI to be 3.

            Rationale:  The  whole idea behind indentation is to clearly
            define where a block of control starts and ends.  Especially
            when  you've  been  looking  at  your screen for 20 straight
            hours, you'll find it a lot easier to see how  the  indenta-
            tion works if you have large indentations.

            Now, some people will claim that having 8-character indenta-
            tions makes the code move too far to the right, and makes it
            hard  to read on a 80-character terminal screen.  The answer
            to that is that if you need more than 3 levels  of  indenta-
            tion, you're screwed anyway, and should fix your program.

            In  short,  8-char  indents  make things easier to read, and
            have the added benefit of warning you  when  you're  nesting
            your functions too deep.  Heed that warning.

            Don't  put  multiple  statements on a single line unless you
            have something to hide:

                 if (condition) do_this;
                    do_something_everytime;

            Outside of comments, documentation and  except  in  Kconfig,
            spaces are never used for indentation, and the above example
            is deliberately broken.

            Get a decent editor and don't leave whitespace at the end of
            lines.


            Chapter 2: Breaking long lines and strings

            Coding  style  is  all about readability and maintainability
            using commonly available tools.

            The limit on the length of lines is 80 columns and this is a
            hard limit.

            Statements longer than 80 columns will be broken into sensi-
            ble chunks.  Descendants are  always  substantially  shorter
            than  the  parent and are placed substantially to the right.
            The same applies to function headers with  a  long  argument
            list.  Long strings are as well broken into shorter strings.


                 void fun(int a, int b, int c)
                 {
                    if (condition)
                       printk( KERN_WARNING
                         "Warning this is a long printk with "
                         "3 parameters a: %u b: %u c: %u 0, a, b, c);
                    else
                       next_statement;
                 }


            Chapter 3: Placing Braces

            The other issue that always comes up in  C  styling  is  the
            placement  of braces.  Unlike the indent size, there are few
            technical reasons to choose one placement strategy over  the
            other, but the preferred way, as shown to us by the prophets
            Kernighan and Ritchie, is to put the opening brace  last  on
            the line, and put the closing brace first, thusly:

                 if (x is true) {
                     we do y
                 }


            However,  there  is one special case, namely functions: they
            have the opening brace at the beginning of  the  next  line,
            thus:

                 int function(int x)
                 {
                     body of function
                 }


            Heretic  people  all  over  the world have claimed that this
            inconsistency is ...  well ...  inconsistent, but all right-
            thinking  people  know  that (a) K&R are _right_ and (b) K&R
            are right.  Besides, functions are special anyway (you can't
            nest them in C).

            Note  that  the closing brace is empty on a line of its own,
            _except_ in the cases where it is followed by a continuation
            of  the same statement, ie a "while" in a do-statement or an
            "else" in an if-statement, like this:

                 do {
                      body of do-loop
                 } while (condition);

            and

                 if (x == y) {
                      ..
                 } else if (x > y) {
                      ...
                 } else {
                      ....
                 }


            Rationale: K&R.

            Also, note that this brace-placement also minimizes the num-
            ber  of  empty  (or almost empty) lines, without any loss of
            readability.  Thus, as  the  supply  of  new-lines  on  your
            screen  is  not a renewable resource (think 25-line terminal
            screens here), you have more empty lines to put comments on.


            Chapter 4: Naming

            C  is  a  Spartan  language,  and  so should your naming be.
            Unlike Modula-2 and Pascal programmers, C programmers do not
            use  cute  names  like ThisVariableIsATemporaryCounter.  A C
            programmer would call that variable  "tmp",  which  is  much
            easier  to write, and not the least more difficult to under-
            stand.

            HOWEVER, while mixed-case names are frowned  upon,  descrip-
            tive  names  for  global  variables  are  a must.  To call a
            global function "foo" is a shooting offense.

            GLOBAL variables (to be used only if you _really_ need them)
            need  to have descriptive names, as do global functions.  If
            you have a function that counts the number of active  users,
            you  should call that "count_active_users()" or similar, you
            should _not_ call it "cntusr()".

            Encoding the type of a function  into  the  name  (so-called
            Hungarian  notation)  is  brain damaged - the compiler knows
            the types anyway and can check those, and it  only  confuses
            the programmer.  No wonder MicroSoft makes buggy programs.

            LOCAL  variable names should be short, and to the point.  If
            you have some random integer loop counter, it should  proba-
            bly be called "i".  Calling it "loop_counter" is non-produc-
            tive, if there is no  chance  of  it  being  mis-understood.
            Similarly, "tmp" can be just about any type of variable that
            is used to hold a temporary value.

            If you are afraid to mix up your local variable  names,  you
            have  another  problem, which is called the function-growth-
            hormone-imbalance syndrome.  See next chapter.


            Chapter 5: Functions

            Functions should be short and sweet, and do just one  thing.
            They  should  fit  on  one  or  two  screenfuls of text (the
            ISO/ANSI screen size is 80x24, as we all know), and  do  one
            thing and do that well.

            The  maximum  length of a function is inversely proportional
            to the complexity and indentation level  of  that  function.
            So,  if you have a conceptually simple function that is just
            one long (but simple) case-statement, where you have  to  do
            lots  of  small things for a lot of different cases, it's OK
            to have a longer function.

            However, if you have a complex  function,  and  you  suspect
            that a less-than-gifted first-year high-school student might
            not even understand what the  function  is  all  about,  you
            should  adhere  to  the maximum limits all the more closely.
            Use helper functions with descriptive names (you can ask the
            compiler to in-line them if you think it's performance-crit-
            ical, and it will probably do a better job of  it  than  you
            would have done).

            Another measure of the function is the number of local vari-
            ables.  They shouldn't exceed 5-10, or  you're  doing  some-
            thing  wrong.   Re-think  the  function,  and  split it into
            smaller pieces.  A human brain  can  generally  easily  keep
            track of about 7 different things, anything more and it gets
            confused.  You know you're brilliant, but maybe  you'd  like
            to understand what you did 2 weeks from now.


            Chapter 6: Centralized exiting of functions

            Albeit deprecated by some people, the equivalent of the goto
            statement is used frequently by compilers  in  form  of  the
            unconditional jump instruction.

            The goto statement comes in handy when a function exits from
            multiple locations and some common work such as cleanup  has
            to be done.

            The rationale is:

            -    unconditional  statements  are easier to understand and
                 follow

            -    nesting is reduced

            -    errors by not updating individual exit points when mak-
                 ing modifications are prevented

            -    saves the compiler work to optimize redundant code away
                 ;)


                      int fun(int )
                      {
                           int result = 0;
                           char *buffer = kmalloc(SIZE);

                           if (buffer == NULL)
                                 return -ENOMEM;

                           if (condition1) {
                                 while (loop1) {
                                      ...
                                 }
                                 result = 1;
                                 goto out;
                           }
                           ...
                      out:
                           kfree(buffer);
                           return result;
                      }



            Chapter 7: Commenting

            Comments are good, but there is also a danger  of  over-com-
            menting.  NEVER try to explain HOW your code works in a com-
            ment: it's much better to write the code so that the  _work-
            ing_  is  obvious, and it's a waste of time to explain badly
            written code.

            Generally, you want your comments to  tell  WHAT  your  code
            does, not HOW.  Also, try to avoid putting comments inside a
            function body: if the function is so complex that  you  need
            to  separately  comment  parts of it, you should probably go
            back to chapter 5 for a while.  You can make small  comments
            to  note  or  warn  about  something particularly clever (or
            ugly), but try to avoid excess.  Instead, put  the  comments
            at  the  head  of the function, telling people what it does,
            and possibly WHY it does it.


            Chapter 8: You've made a mess of it

            That's OK, we all do.  You've probably  been  told  by  your
            long-time  Unix  user  helper that "GNU emacs" automatically
            formats the C sources for you, and you've noticed that  yes,
            it  does  do  that,  but  the defaults it uses are less than
            desirable (in fact, they are worse than random typing  -  an
            infinite number of monkeys typing into GNU emacs would never
            make a good program).

            So, you can either get rid of GNU emacs, or change it to use
            saner values.  To do the latter, you can stick the following
            in your .emacs file:

                 (defun linux-c-mode ()
                 "C mode with adjusted defaults for use with the Linux kernel."
                   (interactive)
                   (c-mode)
                   (c-set-style "K&R")
                   (setq tab-width 8)
                   (setq indent-tabs-mode t)
                   (setq c-basic-offset 8))

            This will define the M-x linux-c-mode command.  When hacking
            on a module, if you put the string -*- linux-c -*- somewhere
            on the first two lines,  this  mode  will  be  automatically
            invoked. Also, you may want to add

            (setq auto-mode-alist (cons '("/usr/src/linux.*/.*\.[ch]$" .
            linux-c-mode)                auto-mode-alist))

            to your  .emacs  file  if  you  want  to  have  linux-c-mode
            switched  on  automagically when you edit source files under
            /usr/src/linux.

            But even if you fail in getting emacs to do sane formatting,
            not everything is lost: use "indent".

            Now, again, GNU indent has the same brain-dead settings that
            GNU emacs has, which is why you need to give it a  few  com-
            mand  line  options.   However,  that's not too bad, because
            even the makers of GNU indent recognize the authority of K&R
            (the  GNU  people  aren't  evil, they are just severely mis-
            guided in this matter), so you just give indent the  options
            "-kr  -i8"  (stands  for "K&R, 8 character indents"), or use
            "scripts/Lindent", which indents in the latest style.

            "indent" has a lot of options, and especially when it  comes
            to  comment re-formatting you may want to take a look at the
            man page.  But remember: "indent" is not a fix for bad  pro-
            gramming.


            Chapter 9: Configuration-files

            For  configuration  options  (arch/xxx/Kconfig,  and all the
            Kconfig files), somewhat different indentation is used.

            Help text is indented with 2 spaces.


                 if CONFIG_EXPERIMENTAL
                      tristate CONFIG_BOOM
                      default n
                      help
                        Apply nitroglycerine inside the keyboard (DANGEROUS)
                      bool CONFIG_CHEER
                      depends on CONFIG_BOOM
                      default y
                      help
                        Output nice messages when you explode
                 endif

            Generally, CONFIG_EXPERIMENTAL should surround  all  options
            not  considered  stable. All options that are known to trash
            data (experimental  write-  support  for  file-systems,  for
            instance)  should be denoted (DANGEROUS), other experimental
            options should be denoted (EXPERIMENTAL).


            Chapter 10: Data structures

            Data structures that have  visibility  outside  the  single-
            threaded  environment  they  are  created  and  destroyed in
            should always have reference counts.  In the kernel, garbage
            collection  doesn't  exist  (and  outside the kernel garbage
            collection is slow and inefficient), which  means  that  you
            absolutely _have_ to reference count all your uses.

            Reference  counting  means  that  you can avoid locking, and
            allows multiple users to have access to the  data  structure
            in  parallel  -  and not having to worry about the structure
            suddenly going away from under them just because they  slept
            or did something else for a while.

            Note  that  locking  is  _not_  a  replacement for reference
            counting.  Locking is used to keep data structures coherent,
            while  reference  counting is a memory management technique.
            Usually both are needed, and they are  not  to  be  confused
            with each other.

            Many data structures can indeed have two levels of reference
            counting, when there are users of different "classes".   The
            subclass  count  counts  the  number  of subclass users, and
            decrements the global count  just  once  when  the  subclass
            count goes to zero.

            Examples  of  this  kind of "multi-level-reference-counting"
            can be  found  in  memory  management  ("struct  mm_struct":
            mm_users  and  mm_count),  and  in  filesystem code ("struct
            super_block": s_count and s_active).

            Remember: if another thread can find  your  data  structure,
            and  you don't have a reference count on it, you almost cer-
            tainly have a bug.


            Chapter 11: Macros, Enums, Inline functions and RTL

            Names of macros defining constants and labels in  enums  are
            capitalized.

            #define CONSTANT 0x12345

            Enums are preferred when defining several related constants.

            CAPITALIZED macro names are appreciated  but  macros  resem-
            bling functions may be named in lower case.

            Generally,  inline functions are preferable to macros resem-
            bling functions.

            Macros with multiple statements should be enclosed in a do -
            while block:


                 #define macrofun(a, b, c)
                     do {
                          if (a == 5)
                          do_this(b, c);
                     } while (0)

            Things to avoid when using macros:

            1) macros that affect control flow:

                    #define FOO(x)
                        do {
                              if (blah(x) < 0)
                                   return -EBUGGERED;
                        } while(0)

            is  a  _very_  bad  idea.  It looks like a function call but
            exits the  "calling"  function;  don't  break  the  internal
            parsers of those who will read the code.

            2)  macros  that  depend  on  having a local variable with a
            magic name:

            #define FOO(val) bar(index, val)

            might look like a good thing, but  it's  confusing  as  hell
            when  one  reads  the  code  and it's prone to breakage from
            seemingly innocent changes.

            3) macros with arguments that are used as l-values: FOO(x) =
            y;  will  bite you if somebody e.g. turns FOO into an inline
            function.

            4) forgetting about precedence:  macros  defining  constants
            using  expressions  must enclose the expression in parenthe-
            ses. Beware of similar issues with macros using  parameters.

            #define CONSTANT 0x4000
            #define CONSTEXP (CONSTANT | 3)

            The  cpp  manual  deals  with  macros  exhaustively. The gcc
            internals manual also covers RTL which  is  used  frequently
            with assembly language in the kernel.


            Chapter 12: Printing kernel messages

            Kernel  developers  like to be seen as literate. Do mind the
            spelling of kernel messages to make a  good  impression.  Do
            not  use  crippled  words  like  "dont"  and use "do not" or
            "don't" instead.

            Kernel messages do not have to be terminated with a  period.

            Printing  numbers  in  parentheses  (%d)  adds  no value and
            should be avoided.


            Chapter 13: References

            The C Programming Language, Second Edition
            by Brian W. Kernighan and Dennis M. Ritchie.
            Prentice Hall, Inc., 1988.
            ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
            URL: http://cm.bell-labs.com/cm/cs/cbook/


            The Practice of Programming
            by Brian W. Kernighan and Rob Pike.
            Addison-Wesley, Inc., 1999.
            ISBN 0-201-61586-X.
            URL: http://cm.bell-labs.com/cm/cs/tpop/


            GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
            gcc internals and indent, all available from http://www.gnu.org


            WG14 is the international standardization working group for the programming
            language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/


            --
            Last updated on 16 February 2004 by a community effort on LKML.