Tuesday, July 5, 2011

Writing a C library, part 5

This is part five in a series of blog-posts about best practices for writing C libraries. Previous installments: part one, part two, part three, part four.

API design

A C library is, almost by definition, something that offers an API that is used in applications. Often an API can't be changed in incompatible ways (it can, however, be extended) so it is usually important to get right the first time because if you don't, you and your users will have to live with your mistakes for a long time.

This section is not a a full-blown guide to API design as there's a lot of literature, courses and presentations available on the subject - see e.g. Designing a library that's easy to use - but we will mention the most important principles and a couple of examples of good and bad API design.

The main goals when it comes to API design is, of course, to make the API easy to use - this include choosing good names for types, functions and constants. Be careful of abbreviations - atof might be quick to type but it's not exactly clear that the function parses a C string and returns a double (no, not a float as the name suggests). Typically nouns are used for types and while verbs are used for methods.

Another thing to keep in mind is the number of function arguments - ideally each function should take only a few arguments so it's easy to remember how to use it. For example, no-one probably ever remembers exactly what arguments to pass to g_spawn_async_with_pipes() so programmers end up looking up the docs, breaking the rhythm. A better approach (which is yet to be implemented in GLib), would be to create a new type, let's call it GProcess, with methods to set what you'd otherwise pass as arguments and then a method to spawn the actual program. Not only is this easier to use, it is also extensible as adding a method to a type doesn't break API while adding an argument to an existing function/method does. An example of such an API is libudev's udev_enumerate API - for example, at the time udev starting dealing with device tags, the udev_enumerate type gained the add_match_tag() method.

If using constants, it is often useful to use the C enum type since the compiler can warn if a switch statement isn't handling all cases. Generally avoid boolean types in functions and use flag enumerations instead - this has two advantages: first of all, it's sometimes easier to read foo_do_stuff(foo, FOO_FLAGS_FROBNICATOR) than foo_do_stuff(foo, TRUE) since the reader does not have to expend mental energy on remembering if TRUE translates into whether the frobnicator is to be used or not. Second, it means that several booleans arguments can be passed in one parameter so hard-to-use functions like e.g. gtk_box_pack_start() can be avoided (most programmers can't remember if the expand or fill boolean comes first). Additionally, this technique allows adding new flags without breaking API.

Often the compiler can help - for example, C functions can be annotated with all kinds of gcc-specific annotations that will cause warnings if the user is not using the function correctly. If using, GLib, some of these annotations are available as macros prefixed with G_GNUC, the most important ones being G_GNUC_CONST, G_GNUC_PURE, G_GNUC_MALLOC, G_GNUC_DEPRECATED_FORG_GNUC_PRINTF and G_GNUC_NULL_TERMINATED.

Checklist

  • Choose good type and function names (favor expressiveness over length).
  • Keep the number of arguments to functions down (consider introducing helper types).
  • Use the type system / compiler to your advantage instead of fighting it (enums, flags, compiler annotations).

Documentation

If your library is very simple, the best documentation might just be a nicely formatted C header file with inline comments. Often it's not that simple and people using your library might expect richer and cross-referenced documentation complete with code samples.

Many C libraries, including those in GLib and GNOME itself, use inline documentation tags that can be read by tools such as gtk-doc or Doxygen. Note that gtk-doc works just fine even on low-level non-GLib-using libraries - see e.g. libudev and libblkid API documentation.

If used with a GLib library, gtk-doc uses the GLib type system to draw type hierarchies and show type-specific things like properties and signals. gtk-doc can also easily integrate with any tool producing Docbook documentation such as manual pages or e.g. gdbus-codegen(1) when used to generate docs describing D-Bus interfaces (example with C API docs, D-Bus docs and man pages).

Checklist

  • Decide what level of documentation is needed (HTML, pdf, man pages, etc.).
  • Try to use standard tools such as Doxygen or gtk-doc.
  • If shipping commands/daemons/helpers (e.g. anything showing up in ps(1) output), consider shipping man pages for those as well.

Language bindings

C libraries are increasingly used from higher-level languages such as Python or JavaScript through a so-called language binding - for example, this is what allows the Desktop Shell in GNOME 3 to be written entirely in JavaScript while still using C libraries such as GLib, Clutter and Mutter underneath.

It's outside the scope of this article to go into detail on language bindings (however a lot of the advice given in this series does apply - see also: Writing Bindable APIs) but it's worth pointing out that the goal of the GObject Introspection project (which is what is used in GNOME's Shell) is aiming for 100% coverage of GLib libraries assuming the library is properly annotated. For example, this applies to the GUdev library (a thin wrapper on top of the libudev library) can be used from any language that supports GObject Introspection (JS example).

GObject Intropspection is interesting because if someone adds GObject Introspection support to a new language, X, then the GNOME platform (and a lot of the underlying Linux plumbing as well cf. GUdev) is now suddenly available from that language without any work.

Checklist

  • Make sure your API is easily bindable (avoid C-isms such as variadic functions).
  • If using GLib, set up GObject Introspection and ship GIR/typelibs (notes).
  • If writing a complicated application, consider writing parts of it in C and parts of it in a higher-level language.

ABI, API and versioning

While the API of a library describes how the programmer use it, the ABI describes how the API is mapped onto the target machine the library is running on. Roughly, a (shared) library is said to be compatible with a previous version if a recompile is not needed. The ABI involves a lot of factors including data alignment rules, calling conventions, file formats and other things that are not suitable to cover in this series; the important thing to know about when writing C libraries is how (and if) the ABI changes when the API changes. Specifically, since some changes (such as adding a new function) are backwards compatible, the interesting question is what kind of API changes result in non-backwards-compatible ABI changes.

Assuming all other factors like calling convention are constant, the rule of thumb about compatibility on the ABI level basically boils down to a very short list of allowed API changes:
  • you may add new C functions; and
  • you may add parameters to a function only if it doesn't cause a memory/resource leak; and
  • you may add a return value to a function returning void only if it doesn't cause a memory leak; and
  • modifiers such as const may be added / removed at will since they are not part of the ABI in C
The latter is an example of a change that breaks the API (causing compiler warnings when compiling existing programs that used to compile without warnings) but preserve the ABI (still allowing any previously compiled program to run) - see e.g. this GLib commit for a concrete example (note that this can't be done in C++ because of how name mangling work).

In general, you may not extend C structs that the user can allocate on the stack or embed in another C structure which is why opaque data types are often used since they can be extended without the user knowing. In case the data type is not opaque, an often used technique is to add padding to structs (example) and use it when adding a new virtual method or signal function pointer (example). Other types, such as enumeration types, may normally be extended with new constants but existing constants may not be changed unless explicitly allowed.

The semantics of a function, e.g. its side effect, is usually considered part of the ABI. For example, if the purpose of a function is to print diagnostics on standard output and it stops doing it in a later version of the library, one could argue it's an ABI break even when existing programs are able to call the function and return to the caller just fine possibly even returning the same value.

On Linux, shared libraries (similar to DLLs on Windows) use the so-called soname to maintain and provide backwards-compatibility as well as allowing having multiple incompatible run-time versions installed at the same time. The latter is achieved by increasing the major version number of a library every time a backwards-incompatible change is made. Additionally, other fields of the soname have other (complex) rules associated (more info).

One solution to managing non-backwards-compatible ABI changes without bumping the so-number is symbol versioning - however, apart from being hard to use, it only applies to functions and not e.g. higher-level run-time data structures like e.g. signals, properties and types registered with the GLib type-system.

It is often desirable to have multiple incompatible versions of libraries and their associated development tools installed at the same time (and in the same prefix) - for example, both version 2 and 3 of GTK+. To easily achieve this, many libraries (including GLib and up) include the major version number (which is what is bumped exactly when non-backwards-compatible changes are made) in the library name as well as names of tools and so on - see the Parallel Installation essay for more information.

Some libraries, especially when they are in their early stages of development, specifically gives no ABI guarantees (and thus, does not manage their soname when incompatible changes are made). Often, to better manage expectations, such unstable libraries require that the user defines a macro acknowledging this (example). Once the library is baked, this requirement is then removed and normal ABI stability rules starts applying (example).

Related to versioning, it's important to mention that in order for your library to be easy to use, it is absolutely crucial that it includes pkg-config files along with the header files and other development files (more information).

Checklist

  • Decide what ABI guarantees to give if any (and when)
  • Make sure your users understand the ABI guarantees (being explicit is good)
  • If possible, make it possible to have multiple incompatible versions of your library and tools installed at the same time (e.g. include the major version number in the library name)

6 comments:

  1. This is a GREAT series of articles. I especially like the many very useful links. Go try and write something like this on a dead-trees book.

    Thanks a lot!

    ReplyDelete
  2. How does adding new parameters to an existing function NOT break the function's ABI?

    1. If a v2 caller passes unexpected parameters to a v1 library, then the library will safely ignore them. No problem.

    2. But if a v1 caller passes TOO FEW parameters to a v2 library, then the library will read garbage values for the new parameters (unless the v2 library is deduce if the new parameters are missing based on the values of the old parameters).

    ReplyDelete
  3. An interesting and somewhat elegant variant of a parallel installation with symbol versioning dates back to Multics, and some early linkers.

    This versions the interface or a struct passed via it, and allows asynchronous change: the client and server need not change at the same time.

    This is described in Paul Stachour's article in CACM, at http://cacm.acm.org/magazines/2009/11/48444-you-dont-know-jack-about-software-maintenance/fulltext

    --dave (who helped on the article) c-b

    ReplyDelete
  4. Structure versioning guidelines in glibc
    http://sourceware.org/glibc/wiki/Development/Versioning_A_Structure

    ReplyDelete
  5. Very good overview of libraries. I like the links to outside resources, very handy!

    ReplyDelete
  6. Just like Chris, I'd also like some clarification on the list of ABI-compatible changes in the API/ABI section. How is adding parameters ever safe? Could you explain further what you mean with "if it does not cause leaks"?

    ReplyDelete