This is part five in a series of blog-posts about best practices for writing C libraries. Previous installments: part one, part two, part three, part four.
API design
A C library is, almost by definition, something that offers an
API that is used in applications. Often an API can't be changed in incompatible ways (it can, however, be extended) so it is usually important to get it right the first time because if you don't, you and your users will have to live with your mistakes for a long time.
This section is not a full-blown guide to API design as there's a lot of literature, courses and presentations available on the subject - see e.g.
Designing a library that's easy to use - but we will mention the most important principles and a couple of examples of good and bad API design.
The main goal when it comes to API design is, of course, to make the API easy to use - this includes
choosing good names for types, functions and constants. Be careful of abbreviations -
atof might be quick to type but it's not exactly clear that the function parses a C string and returns a double (no, not a float as the name suggests). Typically
nouns are used for types while
verbs are used for methods.
Another thing to keep in mind is the number of function arguments - ideally each function should take only a few arguments so it's easy to remember how to use it. For example, probably no one remembers exactly what arguments to pass to
g_spawn_async_with_pipes() so programmers end up looking up the docs,
breaking the rhythm. A better approach (which is yet to be implemented in GLib) would be to create a new type,
let's call it GProcess, with methods to set what you'd otherwise pass as arguments and then a method to spawn the actual program. Not only is this easier to use, it is also extensible as adding a method to a type doesn't break API while adding an argument to an existing function/method does. An example of such an API is libudev's
udev_enumerate API - for example, at the time udev started dealing with
device tags, the
udev_enumerate type gained the
add_match_tag() method.
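To make the idea concrete, here is a minimal sketch of what such a setter-based type could look like. Everything in it is an assumption made up for illustration: FooProcess and its foo_process_* methods are hypothetical names (not a GLib API), and the "spawn" step is replaced by a stand-in that only describes what would be launched.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical API: FooProcess and its methods do not exist in GLib. */
typedef struct FooProcess FooProcess;

struct FooProcess
{
  const char *program;     /* borrowed; must outlive the object */
  const char *working_dir; /* NULL means "inherit from the parent" */
  const char *stdout_path; /* NULL means "inherit from the parent" */
};

FooProcess *
foo_process_new (const char *program)
{
  FooProcess *process;

  assert (program != NULL);
  process = calloc (1, sizeof *process);
  if (process != NULL)
    process->program = program;
  return process;
}

/* Each optional setting gets its own, self-describing setter. */
void
foo_process_set_working_dir (FooProcess *process, const char *working_dir)
{
  process->working_dir = working_dir;
}

void
foo_process_set_stdout_path (FooProcess *process, const char *stdout_path)
{
  process->stdout_path = stdout_path;
}

/* Stand-in for the spawn method: instead of launching anything, it
 * writes a description of what would be launched into buf and returns
 * whatever snprintf() returns. */
int
foo_process_describe (const FooProcess *process, char *buf, size_t buf_size)
{
  return snprintf (buf, buf_size, "%s (cwd=%s, stdout=%s)",
                   process->program,
                   process->working_dir != NULL ? process->working_dir : "<inherit>",
                   process->stdout_path != NULL ? process->stdout_path : "<inherit>");
}

void
foo_process_free (FooProcess *process)
{
  free (process);
}
```

A caller only touches the settings it cares about, and a later release could add, say, a foo_process_set_environment() method without changing any existing signature.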
If using constants, it is often useful to use the
C enum type since the compiler can warn if a switch statement isn't handling all cases. Generally avoid boolean types in functions and use
flag enumerations instead - this has two advantages: first of all, it's sometimes easier to read foo_do_stuff(foo, FOO_FLAGS_FROBNICATOR) than foo_do_stuff(foo, TRUE) since the reader does not have to expend mental energy on remembering if TRUE translates into whether the frobnicator is to be used or not. Second, it means that several boolean arguments can be passed in one parameter so hard-to-use functions such as
gtk_box_pack_start() can be avoided (most programmers can't remember if the
expand or
fill boolean comes first). Additionally, this technique allows adding new flags without breaking API.
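As a rough sketch of both points, the snippet below pairs a plain enum (handled in a switch without a default case, so that gcc's -Wswitch can flag a forgotten value) with a flag enumeration that replaces two booleans. The FooAlign and FooPackFlags types and both functions are hypothetical names invented for this example; the expand/fill behaviour is only loosely modelled on the GTK+ case mentioned above.

```c
#include <assert.h>

/* Hypothetical enum: one value out of a fixed set. */
typedef enum
{
  FOO_ALIGN_START,
  FOO_ALIGN_CENTER,
  FOO_ALIGN_END
} FooAlign;

/* Hypothetical flag enumeration: every value is a distinct bit, so
 * several of them can be OR'ed together in a single parameter. */
typedef enum
{
  FOO_PACK_FLAGS_NONE   = 0,
  FOO_PACK_FLAGS_EXPAND = 1 << 0,
  FOO_PACK_FLAGS_FILL   = 1 << 1
} FooPackFlags;

const char *
foo_align_to_string (FooAlign align)
{
  /* No default case on purpose: if a value is added to FooAlign and
   * not handled here, -Wswitch (part of -Wall) can point it out. */
  switch (align)
    {
    case FOO_ALIGN_START:
      return "start";
    case FOO_ALIGN_CENTER:
      return "center";
    case FOO_ALIGN_END:
      return "end";
    }
  return "unknown";
}

/* Returns the width given to a child: it only grows into the extra
 * space if it is both allowed to expand and asked to fill. */
int
foo_box_child_width (int natural_width, int extra_space, FooPackFlags flags)
{
  const int expand_and_fill = FOO_PACK_FLAGS_EXPAND | FOO_PACK_FLAGS_FILL;

  if ((flags & expand_and_fill) == expand_and_fill)
    return natural_width + extra_space;
  return natural_width;
}
```

A call such as foo_box_child_width (100, 20, FOO_PACK_FLAGS_EXPAND | FOO_PACK_FLAGS_FILL) reads without a trip to the docs, and a new flag can be added later as 1 << 2 without touching the function signature.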
Often the compiler can help - for example, C functions can be annotated with all kinds of
gcc-specific annotations that will cause warnings if the user is not using the function correctly. If using GLib, some of these annotations are available as macros prefixed with G_GNUC, the most important ones being
G_GNUC_CONST,
G_GNUC_PURE,
G_GNUC_MALLOC,
G_GNUC_DEPRECATED_FOR,
G_GNUC_PRINTF and
G_GNUC_NULL_TERMINATED.
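For readers not using GLib, the sketch below shows roughly what three of these annotations boil down to. The FOO_* macros and functions are hypothetical stand-ins defined locally so the example builds without GLib; the underlying const, format and sentinel attributes are gcc's own.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Local stand-ins (made-up names) for the corresponding G_GNUC_*
 * macros; they expand to nothing on compilers without the attributes. */
#if defined(__GNUC__)
#define FOO_CONST                    __attribute__ ((const))
#define FOO_PRINTF(fmt_idx, arg_idx) __attribute__ ((format (printf, fmt_idx, arg_idx)))
#define FOO_NULL_TERMINATED          __attribute__ ((sentinel))
#else
#define FOO_CONST
#define FOO_PRINTF(fmt_idx, arg_idx)
#define FOO_NULL_TERMINATED
#endif

/* The result depends only on the argument, so calls may be merged. */
int    foo_square        (int x) FOO_CONST;

/* The compiler checks the arguments against the format string. */
int    foo_format        (char *buf, size_t size, const char *format, ...) FOO_PRINTF (3, 4);

/* The compiler warns if the caller forgets the trailing NULL. */
size_t foo_count_strings (const char *first, ...) FOO_NULL_TERMINATED;

int
foo_square (int x)
{
  return x * x;
}

int
foo_format (char *buf, size_t size, const char *format, ...)
{
  va_list args;
  int n;

  va_start (args, format);
  n = vsnprintf (buf, size, format, args);
  va_end (args);
  return n;
}

size_t
foo_count_strings (const char *first, ...)
{
  va_list args;
  const char *s;
  size_t n = 0;

  va_start (args, first);
  for (s = first; s != NULL; s = va_arg (args, const char *))
    n++;
  va_end (args);
  return n;
}
```

With these declarations in a header, a call like foo_format (buf, sizeof buf, "%s", 42) or foo_count_strings ("a", "b") should draw a warning from gcc when the relevant warnings are enabled (e.g. with -Wall), instead of misbehaving at run time.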
Checklist
- Choose good type and function names (favor expressiveness over length).
- Keep the number of arguments to functions down (consider introducing helper types).
- Use the type system / compiler to your advantage instead of fighting it (enums, flags, compiler annotations).
Documentation
If your library is very simple, the best documentation might just be a nicely formatted C header file with inline comments. Often it's not that simple and people using your library might expect richer and cross-referenced documentation complete with code samples.
Many C libraries, including those in GLib and GNOME itself, use inline documentation tags that can be read by tools such as
gtk-doc or
Doxygen. Note that gtk-doc works just fine even on low-level non-GLib-using libraries - see e.g.
libudev and
libblkid API documentation.
If used with a GLib library, gtk-doc uses the GLib type system to
draw type hierarchies and show type-specific things like
properties and
signals. gtk-doc can also easily integrate with any tool producing
Docbook documentation such as
manual pages or e.g.
gdbus-codegen(1) when used to generate docs describing D-Bus interfaces (
example with C API docs, D-Bus docs and man pages).
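For a feel of what inline documentation looks like, here is a small sketch of a gtk-doc-style comment block: the symbol name followed by a colon, one @-prefixed line per parameter, a free-form description and a Returns: line. The function foo_checksum_compute() is a hypothetical example, not taken from any real library.

```c
#include <assert.h>
#include <stddef.h>

/**
 * foo_checksum_compute:
 * @data: the bytes to checksum
 * @length: the number of bytes in @data
 *
 * Computes a simple additive checksum over @data. This is only
 * meant to catch accidental corruption and is not suitable for
 * anything security related.
 *
 * Returns: the sum of all bytes in @data, modulo 256
 */
unsigned char
foo_checksum_compute (const unsigned char *data, size_t length)
{
  unsigned char sum = 0;
  size_t i;

  assert (data != NULL || length == 0);
  for (i = 0; i < length; i++)
    sum = (unsigned char) (sum + data[i]);
  return sum;
}
```

The comment lives next to the code it describes, which makes it more likely to be kept up to date when the function changes.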
Checklist
- Decide what level of documentation is needed (HTML, PDF, man pages, etc.).
- Try to use standard tools such as Doxygen or gtk-doc.
- If shipping commands/daemons/helpers (e.g. anything showing up in ps(1) output), consider shipping man pages for those as well.
Language bindings
C libraries are increasingly used from higher-level languages such as
Python or
JavaScript through a so-called
language binding - for example, this is what allows the
Desktop Shell in
GNOME 3 to be written entirely in JavaScript while still using C libraries such as
GLib,
Clutter and
Mutter underneath.
It's outside the scope of this article to go into detail on language bindings (however a lot of the advice given in this series does apply - see also:
Writing Bindable APIs) but it's worth pointing out that the goal of the
GObject Introspection project (which is what is used in GNOME's Shell) is 100% coverage of GLib libraries, assuming the library is properly
annotated. For example, the
GUdev library (a thin wrapper on top of the
libudev library) can be used from any language that supports GObject Introspection (
JS example).
GObject Introspection is interesting because if someone adds GObject Introspection support to a new language, X, then the GNOME platform (and a lot of the underlying Linux plumbing as well, cf. GUdev) is suddenly available from that language without any additional work.
Checklist
- Make sure your API is easily bindable (avoid C-isms such as variadic functions).
- If using GLib, set up GObject Introspection and ship GIR/typelibs (notes).
- If writing a complicated application, consider writing parts of it in C and parts of it in a higher-level language.
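One way to read the first checklist item: a variadic convenience function is fine for C callers as long as the same functionality is also reachable through a plain array-plus-length function that a binding can call. The sketch below assumes two hypothetical functions, foo_sum_values() and foo_sum_varargs(); the (array length=...) annotation in the comment is the GObject Introspection way of telling bindings how the two parameters belong together.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/**
 * foo_sum_values:
 * @values: (array length=n_values): the values to add up
 * @n_values: the number of elements in @values
 *
 * Binding-friendly variant: an explicit array and its length.
 *
 * Returns: the sum of all elements in @values
 */
int
foo_sum_values (const int *values, size_t n_values)
{
  int sum = 0;
  size_t i;

  assert (values != NULL || n_values == 0);
  for (i = 0; i < n_values; i++)
    sum += values[i];
  return sum;
}

/* C-only convenience: n_values ints follow as variadic arguments.
 * Handy in C, but most language bindings cannot call this. */
int
foo_sum_varargs (size_t n_values, ...)
{
  va_list args;
  int sum = 0;
  size_t i;

  va_start (args, n_values);
  for (i = 0; i < n_values; i++)
    sum += va_arg (args, int);
  va_end (args);
  return sum;
}
```

A binding would expose only foo_sum_values(), typically taking a native list in the higher-level language, while C callers remain free to use either function.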
ABI, API and versioning
While the API of a library describes how the programmer uses it, the
ABI describes how the API is mapped onto the target machine the library is running on. Roughly, a (shared) library is said to be ABI-compatible with a previous version if programs built against that previous version keep working without a recompile. The ABI involves a lot of factors including
data alignment rules,
calling conventions,
file formats and other things that are not suitable to cover in this series; the important thing to know about when writing C libraries is how (and if) the ABI changes when the API changes. Specifically, since some changes (such as adding a new function) are backwards compatible, the interesting question is what kind of API changes result in non-backwards-compatible ABI changes.
Assuming all other factors like calling convention are constant, the rule of thumb about compatibility on the ABI level basically boils down to a very short list of allowed API changes:
- you may add new C functions; and
- you may add parameters to a function only if it doesn't cause a memory/resource leak; and
- you may add a return value to a function returning void only if it doesn't cause a memory leak; and
- modifiers such as const may be added / removed at will since they are not part of the ABI in C
The latter is an example of a change that breaks the API (causing compiler warnings when compiling existing programs that used to compile without warnings) but preserves the ABI (still allowing any previously compiled program to run) - see e.g.
this GLib commit for a concrete example (note that this can't be done in C++ because of how
name mangling works).
In general, you may not extend C structs that the user can allocate on the stack or embed in another C structure which is why
opaque data types are often used since they can be extended without the user knowing. In case the data type is not opaque, an often used technique is to add padding to structs (
example) and use it when adding a new virtual method or signal function pointer (
example). Other types, such as enumeration types, may normally be extended with new constants but existing constants may not be changed unless explicitly allowed.
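Both techniques can be sketched in a few lines. In the example below, FooCounter is an opaque type (users only ever see a pointer to it) while FooCounterClass is a public struct with reserved slots. All names are hypothetical, and the header and implementation parts are shown in a single file only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* --- What the public header would contain --- */

/* Opaque type: the struct body is not visible to users, so fields
 * can be added in later versions without changing the ABI. */
typedef struct FooCounter FooCounter;

FooCounter *foo_counter_new       (void);
void        foo_counter_increment (FooCounter *counter);
int         foo_counter_get_value (const FooCounter *counter);
void        foo_counter_free      (FooCounter *counter);

/* Non-opaque type: users may embed it or allocate it themselves, so
 * its size must never change. Spare slots are reserved up front. */
typedef struct
{
  void (*changed) (FooCounter *counter);

  /* Padding for future expansion: a later version can turn one of
   * these into a real function pointer without changing the size. */
  void (*_foo_reserved1) (void);
  void (*_foo_reserved2) (void);
  void (*_foo_reserved3) (void);
} FooCounterClass;

/* --- What stays private in the library's .c file --- */

struct FooCounter
{
  int value;
  /* New private fields can be appended here in later versions. */
};

FooCounter *
foo_counter_new (void)
{
  return calloc (1, sizeof (FooCounter));
}

void
foo_counter_increment (FooCounter *counter)
{
  counter->value++;
}

int
foo_counter_get_value (const FooCounter *counter)
{
  return counter->value;
}

void
foo_counter_free (FooCounter *counter)
{
  free (counter);
}
```

Because applications can only create a FooCounter through foo_counter_new(), the library is the sole place that knows its size; FooCounterClass, on the other hand, stays four function pointers wide for as long as the reserved slots last.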
The semantics of a function, e.g. its
side effect, is usually considered part of the ABI. For example, if the purpose of a function is to print diagnostics on
standard output and it stops doing it in a later version of the library, one could argue it's an ABI break even when existing programs are able to call the function and return to the caller just fine possibly even returning the same value.
On Linux,
shared libraries (similar to
DLLs on Windows) use the so-called
soname to maintain and provide backwards-compatibility as well as to allow multiple incompatible run-time versions to be installed at the same time. The latter is achieved by increasing the major version number of a library every time a backwards-incompatible change is made. Additionally, other fields of the soname have other (complex) rules associated (
more info).
One solution to managing non-backwards-compatible ABI changes without bumping the so-number is
symbol versioning - however, apart from being hard to use, it only applies to functions and not to higher-level run-time data structures such as signals, properties and types registered with the GLib type-system.
It is often desirable to have multiple incompatible versions of libraries and their associated development tools installed at the same time (and in the same prefix) - for example, both version 2 and 3 of
GTK+. To easily achieve this, many libraries (including GLib and up) include the major version number (which is what is bumped exactly when non-backwards-compatible changes are made) in the library name as well as names of tools and so on - see the
Parallel Installation essay for more information.
Some libraries, especially when they are in their early stages of development, specifically give no ABI guarantees (and thus do not manage their soname when incompatible changes are made). Often, to better manage expectations, such unstable libraries require that the user defines a macro acknowledging this (
example). Once the library is baked, this requirement is then removed and normal ABI stability rules start applying (
example).
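The acknowledgement macro is typically just a preprocessor check at the top of the public header. A minimal sketch, assuming a hypothetical library "libfoo" and a made-up macro name FOO_API_IS_SUBJECT_TO_CHANGE (header, library and application parts are again shown in one file):

```c
#include <assert.h>

/* --- Application side: must come before including the header --- */
#define FOO_API_IS_SUBJECT_TO_CHANGE

/* --- What the (hypothetical) foo.h would start with --- */
#ifndef FOO_API_IS_SUBJECT_TO_CHANGE
#error "libfoo is unstable API. Define FOO_API_IS_SUBJECT_TO_CHANGE to acknowledge this."
#endif

int foo_get_answer (void);

/* --- Library implementation --- */
int
foo_get_answer (void)
{
  return 42;
}
```

Removing the #define makes the build fail with the message above, so nobody ends up depending on the unstable API by accident. Once the library declares its ABI stable, the #ifndef block is simply dropped from the header and existing users keep compiling.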
Related to versioning, it's important to mention that in order for your library to be easy to use, it is absolutely crucial that it includes pkg-config files along with the header files and other development files (
more information).
Checklist
- Decide what ABI guarantees to give, if any (and when).
- Make sure your users understand the ABI guarantees (being explicit is good).
- If possible, allow multiple incompatible versions of your library and tools to be installed at the same time (e.g. include the major version number in the library name).