Wednesday, June 29, 2011

Writing a C library, part 3

This is part three in a series of blog-posts about best practices for writing C libraries. Previous installments: part one, part two.

Modularity and namespaces

The C programming language does not support the concept of namespaces (as used in e.g. C++ or Python) so namespaces are usually emulated simply by using naming conventions. The main reason for namespaces is to avoid naming collisions - consider both libwoot and libkool providing a function called get_all_objects() - which one should be used if a program links to both libraries? Namespacing is an important part of a naming strategy and applies to variables, function names, type names (including structs, unions, enums and typedefs) and macros.

The standard convention is to use a short identifier, e.g. for libnm-glib you will see nm_ and NM being used, for Clutter it's clutter and Clutter and for libpolkit-agent-1, it's polkit_agent and PolkitAgent. For libraries that don't use CamelCase for their types, the same prefix is normally used for functions and types - for example, for libudev the prefix used is simply udev.
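As an illustration of the convention, here is a minimal sketch of what the API of the made-up libwoot from above might look like. Every name in it (the woot_, Woot and WOOT_ prefixes, WootStore and its functions) is hypothetical and only serves to show how the prefix is applied to macros, types and functions.

```c
#include <stddef.h>

/* Macros and enum values: upper-case WOOT_ prefix. */
#define WOOT_MAX_OBJECTS 16

typedef enum {
  WOOT_STATUS_OK,
  WOOT_STATUS_FULL
} WootStatus;

/* Types: CamelCase Woot prefix. */
typedef struct {
  int    objects[WOOT_MAX_OBJECTS];
  size_t n_objects;
} WootStore;

/* Functions: lower-case woot_ prefix, then the type, then the verb. */
void
woot_store_init (WootStore *store)
{
  store->n_objects = 0;
}

WootStatus
woot_store_add (WootStore *store, int object)
{
  if (store->n_objects == WOOT_MAX_OBJECTS)
    return WOOT_STATUS_FULL;
  store->objects[store->n_objects++] = object;
  return WOOT_STATUS_OK;
}

/* Namespaced, so it cannot clash with a get_all_objects() from libkool. */
const int *
woot_store_get_all_objects (const WootStore *store, size_t *n_objects)
{
  *n_objects = store->n_objects;
  return store->objects;
}
```

The struct is left non-opaque here only to keep the sketch short.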

Code that isn't using namespaces properly is not only hard to integrate into other libraries and applications (the chance of symbol collisions is high), there's also a chance that it will collide with future additions to the standard C library or POSIX standards.

One benefit of using namespaces in C (one that ironically is not present in a language with proper support for namespaces), is that it's a lot easier to pinpoint what the code is doing by just looking at a fragment of the source code - e.g. when you see an item being added to a container, you are usually not in doubt whether the programmer meant to invoke GtkContainer's add() method or ClutterContainer's add() method because of how C namespacing forces the programmer to be explicit, for better or worse.

In addition to choosing a good naming strategy, note that the visibility of the symbols (typically variables and functions) that a library exports can be fine-tuned, see these notes for why this is desirable.
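One common way to do this with GCC or Clang is to build the library with -fvisibility=hidden and mark only the public API as visible. The sketch below assumes that toolchain; WOOT_EXPORT and both functions are hypothetical names invented for the example.

```c
/* WOOT_EXPORT is a hypothetical macro; the attribute is GCC/Clang specific. */
#if defined(__GNUC__)
#  define WOOT_EXPORT __attribute__ ((visibility ("default")))
#else
#  define WOOT_EXPORT
#endif

/* Internal helper: 'static' keeps it private to this source file. */
static int
clamp_percent (int value)
{
  if (value < 0)
    return 0;
  if (value > 100)
    return 100;
  return value;
}

/* Public API: namespaced and explicitly exported. */
WOOT_EXPORT int
woot_normalize_percent (int value)
{
  return clamp_percent (value);
}
```

Helpers that must be shared between source files cannot be static, which is where compiling with hidden default visibility helps: only functions marked WOOT_EXPORT end up in the exported symbol table.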

On the topic of naming, it is usually a good idea to avoid C++ keywords (such as "class") for variable names, at least in header files that you expect C++ code to include using e.g. extern "C". Additionally, generally avoid names of functions in the C standard library / POSIX for variable names such as "interface" or "index" because these functions can be (and on Linux, actually are) defined as macros.

Checklist

  • Choose a naming convention - and stick to it.
  • Do not export symbols that are not public API.

Error handling

If there's one statement that adequately describes error handling in the C programming language, it's perhaps that it's something that people rarely agree on. Most programmers, however, would agree that errors can be broken down into two categories: 1) programmer errors; 2) run-time errors.

A programmer error is when the programmer isn't using a function correctly - e.g. passing a non-UTF-8 string to a function expecting a valid UTF-8 string such as g_variant_new_string() (if unsure, validate with g_utf8_validate() before calling the function) or passing an invalid D-Bus name to g_bus_own_name() (if unsure, validate with g_dbus_is_name() and g_dbus_is_unique_name() before calling).

Most libraries have undefined behavior when used incorrectly - in the GLib case the macros g_return_if_fail() / g_return_val_if_fail() are used, see e.g. the checks in g_variant_new_string() and the checks in g_bus_own_name(). Additionally, for performance, these checks can be disabled by defining the macro G_DISABLE_CHECKS when building either GLib itself or applications using GLib (but usually aren't). Not all parameters may be checked, however, and the check might not cover all cases because checks can be expensive. Combined with the G_DEBUG flag, it's even easy to trap this in a debugger by running the program in an environment where G_DEBUG=fatal-warnings.

Having g_return_if_fail()-style checks is usually a trade-off - for example, GLib didn't initially have the UTF-8 check in g_variant_new_string() - it was only added when it became apparent that a considerable number of users passed non-UTF-8 data which caused errors in unrelated code that was extremely hard to track down - see the commit message for details. If the cost of this check is unacceptable, the programmer can easily use the g_variant_new_from_data() function passing TRUE as the trusted parameter.

Even with a library doing proper parameter validation (to catch programmer errors early on), if you pass garbage to a function you usually end up with undefined behavior and undefined behavior can mean anything including formatting your hard disk or evaporating all booze in a five-mile radius (oh noz). That's why some libraries simply call abort() instead of carrying on pretending nothing happened. In general, a C library can never guarantee that it won't blow up no matter what data is passed - for example the user may pass a pointer to invalid data and, boom, SIGSEGV is raised when the library tries to access it. Of course the library could try to recover, longjmp(3) style, but since it's a library it can't mess around with process-wide state like signal handlers. Unfortunately, even smart people sometimes fail to realize that the caller has a responsibility and blame the library instead of its user (for the record, libdbus-1 is fine which is why process 1 is able to use it without any problems). In most cases, problems like these are solved by just throwing documentation at the problem.

To conclude, when it comes to programmer errors, one key takeaway is that it's a good idea to document exactly what kind of input a function accepts. As the saying goes, "trust is good, control is better", so it is also a good idea to verify that the programmer gets it right by using g_return_if_fail() style checks (and possibly provide API that does no such checks). Also, if your code does any kind of checks, make sure that the functions used for checking (if non-trivial) are public so e.g. language bindings have a chance to validate input before calling the function (see also: notes on errors in libdbus).
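For a library that is not built on GLib, the same idea can be sketched in plain C. The macro below is a hypothetical analogue of g_return_val_if_fail(), not GLib's implementation, and the name rules in woot_is_valid_name() are invented for the example. The point is that the validator is public, so callers and language bindings can use it before calling woot_register_name().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of g_return_val_if_fail(): log and bail out. */
#ifdef WOOT_DISABLE_CHECKS
#  define woot_return_val_if_fail(expr, val) do { } while (0)
#else
#  define woot_return_val_if_fail(expr, val) \
  do { \
    if (!(expr)) { \
      fprintf (stderr, "%s: assertion '%s' failed\n", __func__, #expr); \
      return (val); \
    } \
  } while (0)
#endif

/* Public validator: non-NULL, 1 to 255 bytes, no spaces (made-up rules). */
bool
woot_is_valid_name (const char *name)
{
  size_t len;

  if (name == NULL)
    return false;
  len = strlen (name);
  return len > 0 && len <= 255 && strchr (name, ' ') == NULL;
}

/* Documented to accept only names for which woot_is_valid_name() is true.
 * Returns the length of the name, or -1 if the check caught a misuse. */
int
woot_register_name (const char *name)
{
  woot_return_val_if_fail (woot_is_valid_name (name), -1);
  return (int) strlen (name);
}
```

Note that with WOOT_DISABLE_CHECKS defined the check disappears and passing an invalid name is plain undefined behavior, which is exactly the trade-off discussed above.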

A run-time error is e.g. if fopen(3) returns NULL (for example the file to be opened does not exist or the calling process is not privileged to open it), g_socket_client_connect() returns FALSE (the network might not be up) or g_try_malloc() returns NULL (might not have enough address space for an 8 GiB array). By definition, run-time errors are recoverable although the code you are using might treat some (like malloc(3) failing) as irrecoverable because handling some run-time errors (such as OOM) would complicate the API not only on the function level (possibly by taking an error parameter), but also by requiring transactional semantics (e.g. rollback) on most data types (see also: write-up on why handling OOM is hard and a good explanation of Linux's overcommit feature).

For simple libraries just using libc's errno is often the simplest approach to handling run-time errors (since it's thread-safe and every C programmer knows it) but note that some functions including asprintf(3) do not set errno to ENOMEM if e.g. failing to allocate memory. If you are basing your code on a library like GLib, use its native error type, e.g. GError, for run-time errors. An interesting approach to handling errors is the one used by the cairo 2D graphics library where (non-trivial) object instances track the error state (see e.g. cairo_status() and cairo_device_status()). There are many many other ways to convey run-time errors - as always, the important thing when writing a C library is to be consistent.
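A minimal sketch of the errno convention, assuming a POSIX system where fopen(3), fseek(3) and ftell(3) set errno on failure. The function woot_get_file_size() is hypothetical. The one subtlety shown is that errno has to be saved across clean-up calls, since those may overwrite it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical function: returns the size in bytes of the file at 'path',
 * or -1 with errno set by the failing libc call. */
long
woot_get_file_size (const char *path)
{
  FILE *f;
  long size;
  int saved_errno;

  f = fopen (path, "rb");
  if (f == NULL)
    return -1;              /* errno was set by fopen() */

  if (fseek (f, 0, SEEK_END) == 0)
    size = ftell (f);
  else
    size = -1;

  saved_errno = errno;      /* fclose() may overwrite errno */
  fclose (f);

  if (size < 0)
    {
      errno = saved_errno;
      return -1;
    }
  return size;
}
```

Whatever convention is chosen, the return value (here -1) and the place to look for details (here errno) should be documented for every function.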

Checklist

  • Document valid and invalid value ranges for parameters (if any) and provide facilities to validate parameters (unless trivial) for programmers and language bindings
  • Try to validate incoming parameters at public API boundaries
  • Establish a policy on how to deal with programmer errors (e.g. undefined behavior or abort()).
  • Establish a policy on how to deal with run-time errors (e.g. use errno or GError)
  • Ensure the way you handle run-time errors maps to common exception handling systems.

Encapsulation and OO design

While C as a programming language does not have built-in support for object-oriented programming, lots of C programmers use C that way - in many ways it's almost hard not to. In fact, many C programmers regard the simplicity of C (compared to, say, C++) as a feature insofar that you are not bound to any one object model - for example, the kernel uses various OO techniques and the GLib/GTK+ stack has its own dynamic type system called GType on which the GObject base class (that many classes are derived from) is built.

There's of course a price to pay for defining your own object model - it typically involves more typing (identifiers are longer) and, especially for GObject, involves actual function calls to register properties, add private instance data and so on (example). On the other hand, such a dynamic type system often offers some level of type introspection so it's possible to easily link the property for whether a check-button widget is active with whether a text-entry widget should use password mode using the g_object_bind_property() function (screenshot). Polymorphism in GObject is provided by embedding a virtual method table in the class struct (example) and providing a C function that uses the function pointer (example) - note that derived types can access the class struct to chain up (example).
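The virtual method table idea can be sketched in plain C without any type system at all. The following is not how GObject implements it (there is no GType registration or class initialization here), and all Woot names are hypothetical. It only shows a class struct holding a function pointer, a public C function dispatching through it, and a derived implementation chaining up to its parent.

```c
/* "Class struct": the virtual method table. */
typedef struct WootShape WootShape;

typedef struct {
  int (*area) (const WootShape *shape);
} WootShapeClass;

/* "Instance struct": points at its class ('klass' avoids the C++ keyword). */
struct WootShape {
  const WootShapeClass *klass;
  int width;
  int height;
};

/* Public C function that dispatches through the function pointer. */
int
woot_shape_get_area (const WootShape *shape)
{
  return shape->klass->area (shape);
}

/* Base implementation: a rectangle. */
static int
woot_rect_area (const WootShape *shape)
{
  return shape->width * shape->height;
}

static const WootShapeClass woot_rect_class = { woot_rect_area };

/* Derived implementation: chains up to the parent class, then adjusts. */
static int
woot_triangle_area (const WootShape *shape)
{
  return woot_rect_class.area (shape) / 2;
}

static const WootShapeClass woot_triangle_class = { woot_triangle_area };

void
woot_rect_init (WootShape *shape, int width, int height)
{
  shape->klass = &woot_rect_class;
  shape->width = width;
  shape->height = height;
}

void
woot_triangle_init (WootShape *shape, int width, int height)
{
  woot_rect_init (shape, width, height);
  shape->klass = &woot_triangle_class;   /* override the vtable */
}
```

Callers only ever use woot_shape_get_area() and never need to know which implementation sits behind the pointer.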

One important feature of object-oriented design in C is that it usually promotes encapsulation and data hiding through the use of opaque data types - this is desirable as it allows extending the data type (e.g. adding more properties or methods) without breaking, or requiring a recompile of, existing programs using the library (a future installment will discuss API and ABI and what it means wrt. API design). In an opaque data type, fields that would usually be in the C struct are hidden from the user and instead are made available via a getter (example) and/or setter (example) - additionally, if the object model supports properties, the member may also be made available as a property (example) - for example, this is useful for notifying when the property changes.
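A minimal sketch of an opaque type in plain C follows. WootDevice and its functions are hypothetical, and the header and source file are shown together in one block with comments marking where the split would be.

```c
#include <stdlib.h>
#include <string.h>

/* --- woot-device.h: all that users of the library get to see --- */
typedef struct WootDevice WootDevice;   /* opaque: the layout is not public */

WootDevice *woot_device_new      (const char *name);
void        woot_device_free     (WootDevice *device);
const char *woot_device_get_name (const WootDevice *device);
int         woot_device_get_mtu  (const WootDevice *device);
void        woot_device_set_mtu  (WootDevice *device, int mtu);

/* --- woot-device.c: private to the library --- */
struct WootDevice {
  char *name;
  int   mtu;
  /* fields can be added here later without affecting existing users */
};

WootDevice *
woot_device_new (const char *name)
{
  size_t len = strlen (name) + 1;
  WootDevice *device = calloc (1, sizeof *device);

  if (device == NULL)
    return NULL;
  device->name = malloc (len);
  if (device->name == NULL)
    {
      free (device);
      return NULL;
    }
  memcpy (device->name, name, len);
  device->mtu = 1500;
  return device;
}

void
woot_device_free (WootDevice *device)
{
  free (device->name);
  free (device);
}

const char *
woot_device_get_name (const WootDevice *device)
{
  return device->name;
}

int
woot_device_get_mtu (const WootDevice *device)
{
  return device->mtu;
}

void
woot_device_set_mtu (WootDevice *device, int mtu)
{
  device->mtu = mtu;
}
```

Because users can only hold a pointer, they cannot allocate the struct themselves or depend on its size, which is what makes it safe to grow.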

Of course, not every single data structure needs to be a full-blown GObject - for example, in some cases data hiding might not be desirable (sometimes it's awkward to use a C getter function) or maybe it's too slow to do from an inner loop (direct struct access is without a doubt faster). Also, for simple data structures it is sometimes desirable to initialize struct instances directly in the code.

Even when a full-blown object model (like GType and GObject) isn't used, it's never a bad idea to use opaque data structures and getters/setters. As an interesting alternative to this, note that some libraries explicitly allow extending a C structure without considering it an ABI change - while there's no easy way to enforce this (the user may allocate the structure on the stack), at least the library author can always tell the programmer that he shouldn't have done so (which may or may not be useful).

Checklist

  • Establish an object model for your library (if applicable).
  • Hide as many implementation details as is practical without impacting performance
  • Ensure that you can extend your library and types without breaking API or ABI.
  • If possible, build on top of an established and well-understood object system (such as the GLib one)

Tuesday, June 28, 2011

Writing a C library, part 2

This is part two in a series of blog-posts about best practices for writing C libraries. Previous installments: part one.

Event handling and the main loop

Event-driven applications, especially GUI applications, are often built around the idea of a "main loop" that intercepts and dispatches all sorts of events such as key/button presses, incoming IPC calls, mouse movements, timers and file/socket I/O and so on. The main loop typically "calls back" into the application whenever an event happens.

For example, the GLib/Gtk+ library stack is built around the GMainContext, GMainLoop and GSource types and other library stacks provide similar abstractions. Many kernel and systems-level programmers often look funny at GUI programmers when they utter the word "main loop" - much the same way GUI programmers stare confused at kernel programmers when they say put() or get() something. The truth is that a main-loop is really a well-known concept with a different name: it's basically an abstraction of OS primitives such as select(2), poll(2) or equivalent on e.g. Windows.

It is important to note that a multi-threaded application may run different main loops in different threads to ensure that callbacks happen in the right threads - in GLib this is achieved by using the g_main_context_push_thread_default() function which records the main loop for the current thread in thread-local storage. This variable is in turn read when starting an asynchronous operation (such as g_dbus_connection_call() or g_input_stream_read_async()) to ensure that the passed callback function is invoked in a thread running a main loop for the context set with g_main_context_push_thread_default() earlier.

Some main loops, for example the GLib one, allow creating recursive main loops and this is used to implement GtkDialog's run() method. While this indeed appears to block the calling thread, it is important to note that events are still being processed (to e.g. process input events and redraw animations). Specifically, this means that the functions (plural since this applies to everything in the call stack) that brought up the dialog might end up getting called again (from a callback). Thus, when using functions like gtk_dialog_run() you need to ensure that your functions are either re-entrant or that they are guaranteed to not get called when the dialog is showing (typically achieved by making the dialog modal so the UI action triggering the display of the dialog can't be accessed). Because of pitfalls like this, you must clearly document if a function is using a recursive main loop.

Note that main loops are not a GUI-only concept - a lot of daemons (i.e. background processes without any GUI) are built around this concept since it nicely integrates events from any source whether they are file descriptor based or synthetic such as timers or logging events. In fact, a considerable part of the system-level software on a modern Linux system is built on top of GLib and uses its main event loop abstraction to dispatch events - most of the time such daemons sit idle in one or more main loops and wait for D-Bus messages to arrive (to service a client), a timeout to fire (maybe to kick off periodic house-keeping tasks) or a child process to terminate (when using a helper program or process to do work).

Now that we've explained what a main loop is (or rather, what the idea of a main loop is), let's look at why this matters if you are writing a C library. First of all, if your library doesn't need to deliver events to users, you don't need to worry about main loops. Most libraries, however, are not that simple - for example, libudev delivers events when devices are plugged or changed, NetworkManager wants to inform of changes in networking and so on.

If your library is using GLib, it is often suitable to just require that the user runs the GLib main loop (if the application is using another main loop, it can either integrate the GLib main loop (like Qt does) or run it in a separate thread) and use g_main_context_get_thread_default() when setting up a callback. This is the way many GObject-based libraries, such as libpolkit-gobject-1, libnm-glib or libgudev work - for example, callbacks connected to the GUdevClient::uevent signal will be called in what was the thread-default main loop when the object was constructed. For a shared resource, such as a message bus connection, a good policy is that callbacks happen in what was the thread-default main loop when the method was called (see e.g. g_dbus_connection_signal_subscribe() where this is the case) since applications or libraries have no absolute control of when the shared object was created. In any case, functions dealing with callbacks must always document in what context the callback happens.

On the other hand, if your library is not using GLib, a good way to provide notification is simply to a) provide a file descriptor that e.g. turns readable when there are events to process; and b) provide a function that processes events (and possibly invokes callback functions registered by the user). A good example of this is libudev's udev_monitor_get_fd() and udev_monitor_receive_device() functions. This way the application (or the library using your library) can easily control what thread the event is handled in. As an example of how libudev is integrated into the GLib main loop, see here and here. In the libudev case, the returned file descriptor is the underlying netlink socket used to receive the event from udevd (via the kernel); in the case that there is no natural file descriptor (could be the event is only happening in response to a certain entry in, say, a log file), your library could use pipe(2) (or eventfd(2) if on Linux) and use a private worker thread to signal the other end.
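The pipe(2) variant of this pattern can be sketched as follows, assuming a POSIX system. WootMonitor and its functions are hypothetical and are not modelled on libudev's implementation. For brevity the event is just an int written to the pipe, and woot_monitor_post() stands in for whatever the library's private worker thread would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

typedef void (*WootEventFunc) (int event, void *user_data);

typedef struct {
  int           fds[2];      /* pipe: [0] is the read end, [1] the write end */
  WootEventFunc func;
  void         *user_data;
} WootMonitor;

WootMonitor *
woot_monitor_new (WootEventFunc func, void *user_data)
{
  WootMonitor *monitor = malloc (sizeof *monitor);

  if (monitor == NULL)
    return NULL;
  if (pipe (monitor->fds) != 0)
    {
      free (monitor);
      return NULL;
    }
  /* Non-blocking read end, so dispatching never stalls the caller. */
  fcntl (monitor->fds[0], F_SETFL, O_NONBLOCK);
  monitor->func = func;
  monitor->user_data = user_data;
  return monitor;
}

/* a) The descriptor the application adds to its poll(2)/select(2) set. */
int
woot_monitor_get_fd (const WootMonitor *monitor)
{
  return monitor->fds[0];
}

/* b) Called by the application, in a thread of its choosing, when the
 *    descriptor is readable. Returns the number of events dispatched. */
int
woot_monitor_dispatch (WootMonitor *monitor)
{
  int event;
  int n_dispatched = 0;

  while (read (monitor->fds[0], &event, sizeof event) == (ssize_t) sizeof event)
    {
      monitor->func (event, monitor->user_data);
      n_dispatched++;
    }
  return n_dispatched;
}

/* Library side: queue an event (e.g. from a private worker thread). */
int
woot_monitor_post (WootMonitor *monitor, int event)
{
  if (write (monitor->fds[1], &event, sizeof event) != (ssize_t) sizeof event)
    return -1;
  return 0;
}

void
woot_monitor_free (WootMonitor *monitor)
{
  close (monitor->fds[0]);
  close (monitor->fds[1]);
  free (monitor);
}

/* Application side (illustration only): a callback that sums event values. */
void
example_sum_events (int event, void *user_data)
{
  *(int *) user_data += event;
}
```

The callback runs in whichever thread calls woot_monitor_dispatch(), so the application stays in control of threading.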

If your library provides callback functions, make sure they take user_data arguments so the user can easily associate callbacks with other objects and data. If the scope of your callback is undefined (e.g. may fire more than once or if there is no way to disconnect the callback), also provide a way to free the user_data pointer when it is no longer needed - otherwise the application will leak data that needs to be freed later. See g_bus_watch_name() for an example.
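In plain C the pattern might look like the sketch below. WootWatch, WootCallback and WootDestroyNotify are hypothetical names, and woot_watch_emit() merely stands in for the library firing an event.

```c
#include <stdlib.h>

typedef void (*WootCallback)      (int value, void *user_data);
typedef void (*WootDestroyNotify) (void *user_data);

typedef struct {
  WootCallback      callback;
  void             *user_data;
  WootDestroyNotify user_data_free_func;   /* may be NULL */
} WootWatch;

WootWatch *
woot_watch_new (WootCallback      callback,
                void             *user_data,
                WootDestroyNotify user_data_free_func)
{
  WootWatch *watch = malloc (sizeof *watch);

  if (watch == NULL)
    return NULL;
  watch->callback = callback;
  watch->user_data = user_data;
  watch->user_data_free_func = user_data_free_func;
  return watch;
}

/* Stand-in for the library delivering an event; may happen many times. */
void
woot_watch_emit (WootWatch *watch, int value)
{
  watch->callback (value, watch->user_data);
}

/* The library, not the application, knows when user_data is last used. */
void
woot_watch_free (WootWatch *watch)
{
  if (watch->user_data_free_func != NULL)
    watch->user_data_free_func (watch->user_data);
  free (watch);
}

/* Application side (illustration only): accumulate values in user_data. */
void
example_accumulate (int value, void *user_data)
{
  *(int *) user_data += value;
}
```

An application could pass a heap-allocated int as user_data together with free() as the free function, and never has to track the allocation itself.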

Checklist

  • Provide APIs for main loop integration
  • Make sure callback functions take user_data arguments (possibly with an accompanying free function)

Synchronous and Asynchronous I/O

It is important for users of a library to know if calling a function involves doing synchronous I/O (also called blocking I/O). For example, an application with a user interface needs to be responsive to user input and may even need to update the user interface every frame for smooth animations (e.g. 60 times a second). To avoid unresponsive applications and jerky animations, its UI thread must never call any function that does any synchronous I/O.

Note that even loading a file from local disk may block for a very long amount of time - sometimes tens of seconds. For example the file may not be in the page cache and the hard disk with the file system may be powered down - or the file may be in the user's home directory which could be on a network filesystem such as NFS. Other examples of blocking I/O include local IPC such as D-Bus or Unix domain sockets.

If an operation is known to take a long time to complete (synchronous or otherwise), it is often nice if it is possible to easily cancel it (perhaps from another thread). For example, see the GCancellable type in the GLib stack. Another nicety (although easily implemented via the GCancellable type) is a way to set a timeout for potentially long-running operations - see e.g. g_dbus_connection_send_message_with_reply() and g_dbus_proxy_call() and note how the latter has an object-wide timeout so the timeout only has to be set once.
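Outside of GLib, a very reduced form of the same idea can be built on a C11 atomic flag. The sketch below is not how GCancellable works internally. WootCancellable and woot_sum_values() are hypothetical, and using errno with ECANCELED to report the cancellation is an assumption made for this example.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  atomic_bool cancelled;
} WootCancellable;

void
woot_cancellable_init (WootCancellable *cancellable)
{
  atomic_init (&cancellable->cancelled, false);
}

/* Safe to call from another thread while the operation is running. */
void
woot_cancellable_cancel (WootCancellable *cancellable)
{
  atomic_store (&cancellable->cancelled, true);
}

/* A NULL cancellable means "not cancellable". */
bool
woot_cancellable_is_cancelled (WootCancellable *cancellable)
{
  return cancellable != NULL && atomic_load (&cancellable->cancelled);
}

/* Hypothetical long-running operation that polls the flag as it goes.
 * Returns 0 and stores the sum, or -1 with errno set to ECANCELED. */
int
woot_sum_values (const int       *values,
                 size_t           n_values,
                 WootCancellable *cancellable,
                 long            *result)
{
  long sum = 0;
  size_t i;

  for (i = 0; i < n_values; i++)
    {
      if (woot_cancellable_is_cancelled (cancellable))
        {
          errno = ECANCELED;
          return -1;
        }
      sum += values[i];
    }
  *result = sum;
  return 0;
}
```

A flag like this only helps for work that can check it regularly. Interrupting a thread that is blocked in a system call needs more machinery, for example a file descriptor to poll on.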

Some libraries provide both synchronous and asynchronous versions of a function where the former blocks the calling thread and the latter doesn't. Typically asynchronous I/O is implemented using worker threads (where the worker thread is doing synchronous I/O) but it could also involve communicating with another process via IPC (e.g. D-Bus) or even TCP/IP. For example, in the libgio-2.0 case asynchronous file I/O is implemented via synchronous IO (e.g. basically read(2) and write(2) calls) in worker threads (using a GThreadPool) simply because the Linux kernel does not (yet?) provide an adequate way to do asynchronous I/O suitable for libraries (see also: colorful notes about Asynchronous I/O). On the upside, this is mostly an implementation detail and the libgio-2.0 implementation can migrate to a non-threaded approach should such a mechanism be made available in the future.

Asynchronous I/O typically involves callbacks (or at least some kind of event notification) and thus involves a main loop. If a library provides functions for this, it should clearly state what thread the callback will happen in, and whether it requires the application to run a (specific kind of) main loop - see the previous section about main loops for details.

If a library is thread-safe, it is often easier for the application itself to just use the synchronous version of a function in a worker-thread - if using GLib, g_io_scheduler_push_job() is the right way to do that.

In some cases synchronous I/O is implemented by using a recursive main loop (typically by using the asynchronous form of the function) - this should be avoided as it typically causes all kinds of problems because of reentrancy and events being processed while waiting for the supposedly synchronous operation to complete. As always, clearly document what your code is doing.

Some libraries, such as those in the GLib stack, use a consistent pattern for asynchronous I/O for all of their functions involving the GAsyncResult / GSimpleAsyncResult, GAsyncReadyCallback and GCancellable types - this makes it a lot easier both for programmers and for higher-level language bindings especially since important things like life-cycles are part of this model (for example, you are guaranteed that the callback will always happen, even on cancellation, timeout or error).

Checklist

  • Clearly document if a function does any synchronous I/O
  • Ideally suffix synchronous functions with _sync so it's easy to inspect large code-trees using e.g. grep(1)
  • Consider if an operation needs to be available in synchronous or asynchronous form or both.
  • Point to both synchronous and asynchronous functions in your API documentation.
  • If possible, use an established model (such as the GIO model) for I/O instead of rolling your own

Monday, June 27, 2011

Writing a C library, part 1

This is part one in a series of blog-posts about best practices for writing C libraries.

Base libraries

Since libc is a fairly low-level set of libraries, there exist higher-level libraries to make C programming a more pleasant experience including libraries in the GLib and GTK+ stack. Even while the following is going to be somewhat GLib- and GTK+-centric, these notes are written to be useful for any C code whether it's based on libc, GLib or other libraries such as NSPR, APR or some of the Samba libraries.

Most programmers would agree that it's usually a bad idea to implement basic data-types such as string handling, memory allocation, lists, arrays, hash-tables or queues yourself just because you can - it only makes code harder to read and harder to maintain by others. This is where C libraries such as GLib and GTK+ come into play - these libraries provide much of this out of the box. Plus, when you end up needing non-trivial utility functions (and chances are you will) for, say, Unicode manipulation, rendering complex scripts, D-Bus support or calculating checksums, ask yourself (or worse: wait until your manager or peers ask you) if the decision to avoid a well-tested and well-maintained library was a good decision.

In particular, for things like cryptography, it is usually a bad idea to implement it yourself (however inventing your own algorithm is worse); instead, it is better to use an existing well-tested library such as NSS (and even if you do, be careful of using the library correctly). Specifically, said library may even be FIPS-140 certified which is a requirement if you want to do business with the US government.

Similarly, while it’s more efficient to use e.g. epoll than poll for event notification, maybe it doesn't matter if your application or library is only handling on the order of ten file descriptors. On the other hand, if you know that you are going to handle thousands of file descriptors, you can still use e.g. GLib for the bulk of your library or application - just use epoll from dedicated threads. Ditto, if you need O(1) removal from a list, maybe don’t use a GList - use an embedded list instead.
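An embedded (intrusive) list stores the link inside the element itself, so removing an element needs nothing but a pointer to that element. The sketch below uses a circular list with a sentinel head. All names (WootListNode, WOOT_CONTAINER_OF, WootJob) are hypothetical.

```c
#include <stddef.h>

typedef struct WootListNode WootListNode;

struct WootListNode {
  WootListNode *prev;
  WootListNode *next;
};

/* An empty list is a sentinel head that points at itself. */
void
woot_list_init (WootListNode *head)
{
  head->prev = head;
  head->next = head;
}

void
woot_list_append (WootListNode *head, WootListNode *node)
{
  node->prev = head->prev;
  node->next = head;
  head->prev->next = node;
  head->prev = node;
}

/* O(1): no search needed, the node already knows its neighbours. */
void
woot_list_remove (WootListNode *node)
{
  node->prev->next = node->next;
  node->next->prev = node->prev;
  node->prev = node;
  node->next = node;
}

/* Recover the containing struct from a pointer to its embedded node. */
#define WOOT_CONTAINER_OF(ptr, type, member) \
  ((type *) ((char *) (ptr) - offsetof (type, member)))

/* The element embeds its own link instead of being pointed to by one. */
typedef struct {
  int          id;
  WootListNode link;
} WootJob;
```

The price is that the element type has to be designed for it: an element can only be on as many lists as it has embedded links.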

Above all, no matter what libraries or code you end up using, make sure you have at least a rudimentary understanding of the relevant data-types, concepts and implementation details. For example, with GLib it is very easy to use high-level constructs such as GHashTable, g_timeout_add() or g_file_set_contents() without knowing how things are implemented or what a file descriptor really is. For example, when saving data, you want to do so atomically (to avoid data-loss) and just knowing that g_file_set_contents() Does The Right Thing(tm) is often enough (often just reading the API docs will tell you what you need to know). Additionally make sure you understand both the algorithmic complexity of the data-types you end up using and how they work on modern hardware.

Finally, try not to get caught up in religious discussions about “bloated” libraries with random people on the Internet - it’s usually not a good use of time and resources.

Checklist

  • Don’t reinvent basic data-types (unless performance is a concern).
  • Don’t avoid standard libraries just because they are portable.
  • Be wary of using multiple libraries with overlapping functionality.
  • To the extent where it’s possible, keep library usage as a private implementation detail.
  • Use the right tool for the right job - don’t waste time on religious discussions.

Library initialization and shutdown

Some libraries require that a function, typically called foo_init(), is called before other functions in the library are called - this function typically initializes global variables and data structures used by the library. Additionally, libraries may also offer a shutdown function, typically called foo_shutdown() (forms such as foo_cleanup(), foo_fini(), foo_exit() and the grammatically dubious foo_deinit() have also been observed in the wild), to release all resources used by the library. The main reason for having a shutdown() function is to play nicer with Valgrind (for finding memory leaks) or to release all resources when using dlopen() and friends.

In general, library initialization and shutdown routines should be avoided since they might cause interference between two unrelated libraries in the dependency chain of an application; e.g. if you don’t call them from where they are used, you are possibly forcing the application to call an init() function in main(), just because some library deep down in the dependency chain is using the library without initializing it.

However, without a library initialization routine, every function in the library would have to call the (internal) initialization routine which is not always practical and may also be a performance concern. In reality, the check only has to be done in a couple of functions since most functions in a library depend on an object or struct obtained from e.g. other functions in the library. So the check only has to be done in _new() functions and functions not operating on an object.

For example, every program using the GLib type system has to call g_type_init() and this includes libraries based on libgobject-2.0 such as libpolkit-gobject-1 - e.g. if you don’t call g_type_init() prior to calling polkit_authority_get_sync() then your program will probably segfault. Naturally this is something most people new to the GLib stack get wrong and you can’t really blame them - if anything, g_type_init() is a great poster-child of why init() functions should be avoided if possible.

One reason for a library initialization routine has to do with library configuration, either app-specific configuration (e.g. the application using the library might want to force a specific behavior) or end-user specific (by manipulating argc and argv) - for example, see gtk_init(). The best solution to this problem is of course to avoid configuration, but in the cases where it’s not possible it is often better to use e.g. environment variables to control behavior - see e.g. the environment variables supported by libgtk-3.0 and the environment variables supported by libgio-2.0 for examples.

If your library does have an initialization routine, do make sure that it is idempotent and thread-safe, i.e. that it can be called multiple times and from multiple threads at the same time. If your library also has a shutdown routine, make sure that some kind of “initialization count” is used so the library is only shut down once all users of it have called its shutdown() routine. Also, if possible, ensure that your library init/shutdown routines call the init/shutdown routines for libraries that it depends on.
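One way to get all three properties is a counter protected by a statically initialized mutex. The following is a minimal sketch assuming POSIX threads. The woot_ functions are hypothetical and a single flag stands in for whatever global state a real library would set up.

```c
#include <pthread.h>

static pthread_mutex_t woot_init_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int    woot_init_count = 0;
static int             woot_state_ready = 0;  /* stands in for real global state */

/* May be called any number of times, from any thread. */
void
woot_init (void)
{
  pthread_mutex_lock (&woot_init_lock);
  if (woot_init_count++ == 0)
    woot_state_ready = 1;       /* first user: set up global state once */
  pthread_mutex_unlock (&woot_init_lock);
}

/* Tears down only when the last user is gone; extra calls are ignored. */
void
woot_shutdown (void)
{
  pthread_mutex_lock (&woot_init_lock);
  if (woot_init_count > 0 && --woot_init_count == 0)
    woot_state_ready = 0;       /* last user: release global state */
  pthread_mutex_unlock (&woot_init_lock);
}

int
woot_is_initialized (void)
{
  int ready;

  pthread_mutex_lock (&woot_init_lock);
  ready = woot_state_ready;
  pthread_mutex_unlock (&woot_init_lock);
  return ready;
}
```

The static initializer matters: the lock itself must not need an init call, or the problem just moves one level down.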

Often, a library's init() and shutdown() functions can be removed by introducing a context object - this also fixes the problem of global state (which is undesirable and often breaks multiple library users in the same process), locking (which can then be per context instance) and callbacks / notification (which can call back / post events to separate threads). For example, see libudev's struct udev_monitor.
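A context object simply moves what would have been global variables into a struct that each user creates for itself. In the minimal sketch below, WootContext and its fields are hypothetical, and the struct would normally be opaque to users of the library.

```c
#include <stdlib.h>

/* Everything that would otherwise be a global lives in the context. */
typedef struct {
  int           log_level;
  unsigned long n_requests;
} WootContext;

/* Replaces woot_init(): each user gets independent state. */
WootContext *
woot_context_new (void)
{
  return calloc (1, sizeof (WootContext));
}

/* Replaces woot_shutdown(): releases only this user's state. */
void
woot_context_free (WootContext *context)
{
  free (context);
}

void
woot_context_set_log_level (WootContext *context, int log_level)
{
  context->log_level = log_level;
}

int
woot_context_get_log_level (const WootContext *context)
{
  return context->log_level;
}

/* Every other function takes the context as its first argument. */
unsigned long
woot_context_request (WootContext *context)
{
  return ++context->n_requests;
}
```

Two unrelated users in the same process can now configure and use the library without ever seeing each other's settings.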

Checklist

  • Avoid init() / shutdown() routines - if you can’t avoid them, do make sure they are idempotent, thread-safe and reference-counted.
  • Use environment variables for library initialization parameters, not argc and argv.
  • You can easily have two unrelated library users in the same process - often without the main application knowing about the library at all. Make sure your library can handle that.
  • Avoid unsafe API like atexit(3) and, if portability is a concern, unportable constructs like library constructors and destructors (e.g. gcc’s __attribute__ ((constructor)) and __attribute__ ((destructor))).

Memory management

It is good practice to provide a matching free() function for each kind of allocated object that your API returns. If your library uses reference counting, it is often more appropriate to use the suffix _unref instead of _free. For example, in the GLib/GTK+ stack the functions g_object_new(), g_object_ref() and g_object_unref() operate on instances of the GObject type (including derived types). Similarly, for the GtkTextIter type, the relevant functions are gtk_text_iter_copy() and gtk_text_iter_free(). Also, note that some objects may be stack-allocated (such as GtkTextIter) while others (such as GObject) can only be heap-allocated.

Note that some object-oriented libraries with the concept of derived types may require the app to use the unref() method from a base type - for example, an instance of a GtkButton must be released with g_object_unref() because GtkButton is also a GObject. Additionally, some libraries have the concept of floating references (see e.g. GInitiallyUnowned, GtkWidget and GVariant) - this can make it more convenient to use the type system from C since it e.g. allows using the g_variant_new() constructor in place of a parameter like in the example code for g_dbus_proxy_call_sync() without leaking any references.

Unless it’s self-evident, all functions should have documentation explaining how parameters are managed. It is often a good idea to try to force some kind of consistency on the API. For example, in the GLib stack the general rule is that the caller owns parameters passed to a function (so the function needs to take a reference or make a copy if the parameter is used after the function returns) and that the callee owns the returned values (so the caller needs to make a copy or increase the reference count) unless the function can be called from multiple threads (in which case the caller needs to free the returned object).

Note that thread-safety often dictates what the API looks like - for example, for a thread-safe object pool, the lookup() function (returning an object) must return a reference (that the caller must unref()) because the returned object could be removed from another thread just after lookup() returns - one such example is g_dbus_object_manager_get_object().
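To illustrate why lookup() has to hand out a reference, here is a rough sketch of a tiny thread-safe pool. KoolObject, KoolPool and the kool_* functions are invented for this example, and it assumes POSIX threads and a C11 compiler providing <stdatomic.h>. The important detail is that lookup takes the extra reference while still holding the pool lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object stored in a pool. */
typedef struct KoolObject {
    atomic_int ref_count;
    int id;
} KoolObject;

#define KOOL_POOL_SIZE 8

typedef struct KoolPool {
    pthread_mutex_t lock;
    KoolObject *slots[KOOL_POOL_SIZE];
} KoolPool;

/* Drops one reference; frees the object when the last one goes away. */
void kool_object_unref(KoolObject *obj)
{
    assert(obj != NULL);
    if (atomic_fetch_sub(&obj->ref_count, 1) == 1)
        free(obj);
}

void kool_pool_init(KoolPool *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    for (int i = 0; i < KOOL_POOL_SIZE; i++)
        pool->slots[i] = NULL;
}

/* Creates an object owned by the pool. Returns 0, or -1 if the pool is full. */
int kool_pool_add(KoolPool *pool, int id)
{
    int rc = -1;
    pthread_mutex_lock(&pool->lock);
    for (int i = 0; i < KOOL_POOL_SIZE; i++) {
        if (pool->slots[i] == NULL) {
            KoolObject *obj = malloc(sizeof *obj);
            if (obj == NULL)
                abort();
            atomic_init(&obj->ref_count, 1); /* the pool's reference */
            obj->id = id;
            pool->slots[i] = obj;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&pool->lock);
    return rc;
}

/* Returns a NEW reference that the caller must release with
 * kool_object_unref(), or NULL if no object has this id. */
KoolObject *kool_pool_lookup(KoolPool *pool, int id)
{
    KoolObject *found = NULL;
    pthread_mutex_lock(&pool->lock);
    for (int i = 0; i < KOOL_POOL_SIZE; i++) {
        KoolObject *obj = pool->slots[i];
        if (obj != NULL && obj->id == id) {
            /* Take the reference before unlocking, so a concurrent
             * remove cannot free the object under the caller. */
            atomic_fetch_add(&obj->ref_count, 1);
            found = obj;
            break;
        }
    }
    pthread_mutex_unlock(&pool->lock);
    return found;
}

/* Removes the object from the pool and drops the pool's reference. */
void kool_pool_remove(KoolPool *pool, int id)
{
    KoolObject *removed = NULL;
    pthread_mutex_lock(&pool->lock);
    for (int i = 0; i < KOOL_POOL_SIZE; i++) {
        if (pool->slots[i] != NULL && pool->slots[i]->id == id) {
            removed = pool->slots[i];
            pool->slots[i] = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&pool->lock);
    if (removed != NULL)
        kool_object_unref(removed);
}
```

If kool_pool_lookup() returned a borrowed pointer instead, another thread calling kool_pool_remove() right afterwards would leave the caller with a dangling pointer.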

If you implement reference counting for an object or struct, make sure it uses atomic operations or otherwise protects the reference count from being modified simultaneously by multiple threads.
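A minimal sketch of an atomic reference count could look like the following. It assumes a C11 compiler with <stdatomic.h> (older code bases typically use compiler builtins or library helpers instead), and WootObject and its functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted type. */
typedef struct WootObject {
    atomic_int ref_count;
    int value;
} WootObject;

/* Returns a new object holding a single reference owned by the caller. */
WootObject *woot_object_new(int value)
{
    WootObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        abort();
    atomic_init(&obj->ref_count, 1);
    obj->value = value;
    return obj;
}

/* Atomically adds a reference; returns obj for convenience. */
WootObject *woot_object_ref(WootObject *obj)
{
    assert(obj != NULL);
    atomic_fetch_add(&obj->ref_count, 1);
    return obj;
}

/* Atomically drops a reference. atomic_fetch_sub() returns the previous
 * value, so exactly one thread sees 1 and frees the object. */
void woot_object_unref(WootObject *obj)
{
    assert(obj != NULL);
    if (atomic_fetch_sub(&obj->ref_count, 1) == 1)
        free(obj);
}
```

Note that this only protects the count itself; the other fields of the object still need their own locking if several threads modify them.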

If a function returns a pointer to memory that the caller isn’t supposed to free or unref, it is often necessary to document how long the pointer stays valid - for example, the documentation for the getenv() C library function says “The string pointed to by the return value of getenv() may be statically allocated, and can be modified by a subsequent call to getenv(), putenv(3), setenv(3), or unsetenv(3).” This is useful information because it shows that care must be taken if the result from getenv() is used by multiple threads. Strictly speaking, this kind of API can never be safe in a multi-threaded application; the only reason it works in practice is that applications and libraries normally don’t modify the environment.
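One way to deal with such a short-lived pointer is to copy the data right away, so the rest of the program owns its own string. The sketch below uses a made-up helper, woot_dup_env(), to show the idea. It does not remove the race with another thread changing the environment during the getenv() call itself; it only shortens the window in which the borrowed pointer is used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of the value of the
 * environment variable `name`, or NULL if it is not set.
 * The caller owns the result and must release it with free(). */
char *woot_dup_env(const char *name)
{
    assert(name != NULL);

    const char *value = getenv(name); /* borrowed, may be overwritten later */
    if (value == NULL)
        return NULL;

    size_t len = strlen(value) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        abort();
    memcpy(copy, value, len);
    return copy;
}
```

The copy stays valid even if the variable is changed or unset afterwards, which is exactly the guarantee the pointer returned by getenv() does not give.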

It is often advantageous for an application to not worry about out-of-memory conditions and instead just call abort() if the underlying allocator signals an out-of-memory condition. This holds true for most libraries as well since it allows a simpler and better API and huge code-footprint reductions. If you do decide to worry about OOM in your library, do make sure that you test all code-paths or your effort will very likely have been in vain. On the other hand, if you know your library is going to be used in e.g. process 1 (the init process) or other critical processes, then not handling OOM is not an option.
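The abort-on-OOM approach is usually implemented with a small allocation wrapper that every other function in the library uses. The following is only a sketch: woot_malloc() and woot_strdup() are hypothetical names, similar in spirit to what g_malloc() does in GLib.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: never returns NULL; aborts the process instead. */
void *woot_malloc(size_t size)
{
    /* malloc(0) may legally return NULL, so always ask for at least 1 byte. */
    void *p = malloc(size > 0 ? size : 1);
    if (p == NULL) {
        fputs("libwoot: out of memory\n", stderr);
        abort();
    }
    return p;
}

/* Callers need no error path: the result is always a valid string.
 * Release it with free(). */
char *woot_strdup(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    return memcpy(woot_malloc(len), s, len);
}
```

Because functions built on top of the wrapper cannot fail due to allocation, they don't need an error return just for OOM, which is where most of the API simplification comes from.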

Checklist

  • Provide a free() or unref() function for each type your library introduces.
  • Ensure that memory handling is consistent across your library.
  • Note that multi-threading may impose certain kinds of API.
  • Make sure the documentation is clear on how memory is managed.
  • Abort on OOM unless there are very good reasons for handling OOM.

Multiple Threads and Processes


A library should clearly document if and how it can be used from multiple threads. There are often multiple levels of thread-safety involved - if the library has a concept of objects and a pool of objects (as most libraries do), the enumeration and management of the pool might be thread-safe while applications are supposed to provide their own locking when operating on a single object from multiple threads concurrently.

If you are providing a function performing synchronous I/O, it is often a good idea to make it thread-safe so an application can safely use it from a helper thread.

If your library is using threads internally, be wary of manipulating process-wide state, such as the current directory, locale, etc. Doing so from your private worker thread will have unexpected consequences for the application using your library.

A library should always use thread-safe functions (e.g. getpwnam_r() rather than getpwnam()) and avoid libraries and code that are not thread-safe. If you can’t do this, clearly state that your library isn’t thread-safe so applications can use it from a dedicated helper process instead if they need thread-safety.
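As an example of using the reentrant variant, here is a sketch of a user lookup built on getpwnam_r(). The helper name woot_lookup_uid(), the 1024-byte fallback and the 1 MiB cap are arbitrary choices made for this illustration. The point is that all result data lives in a caller-provided buffer rather than in static storage shared between threads.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: looks up the uid of user_name.
 * Returns 0 and stores the uid in *uid_out on success,
 * or -1 if the user does not exist or the lookup failed. */
int woot_lookup_uid(const char *user_name, uid_t *uid_out)
{
    assert(user_name != NULL && uid_out != NULL);

    /* sysconf() only gives a hint and may return -1. */
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buf_len = hint > 0 ? (size_t) hint : 1024;

    while (buf_len <= (1u << 20)) {
        char *buf = malloc(buf_len);
        if (buf == NULL)
            abort();

        struct passwd pwd;
        struct passwd *result = NULL;
        int rc = getpwnam_r(user_name, &pwd, buf, buf_len, &result);

        if (rc == 0 && result != NULL) {
            *uid_out = pwd.pw_uid; /* copy out before freeing the buffer */
            free(buf);
            return 0;
        }
        free(buf);

        if (rc != ERANGE)
            return -1;  /* not found, or a real error */
        buf_len *= 2;   /* buffer was too small: retry with a bigger one */
    }
    return -1;
}
```

With getpwnam(), two threads doing lookups at the same time could overwrite each other's result; here each call works on its own buffer.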

It is also important to document if your library is using threads internally, e.g. for a pool of worker threads. Even though you think of the thread as a private implementation detail, its existence can affect users of your library; e.g. Unix signals might need to be handled differently in the presence of threads, and there are extra complications when forking a threaded application.

If your library has interfaces involving resources that can be inherited over fork(), such as file descriptors, locks, memory obtained from mmap(), etc., you should try to establish a clear policy for how an application can use your library before and after a fork. Often, the simplest policy is the best: start using nontrivial libraries only after the fork, or offer a way to reinitialize the library in the forked process. For file descriptors, using FD_CLOEXEC is a good idea. In reality, most libraries have undefined behavior after the fork() call, so the only safe thing to do in the child is to call one of the exec() functions.
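Setting FD_CLOEXEC on a descriptor your library owns takes only a few lines. In this sketch, woot_set_cloexec() is a hypothetical helper name; the fcntl() calls are standard POSIX.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: marks fd as close-on-exec so it is not leaked
 * into programs started with exec(). Returns 0 on success, or -1 on
 * error with errno set by fcntl(). */
int woot_set_cloexec(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;

    /* Preserve any other descriptor flags that are already set. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

Where the platform supports it, passing O_CLOEXEC directly to open() is preferable in threaded programs, since it closes the window between creating the descriptor and setting the flag during which another thread could fork and exec.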

Checklist

  • Document if and how the library can be used from multiple threads.
  • Document what steps need to be taken after fork() or if the library is now unusable.
  • Document if the library is creating private worker threads.