Friday, July 1, 2011

Writing a C library, part 4

This is part four in a series of blog-posts about best practices for writing C libraries. Previous installments: part one, part two, part three.

Helpers and daemons

Occasionally it's useful for a program or library to call upon an external process to do its bidding. There are many reasons for doing this - for example, the code you want to use
  • might not be easily used from C - it could be written in say, python or, gosh, bash; or
  • could mess with signal handlers or other global process state; or
  • is not thread-safe, leaks memory or is just bloated; or
  • has error handling that is incompatible with how your library does things; or
  • needs elevated privileges; or
  • comes from a library you have a bad feeling about, but re-implementing the functionality yourself is not worthwhile (or politically feasible).
There are three main ways of doing this.

The first one is to just call fork(2) and start using the new library in the child process - this usually doesn't work because chances are that you are already using libraries that cannot be reliably used after the fork() call, as discussed previously (additionally, a lot of unnecessary copy-on-write (COW) might happen if the parent process has a lot of writable pages mapped). If portability to Windows is a concern, this is also a non-starter as Windows does not have fork() or any meaningful equivalent that is as efficient.

The second way is to write a small helper program and distribute the helper along with your library. This also uses fork() but the difference is that one of the exec(3) functions is called immediately in the child process, so most of the previous process state is discarded when the process image is replaced (a notable exception is file descriptors, which are inherited across exec(), so be wary of undesired leaks). If using GLib, there are a couple of useful (and portable) utility functions to do this (including support for automatically closing file descriptors).
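As a rough illustration of the fork-then-exec pattern described above, here is a minimal POSIX sketch. The names run_helper() and set_cloexec() are made up for this example and are not part of GLib or any other library; a real library would also have to think about how the host application handles SIGCHLD, which this sketch ignores.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical utility: mark a descriptor close-on-exec so that it is
 * not leaked into the helper. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical launcher: runs the program at `path` with `argv` and
 * returns its exit status, or -1 if it could not be spawned or was
 * terminated by a signal. */
int run_helper(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image right away and do nothing
         * else, so no library state from the parent is relied upon. */
        execv(path, argv);
        _exit(127); /* exec failed; 127 mirrors the shell convention */
    }

    /* Parent: wait for the helper, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would typically mark its private descriptors with set_cloexec() (or open them with O_CLOEXEC in the first place) before calling run_helper() with the installed path of the helper.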

The third way is to have your process communicate with a long-lived helper process (a so-called daemon or background process). The helper daemon can be launched either by dbus-daemon(1) (if you are using D-Bus as the IPC mechanism), by systemd (if you are using e.g. Unix domain sockets), by an init script (uuidd(8) used to do this - wasteful if your library is not going to get used) or by the library itself.

Helper daemons usually serve multiple instances of library users; however, it is sometimes desirable to have one helper daemon instance per library user instance. Note that having a library spawn a long-lived process by itself is usually a bad idea because the environment and other inherited process state might be wrong (or even insecure) - see Rethinking PID 1 for more details on why a good, known, minimal and secure working environment is desirable. Another thing that is horribly difficult to get right (or, rather, horribly easy to get wrong) is uniqueness - e.g. you want at most one instance of your helper daemon - see Colin's notes for details and how D-Bus can be used, and note that things like GApplication have built-in support for uniqueness. Also, in a system-level daemon, note that you might need to set things like the loginuid (example of how to do this) so that things like auditing work when rendering service for a client (this is related to the Windows concept known as impersonation).

As an example, GLib's libproxy-based GProxy implementation uses a helper daemon because dealing with proxy servers involves interpreting JavaScript (!), and initializing a JS interpreter from every process wanting to make a connection is too much overhead, not to mention the pollution caused (source, D-Bus activation file - also note how the helper daemon is activated by simply creating a D-Bus proxy).

If the helper needs to run with elevated privileges, a framework like PolicyKit is convenient to use (for checking whether the process using your library is authorized) since it nicely integrates with the desktop shell (and also console/ssh logins). If your library is just using a short-lived helper program, it's even simpler: just use the pkexec(1) command to launch your helper (example, policy file).

As an aside (since this write-up is about C libraries, not software architecture), many subsystems in today's Linux desktop are implemented as system-level daemons (often running privileged) with the primary API being a D-Bus API (example) and a C library to access the functionality either not existing at all (applications then use generic D-Bus libraries or tools like gdbus(1) or dbus-send(1)) or mostly generated from the IDL-like D-Bus XML definition files (example). It's useful to contrast this approach with libraries using helpers since one is more or less upside down compared to the other.

Checklist

  • Identify when a helper program or helper daemon is needed.
  • If possible, use D-Bus (or similar) for activation / uniqueness of helper daemons.
  • Communicating with a helper via the D-Bus protocol (instead of using a custom binary protocol) adds a layer of safety because message contents are checked.
  • Using D-Bus through a message bus router (instead of peer-to-peer connections) adds yet another layer of safety since the two processes are connected through an intermediate router process (a dbus-daemon(1) instance) which also validates messages and disconnects processes sending garbage.
  • Hence, if the helper is privileged (meaning that it must a) treat the unprivileged application/library using it as untrusted and potentially compromised; and b) validate all data sent to it - see Wheeler's Secure Programming notes for details), activating a helper daemon on the D-Bus system bus is often a better idea than using a setuid root helper program that you spawn yourself.
  • If possible, in particular if you are writing code that is used on the Linux desktop, use PolicyKit (or similar) in privileged code to check if unprivileged code is authorized to carry out the requested operation.

Testing

A sign of maturity is when a library or application comes with a test suite; a good test suite is also incredibly useful for ensuring mostly bug-free releases and, more importantly, ensuring that the maintainer is comfortable putting releases out without losing too much sleep or sanity. Discussing specifics of testing is out of scope for a series on writing C libraries, but it's worth pointing to the GLib test framework, how it's used (example, example and example) and how this is used by e.g. the GNOME buildbots.

One metric for measuring how good a test suite is (or at least how extensive it is) is how much of the code it covers - for this, the gcov tool can be used - see notes on how this is used in D-Bus. Specifically, if the test suite does not cover some edge case, the code paths for handling said edge case will appear as never being executed. Similarly, if the code base handles OOM but the test suite isn't set up to exercise it (for example, by failing each allocation in turn), the code paths for handling OOM will show up as untested.
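To make the "failing each allocation" idea concrete, here is a small self-contained sketch. Everything in it (test_malloc(), struct pair, pair_new(), check_oom_paths()) is invented for illustration and is not how D-Bus or GLib implement it; the assumption is simply that the library routes its allocations through a wrapper that the test suite can tell to fail the N-th request.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static long fail_at = -1;    /* index of the allocation to fail; -1 = never */
static long alloc_count = 0; /* allocations requested since the last reset */

/* Allocation wrapper with fault injection (hypothetical). */
void *test_malloc(size_t size)
{
    long n = alloc_count++;
    if (n == fail_at)
        return NULL; /* simulated OOM */
    return malloc(size);
}

/* Hypothetical library code under test: returns NULL on OOM and must
 * not leak whatever was allocated before the failure. */
struct pair { char *key; char *value; };

void pair_free(struct pair *p)
{
    if (p != NULL) {
        free(p->key);
        free(p->value);
        free(p);
    }
}

struct pair *pair_new(const char *key, const char *value)
{
    assert(key != NULL && value != NULL);

    struct pair *p = test_malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->key = test_malloc(strlen(key) + 1);
    p->value = test_malloc(strlen(value) + 1);
    if (p->key == NULL || p->value == NULL) {
        pair_free(p); /* free(NULL) is a no-op, so this is safe */
        return NULL;
    }
    strcpy(p->key, key);
    strcpy(p->value, value);
    return p;
}

/* Runs pair_new() once cleanly to count its allocations, then once per
 * allocation with that allocation failing. Returns the allocation count,
 * or -1 if any run misbehaved. */
int check_oom_paths(void)
{
    fail_at = -1;
    alloc_count = 0;
    struct pair *p = pair_new("a", "b");
    if (p == NULL)
        return -1;
    pair_free(p);
    long total = alloc_count;

    for (long i = 0; i < total; i++) {
        fail_at = i;
        alloc_count = 0;
        if (pair_new("a", "b") != NULL)
            return -1; /* the injected failure went unnoticed */
    }
    fail_at = -1;
    return (int) total;
}
```

Run under gcov, a loop like check_oom_paths() should make every OOM branch in pair_new() show up as executed; run under a leak checker, it would also reveal error paths that forget to free partial results.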

Innovative approaches to testing can often help - for example, Mozilla employs a technique known as reftests (see also: notes on GTK+ reftests) while the Dracut test suite employs VMs for both client and server to test that booting from iSCSI works.

Checklist

  • Start writing a test suite as early as possible.
  • Use tools like gcov to ascertain how good the test suite is.
  • Run the test suite often - ideally integrate it into the build system ('make check'), release procedures, version control etc.

3 comments:

  1. Regarding testing, g_test is very useful, but I think it'd be great to have some better documentation on how to integrate it with your project, e.g. with 'make check'

    glib shows how to do it (using Makefile.decl etc.), but it's not trivial, and esp. not something most people would come up with by themselves.

  2. Yeah, the state of build systems (and, as a generalization: IDE, RAD etc.) is kinda of a mess on (at least) Linux. I was thinking about covering it in my series... but since there really is no good guidance except for "copy/paste whatever autofoo other projects are using" it's probably just going to be a bullet-point in the list of items not covered with that guidance. Answers on a post-card....

  3. > But since there really is no good guidance
    > except for "copy/paste whatever autofoo other
    > projects are using" it's probably just going to
    > be a bullet-point in the list of items not
    > covered with that guidance. Answers on a post-card...

    I think the best guidance is to use CMake!

    It is cross-platform (to all the platforms that anyone has used in the last 20 years), much MUCH easier than automake, and much faster!

    And yes, I have used automake, cons, scons, jam, and even imake. CMake is the best.
