This article is part number 6 of the Header files series.


I don’t like header files and, especially, I don’t like having to use them to implement modules. When you have used a language that sports real modules (pretty much anything other than C or sh), header files feel primitive. Well, they actually are primitive.

But what do I mean by “real modules”?

Modules!

In a language that supports modules, a module is a collection of related data types and code exposed via a well-defined set of symbols encapsulated in a single container (name). In other words: the effect of importing a module is simply getting a single reference to that module.

For example, take a look at the following Python session:

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> import collections
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'collections']
>>> dir(collections)
['Callable', 'Container', 'Counter', 'Hashable', ...]

Nothing surprising. What I’d like you to notice is how little importing the module did: the module was brought into the global namespace, but only the module’s name was exposed. This holds regardless of any dependencies the module may have internally. To use anything from the module, we have to go through that name, as in collections.Counter.

Leaky dependencies

Compare the above with the inclusion of header files. When you include a header file, your namespace is very likely to be “polluted” by a bunch of things you did not ask for. This happens because header files often need to include other header files and, because header files are not modules, anything included by the header file becomes visible at the include site as well.

In the case of C++, you may think that using namespaces fixes this situation… but in reality it does not. Namespaces are only a way to prevent symbol collisions between vendors. They do nothing to prevent leaky inclusions in header files.

Consider, for example, a header file called text-utils.h with the following contents:

#if !defined(MYPROJECT_TEXT_UTILS_H)
#define MYPROJECT_TEXT_UTILS_H

#include <stdio.h>

int read_line(FILE*, char**);

#endif /* MYPROJECT_TEXT_UTILS_H */

This little header file has to include stdio.h because it needs FILE* as part of a prototype. There is no way around this (other than going completely type-unsafe and using void*): FILE is a type defined in the standard library so you should not attempt to redefine it yourself (e.g. by hand-crafting a forward declaration) in your code.

Whenever you want to use read_line, you will have to include your text-utils.h file that supposedly represents your module… but doing so will (unexpectedly) pull in stdio.h. A nasty side-effect: the including file now has access to all sorts of unexpected symbols aside from FILE, like fopen or even printf, when it did not ask for them.

But what’s the problem here? If you end up using all these symbols you did not ask for, and you forget to add an explicit inclusion of the corresponding header file, your code will break if the internal implementation of text-utils.h changes and gets rid of the inclusion of stdio.h!
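As a minimal sketch of that failure mode, the single file below stands in for two files: the text-utils.h shown earlier and a hypothetical client.c that includes only that header. The function format_count is an invented name used purely for illustration; it calls snprintf, which is declared in stdio.h, without ever asking for stdio.h itself.

```c
/* ---- text-utils.h, as shown earlier in the article ---- */
#if !defined(MYPROJECT_TEXT_UTILS_H)
#define MYPROJECT_TEXT_UTILS_H

#include <stdio.h>

int read_line(FILE*, char**);

#endif /* MYPROJECT_TEXT_UTILS_H */

/* ---- client.c (hypothetical): its only include would be
 *          #include "text-utils.h"
 *      which the section above stands in for. ---- */

/* Formats "<n> lines" into buf and returns snprintf's result.
 * snprintf and size_t are visible here only because text-utils.h
 * happens to pull in <stdio.h>; client.c never requested them. */
int format_count(char *buf, size_t size, int n)
{
    return snprintf(buf, size, "%d lines", n);
}
```

If text-utils.h ever stops including stdio.h, client.c stops compiling even though nothing in client.c changed. Adding an explicit #include <stdio.h> to client.c removes that hidden dependency.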

There are some ways to prevent this, but none of them are pretty.

If you look at the standard system header files, you will be horrified to see all the tricks they have to go through just to ensure that including one standard C header does not leak public symbols from other standard headers. The way this is done involves defining everything as private in “internal” header files and then making the user-facing header files redefine those internal names as public… but there are several caveats, as you will notice if you take a look. (Also note that not all standard libraries are “clean” in this sense; recent versions of glibc are particularly good at this.)

Another mechanism is to use the pimpl idiom in your structures and classes, that is, to hide a type’s implementation details behind an opaque pointer. By doing this, you will (sometimes) reduce the number of dependencies needed in the header file and you can move those dependencies to the implementation module. As a result, you will have better-defined “modules” and faster build times.
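In plain C, the closest equivalent is an opaque pointer. The sketch below is one possible way to apply it to the read_line example, not the only one: the public section declares an incomplete struct line_reader and never mentions FILE, so it does not need stdio.h, while the implementation section includes stdio.h privately. The names line_reader, line_reader_open, line_reader_next and line_reader_close are hypothetical, and both sections share one file only to keep the example self-contained.

```c
/* ---- line-reader.h (hypothetical public header): no <stdio.h> ---- */

struct line_reader;  /* opaque: the layout is known only to the .c file */

/* Returns NULL if the file cannot be opened or memory runs out. */
struct line_reader *line_reader_open(const char *path);

/* Copies the next line, without its trailing newline, into buf.
 * Returns 1 on success and 0 at end of file or on error. */
int line_reader_next(struct line_reader *r, char *buf, int size);

void line_reader_close(struct line_reader *r);

/* ---- line-reader.c (hypothetical implementation) ---- */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct line_reader {
    FILE *fp;  /* the stdio dependency lives here, out of the header */
};

struct line_reader *line_reader_open(const char *path)
{
    struct line_reader *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->fp = fopen(path, "r");
    if (r->fp == NULL) {
        free(r);
        return NULL;
    }
    return r;
}

int line_reader_next(struct line_reader *r, char *buf, int size)
{
    if (r == NULL || buf == NULL || size <= 0)
        return 0;
    if (fgets(buf, size, r->fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* strip the newline, if any */
    return 1;
}

void line_reader_close(struct line_reader *r)
{
    if (r == NULL)
        return;
    fclose(r->fp);
    free(r);
}
```

Callers of this header see only struct line_reader pointers, so fopen, printf and friends stay out of their namespace unless they include stdio.h themselves. The price is a heap allocation and an extra level of indirection.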

Be preemptive!

Try to minimize the number of header files you include in a header file. Every time you see the need for an inclusion, consider whether there is another way to achieve your goal. Usually, putting that inclusion in the corresponding .c file is a better choice, assuming the public API does not need that header file for anything.
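For instance, in the following sketch the public declaration of a hypothetical count_words function needs nothing beyond built-in types, so the dependency on ctype.h can live entirely in the .c file. The file names and the function are assumptions made up for this example, and the two sections are shown in one file only for brevity.

```c
/* ---- word-count.h (hypothetical public header): no includes needed ---- */

/* Returns the number of whitespace-separated words in text. */
int count_words(const char *text);

/* ---- word-count.c (hypothetical implementation) ---- */
#include <ctype.h>  /* isspace: an implementation detail, so it stays here */

int count_words(const char *text)
{
    int count = 0;
    int in_word = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            in_word = 0;
        } else if (!in_word) {
            /* First non-space character after whitespace starts a word. */
            in_word = 1;
            count++;
        }
    }
    return count;
}
```

Because word-count.h includes nothing, files that include it cannot accidentally start depending on ctype.h through it.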

Don’t wait until it’s too late to discover that your code relies on indirect dependencies pulled in by some unrelated header file. Look for these eagerly and plug the leaks. The include-what-you-use tool will help you do this… although in my opinion it is an aberration (see how large it is) that exists only because of the lack of proper abstractions in C and C++.

And, to conclude: I hear that there are proposals on the table to equip C++ with a module system, but I don’t know when or if they will become a reality. (And I’m a little scared of the thought that they may, just because of how they may have to work for backwards-compatibility reasons!)
