The Axioms of linking

July 30, 2019

How did things get this bad? The Axioms of linking

Every few months I end up in a shady corner of the internet, with twenty tabs open, searching for variations on ‘why the $%^&!@ does this not work’. Hours later I end up re-discovering the truth: using binary libraries in $CURRENT_YEAR still pretty much sucks.

So let’s try and get something positive out of this, and at least write up the facts of the situation. Here’s to hoping someone comes along and shows me I missed that “one weird trick” that will make all of this make sense.

So here we are, in no particular order, a bunch of hard-learned facts about using libraries (on unix-flavored operating systems):

Dynamic or shared libraries are loaded up by your program at runtime. They contain lookup tables that map symbols to shared executable code. If you give someone a binary that links a dynamic library that they don’t already have, the OS will complain about missing libraries when they try to run it.
Dynamic or “shared” libraries have names that start with lib and finish with .so. Unless you’re on a Mac, where they end with .dylib.¹
Dynamic libraries themselves can link other dynamic libraries. These are known as transitive dependencies. All dependencies will need to be found to successfully run your binary.
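
To make the idea of transitive dependencies concrete, here is a minimal Python sketch that walks a hand-written dependency map and collects everything a binary would need at runtime. The map, the binary name `app` and the library names are hypothetical examples, not output from a real tool.

```python
from collections import deque

# Hypothetical "who needs what" map: each key lists the libraries it links directly.
NEEDED = {
    "app": ["libfoo.so.3"],
    "libfoo.so.3": ["libbar.so.1", "libc.so.6"],
    "libbar.so.1": ["libc.so.6"],
}


def transitive_dependencies(needed, root):
    """Return every library reachable from root, in breadth-first discovery order."""
    found = []
    queue = deque(needed.get(root, []))
    while queue:
        lib = queue.popleft()
        if lib in found:
            continue  # already recorded via another path
        found.append(lib)
        queue.extend(needed.get(lib, []))  # follow this library's own dependencies
    return found


if __name__ == "__main__":
    print(transitive_dependencies(NEEDED, "app"))
```

Even though `app` only names `libfoo.so.3` directly, all three libraries have to be present on the target machine.
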
If you want to move a binary from one machine (where it was compiled) to another, you’ll almost certainly find that at least some of the shared libraries needed by your binary are no longer found. This is usually the first sign of trouble…
Linux knows how to find libraries because it has a list of known locations for shared libraries in /etc/ld.so.conf. Each time you run ldconfig, the OS updates its cache of known libraries by going through directories in this file and reading the libraries it finds. OS X works differently… see dyld and friends.
Use ldd (linux) or otool -L (OS X) to query your binary for the libraries it needs, including any that are missing. Beware that it is not safe to run ldd on a binary you suspect may be malicious, since ldd may end up executing code from it 😞.
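
If you find yourself squinting at ldd output a lot, a few lines of Python can pull out just the missing entries. This is a rough sketch that assumes the usual `name => not found` line shape; the sample text and the library names in it are made up, and `ldd_missing` is a hypothetical helper that should only be pointed at binaries you trust.

```python
import subprocess

# Shortened sample in the usual ldd output shape; the library names are made up.
SAMPLE_LDD_OUTPUT = """\
\tlinux-vdso.so.1 (0x00007ffd4a5f2000)
\tlibfoo.so.3 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2c1a200000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f2c1a5d1000)
"""


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


def ldd_missing(binary_path):
    """Run ldd on a binary you trust and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    print(missing_libraries(SAMPLE_LDD_OUTPUT))
```
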
You can safely copy dynamic libraries from one machine to another, as long as the environments are similar enough.² In a perfect world (on linux), you could just copy the library you want to use into /usr/local/lib (the recommended place for unstable libraries) and then run ldconfig to make your OS reload its library cache.
Of course, on OS X things work totally differently. Dynamic libraries have an install name which contains the absolute path. This path is baked into your binary at compile time. You can use install_name_tool to change it. Good luck!
On linux, adding libraries to /usr/local/lib makes them visible to everything, so you may want to copy your library somewhere else so that only your binary knows how to find it. One way to do this is using rpath…
You can set the rpath attribute of your binary to contain a directory hint for your OS to look in for libraries. This hint can be relative to your binary. This is especially useful if you always ship libraries in a relative directory to your binary. On linux the placeholder for the directory containing the binary is $ORIGIN, so an rpath of $ORIGIN/lib causes the loader to always look in <path to your binary>/lib for shared libraries at runtime. OS X offers the same idea with @executable_path and @loader_path. This works on both OS X and linux (with those different spellings), and is one of the most useful tools for actually getting things working in practice.
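
On linux the dynamic loader spells the "directory of this binary" placeholder as `$ORIGIN` (or `${ORIGIN}`). The Python sketch below only mimics the string substitution, so you can sanity-check where a relative rpath would point. The function name `expand_origin` and the paths are hypothetical, and the real loader does more than this (it handles several rpath entries and other placeholders).

```python
import posixpath


def expand_origin(rpath_entry, binary_path):
    """Expand $ORIGIN / ${ORIGIN} in one rpath entry for a binary at an absolute path."""
    if not posixpath.isabs(binary_path):
        raise ValueError("binary_path must be absolute")
    origin = posixpath.dirname(binary_path)  # directory that contains the binary
    expanded = rpath_entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    return posixpath.normpath(expanded)  # collapse any ".." segments for readability


if __name__ == "__main__":
    print(expand_origin("$ORIGIN/lib", "/opt/app/bin/tool"))
    print(expand_origin("$ORIGIN/../lib", "/opt/app/bin/tool"))
```

With GCC or Clang on linux, an rpath like this is typically set at link time with something along the lines of `-Wl,-rpath,'$ORIGIN/lib'`; the single quotes stop the shell from expanding `$ORIGIN` itself.
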
If your OS isn’t finding a dynamic library that you know exists, you can try helping your OS by setting the environment variable LD_LIBRARY_PATH to the directory containing it - your OS will look there first before default system paths. Beware, this is considered bad practice, but it might unblock you at a pinch. OS X has DYLD_LIBRARY_PATH, which is similar, and also DYLD_FALLBACK_LIBRARY_PATH, which is similar, but different (sorry).
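
The "look there first" behaviour of LD_LIBRARY_PATH can be pictured as a simple ordered search. The Python sketch below is a deliberately simplified model: the function name `first_match` and the default directory list are assumptions for illustration, and the real loader also consults rpath entries and the ldconfig cache, which are ignored here.

```python
import os


def first_match(lib_name, ld_library_path, default_dirs=("/lib", "/usr/lib")):
    """Return the first existing path for lib_name, searching env dirs before defaults."""
    # Empty entries are skipped here for simplicity.
    env_dirs = [d for d in ld_library_path.split(":") if d]
    for directory in [*env_dirs, *default_dirs]:
        candidate = os.path.join(directory, lib_name)
        if os.path.isfile(candidate):
            return candidate  # earlier directories shadow later ones
    return None


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    print(first_match("libc.so.6", search_path))
```

The shadowing is exactly why this variable is considered bad practice: a stray directory in it can silently swap in a different copy of a library for every program you launch from that shell.
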
Dynamic libraries also have a thing called a soname, which is the name of the library, plus version information. You have seen this if you’ve seen libfoo.so.3 or similar (the file on disk often carries a longer version, e.g. libfoo.so.3.1). This allows us to use different versions of the same library on the same OS, and to make non backwards-compatible changes to libraries. The soname is also baked into the library itself.
Often, your OS will have multiple symlinks to a single library in the same directory, just with different paths containing version information, e.g. libfoo.so.3, libfoo.so.3.1. This is to allow programs to find compatible libraries with slightly different versions. Everything starts to get rather messy here… if you really need to get into the weeds, this article will help. You probably only need to understand this if you are distributing libraries to users and need to support compatibility across versions.
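
The usual naming convention behind those symlinks can be sketched in a few lines of Python. This assumes the common linux convention where the soname carries only the major version; the real soname is whatever was baked into the library, so treat the helper `conventional_names` as a hypothetical guess at the names, not a lookup.

```python
import re

# lib<stem>.so followed by zero or more ".<number>" version components.
_REAL_NAME = re.compile(r"^lib(?P<stem>.+?)\.so(?P<version>(?:\.\d+)*)$")


def conventional_names(real_name):
    """Derive the conventional linker name and soname from a versioned file name."""
    match = _REAL_NAME.match(real_name)
    if match is None:
        raise ValueError(f"not a shared library file name: {real_name!r}")
    linker_name = f"lib{match['stem']}.so"  # what -lfoo looks for at link time
    version_parts = [part for part in match["version"].split(".") if part]
    # By convention the soname keeps only the major version number.
    soname = f"{linker_name}.{version_parts[0]}" if version_parts else linker_name
    return {"linker_name": linker_name, "soname": soname, "real_name": real_name}


if __name__ == "__main__":
    print(conventional_names("libfoo.so.3.1"))
```

On a typical system the first two names are symlinks that eventually resolve to the third. To see the soname a library actually declares, `readelf -d` on the file lists it under SONAME.
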
Of course, even if your binary only depends on a single symbol in a dynamic library, it must still link that library. Now consider that the dependency itself may also link other unused transitive dependencies. Accidentally “catching a dependency” can cause your list of shared library dependencies to grow out of control, so that your simple hello world binary ends up depending on hundreds of megabytes of totally unused shared libraries 😞.
One solution to avoiding “dependency explosions” is to statically link symbols directly into your binary, so let’s start to look at static linking!
Static libraries (.a files) contain a symbol lookup table, similar to dynamic libraries. However, they are much more dumb and also a total PITA to use correctly.
If you compile your binary and link in only static dependencies, you will end up with a static binary. This binary will not need to load any dependencies at runtime and is thus much easier to share with others!
People On The Internet will recommend that you do not distribute static binaries, because it makes it hard to patch security flaws. With dynamic libraries, you just have to patch a single library e.g. libssl.so, instead of re-compiling everything on your machine that may have linked the broken library without your knowledge (i.e. everything).
People who build production systems at companies recommend static libraries because it’s wayyyy the hell easier to just deploy a single binary with zero dependencies that can basically run anywhere. No one cares about how big binaries are these days anyway.
Still more people on the internet remind you that only one copy of a dynamic library is loaded into memory by the OS even when it is used by multiple processes, saving on memory pressure.
The static library people remind you that modern computers have plenty of memory and library size is hardly the thing killing us right now.
The OS X people point out that OS X strongly discourages the use of statically linked binaries.
Static libraries can’t declare any kinds of library dependencies. This means it is your responsibility to ensure all symbols are baked correctly into your binary at link time - otherwise your linker will fail. This can make linking static libraries painfully error-prone.
If you get symbol not found errors but literally swear that you linked every damn thing, you probably linked a static library, and forgot a transitive dependency that is needed by it. This pretty much sucks as it’s basically impossible to figure out where that library comes from. Try having a guess by looking at the error messages. Or something?
Oh, and you must ensure that you link your static libraries in the correct order, otherwise you can still get symbol not found errors.
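
With traditional single-pass linkers, a static library generally has to appear on the command line after the things that reference it. If you know who depends on whom, a topological sort gives you a workable order. The Python sketch below assumes a hand-maintained dependency map; the library names are hypothetical, and `link_order` is just an illustration (it needs Python 3.9+ for `graphlib`).

```python
from graphlib import TopologicalSorter

# Hypothetical static libraries: each key lists the libraries whose symbols it uses.
DEPENDS_ON = {
    "app_core": ["net", "util"],
    "net": ["util"],
    "util": [],
}


def link_order(depends_on):
    """Order libraries so that each one comes before the libraries it depends on."""
    # static_order() yields dependencies first; the linker wants them last.
    order = list(TopologicalSorter(depends_on).static_order())
    order.reverse()
    return order


def linker_flags(depends_on):
    """Turn the computed order into -l flags for the link command."""
    return [f"-l{name}" for name in link_order(depends_on)]


if __name__ == "__main__":
    print(" ".join(linker_flags(DEPENDS_ON)))
```

`TopologicalSorter` raises `CycleError` when two libraries depend on each other, which is a useful warning in itself: cycles are the case where people end up repeating libraries on the command line or reaching for linker grouping options.
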
If you are starting to think it might be hard to keep track of static libraries, you are following along correctly. There are tools that can help you here, such as pkg-config, CMake, autotools… or Bazel. It’s quite easy to get going, and achieve deterministic platform-independent static builds with no dynamic dependencies… Said no one ever 😓.
One classic way to screw up is to compile a static library without using the -fPIC flag (for “position independent code”). If you leave the flag out, you will still be able to use the static library in a binary, but you will not be able to link it into a dynamic library. This is especially frustrating if you were provided with a static library that was compiled without this flag and you can’t easily recompile it.
Beware that -fpic is not the same as -fPIC. Apparently, -fPIC always works but may result in a few nanoseconds of slowdown, or something. Probably you should use -fPIC and try not to think about it too much.
Your build system (e.g. CMake) usually has a one-liner way to link a bunch of static libraries into a single dynamic library with no dependencies of its own. However, should you want to link a bunch of static libraries into another static library… well, I’ve never successfully found a reliable way to do this 😞. Why do this, you may ask? Mostly for C FFI - when I want to build a single static library from C++ and then link it into e.g. a Go binary.
Beware that your compiler/linker is not smart! Just because the header files declare a function and your linker manages to find symbols for it in your library, doesn’t mean that the function is remotely the same. You will discover this when you get undefined behavior at runtime.
Oh, and if the library you are linking was compiled with a #define switch set, but when you include the library’s headers, you do not set the define to the same value, welcome again to runtime undefined behavior land! This is the same problem as the one above, where the symbols end up being incompatible.
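
The #define mismatch is easier to believe once you see the bytes. The Python sketch below uses `ctypes` to mimic two C struct layouts: one as the library was built (with a hypothetical ENABLE_STATS switch adding a field) and one as the caller's headers describe it (without the switch). The struct and field names are invented for illustration; a real mismatch happens inside compiled C or C++ code, with no error at link time.

```python
import ctypes


class WidgetAsBuilt(ctypes.Structure):
    """Layout the library was compiled with (hypothetical ENABLE_STATS defined)."""

    _fields_ = [("stats_counter", ctypes.c_int), ("value", ctypes.c_double)]


class WidgetAsIncluded(ctypes.Structure):
    """Layout the caller sees (ENABLE_STATS not defined, so the first field is absent)."""

    _fields_ = [("value", ctypes.c_double)]


def read_with_wrong_layout(value):
    """Write a widget using the library's layout, then read it using the caller's."""
    built = WidgetAsBuilt(stats_counter=7, value=value)
    raw = bytes(built)  # the bytes the "library" actually wrote
    seen = WidgetAsIncluded.from_buffer_copy(raw[: ctypes.sizeof(WidgetAsIncluded)])
    return seen.value  # read from the wrong offset, so this is not `value`


if __name__ == "__main__":
    print(ctypes.sizeof(WidgetAsBuilt), ctypes.sizeof(WidgetAsIncluded))
    print(read_with_wrong_layout(1.5))
```

Both sides agree on the symbol name, so the linker is perfectly happy. They just disagree about where `value` lives, and the caller quietly reads garbage.
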
If you are trying to ship C++, another thing that can bite you is that the C++ standard library uses dynamic linking. This means that even the most basic hello world program cannot be distributed to others unless they have a compatible version of libstdc++. Very often you’ll end up compiling with a shiny new version of this library, only to find that your target is using an older, incompatible version.
One way to get around libstdc++ problems is to statically link it into your binary. However, if you create a static library that statically links libstdc++, and your library uses C++ types in its public interface… welcome again to undefined behavior land ☠️.
Another piece of classic advice is to statically link everything in your binary apart from core system libraries, such as glibc - which is basically a thin wrapper around syscalls. A practical goal I usually aim for is to statically link everything apart from libc and (preferably an older version of) libstdc++. This seems to be the safest approach.
Ultimately, my rule of thumb for building distributed systems is to statically link everything apart from libc and (an older version of) libstdc++. You can then put this library / binary into a Debian package, or an extremely lightweight Docker container that will run virtually anywhere. Setting up the static linking is a pain, but IMO worth the effort - the main benefits of dynamic libraries generally do not apply anymore when you are putting the binary in a container anyway.
Finally, for ultimate peace of mind, use a language that has a less insane build toolchain than C++. For example, Go builds everything statically by default and can link in either dynamic or static libraries if needed, using cgo. Rust also seems to work this way. Static binaries have started becoming fashionable!

¹ Windows has .dll or something, sorry, you’re on your own with that one. This doc is all about unix-flavored operating systems 🙂.
² There are three main things that get you into trouble: operating system compatibility (e.g. Linux vs OS X), instruction set architecture (e.g. ARM vs x86-64), and special instruction set extensions such as SSE.