This example shows the details of linking a simple program from three
source files. There are three ways to link: directly from object
files, statically from static libraries, or dynamically from shared
libraries. If you're following along in my [[example
source|linking.tar.gz]], you can compile the three flavors of the
`hello_world` program with:

And then run them with:

Here's the general compilation process:

1. Write code in a human-readable language (C, C++, …).
2. Compile the code to [object files][object-files] (`*.o`) using a
   compiler (`gcc`, `g++`, …).
3. Link the code into executables or libraries using a [linker][]
   (`ld`, `gcc`, `g++`, …).

Object files are binary files containing [machine code][machine-code]
versions of the human-readable code, along with some bookkeeping
information for the linker (relocation information, stack unwinding
information, program symbols, …). The machine code is specific to a
particular processor architecture (e.g. [x86-64][]).

Linking resolves references to symbols defined in other translation
units, because a single object file will rarely (never?) contain
definitions for all the symbols it requires. It's easy to get
confused about the difference between compiling and linking, because
you often use the same program (e.g. `gcc`) for both steps. In
reality, `gcc` is performing the compilation on its own, but is
using external utilities like `ld` for the linking. To see this in
action, add the `-v` (verbose) option to your `gcc` (or `g++`)
calls. You can do this for all the rules in the `Makefile` with:

    make CC="gcc -v" CXX="g++ -v"

On my system, that shows `g++` using `/lib64/ld-linux-x86-64.so.2`
for dynamic linking. There, C++ seems to require at least some
dynamic linking, but a simple C program like `simple.c` can be
linked statically. For static linking, `gcc` uses `collect2`.

Symbols in object files
=======================

Sometimes you'll want to take a look at the symbols exported and
imported by your code, since there can be [subtle bugs][bugs] if you
link two sets of code that use the same symbol for different purposes.
You can use `nm` to inspect the intermediate object files. I've saved
the command line in the `Makefile`:

    $ make inspect-object-files
    nm -Pg hello_world.o print_hello_world.o hello_world_string.o
    _Z17print_hello_worldv U
    main T 0000000000000000 0000000000000010
    _Z17print_hello_worldv T 0000000000000000 0000000000000027
    _ZNSt8ios_base4InitC1Ev U
    _ZNSt8ios_base4InitD1Ev U
    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U
    hello_world_string R 0000000000000010 0000000000000008

The output format for `nm` is described in its man page. With the
`-g` option, output is restricted to globally visible symbols. With
the `-P` option, each symbol line is:

    <symbol> <type> <offset-in-hex> <size-in-hex>

For example, we see that `hello_world.o` defines a global text
symbol `main` at position 0 with a size of 0x10. This is where
our program's own code will start executing.
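
As a rough illustration of that line format, here is a minimal C++ sketch that splits one `nm -P` symbol line into its fields. The `NmSymbol` struct and the `parse_nm_line` function are names made up for this example; they are not part of `nm` or of the example source. The sketch assumes that undefined symbols carry no offset or size, as in the listing above.

```cpp
#include <sstream>
#include <string>

// One symbol line of `nm -P` output:
//   <symbol> <type> [<offset-in-hex> [<size-in-hex>]]
// (Hypothetical helper type for this sketch.)
struct NmSymbol {
    std::string name;
    char type = '?';
    bool has_offset = false;        // false for undefined ("U") symbols
    unsigned long long offset = 0;
    bool has_size = false;          // false when nm prints no size
    unsigned long long size = 0;
};

// Parse a single line into `symbol`. Returns false if the line does not
// contain at least a symbol name and a type character.
bool parse_nm_line(const std::string &line, NmSymbol &symbol) {
    std::istringstream in(line);
    symbol = NmSymbol();
    if (!(in >> symbol.name >> symbol.type)) {
        return false;
    }
    // The offset and size are optional, and printed as bare hex digits.
    if (in >> std::hex >> symbol.offset) {
        symbol.has_offset = true;
        if (in >> std::hex >> symbol.size) {
            symbol.has_size = true;
        }
    }
    return true;
}
```

Fed the `main` line from the listing, this should report type `T`, offset 0, and size 0x10. For `_Z17print_hello_worldv U`, both `has_offset` and `has_size` stay false.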

We also see that `hello_world.o` needs (i.e. “has an undefined symbol
for”) `_Z17print_hello_worldv`. This means that, in order to run,
`hello_world.o` must be linked against something else which provides
that symbol. The symbol is for our `print_hello_world` function. The
`_Z17` prefix and `v` postfix are a result of [name
mangling][mangling], and depend on the compiler used and function
signature. Moving on, we see that `print_hello_world.o` defines
`_Z17print_hello_worldv` at position 0 with a size of 0x27. So
linking `print_hello_world.o` with `hello_world.o` would resolve the
symbols needed by `hello_world.o`.
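
If you want to check what a mangled name stands for, one option is to ask the compiler's runtime to demangle it. The sketch below assumes GCC or Clang, whose `<cxxabi.h>` header provides `abi::__cxa_demangle`; the `demangle` wrapper is a hypothetical helper written for this example. The `c++filt` tool from binutils does the same job from the command line.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>   // std::free
#include <string>

// Return the human-readable form of a mangled C++ symbol, or the input
// unchanged if it cannot be demangled.
// (Hypothetical helper for this sketch.)
std::string demangle(const char *mangled) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc and
    // must be released with free.
    char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

With this helper, `demangle("_Z17print_hello_worldv")` should give `print_hello_world()`: the `17` is the length of the function name, and the trailing `v` encodes the empty (`void`) parameter list.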

`print_hello_world.o` has undefined symbols of its own, so we can't
stop yet. It needs `hello_world_string` (provided by
`hello_world_string.o`), `_ZSt4cout` (provided by `libstdc++`), …

The process of linking involves bundling up enough of these partial
code chunks so that each of them has access to the symbols it needs.

There are a number of other tools that will let you poke into the
innards of object files. If `nm` doesn't scratch your itch, you may
want to look at the more general `objdump`.

In the previous section I mentioned “globally visible symbols”. When
you declare or define a symbol (variable, function, …), you can use
[storage classes][storage-classes] to tell the compiler about your
symbols' *linkage* and *storage duration*.

For more details, you can read through *§6.2.2 Linkages of
identifiers*, *§6.2.4 Storage durations of objects*, and *§6.7.1
Storage-class specifiers* in [WG14/N1570][N1570], the last public
version of [ISO/IEC 9899:2011][9899] (i.e. the C11 standard).

Since we're just worried about linking, I'll leave the discussion of
storage duration to others. With linkage, you're basically deciding
which of the symbols you define in your translation unit should be
visible from other translation units. For example, in
`print_hello_world.h`, we declare that there is a function
`print_hello_world` (with a particular signature). The `extern`
means that it may be defined in another translation unit. For
file-level symbols (i.e. things defined in the root level of your
source file, not inside functions and the like), this is the default;
writing `extern` just makes it explicit. When we define the
function in `print_hello_world.cpp`, we also label it as `extern`
(again, this is the default). This means that the defined symbol
should be exported for use by other translation units.

By way of comparison, the string `secret_string` defined in
`hello_world_string.cpp` is declared `static`. This means that
the symbol should be restricted to that translation unit. In other
words, you won't be able to access the value of `secret_string` from
`print_hello_world.cpp`.
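
To make the two kinds of linkage concrete, here is a small single-file sketch. It is not the example source: the value of `secret_string` and the `secret_length` function are invented for illustration, and everything is squeezed into one translation unit to keep it short.

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

// External linkage. In C++, a `const` object at file level has internal
// linkage by default, so the explicit `extern` is what exports this symbol.
extern const char * const hello_world_string = "Hello, World!";

// Internal linkage: visible only inside this translation unit.
// (Hypothetical value, invented for this sketch.)
static const char * const secret_string = "not exported";

// External linkage is the default for functions; `extern` only makes it
// explicit.
extern void print_hello_world() {
    std::cout << hello_world_string << std::endl;
}

// Also exported. Other translation units can call this function, but they
// still cannot refer to `secret_string` by name.
std::size_t secret_length() {
    return std::strlen(secret_string);
}
```

If you compile this to an object file and run `nm -Pg` on it, you should see `hello_world_string` and the mangled names for the two functions, but no entry for `secret_string`.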

When you're writing a library, it is best to make any functions that
you don't *need* to export `static` and to [avoid global variables

You never want to code *everything* required by a program on your own.
Because of this, people package related groups of functions into
libraries. Programs can then use functions from the library, and
avoid coding that functionality themselves. For example, you could
consider `print_hello_world.o` and `hello_world_string.o` to be
little libraries used by `hello_world.o`. Because the two object
files are so tightly linked, it would be convenient to bundle them
together in a single file. This is what static libraries are: bundles
of object files. You can create them using `ar` (from “archive”;
`ar` is the ancestor of `tar`, from “tape archive”).

You can use `nm` to list the symbols for static libraries exactly as
you would for object files:

    $ make inspect-static-library
    nm -Pg libhello_world.a
    libhello_world.a[print_hello_world.o]:
    _Z17print_hello_worldv T 0000000000000000 0000000000000027
    _ZNSt8ios_base4InitC1Ev U
    _ZNSt8ios_base4InitD1Ev U
    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U
    libhello_world.a[hello_world_string.o]:
    hello_world_string R 0000000000000010 0000000000000008

Notice that nothing has changed from the object file output, except
that object file names like `print_hello_world.o` have been replaced
by `libhello_world.a[print_hello_world.o]`.

Library code from static libraries (and object files) is built into
your executable at link time. This means that when the library is
updated in the future (bug fixes, extended functionality, …), you'll
have to relink your program to take advantage of the new features.
Because nobody wants to recompile an entire system when someone makes
`cout` a bit more efficient, people developed shared libraries. The
code from shared libraries is never built into your executable.
Instead, instructions on how to find the shared libraries are built
in. When you run your executable, a loader finds all the shared
libraries your program needs and maps the parts you need from the
libraries into your program's memory. This means that when a system
programmer improves `cout`, your program will use the new version
automatically. This is a Good Thing™.

You can use `ldd` to list the shared libraries your program needs:

    $ make list-executable-shared-libraries
    linux-vdso.so.1 => (0x00007fff76fbb000)
    libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/libstdc++.so.6 (0x00007ff7467d8000)
    libm.so.6 => /lib64/libm.so.6 (0x00007ff746555000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff74633e000)
    libc.so.6 => /lib64/libc.so.6 (0x00007ff745fb2000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff746ae7000)

Each line has the format:

    soname => path (load address)

You can also use `nm` to list symbols for shared libraries:

    $ make inspect-shared-libary | head
    nm -Pg --dynamic libhello_world.so
    _Jv_RegisterClasses w
    _Z17print_hello_worldv T 000000000000098c 0000000000000034
    _ZNSt8ios_base4InitC1Ev U
    _ZNSt8ios_base4InitD1Ev U
    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U
    __bss_start A 0000000000201030
    _edata A 0000000000201030
    _end A 0000000000201048
    _fini T 0000000000000a58
    _init T 0000000000000810
    hello_world_string D 0000000000200dc8 0000000000000008

You can see our `hello_world_string` and `_Z17print_hello_worldv`,
along with the undefined symbols like `_ZSt4cout` that our code
needs. There are also a number of symbols to help with the shared
library mechanics (e.g. `_init`).

To illustrate the “link time” vs. “load time” distinction, run:

    LD_LIBRARY_PATH=. ./hello_world-dynamic

Then switch to the `Goodbye` definition in
`hello_world_string.cpp`:

    //extern const char * const hello_world_string = "Hello, World!";
    extern const char * const hello_world_string = "Goodbye!";

Recompile the libraries (but not the executables) and run again:

    LD_LIBRARY_PATH=. ./hello_world-dynamic

Finally, relink the executables and run again:

    LD_LIBRARY_PATH=. ./hello_world-dynamic

When you have many packages depending on the same low-level libraries,
the savings from avoided rebuilding are large. However, shared libraries
have another benefit over static libraries: shared memory.

Much of the machine code in shared libraries is static (i.e. it
doesn't change as a program is run). Because of this, several
programs may share the same in-memory version of a library without
stepping on each other's toes. With statically linked code, each
program has its own in-memory version:

<table class="center" style="border-spacing: 40px 0px">
<tr><th>Static</th><th>Shared</th></tr>
<tr><td>Program A → Library B</td><td>Program A → Library B</td></tr>
<tr><td>Program C → Library B</td><td>Program C ⎯⎯⎯⎯⬏</td></tr>
</table>
<!--
Unicode characters used in above figure:
  U2192 RIGHTWARDS ARROW
  U23AF HORIZONTAL LINE EXTENSION
  U2B0F RIGHTWARDS ARROW WITH TIP UPWARDS
-->

If you're curious about the details on loading and shared libraries,
[Eli Bendersky][EB] has a nice series of articles on [load time
relocation][relocation], [PIC on x86][PIC-x86], and [PIC on

[object-files]: http://en.wikipedia.org/wiki/Object_file
[linker]: http://en.wikipedia.org/wiki/Linker_%28computing%29
[machine-code]: http://en.wikipedia.org/wiki/Instruction_set_architecture
[x86-64]: http://en.wikipedia.org/wiki/X86-64
[collect2]: http://gcc.gnu.org/onlinedocs/gccint/Collect2.html
[bugs]: http://blog.flameeyes.eu/2008/02/09/flex-and-linking-conflicts-or-a-possible-reason-why-php-and-recode-are-so-crashy
[mangling]: http://en.wikipedia.org/wiki/Name_mangling
[storage-classes]: http://ee.hawaii.edu/~tep/EE160/Book/chap14/section2.1.1.html
[N1570]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[9899]: http://www.open-std.org/jtc1/sc22/wg14/
[global]: http://c2.com/cgi/wiki?GlobalVariablesAreBad
Wulf and Shaw, Global variable considered harmful
http://dx.doi.org/10.1145%2F953353.953355
[EB]: http://eli.thegreenplace.net/
[relocation]: http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/
[PIC-x86]: http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64/