Compilation and Linking Basics
With today's powerful IDEs, building a program is often as simple as pressing a key. While this is welcome progress, it is still worth knowing what goes on under the hood so that you can diagnose and fix problems when the build fails. This tutorial gives a general overview of the build process and explains how the compiler and the linker work together to build executables.
The information on this page is not specific to any particular programming language. Many languages follow the compilation and linking model, e.g. C/C++ and Fortran. The obvious exceptions are interpreted languages. Java is an interesting case: it follows the compilation and linking model, but the executables it produces run on a virtual machine.
The compiler and the linker
For compiled languages like C/C++ and Fortran, a tool is needed to convert the source files into an executable that can run on the machine. The resulting executable is targeted at a specific platform (processor, operating system, etc.). The idea behind a compiled language is that the sources can be shared between platforms while each executable is customized and optimized for its platform. In practice, though, it is quite hard to keep source files entirely common across platforms.
The tool that converts the source files into the executable is commonly called a compiler, and the conversion process is called compilation. However, "compilation" in this broad sense is usually split into two separate steps, compilation and linking, performed by two separate tools: the compiler and the linker. It is not usually necessary to distinguish between the two steps, which is why people simply refer to both as compilation. In this tutorial the distinction matters, though.
Figure 1 shows the compilation and linking process. The compiler is invoked once for each set of source files and produces one object file per set. The object files are then fed into the linker, which produces the executable. We are deliberately vague about what a set of source files is, because this is language specific. For the C language it is a .c file together with all the header files it includes.
|Figure 1: The steps to build an executable|
You may wonder why two tools are needed rather than one. There are several advantages.
- Object files can be grouped and stored in files called libraries. These libraries contain compiled code that the linker can use directly. Using libraries is efficient because code that doesn't change doesn't have to be recompiled. It also facilitates software distribution, as the user of a library doesn't have to rebuild the whole library from source, something that can be complex. Finally, it helps protect a company's intellectual property: the code in the library is only present in compiled form, so the contents of the source files are not readily available.
- Both the compiler and the linker are very platform dependent, but in different ways. The compiler depends more on the processor architecture; the linker depends more on the operating system.
- Compilers are already quite complex pieces of software. Splitting the work between two separate tools simplifies writing and maintaining them.
As mentioned above, object files can be grouped into libraries and used as input to the linker. Figure 2 updates the process illustrated in Figure 1 to include libraries.
|Figure 2: The steps to build an executable|
Example: Visual Studio .NET 2003
We will now illustrate the compilation and linking process using Visual Studio .NET 2003. These steps are not very sensitive to the exact Visual Studio version, so anything between version 6 and 2005 should do.
We create a new project; for simplicity, we make it a Win32 Console project. This will be a very simple "Hello World" application, so we have just one source file, main.cpp, with the following contents.
int main(int argc, char** argv)
After we build this in Debug mode, we find the executable HelloWorld.exe in the Debug directory. There are also many other files there, but the only one that interests us here is main.obj. This is the result of the compilation stage: as its name indicates, it is the object file produced by compiling main.cpp. The build log can be found here.
The interesting portion of the build log is reproduced below. You can clearly see that two tools are invoked: cl.exe, the compiler, and link.exe, the linker. The /c option passed to cl.exe tells it to compile only, without linking. You can also see that main.obj is an input to the linker.
Creating temporary file "s:\HelloWorld\Debug\RSP000007.rsp" with contents
[
/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /Gm /EHsc /RTC1 /MLd /Fo"Debug/" /Fd"Debug/vc70.pdb" /W3 /c /Wp64 /ZI /TP .\main.cpp
]
Creating command line "cl.exe @s:\HelloWorld\Debug\RSP000007.rsp /nologo"
Creating temporary file "s:\HelloWorld\Debug\RSP000008.rsp" with contents
[
/OUT:"Debug/HelloWorld.exe" /INCREMENTAL /NOLOGO /DEBUG /PDB:"Debug/HelloWorld.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib .\debug\main.obj
]
Creating command line "link.exe @s:\HelloWorld\Debug\RSP000008.rsp"
Figure 3 shows the process explained in Figure 2, tailored to this case. The set of source files consists of main.cpp and all the files it includes, i.e. stdio.h and any files that stdio.h includes in turn. Visual Studio always includes a set of default libraries, some of which may not actually be needed.
|Figure 3: Compilation and linking with Visual Studio .NET 2003|