What happens when you type gcc main.c

jgra007
Jun 10, 2020

GCC is a set of compilers for various languages (Ada, C, C++, Fortran, Objective-C, Objective-C++ and, at one point, Java). It provides all of the infrastructure for building software in those languages, from source code to assembly.

It is responsible for taking the “high level” source code in the respective language, checking that it is semantically valid, optimizing it, and converting it to assembly code (which is then handed off to the assembler).

It also provides the general “driver” that invokes the various tools in the toolchain (e.g. the assembler or the linker), so that you do not need to worry about the order of the steps or the many implementation details of the object file format and the underlying runtime library.

A computer only executes machine language, which is binary (base-2) code. Unfortunately (and fortunately), humans communicate at a higher level than binary, which is why we write programs in higher-level languages like Python, Ruby and, in this case, C. For the computer to execute our C code, we first have to compile it with the Unix command:

gcc main.c

Here’s what happens when we run gcc on the file main.c.

Three main steps happen when we compile code:
1. The source file is read
2. It is processed
3. It is linked with a runtime library

A lot happens when the code is being processed. Let’s unpack.

The toolchain that gcc drives has four modules: the preprocessor, the compiler, the assembler and the linker.

When we run gcc on the file main.c, the preprocessor generates an intermediate file, and that file is given to the compiler. The compiler takes the file generated by the preprocessor as input and generates assembly code, converting our C program into assembly language. Computers can only execute binary code, and assembly is the human-readable form of the instructions for the target processor, which makes it the last step before that binary code.

Assembly code is still not understood by the machine, though; it needs to be converted into machine code. The converter that does this job is the assembler, which turns the assembly code into object code.

Lastly, the linker, the last module, links the object code (created by the assembler) with the code of the library functions that we use when we write our program. From that linkage, an executable file is generated.

Let’s go in depth about each module.

The preprocessor does 3 tasks: it removes comments from the code, it copies the contents of each included header file (standard in C files) into the generated file itself, and, if any macros were used, it replaces each macro name with its code.

The compiler takes the file created by the preprocessor and creates the assembly code. Assembly code is made up of mnemonics: instructions written as short, English-like words.

The assembler converts the assembly code into object code.

Lastly, the linker plays two roles:
1. It merges the object files produced from multiple C files into one executable file.
2. It links our code (the binary object code from the assembler output) with the library function code.

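There are two types of linking: static and dynamic. Which one is used depends on the options passed to gcc and on the libraries available: by default gcc links against shared (dynamic) libraries when they exist, and the -static option asks for static linking instead.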

The linker packs all the code into a single executable file. On Unix, gcc names it a.out unless you pass -o with another name; on Windows, it is the file famously known as the .exe file.
