
A Visual Guide of the C++ Compilation Process

We seem to get a lot of threads asking about headers, definition and redefinition errors, including cpp files, etc... so it is time for diagrams (but first, some words)!

A typical C/C++ compiler consists of several pieces: the preprocessor, the lexer, the parser, the optimizer, the code generator, the linker, etc. For simplicity, I'm going to refer to the whole "package" as "the compiler" unless otherwise noted or unless I'm referring to one specific piece.

In a nutshell this is what the compiler does:

Attached Image

We're not going to get into parsing and lexical analysis, but if you're looking for resources, Martyn.Rae has an excellent series of tutorials on the subject.


The following code is going to serve as our example:

//Circle.h
#ifndef CIRCLE_H
#define CIRCLE_H
#include <iostream>

class Circle{
private:
	int radius;
	double area;
	double circumfrence;
public:
	Circle();
	Circle(int);
	int getRadius()const;
	double getCircumfrence()const;
	double getArea()const;
	friend std::ostream& operator <<(std::ostream&, const Circle&);
};
#endif



//Circle.cpp
#include "Circle.h"

Circle::Circle():radius(0), area(0), circumfrence(0)	{}

Circle::Circle(int r){
	radius = r;
	circumfrence = radius*2*3.14;
	area = radius*radius*3.14;
}

int Circle::getRadius()	const								{return radius;}
double Circle::getCircumfrence() const						{return circumfrence;}
double Circle::getArea() const								{return area;}


std::ostream& operator<<(std::ostream& os, const Circle& rhs){

	os << "Radius: " << rhs.getRadius() << std::endl;
	os << "Area: " << rhs.getArea() << std::endl;
	os << "Circumfrence: " << rhs.getCircumfrence() << std::endl;
	return os;
}



//main.cpp
#include <iostream>
#include "Circle.h"
using namespace std;


int main(){
	Circle one;
	Circle two(5);

	cout << "Circle one's info: " << endl;
	cout << one << endl;
	cout << "Circle two's info: " << endl;
	cout << two << endl;
	cin.get();
	return 0;
}



Note: There is disproportionately more info on the preprocessor and header guards than on the compiling and linking steps, since the former is not only directly visible in your code but also seems to cause the most problems.

The preprocessor

The preprocessor runs before anything else. Think of it as a giant copy paste machine. Any time you #include something, it gets "copy/pasted" right there.

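For example, in Circle.cpp we #include "Circle.h". After the preprocessor runs, Circle.cpp looks roughly like this to the compiler (simplified: in reality the directives themselves are consumed and <iostream> is expanded in place as well):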

#ifndef CIRCLE_H
#define CIRCLE_H
#include <iostream>

class Circle{
private:
	int radius;
	double area;
	double circumfrence;
public:
	Circle();
	Circle(int);
	int getRadius()const;
	double getCircumfrence()const;
	double getArea()const;
	friend std::ostream& operator <<(std::ostream&, const Circle&);
};
#endif

Circle::Circle():radius(0), area(0), circumfrence(0)	{}

Circle::Circle(int r){
	radius = r;
	circumfrence = radius*2*3.14;
	area = radius*radius*3.14;
}

int Circle::getRadius()	const								{return radius;}
double Circle::getCircumfrence() const						{return circumfrence;}
double Circle::getArea() const								{return area;}


std::ostream& operator<<(std::ostream& os, const Circle& rhs){

	os << "Radius: " << rhs.getRadius() << std::endl;
	os << "Area: " << rhs.getArea() << std::endl;
	os << "Circumfrence: " << rhs.getCircumfrence() << std::endl;
	return os;
}



...all in one file. Headers exist for the programmer's benefit, as a means to logically separate and organize code and projects.

More on the preprocessor can be found here.


Header guards:

Most programming languages (I want to say all, but I'm sure there's something obscure out there I am unaware of) have some form of "one definition rule". In C++ this means that any given class, function, or variable can be defined only once in a given translation unit (a .cpp file plus everything it includes). Therefore, when we have multiple files including multiple headers, we must provide a way for the compiler to see only one copy of each definition.

The most common and platform-independent way to guarantee a single copy is to use header guards:

#ifndef CIRCLE_H
#define CIRCLE_H
//header
#endif



If two of these end up in the same file, only the first will be included, since CIRCLE_H is already defined by the time the preprocessor reaches the second.

For example, if we had a Shape.h that Circle.h included, and main() needed access to both Shape and Circle objects, main.cpp would probably include both headers. Now main.cpp has two copies of Shape.h. Without header guards, the compiler would complain about multiple definitions of the same thing:

//Shape.h
class Shape{
	//shape stuff
};



//Circle.h
#include "Shape.h"
class Circle: public Shape{
	//circle stuff
};



//main.cpp
#include "Shape.h"
#include "Circle.h"

int main(){
	//shape and circle stuff
	return 0;
}



If we expand the inclusions, as the preprocessor does, we can see the problem:

//Shape.h
class Shape{
	//shape stuff
};

//Shape.h
class Shape{
	//shape stuff
};

class Circle: public Shape{
	//circle stuff
};


int main(){
	//shape and circle stuff
	return 0;
}



Throwing in some header guards:

//included
#ifndef SHAPE_H
#define SHAPE_H
//Shape.h
class Shape{
	//shape stuff
};
#endif

//this one is not included
//SHAPE_H already defined
#ifndef SHAPE_H
#define SHAPE_H
//Shape.h
class Shape{
	//shape stuff
};
#endif

class Circle: public Shape{
	//circle stuff
};


int main(){
	//shape and circle stuff
	return 0;
}



Problem solved! Another possible solution is #pragma once, which basically says "only include this file once". It is not part of the C++ standard, so support is up to each vendor, although most major compilers accept it. MSVC++ does, in case you're curious, and uses it in its standard headers. Here are the first few lines of its <iostream>:

// iostream standard header for Microsoft
#pragma once
#ifndef _IOSTREAM_
#define _IOSTREAM_



A list of pros and cons of each method can be found here.


Compiler

This is where the "work" is done. High-level code is translated into machine code, and the result is one object file per source (.cpp) file.

Linker

All object files and relevant resources are linked together. Symbol information is verified (I'm sure you've run into a linker error more than once) and an executable is made.

Whew, now that that is out of the way...


Diagrams!

Note: these are not overly technical diagrams (that would require detailed analysis of each vendor's compiler), but serve to provide a generic overview of what happens when one hits "build".

Attached Image

Attached Image

Attached Image

For the text lovers: the preprocessor expands all of the macros, includes, and other directives. The compiler then takes the resulting code and turns it into an object file (machine code). Then the linker lassos all of the object files and any additional resources into an executable. For dynamically linked libraries, typically only symbol information is linked, since the whole point of such libraries is for them to do the work rather than having it "in" your program/executable.


The sample program above in this process:

Attached Image

Let me know if I missed anything crucial or you want to see more UML art!


--
Happy coding!

4 Comments On This Entry


skyhawk133 (13 September 2010 - 04:15 PM)
Great entry! Submitted to DZone: http://www.dzone.com...on_process.html

alias120 (13 September 2010 - 07:36 PM)
Thank you for this KYA, the diagrams do a nice job of demonstrating what happens during that "magical" compilation process.

RBSprogram101 (14 September 2010 - 01:14 PM)
Why is the overloaded operator a friend? Is this a good way to overload an operator? I have been told it goes outside the class but you have it inside.

friend std::ostream& operator <<(std::ostream&, const Circle&);


and you have two semicolons here:
area = radius*radius*3.14;;


Excellent article!!

KYA (14 September 2010 - 02:08 PM)
There are several ways to overload that particular operator; in fact, it could be a blog post unto itself. I could have accessed the private members directly (friend designation), but I stuck with accessors. As for "inside", it is declared inside to let the class know who its friends are, but is still externally defined.

Thank you for pointing out the typo.
