Getting Started with C++

Carlos Moreno

Give someone a program, and you frustrate them for a day,
teach someone to program, and you frustrate them for a lifetime.
– Unknown

A First C++ Program

A simple program that displays a fixed message on the console (or, more specifically, sends a fixed message to the standard output device, stdout):

#include <iostream>
using namespace std;

int main()
{
cout << "Thank you for running my first C++ program!" << endl;
return 0;
}

The “substance” of this program is in the two lines between the curly braces (the characters { } that follow the line int main()); that is, the lines starting with cout and return. These are the two statements that are executed. Notice that the term statement in the context of programming languages is related to the notion of an instruction, rather than the notion of stating a fact.

The cout statement is used to send or output data to the console; more specifically, it sends data to the standard output device, or stdout.

One of the properties of the C++ language is that everything in a C++ program is case-sensitive. This means that in this simple program, cout must be written in lowercase letters; you cannot write it as Cout, or COUT, or any other variation.

In the program above, I used cout to display a fragment of text, which in C++ we can represent as a literal string. A literal string is written as a sequence of characters enclosed between double-quote characters.

The semicolon at the end of the statement is required in general to indicate the end of a statement, and not only in the case of cout.

cout also allows you to send several items to the console, as shown below:

cout << item1 << item2 [ << ··· ];

The square brackets ([ ]) indicate optional elements. In this case, the optional << and ··· indicate the possibility of additional items being sent using a single cout statement.

The token endl, which stands for end-line, has a dual effect: it outputs a newline character (which moves the cursor to the beginning of the next line), and it flushes the output device (the console, in our example).

Flushing an output device means that all preceding output operations are required to be completed immediately. This is related to the issue of buffering, which is an optimization technique used by the I/O library and the operating system. Roughly speaking, they reserve (and usually exercise) the right to put the data “on stand by” until there is an amount of data large enough to justify the cost associated with sending the data to the screen. In some cases, however, we need the guarantee that the output operations performed in our program are completed at a given point in the execution of our program, so we flush the output device.

The structure of the program

A really minimal application in C++ (one that does absolutely nothing) requires the following components:

int main()
{
return 0;
}

The statements of the program must appear inside the body of the function main; that is, inside the pair of curly braces. As you will find out later, a program can be divided into multiple files and multiple sections, so we have to indicate where execution begins. This is the purpose of the function main.

Any other statements that you want the program to execute must appear after the opening curly brace (the { after int main()), and before the return statement — any statement that you place after the return statement would never be executed, since the execution of the return statement causes the program to end.

The C++ language is rather modular, meaning that a C++ program only needs to deal with the areas of the language that it uses, which makes the process (and our program) more efficient. This is the reason why, when using cout statements, we must indicate that we are going to use input/output facilities; we do this by placing the #include <iostream> directive at the beginning of the source file. This leads to the additional requirement to place the using namespace std; directive as well. I will discuss this issue in more detail later, but for now, just take it as part of the recipe.

Running the Program

If you are working on a Unix/Linux platform (highly recommended in most contexts) with the GNU Compiler Collection (GCC), you would proceed as follows:

1. Use a text editor to type the above program.
2. Save the text file as first.c++
3. At the shell prompt, issue the following command:

c++ -o first first.c++

(you could use the command g++ as well; the commands c++ and g++ are equivalent)
4. To run the program, issue the command:

./first

The command in step 3 has the following structure: the first item (c++) is the name of the command that invokes the C++ compiler. The last item is the command-line argument given to the compiler, and it indicates the source file to be compiled. The item -o first is a command switch or modifier; in this case, it tells the compiler where the executable should be placed (the filename for the executable file). By default, the compiler would create a file called a.out, and we would have to run it as ./a.out, but we normally want to give the executable a meaningful name related to our program.

Try it out!

An Interactive Program

If we want the program to accept input from the user, the first thing we need is to set up variables to store the information. A variable is a named piece of information.

Variable names have the following requirements:

Can only contain letters, digits, and the underscore character (_). No other characters are allowed.
Cannot begin with a numeric digit (it can contain digits, just not as the first character).
Cannot be one of the C++ reserved keywords (such as int, return, namespace, and many others). I omit a complete list of these reserved keywords for two reasons: (1) you can find it easily through a Google search; and (2) if you do accidentally use a reserved keyword as a variable name, the compiler will flag an error.
Variable names are case-sensitive. Capital letters are perfectly valid; just keep in mind that a variable named VAR has nothing to do with a variable called var; they're both valid names, referring to different variables.

Examples of valid variable names:

counter	Sequence of lowercase letters
Counter2	Can contain upper or lowercase letters, and numeric digits
number_of_employees	Why not?

Examples of invalid variable names:

2nd_counter	Can contain digits, but it cannot begin with one
save&exit	The character & is not allowed
price+tax	The character + is not allowed — how would the compiler know it is not a variable price added with a variable tax?
number of employees	Spaces are not allowed

Declaring variables

An important detail in C++ is that all variables must be declared prior to their first use. A variable declaration is a statement with the following syntax:

data_type variable_name;

Where data_type defines the type of information that this variable is going to hold (numeric, text, integer or real numbers, etc.). Some of the data types that we will be using often are:

int	Integer value, typically with a range between -2147483648 and 2147483647
unsigned int	Non-negative integer value, typically with a range between 0 and 4294967295
float	Single-precision floating-point numbers (real numbers, with decimals)
double	Double-precision floating-point numbers
char	A single character
bool	Can contain true or false (literally!)
string	A sequence of characters (any amount of characters, including the possibility of zero characters)

The ranges for int and unsigned int are platform-dependent; the above figures are for typical C++ implementations on 32-bit and 64-bit processors. However, we do have the guarantee that the size of the range is the same for both (in the above example, both ranges are 4294967296 units in length — variables of both data types can represent a maximum of 4294967296 different values).

string is not really a built-in data type; it is a Standard Library facility, meaning that if we are going to use string variables in our program, we must provide a line at the top as follows:

#include <string>

Usually, our program needs other basic standard library facilities, and we would end up doing something like this:

#include <iostream>
#include <string>
using namespace std;

Examples of variable declarations are shown below. The examples illustrate the basic syntax shown above as well as a few additional possibilities:

int counter;
double price;
int number_of_items = 0;
double x1, x2, Tc = 0, Tf;
double pst, gst;
double total = price + pst + gst;

The declaration of total initializes it with the result of the expression at the right of the equal sign — it is assumed that the three variables involved in that expression already contain the correct values at the point where they're used for the purpose of initializing total.

Our first interactive program could be a conversion from inches to centimeters. We just have to accept input from the user, and multiply that value by the conversion factor 2.54:

#include <iostream>
using namespace std;

int main()
{
    double inches, cm;

    cout << "Please enter a distance in inches: " << flush;
    cin >> inches;

    cm = 2.54 * inches;

    cout << "The distance you entered is equivalent to "
         << cm << " centimeters" << endl;

    return 0;
}

Notice the flush token in the cout statement; it flushes the standard output, but unlike endl, it does not send the cursor to the beginning of the next line. Usually, things work fine without it, because by default cin is tied to cout, so pending output is flushed before the program stops to wait for user input. But still, it doesn't hurt to force the operations to be completed, to avoid the unfortunate situation in which the user never gets to read the instructions from the program because of buffering issues.

Making your Programs Readable

One of the important features of a program, in addition to exhibiting the expected behaviour in all possible situations, is readability. There are two strong reasons for this. In a real-life situation, many different programmers interact with a given program: perhaps you wrote the program, but some time later another person has to maintain it (modify it, add new features, fix newly discovered problems, etc.), or you get to use, for your application, bits and pieces that other people wrote. The other reason is that even if you are the only one who is going to interact with a particular program, what you write today may become less than obvious when you look at it two or three weeks (or months, or years) later.

The argument can be made a lot more dramatic, to the point of claiming that readability is more important than functionality (and no, this is not a joke). The argument is that it is easier to fix a program that is readable and easy to understand than to modify or maintain a program that is working perfectly but is written in a cryptic way. You may ask: “if the program is working perfectly, why would it need to be modified?” The reality is that programs require change more often than not: requirements change, new versions are developed with new or modified features, problems (bugs) are found in programs, etc.

Of course, by the time that you “ship” your programs, they must be working properly above everything else. But the thing is, there is a lot of time between the moment that you write a program (possibly a small part of a bigger software or some other product) and the moment that the product is shipped, and during that time, programs are tested, modified, fixed or re-written, and of course, you want these tasks to be as easy and as little error-prone as possible — and one of the most critical conditions for that is that the programs be readable.

But there is another subtlety here: as programs become more complex (and they necessarily will, if you plan to do anything useful with programming), making them work correctly becomes increasingly tough; readability is essential to reduce the complexity, or rather, to make complex programs a little less hard to understand and thus to get them working properly.

I will briefly discuss a few of the issues involved in writing readable programs. For a more advanced discussion, I would recommend that you get a copy of Steve McConnell's book Code Complete. This book is, in my opinion, a masterpiece, even if it may be considered a little bit dated; still, I consider it a must-have for everyone who wants to be a programmer or a software designer/architect. As you progress in the area of software, you'll find out that there are many books on more specific issues of software engineering and quality of the software that we develop; but this book is definitely a good starting point.

Comments

Comments are fragments of text in the program that are marked so that they are disregarded by the compiler (that is, they are not really part of the program); they serve the purpose of explaining (to a human reader) what the program (or more specifically, what that section of the program) is doing, or why.

In C++, comments are indicated in two possible ways:

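End-of-line comments, indicated with //

Whenever the compiler encounters the sequence //, the rest of that line is disregarded by the compiler, and has no effect on the behaviour of the program.

Block comments, indicated by enclosing between /* and */

Everything between the sequence /* and the next */ is disregarded by the compiler, even when the comment extends over several lines.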

Brainteaser: There is an obvious exception to the above rules (that is, a situation where the sequence // or the sequence /* appears, and yet what follows is not ignored by the compiler). Can you think of such a situation?

The fragment below shows some examples of comments in a C++ program:

#include <iostream>
using namespace std;

    /* Author: Carlos Moreno
       Description:
          This is just a demo program.
          Don't take it too seriously
    */

int main()
{
        // Just print a quick message:
    cout << "Done!" << endl;

    return 0;
}

In the above fragment, I used teal(ish) to indicate the new portions, the new concepts being illustrated; however, in the future (including other tutorials), I will use green for the comments. In fact, many text editors oriented to programming offer so-called syntax highlighting, in which different aspects of the program are shown in different colors (literal strings in one color, variables in another color, comments in another color, etc.), and often, green or gray is used for comments. When you think about it, syntax highlighting is a feature that derives from the notion of readability, and from the importance of readability in programs.

Very important: make good and generous use of comments, but equally important: do not write excessive comments. An excessive amount of comments makes it hard to read the actual program. Typical mistakes in this category include writing comments that say exactly the same as the statement that they're addressing:

a = 0; // Assign 0 to a
cin >> b; // Get user input into b

These comments are redundant, as they just repeat what is absolutely clear by looking at the statement. If anything, a complex statement would have to be explained (but explained, not just “spelled out”). Also, a comment that could make sense (even for a simple and obvious statement) is one that, instead of repeating what is being done (like the above examples), says why it needs to be done. For example, there may be a non-obvious reason why a needs to be assigned 0 in the above example; in such a case, a good and useful comment would be one that explains why a is being assigned 0.

Good Variable Names

This aspect can be seen as “do not speak in code” (no pun intended!). If a variable means something, then you'll want the program to “speak” to you in the right terminology; that variable should have a name that means that something. If a variable holds a price, it would be ridiculous to call it product, or a, or variable (hopefully, you are not one of those people who use the word password for their passwords; and if you are, hopefully you won't bring that truly horrible habit to the world of programming).

Perhaps a good guideline (and I'm sure this is going to sound like I'm joking) is that variable names should make comments unnecessary, by making the statement self-explanatory. If you look at the example of conversion from inches to centimeters, I think there is no room for discussion: the program is perfectly clear, and no line requires a comment to clarify what is being done or why. If, however, instead of inches and cm the variables had been called a and b, or measure1 and measure2, then comments would have been needed to explain the meaning of something mysterious like a = 2.54 * b;

More generally (this applies to variable names and to other readability aspects as well), we should aim at writing programs that are “self-commenting”: programs that are so well-written and so clear that comments are really not needed to explain what the program is doing (so, it's not that comments should be avoided; simply aim at writing a program as if you were trying to make comments unnecessary).

Indentation and Spacing

Proper layout helps distinguish the various pieces, sections and blocks of a program. In C++, spaces and even line-breaks are optional (well, with some obvious exceptions); the entire program could be written in a single line and with no spaces between statements: int main(){cout<<"Done!"<<endl;return 0;}

Even this extremely short program becomes sufficiently painful to read and understand to illustrate the point (even if the example is taken to a somewhat ridiculous extreme).

The aspect of indentation helps to identify and separate the different blocks in a program. This will become more clear in the upcoming topics (covered in the other tutorials, on conditional execution, loops, functions, etc.), but do keep in mind that it is an important feature, in the category of readability.

Constants and Constant Values

The use of named constants has a dual benefit: it can increase the robustness of a program, and it also helps with readability. Suppose that you see a program, perhaps written by someone else, containing the following fragment:

total = price + price * 0.07;
total = total + total * 0.08;

If you happen to be familiar with the local tax rates, you may immediately understand what that fragment is doing. But one should not necessarily expect all programmers to recognize those numbers. Multiplying by 0.07 and then by 0.08 might strike the reader as mysterious and puzzling. Sure, a comment next to those numbers would solve the mystery, but then again, we don't want to have to write a comment for something as trivial as this.

It would be better if we could give a name to those mysterious “magic numbers”, so that when we read them, it becomes quite obvious what the program is doing. Using variables that contain those particular values would be a possibility; however, it's not a very good one, since a variable requires storing the values in memory, and requires the processor to fetch those values from memory every time — no big deal, but it does feel wrong (and more importantly, there is a better solution).

Named constants are syntactically similar to variables, except that in general, they do not require memory storage; the compiler usually treats them as if you wrote the numbers directly. They are like variables in that they are a named piece of information: a named constant. Names for these constants follow the exact same rules as for variables; they also have a data type associated, and must be declared with a syntax that is identical to a variable declaration, except for one detail: the const qualifier (in case you have not guessed it by now: strictly speaking, named constants are variables; a more specific type of variable that has const-qualification).

The example below should illustrate all of the above (it is not a complete program; just a fragment showing the relevant details):

const double gst_rate = 0.07;
const double pst_rate = 0.08;

double gst = price * gst_rate;
double pst = price * pst_rate;

When you read the formulas, it is quite clear what the program is doing, even if we are not seeing the exact value by which we are multiplying — we know that it is the GST rate, whatever the actual number happens to be.

You could argue that in this case it should have been obvious if we do it the right way, declaring a variable to hold the GST; naming that variable gst makes it quite obvious that the “magic number” has to be the GST rate. This does not negate the general principle, in that it is still preferable to read gst_rate rather than 0.07.

There are additional reasons why the use of named constants is convenient. Suppose that a year from now (from the time you wrote the program), the government changes the tax rates. That means that you have to modify the program. If there are, say, 10 or 20 places in your program where you do tax calculations, you are going to have a lot of unnecessary work in changing all those 10 or 20 fragments of the program!

But the extra work that you could have saved is not the strongest argument; the real issue here is: what if you overlook one of them? You read through the program changing all the occurrences of 0.07 and 0.08, and you might end up changing 19 of them, because you overlooked one (trust me, it can and does happen!). If you have a named constant, all you have to do is change that constant at the point of its declaration — it is hard to make a mistake in such a simple and to-the-point maneuver, and the change will be automatically reflected in every statement that uses the named constant!

It is often stated as a good programming practice that a program should never have any “magic numbers”. That is, that a program should never have an actual number in it, other than 0, 1, or initializing a named constant. This is not to be taken literally and too strictly, but it is a good guideline. Virtually every number that we use in our programs has a meaning: we always want to read the meaning of the number, rather than the actual number.

Protecting Against Mistakes

Another use of the const qualifier is to add a non-mutable constraint to certain variables. Quite often, we use a variable to store a temporary or partial result; we want to use the content of that variable later in the program, but we know that the variable is not going to be modified. It is a good idea to const-qualify such a variable: we make the promise (to the compiler) that the variable is not going to be modified. If we accidentally write some statement that attempts to modify the variable (again I must say trust me, really trust me: this does happen!), the compiler will report it as an error, but only if we const-qualified the variable (otherwise the compiler cannot know whether we legitimately want to modify the variable or not).

As a general rule, simply add a const in front of any declaration of a variable that you know will not be modified. This, of course, requires that the variable be initialized in the declaration (otherwise the first assignment to that variable would constitute a violation of the constness of that variable — yes, I know constness is not a correct English word, but it is used in C++ to refer to “non-mutability” of a variable).

The following program illustrates the use of const-qualification in a relatively simple programming situation:

#include <iostream>
using namespace std;

int main()
{
    const double gst_rate = 0.07;
    const double pst_rate = 0.08;

    double price;

    cout << "Enter product's price: " << flush;
    cin >> price;

    const double gst = price * gst_rate;
    const double pst = (price + gst) * pst_rate;
    const double total = price + gst + pst;

    cout << "Sub-Total:\t"    << price
            << "\nGST:\t"     << gst
            << "\nPST:\t"     << pst
            << "\n\nTOTAL:\t" << total << endl;

    return 0;
}

Perhaps surprisingly, most of the time, variables get one initial value and never require modification. In the example above you notice that, of a total of four variables, three of them get an initial value and do not require further modification (and that is without counting the named constants; counting them, it is really five out of six). Actually, price follows that same rule, in a sense, but in that case, there is no way that we can get away with making that one const, since it must be modified after its declaration, given that it receives input from the user.