C++
Oddly enough, C++ is actually rather simpler than C (in at least some respects) when it comes to declarations vs. definitions, so let's start with it.
Declaration
In C++, a variable declaration looks something like extern int a; which tells the compiler that there's an int named a that it should assume it'll be able to access. The compiler can then produce code that uses a, but in the object code, references to a will have some sort of record that tells the linker that this is intended to refer to a, but doesn't define it. It's then up to the linker to find the actual definition of a and...link those other references to it. If it can't find a definition of a, you'll get an "undefined reference" error from the linker.
Definition
Pretty much anything that doesn't include the extern on it is a definition (and an extern declaration that has an initializer, like extern int a = 1; is a definition too). That is, something like int a; is a definition. This not only declares a (i.e., tells the compiler that a is a variable of type int) but also tells it to allocate space for a to be stored in. If somewhere else in the program there's an extern int a; (and this definition of a has linkage that makes it visible there), the linker will fix up that declaration to refer to this definition of a.
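To make the declaration/definition split concrete, here's a minimal sketch that squeezes both into a single file. In a real program the definition would more typically live in a different translation unit, and the function names (read_a, check_a) are made up purely for illustration.

```cpp
#include <cassert>

extern int a;            // declaration only: no storage is allocated here

int read_a() {
    return a;            // compiles because the declaration above is visible
}

int a = 42;              // definition: allocates storage and initializes it

void check_a() {
    assert(read_a() == 42);   // the use in read_a() refers to the definition above
}
```

If you delete the int a = 42; line, each function still compiles, but the link step should fail with an undefined reference to a.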
Initialization
In C++, the difference between initialization and assignment is much more profound than it is in C. For example, references can only be initialized, not assigned. Let's consider code like this:
int a;
int b = 2;
int &ra = a;
ra = b;
In this case, int a; defines the variable a. The int &ra = a; defines ra to be a reference to an int, and initializes it with a, so ra refers to a.
The ra = b; is an assignment--but unlike the initialization, it doesn't actually assign to ra itself at all. Rather, since ra is a reference, it actually assigns to the variable to which ra refers (which is to say, a).
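Here's a small sketch you could use to check that behavior. The function name reference_demo and the values 1, 2 and 3 are arbitrary choices for illustration.

```cpp
#include <cassert>

int reference_demo() {
    int a = 1;
    int b = 2;
    int &ra = a;          // initialization: ra is bound to a
    ra = b;               // assignment: copies the value of b into a
    assert(&ra == &a);    // ra still refers to a, not to b
    b = 3;                // changing b afterwards has no effect on a
    return a;             // a holds the value that was assigned through ra
}
```

The function returns 2: the assignment wrote through the reference into a, and nothing ever re-bound ra to b.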
Assignment
As noted above, the ra = b; line is an assignment. Note that the int &ra = a; line includes an = sign, but it's still not an assignment. If it's part of a definition, like: T a = b; the = signifies initialization, not assignment. You only get assignment when the = is separate from the definition.
int a = 1; // initialization
int b;
b = 1; // assignment
This becomes especially important when/if you overload operators, because it affects what operator will be called in a given situation.
class foo {
public:
foo(int); // used for something like `foo f = 1;`
foo &operator=(int); // used for things like `f = 1;` once `f` already exists
};
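As a rough sketch of how you could confirm which member gets called, here's a variant of that class with bodies filled in. The value and assigned members (and the helper function names) are hypothetical additions that exist only to record what ran.

```cpp
#include <cassert>

class foo {
public:
    foo(int v) : value(v), assigned(false) {}   // runs for `foo f = 1;`
    foo &operator=(int v) {                     // runs for `f = 2;`
        value = v;
        assigned = true;
        return *this;
    }
    int value;       // hypothetical member, for illustration only
    bool assigned;   // set only by operator=(int)
};

bool initialization_calls_constructor() {
    foo f = 1;                            // initialization
    return f.value == 1 && !f.assigned;
}

bool assignment_calls_operator() {
    foo f = 1;
    f = 2;                                // assignment
    return f.value == 2 && f.assigned;
}

void demo_foo() {
    assert(initialization_calls_constructor());
    assert(assignment_calls_operator());
}
```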
In some cases, you might have (for example) defined a constructor, but deleted the assignment operator, in which case initialization will be allowed, but assignment won't.
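A minimal sketch of that situation, assuming a made-up class named bar: the type traits let the compiler itself confirm that initialization from an int is accepted while assignment from an int is not.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class: constructible from int, but assignment from int is deleted.
class bar {
public:
    bar(int v) : value(v) {}
    bar &operator=(int) = delete;
    int value;
};

static_assert(std::is_constructible<bar, int>::value,
              "initialization from int is allowed");
static_assert(!std::is_assignable<bar &, int>::value,
              "assignment from int is rejected");

int make_bar_value() {
    bar b = 5;              // fine: initialization uses the constructor
    // b = 6;               // would not compile: operator=(int) is deleted
    assert(b.value == 5);
    return b.value;
}
```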
C
By the time C was being standardized, there were a number of compilers that didn't all work quite the same way. Because of this, the C committee had to jump through some hoops to make a single set of rules that provided compatibility with most existing code.
Locals
Let's start with the simple normal case: for a variable inside of a function, you basically have three storage classes:
int foo() {
extern int a;
static int b;
int c; // `auto int c;` is equivalent
// ...
}
In this case, the extern int a; is a declaration, saying that somewhere at global scope, somebody has defined a variable named a, and we're declaring it here so the code in this function knows about it and can refer to it.
The static int b; defines a variable with local scope (its name is only visible in this function) and static storage duration (it exists for the life of the program, rather than being re-created each time the function is executed).
The int c; of course defines a variable with local scope and auto storage duration, so it's destroyed every time you exit the scope, and created anew each time you enter the scope.
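A small sketch of the difference between the last two. Local storage durations work the same way in C++, so it's written to compile as C++ like the other sketches here. The function names and the tens/ones encoding of the result are just conveniences for illustration.

```cpp
#include <cassert>

int count_calls() {
    static int b = 0;    // static storage duration: initialized once, keeps its value
    int c = 0;           // automatic storage duration: created anew on every call
    ++b;
    ++c;
    return b * 10 + c;   // tens digit tracks b, ones digit tracks c
}

void demo_storage_duration() {
    int first = count_calls();
    int second = count_calls();
    assert(first == 11);     // b == 1, c == 1
    assert(second == 21);    // b == 2, while c started over at 0 and is 1 again
}
```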
Globals
Here's where things get really hairy. To maintain compatibility with existing code, the C committee defined a couple of concepts that were new and unique: tentative definitions and composite types. It also defined a term called "linkage", which is at least sort of like what most of us think of as scope--that is, the visibility of the name that allows the linker to link something else to this definition. As they defined the term, there are three forms of linkage: external (visible throughout the program), internal (visible only within a single translation unit) and none (only visible in the local scope).
Continuing with our theme of starting with the simplest case, we'll start with a definition that includes initialization, like: int a = 1; This is a variable definition. It declares a to be a variable of type int, allocates storage for it, and initializes it with the value 1 (and, since it's at file scope with no storage-class specifier, it has external linkage).
From here things get a little hairy though. For example, consider code like:
int a;
int a = 1;
In C++ this would be prohibited as a violation of the one definition rule. In C, however, it's allowed. The first int a; is a tentative definition. If there were no other definition of a, it would define a, allocate storage, and (since it's a global) signify that a will be initialized to 0. Since int a = 1; follows it here, that becomes the actual definition, and a is initialized to 1.
Likewise, we could have:
static int a;
static int a = 1;
A tentative definition can contain the static specifier, so this is still a tentative definition followed by a definition. Both declarations specify static, so their linkage agrees, and overall we've defined a single variable named a with static storage duration and internal linkage that's initialized to 1.
We could also have:
static int a;
extern int a;
Now, this might seem like a conflict. The static specifies internal linkage, while the extern seems to specify external linkage. In fact, it is allowed. When an earlier declaration of the name is visible, extern takes on the linkage of that earlier declaration. So in this case, the extern int a; basically acts like a declaration that refers back to the static int a; and we have one definition of a as an int with internal linkage.
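C++ accepts this same pair of declarations, with the same resulting linkage, so here's a sketch written to compile as C++ like the other sketches. One difference to keep in mind: in C++ the static int a; line is a plain definition rather than a tentative one. The helper names are invented for illustration.

```cpp
#include <cassert>

static int a;    // a has internal linkage and is zero-initialized
extern int a;    // redeclaration: an earlier declaration is visible, so a keeps internal linkage

int read_internal_a() {
    return a;
}

void write_internal_a(int v) {
    a = v;       // both declarations name the same single object
}

void demo_internal_a() {
    write_internal_a(7);
    assert(read_internal_a() == 7);
}
```

Swapping the two declarations should make the compiler reject the file, which is the case discussed next.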
If, however, we try to change that around:
extern int a;
static int a;
The situation changes completely. A tentative definition can specify static, but it can't specify extern. As such, with this ordering, the extern int a; is a plain declaration (not a definition of any kind) and, with no earlier declaration of a visible, it specifies that a has external linkage. The static int a; then attempts to specify that a has internal linkage, which creates a conflict, so the compiler will normally reject the code with an error message.
This isn't the only possible conflict either. For example:
static int a;
int a;
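...also creates a conflict: the second int a; has no storage-class specifier, so at file scope it specifies external linkage, which clashes with the internal linkage from the static. We'd expect the code to be rejected with an error message.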
Assignment vs. initialization
In C, there are no references, and no user-defined operator overloading, so the difference between initialization and assignment isn't important as often as it is in C++. There are a few cases, especially initialization of arrays, however, where we still need to keep track of the difference:
int a[] = { 1, 2, 3, 4}; // allowed--initializes `a`
int a[4]; // no problem, but `a` isn't initialized
a = { 1, 2, 3, 4}; // Error!
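If you do need to fill an array after it has been defined, you copy the elements rather than assign the array. A minimal sketch, written as C++ like the other sketches (in C the same call is spelled memcpy from <string.h>); the names values and sum_after_copy are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

int sum_after_copy() {
    int a[4];                                // defined, but not initialized
    const int values[4] = { 1, 2, 3, 4 };    // braces are fine here: this is initialization
    // a = values;                           // error: arrays are not assignable
    std::memcpy(a, values, sizeof a);        // copy the elements instead
    assert(a[0] == 1 && a[3] == 4);
    return a[0] + a[1] + a[2] + a[3];
}
```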