10

How important is it to initialize variables?

Does proper initializing avoid memory leaks or have performance advantages?

Martijn Pieters
  • 14,499
  • 10
  • 57
  • 58
Vivek
  • 135
  • 1
  • 1
  • 6
  • 18
    It depends on the language. In some languages it's pretty important to prevent bugs, in the rest it's merely a good thing to do to help readability. – Telastyn Jan 11 '14 at 14:54
  • Thanks Telastyn for your input. Can you put a case where it becomes important depending up on the language ? – Vivek Jan 11 '14 at 15:19
  • 4
    C++ is the notorious one here. In debug, local variables are initialized to 0 (or `null`) by the common compilers, but are random garbage when compiling for release. (though my C++ knowledge is from ~10 years ago now, things may have changed) – Telastyn Jan 11 '14 at 15:38
  • It's a case of once-burned-twice-shy. Since I've seen/had bugs caused by uninitialized variables, especially pointers, it's become a habit. For performance, it's usually irrelevant. For memory leaks, not really an issue. – Mike Dunlavey Sep 26 '14 at 14:27
  • 3
    @Telastyn it's worse than that. Undefined behaviour is not limited to garbage state, anything can happen. The compiler can assume that paths that read uninitialised variables are unreachable, and eliminate "unrelated" effects that occur along the way. – Caleth Sep 18 '18 at 10:48
  • 3
    Bugs can also be created by initializing variables too early. You may not realize your expected value isn't set until you step through your code line by line. Had you left out the extra initialization, many compilers could have pointed out the missing assignment. – Matthew Whited Sep 19 '18 at 03:57

9 Answers

11

Trying to use an uninitialized variable is always a bug, so it makes sense to minimize the probability of that bug occurring.

Probably the most common approach programming languages take to mitigate the problem is to automatically initialize to a default value, so at least if you forget to initialize a variable, it will be something like 0 instead of something like 0x16615c4b.

This solves a large percentage of bugs, if you happen to need a variable initialized to zero anyway. However, using a variable that was initialized to an incorrect value is just as bad as using one that wasn't initialized at all. In fact, sometimes it can be even worse, because the error can be more subtle and difficult to detect.

Functional programming languages solve this problem by not only disallowing uninitialized values, but by disallowing reassignment altogether. That eliminates the problem and turns out to not be as severe a restriction as you might think. Even in non-functional languages, if you wait to declare a variable until you have a correct value to initialize it with, your code tends to be much more robust.
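In a Java-like language, "waiting until you have a correct value" often means initializing exactly once at the point of declaration. A minimal sketch (the method and class names are invented for illustration):

```java
public class InitLate {
    // Compute the value first, then declare and initialize the variable
    // exactly once, marked final so it can never be reassigned or read
    // before it holds a meaningful value.
    static String grade(int score) {
        final String label = (score >= 50) ? "pass" : "fail";
        return label;
    }

    public static void main(String[] args) {
        System.out.println(grade(75)); // pass
        System.out.println(grade(10)); // fail
    }
}
```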

As far as performance goes, it's probably negligible. At worst with uninitialized variables, you have one extra assignment, and tie up some memory for longer than necessary. Good compilers can optimize the differences out in a lot of cases.

Memory leaks are completely unrelated, although properly-initialized variables tend to be in scope for a shorter period of time, and therefore might be somewhat less likely for a programmer to accidentally leak.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
  • Always? You mean that "always" as in "How a fixed Valgrind message rendered OpenSSL next to useless" http://marc.info/?t=114651088900003&r=1&w=2? Or do you mean the other, the "almost always" one? – JensG Jan 12 '14 at 01:40
  • 2
    I can think of three languages that allow uninitialized variables without error, one of which uses such for a linguistic purpose. – DougM Jan 12 '14 at 14:52
  • I would be interested in the specifics. I would suspect in those cases the variables are not truly uninitialized, but are initialized in a way other than directly by the programmer at the declaration site. Or they are assigned by some indirect means before being dereferenced. – Karl Bielefeldt Jan 12 '14 at 17:37
  • Using an uninitialized variable can be intentional. We call uninitialized memory garbage because that's what it is. You can't even rely on it to be random. But you can rely on it to be garbage. Which means it's exactly what you want if you're trying to scan memory for leaked secrets like passwords. Sorry but "always a bug" is an overreach. – candied_orange Aug 16 '21 at 16:35
9

Uninitialized variables make a program non-deterministic. Each time the program runs, it may behave differently. Unrelated changes to the operating environment, the time of day, the phase of the moon, and permutations of such affect how and when these demons manifest. The program may run a million times before the defect presents itself, then may do it every time, or run another million. Many problems are put down to "glitches" and ignored, or defect reports from customers are closed as "unreproducible". How often have you rebooted a machine to 'fix' a problem? How often have you said to a customer "Never seen that happen, let me know if you see it again" - hoping (knowing full well) they won't!

As reproduction of a defect can be next to impossible in the test environment, it's next to impossible to find and fix.

It can take years for the bug to surface, commonly in code thought to be reliable and stable. The defect is presumed to be in more recent code - tracking it down can take significantly longer. A change of compiler, a compiler switch, even adding a line of code can change the behavior.

Initializing variables has a huge performance advantage, not only because a program that works correctly is infinitely faster than one that does not, but because the developers spend less time finding and fixing defects that should not be there and more time doing "real" work.

The other significant advantage of initializing variables is that the original author of the code must decide what to initialize them to. This is not always a trivial exercise, and when it is not trivial, that can be an indication of a poor design.

Memory leaks are a different problem, but proper initialization can not only assist in preventing them, it can also help in detecting them and finding the source. It's highly language-dependent, and that's really a separate question worthy of more exploration than I am able to give in this answer.

Edit: In some languages (e.g. C#) it is not possible to use uninitialized variables: the program will not compile, or will report an error when executed. However, many languages with these characteristics have interfaces to potentially unsafe code, so care must be taken when using such interfaces not to introduce uninitialized variables.

mattnz
  • 21,315
  • 5
  • 54
  • 83
  • 10
    Many programming languages automatically set their variables to some predefined value, so much of what you say here is not applicable to those languages. – Robert Harvey Jan 13 '14 at 17:15
  • 6
    Just to reiterate what @RobertHarvey said, *none* of this is applicable to C#. There's no performance advantage to initializing your variables when you declare them, and it's impossible to use an uninitialized variable, so you can't blame unreproducible bugs on that. (It *is* possible to use an uninitialized class field, but it gets set to a default value and generates a warning in that case) – Bobson Jan 13 '14 at 20:40
  • @Robert: As I understand it, if the language sets the variables to a predefined value, in the context of a language-agnostic question, they are considered initialised to that value. – mattnz Jan 13 '14 at 21:02
  • Yes, that's right. – Robert Harvey Jan 13 '14 at 21:03
  • @Bobson: You are correct, but the question does not mention C#, so what's your point? Also, C# can get uninitialised values through native calls passing them in. – mattnz Jan 13 '14 at 21:04
  • 8
    @mattnz - The point is that for languages which behave like C# (or Java), some of this advice is misleading or outright wrong. As a language agnostic question, it should have a language agnostic response, which means addressing languages which *do* handle initialized variables safely as well as those that don't. – Bobson Jan 13 '14 at 21:44
  • 1
    I'd also add that uninitialized variable issues are not hard to find, as any half-decent compiler/static analyser will warn about them – jk. Jan 14 '14 at 13:16
  • 3
    For Java (and C# presumably) prematurely initializing locals is unnecessary and arguably leads to more bugs. For example, setting a variable to null prior to assigning it conditionally defeats the compiler's ability to tell you that one of the paths through the code may not result in the variable being assigned. – JimmyJames Sep 18 '18 at 14:29
  • @Bobson: Consistently, certainly. Safely? That only comes depending on lots of other surrounding guarantees, for a certain definition of safe. – Deduplicator Aug 16 '21 at 13:43
7

Initializing a variable, as Telastyn pointed out, can prevent bugs. If the variable is a reference type, initializing it can prevent null reference errors down the line.

A variable of any type that has a non-null default will take up some memory to store the default value.
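A minimal Java sketch of both points above (class and field names are invented for illustration): a reference field initialized at declaration can never be null at the point of use, at the cost of the memory for its default value.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSet {
    // Initialized at declaration: this field is never null, so no
    // NullPointerException is possible below - but the (empty) default
    // list does occupy some memory from construction onward.
    private final List<String> tags = new ArrayList<>();

    void add(String tag) {
        tags.add(tag); // safe: no null check needed
    }

    int count() {
        return tags.size();
    }

    public static void main(String[] args) {
        TagSet t = new TagSet();
        t.add("first");
        System.out.println(t.count()); // 1
    }
}
```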

Kevin
  • 798
  • 3
  • 7
7

Initializing implies that the initial value matters. If the initial value matters, then yes, clearly you must make sure it is initialized. If it doesn't matter, that implies that it will get initialized later.

Unnecessary initialization causes wasted CPU cycles. While these wasted cycles might not matter in certain programs, in other programs, every single cycle is important as speed is of primary concern. So it's very important to understand what one's performance goals are and if variables need to be initialized or not.

Memory leaks are a completely different issue which typically involve a memory allocator function to issue and later recycle blocks of memory. Think of a post office. You go and ask for a mailbox. They give you one. You ask for another one. They give you another one. The rule is that when you are done using a mail box that you need to give it back. If you forget to give it back they still think you have it, and the box can't be re-used by anyone else. So there is a chunk of memory tied up and not being used, and this is what is referred to as a memory leak. If you keep asking for boxes at some point you will run out of memory. I've oversimplified this, but this is the basic idea.
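The mailbox analogy can be sketched in Java, where the "post office ledger" is a collection that keeps objects reachable (all names here are made up for illustration; in a garbage-collected language a leak means an unwanted reference, not a missing `free`):

```java
import java.util.ArrayList;
import java.util.List;

public class MailboxLeak {
    // Plays the role of the post office's ledger: as long as a box is
    // listed here, the garbage collector considers it "in use".
    static final List<byte[]> handedOut = new ArrayList<>();

    static byte[] rentBox() {
        byte[] box = new byte[1024];
        handedOut.add(box);    // the ledger records the box
        return box;
    }

    static void returnBox(byte[] box) {
        handedOut.remove(box); // forgetting this call is the leak:
                               // the box stays reachable forever
    }

    public static void main(String[] args) {
        byte[] b = rentBox();
        // ... use the box ...
        returnBox(b);          // give it back, or it can never be reclaimed
        System.out.println(handedOut.size()); // 0
    }
}
```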

Elliptical view
  • 205
  • 1
  • 7
  • -1 you are redefining what initialization means in this context. – Pieter B Sep 18 '18 at 10:47
  • @Pieter B, I don't understand your comment. Please, if you will, say how I am, "redefining what initialization means in this context". Thank you – Elliptical view Sep 18 '18 at 14:23
  • Read you own sentence, it's circular reasoning: "Initializing, implies that the initial value matters. If the initial value matters, then yes, clearly you must make sure it is initialized. If it doesn't matter, that implies that it will get initialized later." – Pieter B Sep 18 '18 at 14:43
  • @Pieter B, Some people initialize as a general rule rather than for a programmatic reason, i.e. they initialize whether the initial value matters or not. Isn't this the heart of the OQ: how important is it to initialize a variable? Anyway, you've been outvoted here. – Elliptical view Sep 19 '18 at 14:30
3

As others said, it depends on the language. But I'll demonstrate my Java (and Effective Java) ideas about initializing variables. These should be applicable to many other higher-level languages.

Constants and class variables

Class variables - marked with `static` in Java - are like constants. These variables should normally be `final` and initialized directly at the definition using `=`, or from within a static initializer block: `static { /* initialize here */ }`.

Fields

As in many higher-level and scripting languages, fields will automatically be assigned a default value. For numbers and `char` this will be the zero value. For `String`s and other objects it will be `null`. Now `null` is dangerous and should be used sparingly, so these fields should be set to a valid value as soon as possible. The constructor is normally the perfect place for this. To make sure that the variables are set in the constructor, and not changed afterwards, you can mark them with the `final` keyword.

Try to resist the urge to use `null` as some kind of flag or special value. It is better to, for example, include a specific field to hold state. A field with the name `state` which uses the values of a `State` enumeration would be a good choice.
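A possible sketch of that suggestion (the `Connection` class is invented for illustration): the field always holds a valid enum constant, so the "not ready yet" case is explicit rather than encoded as `null`.

```java
public class Connection {
    enum State { NEW, OPEN, CLOSED }

    // Initialized at declaration: never null, and the meaning of each
    // state is explicit instead of being encoded as a null reference.
    private State state = State.NEW;

    void open()  { state = State.OPEN; }
    void close() { state = State.CLOSED; }

    boolean isOpen() { return state == State.OPEN; }

    public static void main(String[] args) {
        Connection c = new Connection();
        c.open();
        System.out.println(c.isOpen()); // true
    }
}
```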

Method parameters

Because changes to the values of parameters (be they references to objects or basic types like integers) will not be seen by the caller, parameters should be marked `final`. This means that the value of the parameter variable itself cannot be changed. Note that the contents of a mutable object instance can still be changed; the reference just cannot be changed to point to a different object or to `null`.

Local variables

Local variables are not automatically initialized; they need to be initialized before their value can be used. One way to make sure that your variables are initialized is to initialize them to some kind of default value directly. This is, however, something you should not do: most of the time the default value is not the value you would expect.

It is much better to define the variable precisely where you need it. If the variable is only to take a single value (which is true for most variables in good code), then you can mark the variable `final`. This makes sure that the local variable is assigned exactly once, not zero times or two times. An example:

public static int doMethod(final int x) {
    final int y; // no assignment yet; being final, it *must* be assigned before use
    if (x < 0) {
        y = 0;
    } else if (x > 0) {
        y = x;
    } else {
        // do nothing <- compile error below: y may not be assigned when x == 0
        // throwing an exception here instead would be acceptable
    }
    return y; // error: variable y might not have been initialized
}

Note that many languages will warn you if a variable remains uninitialized before use. Check the language specification and your compiler's documentation - you may find you are worrying needlessly.

Maarten Bodewes
  • 337
  • 2
  • 14
1

There is no problem with leaving variables uninitialized.

The problem is only when you read a variable that has not been written yet.

Depending on the compiler and/or on the kind of variable, initialization is performed at application startup. Or not.

It is common practice not to rely on automatic initialization.

mouviciel
  • 15,473
  • 1
  • 37
  • 64
0

Initializing variables (implicitly or explicitly) is crucial. Not initializing a variable is always an error (though it might be initialized implicitly; see below). Modern compilers like the C# compiler (as an example) treat this as an error and won't let you execute the code. An uninitialized variable is simply useless and harmful. Unless you are creating a random number generator, you expect a piece of code to produce a deterministic and reproducible result. This can only be achieved if you start working with initialized variables.

The really interesting question is whether a variable is initialized automatically or whether you have to do it manually. It depends on the language used. In C#, for instance, fields, i.e. "variables" at the class level, are always automatically initialized to the default value for that variable's type, `default(T)`. This value corresponds to a bit pattern consisting of all zeroes. This is part of the language specification, not just a technical detail of the implementation of the language, so you can safely rely on it. It is safe not to initialize a variable explicitly if (and only if) the language specification states that it is initialized implicitly. If you want another value, you must initialize the variable explicitly. However, in C#, local variables, i.e. variables declared in methods, are not initialized automatically, and you must always initialize them explicitly.

0

As @MatthewWhited said in a comment, it can be dangerous to zero (or default) initialize your variables.

In C, good compilers can notice when you use a variable uninitialized, and will complain. Even if the variable is declared in a translation unit and used in a different translation unit, recent versions of GCC will be able to detect such mistakes with the help of -fanalyzer.

So, with the right tools, it is really difficult to have a bug caused by using an uninitialized variable.

But if you initialize it manually with garbage (and here I'll consider 0 as garbage, since it's a value that probably isn't meaningful), the compiler won't be able to detect such errors, and instead you'll have a program that compiles, but that may have subtle bugs that won't be noticed until it's too late.

Never initialize a variable if that's not meaningful. Normally, don't initialize a variable (C89 style is a good thing here). And if you have to, make sure you handle the case when it's not assigned again after that.
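The same hazard exists in languages with definite-assignment analysis. A hedged sketch in Java (method names invented for illustration): initializing a local to a placeholder silences the compile-time check that would have caught the missing branch.

```java
public class PrematureInit {
    // Placeholder initialization: compiles cleanly, but hides the fact
    // that one path through the code never assigns a meaningful value.
    // Calling risky(0) throws a NullPointerException at runtime.
    static String risky(int x) {
        String msg = null;        // "just to initialize" - the mistake
        if (x > 0) {
            msg = "positive";
        }
        return msg.toUpperCase(); // NPE waits here when x <= 0
    }

    // No placeholder: the compiler refuses to build until every path
    // assigns msg, so the missing branch is caught before running.
    static String safe(int x) {
        String msg;
        if (x > 0) {
            msg = "positive";
        } else {
            msg = "non-positive"; // the compiler forced us to cover this
        }
        return msg.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(safe(1));  // POSITIVE
        System.out.println(safe(-1)); // NON-POSITIVE
    }
}
```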

0

tl;dr The common advice is to always initialize, but this is sometimes unproductive or even harmful.


It is commonly recommended to "always initialize" (even to a zero/default value that would never be used), but that is not always productive. Ultimately, it is a tradeoff, and you need to make the decision according to the specific use case.

  • Pros of "always initialize": It makes the program deterministic, including incorrect behaviour. Bugs with non-deterministic symptoms are difficult to track down.

  • Cons of "always initialize": It has the potential to hide bugs. Accessing a non-initialized value can be detected by many debugging tools, such as compilers, static analysers or sanitizers. Blindly initializing to a default value (typically zero) will prevent these tools from detecting programming errors, but it will not eliminate those errors.

In my view, the choice depends on the application at hand. For example:

  1. If continued and predictable program operation under all conditions is important, then initialize.

    Catastrophic program failure such as a segmentation fault may have serious consequences in some applications. You don't want your drone falling out of the sky or an attacker gaining access to a locked down system because of a program crash.

    Related concept: Defensive programming.

  2. If producing correct results is important, then do not blindly initialize, as doing so makes it harder to find all bugs.

    If you are writing scientific software, a crash does not usually have serious consequences, but silently returning wrong results does. As a scientist, I would always prefer to use a software that crashes (i.e. fails fast) when things go wrong instead of one that gives me incorrect results without any hint that something may have gone wrong, possibly costing me many days in work later or even a flawed publication.

I expect that most software engineers do work which falls into the first category, but it's good to be aware that some applications have different priorities. Scientific programming is sufficiently different that standard advice often doesn't apply—and not only when discussing initialization or debugging.

Szabolcs
  • 101
  • 3