20

Usually in C, we have to tell the compiler the type of data in a variable declaration. For example, in the following program I want to print the sum of two floating-point numbers X and Y.

#include <stdio.h>

int main(void)
{
  float X = 5.2;
  float Y = 5.1;
  float Z;

  Z = Y + X;
  printf("%f\n", Z);

  return 0;
}

I had to tell the compiler the type of variable X.

  • Can't the compiler determine the type of X on its own?

Yes, it can if I do this:

#define X 5.2

I can now write my program without telling the compiler the type of X as:

#include <stdio.h>
#define X 5.2

int main(void)
{
  float Y = 5.1;
  float Z;

  Z = Y + X;
  printf("%f\n", Z);

  return 0;
}

So it seems that the C language has some kind of feature by which it can determine the type of data on its own. In my case it determined that X is of type float.

  • Why do we have to mention the type of data when we declare something in main()? Why can't the compiler determine the data type of a variable on its own in main(), as it seems to do with #define?
user106313
  • 14
    Actually those two programs are not equivalent, in that they may give slightly different outputs! `5.2` is a `double`, so the first program rounds the double literals to `float` precision and then adds them as floats, while the second rounds 5.1 to `float`, converts it back to `double`, adds it to the `double` value 5.2 using `double` addition, *then* rounds the result of that calculation to `float` precision. Because the rounding occurs in different places, the results may differ (a sketch of the two variants follows after these comments). This is just one example of the types of variables affecting the behavior of an otherwise identical program. –  Dec 06 '14 at 11:09
  • 12
    When you do `#define X 5.2`, `X` is not a variable but a constant, so it is literally replaced by the preprocessor with `5.2` everywhere you mention `X`. You cannot reassign `X`. – scriptin Dec 06 '14 at 11:19
  • 16
    Just as a note: this is a blessing and a curse. On one hand, you have to type a few extra characters when the compiler really could have done it for you (C++'s `auto` actually does what you want). On the other hand, if you think you know what your code is doing, and you actually typed something else, static typing like this will catch the error earlier, before it becomes a huge problem. Every language strikes a balance: static typing, type inference, dynamic typing. For some tasks, the extra typing is actually worth it. For others, it's a waste. – Cort Ammon Dec 06 '14 at 16:57
  • Learn OCaml and/or Haskell.... you'll be happy about their type inference abilities. – Basile Starynkevitch Jan 10 '15 at 09:07
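
The first comment's point about where the rounding happens can be made concrete. Below is a minimal sketch of the two variants; whether the two printed results actually differ depends on the particular values and the platform's floating-point behavior, but the rounding steps really are different.

#include <stdio.h>

int main(void)
{
  /* Variant 1: both literals are rounded to float first,
     then added with float addition. */
  float x1 = 5.2;
  float y1 = 5.1;
  float z1 = y1 + x1;

  /* Variant 2: the macro leaves 5.2 as a double, so y2 is converted
     back to double, the addition is done in double, and only the
     final result is rounded to float. */
  float y2 = 5.1;
  float z2 = y2 + 5.2;

  printf("%.9f\n", z1);
  printf("%.9f\n", z2);

  return 0;
}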

4 Answers

47

You are comparing variable declarations to #defines, which is incorrect. With a #define, you create a mapping between an identifier and a snippet of source code. The C preprocessor will then literally substitute any occurrences of that identifier with the provided snippet. Writing

#define FOO 40 + 2
int foos = FOO + FOO * FOO;

ends up being the same thing to the compiler as writing

int foos = 40 + 2 + 40 + 2 * 40 + 2;

Think of it as automated copy&paste.
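
Because the substitution is purely textual, operator precedence can produce surprises, which is why macro bodies are conventionally parenthesized. A small sketch (FOO is the macro from above; BAR is an illustrative name added here):

#include <stdio.h>

#define FOO  40 + 2      /* unparenthesized: substitution can break precedence */
#define BAR (40 + 2)     /* parenthesized: behaves like the single value 42    */

int main(void)
{
  int a = FOO * 2;       /* expands to 40 + 2 * 2   ->  44 */
  int b = BAR * 2;       /* expands to (40 + 2) * 2 ->  84 */
  printf("%d %d\n", a, b);
  return 0;
}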

Also, normal variables can be reassigned, while a macro created with #define can not (although you can re-#define it). The expression FOO = 7 would be a compiler error, since we can't assign to “rvalues”: 40 + 2 = 7 is illegal.
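
A tiny standalone sketch of that point; the commented-out line would not compile, because the expanded text is not something you can assign to:

#define FOO 40 + 2

int main(void)
{
  int x = FOO;    /* fine: becomes  int x = 40 + 2;                   */
  /* FOO = 7; */  /* would become  40 + 2 = 7;  and fail to compile,
                     since you cannot assign to an rvalue             */
  return x;
}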

So, why do we need types at all? Some languages apparently get rid of types, this is especially common in scripting languages. However, they usually have something called “dynamic typing” where variables don't have fixed types, but values have. While this is far more flexible, it's also less performant. C likes performance, so it has a very simple and efficient concept of variables:

There's a stretch of memory called the “stack”. Each local variable corresponds to an area on the stack. Now the question is how many bytes long does this area have to be? In C, each type has a well-defined size which you can query via sizeof(type). The compiler needs to know the type of each variable so that it can reserve the correct amount of space on the stack.
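
You can see those well-defined sizes with sizeof. The exact numbers are implementation-defined, so the values in the comments are only what is common on mainstream desktop platforms today:

#include <stdio.h>

int main(void)
{
  printf("char:   %zu\n", sizeof(char));    /* always 1, by definition */
  printf("int:    %zu\n", sizeof(int));     /* commonly 4              */
  printf("float:  %zu\n", sizeof(float));   /* commonly 4              */
  printf("double: %zu\n", sizeof(double));  /* commonly 8              */
  return 0;
}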

Why don't constants created with #define need a type annotation? They are not stored on the stack. Instead, #define creates reusable snippets of source code in a slightly more maintainable manner than copy&paste. Literals in the source code such as "foo" or 42.87 are stored by the compiler either inline as special instructions, or in a separate data section of the resulting binary.

However, literals do have types. A string literal is an array of char (which usually decays to a char *). 42 is an int but can also be used for shorter types (narrowing conversion). 42.8 would be a double. If you have a literal and want it to have a different type (e.g. to make 42.8 a float, or 42 an unsigned long int), then you can use suffixes – a letter after the literal that changes how the compiler treats that literal. In our case, we might say 42.8f or 42ul.
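
A quick sketch of how a literal's own type shapes an expression, and how suffixes change it (the sizeof values are implementation-defined, as above):

#include <stdio.h>

int main(void)
{
  /* The literal's type decides how an expression is evaluated. */
  printf("%d\n", 7 / 2);            /* int / int    -> 3        */
  printf("%f\n", 7.0 / 2);          /* double / int -> 3.5      */

  /* Suffixes change the literal's type. */
  printf("%zu\n", sizeof(42.8));    /* double literal           */
  printf("%zu\n", sizeof(42.8f));   /* float literal, f suffix  */
  printf("%zu\n", sizeof(42ul));    /* unsigned long, ul suffix */
  printf("%zu\n", sizeof("foo"));   /* char[4], incl. the '\0'  */
  return 0;
}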

Some languages have static typing as in C, but the type annotations are optional. Examples are ML, Haskell, Scala, C#, C++11, and Go. How does that work? Magic? No, this is called “type inference”. In C# and Go, the compiler looks at the right hand side of an assignment, and deduces the type of that. This is fairly straightforward if the right hand side is a literal such as 42ul. Then it's obvious what the type of the variable should be. Other languages also have more complex algorithms that take into account how a variable is used. E.g. if you do x/2, then x can't be a string but must have some numeric type.
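
As an aside, standard C did eventually gain a limited form of this: C23 allows `auto` in a definition with an initializer, and the type is deduced from the initializer. This is only a sketch and needs a C23-capable compiler (for example a recent GCC or Clang with the appropriate standard flag):

/* Requires a C23 compiler; older C standards reject this use of auto. */
#include <stdio.h>

int main(void)
{
  auto x = 5.2;    /* deduced as double, because 5.2 is a double literal */
  auto y = 5.1f;   /* deduced as float, thanks to the f suffix           */
  printf("%zu %zu\n", sizeof x, sizeof y);   /* typically prints "8 4"   */
  return 0;
}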

amon
  • Thank you for explaining. The thing I understand is that when we declare the type of a variable (local or global), we are actually telling the compiler how much space it should reserve for that variable on the stack. On the other hand, with `#define` we have a constant which is directly converted into binary code--however long it may be--and stored in memory as it is. – user106313 Dec 06 '14 at 13:59
  • 2
    @user31782 - not quite. When you declare a variable the type tells the compiler what properties the variable has. One of those properties is size; other properties include how it represents values and what operations can be performed on those values. – Pete Becker Dec 06 '14 at 17:05
  • @PeteBecker Then how does the compiler know these other properties in `#define X 5.2`? – user106313 Dec 06 '14 at 17:10
  • @user31782 - `5.2` is a floating-point literal of type double, so it has the properties of a double. – Pete Becker Dec 06 '14 at 17:12
  • If the compiler can know on its own that 5.2 has the properties of a double, then why can't it do the same in a variable declaration? – user106313 Dec 06 '14 at 17:13
  • @user31782 Please think of `#define` as a copy&paste mechanism, and nothing else. This copy&pasting is performed before the source code reaches the actual compiler that knows about syntax and types etc. I tried to explain in my post how the C compiler figures out the type of literals, and in fact it would be trivial to put simple type inference into C – the necessary work is already being done by the type checker. But type inference was cutting-edge technology in academia when C was designed, and since then no one seems to have fought hard enough to include it in a later standard. – amon Dec 06 '14 at 17:29
  • amon, so a `#define` statement is equivalent to writing a literal in the code. The C compiler automatically assigns types to literals: int to simple integers, double to floating point numbers, and the ASCII value to characters. This is fine, but I do not think that the _type_ tells the compiler anything more than just allocating the required space. E.g. `char c=15; int d=c/3;`: this should be illegal (because characters cannot be divided), but the program runs well. I do not understand @PeteBecker's comment. – user106313 Dec 06 '14 at 18:20
  • As far as I know, in C and Java (possibly C++ too) `char`s and `int`s are interoperable. So, if you do something like `char a = 'a'; char b = a + 1;`, `b` will store `'b'`, because you incremented the ASCII value once. – Soham Chowdhury Dec 06 '14 at 18:35
  • Ok, there is something more than just allocating space. `float h; h='A'; printf("%d",h)` doesn't work. Perhaps I get PeteBecker's point. – user106313 Dec 06 '14 at 18:53
  • @user31782 - `char` is an integral type, just like `short`, `int`, and `long`. Not to mention the abomination of `long long`. They all support arithmetic operations. So `int d = c/3;` is legal; it divides the numeric value stored in `c` (i.e., 15) by 3 and stores the result into `d`. – Pete Becker Dec 06 '14 at 18:53
  • @PeteBecker What's wrong with this: `float h; h='A'; printf("%d",h)`? 'A' is 65, so `h` should store 65. When I print 'h' it gives me 0--if I print it as a decimal. It gives the correct value when I print it as float, 65.00000 – user106313 Dec 06 '14 at 18:58
  • @PeteBecker You said,"_other properties include how it represents values and what operations can be performed on those values._". I get the first part, e.g. int h=5.5; will become h=5, i.e. specifying int makes the data completely integer. But I do not get the second part "_and what operations can be performed on those values_". It seems like I can perform any operation on any data type. – user106313 Dec 06 '14 at 19:05
  • @user31782, `What's wrong with this: float h; h='A'; printf("%d",h)` Now you are mixing in the runtime behavior of a library function. By specifying the '%d' in the format specifier you are directing the printf function to treat the variable `h` as an integer. Note that this does not mean "convert h to an integer"! It means "interpret the bits stored in h as an integer, regardless of its actual type." On my computer your printf call prints out `1405020384`. I'm suspicious of your seeing `0` in the output. Are you running an Intel CPU? You may have a typo in your program. – Charles E. Grant Dec 06 '14 at 19:27
  • @user31782 (cont) Contrast this with `printf("%d\n",(int) h);` Here the cast directs the compiler to force a conversion to an integer before passing the result to printf. This prints '65' as you expect (there is a short sketch of both cases after this comment thread). – Charles E. Grant Dec 06 '14 at 19:28
  • @CharlesE.Grant Yes I get 0, no typo there. This is the code that I ran (copy-paste): `#include<stdio.h> main() { float h; h='A'; printf("%d",h); }`. I get 0 as the output. I am using an Intel Core 2 Duo, Windows 7. – user106313 Dec 06 '14 at 19:29
  • 1
    That's because by passing the wrong type to `printf` you invoked undefined behavior. On my machine that snippet prints a different value each time, on [Ideone](http://ideone.com/Jdxvd7) it crashes after printing zero. – Matteo Italia Dec 06 '14 at 19:49
  • 4
    @user31782 - "It seems like I can perform any operation on any data type" No. `X*Y` is not valid if `X` and `Y` are pointers, but it's okay if they're `int`s; `*X` isn't valid if `X` is an `int`, but it's okay if it's a pointer. – Pete Becker Dec 06 '14 at 19:50
  • It might be worth pointing out that in the case of `#define`, the compiler never sees `X`. I think that's the thing that the OP is failing to grasp. – Bryan Oakley Dec 07 '14 at 02:48
  • @BryanOakley I understand that `#define` acts as a copy/paste mechanism. It works before the compilation phase. It replaces `X` with `5.2` in the source code itself. – user106313 Dec 07 '14 at 06:04
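
To make the last few comments concrete, here is a small sketch of the defined and undefined variants. The key point is that `%d` does not convert anything; it only tells printf how to interpret what it was given.

#include <stdio.h>

int main(void)
{
  float h = 'A';               /* 'A' is the int 65, converted to 65.0f      */

  printf("%f\n", h);           /* OK: matching conversion, prints 65.000000  */
  printf("%d\n", (int)h);      /* OK: the cast converts the value, prints 65 */

  /* printf("%d\n", h);           undefined behavior: %d expects an int, but
     a float argument is promoted to double, so printf reads the wrong bits
     (or even the wrong register) and may print anything.                    */

  return 0;
}
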
4

X in the second example is never a float. It is a macro: the preprocessor replaces every occurrence of 'X' in the source with the defined value. A readable article on #define is here.

In the case of the supplied code, before compilation the preprocessor changes the code

Z=Y+X;

to

Z=Y+5.2;

and that is what gets compiled.

That means you can also replace those 'values' with code like

#define X sqrt(Y)

or even

#define X Y
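
To see exactly what the compiler proper receives, you can run just the preprocessor (with GCC or Clang, the -E option prints the translation unit after substitution). Below is a small sketch of the `#define X sqrt(Y)` case; on some systems it needs the math library linked in (-lm):

#include <math.h>
#include <stdio.h>

#define X sqrt(Y)        /* a macro body can be an arbitrary code snippet */

int main(void)
{
  double Y = 5.1;
  double Z = 1.0 + X;    /* the preprocessor rewrites this as: 1.0 + sqrt(Y) */
  printf("%f\n", Z);     /* roughly 3.258318                                 */
  return 0;
}
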
James Snell
1

The short answer is that C needs types because of history and because it closely represents the hardware.

History: C was developed in the early 1970s and intended as a language for systems programming. Code is ideally fast and makes the best use of the capabilities of the hardware.

Inferring types at compile time would have been possible, but the already slow compile times would have increased (refer to XKCD's 'compiling' cartoon; this used to apply to 'hello world' for at least 10 years after C was published). Inferring types at runtime would not have fitted the aims of systems programming. Runtime inference requires an additional runtime library. C came long before the first PC, which had 256 KB of RAM. Not gigabytes or megabytes, but kilobytes.

In your example, if you omit the types

   X=5.2;
   Y=5.1;

   Z=Y+X;

Then the compiler could have happily worked out that X & Y are floats and made Z the same. In fact, a modern compiler would also work out that X & Y aren't needed and just set Z to 10.3.

Assume that the calculation is embedded inside a function. The function writer might want to use their knowledge of the hardware, or the problem being solved.

Would a double be more appropriate than a float? It takes more memory and is slower, but the accuracy of the result would be higher.

Maybe the return value of the function could be int (or long) because the decimals were not important, although conversion from float to int is not without cost.

The return value could also be made a double, guaranteeing that float + float does not overflow (see the sketch below).

All of these questions seem pointless for the vast majority of code written today, but were vital when C was produced.
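
To illustrate that overflow point, here is a minimal sketch using FLT_MAX, the largest finite float value; the exact output is platform-dependent, but the idea carries:

#include <float.h>
#include <stdio.h>

int main(void)
{
  float  f = FLT_MAX + FLT_MAX;           /* exceeds float's range: inf   */
  double d = (double)FLT_MAX + FLT_MAX;   /* fits comfortably in a double */

  printf("as float:  %g\n", f);           /* typically prints "inf"       */
  printf("as double: %g\n", d);           /* around 6.8e+38               */

  return 0;
}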

itj
  • 1
    this doesn't explain, e.g., why type declarations weren't made optional, allowing the programmer to either declare them explicitly or rely on the compiler to infer them – gnat Jan 07 '15 at 19:37
  • 1
    Indeed it doesn't @gnat. I have tweaked the text but there was no point in so doing at the time. The domain C was designed for actually wanted to decide to store 17 in 1 byte, or 2 bytes or 4 bytes or as a string or as 5 bits within a word. – itj Jan 09 '15 at 19:25
0

C doesn't have type inference (that's what it is called when a compiler guesses the type of a variable for you) because it is old. It was developed in the early 1970s.

Many newer languages have systems that allow you to use variables without specifying their type (Ruby, JavaScript, Python, etc.).

  • 12
    None of the languages you mentioned (Ruby, JS, Python) have type inference as a language feature, although implementations may use it to increase efficiency. Instead, they use *dynamic typing* where values have types, but variables or other expressions don't. – amon Dec 06 '14 at 11:35
  • You are right, I edited the answer to be more correct. – Tristan Burnside Dec 06 '14 at 11:45
  • 2
    JS doesn't *allow* you to *omit* the type - it just doesn't allow you to declare it whatsoever. It uses dynamic typing, where values have types (e.g. `true` is `boolean`), not variables (e.g. `var x` may contain a value of any type). Also, type inference for such simple cases as those from the question was *probably* known a decade before C was released. – scriptin Dec 06 '14 at 12:11
  • 2
    That doesn't make the statement false (in order to force something you must also allow it). Type inference existing doesn't change the fact that C's type system is a result of its historical context (as opposed to a specifically stated philosophical reasoning or technical limitation) – Tristan Burnside Dec 06 '14 at 13:22
  • 2
    Considering that ML - which is pretty much just as old as C - has type inference, "it's old" isn't a good explanation. The context in which C was used and developed (small machines that demanded a very small footprint for the compiler) seems more likely. No idea why you'd mention dynamic typing languages instead of just some examples of languages with type inference - Haskell, ML, heck *C#* has it - hardly an obscure feature any more. – Voo Dec 06 '14 at 16:38
  • "...because it is old" ??? This is just plain incorrect. Lisp and Fortran are both much older than C. Neither of these require variable declarations with type info. – Brad S. Dec 06 '14 at 16:45
  • 2
    @BradS. Fortran is not a good example because the first letter of the variable name *is* a type declaration, unless you use `implicit none` in which case you *must* declare a type. – dmckee --- ex-moderator kitten Dec 07 '14 at 03:09
  • 1
    @dmckee Lisp is just as bad an example, because it has historically been dynamically typed (although I'm sure some dialect in recent years has tried static typing with type inference). There seems to be a great deal of confusion on the difference between dynamic typing and type inference in general here. – Voo Dec 07 '14 at 11:11
  • 1
    @Voo Admittedly, Fortran is a weak example but Lisp is perfect. One never explicitly declares the type of a variable in Lisp, and Lisp is much older than C. The point is that C is the way it is by design...not "because it is old". There are also new languages that do not have type inference...and look a lot like C. Again, the C programming language uses *explicit type declaration* because that is/was how the language is designed. Not "because it is old". – Brad S. Dec 07 '14 at 15:59