52

I was reading a thread titled "strlen vs sizeof" on CodeGuru, and one of the replies states that "it's anyways [sic] bad practice to initialie [sic] a char array with a string literal."

Is this true, or is that just his (albeit an "elite member") opinion?


Here is the original question:

#include <stdio.h>
#include<string.h>
main()
{
    char string[] = "october";
    strcpy(string, "september");

    printf("the size of %s is %d and the length is %d\n\n", string, sizeof(string), strlen(string));
    return 0;
}

right. the size should be the length plus 1 yes?

this is the output

the size of september is 8 and the length is 9

size should be 10 surely. its like its calculating the sizeof string before it is changed by strcpy but the length after.

Is there something wrong with my syntax or what?


Here is the reply:

It's anyways bad practice to initialie a char array with a string literal. So always do one of the following:

const char string1[] = "october";
char string2[20]; strcpy(string2, "september");
Pacerier
  • 4,973
  • 7
  • 39
  • 58
Cole Tobin
  • 1,440
  • 1
  • 15
  • 28

6 Answers6

69

It's anyways bad practice to initialie a char array with a string literal.

The author of that comment never really justifies it, and I find the statement puzzling.

In C (and you've tagged this as C), that's pretty much the only way to initialize an array of char with a string value (initialization is different from assignment). You can write either

char string[] = "october";

or

char string[8] = "october";

or

char string[MAX_MONTH_LENGTH] = "october";

In the first case, the size of the array is taken from the size of the initializer. String literals are stored as arrays of char with a terminating 0 byte, so the size of the array is 8 ('o', 'c', 't', 'o', 'b', 'e', 'r', 0). In the second two cases, the size of the array is specified as part of the declaration (8 and MAX_MONTH_LENGTH, whatever that happens to be).

What you cannot do is write something like

char string[];
string = "october";

or

char string[8];
string = "october";

etc. In the first case, the declaration of string is incomplete because no array size has been specified and there's no initializer to take the size from. In both cases, the = won't work because a) an array expression such as string may not be the target of an assignment and b) the = operator isn't defined to copy the contents of one array to another anyway.

By that same token, you can't write

char string[] = foo;

where foo is another array of char. This form of initialization will only work with string literals.

EDIT

I should amend this to say that you can also initialize arrays to hold a string with an array-style initializer, like

char string[] = {'o', 'c', 't', 'o', 'b', 'e', 'r', 0};

or

char string[] = {111, 99, 116, 111, 98, 101, 114, 0}; // assumes ASCII

but it's easier on the eyes to use string literals.

EDIT2

In order to assign the contents of an array outside of a declaration, you would need to use either strcpy/strncpy (for 0-terminated strings) or memcpy (for any other type of array):

if (sizeof string > strlen("october"))
  strcpy(string, "october");

or

strncpy(string, "october", sizeof string); // only copies as many characters as will
                                           // fit in the target buffer; 0 terminator
                                           // may not be copied, but the buffer is
                                           // uselessly completely zeroed if the
                                           // string is shorter!
8bittree
  • 5,637
  • 3
  • 27
  • 37
John Bode
  • 10,826
  • 1
  • 31
  • 43
  • 17
    [`strncpy` is rarely the right answer](http://the-flat-trantor-society.blogspot.com/2012/03/no-strncpy-is-not-safer-strcpy.html) – Keith Thompson Jan 17 '13 at 21:52
  • 1
    @KeithThompson: not disagreeing, just added it for completeness' sake. – John Bode Jan 17 '13 at 21:58
  • 19
    Please note that `char[8] str = "october";` is bad practice. I had to literally char count myself to make sure it wasn't an overflow and it breaks under maintenance... e.g. correcting a spelling error from `seprate` to `separate` will break if size not updated. – djechlin May 14 '13 at 21:22
  • 1
    I agree with djechlin, it is bad practice for the reasons given. JohnBode's answer doesn't comment at all on the "bad practice" aspect (which is the main part of the question!!), it just explains what you can or cannot do to initialize the array. – mastov Jun 11 '15 at 11:46
  • 1
    Minor: As 'length" value returned from `strlen()` does not include the null character, using `MAX_MONTH_LENGTH` to hold the maximum size needed for `char string[]` often _looks_ wrong. IMO, `MAX_MONTH_SIZE` would be better here. – chux - Reinstate Monica May 09 '18 at 19:21
  • @mastov: So it's not bad practice, it's ignorance. What @John Bode is trying to convey is the fact that `char str[] = "I'm a string literal used for initialization and what you want to do with str is not my business after that, I will not be modified at all."`. No bad practice. – NingW Apr 05 '21 at 18:18
11

The only problem I recall is assigning string literal to char *:

char var1[] = "september";
var1[0] = 'S'; // Ok - 10 element char array allocated on stack
char const *var2 = "september";
var2[0] = 'S'; // Compile time error - pointer to constant string
char *var3 = "september";
var3[0] = 'S'; // Modifying some memory - which may result in modifying... something or crash

For example take this program:

#include <stdio.h>

int main() {
  char *var1 = "september";
  char *var2 = "september";
  var1[0] = 'S';
  printf("%s\n", var2);
}

This on my platform (Linux) crashes as it tries to write to page marked as read-only. On other platforms it might print 'September' etc.

That said - initialization by literal makes the specific amount of reservation so this won't work:

char buf[] = "May";
strncpy(buf, "September", sizeof(buf)); // Result "Sep"

But this will

char buf[32] = "May";
strncpy(buf, "September", sizeof(buf));

As last remark - I wouldn't use strcpy at all:

char buf[8];
strcpy(buf, "very long string very long string"); // Oops. We overwrite some random memory

While some compilers can change it into safe call strncpy is much safer:

char buf[1024];
strncpy(buf, something_else, sizeof(buf)); // Copies at most sizeof(buf) chars so there is no possibility of buffer overrun. Please note that sizeof(buf) works for arrays but NOT pointers.
buf[sizeof(buf) - 1] = '\0';
Maciej Piechotka
  • 2,465
  • 2
  • 18
  • 19
  • There's still a risk for buffer overrun on that `strncpy` because it doesn't null terminate the copied string when length of `something_else` is greater than `sizeof(buf)`. I usually set the last char `buf[sizeof(buf)-1] = 0` to protect from that, or if `buf` is zero-initialized, use `sizeof(buf) - 1` as the copy length. – syockit Jun 24 '16 at 18:30
  • 1
    Use `strlcpy` or `strcpy_s` or even `snprintf` if you have to. – user253751 Jan 23 '18 at 01:30
  • Fixed. Unfortunatly there is no easy portable way of doing this unless you have a luxury of working with newest compilers (`strlcpy` and `snprintf` are not directly accessible on MSVC, at least orders and `strcpy_s` are not on *nix). – Maciej Piechotka Jan 23 '18 at 01:47
  • @MaciejPiechotka: Well, thank god Unix rejected the microsoft-sponsored annex k. – Deduplicator Jan 23 '18 at 02:12
6

Primarily because you won't have the size of the char[] in a variable / construct that you can easily use within the program.

The code sample from the link:

 char string[] = "october";
 strcpy(string, "september");

string is allocated on the stack as 7 or 8 characters long. I can't recall if it's null-terminated this way or not - the thread you linked to stated that it is.

Copying "september" over that string is an obvious memory overrun.

Another challenge comes about if you pass string to another function so the other function can write into the array. You need to tell the other function how long the array is so it doesn't create an overrun. You could pass string along with the result of strlen() but the thread explains how this can blow up if string is not null-terminated.

You're better off allocating a string with a fixed size (preferably defined as a constant) and then pass the array and fixed size to the other function. @John Bode's comment(s) are correct, and there are ways to mitigate these risks. They also require more effort on your part to use them.

In my experience, the value I initialized the char[] to is usually too small for the other values I need to place in there. Using a defined constant helps avoid that issue.


sizeof string will give you the size of the buffer (8 bytes); use the result of that expression instead of strlen when you're concerned about memory.
Similarly, you can make a check before the call to strcpy to see if your target buffer is large enough for the source string: if (sizeof target > strlen(src)) { strcpy (target, src); }.
Yes, if you have to pass the array to a function, you'll need to pass its physical size as well: foo (array, sizeof array / sizeof *array);. – John Bode

  • 3
    `sizeof string` will give you the size of the *buffer* (8 bytes); use the result of that expression instead of `strlen` when you're concerned about memory. Similarly, you can make a check before the call to `strcpy` to see if your target buffer is large enough for the source string: `if (sizeof target > strlen(src)) { strcpy (target, src); }`. Yes, if you have to pass the array to a function, you'll need to pass its physical size as well: `foo (array, sizeof array / sizeof *array);`. – John Bode Jan 16 '13 at 17:03
  • 1
    @JohnBode - thanks, and those are good points. I have incorporated your comment into my answer. –  Jan 16 '13 at 17:16
  • 1
    More precisely, most references to the array name `string` result in an implicit conversion to `char*`, pointing to the first element of the array. This loses the array bounds information. A function call is just one of the many contexts in which this happens. `char *ptr = string;` is another. Even `string[0]` is an example of this; the `[]` operator works on pointers, not directly on arrays. Suggested reading: Section 6 of the [comp.lang.c FAQ](http://www.c-faq.com/). – Keith Thompson Jan 17 '13 at 21:47
  • Finally an answer that actually refers to the question! – mastov Jun 11 '15 at 11:48
6

One thing that neither thread brings up is this:

char whopping_great[8192] = "foo";

vs.

char whopping_great[8192];
memcpy(whopping_great, "foo", sizeof("foo"));

The former will do something like:

memcpy(whopping_great, "foo", sizeof("foo"));
memset(&whopping_great[sizeof("foo")], 0, sizeof(whopping_great)-sizeof("foo"));

The latter only does the memcpy. The C standard insists that if any part of an array is initialized, it all is. So in this case, it's better to do it yourself. I think that may have been what treuss was getting at.

For sure

char whopping_big[8192];
whopping_big[0] = 0;

is better than either:

char whopping_big[8192] = {0};

or

char whopping_big[8192] = "";

p.s. For bonus points, you can do:

memcpy(whopping_great, "foo", (1/(sizeof("foo") <= sizeof(whopping_great)))*sizeof("foo"));

to throw a compile time divide by zero error if you're about to overflow the array.

Richard Fife
  • 61
  • 1
  • 2
2

I think the "bad practise" idea comes from the fact that this form :

char string[] = "october is a nice month";

makes implicitly a strcpy from the source machine code to the stack.

It is more efficient to handle only a link to that string. Like with :

char *string = "october is a nice month";

or directly :

strcpy(output, "october is a nice month");

(but of course in most code it probably doesn't matter)

toto
  • 21
  • 1
  • Wouldn't it only make a copy if you attempt to modify it? I would think the compiler would be smarter than that – Cole Tobin Sep 02 '15 at 23:38
  • 2
    What about cases like `char time_buf[] = "00:00";` where you're going to be modifying a buffer? A `char *` initialized to a string literal is set to the address of the first byte, so trying to modify it results in undefined behavior because the method of the string literal's storage is unknown (implementation defined), while modifying the bytes of a `char[]` is perfectly legal because the initialization copies the bytes to a writeable space allocated on the stack. To say that it's "less efficient" or "bad practice" without elaborating on the nuances of `char* vs char[]` is misleading. – Braden Best Aug 08 '16 at 16:49
-3

Never is really long time, but you should avoid initialization char[] to string, because, "string" is const char*, and you are assigning it to char*. So if you pass this char[] to method who changes data you can have interesting behavior.

As commend said I mixed a bit char[] with char*, that is not good as they differs a bit.

There's nothing wrong about assigning data to char array, but as intention of using this array is to use it as 'string' (char *), it is easy to forget that you should not modify this array.

Dainius
  • 340
  • 1
  • 8
  • 3
    Incorrect. The initialization copies the contents of the string literal into the array. The array object isn't `const` unless you define it that way. (And string literals in C are not `const`, though any attempt to modify a string literal does have undefined behavior.) `char *s = "literal";` does have the kind of behavior you're talking about; it's better written as `const char *s = "literal";` – Keith Thompson Jan 17 '13 at 21:48
  • indeed my fault, I mixed char[] with char*. But I wouldn't be so sure about copying content to array. Quick check with MS C compilator shows that 'char c[] = "asdf";' will create 'string' in const segment and then assign this address to array variable. That's actually a reason why I said about avoiding assignments to non const char array. – Dainius Jan 18 '13 at 07:45
  • I'm skeptical. Try [this program](https://gist.github.com/4563063) and let me know what output you get. – Keith Thompson Jan 18 '13 at 08:04
  • Logically, given `char c[] = "abcd";`, the string literal denotes a 5-byte array that may (or may not) be stored in some kind of read-only memory, and the initialization copies its contents into `c`, which is writable. Since the literal is only referred to once, and its address is not used, the compiler has chosen to optimize it away, storing the bytes directly into `c` -- which is a perfectly valid optimization. No compiler will allocate `c` to read-only memory (unless perhaps it can prove that the program never modifies it). I promise you that `c` is modifiable. – Keith Thompson Jan 18 '13 at 08:20
  • 3
    *"And in generally "asdf" is a constant, so it should be declared as const."* -- The same reasoning would call for a `const` on `int n = 42;`, because `42` is a constant. – Keith Thompson Jan 18 '13 at 08:21
  • 2
    It doesn't matter what machine you're on. The language standard guarantees that `c` is modifiable. It's exactly as strong a guarantee as the one that `1 + 1` evaluates to `2`. If [the program I linked to above](https://gist.github.com/4563063) does anything other than printing `EFGH`, it indicates a non-conforming C implementation. – Keith Thompson Jan 18 '13 at 15:40
  • 1
    @Dainus: the MSVC compiler has an optimisation called 'string pooling' which will put a single copy of identical strings into a read-only segment if it can guarantee that uses of them are read-only. Turn off the optimisation to see 'normal' behaviour. FYI the "Edit and Continue" requires this option to be on. More info here: http://msdn.microsoft.com/en-us/library/s0s0asdt.aspx – JBRWilkinson May 06 '13 at 20:44
  • 1
    I think Dainius is suggesting that in many cases the error is that the variable itself should be marked `const char *const` to prevent modification of the bytes or the pointer itself, but in many cases programmers will leave one or both mutable allowing some runtime code to modify what appears to be a typed constant (but is not constant). – T Percival Mar 20 '15 at 18:57