12

While reading an answer here, I saw this code:

char ** v = malloc(0);
while ((r = strtok(s, " ")) != NULL) {
    char ** vv = realloc(v, (n+1)*sizeof(*vv));

The thing that bugged me was the call to malloc with an argument of zero. According to the standard, this will return either NULL or a non-NULL pointer that must not be dereferenced but can be successfully passed to free. I know that this does not cause any problems in itself (unless you later rely on a check such as if (v == NULL)), but is there any practical reason whatsoever to prefer malloc(0) over NULL?

I saw the argument "to indicate the goal of that pointer is to be given to realloc later". To me that sounds like a pretty strange argument. I cannot see the value of that convention at all. First, because it's an extra function call that's not needed. And second, because the value of telling the reader that you will use realloc later seems almost zero. And according to the answers on this question, there do not seem to be any technical benefits whatsoever.

Personally, if I ever felt the need to tell that realloc would be used later I'd do this:

char **v = NULL; // Will be realloced later

or give it a name that makes that intention clear. I would not use a strange unmotivated function call. But IMHO, just initializing it to NULL is a very clear indication that SOMETHING will be done to it later on. I don't see the value of knowing in advance that it's realloc. What's next? A convention saying that malloc(0*0) indicates that strdup will be used later?
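As a minimal sketch of why NULL is a sufficient starting point: realloc(NULL, size) is specified to behave like malloc(size), so a loop like the quoted one can start from a null pointer. The names split_words and struct words are hypothetical and not from the original answer, and the strtok handling is only my assumption of what the full loop was meant to do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: an array of pointers into the input string. */
struct words {
    char **v;   /* NULL when n == 0 */
    size_t n;   /* number of tokens */
};

/* Hypothetical helper: splits s in place on spaces.
 * Returns 0 on success and -1 if realloc fails (out is left untouched). */
int split_words(char *s, struct words *out)
{
    char **v = NULL;   /* no malloc(0) needed: realloc(NULL, n) acts like malloc(n) */
    size_t n = 0;

    for (char *r = strtok(s, " "); r != NULL; r = strtok(NULL, " ")) {
        /* Use a temporary so the old block is not lost if realloc fails. */
        char **vv = realloc(v, (n + 1) * sizeof *vv);
        if (vv == NULL) {
            free(v);   /* free(NULL) is a no-op, so this is fine when n == 0 */
            return -1;
        }
        v = vv;
        v[n++] = r;
    }

    out->v = v;
    out->n = n;
    return 0;
}
```

Because failure is reported through the return value, a NULL array with n == 0 unambiguously means "no tokens", and the caller can pass out->v to free either way.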

So to sum it up, these are the cons that I know of:

  • An extra unnecessary function call
  • Looks weird if you don't know that it indicates later realloc (and still looks weird to me anyway)
  • May return a valid pointer that should not be dereferenced (just strange)
  • May allocate memory that you cannot use (quite pointless)
  • Less predictable. You may get NULL. You may get something else.

Pros:

  • ?

The only sensible origin I can think of for this habit is very early C, before NULL became a part of stddef.h, when calling malloc(0) was the only portable way to get a pointer that was guaranteed to be safe to pass to free without allocating anything. Could that be the case?

So is this really an accepted convention for indicating a later realloc? If so, is it a good convention? Does it have any benefits that I fail to see?

There is a related question on SO: What's the point of malloc(0)?

Clarification:

I'm not talking about malloc(n) where n happens to be zero in some cases. I'm talking about calling malloc(0) on purpose.

Robert Harvey
klutt
  • *just initializing it to NULL is a very clear indication that SOMETHING will be done to it later on* You left out "and if it remains `NULL` that's a clear indication that SOMETHING was NOT done to it." Using `malloc(0)` removes that information. – Andrew Henle Jun 07 '20 at 16:24
  • Have you omitted any code for simplicity? Where does the value of `n` come from? What happens to `v` after this? For these two lines v could be NULL without affecting functionality. My guess is the code here is trying to allocate a buffer big enough to hold a token on the heap, and the code assumes the `realloc` function will only grow the allocation and never shrink it and does this efficiently. (probably true except for the efficient part) But why there is both a `v` and a `vv` variable isn't clear from the code quoted. – MZB Jun 09 '20 at 03:30
  • @MZB I cannot see how it would matter. Can you demonstrate something where initializing to `malloc(0)` instead of `NULL` is a good idea? The reason for `vv` is the standard one, which is a `v=vv` later. – klutt Jun 09 '20 at 07:40
  • @klutt With the code as illustrated, there's no reason to have separate v and vv variables - in fact it looks like a bug. Perhaps v is checked to be non-zero in some following logic? With just the snippet to go on, it's impossible to tell why both variables exist. (Although a likely possibility is the author didn't have the realloc manual page to hand and couldn't remember its treatment of NULL values). – MZB Jun 10 '20 at 18:57
  • @MZB The reason is that if realloc fails, then you can keep the data you have and recover. But the question is not why both exist. It's about `malloc(0)` – klutt Jun 10 '20 at 21:28
  • @chux-ReinstateMonica Done – klutt Aug 28 '20 at 11:19
  • " either NULL or a pointer that can be successfully passed to free. " --> note that `free(NULL)` is well defined, so any return value from `*alloc()`, `NULL` or not, can be used in `free()`. – chux - Reinstate Monica Feb 15 '21 at 20:48
  • @chux-ReinstateMonica Fixed – klutt Feb 15 '21 at 20:50

4 Answers

13

In my opinion, that is a horrible paradigm.

I see absolutely no pros and at least three substantial cons.

Needless code complexity

Since malloc(0) can return NULL, the code has to be written to handle that anyway.

And since malloc(0) can also produce a non-NULL result, the code also has to be written in a way to handle a non-NULL pointer.

Pointer state loses all meaning

By potentially producing a pointer that can not be dereferenced, malloc(0) removes a critical distinction between NULL and non-NULL pointers: the distinction where NULL pointers mean "there's nothing here" and non-NULL pointers mean "here's some actual valid data".

The NULL/non-NULL state of a pointer loses all information.

Using malloc(0) renders the almost universal use of code such as if (ptr) ... or if (ptr != NULL) ... useless by removing information from the state of a pointer simply being non-NULL. This simple code

if ( ptr )
{
    ...

would have to be

if ( ptr && pointerActuallyPointsToActualObject )
{
    ...

And now there are two values - the pointer and its "validity flag" that have to be kept in sync and passed around.
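A minimal sketch of that extra bookkeeping, using hypothetical names (struct buffer, buffer_has_data) that are not from any real API. The point is only that once a non-NULL pointer may refer to zero usable bytes, the pointer alone no longer answers the question:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pairing of a pointer with the information it can no longer carry. */
struct buffer {
    unsigned char *data;  /* may be NULL, or a non-NULL pointer to zero usable bytes */
    size_t len;           /* number of bytes that may actually be accessed */
};

/* With NULL as the only "nothing here" state, `b->data != NULL` would be enough.
 * If malloc(0) results are allowed in, the length has to be checked as well. */
bool buffer_has_data(const struct buffer *b)
{
    return b->data != NULL && b->len > 0;
}
```

Every function that receives such a pointer then needs the length passed alongside it, which is the synchronization burden described above.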

Code such as

Foo *dataPtr = getNewFoo();

would no longer work should the prospective new Foo * being returned from the function be initialized with malloc(0), because a non-NULL pointer would no longer mean "here is a new Foo"; it could just as well mean "no new Foo for you!".

Substantially Increased Potential for Heisenbugs

Any non-NULL pointer that can not be safely dereferenced creates serious potential Heisenbugs.

In general, any erroneous dereference of a NULL pointer results in an immediate failure where the cause is obvious. Dereferencing a non-NULL pointer that can not be safely dereferenced is extremely likely to result in corrupt data and/or a corrupt heap, laying a land mine or twelve that will cause later failures in what can be totally unrelated code.

Your code will have bugs. There's nothing but downside in using a code construct that makes those bugs more likely to occur along with making them harder to find when they do occur.

Andrew Henle
  • There is a crucial difference between "no data" (yet) and "data of length zero" in many cases. Not that `malloc(0)` helps in either case. – Deduplicator Jun 07 '20 at 17:02
  • @Deduplicator True, but in a case like that you have to pass additional information such as data length that can't be conveyed simply by `NULL`/non-`NULL` state. – Andrew Henle Jun 07 '20 at 17:05
  • If malloc(n) returns a non-null pointer then I can access the first n bytes. That's the same for malloc(0); I can access the first zero bytes, that is none. There is no difference between both cases. Whenever you have a pointer, you assume that it points to a region containing n accessible bytes, and you know there is no way to check this. – gnasher729 Feb 16 '21 at 08:48
  • @gnasher729 *If malloc(n) returns a non-null pointer then I can access the first n bytes. That's the same for malloc(0); I can access the first zero bytes, that is none. There is no difference between both cases.* If the pointer is `NULL`, it's obvious that there's no object. If the pointer is not null but can't be dereferenced because it's the result of `malloc(0)` there's no way to tell from the state of the pointer whether or not there's an object that can be accessed. You have to drag a variable around with the pointer so you know if it's valid or not. That's a ***huge*** difference. – Andrew Henle Feb 16 '21 at 10:11
  • @AndrewHenle, if I call malloc(n) for any n and get a non-null pointer then I _always_ must know n to know how many bytes I can access. Whether n = 1,000,000 or n = 5 or n = 0 doesn't make a difference. If you don't know n, you're stuffed. I can't access p[2] if n <= 2. I can't access p[1] if n ≤ 1. And I can't access p[0] if n ≤ 0. – gnasher729 Feb 16 '21 at 14:25
  • @gnasher729 You miss the point: if you use `malloc(0)`, `p != NULL` loses all meaning. And since you can't know if `malloc(0)` returns `NULL` or an invalid non-`NULL` pointer, your code is now needlessly complex. And did you read [the Linux man page for `malloc()`](https://man7.org/linux/man-pages/man3/free.3.html): "If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free()." Glibc doesn't even ***bother*** to tell you what you're going to get so you have to handle ***both*** cases. – Andrew Henle Feb 17 '21 at 10:18
  • @gnasher729 You've yet to state a cogent positive to using `malloc(0)`. – Andrew Henle Feb 17 '21 at 10:19
5

Note that since C17/18 a subtle addition occurred:

If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. § 7.22.3 1

Now when malloc(0) returns NULL, that indicates an error [1], such as out-of-memory, or perhaps the code reached a maximum number of allocations, or ...


The only value I see in the code below is as a weak test, in this case via a debugger, of whether any memory or allocation is available.

char **v = malloc(0);
while ((r = strtok(s, " ")) != NULL) {
  char **vv = realloc(v, (n+1)*sizeof(*vv));

is it a good convention?

No, simple is better.

char **v = NULL;

[1] The C specification is fuzzy on whether a size of 0 is in itself an error, so I find the newly added "to indicate an error" less helpful.

In general, I try to avoid *alloc(0), as C historically lacks clarity about a size of 0, so I simply assign NULL when the size is 0 to avoid any ambiguity.
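One way that habit might look as a small wrapper; alloc_bytes is a hypothetical name, and this is only a sketch of the idea, not code from any library:

```c
#include <stdlib.h>

/* Hypothetical wrapper: a zero-size request never reaches malloc, so the
 * result for n == 0 does not depend on the implementation's choice. */
void *alloc_bytes(size_t n)
{
    if (n == 0) {
        return NULL;   /* "no storage" always looks the same */
    }
    return malloc(n);  /* NULL here means the allocation failed */
}
```

The caller then treats NULL as an error only when it asked for a nonzero size, and can pass the result to free in every case.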

  • "A request for zero bytes always fails." still remains a valid choice for implementers. The only sticky point might be `realloc()`. – Deduplicator Jun 18 '20 at 00:17
  • @Deduplicator True. Any source for the quote? – chux - Reinstate Monica Jun 18 '20 at 01:20
  • *If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error* I don't read that as precluding an implementation from **defining** `malloc(0)` as an error and returning `NULL` for `malloc(0)` just as before. The only change is that a fully-conforming implementation would now have to set `errno` to something like `EINVAL`. – Andrew Henle Feb 15 '21 at 19:09
  • @AndrewHenle "only change is that a fully-conforming implementation would now have to set errno to something like EINVAL" is unclear as `EINVAL` is not in the C spec and "7.22.3 Memory management functions" does not specify any setting of `errno`. – chux - Reinstate Monica Feb 15 '21 at 19:39
  • @chux-ReinstateMonica That would fall under ["implementation-defined"](https://pubs.opengroup.org/onlinepubs/9699919799/functions/malloc.html) then. – Andrew Henle Feb 15 '21 at 20:05
  • It also occurs to me that if setting `errno` is not required, the change in wording in the C17/18 standard is meaningless. Pre-C17/18: "`malloc(0)` can return `NULL` or a non-`NULL` pointer that can not be safely dereferenced." C17/18: "`malloc(0)` can return `NULL` to indicate an error or a non-`NULL` pointer that can not be safely dereferenced." Without requiring `errno` to be set, there is absolutely zero observable difference between those two. – Andrew Henle Feb 15 '21 at 20:14
  • @AndrewHenle I agree - the change in spec did not advance overall clarity. I go by "just avoid" `malloc(0)` as I anticipate yet another subtle spec change. – chux - Reinstate Monica Feb 15 '21 at 20:31
  • @chux-ReinstateMonica I suspect after thirty+ years, there's too much legacy code either way to mandate any changes. – Andrew Henle Feb 15 '21 at 22:47
  • "A request for 0 bytes always fails" would be a rubbish implementation. "A request for 0 bytes intentionally always returns a null pointer" is perfectly fine. Since the behaviour is implementation defined, the implementation has to provide its choice in writing. – gnasher729 Feb 16 '21 at 14:22
  • klutt, "Why would you ever use `malloc(0)`?" --> variation on [Hanlon's razor](https://en.wikipedia.org/wiki/Hanlon%27s_razor) without malice. – chux - Reinstate Monica Feb 16 '21 at 16:19
1

The usual explanation is just "someone thought it was a good idea". Another explanation is "we've always done it this way". Another good explanation is: "I didn't want to change existing code, and I didn't want to introduce different code, so I copied the existing pattern".

If someone wrote "char** p = malloc(0);" in one function and "char** p = NULL;" in another function, you would then ask yourself why this was done in different ways, and you would be worried that something is going on that you don't understand (like it could be a workaround for a compiler bug; that kind of thing has happened). So being consistent is useful, even if it means being consistently "not clever".

And there is one really important use case: you return the pointer to a caller, and the caller checks for null pointers to detect errors. If a result of 0 bytes is OK (no error), and your implementation defines that malloc(0) doesn't return a null pointer except when there is no memory, then starting with malloc(0) is correct.
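If the caller's contract really is "NULL means error", one way to honor it without depending on what a particular implementation does for malloc(0) might be to never request zero bytes. This is a swapped-in alternative rather than the malloc(0) approach described here, and alloc_at_least_one is a hypothetical name:

```c
#include <stdlib.h>

/* Hypothetical helper: rounds a zero-size request up to one byte, so a
 * NULL return can only mean that the allocation itself failed. */
void *alloc_at_least_one(size_t n)
{
    return malloc(n != 0 ? n : 1);
}
```

The cost is one unused byte for an empty result; the returned pointer is valid to pass to free or realloc on any conforming implementation.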

gnasher729
  • Besides that, this guy actually used them in a different way. They used `malloc(0)` to indicate that they will use `realloc` later on. ;) – klutt Feb 16 '21 at 08:55
  • And tbh, your last paragraph does not make sense, because if you allocate zero bytes, you cannot really error check it. See Andrew's answer in the part about null pointers losing their meaning. – klutt Feb 16 '21 at 08:58
  • @klutt: Think about it. Your caller writes void* p = BigFunction(); if (p == NULL) error();. To the caller a null pointer is an error. To you, a result containing 0 bytes is just fine. You'd have to return malloc(0) in that case to satisfy the caller. – gnasher729 Feb 16 '21 at 14:19
  • If consistency is useful, you can't use `malloc(0)` because you can't know if it will return `NULL` or a non-`NULL` pointer. – Andrew Henle Feb 16 '21 at 14:41
  • Andrew, read my answer more carefully. – gnasher729 Feb 17 '21 at 10:02
  • *starting with malloc(0) is correct* You keep trying to construct contrived situations where there's some advantage to using `malloc(0)` to get a non-`NULL` pointer to no memory. At best, you've identified a corner case on some implementations that's prone to spectacular failure should the code be ported to another system. And you think writing code so dependent on an unreliable feature of a particular C runtime implementation that it's *de facto* non-portable to the point it's dangerously unstable should it be ported to another system is a ***positive***?!? – Andrew Henle Feb 17 '21 at 10:26
  • I call that a thermonuclear land mine. – Andrew Henle Feb 17 '21 at 10:26
0

The realloc() function

Reallocates the given area of memory. It must be previously allocated by malloc(), calloc() or realloc() and not yet freed with a call to free or realloc. Otherwise, the results are undefined.

While passing a null pointer to realloc() works, it seems harmonious to initially use malloc(0).

cwallach
  • I agree that `char *p = malloc(0); p = realloc(p, size);` is safe, or at least not less safe than a `char *p = NULL; p = realloc(p, size);` But it's still very weird. – klutt Feb 17 '21 at 09:43
  • If you pass a null pointer to realloc() it works just the same as malloc(). – gnasher729 Feb 17 '21 at 10:03