58

What is the advantage of returning a pointer to a structure as opposed to returning the whole structure in the return statement of the function?

I am talking about functions like fopen and other low level functions but probably there are higher level functions that return pointers to structures as well.

I believe that this is more of a design choice rather than just a programming question and I am curious to know more about the advantages and disadvantages of the two methods.

One of the reasons I thought it would be an advantage to return a pointer to a structure is to be able to tell more easily if the function failed by returning a NULL pointer.

Returning a full structure that is NULL would be harder I suppose or less efficient. Is this a valid reason?

yoyo_fun
  • 2
    The C language does not allow this. It is not clear if you are asking why programs don't do it (the answer is the language doesn't allow it, so programs CAN'T do it), or why the language doesn't allow it. – John R. Strohm Oct 19 '17 at 13:54
  • @JohnR.Strohm So it is not possible to return a variable that is of type struct something? I never tried it honestly. – yoyo_fun Oct 19 '17 at 13:57
  • 10
    @JohnR.Strohm I tried it and it actually works. A function can return a struct.... So what is the reason it is not done? – yoyo_fun Oct 19 '17 at 14:01
  • 28
    Pre-standardization C did not allow structs to be copied or to be passed by value. The C standard library has many holdouts from that era that would not be written that way today, e.g. it took until C11 for the utterly misdesigned `gets()` function to be removed. Some programmers still have an aversion to copying structs, old habits die hard. – amon Oct 19 '17 at 14:12
  • 26
    `FILE*` is effectively an opaque handle. User code should not care what its internal structure is. – CodesInChaos Oct 19 '17 at 15:39
  • Nearly all languages don't even give you the choice, because returning by reference is a reasonable default. C's ability to return by value is much more rare and interesting. – Karl Bielefeldt Oct 19 '17 at 15:59
  • 3
    Return by reference is only a reasonable default when you have garbage collection. – Idan Arye Oct 19 '17 at 19:02
  • 7
    @JohnR.Strohm The "very senior" in your profile seems to go back before 1989 ;-) -- when ANSI C permitted what K&R C didn't: Copy structures in assignments, parameter passing and return values. K&R's original book indeed stated explicitly (I'm paraphrasing): "you can do exactly two things with a structure, take its address with `&` and access a member with `.`." – Peter - Reinstate Monica Oct 19 '17 at 20:17
  • @PeterA.Schneider Indeed, the current standard (paragraph 6.7.6.3) now only states _"A function declarator shall not specify a return type that is a function type or an array type."_ – pipe Oct 20 '17 at 07:16
  • Even though it turned out to be possible to do this, it might be a common belief that it's not possible. – Oskar Skog Oct 20 '17 at 18:01

6 Answers

67

There are several practical reasons why functions like fopen return pointers to struct types instead of instances of them:

  1. You want to hide the representation of the struct type from the user;
  2. You're allocating an object dynamically;
  3. You're referring to a single instance of an object via multiple references;

In the case of types like FILE *, it's because you don't want to expose details of the type's representation to the user - a FILE * object serves as an opaque handle, and you just pass that handle to various I/O routines (and while FILE is often implemented as a struct type, it doesn't have to be).

So, you can expose an incomplete struct type in a header somewhere:

typedef struct __some_internal_stream_implementation FILE;

While you cannot declare an instance of an incomplete type, you can declare a pointer to it. So I can create a FILE * and assign to it through fopen, freopen, etc., but I can't directly manipulate the object it points to.
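As a minimal sketch of the opaque-handle idea, the single file below plays the role of both a public header and its implementation. The `Counter` type and the `counter_*` functions are hypothetical names invented for this illustration; they are not part of any real library. Callers only ever see the incomplete type and the function prototypes.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header ("counter.h", hypothetical) would contain --- */
typedef struct counter Counter;      /* incomplete type: size and fields unknown to callers */

Counter *counter_create(int start);  /* returns NULL on allocation failure */
void     counter_increment(Counter *c);
int      counter_value(const Counter *c);
void     counter_destroy(Counter *c);

/* --- what the implementation file ("counter.c", hypothetical) would contain --- */
struct counter {
    int value;                       /* visible only to the implementation */
};

Counter *counter_create(int start)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;
}

void counter_increment(Counter *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const Counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_destroy(Counter *c)
{
    free(c);                         /* free(NULL) is a no-op */
}
```

Code that includes only the header part can hold a `Counter *` and pass it around, but cannot write `c->value` or declare a `Counter` variable, because the struct definition is not visible to it.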

It's also likely that the fopen function is allocating a FILE object dynamically, using malloc or similar. In that case, it makes sense to return a pointer.

Finally, it's possible you're storing some kind of state in a struct object, and you need to make that state available in several different places. If you returned instances of the struct type, those instances would be separate objects in memory from each other, and would eventually get out of sync. By returning a pointer to a single object, everyone's referring to the same object.

John Bode
  • 31
    A particular advantage of using the pointer as an opaque type is that the structure itself can change between library versions and you don't need to recompile the callers. – Barmar Oct 19 '17 at 21:54
  • 6
    @Barmar: Indeed, ABI Stability is *the* huge selling point of C, and it would not be as stable without opaque pointers. – Matthieu M. Oct 20 '17 at 12:06
40

There are two ways of "returning a structure." You can return a copy of the data, or you can return a reference (pointer) to it. It's generally preferred to return (and pass around in general) a pointer, for a couple of reasons.

First, copying a structure takes a lot more CPU time than copying a pointer. If this is something your code does frequently, it can cause a noticeable performance difference.

Second, no matter how many times you copy a pointer around, it's still pointing to the same structure in memory. All modifications to it will be reflected on the same structure. But if you copy the structure itself, and then make a modification, the change only shows up on that copy. Any code that holds a different copy won't see the change. Sometimes, very rarely, this is what you want, but most of the time it's not, and it can cause bugs if you get it wrong.
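The difference between the two behaviours can be sketched as follows. The `struct point` type and both function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x; int y; };

/* Takes and returns the struct by value: the caller's original is untouched,
   and the result is a separate object. */
struct point moved_copy(struct point p, int dx)
{
    p.x += dx;              /* modifies the local copy only */
    return p;
}

/* Takes a pointer: the caller's object is modified in place. */
void move_in_place(struct point *p, int dx)
{
    assert(p != NULL);
    p->x += dx;
}
```

After `struct point b = moved_copy(a, 10);` the two objects `a` and `b` evolve independently, whereas every holder of `&a` observes the effect of `move_in_place(&a, 10)`.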

Robert Harvey
Mason Wheeler
  • 59
    The drawback of returning by pointer: now you've got to track ownership of that object and possibly free it. Also, pointer indirection may be more costly than a quick copy. There are a lot of variables here, so using pointers is not universally better. – amon Oct 19 '17 at 14:15
  • 19
    Also, pointers these days are 64 bits on most desktop and server platforms. I've seen more than a few structs in my career that would fit in 64 bits. So, you can't _always_ say that copying a pointer costs less than copying a struct. – Solomon Slow Oct 19 '17 at 15:01
  • In my experience the most popular approach is the caller passing in the pointer. – CodesInChaos Oct 19 '17 at 15:10
  • I would also add that sometimes when developing a public API, it's desirable to not even let the client know the contents of the structure. The API gives the client a 'handle' and the client passes that handle back to the API to perform operations on it. Many of the Windows APIs work like this. – 17 of 26 Oct 19 '17 at 17:06
  • 40
    This is mostly a good answer, but I disagree about the part *sometimes, very rarely, this is what you want, but most of the time it's not* - quite the opposite. Returning a pointer allows several kinds of unwanted side effects, and several kinds of nasty ways to get the ownership of a pointer wrong. In cases where CPU time is not that important, I prefer the copy variant, if that is an option, it is **much** less error prone. – Doc Brown Oct 19 '17 at 17:28
  • 2
    Mason, I would say regarding your second point that that is actually in many scenarios a con instead of a pro. If you are not so constrained, immutability can very well be worth the price. (Disclaimer, I don't write code where performance is critical or memory scarce) – bgusach Oct 19 '17 at 18:06
  • 6
    It should be noted that this really only applies for external APIs. For internal functions every even marginally competent compiler of the last decades will rewrite a function that returns a large struct to take a pointer as an additional argument and construct the object directly in there. The arguments of immutable vs mutable have been done often enough, but I think we can all agree that the claim that immutable data structures are almost never what you want is not true. – Voo Oct 19 '17 at 19:19
  • 4
    If you return a pointer that was not passed into the function you have to perform a dynamic allocation (as e.g. `strdup` does) (unless you return a pointer to static data which usually is not thread safe and limits resources). The cost of passing a few bytes on the stack, even without return value optimization, pales compared to a `malloc()`. – Peter - Reinstate Monica Oct 19 '17 at 20:22
  • 7
    You could also mention compilation fire walls as a pro for pointers. In large programs with widely shared headers incomplete types with functions prevent the necessity to re-compile every time an implementation detail changes. The better compilation behavior is actually a side effect of the encapsulation which is achieved when interface and implementation are separated. Returning (and passing, assigning) by value need the implementation information. – Peter - Reinstate Monica Oct 19 '17 at 20:29
  • 2
    Pros and cons both ways, but if the OP is talking about C Library function ("like fopen") remember that these were all written "back in the day". You can be pretty sure that the hardware available had an influence on the choice that was made here, as it did on the entire language. – MickeyfAgain_BeforeExitOfSO Oct 19 '17 at 20:59
  • 1
    @voo: "I think we can all agree that the claim that immutable data structures are almost never what you want is not true." No, actually I don't agree. There are good, legitimate uses for immutable data, but generally I find it gets in the way *far* more often than it helps. – Mason Wheeler Oct 20 '17 at 03:57
  • 2
    @Mason You're entitled to your opinion, but are you really disputing the fact that there's a large move towards immutable data structures and functional programming techniques? Just the simple fact that reasoning about mutable data in concurrent environments is hell is one great argument for immutability. – Voo Oct 20 '17 at 06:51
  • 3
    @Voo Programmers who use far too much mutable data in multithreaded environments are usually just bad developers. This has nothing to do with a trend or anything. They just don't understand how important it is to avoid any unnecessary sharing of data between threads. Functional programming techniques are currently making the most inroads in the fields where they're relevant: Big Data with MapReduce, which is a good thing, but don't try to make it a silver bullet please. – Walfrat Oct 20 '17 at 07:25
  • 1
    @WAlfrat How is "immutable data has its uses" making it a silver bullet? – Voo Oct 20 '17 at 07:28
  • Indeed, I would change the "copying a structure takes a lot more CPU time than copying a pointer" part, because that generally isn't true. Considering all the overhead of handling pointers, and overhead of memory allocation, I'd say that if struct size is like 4x pointer size (32 bytes on 64 bit system), it's still more efficient to return struct as value. And many structs are actually that small. – hyde Oct 20 '17 at 07:30
  • 2
    @Voo How you worded it was closer to the silver bullet (without being it, I agree) than just "has its uses". At least that's how I understood it. If you're doing any kind of low-level programming (kernel, real-time, ...), it's very likely that this isn't true. Even for OOP, where an object is meant to have a behaviour, I disagree with the principle of returning a new instance. If an object's behaviour reflects the fact that it's mutable, then I'll make it mutable, unless I have a specific reason not to. – Walfrat Oct 20 '17 at 07:40
  • @Voo "immutable/functional + concurrency": That is indeed the core issue. Even without explicit multithreading, aliasing hinders automatic parallelization/re-ordering on pipelined/multi-core CPUs, because memory side effects through the aliases are unknown. That is particularly true with non-managed languages like C/C++ which link to true binaries "invisible" to the compiler of a different TU. That said, this affects some users more than others, although the circle is widening. – Peter - Reinstate Monica Oct 20 '17 at 08:26
  • 3
    People write functional style programs in *C*?!! – Nick Keighley Oct 20 '17 at 09:25
  • @Voo: Many calling conventions (including i386 and x86-64 System V (Linux + OS X)) naturally return structs by hidden pointer, so the caller can pass a pointer to wherever they want them constructed instead of having them left on the stack. x86-64 SysV packs small structs (up to 16 bytes) into the `rdx:rax` register pair for arg passing as well as return values. (i386 SysV only uses `edx:eax` for int64_t, not for small structs, unfortunately). Anyway, I think hidden pointer is a typical design for returning large structs. Some also *pass* large structs by hidden pointer (Windows not SysV). – Peter Cordes Oct 20 '17 at 10:53
  • 2
    @hyde Of course it depends on the structure; but it's *always* true to say that the time to copy a pointer will be *less than or equal to* the time to copy a struct. Worst-case it all fits in a 64-bit word, after all. So using a pointer is never slower, and frequently it's faster. As far as memory allocation goes, that has to happen anyway - it's just a choice whether it's on the stack or on the heap. – Graham Oct 20 '17 at 13:08
  • 1
    @Walfrat You really think that "the claim that immutable data structures are almost never what you want is not true" is the same as "immutable data structures are the silver bullet for all our problems"? Well I guess we'll have to disagree on that, but it certainly wasn't my intention to give you that idea. – Voo Oct 20 '17 at 21:27
  • 1
    @Graham Just passing it yes, but as soon as you have to access it, copying data on the stack will happen in the l1 cache, while the dynamically allocated memory somewhere on the heap has a good chance of requiring a memory access. And that is orders of magnitudes slower than anything else we're talking about here. – Voo Oct 20 '17 at 21:31
  • @Nick Certainly people use immutable data structures and other ideas first invented in functional programming in C and C++. In what language do you think many high performing map/reduce implementations are done? – Voo Oct 20 '17 at 21:32
13

In addition to other answers, sometimes returning a small struct by value is worthwhile. For example, one could return a pair consisting of one piece of data and an error (or success) code related to it.

To take an example, fopen returns just one piece of data (the opened FILE*) and, in case of error, gives the error code through the errno pseudo-global variable. But it would perhaps be better to return a struct of two members: the FILE* handle, and the error code (which would be set if the file handle is NULL). For historical reasons this is not the case (and errors are reported through the errno global, which today is a macro).

Notice that the Go language has a nice notation to return two (or a few) values.

Notice also that on Linux/x86-64 the ABI and calling conventions (see x86-psABI page) specify that a struct of two scalar members (e.g. a pointer and an integer, or two pointers, or two integers) is returned through two registers (and this is very efficient and does not go through memory).

So in new C code, returning a small C struct can be more readable, more thread-friendly, and more efficient.
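A minimal sketch of such a two-member result might look like this. The `struct open_result` type and the `open_file` wrapper are hypothetical names for this illustration, not an existing API. One assumption to note: POSIX requires fopen to set errno on failure, but ISO C does not, so on some platforms `err` could stay 0 even when `fp` is NULL. The NULL handle remains the authoritative failure signal.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical result type: the handle plus an error code. */
struct open_result {
    FILE *fp;   /* NULL on failure */
    int   err;  /* 0 on success, otherwise the errno value seen after fopen failed */
};

/* Thin wrapper around fopen that returns both values in one small struct. */
struct open_result open_file(const char *path, const char *mode)
{
    struct open_result r = { NULL, 0 };

    assert(path != NULL && mode != NULL);
    errno = 0;                    /* clear any stale value before the call */
    r.fp = fopen(path, mode);
    if (r.fp == NULL)
        r.err = errno;            /* set by fopen on POSIX systems */
    return r;
}
```

The caller gets the error code as an ordinary local value instead of having to read the errno macro immediately after the call, before anything else can overwrite it.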

Basile Starynkevitch
  • Actually small structs are *packed* into `rdx:rax`. So `struct foo { int a,b; };` is returned packed into `rax` (e.g. with shift/or), and has to be unpacked with shift / mov. [Here's an example on Godbolt](https://godbolt.org/g/ciPrsp). But x86 can use the low 32 bits of a 64-bit register for 32-bit operations without caring about the high bits, so it's not always too bad, but definitely worse than using 2 registers most of the time for 2-member structs. – Peter Cordes Oct 20 '17 at 11:05
  • Related: https://bugs.llvm.org/show_bug.cgi?id=34840 `std::optional` returns the boolean in the top half of `rax`, so you need a 64-bit mask constant to test it with `test`. Or you could use `bt`. But it sucks for the caller and callee compared to using `dl`, which compilers should do for "private" functions. Also related: libstdc++'s `std::optional` isn't trivially-copyable even when T is, so it always returns via hidden pointer: https://stackoverflow.com/questions/46544019/why-is-the-construction-of-stdoptionalint-more-expensive-than-a-stdpairin. (libc++'s is trivially-copyable) – Peter Cordes Oct 20 '17 at 11:08
  • @PeterCordes: your related things are C++, not C – Basile Starynkevitch Oct 20 '17 at 14:15
  • Oops, right. Well the same thing would apply *exactly* to `struct { int a; _Bool b; };` in C, if the caller wanted to test the boolean, because trivially-copyable C++ structs use the same ABI as C. – Peter Cordes Oct 20 '17 at 23:25
  • 1
    Classic example `div_t div()` – chux - Reinstate Monica Mar 28 '19 at 15:42
6

You’re on the right track

Both the reasons you mentioned are valid:

One of the reasons I thought it would be an advantage to return a pointer to a structure is to be able to tell more easily if the function failed by returning a NULL pointer.

Returning a FULL structure that is NULL would be harder I suppose or less efficient. Is this a valid reason?

If you have a texture (for example) somewhere in memory, and you want to reference that texture in several places in your program, it wouldn't be wise to make a copy every time you wanted to reference it. Instead, if you simply pass around a pointer to reference the texture, your program will run much faster.

The biggest reason though is dynamic memory allocation. Often times, when a program is compiled, you aren’t sure exactly how much memory you need for certain data structures. When this happens, the amount of memory you need to use will be determined at runtime. You can request memory using ‘malloc’ and then free it when you are finished using ‘free’.

A good example of this is reading from a file that is specified by the user. In this case, you have no idea how large the file may be when you compile the program. You can only figure out how much memory you need when the program is actually running.

malloc returns a pointer to a location in memory, and free takes such a pointer in order to release it. So functions that make use of dynamic memory allocation will return pointers to where they have created their structures in memory.
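As a rough sketch of a function that has to return a pointer because the size is only known at run time, consider the following. The `struct buffer` type and `buffer_from` are hypothetical names for this illustration; the struct uses a C99 flexible array member so that the header and the payload live in one allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose payload size is only known at run time. */
struct buffer {
    size_t        len;
    unsigned char bytes[];   /* C99 flexible array member */
};

/* Returns a heap-allocated buffer holding a copy of src, or NULL if the
   allocation fails. The caller owns the result and must release it with free(). */
struct buffer *buffer_from(const void *src, size_t len)
{
    struct buffer *b;

    assert(src != NULL || len == 0);
    b = malloc(sizeof *b + len);
    if (b == NULL)
        return NULL;
    b->len = len;
    if (len > 0)
        memcpy(b->bytes, src, len);
    return b;
}
```

A struct like this cannot be returned by value in any useful way, because the compiler does not know how many bytes follow the fixed part.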

Also, in the comments I see that there’s a question as to whether you can return a struct from a function. You can indeed do this. The following should work:

struct s1 {
   int integer;
};

struct s1 f(struct s1 input){
   struct s1 returnValue = input;
   return returnValue;
}

int main(void){
   struct s1 a = { 42 };
   struct s1 b= f(a);

   return 0;
}
Solomon Slow
Ryan
  • How is it possible to not know how much memory a certain variable will need if you already have the struct type defined? – yoyo_fun Oct 19 '17 at 14:09
  • 9
    @JenniferAnderson C has a concept of incomplete types: a type name can be declared but not yet defined, so its size is unavailable. I cannot declare variables of that type, but can declare *pointers* to that type, e.g. `struct incomplete* foo(void)`. That way I can declare functions in a header, but only define the structs within a C file, thus allowing for encapsulation. – amon Oct 19 '17 at 14:18
  • @amon So this is how declaring function headers (prototypes/signatures) before declaring how they work is actually done in C ? And it is possible to do the same thing to the structures and unions in C – yoyo_fun Oct 19 '17 at 14:35
  • @JenniferAnderson you declare function *prototypes* (functions without bodies) in header files and can then call those functions in other code, without knowing the body of the functions, because the compiler just needs to know how to arrange the arguments, and how to accept the return value. By the time you link the program, you actually have to know the function *definition* (i.e. with a body), but you only need to process that once. If you use a non-simple type, it also needs to know that type's structure, but pointers are often the same size and it doesn't matter for a prototype's use. – simpleuser Oct 19 '17 at 19:29
6

Something like a FILE* isn't really a pointer to a structure as far as client code is concerned, but is instead a form of opaque identifier associated with some other entity like a file. When a program calls fopen, it generally won't care about any of the contents of the returned structure--all it will care about is that other functions like fread will do whatever they need to do with it.

If a standard library keeps within a FILE* information about e.g. the current read position within that file, a call to fread would need to be able to update that information. Having fread receive a pointer to the FILE makes that easy. If fread instead received a FILE, it would have no way of updating the FILE object held by the caller.
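The point about the read position can be illustrated with a toy stand-in for a stream. The `struct reader` type and both functions are invented for this sketch and have nothing to do with how any real `FILE` is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream: a buffer plus a read position. */
struct reader {
    const char *data;
    size_t      len;
    size_t      pos;
};

/* By value: advances the position in a local copy, which is then discarded,
   so the caller's reader never moves forward. */
int next_char_by_value(struct reader r)
{
    if (r.pos >= r.len)
        return -1;
    return (unsigned char)r.data[r.pos++];
}

/* By pointer: advances the position in the caller's object. */
int next_char(struct reader *r)
{
    assert(r != NULL);
    if (r->pos >= r->len)
        return -1;
    return (unsigned char)r->data[r->pos++];
}
```

Calling `next_char_by_value(r)` repeatedly yields the same first character every time, while repeated calls to `next_char(&r)` walk through the buffer.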

supercat
4

Information Hiding

What is the advantage of returning a pointer to a structure as opposed to returning the whole structure in the return statement of the function?

The most common one is information hiding. C doesn't have, say, the ability to make fields of a struct private, let alone provide methods to access them.

So if you want to forcefully prevent developers from being able to see and tamper with the contents of a pointee, like FILE, then the one and only way is to prevent them from being exposed to its definition by treating the pointer as opaque, with a pointee whose size and definition are unknown to the outside world. The definition of FILE will then only be visible to those implementing the operations that require its definition, like fopen, while only the structure declaration will be visible in the public header.

Binary Compatibility

Hiding the structure definition can also help provide breathing room to preserve binary compatibility in dylib APIs. It allows the library implementers to change the fields in the opaque structure without breaking binary compatibility with those who use the library, since the nature of their code only needs to know what they can do with the structure, not how large it is or what fields it has.

As an example, I can actually run some ancient programs built during the Windows 95 era today (not always perfectly, but surprisingly many still work). Chances are that some of the code for those ancient binaries used opaque pointers to structures whose size and contents have changed from the Windows 95 era. Yet the programs continue to work in new versions of Windows since they weren't exposed to the contents of those structures. When working on a library where binary compatibility is important, what the client isn't exposed to is generally allowed to change without breaking backwards compatibility.

Efficiency

Returning a full structure that is NULL would be harder I suppose or less efficient. Is this a valid reason?

Heap allocation is typically the less efficient option, assuming the type can practically fit on the stack, unless a much less generalized memory allocator than malloc is used behind the scenes, such as a fixed-size (rather than variable-size) allocator that pools memory already allocated. Most likely it's a safety trade-off in this case, made so that the library developers can maintain invariants (conceptual guarantees) related to FILE.

From a performance standpoint, at least, it's not such a valid reason to make fopen return a pointer, since the only time it would return NULL is on failure to open a file. That would be optimizing an exceptional scenario in exchange for slowing down all common-case execution paths. In some cases there might be a valid productivity reason: returning pointers can make a design more straightforward, because NULL can then be returned on some post-condition.

For file operations, the overhead is relatively quite trivial compared to the file operations themselves, and the manual need to fclose cannot be avoided anyway. So it's not like we can save the client the hassle of freeing (closing) the resource by exposing the definition of FILE and returning it by value in fopen or expect much of a performance boost given the relative cost of the file operations themselves to avoid a heap allocation.

Hotspots and Fixes

For other cases though, I've profiled a lot of wasteful C code in legacy codebases with hotspots in malloc and needless compulsory cache misses as a result of using this practice too frequently with opaque pointers and allocating too many things needlessly on the heap, sometimes in big loops.

An alternative practice I use instead is to expose structure definitions, even if the client is not meant to tamper with them, by using a naming convention to communicate that no one else should touch the fields:

struct Foo
{
   /* priv_* indicates that you shouldn't tamper with these fields! */
   int priv_internal_field;
   int priv_other_one;
};

struct Foo foo_create(void);
void foo_destroy(struct Foo* foo);
void foo_something(struct Foo* foo);

If there are binary compatibility concerns in the future, then I've found it good enough to just superfluously reserve some extra space for future purposes, like so:

struct Foo
{
   /* priv_* indicates that you shouldn't tamper with these fields! */
   int priv_internal_field;
   int priv_other_one;

   /* reserved for possible future uses (emergency backup plan).
     currently just set to null. */
   void* priv_reserved;
};

That reserved space is a bit wasteful but can be a life saver if we find in the future that we need to add some more data to Foo without breaking the binaries that use our library.

In my opinion, information hiding and binary compatibility are typically the only decent reasons to allow only heap allocation of structures, besides variable-length structs (which would always require it, or at least be a little awkward to use otherwise if the client had to allocate memory on the stack in a VLA fashion to allocate the VLS). Even large structs are often cheaper to return by value if that means the software works much more with the hot memory on the stack. And even if they weren't cheaper to return by value on creation, one could simply do this:

int foo_create(struct Foo* foo);
...
/* In the client code: */
struct Foo foo;
if (foo_create(&foo))
{
    foo_something(&foo);
    foo_destroy(&foo);
}

... to initialize Foo from the stack without the possibility of a superfluous copy. Or the client even has the freedom to allocate Foo on the heap if they want to for some reason.
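For completeness, here is one way the three functions used above might be filled in. The bodies are purely illustrative assumptions: the answer only gives the prototypes, so what `foo_something` does (here, counting its calls) and the convention that `foo_create` returns nonzero on success are guesses made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct Foo
{
    /* priv_* indicates that you shouldn't tamper with these fields! */
    int priv_internal_field;
    int priv_other_one;
};

/* Initializes caller-provided storage; returns nonzero on success. */
int foo_create(struct Foo* foo)
{
    if (foo == NULL)
        return 0;
    foo->priv_internal_field = 0;
    foo->priv_other_one = 0;
    return 1;
}

/* Hypothetical operation: counts how often it has been called. */
void foo_something(struct Foo* foo)
{
    assert(foo != NULL);
    foo->priv_internal_field++;
}

/* Nothing to release in this sketch; a real type might close handles
   or free internal buffers here. */
void foo_destroy(struct Foo* foo)
{
    assert(foo != NULL);
    foo->priv_internal_field = 0;
    foo->priv_other_one = 0;
}
```

No heap allocation happens anywhere in this version; where the `struct Foo` lives is entirely the caller's decision.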