37

The company I work at is initializing all of their data structures through an initialize function like so:

//the structure
typedef struct{
  int a,b,c;  
} Foo;

//the initialize function
InitializeFoo(Foo* const foo){
   foo->a = x; //derived here based on other data
   foo->b = y; //derived here based on other data
   foo->c = z; //derived here based on other data
}

//initializing the structure  
Foo foo;
InitializeFoo(&foo);

I've gotten some push back trying to initialize my structs like this:

//the structure
typedef struct{
  int a,b,c;  
} Foo;

//the initialize function
Foo ConstructFoo(int a, int b, int c){
   Foo foo;
   foo.a = a; //part of parameter input (inputs derived outside of function)
   foo.b = b; //part of parameter input (inputs derived outside of function)
   foo.c = c; //part of parameter input (inputs derived outside of function)
   return foo;
}

//initialize (or construct) the structure
Foo foo = ConstructFoo(x,y,z);

Is there an advantage to one over the other?
Which one should I do, and how would I justify it as a better practice?

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Trevor Hickey
  • 993
  • 1
  • 9
  • 16
  • possible duplicate of [How would you know if you've written readable and easily maintainable code?](http://programmers.stackexchange.com/questions/141005/how-would-you-know-if-youve-written-readable-and-easily-maintainable-code) – gnat Jul 20 '15 at 15:44
  • 4
    @gnat This is an explicit question about struct initialization. That thread embodies some of the same rationale I would like to see applied for this particular design decision. – Trevor Hickey Jul 20 '15 at 15:48
  • I'm sorry but what's wrong with a simple constructor? – Shoe Jul 20 '15 at 16:06
  • @TrevorHickey: What kind of pushback did you get? – JacquesB Jul 20 '15 at 16:20
  • 2
    @Jefffrey We're in C, so we can't actually have methods. It's not always a direct set of values either. Sometimes initializing a struct is to get values(somehow), and performs some logic to initialize the struct. – Trevor Hickey Jul 20 '15 at 16:31
  • 1
    @JacquesB I got " Every component that you build will be different than others. There is an Initialize() function used elsewhere for the struct. Technically speaking, calling it a constructor is misleading." – Trevor Hickey Jul 20 '15 at 16:32
  • @TrevorHickey That's actually a very good argument: A constructor should construct the target object, not provide you with an object that you can use to initialize the target object by copying it over. – cmaster - reinstate monica Jul 20 '15 at 18:20
  • @cmaster Right. Ok. It's just, there's no actual constructors in C. In C++, it's impossible to get an instance of an object without "constructing" it. Constructors are called implicitly when getting an instance, and that's what I was trying to mimic here in C. But you might say that "C is not C++", in which case fine. – Trevor Hickey Jul 20 '15 at 18:57
  • 1
    @TrevorHickey `InitializeFoo()` is a constructor. The only difference from a C++ constructor is, that the `this` pointer is passed explicitly rather than implicitly. The compiled code of `InitializeFoo()` and a corresponding C++ `Foo::Foo()` is exactly the same. – cmaster - reinstate monica Jul 20 '15 at 19:23
  • @cmaster Ok, that's more clear. I can agree with that. – Trevor Hickey Jul 20 '15 at 19:40
  • The second approach can create `const Foo` objects, as in `Foo const foo = ConstructFoo(x,y,z);` This cannot be done with the first approach, unless you wrap it into another function (you can implement either approach using the other). – dyp Jul 20 '15 at 19:54
  • Meta.SE discussion about this question: **[Where do I ask opinionated technical questions regarding code?](http://meta.stackexchange.com/q/261076/255171)** –  Jul 20 '15 at 21:06
  • If they didn't typedef their structs, then it would be obvious that an initialization function is superfluous. – technosaurus Jul 21 '15 at 02:30
  • 2
    Better option: Stop using C over C++. Autowin. – Thomas Eding Jul 21 '15 at 11:48

5 Answers5

26

In the 2nd approach you will never have a half-initialised Foo. Putting all the construction in one place seems a more sensible, and obvious place.

But... the 1st way isn't so bad, and is often used in many areas (there's even a discussion of the best way to dependency-inject, either property-injection like your 1st way, or constructor injection like the 2nd). Neither is wrong.

So if neither is wrong and the rest of the company uses approach #1, then you should be fitting in with the existing codebase and not trying to mess it up by introducing a new pattern. This is really the most important factor at play here, play nice with your new friends and don't try to be that special snowflake who does things differently.

gbjbaanb
  • 48,354
  • 6
  • 102
  • 172
  • Ok, seems reasonable. I was under the impression that initializing an object without being able to see what kind of input was initializing it, would lead to confusion. I was trying to follow the concept of data in / data out to produce predictable and testable code. Doing it the other way had seemed to increase coupling as the source file of my struct needed extra dependencies to perform the initialize. You're right though, in that I don't want to rock the boat unless one way is highly preferred over another. – Trevor Hickey Jul 20 '15 at 15:57
  • 4
    @TrevorHickey: Actually I would say there are *two* key differences between the examples you give - (1) In one the function is passed a pointer to the structure to initialise, and in the other it returns an initialised structure; (2) In one the initialisation parameters are passed in to the function, and in the other they are implicit. You seem to be asking more about (2), but the answers here are concentrating on (1). You may wish to clarify that - I suspect most people would recommend a hybrid of the two using explicit parameters and a pointer: `void SetupFoo(Foo *out, int a, int b, int c)` – psmears Jul 20 '15 at 21:54
  • 2
    How would the first approach lead to a "half-initialized" `Foo` struct? The first approach also performs all initialization in one place. (Or are you considering an *un*initialized `Foo` struct to be "half-initialized"?) – jamesdlin Jul 21 '15 at 11:24
  • 1
    @jamesdlin in cases where the Foo is created, and the InitialiseFoo is accidentally missed. It was just a figure of speech to describe the 2--phase initialisation without typing out a lengthy description. I figured experienced developer type people would understand. – gbjbaanb Jul 21 '15 at 12:00
24

Both approaches bundle the initialization code into a single function call. So far, so good.

However, there are two issues with the second approach:

  1. The second one does not actually construct the resulting object, it initializes another object on the stack, which is then copied over to the final object. This is why I would see the second approach as slightly inferior. The push-back you have received is likely due to this extraneous copy.

    This is even worse when you derive a class Derived from Foo (structs are largely used for object orientation in C): With the second approach, the function ConstructDerived() would call ConstructFoo(), copy the resulting temporary Foo object over into the superclass slot of a Derived object; finish the initialization of the Derived object; only to have the resulting object copied over again on return. Add a third layer, and the entire thing becomes completely ridiculous.

  2. With the second approach, the ConstructClass() functions do not have access to the address of the object under construction. This makes it impossible to link up objects during construction, as it is needed when an object needs to register itself with another object for a callback.


Finally, not all structs are fully fledged classes. Some structs effectively just bundle a bunch of variables together, without any internal restrictions to the values of these variables. typedef struct Point { int x, y; } Point; would be a good example of this. For these use of an initializer function seems overkill. In these cases, the compound literal syntax may be convenient (it's C99):

Point = { .x = 7, .y = 9 };

or

Point foo(...) {
    //other stuff

    return (Point){ .x = n, .y = n*n };
}
  • 5
    I didn't think copies would be an issue because of https://en.wikipedia.org/wiki/Copy_elision – Trevor Hickey Jul 20 '15 at 18:48
  • 5
    That the compiler may elide the copy does not alleviate the fact that you have written down the copy. In C, writing superfluous operations and relying on the compiler to fix them is considered bad stile. This is different from C++ where people are proud when they can prove that the compiler can theoretically remove all the cruft left by their nested templates. In C, people try to write exactly the code that the machine should execute. Anyway, the point about the inaccessible addresses remains, copy elision cannot help you there. – cmaster - reinstate monica Jul 20 '15 at 19:19
  • 4
    Anyone using a compiler should expect the code they write to be transformed by the compiler. Unless they are running a hardware C interpreter, the code they write won't be the code they execute, even if it is easy to believe otherwise. If they do understand their compiler, they'll understand elision, and it is no different than `int x = 3;` not storing the string `x` in the binary. The address and inheritance points are good; the assumed failure of elision is foolish. – Yakk Jul 21 '15 at 14:04
  • @Yakk: Historically, C was invented to serve as a form of high-level assembly language for systems programming. In the years since, its identity has become increasingly murky. Some people want it be an optimized application language, but since no better form of high-level assembly language has emerged, C is still needed to serve that latter role. I see nothing wrong with the idea that a well-written program code should behave at least decently even when compiled with minimal optimizations, though to make that really work would require that C add some things that it has long lacked. – supercat Jul 23 '15 at 13:40
  • @Yakk: Having directives, for example, which would tell a compiler "The following variables may safely be kept in registers during the following stretch of code" as well as a means of block-copying a type other than `unsigned char` would allow optimizations for which the Strict Aliasing Rule would be insufficient, while simultaneously making programmer expectations clearer. – supercat Jul 23 '15 at 13:43
  • I'd upvote 20 times for point 2 if I could. – underscore_d Feb 07 '16 at 22:53
2

One argument in favor of the "output-parameter" style is that it allows the function to return an error code.

struct MyStruct {
    int x;
    char *y;
    // ...
};

int MyStruct_init(struct MyStruct *out) {
    // ...
    char *c = malloc(n);
    if (!c) {
        return -1;
    }
    out->y = c;
    return 0;  // Success!
}

Considering some set of related structs, if initialization can fail for any of them, it can be worth to have all of them use the out-parameter style for consistency’s sake.

Viktor Dahl
  • 168
  • 1
  • 7
1

Depending upon the contents of the structure and the particular compiler being used, either approach could be faster. A typical pattern is that structures meeting certain criteria can get returned in registers; for functions returning other structure types the caller is required to allocate space for the temporary structure somewhere (typically on the stack) and pass its address as a "hidden" parameter; in cases where a function's return is stored directly to a local variable whose address is not held by any outside code, some compilers may be able to pass the address of that variable directly.

If a structure type satisfies a particular implementation's requirements to be returned in registers (e.g. being either no larger than one machine word, or filling precisely two machine words) having a function return the structure may be faster than passing the address of a structure, especially since exposing the address of a variable to outside code that might keep a copy of it could preclude some useful optimizations. If a type does not satisfy such requirements, the generated code for a function that returns a struct will be similar to that for a function that accepts a destination pointer; the calling code would likely be faster for the form that takes a pointer, but that form lay lose some optimization opportunities.

It's too bad C doesn't provide a means of saying that an function is forbidden from keeping a copy of a passed-in pointer (semantics similar to a C++ reference) since passing such a restricted pointer would be gain the direct performance advantages of passing a pointer to a pre-existing object, but at the same time avoid the semantic costs of requiring a compiler to consider a variable's address "exposed".

supercat
  • 8,335
  • 22
  • 28
  • 3
    To your last point: There is nothing in C++ to stop a function from keeping a copy of a pointer that's passed in as a reference, the function can simply take the address of the object. It is also free to use the reference to construct another object that contains that reference (no naked pointer created). Nevertheless, either the pointer copy or the reference in the object may outlive the object that they point to, creating a dangling pointer/reference. So the point about reference-safety is pretty mute. – cmaster - reinstate monica Jul 21 '15 at 05:47
  • @cmaster: On platforms which return structures by passing a pointer to temporary storage, C compilers do not provide called functions with any way of accessing the address of that storage. In C++, it's possible to get the address of a variable passed by reference, but unless the caller guarantees the lifetime of the item passed (in which case it usually would have passed a pointer) Undefined Behavior will likely result. – supercat Jul 21 '15 at 13:47
1

I'm presuming that your focus is on initialization via output parameter vs. initialization via return, not the discrepancy in how construction arguments are supplied.

Note that the first approach could allow Foo to be opaque (although not with the way you currently use it), and that's usually desirable for long-term maintainability. You could consider, for example, a function that allocates an opaque Foo struct without initializing it. Or perhaps you need to re-initialize a Foo struct that was previously initialized with different values.

jamesdlin
  • 418
  • 4
  • 11