Is it bad practice to read large file in constructor?

Question

So, I am trying to create an English language trie data structure implementation in C++. I have created a Trie and TrieNode class. The TrieNode class takes in its constructor a vector<string> that is a list of words to construct the Trie from.

My solution to getting this vector<string> was to use an EnglishWordsListGenerator class, which in its constructor, takes in a string of the file name of a file which presumably contains a list of valid English words. (The ever popular enable1.txt file).

I'm new to C++, so I don't know if it is considered good practice or not to read the file directly from the constructor when initializing the object. After all, what if the operation fails? On the other hand though, I know Bjarne heavily pushes for RAII. What is the right thing to do here?

related (possibly a duplicate): [C++ - Constructor or Initialize Method to Startup](http://softwareengineering.stackexchange.com/questions/175594/c-constructor-or-initialize-method-to-startup) — gnat, Mar 09 '17 at 17:26
I read that, and am not sure whether this would fall into duplicate territory or not. — Bassinator, Mar 09 '17 at 17:31
Miško Hevery, IMO a very respectable programmer, has his own opinion about doing [real work in constructors](http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/). Check his article, that might give you few guidelines. — Andy, Mar 09 '17 at 20:29
I mentioned this in another comment on one of the responses, but I feel this is not a duplicate; what I am really trying to get at, I guess, is that file handling is something that can fail, but it seems possibly sketchy to have a constructor that can fail. Is this gut feeling correct? I suspect not, because otherwise what would be the point of RAII? — Bassinator, Mar 09 '17 at 23:02
How do you run your unit tests? Do you only ever use a file to run your tests? — BobDalgleish, Mar 13 '17 at 21:36
This is a relatively small piece of toy software, so there are no unit tests. — Bassinator, Mar 13 '17 at 23:23

Doc Brown · Accepted Answer · 2017-03-13T14:14:08.493

16

When seeing a class with a constructor signature like

 EnglishWordsListGenerator(const std::string &wordFileName)

I think it is pretty obvious that this constructor will read the given file (and so need some time), and it should not be to hard to understand that the caller has to care for possible exceptions from this (because file IO can fail). So despite what other answers are saying, this design is ok.

Don't get me wrong, in the future you might get requirements where this design might not be sufficient any more, but as long as this simple interface and behaviour is all your program needs, I would stick to the KISS and YAGNI principles and avoid overcomplicating things by providing a separate initialization method "just in case".

edited Mar 13 '17 at 14:14

answered Mar 09 '17 at 21:32

Doc Brown

199,015
33
367
565

1

Not from a C++ background, but as a "Generator" I would expect it to read the data file lazily, possibly even only opening it on first iteration/call. Even more so if the thing being generated is a "List" of something. This smells that the items do not contain cross-references which could require reading the whole file. – This company is turning evil. Mar 09 '17 at 23:52
1

@Kroltan: that is exactly what I meant when writing "in the future this design might not be sufficient any more". But without knowing the exact requirements, I would stick to the KISS principle and use the most simple implementation and behaviour I can think of. Lazy loading is an optimization (and I recommend not to optimize "in case"), and it comes for a cost (for example, the user of the class needs to extend his IO exception handling strategy). So it would not be my first choice. – Doc Brown Mar 10 '17 at 06:48

Cort Ammon · Answer 2 · 2017-03-13T16:35:36.037

If you are new to C++, it may be desirable to shy away from doing big work in a constructor. Constructors are wedded tightly to the way the language behaves, and you really don't want the constructor getting called unexpectedly (explicit is your friend!). Constructors are also rather unique in that they cannot return a value. This can make error handling more complicated.

This sort of recommendation becomes less important as you learn more about C++ and all the different things which can cause constructors to get called. In the perfect world, the correct answer is that your library should do exactly what the user wanted it to at all times. Quite often in C++ that is best implemented by doing things inside constructors. However, you just want to make sure that you understand constructors enough not to trap your users in a corner. Debugging why the program operates poorly because a constructor got invoked when a user did not intend it to be invoked can be a pain.

Likewise exception handling in constructors can be more difficult because the scope where the exception occurs is entwined with the scope where a variable gets created, rather than where you want to use it. Exceptions can be infuriating when they occur before main() is run, denying you the ability to handle that exception for the user. This can happen if you use global variables and the errors can be cryptic.

One option that sits between the extremes is to use factory methods. If your users expect to use factory methods, it makes sense that they could do a little extra initialization at that point. A factory method won't be called by accident like a constructor might, so it's harder to mess it up.

Once you are confident you understand the consequences of doing work in a constructor, it can be a powerful tool. The ultimate example of doing work in a constructor might be the lock_guard class. Lock guards are constructed by passing them a lock such as a mutex. They call .lock() on it during their constructor (aka doing real work) and .unlock() on it during their destructor.

These classes get away with doing real work in the constructor because they are designed to be used that way. Anyone working with a lock_guard is well aware that the constructor does work because code that uses it looks something like:

{
    lock_guard guardMe(myLock); // locks myLock
    doSomeStuff();
    doMoreStuff();
    // as I leave this block, guardMe will unlock myLock.  It will do so
    // even if I leave via an exception!
}

If you look at the implementations of lock_guard classes, you will find easily 80% of their attention or more is focused on how to manage the constructors correctly to prevent surprises. Properly implemented, they're very powerful. Improperly implemented lock_guards can be the bane of your existence because they can appear to do one thing when they actually do another.

I like this answer it acknowledges that the real world isn't always the ideal world. But from curiosity, how could I possibly accidentally call a constructor? — Bassinator, Mar 09 '17 at 22:03
@Airhead one example is implicit conversions, like for function calls: http://ideone.com/jTfzsB. Since constructors are implicit by default, calling a function with a type that is implicitly convertible to the expected type through a single parameter constructor, will call the constructor. — clcto, Mar 09 '17 at 23:17
Okay, so if I mark the constructor as `explicit`, then I should avoid this problem, no? Any other situations? Not being snarky, genuinely curious. — Bassinator, Mar 09 '17 at 23:20
I heavily disagree that "beeing new C++" is a valid reason for not "doing big work in a constructor". I also disagree that "big work in a constructor" for the case described in the question has a risk of "trapping users in a corner." That sounds all very superstitious. — Doc Brown, Mar 10 '17 at 06:43
@Airhead: "implicit conversion" for your case would mean the constructor gets called *implicitly* and gets a file name out of thin air? I think that is quite unlikely ;-) — Doc Brown, Mar 10 '17 at 06:54

score 5 · Answer 3 · answered Mar 09 '17 at 21:39

5

It's not a good practice to load an object from a file in the constructor.

Beyond the good C++ arguments in the other answers, I'd add a clean code principle: a function should do only one thing. Ok, the constructor is not a normal function, but the principle still applies.

I suggest the following:

keep the Trie as general as possible to aim at reuse. There are plenty of other applications that could use it.
keep it independent of the way you feed it: ideally. Different application could use different approaches, for example to feed it from a string or from an istream. After all, future versions of your app could querry the words from a database with different languages.
use a builder design pattern, to assemble the pieces of the construction process. The builder would then load the trie from a source, using some kind of general approach (e.g. Create the trie, Open the source, iterate through source data and inserting it into the trie, once it is done return the trie). You could then make a concrete builder for loading from a file, and easily opt for other alternatives.

answered Mar 09 '17 at 21:39

Christophe

74,672
10
115
187

1

I agree: using an `istream` instead of filename is a much better idea. – Mar 09 '17 at 21:55
3

*A function should only do one thing* ...Right... In this case though, that "one thing" is to initialize the object into a usable state, is it not? Regardless though, I think that argument falls more into duplicate territory as mentioned in the first comment on the original question. My question more specifically is geared toward the idea that file handling is something that can fail; is this bad, and how can I handle it? – Bassinator Mar 09 '17 at 22:58
I'm not saying I disagree with this answer, I'm saying I need further explanation of your reasoning with regard to those principles. Why does this not violate RAII? – Bassinator Mar 09 '17 at 23:05
1

@Airhead thanks for the feedback. My point was that according to the "one thing" a constructor should construct and initialise. Loading from disk is something more than initialisation, different from constructing. If you choose to nevertheless go for this approach you can handle failure either by throwing an exception (i.e. Means that the object couldn't be created and will not exist) or by keeping some internal status (i.e. The object is created, but it's state is not as expected). – Christophe Mar 09 '17 at 23:20
2

I think the single-responsibility principle is the best argument--not just for the constructor but also for the class itself. The class should concern itself with just one job. The Trie is just a specialized type of container. Other container types, like `std::vector` and `std::set`, know nothing about files, filenames, opening and closing streams, formatted input, etc. I'd have a default constructor make a valid, empty trie, and an insert method. As an advanced user, I might add a constructor that takes a pair of iterators and use istream_iterators to init from a file. – Adrian McCarthy Mar 10 '17 at 00:06

score 1 · Answer 4 · answered Mar 09 '17 at 20:17

1

I don't think I would read a file in the constructor for a couple of reasons:

it will significantly slow down object creation and potentially the startup of your program.
as you mentioned, there are limited options for error handling. You would have to either put the entire program in a try...catch block or set some sort of flag if something goes wrong.
what if you need to read it in more than once? You would have to have a duplicate of the code, or split the code out to a function the constructor calls, in which case just do it after the object is created anyway.

I'm sure there are other reason, those are just the first ones that came to my mind. If its a small file and you are only reading it the once, you are probably ok.

answered Mar 09 '17 at 20:17

bluegreen

119
2

5

1) Then move the object's creation outside of the program's startup. 2) What's wrong with throwing exceptions? It's the standard error mechanism in C++. 3) If you need to read it more than once, you would probably need to update all existing objects. Which is easily done by... simply reconstructing them. Or having an "update" method that re-reads the file. Or any number of things. – Nicol Bolas Mar 09 '17 at 20:37
so...I take it you disagree with me. That's ok. Its just been my experience that its best to minimize what a constructor does as much as possible. – bluegreen Mar 09 '17 at 20:54
1

@bluegreen you are correct, constructors should avoid doing any unnecessary work. But there is a _very_ important reason why, that Cort's answer focuses on: in C++, it is _very_ easy to call a constructor accidentally without realizing it. Or to do so purposefully, but bury the call behind generated code. Ex. the compiler may construct temporary objects, or call copy/move constructors implicitly. This is a very important point when discussing constructors that do non-trivial work. – Mar 09 '17 at 21:32
I completely agree, Cort's answer is better and more complete than mine. I was not trying to be the best answer, I just knocked out a few issues I have run into with file reading in constructors in the past. Truthfully, the copy constructor issue never occurred to me, but its a very good point. – bluegreen Mar 09 '17 at 21:37
A copy constructor is unlikely to trigger I/O to read a file, but may involve copying a large amount of data which itself could be slow if the data structure is not cached (likely in this case) and due to the sheer amount of data. – Mar 09 '17 at 21:53
Yeah, I realized that after I posted the previous comment, but I just didn't bother to go back and change it. – bluegreen Mar 10 '17 at 12:55

score 0 · Answer 5 · answered Jan 06 '23 at 00:36

If the constructor does a lot of work, then there can be exceptions. In C++, exception in a constructor can be handled, but in practice you wouldn’t know how to do it correctly, and it is an absolute pain.

Much easier to write a factory method or even just a plain function that creates an object on the heap, reads a file while handling exceptions, and returns a pointer to the object or a null pointer.

Is it bad practice to read large file in constructor?

5 Answers5