
I'm currently building a server responsible for storing and managing a few million records of fairly complex and interconnected data. For reasons beyond my control, the work has to be done in C++.

I have concerns about error handling, and especially pointer safety. Dereferencing a null pointer can crash the entire server, whereas in some - perhaps more suitable - languages you would get a null pointer exception which you can then handle. So, basically, a bug in the handling of some rare corner case of a rare operation can bring the whole service down instead of just making one particular request fail. Bugs are impossible to avoid, so how can I ensure availability and robustness in such an environment?

Muton
  • Crashed processes can be restarted by a watchdog. A more serious problem may be use-after-free and out-of-bounds bugs, which can corrupt data (or allow exploits) without crashing. –  Oct 21 '14 at 06:40
  • Are databases against the rules also? Reinventing a DBM is the wildly wrong way to approach this. It is obscenely difficult to write C++ and make no pointer errors; the language promises more robustness than it can deliver. – msw Oct 21 '14 at 07:43
  • No, I'm using Postgres and am not so worried about data integrity. I'm not sure what I expected to get as an answer but it seems that I just have to accept the fact that occasionally my C++ server may crash and I have to manage that with watchdog. That's just hard to accept. I wrote Java for almost 10 years and, despite all the bugs I've introduced over the years, I can't remember a single incident where a service of mine crashed because of a bug and had to be restarted (out-of-memory errors aside :-). – Muton Oct 21 '14 at 07:59
  • I can't really imagine a situation where I'd get a null pointer exception, handle it, and have the software keep working sensibly. When it gets an NPE, there is a bug and it has to be fixed anyway. – Jan Hudec Oct 21 '14 at 08:14
  • Oh, and of course you **can** handle null pointer exceptions in C++, though it is operating system dependent. – Jan Hudec Oct 21 '14 at 08:14
  • "C++ programmers think memory management is too important to leave to the compiler, Java programmers think that memory management is too important to leave to the programmer" - _someone_. Since at least half of C++/C defects result from pointer errors, I think we may have seen the last languages that give you such an ability to shoot yourself in the foot. – msw Oct 21 '14 at 08:15
  • @msw: I haven't had a pointer error in my code in years, and in the codebase as a whole we had a couple; they came up in the first test and were quickly fixed. With proper use of constructors, destructors and smart pointers, they are not likely any more. And we don't even have C++11, only boost. What we do have problems with is thread synchronization, but that's a problem in Java and C# too. – Jan Hudec Oct 21 '14 at 08:17
  • @Jan Hudec: I agree that you cannot sensibly recover from an NPE, but on other platforms you can better isolate the error. I would rather have just one request fail than have the entire server crash. – Muton Oct 21 '14 at 08:39
  • @Muton: As I said, you _can_ catch null (invalid) pointer access in C++: with structured exception handling on Windows and signals on Unix. Use-after-free and buffer overruns are worse, as they usually just return garbage; most can be eliminated by using proper containers instead of raw pointers. – Jan Hudec Oct 21 '14 at 08:42
  • @JanHudec: That's something new to me and I have to read up on this and see if that would be a useful thing to do. Thanks! – Muton Oct 21 '14 at 08:46
  • BTW, I am not entirely sure that bugs are impossible to avoid. With enough resources and efforts, on a reasonably sized program (e.g. 50KLOC), you could avoid them (e.g. using formal methods). However, that is costly! – Basile Starynkevitch Oct 21 '14 at 17:47

2 Answers


Use C++11 (e.g. with GCC 4.9.1), not some earlier standard. Read more about memory leaks, smart pointers, buffer overflow, memory corruption, RAII.

If you follow a few coding guidelines, notably the rule of three (which in C++11 becomes the rule of five), you will be able to write robust code. Use smart pointer templates such as std::shared_ptr and std::unique_ptr, and use the standard library containers. You should almost never use raw pointers like SomeClass* or ::operator new directly (if you want to, you are probably wrong). If you do use raw pointers, always initialize them, often to nullptr. I believe that every variable should be explicitly initialized (the compiler will optimize away useless initializations), because that makes program behavior more reproducible.
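As a minimal sketch of that advice (the `Record` and `RecordStore` names are hypothetical, invented here for illustration), a container of `std::unique_ptr` gives ownership with no hand-written destructor and no explicit `delete`. Note that `std::make_unique` is C++14; under strict C++11 one would write `std::unique_ptr<Record>(new Record(...))` instead.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Record {
    int id = 0;            // every member is explicitly initialized
    std::string payload;
    Record(int i, std::string p) : id(i), payload(std::move(p)) {}
};

// Owns its records through std::unique_ptr, so the compiler-generated
// destructor releases everything and no manual delete is needed.
class RecordStore {
public:
    // Adds a record and returns a non-owning reference to it.
    Record& add(int id, std::string payload) {
        records_.push_back(std::make_unique<Record>(id, std::move(payload)));
        return *records_.back();
    }

    // Returns nullptr when the id is unknown; callers must check the result.
    const Record* find(int id) const {
        for (const auto& r : records_) {
            if (r->id == id) return r.get();
        }
        return nullptr;
    }

    std::size_t size() const { return records_.size(); }

private:
    std::vector<std::unique_ptr<Record>> records_;
};
```

The raw pointer returned by `find` is only an observer: it never owns the record, which is the one use of raw pointers that stays reasonable in this style.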

C++11 is a difficult language. Take several weeks to read several books about it, notably those by B. Stroustrup: Programming: Principles and Practice Using C++, A Tour of C++, and The C++ Programming Language.

Be aware that C++11 is a different language than C (and even than its predecessor C++98). You need to take time to learn it. Look also inside the source code of several recent free software projects coded in C++.

Compile with all warnings and debug info (g++ -Wall -Wextra -g). Improve your code until you get no warnings. Write an exhaustive test suite. Use valgrind (on systems where it is available). Perhaps consider using frameworks like Poco, Boost, Qt. On the production machine, use a watchdog.

BTW, you might learn more about Garbage Collection (see also the GC handbook), and notice that C++ favors reference counting, which can be viewed as a simple and limited form of GC (which is unfriendly to circular references).
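To make the circular-reference caveat concrete, here is a small sketch (all type and function names are hypothetical). Two objects that hold `std::shared_ptr` to each other keep each other alive; making the back link a `std::weak_ptr` lets the pair be freed.

```cpp
#include <memory>

// Strong cycle: each Node owns the other.
struct Node {
    std::shared_ptr<Node> other;
};

// Returns true if the nodes were still alive after all outside handles
// were dropped, i.e. the reference counts never reached zero.
bool strong_cycle_stays_alive() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;
    b->other = a;
    std::weak_ptr<Node> observer = a;
    a.reset();
    b.reset();
    const bool still_alive = !observer.expired();
    // Break the cycle by hand so this demo itself does not leak.
    if (auto alive = observer.lock()) alive->other.reset();
    return still_alive;
}

// Same shape, but the back link is weak and therefore non-owning.
struct Child;
struct Parent {
    std::shared_ptr<Child> child;
};
struct Child {
    std::weak_ptr<Parent> parent;
};

// Returns true if the Parent was destroyed once its last owner let go.
bool weak_back_link_is_freed() {
    auto parent = std::make_shared<Parent>();
    parent->child = std::make_shared<Child>();
    parent->child->parent = parent;
    std::weak_ptr<Parent> observer = parent;
    parent.reset();
    return observer.expired();
}
```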

You could also use operating system specific things (for Linux, read Advanced Linux Programming and signal(7)...) to catch some runtime errors. (I'm not sure it is a good idea to catch SIGSEGV, you'll need some very system & processor specific code; but it is doable).

Also, you could perhaps generate some of your C++ code (see also this). My MELT system is doing so (and you might also customize GCC with MELT to add some additional coding rules checking).

You seem very concerned about null pointer dereferences. If that scares you so much (it should not, since it is actually an easy bug to find; memory leaks are harder to find), you might craft your own smart pointer (on top of existing facilities) which would raise an exception in that case.
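One possible shape for such a pointer is sketched below. `checked_ptr`, `Account` and `handle_request` are hypothetical names, not standard library or Boost facilities, and throwing `std::logic_error` is just one reasonable choice of exception.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper around std::shared_ptr that throws instead of
// invoking undefined behavior when a null pointer is dereferenced.
template <typename T>
class checked_ptr {
public:
    checked_ptr() = default;
    explicit checked_ptr(std::shared_ptr<T> p) : ptr_(std::move(p)) {}

    T& operator*() const { return *checked(); }
    T* operator->() const { return checked(); }
    explicit operator bool() const noexcept { return static_cast<bool>(ptr_); }

private:
    T* checked() const {
        if (!ptr_) throw std::logic_error("null pointer dereference");
        return ptr_.get();
    }
    std::shared_ptr<T> ptr_;
};

// Hypothetical data type for the usage example.
struct Account {
    int balance = 0;
};

// Hypothetical request handler: a null dereference fails this one request
// (returns false) instead of taking the whole process down.
bool handle_request(const checked_ptr<Account>& account) {
    try {
        account->balance += 10;
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

This only covers null dereferences through the wrapper. It does nothing for dangling raw pointers or out-of-bounds accesses, which remain undefined behavior.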

BTW, the robustness of your code depends much more on its size and on the care and effort put into developing and testing it than on the amount of data or records it has to process: a well-crafted 30KLOC program able to handle 1000 records would probably be able to handle many dozens of millions of records, if given enough resources (notably computing power, disk, and RAM). A few million records do not feel scary (consider using Sqlite, Postgresql, Mongodb, etc.).

Notice that some people are able to write very robust C or C++ code; in particular, many free software database management systems (Sqlite, Postgresql, Mongodb) are coded in C or C++ and are able to run for years, managing terabytes (or perhaps petabytes) of data. If possible, publish your software as free software (and develop it in the open): you'll get useful feedback from outside.

Basile Starynkevitch
  • Thanks for the thorough answer and the effort of coming back to edit it and making it even more useful. Many of the techniques you mentioned I already use, but there are also some good pointers there for me. I also use postgres but in a very unusual way. I already have a vertical slice implemented and am quite confident that performance will not be an issue. Robustness and code quality are my main concerns at the moment. The biggest take-away for me from these answers is the overall (and obvious) realization that I also need to adjust the "mental models" I've built during my years with Java. – Muton Oct 21 '14 at 18:56
  • You should not *adjust* but *change* your mental models from Java to C++. Consider C++ as a new language and look at it with a *fresh* mindset. Your Java mental models would only *disturb* you in C++! – Basile Starynkevitch Oct 22 '14 at 12:37

A couple of auxiliary considerations (Basile got the important stuff):

  1. pointer errors are always the result of coding errors.

    The symptom may be some unpredictable run-time crash, but the cause is contained, statically, in your code. It's entirely possible to write code that just doesn't contain these errors. The guidelines and books Basile mentions will make this easier, by teaching the right idioms and conventions.

    • since pointer errors are always the result of coding errors, languages which make them easy to survive just make it easy to write buggy code that doesn't actually crash. That may be fine if not crashing is the limit of your ambition, but doesn't make your code correct.

      IMO a better approach for programs that need to be robust is a language that makes it hard to write incorrect code. Of course C++ isn't that either, but null pointer exceptions aren't a step in the right direction.

  2. if you need robustness, you should build it in at all levels.

    1. can you run multiple instances, so one failing won't take the others down?
    2. can you split tasks into stages which can be spread across processes for the same reason?
    3. can you journal/commit/rollback your changes cleanly? RAII-based commit/rollback works in the presence of exceptions (remember NPEs aren't the only unexpected event you should consider), and journalling can preserve changes across crashes/restarts.
    4. if you need to handle stray cosmic rays or other bit-flipping events (as per Vatine's comment), you need ECC memory. NPEs still don't help you here: the damage is just as likely to be in your code, or the kernel code, or some filesystem metadata, and even if by some great fortune the damage occurs to a pointer, it's much more likely to become a different (and possibly valid) non-null pointer.
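The RAII-based commit/rollback idea from the list above can be sketched as follows. `Transaction` and `append_all` are hypothetical names, and the "storage" is just an in-memory vector standing in for whatever the server really modifies.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical guard: runs the rollback action on destruction unless
// commit() was called first.
class Transaction {
public:
    explicit Transaction(std::function<void()> rollback)
        : rollback_(std::move(rollback)) {}
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

    ~Transaction() {
        if (!committed_ && rollback_) {
            try {
                rollback_();
            } catch (...) {
                // Destructors must not throw; a real system would log this.
            }
        }
    }

    void commit() noexcept { committed_ = true; }

private:
    std::function<void()> rollback_;
    bool committed_ = false;
};

// Appends all values as one unit. If any value is rejected, the exception
// unwinds through the guard and `records` returns to its original size.
void append_all(std::vector<int>& records, const std::vector<int>& values) {
    const std::size_t original_size = records.size();
    Transaction tx([&records, original_size] { records.resize(original_size); });
    for (int v : values) {
        if (v < 0) throw std::invalid_argument("negative value");
        records.push_back(v);
    }
    tx.commit();
}
```

Because the rollback lives in a destructor, it runs for any exception that escapes the function, not only the one that was anticipated. Surviving a crash or restart still needs journalling on top of this.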

For reference, I don't have anything against NPEs or the languages that provide them. They're good for adding resilience to non-critical code, and for allowing a useful degree of sloppiness in applications where development time is more important than correctness.

> ... in some - perhaps more suitable - languages you would get a null pointer exception which you can then handle.

This just indicates you'd like to write sloppy code and survive. While that's perfectly reasonable in some situations, that sloppy code is just as likely to silently and successfully produce garbage output as beautifully-handled NPEs.

Useless
  • Well, at large enough scale, you *will* have bits flipping unintentionally in RAM. – Vatine Oct 21 '14 at 10:35
  • At that scale, you're at least as likely to have damaged code as damaged pointers. Do you really want it to keep merrily trashing your _few million records_, or should it fail hard and early and let you recover before destroying all your business data? – Useless Oct 21 '14 at 10:46
  • I agree that code is vastly more likely, but even in the face of perfect code... – Vatine Oct 21 '14 at 13:18