Handling and managing error codes

Question

I'm looking for examples of creating, handling and managing error numbers/codes.

To show what I'm talking about, let's take an example scenario (EU - end user, IT - IT helpdesk):

EU: - (calls IT) something's wrong
IT: - what happened?
EU: - it says "cannot create xyz"
IT: - what have you done?
EU: - ummm....
(conversation continues for another gazillion years)

... and the exception that produces the message "cannot create xyz" is thrown in a dozen places across the whole system, so the IT guy doesn't know where to look.

Another approach (which I really like and would like to use) is for example:

example error message

Now, when EU calls IT, the whole conversation looks like this:

EU: - I have error 1004
IT: - ok, thanks, we'll take a look

Having the error code, the IT guy instantly knows where to look for the error. He doesn't need to ask EU for loads of misleading details; he can just sit down and work or, if he already knows the cause, explain that this is an intentional error raised because EU did something wrong.

Sounds like flowers and donuts, doesn't it?

But what are the best practices to achieve this behaviour?

Let's assume that this is Java and we're building the app from scratch. What I know so far is that I cannot fetch error codes from a database, because when an exception is being thrown I cannot rely on external sources.

This leads to problems with teamwork. Let's say that John, Paul and Adam agreed to use error codes while writing their app, and that they will assign error numbers incrementally. I see a possible collision: John wants to mark an exception message in code with number #256, but Paul doesn't know about that and creates his own exception, also with #256 attached to it, and uses this exception in a few dozen places in the app. At the end of the day no one knows what error #256 really means.

So maybe there's another approach: use a small but handy one-table web app that generates error numbers, so John, Paul and Adam can just click "generate" when they want a new unique error number?

The above also leads to another question: what are the best practices for setting error codes? In the exceptions themselves, like this:

public class FooException extends Exception {
  public FooException() { super(); }
  public FooException(String message) { super("Error #256 - xyz is forbidden"); }
  public FooException(String message, Throwable cause) { super("Error #256 - xyz is forbidden", cause); }
  public FooException(Throwable cause) { super(cause); }
}

or:

try {
    // (some stuff that may throw FooException)
} catch (FooException ex) {
    System.out.println("Error #256 - you cannot do xyz in here");
}

I suspect the second option is better, because it identifies one unique, specific place in the code of the whole project, but it looks harder to manage.

Any thoughts? Any practices? Advice? Thanks.

Note that a lot of people here think error codes are anachronistic and obsolete, e.g. see http://programmers.stackexchange.com/questions/98034. I disagree (numbers are much, much better than flowery prose for actually getting accurate info from users over the phone), and I hope there will be favorable discussion of that option here. — Kilian Foth, Sep 08 '14 at 10:05
recommended reading: **[Why is asking a question on “best practice” a bad thing?](http://meta.stackexchange.com/a/142354/165773)** — gnat, Sep 08 '14 at 10:20
@KilianFoth I PARTIALLY agree with the author of the accepted answer in the thread you mentioned. Returning only an error code is worse than returning an error code + message (where applicable) to the user, so that one can sometimes resolve the issue on one's own. Updated my question. — ex3v, Sep 08 '14 at 10:20
Don't John, Paul, and Adam ultimately submit to a master repository? If they were using enum values to represent error codes, wouldn't that then cause conflicts that must be resolved when committing and having a collision with an existing commit? Not sure I understand the necessity to complicate things with a web application. — Neil, Sep 08 '14 at 10:30
In an ideal world they should, but let's assume that John, Paul and Adam work on the same feature. They work on the code and on `confluence` at the same time, posting error codes to the manual. At merge time it turns out that there are merge errors, and now someone has to stay after hours to change the `confluence` docs. — ex3v, Sep 08 '14 at 11:43
Merge errors, at least, aren't the same thing as error code mass confusion. If you want to avoid error code conflicts, then have different sections of the program start with different prefixes 1001, 1002, 1003 vs 2001, 2002, 2003. If there are programmers working on the *same* section working on the *same* feature, have them talk to each other more often. — Neil, Sep 08 '14 at 12:06
If two programmers on the same project are going to do things independent of each other and not communicate, very few things will work as expected. — JeffO, Sep 08 '14 at 13:41
I agree, but this is not an answer to my question. I asked about where to declare codes, how to use them, what to do and what not to do. — ex3v, Sep 08 '14 at 14:51
@ex3v yes - look at most OSes. Error code 10057 means 'socket not connected'. Nothing else I've ever seen uses that error code. Once upon a time we had to use codes for making new modules, as we were an international company each country used the phone dialling code as a prefix so the UK used 44000000 onwards, etc. Made life much easier to manage. — gbjbaanb, Sep 30 '15 at 14:30

score 3 · Answer 1 · answered Sep 30 '15 at 14:19

Go ahead and use error numbers if you want to use a reference table, but I'd recommend a different approach.

In your message tell the user:

What error occurred. If it's an intentionally thrown error due to user input, tell them what they did wrong in the message.
The program name, class name and method name where the error occurred so a programmer or support person can find the code.
If possible, suggest actions the user can take to correct the problem.

Using error codes alone is too cryptic for a user-friendly program, and not as easy for a support person to decode as you might think. Inevitably, the error will occur when the support rep is away from his/her desk (on call at home), or they will have misplaced the error code list.

Which message would you rather troubleshoot?

"Program failure! Error 0x80020018"

or "MyProgramName error 0x80020018 occurred in module SendMail, the email address is not valid. Please check that the email address you entered is valid and try again."
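A minimal Java sketch of a message assembled from those three parts. The class `UserFacingError`, its fields and the exact message layout are assumptions made for illustration, not something this answer prescribes.

```java
// Hypothetical sketch: every name here is invented for illustration.
final class UserFacingError {
    private final String program;   // program name shown to the user
    private final String location;  // class and method where the error occurred
    private final String code;      // optional code, useful with a reference table
    private final String what;      // what went wrong, in the user's terms
    private final String remedy;    // what the user can try; may be empty

    UserFacingError(String program, String location, String code,
                    String what, String remedy) {
        this.program = program;
        this.location = location;
        this.code = code;
        this.what = what;
        this.remedy = remedy;
    }

    // Builds one line that serves both the user and the support person.
    String format() {
        StringBuilder sb = new StringBuilder();
        sb.append(program).append(" error ").append(code)
          .append(" occurred in ").append(location)
          .append(": ").append(what).append('.');
        if (!remedy.isEmpty()) {
            sb.append(' ').append(remedy);
        }
        return sb.toString();
    }
}
```

The point of keeping the parts as separate fields is that the same object can be written to a log in full while the dialog shows only the formatted line.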

score 2 · Answer 2 · answered Sep 30 '15 at 14:45

What I've noticed is that there is a huge disconnect between what is needed for diagnosing a problem and what you display to the user. Error codes and exceptions are simply starting points that suggest what might have gone wrong.

For example: an error saying a file upload failed could be because the network was down, disconnected, rejected, data corrupted, the directory on the server could not be created, the file already existed, the file existed and was readonly, the directory was secured against the user writing to it, the filename being uploaded was too long, or even that the fancy customer SAN array thing took 40 seconds to respond and the network upload timed out! (damn you EMC and the customer that mis-configured your storage array!)

The problem here is that while you want to tell the user "failed to upload", you do not really want to create exceptions or error codes for all the possible permutations of what might have gone wrong, particularly in the last case, where the file appears to have been written correctly when you go to look. I've always found that a decent logging mechanism is worth its weight in gold: when an error occurs that cannot simply be retried by the user, the logs are going to be viewed to see what went on. The logs will often not tell you the exact error, but they should give you enough tracing information (including the error codes written in them) to diagnose the problems with the data or environment that led to the error condition.

As for exceptions v error codes, I prefer the latter, even if they are thrown as exceptions! If you do use error codes, you will need to provide a mechanism to convert them to text descriptions. A set of error codes per module (or class) makes sense: it avoids the need to grab a new globally unique code and keeps the error codes from one module sequential.
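One way to sketch a per-module code set in Java is an enum that owns both the numbers and their text descriptions, plus an exception type that carries the enum value. The module name `UploadError`, the 2001+ range and the descriptions are assumptions invented for this example.

```java
// Hypothetical sketch: module name, code range and texts are invented.
enum UploadError {
    NETWORK_UNAVAILABLE(2001, "The network connection was lost during the upload"),
    TARGET_READ_ONLY(2002, "The file already exists on the server and is read-only"),
    UPLOAD_TIMED_OUT(2003, "The server took too long to respond");

    private final int code;
    private final String description;

    UploadError(int code, String description) {
        this.code = code;
        this.description = description;
    }

    int code() { return code; }

    String description() { return description; }

    // Converts a numeric code (for example one read from a log) back to its constant.
    static UploadError fromCode(int code) {
        for (UploadError e : values()) {
            if (e.code == code) {
                return e;
            }
        }
        throw new IllegalArgumentException("Unknown upload error code: " + code);
    }
}

// The code still travels as an exception, so normal try/catch handling applies.
class UploadException extends Exception {
    private final UploadError error;

    UploadException(UploadError error, Throwable cause) {
        super("Error #" + error.code() + " - " + error.description(), cause);
        this.error = error;
    }

    UploadError error() { return error; }
}
```

Because all codes of the module live in one source file, two developers adding the same number collide in version control rather than in production.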

@RobertHarvey we've got logs, logs, logs, chips and logs. Chips are off. — gbjbaanb, Sep 30 '15 at 14:54

score 0 · Answer 3 · answered Sep 30 '15 at 15:19

From the point of view of the problem that you are facing, a message like "Cannot create xyz" is absolutely no different from "Run-time error '1004'". If the "Cannot create xyz" exception is being thrown in a dozen places all over your system, so can error "Run-time error '1004'" be issued in a dozen places all over your system. Replacing one with the other does not help you in any way.

And generally, exceptions are always in a far better position to help than error codes ever will.

You have a number of options.

If by any chance it is very easy for you to collect the log files from the field, then you can generate a new unique error number each time an error occurs, and store it in the log together with the exception, so that you know where to look in the log file. This way, you present your user with a message like "ERROR 2624501; Please contact technical support" but the error number is always different, even if the type of error is the same.
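A rough Java sketch of this per-occurrence numbering, using `java.util.logging`. The class `IncidentReporter` is a hypothetical name, and seeding the counter from the clock is an assumption that only keeps numbers distinct within one installation; it does not guarantee uniqueness across machines.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: the class name and numbering scheme are assumptions.
final class IncidentReporter {
    private static final Logger LOG = Logger.getLogger(IncidentReporter.class.getName());

    // Assumption: starting from the current time avoids reusing numbers after a restart.
    private static final AtomicLong NEXT = new AtomicLong(System.currentTimeMillis());

    private IncidentReporter() { }

    // Logs the exception under a fresh incident number and returns the text for the user.
    static String report(Throwable error) {
        long incident = NEXT.incrementAndGet();
        LOG.log(Level.SEVERE, "Incident " + incident, error);
        return "ERROR " + incident + "; Please contact technical support";
    }
}
```

Support then searches the collected log for the incident number the user read out and finds the full exception next to it.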

If collecting log files is not easy, then you need to have a means of uniquely identifying every single place which may throw an exception. The way this is traditionally done is by means of release number, source file name, and line number.

Every exception that you throw contains a stack trace, and the topmost entry of the stack trace should contain the file name and line number, so you can report those. (Of course, the exception may have been caused by another exception, so you may first have to traverse the chain of causes.)

The release version is necessary because you will keep working on the source code after making a release to the users. The code changes, so a line that throws an exception in the end user's release may not throw anything in the code you have in front of you, and you may need to look up the specific version of the source file in your Version Control System.
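The lookup described above can be sketched in Java roughly as follows. `ErrorLocation` and the `release` parameter are hypothetical; how the release string is obtained (build property, manifest entry, constant) is left open.

```java
// Hypothetical sketch: class name and output format are assumptions.
final class ErrorLocation {
    private ErrorLocation() { }

    // Follows getCause() until it reaches the innermost exception of the chain.
    static Throwable rootCause(Throwable t) {
        Throwable root = t;
        while (root.getCause() != null) {
            root = root.getCause();
        }
        return root;
    }

    // Produces "<release> <file>:<line>" for the place where the root cause was thrown.
    static String describe(String release, Throwable t) {
        StackTraceElement[] frames = rootCause(t).getStackTrace();
        if (frames.length == 0) {
            return release + " (no stack trace available)";
        }
        StackTraceElement top = frames[0];
        // getFileName() may be null and getLineNumber() negative
        // when the class was compiled without debug information.
        return release + " " + top.getFileName() + ":" + top.getLineNumber();
    }
}
```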

If you do not want to divulge highly technical information such as source file names to your end users, then you may need to obfuscate them by associating each source file name with yet another number, and showing your end users only the numbers.

You could even automate the validation of this procedure, so as to ensure that no two classes have the same number. You can have each class which may throw an exception register with a "central obfuscating authority" object, which makes sure that the unique number of each class is really unique, so if it is not, then it will break during a test run and not in the field.
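A small Java sketch of such a registry. `SourceNumberRegistry`, `OrderService` and the number 17 are hypothetical names and values chosen only to show the mechanism.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "central obfuscating authority".
final class SourceNumberRegistry {
    private static final Map<Integer, Class<?>> OWNERS = new ConcurrentHashMap<>();

    private SourceNumberRegistry() { }

    // Claims a number for a class and fails fast if another class already holds it.
    static int register(int number, Class<?> owner) {
        Class<?> previous = OWNERS.putIfAbsent(number, owner);
        if (previous != null && previous != owner) {
            throw new IllegalStateException("Source number " + number
                    + " is already used by " + previous.getName()
                    + ", cannot assign it to " + owner.getName());
        }
        return number;
    }
}

// Example of a class claiming its number in a static initializer.
class OrderService {
    static final int SOURCE_NUMBER = SourceNumberRegistry.register(17, OrderService.class);
}
```

Registration runs when a class is initialized, so a test run only catches a duplicate if it actually loads every registering class.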

Unfortunately, reporting just the place where the error occurred is not very useful, because for example you may have one central function which opens all your files, so an exception thrown by that function does not tell you much. Which subsystem tried to open that file? For what purpose? What was the filename? That's why complete exception stack traces are useful.

The method that I would recommend is a bit involved, but it is the best.

Take your entire exception stack trace in a string, including all exception messages and all causal exceptions, and apply ZIP compression to it to make it small. Then, apply base-64 encoding to it to get a string which can be copied and pasted into an email. Add a "more info" box to your error dialogs, which presents this weird long sequence of characters to your end users, asking them to email it to you. This way, you can reverse the encoding procedure, (base-64 decode, then unZIP,) and you can have all of the pertinent information in your hands.
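The encode and decode steps can be sketched in Java with the standard library. This sketch uses GZIP (`java.util.zip.GZIPOutputStream`) as the compression step, which is an assumption standing in for "ZIP compression"; the class name `StackTraceCodec` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP is used here as the compression step.
final class StackTraceCodec {
    private StackTraceCodec() { }

    // Full stack trace (messages and causes) -> GZIP -> base-64 text.
    static String encode(Throwable error) {
        StringWriter text = new StringWriter();
        PrintWriter writer = new PrintWriter(text);
        error.printStackTrace(writer); // also prints the "Caused by:" sections
        writer.flush();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses encode(): base-64 decode, then decompress back to the trace text.
    static String decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

If users paste the string into an email, mail clients may wrap it across lines; `Base64.getMimeDecoder()` tolerates line breaks, whereas the basic decoder used here does not.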
