User recoverable exceptions should not be "exceptions". Exceptions are for exceptional circumstances. Transposing a few letters in a form field is something that you should expect and plan for.
Part of the impetus behind a "Service-Oriented Architecture" is that services are reusable. Sure, it might be a client sending messages to it... or it might be another service, or an orchestration engine, or an event subscriber, or an automated task or batch job. These actors can't reliably recover from a fault, no matter how much detail you put into it. In many cases they may even be using one-way messaging (e.g. MSMQ), in which case you're not even allowed to send a fault back; there's simply no channel for it.
Once a service has made the decision to send back a fault message, assuming that the originator can actually receive it, then all the originator can sensibly do is roll back the transaction it's in - if it was smart enough to enlist in one.
Juval is exactly right. Marshaling fault messages into client exceptions is fine when you've exhausted all other options (e.g. an unhandled exception), but there is no point in the service trying to provide all kinds of detail. None. Users will not read or understand the error message, and if you think having a stack trace is a benefit from the user perspective then you don't understand the first thing about usability.
Microsoft actually tells you to put exception detail in faults. But don't. Please don't. It just encourages you to be lazy and fault when you really should be handling the errors. I've been down that road and it is one of never-ending pain and misery. It's especially pernicious in WCF because a faulted channel permanently invalidates the service proxy, and it's actually very difficult to design client apps to recover from this, particularly if you're following other "best practices" and doing dependency injection.
What you should - nay, must be doing is logging all errors on the service side, generally into persistent storage, and sending notifications as bug reports. More sophisticated, service-bus architectures will even have an error queue which holds all of the original messages that caused the errors - but at the very least, you want the errors themselves. You want them - not your users. Don't rely on them to give you the stack traces, because if you do, then you have already failed them.
"User recoverable exceptions" simply do not exist in an SOA. There is no such thing because you can't know in advance who the "user" is going to be. If an exception is recoverable then it should be part of the message - for example, in XML form:
    <customerUpdateResponse customerId="123" status="notUpdated">
      <validationErrors>
        <requiredFieldMissing field="fullName"/>
        <maxLengthExceeded field="phone" maxLength="30" actualLength="45"/>
      </validationErrors>
    </customerUpdateResponse>
This is just off the top of my head, but hopefully you get the idea; if an operation can fail for known, documented reasons then that "failure" becomes part of the specification. In this case, the message is sending back an event saying what happened, and the client application can interpret this data appropriately. The important thing is that it is part of the contract, not some unexpected "stop the presses" error.
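To make the "failure as part of the contract" idea concrete, here is a minimal, language-neutral sketch in Python rather than WCF. The names `ValidationError`, `CustomerUpdateResponse` and `update_customer` are made up for illustration and simply mirror the XML above; the validation rules are assumptions, not anything a real service prescribes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValidationError:
    kind: str                          # e.g. "requiredFieldMissing"
    field_name: str                    # the offending input field
    max_length: Optional[int] = None
    actual_length: Optional[int] = None


@dataclass
class CustomerUpdateResponse:
    customer_id: str
    status: str                        # "updated" or "notUpdated"
    validation_errors: List[ValidationError] = field(default_factory=list)


PHONE_MAX_LENGTH = 30  # assumed limit, matching the XML example


def update_customer(customer_id: str, full_name: str, phone: str) -> CustomerUpdateResponse:
    """Hypothetical operation: expected failures are returned, never raised."""
    errors: List[ValidationError] = []
    if not full_name.strip():
        errors.append(ValidationError("requiredFieldMissing", "fullName"))
    if len(phone) > PHONE_MAX_LENGTH:
        errors.append(
            ValidationError("maxLengthExceeded", "phone", PHONE_MAX_LENGTH, len(phone))
        )
    if errors:
        return CustomerUpdateResponse(customer_id, "notUpdated", errors)
    # A real service would persist the change here.
    return CustomerUpdateResponse(customer_id, "updated")
```

Any caller, whether a UI, another service or a batch job, inspects `status` and `validation_errors` like any other field in the response; nothing is thrown for input that was merely wrong.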
Now I know that WCF lets you use fault contracts and so on, but honestly, I don't see the point; it's just adding complexity where it's not really needed. SOAP faults are a pain in the butt to deal with from any angle.
As mentioned earlier, you also have to carefully plan for the case where you can't send any response. Fledgling "SOAs" with a smattering of web services tend to be predominantly RPC style, but that's actually a poor strategy for designing a robust high-performance architecture. The killer feature of an SOA, in my opinion at least, is publish-subscribe, which allows you to totally decouple the services themselves and only ever share messages. But this comes at a cost: you have to dispense with two-way communication. If a service wants to fault after consuming an event, well, great, but nobody's going to be listening. Which means that proper logging and exception notification are really, really important.
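A toy illustration of why nobody is listening in the one-way case, again in Python rather than on a real service bus. `EventBus` and its `failed_events` list are hypothetical stand-ins for a broker and its error queue; the point is only that a subscriber's failure never travels back to the publisher, so the subscriber side has to log it and keep the original message.

```python
import logging
from collections import defaultdict
from typing import Callable, Dict, List

logger = logging.getLogger("event_bus")


class EventBus:
    """In-process stand-in for a publish-subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        # Stand-in for an error queue: keeps the original message that failed.
        self.failed_events: List[dict] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        """One-way: returns nothing, and handler errors never reach the publisher."""
        for handler in self._subscribers[topic]:
            try:
                handler(event)
            except Exception:
                name = getattr(handler, "__name__", repr(handler))
                # Log with the stack trace on the service side...
                logger.exception("handler %s failed on topic %s", name, topic)
                # ...and park the original message so it can be inspected or replayed.
                self.failed_events.append(
                    {"topic": topic, "event": event, "handler": name}
                )
```

If the `except` branch did nothing, the failure would vanish without a trace, which is exactly the situation the logging and notification advice is meant to prevent.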
A good overall strategy for the second case is to define a generalized message type for unrecoverable errors (technically you could just use the FaultException) and install a component in the pipeline which forwards all faults to a fault queue, thus (a) ensuring that you don't lose any, and (b) collecting them all into a central location, which will make your life a whole lot easier when you have 30 different web services on 10 different servers. It's really very easy to set up a global exception handler in WCF - just attach to the Faulted event of the ServiceHost. You can also install your own IErrorHandler to do all of this before the fault ever happens - your choice.
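The shape of that pipeline component can be sketched without any WCF at all. The following Python is only an analogy, not the IErrorHandler API: `FaultRecord`, `fault_queue`, `UnrecoverableError`, `forward_faults` and the sample `charge` operation are all hypothetical names, and an in-memory `queue.Queue` stands in for a durable, central fault queue.

```python
import queue
import traceback
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from functools import wraps


@dataclass(frozen=True)
class FaultRecord:
    """Generalized message type for unrecoverable errors (assumed shape)."""
    fault_id: str
    service: str
    operation: str
    occurred_at: str
    error_type: str
    detail: str  # full stack trace, for developers and support staff only


# Stand-in for a central, persistent fault queue.
fault_queue: "queue.Queue[FaultRecord]" = queue.Queue()


class UnrecoverableError(Exception):
    """What the caller sees: a reference id and nothing else."""

    def __init__(self, fault_id: str) -> None:
        super().__init__(f"Internal error (reference {fault_id})")
        self.fault_id = fault_id


def forward_faults(service: str):
    """Wrap an operation so every unhandled error is queued before the caller hears of it."""
    def decorator(operation):
        @wraps(operation)
        def wrapper(*args, **kwargs):
            try:
                return operation(*args, **kwargs)
            except Exception as exc:
                record = FaultRecord(
                    fault_id=str(uuid.uuid4()),
                    service=service,
                    operation=operation.__name__,
                    occurred_at=datetime.now(timezone.utc).isoformat(),
                    error_type=type(exc).__name__,
                    detail=traceback.format_exc(),
                )
                fault_queue.put(record)
                # Hide the original exception from the caller entirely.
                raise UnrecoverableError(record.fault_id) from None
        return wrapper
    return decorator


@forward_faults("billing")
def charge(amount: float) -> float:
    """Made-up operation that fails on zero, just to exercise the wrapper."""
    return 100 / amount
```

The caller gets an opaque reference it can quote to support, while the stack trace lands in one place regardless of which service produced it.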
But in summary: Instrument your systems so that you can resolve serious issues proactively and don't fault for recoverable errors. To the end user, downtime is downtime; make the exception details discoverable for developers and support staff but don't leak them to users.