My client has a process which iterates over a number of actions that may or may not apply to a user's portfolio. Quite frequently, processing of an action may give up and jump to the next action or portfolio without the process stopping (i.e. straight-through processing).
Sometimes the decision that led to the action processing being curtailed is written to the log and sometimes not. Even when it is, it is normally deliberately written at Info level rather than Error or Warning. There's a good reason for this: Errors and Warnings are also picked up by other processing rules, which generate tickets automatically. If we let this happen, we would be flooded with tickets, many of which would be unnecessary.
The problem is that occasionally a user will, not unreasonably, ask why an action was not carried out on their portfolio. They normally do not ask until a few days or weeks after the processing has happened, by which time the logs have often been purged.
I have suggested that we should introduce a more robust system for capturing these problems at the time they happen and have thought of the following:
- Simply improve the verbosity of the logging. Nice and easy but doesn't help with the log purging issue and doesn't make the problems any more obvious.
- Add a separate process to scan and parse the logs, picking up such issues. The benefit is that little or no code change is needed, but it may not be very obvious that this process exists or is running, and it could be undermined if the logging messages are changed, causing the pattern matching to fail. Could be a maintenance headache.
- Alter the code to capture the messages separately in a database table. More robust and obvious in the code, but the change is bigger and the design of the new database table might become a challenge (e.g. should it just be a free-text field with a date-time index, or something more structured?). The structure could become more complicated if parameterisation is needed. An XML blob, maybe?
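To make the third option a bit more concrete, here is a minimal C++ sketch of what a "more structured" record might look like: a few fixed, indexable fields (portfolio, action, reason code, timestamp) plus a small key/value bag for reason-specific parameters. Every name here (`SkipReason`, `SkipEvent`, `SkipEventStore`, `serialiseParams`) is hypothetical and not part of the existing system, the reason codes are made up, and the in-memory store is only a stand-in for the real database table.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reason codes; the real list would come from the processing rules.
enum class SkipReason { InsufficientFunds, MarketClosed, NotApplicable };

inline const char* toString(SkipReason r) {
    switch (r) {
        case SkipReason::InsufficientFunds: return "INSUFFICIENT_FUNDS";
        case SkipReason::MarketClosed:      return "MARKET_CLOSED";
        case SkipReason::NotApplicable:     return "NOT_APPLICABLE";
    }
    return "UNKNOWN";
}

// One record per curtailed action: fixed columns that can be indexed and
// searched, plus free-form parameters for reason-specific detail.
struct SkipEvent {
    std::string portfolioId;
    std::string actionName;
    SkipReason reason;
    std::map<std::string, std::string> params;
    std::chrono::system_clock::time_point occurredAt;
};

// Flattens the parameters to "key=value;key=value" so they fit in a single
// text column. Assumes keys and values contain neither '=' nor ';'; real
// data would need escaping, or a format such as JSON or XML instead.
inline std::string serialiseParams(const std::map<std::string, std::string>& params) {
    std::ostringstream out;
    bool first = true;
    for (const auto& kv : params) {
        if (!first) out << ';';
        out << kv.first << '=' << kv.second;
        first = false;
    }
    return out.str();
}

// In-memory stand-in for the database table. A real implementation would
// issue an INSERT in record() and an indexed SELECT in findByPortfolio().
class SkipEventStore {
public:
    void record(SkipEvent e) {
        assert(!e.portfolioId.empty() && "portfolio id is required");
        events_.push_back(std::move(e));
    }

    std::vector<SkipEvent> findByPortfolio(const std::string& portfolioId) const {
        std::vector<SkipEvent> matches;
        for (const auto& e : events_) {
            if (e.portfolioId == portfolioId) matches.push_back(e);
        }
        return matches;
    }

private:
    std::vector<SkipEvent> events_;
};
```

The idea would be that each place where the code currently gives up on an action calls `record(...)` as well as (or instead of) writing the Info-level log line. A fixed reason code keeps the table searchable and could be mapped to user-facing text later, while the parameter bag avoids adding a new column for every special case.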
Other aspects that drive the design are: How easily can we use this information (e.g. could we present it to the users themselves to prevent them calling us in the first place)? Do we need to create a UI to see / monitor it? Can we quickly search the data and access the details for the specific issue in question?
If anyone has any experience of creating (or better still, bolting on) such functionality, I'd be very interested to hear about it.
The tech stack we're using is C++ (backend) + simple SQL database and JavaScript / web front end.
However, I'm not so bothered about tech solutions as about which solution is (a) most effective for the least effort, (b) most maintainable and (c) adds most value in terms of reducing the time spent turning around enquiry tickets.