ISO 27001 and investigating production issues

Question

We occasionally get problems on production systems, most of which can't be replicated on dev/systest/uat for the following reasons:

We don't have enough data on dev/systest/uat, e.g. production has millions of rows but our other environments only have a few thousand.
We don't have the right data structures, depth of data, etc.

As a developer, I have been told that developers at an ISO 27001-accredited organisation can't touch production, even just for investigation (e.g. reading data, not writing).

I personally don't have a problem with this, but it makes investigating issues a nightmare: we have to chase infrastructure and support all day. Something we could test within hours takes days.

Is this how it is meant to be?

That is why good logging is essential. You might work in an environment where overly realistic test data is enough to inflict damage, for example XY's bank transactions. If you have good logging, then you can ask for an anonymized version of the logs and work from that. — ntohl, Feb 05 '18 at 10:09
@ntohl I agree, but it isn't logging at the moment. We are trying to implement that, but it doesn't help with the volume of data. — Mathematics, Feb 05 '18 at 10:30
@Mathematics Good logging also means customizable logging. Narrow down the area where logging must be extended with extra information. Add some flags that switch between performance and a lot of data. — ntohl, Feb 05 '18 at 11:11

score 3 · Answer 1 · answered Feb 05 '18 at 11:55

I've faced a LOT of these issues in the past and you're right, it's a nightmare. I suggest you start by putting yourself firmly on the side of best practice and saying "Getting a copy of live, however tempting, is NOT an option". It puts you in the right headspace from the very beginning; for my reasons why, see a previous answer of mine.

Getting a good test environment is crucial. These often evolve alongside your production environments and help test upgrade paths as well as regular bugs. Putting time in here and making sure you have a proper QA team and strategy will pay dividends down the line.

Having said that, this is real life and there are always issues which are only discovered in live. So, how on earth can you investigate an issue which is occurring on one system, for one client, and nowhere else?

The key is logging.

You have the code, and you have the logs. What you need to do is a process of elimination to work out what's going on at various stages.

But what happens if the logs and data you need don't exist?

Then you're a step forward: understanding what you need to solve the problem is the first step on the way to solving it. Identify what questions you have (did the code enter this IF statement or skip it?) and prove the answers.
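As a rough illustration of answering the "did it enter this IF or skip it" question from the logs alone, here is a minimal Python sketch. The function, the logger name and the discount rule are all made up for the example; the only point is that each branch leaves a trace with enough context (an ID, the values tested) to reconstruct the decision later.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def apply_discount(order_id: str, total: float, is_member: bool) -> float:
    """Hypothetical business rule, instrumented to record which branch ran."""
    if is_member and total >= 100:
        logger.debug("order=%s discount branch taken (total=%.2f)", order_id, total)
        return round(total * 0.9, 2)
    # Log the values that were tested, so the skipped branch can be explained.
    logger.debug(
        "order=%s discount branch skipped (is_member=%s, total=%.2f)",
        order_id, is_member, total,
    )
    return total
```

Note that the log lines carry an order ID and the tested values only, nothing that identifies the customer.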

This is a lot easier said than done, so here are some pointers:

Your progress on these issues is now inextricably linked to your release schedules, so rapid development and rapid deployment matter more than ever.
Get the people who write the code solving the problems, otherwise you'll have one team singing the praises of good logging and another ignoring them.
NEVER log anything sensitive/inappropriate to insecure logs.
Keep your communication open. A client is a lot more responsive if they know the plan, understand what is in the release and know when they're going to get it.
Developers being removed from live systems does not necessarily mean they can't ask questions. Consider having them pair with the Ops guys, asking questions but keeping their hands off the keyboard.
Seriously consider leaving logging in place. If it stung you once, keeping the diagnostic resources in there will make it much easier to solve again.
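On the pointer about never logging anything sensitive: the real fix is not to pass sensitive values to the logger at all, but a filter can act as a backstop. The sketch below uses Python's standard `logging.Filter`; the class name and the regex (any run of 13 to 19 digits, treated as a possible card number) are assumptions for illustration and would need tuning for real data.

```python
import logging
import re

# Assumed pattern: 13 to 19 consecutive digits, treated as a possible card number.
CARD_RE = re.compile(r"\b\d{13,19}\b")


class RedactingFilter(logging.Filter):
    """Mask card-like digit runs in the formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, then redact, so values passed as arguments are covered too.
        record.msg = CARD_RE.sub("[REDACTED]", record.getMessage())
        record.args = None  # the message is already fully formatted
        return True
```

Attach it to the handler (`handler.addFilter(RedactingFilter())`) rather than to a single logger, so that every record written through that handler is checked.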

The key to cracking this is having a good QA process and doing frequent incremental drops into production (which can help you investigate problems as you go). It's funny, that's the solution to a lot of software development headaches!

score 0 · Answer 2 · answered Feb 05 '18 at 15:56

For our online shop we have session-based debug logging.

Usually only important information is logged (mostly log levels warning, error and fatal).

There is a shop URL parameter that enables extended logging for a web session, so the log gets additional info, trace and debug messages for the activity of the one customer who owns that session.

Unfortunately this only works for reproducible errors, and you must ask your customer to do it again with the URL parameter.
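This is not our actual shop code, just a minimal Python sketch of the idea using the standard `logging` and `contextvars` modules. The parameter name `debuglog`, the `begin_request` hook and the filter class are all hypothetical; a real shop would also store the flag on the session and restrict who may set it.

```python
import contextvars
import logging

# Set per request; True only when the (hypothetical) debug URL parameter was sent.
verbose_session = contextvars.ContextVar("verbose_session", default=False)


class SessionLevelFilter(logging.Filter):
    """Always pass WARNING and above; pass lower levels only for flagged sessions."""

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno >= logging.WARNING or verbose_session.get()


def begin_request(query_params: dict) -> None:
    """Hypothetical hook, called at the start of each request."""
    verbose_session.set(query_params.get("debuglog") == "1")
```

For this to work the logger itself has to be set to DEBUG, with the filter attached to the handler, so that the per-session decision is made by the filter and not by the global level.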

answered Feb 05 '18 at 15:56 by k3b

I've done something similar to this where we used role-based log levels, so users with a certain permission could log at more verbose levels than the rest. The big drawback with this, of course, is that a lot of the time a customer can't reproduce on demand! – Liath Feb 05 '18 at 17:23
