Research & Consultancy

Panic button procedures

There is a so-called Catch-22 situation in commonly used organisational procedures.

Employees are often bound by procedures which state: "Actions which may have serious impact must first be approved by senior management."

But in an unexpected emergency, with very little information available and great uncertainty about how much time can be spent gathering more before it is too late to act, the person on the front line can only wish that their superior actually understood the situation and could make the decision for them.

Sample:

From the logbook of a network and security administrator at a multinational bank.
01:23h, automatic message on cell phone: "Warning: Subnet 04 unreachable"
01:30h, using remote access link to access network monitoring server, from home.
~ Network status page shows 2 Red network segment icons.
~ Also core network switches both at 99% load.
01:35h, accessing backbone switch nr1 management interface, to get status details.
~ switch has too little processor and memory capacity free to run any diagnostics.
01:38h, sending a Reset command to switch nr1, in the hope it will go back to normal.
01:40h, cannot access switch nr1 or nr2, both return "invalid password!"
01:42h, called department manager, got voicemail and left a message to call me back ASAP.
01:45h, cannot access monitoring server anymore, it is connected on the other side of those switches.
01:51h, Can access anti-virus server and internet link router.
~ anti-virus server logs say it has had no updates for 2 days.
~ internet link router status logs show 81 unsuccessful access attempts around 01:15h.
~ firewall logs show odd traffic patterns from internal finance system to several outside internet addresses.
!~ This looks a lot like a hacker has gained access, should disable internet link now.
!~ Need authorisation to disable internet link, will cause large amount of lost transactions from trading systems in subnet 2 & 3.
02:02h, Called department manager again, still voicemail.
02:05h, going to the office.
02:10h, .. still voicemail.
02:16h, Call from department manager. He assumes it must be a virus; I have to run an update cycle on the anti-virus server, see if it can detect and remove it, and then report back. This even after I repeatedly explained that all the signs indicate a hacker attack!
02:28h, Arrived at office.
02:37h, Anti-virus server updated, but detected nothing unusual.
02:40h, Finance Servers in subnet 4 all have missing/removed system-log entries between 01:00 and 02:10h.
~ Switches back to 18% & 20% load.
~ Internet router statistics show 1.8TB of outgoing data during the last 4 hours.  Between 10MB and 80MB of outgoing traffic is normal when night time automatic software updates happen.
~ Can access backbone switches again after hard reset and re-loading the configurations from backup.

02:45h, Called management again, informing about the current situation.
!~ Now I get authorisation to disable the internet link.
....
.......
06:30h, Backup and Recovery people are restoring backups.
07:00h, going home, too tired to think straight.
~ internet link still disconnected.
~ external incident investigation specialists will be asked to create a report with a copy for the police.
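
The router figures in the log lend themselves to a simple automated check. The Python sketch below compares an observed outbound volume with the normal night-time range quoted in the log (10 MB to 80 MB). The function names, the decimal unit conversion and the alert factor of 10 are assumptions made for illustration; they do not describe the bank's actual monitoring.

```python
# Hypothetical check: is outbound traffic far above the normal night-time range?
MB = 10**6   # decimal megabyte (assumption; the log does not state a unit convention)
TB = 10**12  # decimal terabyte

NORMAL_MAX_BYTES = 80 * MB  # upper end of the normal range quoted in the log


def outbound_ratio(observed_bytes: float,
                   normal_max_bytes: float = NORMAL_MAX_BYTES) -> float:
    """Return how many times the observed volume exceeds the normal maximum."""
    return observed_bytes / normal_max_bytes


def is_anomalous(observed_bytes: float, factor: float = 10.0) -> bool:
    """Flag traffic above `factor` times the normal maximum (assumed threshold)."""
    return outbound_ratio(observed_bytes) > factor


if __name__ == "__main__":
    observed = 1.8 * TB  # figure from the 02:40h log entry
    print(f"{outbound_ratio(observed):,.0f} times the normal maximum")
    print("anomalous:", is_anomalous(observed))
```

With the figures from the log, 1.8 TB is roughly 22,500 times the 80 MB upper bound, so any plausible threshold would have flagged it long before 02:40h.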


-------------

You would not want to be standing in his shoes that night, right?
Sadly, such unfortunate series of events happen around the world because of an unforeseen exception to strictly enforced procedures.
These are procedures which looked as if they covered every type of situation when they were created, in line with the aforementioned ineffective "official standards".

Comment 1.: After such an 'incident', the focus is normally on preventing the same 'intrusion' from happening again, that is, on patching the technical holes. Everyone assumes that the procedures are correct, so no one takes a closer look to see whether they need correcting. This is also because the front-line worker had those rules imposed by management, and the manager has little interest in changing procedures that seem too complicated.

Comment 2.: Now let's say the on-call specialist had taken some of our training in incident handling and had learned how and when to take control in a well-covered way. Management could then have entrusted him with the power to decide for himself. He could have disconnected the internet link remotely at, say, 01:40h, rushed to the office, and had time to investigate what was actually going on, preventing the wiping of the logs and further loss of data to the intruder. After adding a few extra rules to the firewall, allowing only known internet servers to talk to the trading systems and to nothing else, he could have re-enabled the internet link while keeping a close eye on the firewall logs. Then he could have had that chat with the manager about the next steps, and the damage could have been minimal.
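
The "few extra rules" mentioned in Comment 2 amount to a default-deny allowlist. The following Python sketch shows the logic only, not the syntax of any particular firewall. All addresses are hypothetical: the external servers use a documentation range, and the subnet ranges for the trading systems are assumed, since the source gives no real addressing.

```python
import ipaddress

# Hypothetical known external servers (documentation range, illustration only).
KNOWN_EXTERNAL_SERVERS = {
    ipaddress.ip_address("203.0.113.10"),
    ipaddress.ip_address("203.0.113.11"),
}

# Assumed addressing for the trading subnets 2 and 3.
TRADING_SUBNETS = [
    ipaddress.ip_network("10.0.2.0/24"),
    ipaddress.ip_network("10.0.3.0/24"),
]


def is_allowed(internal_ip: str, external_ip: str) -> bool:
    """Default deny: permit only trading systems talking to known external servers."""
    internal = ipaddress.ip_address(internal_ip)
    external = ipaddress.ip_address(external_ip)
    in_trading = any(internal in net for net in TRADING_SUBNETS)
    return in_trading and external in KNOWN_EXTERNAL_SERVERS


if __name__ == "__main__":
    print(is_allowed("10.0.2.15", "203.0.113.10"))  # trading system to known server
    print(is_allowed("10.0.4.20", "203.0.113.10"))  # finance system in subnet 4
    print(is_allowed("10.0.3.7", "198.51.100.99"))  # trading system to unknown server
```

Under these assumed ranges, traffic from the finance servers in subnet 4 to any outside address is refused, while the trading systems keep their links to the known servers.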

Comment 3.: The underlying Catch-22:

    The technician has to do everything possible to protect the systems.
    To do so, the technician in some situations needs approval from above before acting.
    The layer above is not in a position to make informed decisions about such matters.
    So the technician cannot do what needs to be done.

Therefore, a more appropriate structure is needed for such situations [see Advanced Self Correcting Structure].
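
As an illustration only, and not the structure referred to above, a pre-authorised "panic button" rule could be written down so that both the technician and management know in advance when acting without approval is covered. The Python sketch below is one possible form. The `Situation` fields and both thresholds are assumptions; a real procedure would agree them with management beforehand.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    intrusion_indicators: int         # independent signs of intrusion seen so far
    minutes_manager_unreachable: int  # time since the first unanswered call
    responder_trained: bool           # has completed incident-handling training


# Assumed thresholds, for illustration only.
MIN_INDICATORS = 2
MAX_WAIT_MINUTES = 15


def may_act_without_approval(s: Situation) -> bool:
    """Pre-authorised rule: a trained responder may act alone once enough
    indicators are present and the manager has been unreachable long enough."""
    return (
        s.responder_trained
        and s.intrusion_indicators >= MIN_INDICATORS
        and s.minutes_manager_unreachable >= MAX_WAIT_MINUTES
    )


if __name__ == "__main__":
    # Roughly the 02:02h log entry: locked-out switches, failed access attempts
    # and odd outbound traffic, 20 minutes after the first call at 01:42h.
    at_0202 = Situation(intrusion_indicators=3,
                        minutes_manager_unreachable=20,
                        responder_trained=True)
    print(may_act_without_approval(at_0202))
```

With these assumed thresholds, the situation at 02:02h in the log would already satisfy the rule, some 43 minutes before authorisation was actually given at 02:45h.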