BT simulated a disaster to check its ability to cope - while loss adjusters Cunningham Lindsey faced the acid test of a serious fire
BT
A clear June morning in Edinburgh was the setting for BT's most recent disaster recovery exercise. Its aim was to test and provide feedback on BT's ability to cope with a crisis that caused a collapse of telecommunications, resulting in disruption to residential, business and emergency services.
The scenario revolved around a simulated disaster, where fire had resulted in the partial loss of Edinburgh Castle. The consequence was the loss of a System X exchange - resulting in 30,000 lines going down, large numbers of payphones being out of service, widespread loss of 999, and the main police control ISPBX switch being out of action. Key individuals present at the site included observers from the Cabinet Office Civil Contingencies Secretariat, Barrie Maloney and Anna Bostromer.
One of the most serious consequences of the system failure was the immediate shut down of 999 services within the area. Getting these up and running in the shortest amount of time was top of the agenda. BT called upon its Commsure arm to achieve this goal.
With the BT Commsure mobile unit on 'hot standby', the team arrived well within their 12-hour deadline - the standard time within which BT Commsure guarantees to reach any site in the UK. The 3.5 tonne unit that was deployed houses a Realitis DX PBX switching system capable of supporting up to 1,000 extensions, 300 digital trunks, 32 analogue trunks, and 8 switchboards.
The team immediately set about deploying the ISPBX switch, making it fully operational and networked to the embedded ISDN switch, located in Coventry. The combination of a speedy arrival, and BT Commsure's ability to configure the unit quickly, meant that limited 999 access was achieved in less than six hours. Full restoration of the service was completed within 12 hours, well inside the target 24-48 hour deadline.
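The margin on full restoration can be worked through as simple arithmetic. The sketch below is illustrative only: it uses the 12-hour figure and the 24-48 hour target quoted above, treats the 12 hours as an upper bound on the time actually taken, and uses hypothetical names that are not taken from BT's own procedures.

```python
# Figures quoted in the article; the names are illustrative assumptions.
FULL_RESTORATION_HOURS = 12        # "completed within 12 hours" (upper bound)
TARGET_WINDOW_HOURS = (24, 48)     # the "24-48 hour" target deadline


def margin_hours(elapsed_hours: float, target_hours: float) -> float:
    """Hours to spare against a target; negative means the target was missed."""
    return target_hours - elapsed_hours


# Minimum margin against each end of the target window.
margins = [margin_hours(FULL_RESTORATION_HOURS, t) for t in TARGET_WINDOW_HOURS]

for target, margin in zip(TARGET_WINDOW_HOURS, margins):
    print(f"Against the {target}-hour target: at least {margin} hours to spare")
```

On these figures, full restoration beat the stricter end of the target window by at least 12 hours and the looser end by at least 36.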
What purpose do such exercises serve, and what do we really understand by a 'successful' exercise? Dave Birch, operations director, BT Commsure, comments: "There is a certain amount of irony in that the success of these rehearsals isn't necessarily about having everything go to plan. I find there is greater learning when things break down or equipment does not arrive to schedule; the exercise scenario should really be a situation whereby you push your contingencies to their absolute limits.
"Our business is about managing failure, or at least the threat of it. Risk managers, business continuity consultants, disaster recovery specialists and the like should not be afraid of seeing their best laid plans go up in smoke. Although the nuts and bolts of the rehearsal were quite standardised, it's always important to maintain the greatest vigilance in situations which are practised time and time again. It is easy to let complacency creep in. Despite working on countless scenarios like the Edinburgh one, the more straightforward the scenario appears, the more worried I am, because that is when the unexpected often causes the greatest headaches."
The testing rhetoric runs deep within such exercises. The process not only allows organisations like BT to make their recovery plans more robust, but it often helps individuals understand their businesses better. In this particular scenario, one of the key ways this happened was through analysis of communication workflows between the BT divisions, both on and off site.
Birch adds: "Such exercises are based on a 'command, control, communicate' doctrine, which gives us the means to optimise our disaster recovery initiatives. Learning the value of lines of communication is a lesson that I cannot advocate strongly enough. It is also important to remember that testing such workflows does not require a full disaster recovery exercise. Even the simplest exercises can allow individuals to understand how communication flows throughout any disaster recovery team. An important starting point, though, is recognising the diversity within the team and understanding the components that must come together to make it a single entity."
In exploring the results of the exercise, BT Commsure highlighted that strategic and tactical improvements could be made in the way in which various BT groups functioned together. Naturally, getting the right people to the right place at the right time is crucial, but it is equally important to put in place procedures that allow those individuals to work together in a seamless fashion.
Noel Lindford, operational readiness technical support manager, BT Wholesale, explains: "As far as tangible outcomes go, we have since looked towards creating service level agreements between the various divisions. This means we can have a clear benchmark regarding exactly what services can be delivered, and how quickly."
Perhaps the most striking thing to come out of an exercise such as this is the realisation that despite significant technological advances, such scenarios, whether rehearsals or the real event, will always be at the mercy of effective communication.
The Chartered Institute of Loss Adjusters' newly-elected president, Michael Burnett, had just flown back from an Institute engagement in Italy and was ready for some overdue sleep when he received a telephone call saying that there had been a serious fire at his office. This was not the best start to his presidential year and certainly not how he planned to spend his Saturday evening.
However, within 24 hours, temporary premises were secured, emergency telephone and fax communication had been established and a full clean up programme had been engaged. By Monday morning, IT systems had been re-routed to create a 'virtual' office in Birmingham, and the temporary office was operational on an emergency basis. In less than a week, Cunningham Lindsey's Belfast office had completely relocated to nearby premises without significant loss of service or turning away any new claims. In fact, they took two major losses during the same week and genuinely maintained 'business as usual'.
So how does a business cope in such a situation? A lot of factors come into play, but planning is an important ingredient, says Peter Lascelles, compliance officer for Cunningham Lindsey. He explains his approach as follows.
"Corporate governance requires planning in conjunction with risk management to ensure that stakeholders are reasonably protected. To do this, a business needs to weigh up the risks it faces and make decisions about how it will deal with them. One of the options for handling many of the risks, or reducing their impact, is to have a business continuity plan or disaster plan as it is sometimes called.
"Not only is this good practice, but it is now mandatory for firms regulated by the FSA. Such firms are obliged to ensure their outsource providers also have plans, to avoid any gaps. As a result, there is a great deal of activity on this front but it is surprising how many businesses are still unprotected.
"Where do you start with disaster planning? In simple terms it is a case of identifying the worst things that can happen to your business and working out how to handle them. This is very crude, but may encourage those without plans to grasp the nettle and get started. Useful guides and information are available from the Business Continuity Institute.
"However, before the planning starts, all classic texts emphasise the need for board commitment, to ensure the correct response from everyone in the business. It is recommended that someone in a senior position leads the project. The recommendations also include the formation of recovery teams to take charge in the event of a serious incident.
"The task really begins with an assessment of the main threats to the organisation. Once these have been identified, planning becomes more of a logistical exercise. One approach is to work out what the business would do in a worst case scenario and then develop this for different contingencies. A good starting point is to consider what would happen if your business premises burnt down.
"You then need to consider how this would impact on the operational elements of your business. In the service sector these are broadly:
"These points are not in order of importance, but any plan needs to address how each would be sustained in a worst case scenario.
"The critical period is during the immediate aftermath of a disaster, when good planning brings about an effective response rather than chaos. This gives a valuable breathing space while the recovery team get to grips with the situation. While dealing with the incident itself, you should also have a mechanism in place for handling any media enquiries and informing clients.
"And what about the customers? One of the early stages in our plan is to remotely divert inbound telephone traffic to our call management centre. Staff there have access to our networked claim handling system and have the option of either dealing with enquiries themselves or forwarding calls to relevant field staff via their mobile phones. This proved particularly effective in the case of Belfast.
"A key factor is whether or not your business operates from a single location. Having a network of offices obviously gives greater flexibility for redistributing telephone traffic and workloads, but this assumes you have adequate resources and systems in place elsewhere. Advances in technology are leading to less dependency on premises, and this will impact directly on the shape of a disaster plan. Some businesses already allow all staff to work remotely and this takes the premises issue out of the equation altogether.
"Having a networked IT system certainly gives flexibility for sharing work. However, any shared resource such as an IT network can itself become the Achilles heel for the whole organisation if it goes down. Network resilience is therefore an important factor and there must be a backup plan for restoring the master system.
"Our plan included an IT/telecoms hit squad, which immediately flew out to Belfast when the disaster plan was invoked. They were able to reinstate the whole IT system and telephones at the alternative site in less than a week. This is when you realise how important IT/telecoms are to your business and how vital that backup tape is.
"However good your planning, it is the commitment of those involved that makes the difference between success and failure. The turnaround achieved by staff at our Belfast office was a measure of their determination to come through the disaster without letting customers down. But people probably expect nothing less from loss adjusters if they practise what they preach."
Following the fire at its Belfast office on a Saturday night, Cunningham Lindsey contacted restoration specialist The Revival Company on its emergency 24-hour helpline. The Revival Northern Ireland crew responded by 8am on Sunday morning. The fire had severely damaged the reception area, and smoke damage affected all other areas of the office. Within 24 hours Cunningham Lindsey had moved into an adjacent empty building. The Revival Company swiftly cleaned the office furniture and restored 99% of the company's files. Mitigating subsequent loss and preventing contamination of neighbouring properties were also main priorities, with exposed decorative steelwork being cleaned and stabilised, and major contaminants removed. The new office was operative on Monday morning, with computer access restored later in the week. The original site, in an old chapel with an ornate ceiling, is being restored to its former glory by The Revival Company and appointed contractors.