Test, test and increase your resilience: how to build your testing programme

This year has been exceptionally trying for individuals, businesses and governments globally. Living and working in crisis mode introduced an array of challenges, and some firms dealt with them better and faster than others. What do those firms have in common? In most cases, the answer is strong crisis reflexes, built over the years through consistent effort.

Testing is an important part of operational resilience and can take many shapes and forms, from disaster recovery testing that ensures service continuity to end-to-end crisis simulations that examine decision-making. It enables firms to manage risk proactively, embed a crisis management framework and continuously improve capabilities such as business continuity (BC), crisis management (CM), disaster recovery (DR) and cyber resilience (CR). Training also plays an important role in such a testing programme.

“Better awareness nurtures an organisational culture that embraces operational resilience and, as a result, improves the company’s preparedness to deal with adversity.”

From firm to firm, good testing programmes vary in nature, scale and complexity. Depending on how a firm is structured and what it does, testing is addressed at different organisational levels and locations, with the involvement of external parties (e.g. critical suppliers). In practice, given how little guidance regulators offer on what ‘good’ looks like, programmes are often fragmented and can cause a real headache.

Principles for creating a successful testing programme

While there is no silver bullet for creating a fit-for-purpose testing programme, we recommend following six guiding principles to devise one that is successful and tailored to your organisation’s needs. Following them could significantly improve the programme’s outcomes.

1. Think long term

When constructing a testing programme, it is essential to define what you want to achieve in 3 years. A focus on outcomes provides the required direction while leaving the flexibility to reshape the testing programme each year in response to change, without losing sight of the end goal. Begin with small, less complex tests, such as walkthroughs, and progress to more involved, realistic crisis simulation exercises.

2. Start with threats

Every test needs to link to one or more threats that result in plausible major incident scenarios and their impacts. Anticipate and understand new threats by monitoring the market, and draw on audit reports and risk assessments when building or reviewing your programme.

3. Focus on Important Business Services (IBS)

Align testing of existing contingency arrangements to important business services and key processes. This ensures preparedness when a high-impact situation occurs and avoids the challenges that arise from a lack of end-to-end vision.

4. Diversify testing

The most likely and most impactful scenarios should be examined with different stakeholder groups through different types of testing. This ensures that the theory works in practice and different reflexes are embedded in the organisation’s DNA.

To gain more benefit, go beyond testing contingency plans and communications tooling in isolation, and test them in combination with internal and external, business and technical stakeholders.

The radar above is an indicative example of what a good testing programme would consist of. The threat categories shown are arbitrary and could be chosen differently, as long as diversification is maintained (mix and match).

Crisis simulations

Crisis simulations examine a hypothetical disaster situation with defined parties and multiple stimulus cells. They allow participants to rehearse establishing and communicating recovery requirements and to carry out the relevant activities effectively. A crisis simulation can be a tabletop exercise (level 1), a hands-on simulation (level 2), a multi-cell hands-on crisis simulation (level 3) or an international hands-on multi-cell multi-party simulation (level 4).

Work area recovery testing

Work area recovery testing checks whether full end-to-end business processes can be run offsite, ensuring that all elements of a process, and not just the technical aspects, can be completed during a test. It can involve a single team (level 2) or a number of geographically dispersed teams (level 3) working from recovery sites or from home. Both third parties (e.g. outsourced teams) and internal teams should be considered.

IT Disaster Recovery Plan and Cyber range testing

IT DRP and cyber range testing examines, in practice, each step in a specific disaster recovery plan or tests cyber forensics capabilities. This confirms that data can be recovered and critical IT systems restored after a service interruption, a critical IT failure or a complete disruption caused by a cyber attack or other IT incident. This testing can run standalone (level 2) or as part of a crisis simulation (levels 3-4).

Business Recovery Plan Walkthroughs

Business Recovery Plan walkthroughs for the group, business divisions or business units are undertaken following a major revision of a plan or team. They are designed to increase understanding of the recovery processes, roles and responsibilities, and to question the suitability and completeness of the plan. Normally a walkthrough is carried out either as a review-and-challenge session with the plan owner and a BC expert (level 1) or as a test of the efficiency of specific measures and planned workarounds (level 2).

Communication cascade tests

Communication cascade tests establish whether contact details are accurate, whether cascade roles and responsibilities are understood by staff, and whether the documented procedures are robust. They can be completed in one of three ways: as a standalone live test (e.g. a text cascade; level 2), as part of a crisis simulation exercise (levels 2-4), or as an audit involving a review of plans and interviews with staff who hold key responsibilities (level 1).
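
For teams that track their programme in a spreadsheet or a script, the test types and levels described above can be captured as a simple lookup and used to sanity-check a draft plan. The Python sketch below is illustrative only: the names PlannedTest and review_programme, the example threats, and the rule that every test type should appear at least once are our assumptions, not a prescribed method. Only the type-to-level mapping is taken from the descriptions above.

```python
from dataclasses import dataclass

# Levels at which each test type is described in the text above.
ALLOWED_LEVELS = {
    "crisis_simulation": {1, 2, 3, 4},
    "work_area_recovery": {2, 3},
    "it_drp_cyber_range": {2, 3, 4},
    "brp_walkthrough": {1, 2},
    "communication_cascade": {1, 2, 3, 4},
}


@dataclass(frozen=True)
class PlannedTest:
    year: int       # programme year, e.g. 1 to 3 (hypothetical field)
    test_type: str  # one of the keys of ALLOWED_LEVELS
    level: int      # complexity level, 1 (simplest) to 4
    threat: str     # threat the scenario links to (principle 2)


def review_programme(tests):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    for t in tests:
        if t.test_type not in ALLOWED_LEVELS:
            findings.append(f"unknown test type: {t.test_type}")
        elif t.level not in ALLOWED_LEVELS[t.test_type]:
            findings.append(f"{t.test_type} is not described at level {t.level}")
    # Assumption: a diversified programme includes every test type at least once.
    planned_types = {t.test_type for t in tests}
    for name in sorted(set(ALLOWED_LEVELS) - planned_types):
        findings.append(f"no {name} test planned")
    return findings


# Example draft programme; the threats are made up for illustration.
draft = [
    PlannedTest(1, "brp_walkthrough", 1, "supplier failure"),
    PlannedTest(1, "communication_cascade", 2, "office unavailable"),
    PlannedTest(2, "work_area_recovery", 4, "office unavailable"),
]
for finding in review_programme(draft):
    print(finding)
```

A check like this does not replace judgement about which scenarios matter most. It only flags gaps and inconsistencies early, before the programme is shared with stakeholders.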

5. Stay current

Review your testing programme at least once a year in order to adapt to the changing threat landscape and ultimately ensure operational resilience. Make sure your crisis management framework and contingency plans are regularly improved based on testing outcomes and changes in the business.

6. Engage and drive

Involve different parties in shaping and running your testing programme (e.g. cyber, risk, Ops, DPO, legal and business resilience champions). Use management information (MI) to share progress and alignment with the 3-year operational resilience vision.

What next: how do you structure your testing programme?

While it is not possible to prescribe a testing programme without a better understanding of the organisation in question and a deep dive into the specifics of its threat landscape, it is clear that investing time and resources is worthwhile from both an operational resilience and a regulatory standpoint.

“Having recently gone through a pandemic, it is high time to keep up the momentum and continue fostering the right culture and correct reflexes for the next major crisis.”

A few concluding tips

Make it realistic: Where maturity allows, aim for more complex and realistic tests, as they are essential for responding effectively to real events and for increasing end-to-end resilience. This means engaging more internal and external parties in ‘live’ exercises.
Leverage internal and market crises: Continuously monitor events in the market (major incidents and crises) as well as your own internal major incidents. Use them to feed your testing programme, prioritise your threats and devise your scenarios, which makes the programme more tangible for your stakeholders.
Engage early: Share the vision for testing with key stakeholder groups so they understand the journey on which you want to take the organisation. This will enhance collaboration and, therefore, outcomes.
Facilitate remotely: Remote working arrangements should not put your whole testing programme on hold. Use collaborative solutions or tools available on the market to carry out the exercises. This is especially relevant for cyber range testing and follow-the-sun testing. Experience shows that digital workplace solutions encourage more democratic participation and are an excellent way to record interactions.
Continuously improve: Reflect on tests by producing post-test reports and defining an action plan to drive and track improvements. Involve key stakeholders throughout so they understand the significance of the outcomes and help to drive positive change.