Recent, high-profile network outages have shown how the convergence of people, processes and systems, accelerated by the introduction of new technologies, has created the potential for systemic failures to proliferate across the telecommunications eco-system, impacting multiple communications providers (CPs) at the same time.
Against this backdrop of increased risk and uncertainty, CPs should ask: are our existing approaches to resilience sufficient to meet customer and stakeholder expectations for service delivery?
We outline four areas that heads of resilience and executive team members may find helpful to consider within the context of their own CP’s approach to resilience.
1. Align risk management processes, external suppliers and other parties to help identify and manage emerging issues
CPs may outsource operational aspects of service delivery, relying on external facilities and procuring network services from other suppliers. However, they can’t outsource the risks associated with these. CPs need to understand, and where possible mitigate, the concentration risks from increased reliance on shared infrastructure services and common suppliers. This may mean:
- Better aligning risk management systems, such as operational risk, quality management, cyber and business continuity, to share information about known risks quickly and easily;
- Adopting a ‘business services’ view to understand what is critical to customers and using this view to identify operational dependencies and concentration risks that need to be managed;
- Working with external suppliers and other parties to ensure they have complementary approaches to resilience. These might include shared understanding of delivery priorities, critical business activities for customers, joint testing and exercises, and acceptance of alternative arrangements that may be used during a disruption.
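As a minimal sketch of how a 'business services' view can surface concentration risks, the example below maps a few customer-facing services to the suppliers and facilities they rely on, then flags any dependency shared by more than one service. The service names, supplier names and the `find_concentration_risks` helper are hypothetical and purely illustrative; a real CP would draw this data from its own asset and supplier records.

```python
from collections import defaultdict

# Hypothetical mapping of customer-facing business services to the
# suppliers or shared facilities each one depends on (illustrative only).
SERVICE_DEPENDENCIES = {
    "consumer broadband": {"Supplier A backhaul", "Exchange site X"},
    "business leased lines": {"Supplier A backhaul", "Data centre Y"},
    "emergency call handling": {"Supplier B transit", "Exchange site X"},
}


def find_concentration_risks(dependencies, threshold=2):
    """Return dependencies relied on by at least `threshold` business services."""
    used_by = defaultdict(set)
    for service, deps in dependencies.items():
        for dep in deps:
            used_by[dep].add(service)
    return {
        dep: sorted(services)
        for dep, services in used_by.items()
        if len(services) >= threshold
    }


shared = find_concentration_risks(SERVICE_DEPENDENCIES)
for dep in sorted(shared):
    print(f"{dep}: {', '.join(shared[dep])}")
```

Here the shared backhaul supplier and the shared exchange site are flagged, because each supports two services. Those are the dependencies where mitigation or a joint exercise with the supplier may be worth prioritising.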
2. Invest in diverse resilience solutions, but accept that disruptions will still happen
Although it is preferable for CPs to invest in measures that reduce the likelihood of a disruption occurring, it is equally important to recognise that some disruptions will still happen. As such, it may be helpful for CPs to build a layered approach to resilience, catering for disruptions of different types and severity. This could include:
- Investing in redundancy where appropriate and/or diversifying technical solutions to reduce risks and technical hotspots;
- Focusing on enhancing processes for response, restoration and repair;
- Developing resilience capabilities to address reasonable worst case scenarios such as systemic or common-mode failure, as well as customer redress actions.
3. Build resilience and crisis management capabilities, especially at the Board level
Increased governmental and regulatory focus in the telecommunications sector and other industries is placing more emphasis on senior management and Boards to demonstrate commitment to, and accountability for, resilience. Board members may not be experts in operational resilience, but it is important they have the knowledge to ask the right questions and to make informed decisions at critical junctures. Board-level involvement can be improved by:
- Asking the Board to review and approve the CP's tolerance for disruption. For example, UK financial regulators have recently introduced the concept of ‘impact tolerance statements’ for severe but plausible events in a discussion paper on an approach to improve the operational resilience of firms and financial market infrastructures.
- Asking the Board to participate in, or review the outcome of, reasonable worst case scenario testing.
- Ensuring Boards consider investment in operational resilience based on deficiencies in resilience arrangements identified through stress-testing. Full testing of the network is nearly impossible, so CPs should use the ‘business services’ view to focus testing on the most critical components, whose failure could have the most severe impact across the service.
- Ensuring the Board is ‘crisis-ready’. This might include involving Board members in crisis management exercises to improve familiarity with the crisis management framework; exploring which decisions they may be consulted on or asked to approve; and preparing the Chair to deliver an external media response, if required.
4. Focus on operational enhancements to future-proof resilience
Customers may acquire services from separate CPs believing these to be independent, without being aware that the CPs may share infrastructure or rely on the same third party to deliver the service. It can come as a surprise, then, if a leased line or shared infrastructure fails and the customer loses multiple services simultaneously. To reduce single points of failure in the service and optimise the customer experience, CPs need to focus on future-proofing resilience. This could include:
Switching from reactive to proactive operational processes
This means that CPs need to be proactive when it comes to managing faults and service degradation, for example by:
- Investing in technologies to improve diagnostic capabilities;
- Sharing documented strategies such as call gapping and prioritisation techniques for managing network congestion with other CPs that have a legitimate interest;
- Understanding response, restoration and repair times, focusing on escalation procedures and timely response management;
- Anticipating how critical services, such as 999, and key customers will be prioritised during periods of congestion.
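To illustrate how call gapping and prioritisation can work together, the sketch below admits at most one call per gap interval to a congested destination while always letting emergency calls through. It is a simplified model under stated assumptions: the `CallGapper` class, the five-second gap and the `"congested-number"` destination are hypothetical, and real networks apply gapping in their signalling and switching platforms using operator-defined parameters.

```python
class CallGapper:
    """Simplified call gapping: admit one call, then block further
    non-exempt calls until `gap_seconds` have elapsed."""

    def __init__(self, gap_seconds, exempt_destinations=("999",)):
        self.gap_seconds = gap_seconds
        # Destinations that are never gapped (assumed here to be 999).
        self.exempt = set(exempt_destinations)
        self._last_admitted = None  # time the last gapped call was admitted

    def admit(self, destination, now):
        """Return True if the call attempt at time `now` (seconds) is admitted."""
        if destination in self.exempt:
            return True  # priority traffic bypasses the gap
        if self._last_admitted is None or now - self._last_admitted >= self.gap_seconds:
            self._last_admitted = now
            return True
        return False  # inside the gap interval: reject the attempt


gapper = CallGapper(gap_seconds=5.0)
attempts = [
    (0.0, "congested-number"),
    (1.0, "congested-number"),
    (2.0, "999"),
    (6.0, "congested-number"),
]
decisions = [gapper.admit(dest, t) for t, dest in attempts]
print(decisions)  # [True, False, True, True]
```

The second attempt is rejected because it falls inside the gap, the 999 call is admitted regardless, and the final attempt is admitted once the gap has elapsed. Sharing the chosen gap values and exemption rules with other CPs is the kind of documented strategy the list above describes.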
Identifying opportunities to exchange information with other CPs
There are multiple sector-wide fora and protocols, such as NEAT, TIDIE, and ResilienceDirect®, that provide opportunities for CPs to exchange information proactively about the resilience of the wider network. Where possible, CPs might also consider participating in sector-wide exercises to identify hidden assumptions and network pinch points, and to improve joint response capabilities.
Learning from past mistakes
CPs should treat past disruptions as an opportunity to enhance operational arrangements, inviting independent analysis where appropriate and routinely performing post-incident reviews to identify lessons learned and improvements.
The UK government’s Electronic Communications Resilience and Response Group recently issued updated infrastructure resilience guidelines reflecting an increased interest in the resilience of the UK’s Critical National Infrastructure and the communications providers operating within an evolving and hyper-connected eco-system.
The guidelines helpfully expand the discussion around what resilience is and distinguish it from traditional business continuity by advocating a more comprehensive framework for managing a broad range of disruptive risks. They also provide detail on how CPs can build operational resilience within their own organisations and the wider sector, including helpful technical and operational guidance and the standards required to achieve this.
If you would like to discuss the points raised here or the new guidelines, please contact Neil Bourke, Laura Schmuttermeier, or Lucy Jones using the details below.
Director, Crisis and Resilience
+44 20 7303 4682
Director, Risk Advisory
+44 20 7007 1457
Manager, Crisis and Resilience
+44 20 7303 4656