What is Operational Resiliency?

Summary

Operational resiliency is a term that has been showing up a lot in the last 12 months. Is it a new operating model for the current operating environment in light of a pandemic? This article describes what operational resiliency is and how you can build a foundation for, or migrate to, operational resiliency.

Operational Resilience?

This has been a challenging year for everyone. Business continuity and disaster recovery staff have been forced to implement their plans, and many organizations were ill-prepared to convert their workforce from onsite to remote.

What is operational resilience? Formally, operational resilience is a set of techniques that allow people, processes, and information systems to adapt to changing patterns or events. It is the ability to alter operations in the face of changing business conditions. Operationally resilient enterprises have the organizational competencies to ramp up or slow down operations in a way that provides a competitive edge and enables quick, local process modification. Wait, that’s business continuity and disaster recovery, isn’t it? Well, let’s look at the terms and see what they tell us about how an organization can migrate to operational resiliency.

Operational resiliency is becoming a key agenda item for senior management and boards of directors. Organizations have grown more complex to support business needs such as the Internet of Things (IoT), 24x7 customer support, on-demand manufacturing, and global data sharing. Malicious actors have developed methods that require organizations to monitor and operate at a high level of security and sophistication to prevent severe compromises.

We know operational means that things are ready for use anytime and anywhere. Security or risk personnel think of the governance, risk management, and compliance (GRC) framework the organization has implemented. Other staff trust that there is a process to move to a recovery scenario when something has gone wrong and an event is declared. Organizations should plan for the types of events that occur in the local and regional areas where they have offices or remote staff. Many organizations did not plan for a pandemic in which everyone would quickly have to work remotely. They struggled to ensure that their communication lines, firewalls, virtual private networks (VPNs), and multi-factor authentication solutions would work for everyone in the organization, yet they needed to be operational from day one onward.

In GRC terms, this is the capability to reliably achieve objectives while addressing uncertainty and acting with integrity. Operational resilience therefore requires that we understand the organization’s operational objectives and, in that context, manage risk and uncertainty so that we hit those objectives while operating within the boundaries of the values and requirements set by the organization.

We know resiliency is the capacity to recover quickly from difficulties and keep the business operational. Yes, it is similar to business continuity. It depends on the primary support units and staff, and on the priority systems that must be operational to support customers, internal staff, and third parties. Many organizations had those systems in place. The issue was the resiliency of the staff who need to be online and available to support those customers, internal staff, and third parties. Enter the concept of operational resiliency, which is what organizations really seek to implement.

What does operational resiliency need to include?

  • Business impact analysis
  • Business continuity
  • Disaster recovery
  • Changing conditions or an event
  • Communication
  • Continuous operations
  • No need for formal recovery

Organizations may say they have that implemented, but do they really? Can they switch in less than one day from an onsite business to a remote business, supporting all functions for internal staff, customers, and third parties? Organizations cannot separate the business continuity and disaster recovery functions if they want to have operational resiliency. They need to make a smooth transition so they can say to their staff, customers, and third parties, ‘We are here to continue our superior support for your teams.’

Organizations with onsite call centers and call operators will have the hardest time with the conversion, because many call centers are like clean rooms. A clean area is a “specified area in which the concentration of airborne particles is regulated and classified, and which has been designed and is being operated appropriately for regulating the introduction, formation and deposition of particles in the area.” (ISO 14644-1) Cleanroom technology for a call center includes all technical and operational measures that avoid the potential risk of information loss, which can include no recording devices, no paper for notes, no Internet browsing, no smartphones, etc. How do you convert this to home-based technology overnight when it has never been done before in the organization? Organizations that work with intellectual property, software development, or formulas of any type all need to think about how to protect that information when reacting to an event.

Why do we need operational resilience?

Essentially, organizations need to move to operational resiliency so that, no matter what happens, they can bend but not break, and business operations do not stop. Operationally resilient organizations look beyond business continuity and disaster recovery and integrate resilience into their risk-mitigation strategies for any type of event. The organization focuses on anticipating events, prevention (governance), and constant change rather than on individual recovery activities. Cloud and virtual architectures and systems will become even more important, but they bring with them a supply chain that relies on third parties for support. Several pressures drive this need:

  • Innovation. Competition from other organizations and customer requests continually challenge organizations to implement and improve products and services. For an organization, this can increase the complexity of support and the risk of falling out of balance with its own mission, vision, objectives, and goals.
  • Legacy infrastructure and technology. Many organizations need to watch budgets closely, but as technology advances, an organization needs to keep up or risk losing market share and, potentially, customers. Legacy infrastructure and technology can be slow and hard to maintain, and it carries a higher risk of vulnerabilities that cannot be mitigated.
  • Supply chains. To save money, organizations are turning to support services and third parties, but depending on the contracts, this may affect the protection and recovery of assets. It makes an organization even more complex and is fertile ground for ‘It’s not my job’ or ‘It’s not in the contract’.
  • Sharing information. As an organization’s customer and third-party base grows, information sharing grows with it, and without a strong foundation in technology and controls, information loss will happen. Organizations need to ensure that information is protected according to its importance to the organization and its customers.
  • Malicious actors. Malicious actors continually evolve, sometimes faster than an organization can implement strong security controls and technology. The organization has a challenging time preventing, detecting, responding to, and recovering from a cyber incident.

Characteristics of Operational Resilience

  • Organizational focus. Organizations need to shift their focus from the impact on, and recovery of, individual business units, staff, and key technology to critical business services from end to end, without thinking in business and support silos. The organization needs to look at the broader impact of an event rather than focusing on Information Technology alone; the focus needs to run from services through to support.
  • Governance. Currently, most organizations, including senior management and boards of directors, think compliance when they think governance. Governance needs to clearly define the accountability and responsibility of each person and third party in the organization. It also needs to incorporate the corporate risk appetite and continuous improvement into every service and process using controls, documentation, and evidence of that improvement.
  • Integration. Currently, organizations pick technology for a single business function and then try to integrate it with everything else. Organizations need to take an enterprise-wide viewpoint when examining technology and all the features it can include, rather than buying best of breed and ending up with far too much technology to maintain and coordinate. When looking at replacement or new technology, evaluate what integration resiliency for the business is built into that technology.
  • Measurement. The old reliable metrics. What is the organization’s tolerance for interruptions and downtime of a service, not necessarily of a specific technology? Develop business scenarios for interruptions and base measurement and risk on those scenarios, looking toward the future. These measurements will help with the organization’s continuous improvement efforts.
  • Preparedness. Yes, tabletop exercises and live tests are one thing, but they are limited in scope, type of event, and time. There needs to be a list of the various types of events that can occur: technology and human, man-made and natural. The plan needs to be documented, monitored, tested, and adapted continuously, and it must be actionable at a moment’s notice without a business interruption to customers and third parties.
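The Measurement point above can be made concrete with a small sketch. It compares the longest recorded interruption of each business service against a tolerance that the business has agreed for that service, rather than for any one piece of technology. Everything here is an illustrative assumption, not part of any standard or product: the `Outage` record, the `tolerance_report` function, the service names, and the tolerance values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One recorded interruption of a business service (hypothetical record)."""
    service: str
    start: datetime
    end: datetime


def tolerance_report(outages, tolerances):
    """Return {service: (longest_outage, within_tolerance)} per service.

    `tolerances` maps a service name to the longest interruption the
    business has agreed it can accept for that service.
    """
    longest = {}
    for outage in outages:
        duration = outage.end - outage.start
        if duration > longest.get(outage.service, timedelta(0)):
            longest[outage.service] = duration

    report = {}
    for service, limit in tolerances.items():
        worst = longest.get(service, timedelta(0))  # no outages recorded
        report[service] = (worst, worst <= limit)
    return report


# Assumed example data: service names and limits are made up.
outages = [
    Outage("customer support", datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 11, 30)),
    Outage("customer support", datetime(2021, 3, 8, 14, 0), datetime(2021, 3, 8, 14, 45)),
    Outage("payments", datetime(2021, 3, 2, 1, 0), datetime(2021, 3, 2, 1, 20)),
]
tolerances = {
    "customer support": timedelta(hours=2),
    "payments": timedelta(hours=1),
    "payroll": timedelta(hours=24),
}

report = tolerance_report(outages, tolerances)
for service, (worst, ok) in report.items():
    status = "within tolerance" if ok else "EXCEEDS tolerance"
    print(f"{service}: longest outage {worst}, {status}")
```

Measuring per service keeps the metric tied to what customers experience. In practice the outage records would come from whatever incident tracking the organization already uses, and the tolerances from the business impact analysis.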

Questions senior management and the board of directors should ask

  • What is our risk appetite and how does resilience fit into that?
  • Are we getting the right metrics to see that we are doing continuous improvement? Or what metrics do we need for operational resilience and who owns them?
  • Does the organization map the business unit interdependencies to ensure that we can implement operational resilience and what the priorities are?
  • Do we have a list of critical assets needed for operational resiliency to ensure we know what to implement to keep our service delivery operational?
  • As in a business impact analysis, what are our critical business functions, are they resilient, and who owns them?
  • How are we going to have to change to implement more resiliency for our services?
  • What are some of the risks of not having resiliency built into some business services?
  • How do we monitor and measure our resiliency and the risks associated with resiliency?
  • What types of scenarios are outside our current operational resilience implementation?
  • Do those scenarios pose a high risk?
  • How do we test and ensure that our operational resilience is effective, no matter the event or disruption?

Steps to Achieving Operational Resilience

So how does an organization get started with migrating to an operationally resilient framework? First and foremost, start small: provide transparency for one business service to prove that it can be done and that it has an impact on the business processes.

  • Foundation. You need to establish a foundation with an owner, accountability, and objectives for a resilient operating model. Conduct a baseline assessment to establish your current level of resilience. Establish and articulate your critical business services within the organization.
  • Provide visibility. Ensure regular communication with senior management and the board of directors. They need to buy in to the risk appetite and the resilience targets for the business. Metrics that show progress will help here.
  • Focus on a single service. As with any large project, you need a pilot, ideally a specific business service where the gains in efficiency and effectiveness will be largest. Identify that service’s key dependencies, because other services may need to be brought into the pilot. Define the business service’s tolerance for risk and its dependencies, then define your roadmap. Don’t forget to capture the lessons learned for the next phase of the implementation.
  • Expand the implementation. Drive the improved resiliency based on the lessons learned from the previous implementation of operational resilience. Ensure that you have prioritized the critical business services for the phased implementation.
  • Documentation. Document the roadmap, the critical business function order, lessons learned, and the updates to management. You may also need to update business and IT procedures and standards to ensure the operational resilience support also improves.
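The dependency step in "Focus on a single service" can be sketched as a simple graph walk. The example below is a minimal illustration, assuming the organization can write down, for each service, the services it directly depends on. The `depends_on` map, the `pilot_scope` function, and all service names are hypothetical.

```python
from collections import deque


def pilot_scope(service, depends_on):
    """Return every service the pilot service relies on, directly or
    indirectly, as a sorted list (the pilot service itself is excluded)."""
    seen = {service}
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for dependency in depends_on.get(current, ()):
            if dependency not in seen:  # also guards against cycles
                seen.add(dependency)
                queue.append(dependency)
    seen.discard(service)
    return sorted(seen)


# Assumed example: direct dependencies only, names are made up.
depends_on = {
    "call center": ["telephony", "crm"],
    "crm": ["identity", "database"],
    "telephony": ["vpn"],
    "payroll": ["database"],
}

scope = pilot_scope("call center", depends_on)
print(scope)  # ['crm', 'database', 'identity', 'telephony', 'vpn']
```

Here a pilot on the call center pulls in five supporting services, while payroll stays out of scope because nothing in the pilot depends on it. The same map can help answer the board-level question about business unit interdependencies.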

As an organization implements operational resiliency, it will be able to reduce its operational risk exposure, improve monitoring, respond to events with less business impact, and deliver business services more effectively and efficiently.

Is operational resiliency something new or just a new term? Operational resiliency is a new term that encompasses our existing processes and adds further improvement in business services, so that the organization can recover quickly from almost anything that may happen in the business or the organization’s environment. So, review your current service operational framework and see how much improvement is needed to achieve operational resilience and to switch the operating environment without a great impact on business services and organizational operations.

Conclusion

Achieving operational resilience continues to be challenging given the increasing complexity of processes, technology infrastructure, organizational silos, and location of staff. However, the business benefits go beyond pure risk and compliance, often forming an inherent part of the value of an organization’s business services. Operational resilience requires organizations to understand how all domains (technology, data, third parties, facilities, operations, and people) impact critical service delivery and to build a consistent set of resilient capabilities and controls across these domains. You need cross-functional teams and specialized expertise to evaluate and measure the resilience of the organization in light of the specific risks it faces, along with extensive coordination, collaboration, and preparation to ensure that the organization appropriately considers resilience in all activities and is ready when the worst happens. As a resilient organization, you focus on anticipation, prevention, and adaptation rather than only on recovery actions after an unfortunate event happens.