Continuity & Resilience • Article

Resilience is a daily discipline, not a document left in a drawer

⏱️ Estimated reading time: 8 minutes

Technology incidents and third-party failures keep proving the same point: resilience is a daily discipline, built in practice, not a document stored in a drawer.

Recent cases that prove the point include:

  • CrowdStrike global outage (July 2024): a defective update triggered Blue Screen of Death (BSOD) crashes on millions of Windows systems and disrupted operations across multiple sectors, a “mega-incident” with no malicious intrusion. (Reuters)
  • Iberian blackout (28 April 2025): a massive power failure paralysed Portugal and Spain for hours (transport, traffic lights, mobile networks), with services restored gradually. National reports identify the Iberian grid as the technical source, with no evidence of a cyberattack. (RTP)
  • Google Cloud (12 June 2025): an outage took down popular platforms (e.g. Spotify, Fitbit), exposing the dependence on hyperscalers. (Google Cloud Status)
  • Microsoft 365/Outlook (July 2025): a prolonged disruption of roughly 19 hours halted email and calendar services; the cause was a configuration error, not a cyberattack. (Computerworld)
  • Attacks through suppliers: a marked increase in incidents reaching organisations through the supply chain in 2024/25. (Financial Times)
  • Downtime costs: 2025 studies show significant losses per hour of outage and a clear tendency for companies to underestimate them. (IT Pro)

What does “modern” continuity look like?

  1. From technical scenario to customer impact: start with the business impact analysis (BIA), covering critical services and their recovery time and recovery point objectives (RTO/RPO), not with a simple list of servers; a sketch follows this list.
  2. Five mandatory testing scenarios:
    • Update failure (CrowdStrike-type): widespread unavailability without a security breach. (Reuters)
    • Hyperscaler failure (GCP/Azure/M365): temporary loss of shared services. (Google Cloud Status)
    • Critical third-party outage: a SaaS provider or outsourcer down for longer than its RTO. (Financial Times)
    • Ransomware/data compromise: deciding between recovery and rebuilding.
    • Infrastructure failure (power/telecoms): a regional blackout handled with degraded-mode operations (UPS/generators, alternative communications) and a public communication plan. (RTP)
  3. Resilient architecture, built to fail safely: segmentation, break-glass accounts, immutable backups, multi-region design and functional degradation plans (minimum viable services).
  4. Governance and reporting: for financial entities, DORA requires resilience testing, third-party risk management and the ability to present evidence to supervisors; entities under NIS 2 must issue an early warning within 24h, a notification within 72h and a final report within one month (see the deadline sketch below). (Digital Strategy)
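
To make point 1 concrete, here is a minimal sketch in Python of a BIA register that ranks services by customer impact instead of by server inventory. The service names, RTO/RPO values and the scoring rule are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    rto_hours: float            # maximum tolerable downtime
    rpo_hours: float            # maximum tolerable data loss
    customers_affected: int
    third_party_deps: list[str]

# Illustrative entries only; a real BIA is built with the business owners.
services = [
    Service("payments", 2, 0.25, 50_000, ["card-processor", "cloud-region-a"]),
    Service("customer-portal", 8, 4, 120_000, ["cdn", "identity-provider"]),
    Service("internal-reporting", 72, 24, 0, ["data-warehouse"]),
]

def impact_score(s: Service) -> float:
    """Rough prioritisation: more customers and a tighter RTO rank higher."""
    return s.customers_affected / max(s.rto_hours, 0.1)

# Highest customer impact first: this, not the server list, drives testing order.
for s in sorted(services, key=impact_score, reverse=True):
    print(f"{s.name:20} RTO={s.rto_hours}h RPO={s.rpo_hours}h deps={s.third_party_deps}")
```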
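
And the NIS 2 clock from point 4 can be encoded directly. A minimal sketch assuming the 24h/72h/1-month timeline quoted above; the function name and the 30-day reading of “one month” are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

def nis2_deadlines(detected: datetime) -> dict[str, datetime]:
    """NIS 2 timeline as quoted above: early warning within 24h,
    notification within 72h, final report within one month."""
    return {
        "early warning (24h)": detected + timedelta(hours=24),
        "notification (72h)": detected + timedelta(hours=72),
        "final report (1 month)": detected + timedelta(days=30),  # assumption: 30 days
    }

detected = datetime(2025, 6, 12, 14, 30, tzinfo=timezone.utc)
for milestone, due in nis2_deadlines(detected).items():
    print(f"{milestone:25} due {due:%Y-%m-%d %H:%M} UTC")
```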

7 metrics that matter (and that the Board understands)

  • RTO / RPO by critical service (target vs. achieved in testing)
  • MTTR by incident type (average and p95; see the sketch after this list)
  • Backup restore coverage (% of critical systems with a successful restore in the last 90 days)
  • EDR/SIEM coverage across endpoints and servers
  • % of runbooks successfully tested (and with defined owners)
  • Third-party dependencies mapped by service
  • Time to communicate with clients/regulators
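
To illustrate the MTTR metric, the average and 95th percentile fall out of a plain incident log. A minimal sketch with made-up durations; the incident categories and values are assumptions.

```python
from statistics import mean, quantiles

# Hypothetical resolution times in hours, grouped by incident type.
mttr_log = {
    "update_failure": [3.5, 6.0, 2.0, 19.0],
    "third_party_outage": [1.0, 4.5, 8.0, 2.5],
}

for incident_type, durations in mttr_log.items():
    p95 = quantiles(durations, n=100)[94]  # 95th percentile (cut point 95 of 99)
    print(f"{incident_type:20} MTTR avg={mean(durations):.1f}h  p95={p95:.1f}h")
```

Reporting the p95 alongside the average matters: a handful of long incidents can hide behind a comfortable mean.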

100-day programme

  • 0–30 days: rapid BIA; map of services and dependencies (including power/telecoms); prioritisation by impact.
  • 31–60 days: executive exercises (NIS 2 tabletop covering the 24h/72h/1-month deadlines), restore test (see the verification sketch after this list), communication plan.
  • 61–100 days: technical test (update failure or cloud loss), blackout rehearsal (UPS/generators, alternative communications), review of third-party contracts (RTO/RPO, incident reporting, right to audit), and an after-action review with improvements. (Google Cloud Status)
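
A restore only counts once the recovered data is verified, not when the backup job reports success. A minimal sketch of that check; the chunked SHA-256 comparison is one common approach, and the file names here are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(original: Path, restored: Path) -> bool:
    """A restore only counts as tested if the restored data matches the source."""
    return sha256_of(original) == sha256_of(restored)

# Demonstration with throwaway files; in practice, point these at real backup
# artefacts and at the files produced by the restore job.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "customers.db")
    src.write_bytes(b"example data")
    dst = Path(tmp, "restored.db")
    dst.write_bytes(b"example data")
    print("restore verified" if verify_restore(src, dst) else "restore FAILED")
```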

 

Author: Behaviour
Published on: 13 October 2025
Copying or reproduction of this article is not authorised.