The Lede

On May 19, 2026, Google Cloud incorrectly suspended Railway's production account, triggering an 8-hour platform-wide outage. The suspension took down Railway's control plane, API, and GCP-hosted compute, and the impact spread beyond Google Cloud to workloads running on AWS and Railway Metal. The incident shows the risk of keeping control plane functionality in a single provider's account in a multi-cloud architecture.

Background & Context

Railway had flagged this architectural risk in its February 2026 postmortem, emphasizing the need for a more resilient multi-cloud setup. The company has been working to eliminate the single point of dependency on GCP for route discovery and to extend HA database shards across AWS and Metal. The May outage shows that the dependency was still in place when the suspension hit.

Deep Dive

The outage began with an automated action by Google Cloud that incorrectly suspended Railway's production account. The suspension took down Railway's control plane, API, dashboard, and GCP-hosted compute. Railway's edge proxies depend on a GCP-hosted control plane to populate their routing tables, so as their route caches expired, workloads on AWS and Railway Metal became unreachable and requests to them returned 404 errors. At peak impact, all Railway workloads in all regions were unreachable.
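
The failure mode described above, in which healthy workloads start returning 404 once cached routes expire and cannot be refreshed, can be illustrated with a small sketch. This is not Railway's implementation. RouteCache, ControlPlaneUnavailable, status_for, and the serve_stale option are hypothetical names chosen for illustration, and the 30-second TTL in the usage note is an arbitrary assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class ControlPlaneUnavailable(Exception):
    """Raised by the (hypothetical) route source when it cannot be reached."""


@dataclass
class CachedRoute:
    upstream: str       # address of the workload, e.g. "10.0.0.1:8080"
    fetched_at: float   # clock reading when the route was last refreshed


class RouteCache:
    """Illustrative edge-proxy route cache with a TTL (all names are assumptions)."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        ttl_seconds: float,
        serve_stale: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._serve_stale = serve_stale
        self._clock = clock
        self._entries: Dict[str, CachedRoute] = {}

    def resolve(self, host: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(host)
        # Fresh entry: answer from the cache without contacting the control plane.
        if entry is not None and now - entry.fetched_at < self._ttl:
            return entry.upstream
        try:
            upstream = self._fetch(host)
        except ControlPlaneUnavailable:
            # Control plane is down. Without serve_stale the route is lost even
            # though the workload itself may be perfectly healthy.
            if self._serve_stale and entry is not None:
                return entry.upstream
            return None
        if upstream is None:
            # The control plane answered and says the route no longer exists.
            self._entries.pop(host, None)
            return None
        self._entries[host] = CachedRoute(upstream, now)
        return upstream


def status_for(cache: RouteCache, host: str) -> int:
    """Map a lookup result to the HTTP status an edge proxy might return."""
    return 200 if cache.resolve(host) is not None else 404
```

With serve_stale left off, a cache built as RouteCache(fetch, ttl_seconds=30) keeps answering for up to 30 seconds after the route source starts raising ControlPlaneUnavailable, then returns 404 for every host. With serve_stale=True it keeps serving the last known route, at the cost of possibly routing to a workload that has since moved.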

Expert Angle

Experts in the field agree that the incident highlights the risk of relying on a single provider's account for control plane functionality in a multi-cloud architecture. "'Multi-cloud' does not mean 'resilient' if the control plane – routing, service discovery, configuration – sits in a single provider's account," says a researcher at a leading cloud computing firm. Railway's plan to eliminate the single point of dependency on GCP for route discovery and to extend HA database shards across AWS and Metal is seen as a positive step toward that resilience.

What Comes Next

Railway has announced plans to address the architectural risks exposed by the outage: eliminating the single point of dependency on GCP for route discovery and extending HA database shards across AWS and Metal. The changes are expected to make Railway's multi-cloud setup more resilient and reduce the risk of similar outages.
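
One way to picture route discovery without a single-provider dependency is a lookup that tries several independent sources in turn. The sketch below is only an illustration of that idea, not Railway's announced design. discover_route, SourceUnavailable, and the provider labels are hypothetical, and it assumes each provider holds its own copy of the routing data.

```python
from typing import Callable, Iterable, Optional, Tuple


class SourceUnavailable(Exception):
    """Raised when a (hypothetical) route source cannot be reached."""


# A source is a (provider_name, lookup_function) pair. Both are assumptions.
RouteSource = Tuple[str, Callable[[str], Optional[str]]]


def discover_route(
    host: str, sources: Iterable[RouteSource]
) -> Optional[Tuple[str, str]]:
    """Return (provider_name, upstream) from the first source that has a route.

    Sources are tried in the order given. A source that is unreachable, or
    that answers with no route, is skipped so that one provider's failure
    does not decide the outcome on its own.
    """
    for name, lookup in sources:
        try:
            upstream = lookup(host)
        except SourceUnavailable:
            continue  # this provider is down; try the next one
        if upstream is not None:
            return name, upstream
    return None  # no source could supply a route
```

Trying sources in a fixed order keeps the sketch short. A real system would also have to decide how the copies stay consistent and which source wins when they disagree, which this example does not address.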