Major Azure Outage Disrupts VMs and Identity Services for Over 10 Hours
Microsoft’s Azure cloud platform experienced a significant outage that lasted over 10 hours, starting Monday evening and extending into Tuesday morning. The disruption affected core parts of Azure’s infrastructure, causing widespread issues for enterprise customers. During this time, users faced problems deploying virtual machines and managing identities across multiple regions.
What Caused the Outage
The root cause was a policy change that was unintentionally applied to a subset of Microsoft-managed storage accounts. These accounts host virtual machine extension packages, which are needed for VM setup and updates. The change blocked public read access to the accounts, which prevented many VM operations from completing.
Customers reported errors during VM provisioning and scaling, with some unable to create or modify virtual machines. The outage also impacted other services that depend on these storage accounts, including Azure Kubernetes Service, Azure DevOps, and GitHub Actions. Performance issues were seen when downloading extension packages, further complicating recovery efforts.
Impact on Identity and Other Services
Microsoft deployed an initial fix within about two hours. However, this workaround triggered a secondary issue involving Managed Identities for Azure Resources: customers attempting to create, update, or delete resources, or to obtain identity tokens, began experiencing authentication failures.
This second problem was linked to a traffic spike that overwhelmed the managed identities platform, especially in the East US and West US regions. As a result, many Azure services that rely on managed identities, such as Azure Synapse Analytics, Azure Databricks, and Azure Container Apps, faced disruptions. Microsoft first tried to absorb the overload by scaling infrastructure; when that proved insufficient, the company removed traffic from the affected systems so it could repair the underlying issues.
According to Microsoft’s status updates, the outage affected a broad set of services, including Azure Firewall, Azure AI Video Indexer, and Azure Database for PostgreSQL. The incident showed how a single misapplied policy change can ripple across a cloud ecosystem and cause widespread operational failures.
Industry experts note that outages of this scale are becoming more common among major cloud providers. As cloud infrastructure grows more complex, maintaining uptime and stability becomes harder. The incident highlights why organizations need contingency plans and resilient architectures to cope with such disruptions.
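One small, client-side piece of such resilience is retrying transient failures with exponential backoff and random jitter, so that many clients do not all retry at the same moment against a service that is already overloaded. The sketch below is a generic illustration, not something Microsoft described or recommended for this incident. The names `backoff_delays`, `call_with_backoff`, `operation`, and `is_transient` are hypothetical, and the default delays are arbitrary assumptions.

```python
import random
import time


def backoff_delays(base=0.5, cap=30.0, retries=5, rng=random.random):
    """Yield one "full jitter" delay per retry.

    Each delay is rng() * min(cap, base * 2**n), so the upper bound doubles
    on every retry until it reaches the cap (all values are assumptions).
    """
    for attempt in range(retries):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(operation, is_transient, sleep=time.sleep, **backoff):
    """Call operation(); on a transient error, wait a jittered delay and retry.

    operation    -- hypothetical zero-argument callable, e.g. a token request
    is_transient -- hypothetical predicate deciding whether an error is retryable
    """
    delays = backoff_delays(**backoff)
    while True:
        try:
            return operation()
        except Exception as error:
            if not is_transient(error):
                raise  # permanent failure: do not retry
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

A caller would wrap whatever request it makes, for example `call_with_backoff(request_token, lambda e: isinstance(e, TimeoutError))`, where `request_token` is a placeholder for the application's own function. Backoff only reduces pressure from retries; it does not replace broader measures such as multi-region failover.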