When Microsoft 365 Services Are Interrupted, You Need Better Visibility
Written by Microsoft MVP, Nick Cavalancia.
In the past two weeks, Microsoft has experienced three service outages. This unlikely run of events demonstrates that every cloud services vendor is susceptible to interruptions in service that impact customers worldwide. According to Microsoft Service health status updates and social media posts, Teams, Exchange Online, Outlook.com, SharePoint Online and OneDrive were all potentially impacted in the most recent outage on October 7. Apparently, “a recent update to network infrastructure resulted in impact to Microsoft 365 services,” as the Service health update put it.
This comes on the heels of Microsoft 365 having issues authenticating users to Azure Active Directory back on September 28, with Microsoft initially blaming a software code issue. Users were experiencing extremely long login delays or unsuccessful logins altogether.
It’s great that Microsoft provides a means by which its users can easily stay updated. And this high-level information probably suffices when it’s everyone that’s impacted. But what happens when the issue or the impact is something far more granular; that is, what happens if the problem is just, say, Teams meetings, and only for users in one specific geography? Being given generic service outage details doesn’t help organizations make business decisions to keep operations moving. Organizations relying on Microsoft 365 to operate need much better granularity so that any needed responses can be swift and purposeful.
One method is the use of monitoring services via Synthetic Transactions, which I’ve written about before. By automating the same activity as an actual user, organizations can determine exactly how the organization is being impacted and, in some cases, do so well before the outage actually has an effect on productivity.
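To make the idea concrete, here is a minimal sketch of what a single synthetic transaction might look like in Python. It is not any vendor's implementation: the names `ProbeResult`, `run_probe` and `is_degraded` are hypothetical, and the `action` callable stands in for whatever scripted user activity (a sign-in, a file open, a test call) you would actually automate.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    """Outcome of one scripted user action (hypothetical structure)."""
    service: str
    location: str
    ok: bool
    latency_s: float
    error: Optional[str] = None


def run_probe(service: str, location: str, action: Callable[[], None],
              clock: Callable[[], float] = time.perf_counter) -> ProbeResult:
    """Run one scripted action, recording whether it succeeded and how long it took."""
    start = clock()
    try:
        action()  # assumption: the action raises an exception on failure
    except Exception as exc:
        return ProbeResult(service, location, False, clock() - start, repr(exc))
    return ProbeResult(service, location, True, clock() - start)


def is_degraded(result: ProbeResult, max_latency_s: float) -> bool:
    """Treat a failed or unusually slow action as a sign of degraded service."""
    return (not result.ok) or result.latency_s > max_latency_s
```

Run on a schedule from each location you care about, results like these are what allow a slow or failing sign-in to be spotted before users start calling. The latency threshold is something each organization would have to choose for itself.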
Take the example of measuring Teams call quality (shown below as a measure of an average “opinion score” of the quality of calls). By understanding both what and who is impacted, as well as where the impact is being felt, IT teams can take immediate action – for example, pushing out an alternative solution for digital calls or online meetings.
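As a rough illustration of the "what, who and where" point, the sketch below averages call-quality scores per location and lists the locations that fall under a cutoff. The function names, the sample figures and the 3.5 cutoff are all assumptions made for the example, not values taken from Microsoft or from any monitoring product.

```python
from collections import defaultdict
from statistics import mean


def average_score_by_location(samples):
    """Average the opinion scores for each location.

    `samples` is an iterable of (location, score) pairs.
    """
    grouped = defaultdict(list)
    for location, score in samples:
        grouped[location].append(score)
    return {loc: round(mean(scores), 2) for loc, scores in grouped.items()}


def locations_below(averages, threshold):
    """Return, in alphabetical order, the locations whose average is under the threshold."""
    return sorted(loc for loc, avg in averages.items() if avg < threshold)


# Illustrative data only
samples = [("London", 4.2), ("London", 4.4), ("Sydney", 2.0), ("Sydney", 3.0)]
averages = average_score_by_location(samples)
affected = locations_below(averages, threshold=3.5)  # 3.5 is an assumed cutoff
```

With a breakdown like this, a decision to push an alternative calling solution can be limited to the locations that actually need it.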
Synthetic Transactions can be configured to represent specific locations, operating systems, connection types, network routes, and more, giving organizations far more visibility into how an outage affects their users.
At the same time, just because there’s an official outage, it doesn’t mean that it explains what one of your users is experiencing. Let’s say the October 7th outage had only impacted Teams, but one of your users called the service desk to report that their Exchange Online-based email isn’t working. The assumption might be that it’s all tied together. Instead, with synthetic transactions, IT has the visibility to see if specific services are working, and whether it’s just that one user. It could turn out that it’s the user’s VPN configuration routing them internally, rather than utilizing a split tunnel configuration to allow Microsoft 365 traffic to flow directly to Microsoft. In this instance, there is something IT can do, but without the needed granularity, they’ll never know.
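The service desk reasoning in that scenario can be sketched as a small decision function. This is a deliberately simplified illustration: `triage` and its return strings are hypothetical, and it assumes you already know which services the provider has reported as impacted and whether a synthetic transaction for the same service is succeeding from elsewhere in your environment.

```python
def triage(reported_service, services_in_official_outage, probe_ok_elsewhere):
    """Suggest where to look first when a user reports a problem with a service.

    reported_service: the service the user says is not working
    services_in_official_outage: services the provider lists as impacted
    probe_ok_elsewhere: True if a synthetic transaction for the same
        service is currently succeeding from another location or route
    """
    if reported_service in services_in_official_outage:
        return "covered by the official outage"
    if probe_ok_elsewhere:
        # The service works for others, so look at this user's own path,
        # for example a VPN that is not split-tunneling Microsoft 365 traffic
        return "likely local to the user"
    return "possible wider issue not yet reported"
```

In the example above, `triage("Exchange Online", {"Teams"}, probe_ok_elsewhere=True)` would point toward the user's own setup rather than the Teams outage.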
As with any cloud-based application delivered as a service, service availability is the responsibility of the provider. However, ultimately the organization looks to IT to ensure service quality. And, given the critical reliance put upon a service like Microsoft 365, as well as the importance of your users’ productivity when using it, these three outages may be a sign it’s time to take matters into your own hands (as much as is possible). Whether the issue lies with Microsoft or is rooted somewhere in your organization’s infrastructure, you need to look for ways to granularly monitor Microsoft 365 usage. The goal is insightful, actionable detail that helps you act proactively to reduce the impact of an outage and improve the organization’s productivity during one.
To learn more about what it takes to properly monitor Microsoft 365 and understand the state of your users’ experience with it, read the whitepaper 5 Times You Should Be Monitoring the Office 365 User Experience.