What’s the Best Way to Monitor Microsoft Office 365?
Written by Nick Cavalancia, Microsoft Cloud & Datacenter MVP
The need for visibility into service availability and delivery quality has led to rising interest in monitoring Microsoft’s Office 365 services from the user perspective. With two different approaches available, what value does each bring?
As with all cloud-based services, the internal IT team responsible for ensuring users can access Office 365 services is at a disadvantage. When there’s an issue with a user connecting to, say, Microsoft Teams, internal IT has very little information to go on to determine the root cause, let alone the steps to remediate it. Sure, Microsoft offers service status detail and can provide some degree of visibility via logging, but that is simply not enough for IT teams to respond quickly and accurately when a service quality issue arises.
Add to this the remote worker factor: users working from home add another layer of complexity to determining whether a Microsoft 365 service delivery issue is a Microsoft problem or just slow home WiFi.
Solving the Visibility Problem with the User
The challenge is that internal IT is on the outside looking in, with the user on one end saying there’s a problem and Microsoft on the other end providing less-than-granular status detail.
But there is one perspective that can provide IT with far more visibility into the source of a service delivery issue – the user. Because the user interacts with the client application, the network, authentication, and a variety of Office 365 services, there’s an opportunity there to gain actionable insight into what is and isn’t working – and what to do about it.
So, if internal IT can somehow look at the user’s experience with all the components involved in delivering Office 365, it is possible to regain much of the lost visibility.
Real User Monitoring and Synthetic Transactions
Two approaches have evolved over the last few years to give organizations visibility into the user’s experience with Office 365 services.
- Real User Monitoring (RUM) provides insight into how an individual user interacts with an Office 365 service via an agent on the user’s endpoint. This method focuses on the user’s interaction with specific services and the quality of service the Microsoft 365 cloud delivers for them.
- Synthetic Transaction monitoring (ST) uses robot agents installed on separate systems on a per location, connection, network, or geography basis to simulate Microsoft 365 user activity, continually testing Office 365 workloads to help identify drops in service quality, providing detail on scope, location, and service impact.
These two approaches, while both looking at service delivery from the user’s perspective, are distinct technologies that provide different visibility and value to an organization.
Pros of RUM: Intended to truly represent the experience of a given user.
Cons: If the user isn’t actively using Microsoft 365, there’s no data to help IT proactively know an issue exists. Doesn’t provide any context around why there is a service delivery issue.
Solid Use Case: When you want to monitor your CEO’s laptop and know whether they specifically are experiencing any issues.
Using a robot agent installed on a system representative of one or more users (think users in one location, using WiFi vs. wired, in a particular geo, etc.), synthetic transactions mimic real user interaction with Office 365, including authentication, file upload/download, and use of individual Office 365 services and functions. Measuring transaction performance gives organizations experiential metrics that can be used to identify performance issues and the root causes of problems. The value of synthetic transactions is threefold. First, they monitor service delivery down to specific functions, like whether a Teams meeting can be scheduled or an email can be sent. Second, they provide visibility into performance degradation for a broader group of users, often before it becomes a real issue. And third, robots can be placed to help isolate whether the problem is related to internal infrastructure, network routing, etc.
Pros: Robots are always running, so monitoring is continuous, allowing IT to proactively be made aware of issues before users are impacted. Monitoring is incredibly granular, allowing for actionable insight.
Cons: Can’t truly reflect a specific user; only their environment and connectivity.
Solid Use Case: Wanting to proactively monitor the organization’s use of Microsoft 365 with visibility into what’s not working, and what subset of the user base is impacted.
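To make the synthetic transaction idea above more concrete, here is a minimal Python sketch of what a robot might do on each pass: run a series of steps, time each one, and flag any step that fails or exceeds a threshold. Everything named here is an assumption for illustration. The step functions are empty stand-ins (not real Office 365 or Microsoft Graph calls), and the thresholds are arbitrary; a commercial product would do far more.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepResult:
    name: str
    ok: bool          # did the step complete without raising?
    seconds: float    # measured duration of the step
    degraded: bool    # failed, or slower than its threshold


def run_synthetic_transaction(
    steps: Dict[str, Callable[[], None]],
    thresholds: Dict[str, float],
    clock: Callable[[], float] = time.perf_counter,
) -> List[StepResult]:
    """Run each step once, in order, timing it and flagging problems."""
    results = []
    for name, action in steps.items():
        start = clock()
        try:
            action()
            ok = True
        except Exception:
            ok = False
        elapsed = clock() - start
        # Steps with no configured threshold are never flagged as slow.
        limit = thresholds.get(name, float("inf"))
        results.append(StepResult(name, ok, elapsed, (not ok) or elapsed > limit))
    return results


# Hypothetical stand-ins for real workload actions.
def sign_in() -> None:
    pass


def upload_file() -> None:
    pass


def schedule_meeting() -> None:
    raise TimeoutError("simulated failure")


for result in run_synthetic_transaction(
    {"sign_in": sign_in, "upload_file": upload_file, "schedule_meeting": schedule_meeting},
    {"sign_in": 1.0, "upload_file": 2.0, "schedule_meeting": 2.0},
):
    print(result)
```

Because the robot runs on a schedule whether or not anyone is working, a failing or slow step can surface before a user reports it.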
What’s Best for a Remote Workforce?
Because so many organizations have a material percentage of their employees working remotely, there’s a specific need to ensure these users, who rely heavily on Microsoft 365 as their virtual workspace, are productive and are having a quality experience with it. Both service quality monitoring approaches have a role here but offer differing value.
RUM helps the organization determine at a high level whether the individual is having a slower experience but provides no detail into why. This is because RUM isn’t aware of how the user is routing to the Microsoft 365 cloud – are they using a VPN and going through the corporate network? Does the corporate network have traffic scanning solutions that create latency? And is the corporate network infrastructure having any issues itself?
Synthetic transactions can be implemented to provide the missing detail around the user experience for those connecting through the corporate network (and do so proactively, even when no users are connecting to Microsoft 365). However, they can only indicate an issue for a particular user by inference, and only when that user is taking the monitored path.
In short, the synthetic transaction approach will provide more detail, but RUM still provides value by making IT aware of the individual user’s current state.
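One way to picture the path question raised above is to compare the same synthetic step measured over two routes: directly over the internet and through the corporate network (for example, via VPN). The sketch below is only an illustration under assumed inputs. The function name, the baseline, the 1.5x tolerance, and the sample latencies are all hypothetical.

```python
from statistics import median
from typing import Sequence


def locate_slowdown(
    direct_ms: Sequence[float],
    corporate_ms: Sequence[float],
    baseline_ms: float,
    tolerance: float = 1.5,
) -> str:
    """Compare median latency on two paths against a baseline.

    Returns one of: "healthy", "corporate_path", "direct_path", "upstream".
    """
    limit = baseline_ms * tolerance
    direct_slow = median(direct_ms) > limit
    corporate_slow = median(corporate_ms) > limit
    if direct_slow and corporate_slow:
        # Both routes are slow, so the cause is probably beyond either path.
        return "upstream"
    if corporate_slow:
        return "corporate_path"
    if direct_slow:
        return "direct_path"
    return "healthy"


# Made-up samples: the direct route looks normal, the VPN route does not.
print(locate_slowdown([200, 210, 190], [900, 950, 1000], baseline_ms=200))
```

A result pointing at the corporate path would suggest looking at the VPN, traffic scanning, or internal infrastructure before blaming Microsoft 365 itself.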
“Vs.” or “And”?
Both methods provide more visibility than native Microsoft services alone. So, is it a case of one or the other? In short, no. Both methods provide IT with valuable visibility and detail around whether users are having issues with Microsoft 365. The choice of which to use really comes down to your objectives. If it’s more about monitoring a specific individual where service and function granularity isn’t needed, RUM may be a better choice. But if it’s more about the org as a whole and/or needing visibility into specific Office 365 functions, synthetic transactions are the right choice.
It also depends on the Microsoft service. Take the case of Teams, which technically utilizes a number of additional Microsoft services including OneDrive for Business and SharePoint to, in total, present itself as Teams. RUM (using either approach mentioned previously) would not provide any insight into which back-end services are having issues. An organization relying on RUM will optimally also need synthetic transaction data to provide context and color to highlight what parts of Teams are having issues.
The Teams example makes the case that using both approaches together provides even greater visibility. Assuming solution integrations exist where data can be shared, the combination would give organizations complete visibility into whether the organization, down to the specific user, is experiencing service quality issues, and would let IT correlate the two sets of data to quickly determine what’s not working and who specifically is impacted.
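As a rough illustration of that correlation, the Python sketch below joins two hypothetical data sets by location: degraded workloads reported by synthetic robots and slow sessions reported by RUM agents. The record shapes, the function name, and the sample values are assumptions made for this example; real products would expose their own formats and integrations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def correlate(
    degraded: Iterable[Tuple[str, str]],
    slow_sessions: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Join synthetic and RUM findings by location.

    degraded:      (location, workload) pairs flagged by synthetic robots.
    slow_sessions: (user, location) pairs flagged by RUM agents.
    """
    workloads_by_location = defaultdict(set)
    for location, workload in degraded:
        workloads_by_location[location].add(workload)

    users_by_location = defaultdict(set)
    for user, location in slow_sessions:
        users_by_location[location].add(user)

    report = {}
    for location, workloads in workloads_by_location.items():
        report[location] = {
            "degraded_workloads": sorted(workloads),
            "affected_users": sorted(users_by_location.get(location, set())),
        }
    return report


# Made-up sample data for illustration only.
print(correlate(
    degraded=[("London", "SharePoint"), ("London", "OneDrive")],
    slow_sessions=[("ana", "London"), ("raj", "Paris")],
))
```

The synthetic side answers "what is not working", the RUM side answers "who is feeling it", and the join puts the two answers side by side.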