Originally posted on Savision.com
What is SCOM?
We often get asked what SCOM is. For all the IT newbies out there, here is a “SCOM for Dummies” description! System Center Operations Manager (SCOM for short) is Microsoft’s software solution for infrastructure and application monitoring.
It is part of a suite of products within Microsoft System Center. This mix of technically independent point products is aimed at corporate IT operations¹ and is used to manage a complex, cloud-based network of server and client desktop systems. The 6 main component products within System Center 2012 R2 (Microsoft’s latest version) are:
- Operations Manager (SCOM): Infrastructure and application monitoring
- Configuration Manager (SCCM): Configuration management, hardware/software asset management, patch deployment tools for Windows desktops
- Service Manager (SCSM): Asset tracking as well as incident, problem, change and configuration management (code name: Service Desk). Ties in with SCOM and SCCM
- Virtual Machine Manager (SCVMM): Virtual-machine management and datacenter virtualization
- Data Protection Manager (SCDPM): Continuous data protection and data recovery
- Azure Operational Insights (AOI): Software-as-a-service offering that helps assess or change the configuration of Microsoft server software over the Internet (previously called System Center Advisor)
How does SCOM work?
System Center Operations Manager is a software product that passively and actively monitors an organisation’s IT environment, whether hardware or software based. It does this by placing agents² on servers. Each agent continually checks the performance of everything running on its physical server, whether that is network or storage components or defined applications operating on the server. If an agent detects that performance has degraded below predefined levels, it sends an alert to the Management Server³ running SCOM, to be actioned by the IT administrator.
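The agent-to-Management-Server flow described above can be illustrated with a small sketch. This is not how SCOM itself is implemented; it is a simplified model in Python, and every name in it (`Alert`, `ManagementServer`, `run_agent`, the metric names and the limits) is hypothetical. It also assumes that "degradation" means a usage metric rising above a predefined limit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Alert:
    """A small message telling a human that something may need attention."""
    server: str
    metric: str
    value: float
    limit: float


class ManagementServer:
    """Hypothetical stand-in for the central server that collects alerts."""

    def __init__(self) -> None:
        self.alerts: List[Alert] = []

    def receive(self, alert: Alert) -> None:
        self.alerts.append(alert)


def check_metric(server: str, metric: str, value: float, limit: float) -> Optional[Alert]:
    """Return an Alert if the sampled value breaches its limit, else None.

    Assumption: a breach means the value is above the limit (e.g. CPU usage).
    """
    if value > limit:
        return Alert(server, metric, value, limit)
    return None


def run_agent(
    server: str,
    samples: Dict[str, float],
    limits: Dict[str, float],
    management_server: ManagementServer,
) -> None:
    """Compare each sampled metric with its predefined limit and forward alerts."""
    for metric, value in samples.items():
        limit = limits.get(metric)
        if limit is None:
            continue  # no predefined level for this metric, so nothing to check
        alert = check_metric(server, metric, value, limit)
        if alert is not None:
            management_server.receive(alert)
```

Calling `run_agent` with one set of samples for a server leaves the central `ManagementServer` object holding only the alerts for metrics that breached their limits. The agent only reports the problem and does not attempt to fix it.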
The part of SCOM that most people see is a list of alerts. SCOM is able to monitor thousands of components in real time, and the result of this continual monitoring is a stream of SCOM alerts: small individual messages to a human (usually the IT Operations Admin) that something is not right in the IT environment. SCOM does not remedy the situation; it just tells the user that something might need looking at, and might need fixing. This is a standard view a SCOM user receives:
SCOM is a critical tool for most organisations that need to act upon IT problems and outages as soon as they occur. Any downtime can cost a company money, which means getting to the root cause and fixing problems is of prime concern to all IT Operations teams. SCOM is a very technical tool to use, and it is sometimes hard to find the most important alerts to focus on in a sea of thousands. This general problem is known as “Alert Noise”, a form of the 80:20 rule: in general, over 80% of the alerts generated are not critical and do not need immediate action, whereas a smaller portion require a fast response. To overcome this problem, an IT operations team will need some visualisation aids in the form of dashboards. These dashboards can take performance and alert metrics from SCOM and bring attention to the ones having the largest impact.
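As a rough illustration of cutting through alert noise, the sketch below ranks a list of alerts so that the most severe and most impactful ones come first, which is the kind of ordering a dashboard might present. It is only a sketch under assumptions: the alert dictionaries, the `impact` score and the function names are hypothetical and do not come from SCOM or any dashboard product.

```python
from collections import Counter
from typing import Dict, List

# Lower rank means more urgent. The severity names are assumed for illustration.
SEVERITY_RANK = {"critical": 0, "warning": 1, "information": 2}


def prioritise(alerts: List[Dict], limit: int = 5) -> List[Dict]:
    """Return the `limit` most urgent alerts: by severity, then by highest impact."""
    def sort_key(alert: Dict):
        rank = SEVERITY_RANK.get(alert["severity"], len(SEVERITY_RANK))
        return (rank, -alert["impact"])

    return sorted(alerts, key=sort_key)[:limit]


def severity_summary(alerts: List[Dict]) -> Counter:
    """Count alerts per severity, e.g. to show how much of the list is noise."""
    return Counter(alert["severity"] for alert in alerts)
```

A dashboard built on this idea would show the output of `prioritise` prominently and use `severity_summary` to show how many lower-severity alerts are waiting behind it.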
1 IT Operations: The set of all processes and services that are both provisioned by an IT staff to their internal or external clients and used by themselves, to run themselves as a business.
2 Agents: A piece of software that sits on a physical server and collects performance information for tasks and workloads running on that server.
3 Management Server: The main dedicated server that runs the SCOM software and collects all information sent to it from other servers and agents.