How to Use APM to Hold Cloud Providers Accountable
Source – devops.com
Here’s the challenge: Cloud providers’ service level agreements (SLAs) typically stop at the edge of their cloud, while your internal service level expectations run end to end. Measuring performance from the end user to the application and back, including device details, user productivity and page load times, is essential for customer satisfaction and continuous performance tuning. How long does it take to launch an online app, open a work order or complete a claim form? What are execution and completion rates in different locations? The metrics and associated metadata provided by application performance monitoring (APM) tools can help your DevOps teams identify actual performance issues, troubleshoot them efficiently, optimize cloud spend and enforce service level agreements.
Automate Data Collection
Collecting performance details for multiple web browsers, operating systems and devices is impractical if you have to manually tag and instrument the code. Modern APM solutions automatically deploy JavaScript snippets, page tags or agents to your web pages, Java and .NET applications and mobile apps. Small binary agents and efficient compression algorithms keep the performance impact on the end user device and the network to a minimum as they send telemetry to the APM back end. Your DevOps team can then work with the actual attributes of every device, including manufacturer, model, operating system type and version, and resource consumption, to monitor the end user experience across IaaS, PaaS, SaaS and hybrid environments.
Early Warnings with APM
To trigger early warnings, APM systems automatically determine an application baseline, or you can manually set specific thresholds. Typically, internal thresholds are set up to provide early warnings of possible issues before users are negatively impacted, while external thresholds can be linked to your actual SLA terms.
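As a rough illustration of the two kinds of thresholds, the sketch below derives an internal early-warning level from historical response times and checks new measurements against both it and an external SLA limit. The function names, the mean-plus-two-standard-deviations rule and the sample values are assumptions made for this example; commercial APM tools use their own baselining methods.

```python
from statistics import mean, stdev


def baseline_thresholds(samples_ms, warn_sigma=2.0, sla_ms=None):
    """Derive a baseline and an internal warning level from past response times.

    warn_sigma is an illustrative choice: warn when a response is more than
    warn_sigma standard deviations slower than the historical mean.
    sla_ms is the external limit taken from the provider contract, if any.
    """
    base = mean(samples_ms)
    spread = stdev(samples_ms)
    return {
        "baseline_ms": base,
        "internal_warn_ms": base + warn_sigma * spread,
        "external_sla_ms": sla_ms,
    }


def classify(response_ms, thresholds):
    """Label one measurement as ok, early_warning or sla_breach."""
    sla = thresholds["external_sla_ms"]
    if sla is not None and response_ms > sla:
        return "sla_breach"
    if response_ms > thresholds["internal_warn_ms"]:
        return "early_warning"
    return "ok"


if __name__ == "__main__":
    history = [100, 110, 90, 105, 95]  # made-up response times in ms
    limits = baseline_thresholds(history, sla_ms=200)
    for value in (105, 120, 250):
        print(value, classify(value, limits))
```

Keeping the internal level well below the SLA limit is what gives the team time to react before a contractual breach occurs.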
SLA reports built on these thresholds are more than colorful views. By using dashboards that color-code SLA status based on the end user experience, you can better enforce service levels with your cloud providers. For triage purposes, you can break out details by department, geography, operating system, device type, target server, carrier and other characteristics as needed. You can also see your compliance measurements at a glance. These show the percentage of response times that meet the targets agreed with your cloud providers, which is the evidence you need to hold them accountable for performance.
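The compliance measurement described above is simple arithmetic: the share of response times at or under the agreed target, computed per segment. A minimal sketch follows, assuming each measurement is a dictionary with a `response_ms` field plus whatever attributes you want to group by; the field names and data are hypothetical.

```python
from collections import defaultdict


def sla_compliance(measurements, target_ms, group_by="geography"):
    """Return the percentage of responses meeting target_ms for each group."""
    totals = defaultdict(int)
    within = defaultdict(int)
    for m in measurements:
        key = m[group_by]
        totals[key] += 1
        if m["response_ms"] <= target_ms:
            within[key] += 1
    return {key: 100.0 * within[key] / totals[key] for key in totals}


if __name__ == "__main__":
    data = [  # made-up sample measurements
        {"geography": "EU", "response_ms": 300},
        {"geography": "EU", "response_ms": 700},
        {"geography": "US", "response_ms": 200},
        {"geography": "US", "response_ms": 400},
    ]
    print(sla_compliance(data, target_ms=500))
```

Changing `group_by` to another attribute, such as device type or carrier, gives the same breakdown along a different triage dimension.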
Streamlining Remediation
Automation is key to reducing mean time to repair (MTTR). When an alert is triggered, APM tools integrate with your service desk systems to automatically open trouble tickets, identify probable causes and analyze the business impact for prioritization. Because each ticket arrives with the context needed for troubleshooting, this integration significantly reduces the delays historically associated with performance issues. Advanced APM solutions can also automate remediation of common end user experience issues without involving service desk staff.
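To make the alert-to-ticket handoff concrete, here is a small sketch that turns an alert into a ticket record prioritized by business impact. Everything in it is hypothetical: the field names, the priority labels and the user-count cutoffs are assumptions for illustration, and a real integration would send this payload through your service desk product's own API.

```python
def build_ticket(summary, affected_users, probable_causes):
    """Build a ticket record, using affected user count as the impact measure.

    The cutoffs (100 and 10 users) are illustrative, not a standard.
    """
    if affected_users >= 100:
        priority = "P1"
    elif affected_users >= 10:
        priority = "P2"
    else:
        priority = "P3"
    return {
        "summary": summary,
        "priority": priority,
        "affected_users": affected_users,
        "probable_causes": list(probable_causes),
    }


if __name__ == "__main__":
    ticket = build_ticket(
        "Login latency above SLA",
        affected_users=150,
        probable_causes=["DNS resolution", "provider region degradation"],
    )
    print(ticket)
```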
Comparing Before and After
Another useful outcome of these detailed metrics is the ability to compare parameters before and after a change. By baselining performance, APM tools help you document and validate the actual end user experience impact of code releases, device changes or service provider upgrades. It’s not uncommon for users to complain after a new release, but with APM reports you can take a data-driven approach and investigate those complaints by comparing different time intervals. You can start by exploring various characteristics and isolating the affected group based on device health, network subnet, geography, server or cloud provider. Once you have determined whether the problem is within your control (code, architecture, etc.) or your service provider’s, you can have more productive discussions with the provider and demonstrate exactly where and when an SLA violation occurred.
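A before-and-after comparison of this kind can be sketched as a per-segment difference in median response time between two time windows. The record layout, the choice of median as the statistic and the sample numbers are assumptions for this example.

```python
from statistics import median


def compare_windows(before, after, group_by):
    """Return the change in median response time (ms) for each segment.

    Only segments present in both windows are compared.
    """
    def medians(rows):
        groups = {}
        for row in rows:
            groups.setdefault(row[group_by], []).append(row["response_ms"])
        return {key: median(values) for key, values in groups.items()}

    b, a = medians(before), medians(after)
    return {key: a[key] - b[key] for key in b.keys() & a.keys()}


if __name__ == "__main__":
    before = [  # made-up measurements from the week before a release
        {"subnet": "A", "response_ms": 100},
        {"subnet": "A", "response_ms": 120},
        {"subnet": "B", "response_ms": 200},
        {"subnet": "B", "response_ms": 220},
    ]
    after = [  # made-up measurements from the week after
        {"subnet": "A", "response_ms": 110},
        {"subnet": "A", "response_ms": 130},
        {"subnet": "B", "response_ms": 400},
        {"subnet": "B", "response_ms": 420},
    ]
    print(compare_windows(before, after, group_by="subnet"))
```

In this made-up data, subnet B slows by far more than subnet A, which would point the investigation at whatever is specific to B rather than at the release as a whole.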
Keeping Track of SaaS Applications
As you expand use of SaaS applications such as Office 365, end user experience monitoring and SLA management can be used to hold these providers accountable. Slow application launches, logins, page loads and other performance issues can hurt user productivity, adoption rates and deployment schedules, costing the organization support resources and unused license fees. SaaS-based APM tools address this by giving you immediate usage measurements, so you can quickly adjust license quantities and optimize costs. Real-time monitoring of user wait times, crashes, device health and other experience metrics shows your team how performance, device type or operating system version is affecting employee productivity. This information can be used to quickly identify problem areas, validate migration plans and prioritize future investments.
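One way usage measurements could feed license decisions is to flag seats that have not been used within a given period. The sketch below assumes you can export a mapping of user to last-use date; the 30-day idle window and the user names are illustrative assumptions, not a vendor recommendation.

```python
from datetime import date, timedelta


def unused_licenses(last_used, today, idle_days=30):
    """Return users whose license was never used or has been idle too long.

    last_used maps a user name to a date, or to None if never used.
    """
    cutoff = today - timedelta(days=idle_days)
    return sorted(
        user for user, used_on in last_used.items()
        if used_on is None or used_on < cutoff
    )


if __name__ == "__main__":
    usage = {  # made-up export of last-use dates
        "alice": date(2024, 5, 28),
        "bob": date(2024, 3, 1),
        "carol": None,
    }
    print(unused_licenses(usage, today=date(2024, 6, 1)))
```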
Monitoring for Accountability
Getting what you pay for from your cloud providers is much easier if you incorporate detailed user experience measurements into your SLAs. A best practice is to monitor everything that happens on every device across the organization, such as how long it takes to boot up, log in, open specified apps and perform business-critical tasks. If any of these are off target, your APM solution can open and prioritize a trouble ticket with a list of probable causes before the affected users call support. Automated end user experience monitoring, reporting and policy management, backed by self-instrumenting devices and apps, are essential links between a service provider’s commitment and your actual results.