An alert helps teams stay informed about activities occurring in a monitored system.
A Service Level Indicator (SLI) is a quantitative measurement of a metric that describes the level of service being provided. For example, an SLI might indicate that the current loading time of a webpage is 152 milliseconds.
A Service Level Agreement (SLA) is a contract with consumers about expected levels of service.
SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements) are used to tie system monitoring metrics to business goals and objectives.
A Service Level Objective (SLO) is a range of acceptable values for a metric. For example, an SLO might define that a webpage loads within 200 milliseconds of the user accessing it.
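The relationship between an SLI and an SLO can be sketched in a few lines of Python. This is only an illustrative sketch: the `SLO` class, its fields, and the metric name `page_load_ms` are hypothetical names chosen for this example, and the lower bound of 0 is an assumption. It checks the example measurements above against the 200 millisecond objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO: an inclusive range of acceptable values for a metric."""
    metric: str
    min_value: float
    max_value: float

    def is_met(self, sli_value: float) -> bool:
        # The SLI is the measured value; the SLO is the range it must fall in.
        return self.min_value <= sli_value <= self.max_value


# SLO from the example above: the page loads within 200 ms.
# The lower bound of 0 is an assumption made for this sketch.
page_load_slo = SLO(metric="page_load_ms", min_value=0.0, max_value=200.0)

# SLI from the earlier example: the current load time is 152 ms.
print(page_load_slo.is_met(152.0))  # True
# A slower, made-up measurement falls outside the objective.
print(page_load_slo.is_met(240.0))  # False
```

In this sketch the SLI is just a number and the SLO is the range it is compared against; an SLA would sit on top of this as the contractual commitment to consumers.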
In DevOps, monitoring tools are used to provide a continuous process of identifying, tracking, analyzing, and alerting on specific components of the system.
Improper monitoring can produce too many alerts that are not useful or actionable. These noisy alerts can cause staff to distrust alerting in general and to ignore the alerts that are actually useful.
Monitoring allows teams to watch and understand the state of their systems by gathering predefined metrics or logs.
When implementing monitoring, choose metrics that reveal both the health of the system and issues with the user experience.
Teams should also track metrics that measure the quality of their monitoring and observability. These metrics might include the number of improper alerts, the time taken to identify an issue, and the time taken to resolve it.
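As a rough illustration, these monitoring-quality metrics could be computed as follows. The `Incident` record, the function names, and the sample data are all hypothetical, and the sketch assumes that resolution time is measured from the moment of detection and that each alert has already been labeled as actionable or not.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record with three timestamps."""
    started: datetime
    detected: datetime
    resolved: datetime


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    # Average delay between an issue starting and monitoring identifying it.
    seconds = mean((i.detected - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def mean_time_to_resolve(incidents: list[Incident]) -> timedelta:
    # Average delay between detection and resolution (an assumed definition).
    seconds = mean((i.resolved - i.detected).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def improper_alert_ratio(actionable_flags: list[bool]) -> float:
    # Share of alerts that were not useful or actionable.
    improper = sum(1 for actionable in actionable_flags if not actionable)
    return improper / len(actionable_flags)


# Made-up sample data for illustration only.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5),
             datetime(2024, 1, 1, 10, 45)),
    Incident(datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 15),
             datetime(2024, 1, 2, 14, 35)),
]
alerts_actionable = [True, False, False, True]

print(mean_time_to_detect(incidents))           # 0:10:00
print(mean_time_to_resolve(incidents))          # 0:30:00
print(improper_alert_ratio(alerts_actionable))  # 0.5
```

Tracking these values over time gives a team a way to tell whether changes to alert rules or dashboards are actually reducing noise and shortening detection and resolution times.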
Observability is the degree to which the metrics of a system can be acted upon to locate and fix a problem.