Caleb received a Slack alert about an issue with one of the company’s backend applications. How do Caleb and his team track down the error in a complex maze of backend services?
Let’s take a look at the general steps Caleb’s team might go through when an issue arises:
1) Evaluate usage and performance data.
2) Identify the cause of the issue.
3) Apply the appropriate solution, restoring system performance.
Observability is the degree to which a system’s information can be used to locate and fix a problem. In a system with high observability, a team can more easily trace, diagnose, and fix the problem. With poor observability, the data does little to help.
To improve a system’s observability:
- Make sure the team is aligned on service level objectives
- Create meaningful alerts
- Optimize application logging by ensuring messages are informational and descriptive
- Automate work processes
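To make the logging point concrete, here is a minimal sketch using Python's standard `logging` module. It contrasts a vague message with a descriptive one that records what failed, for which request, and how long it took. The service name `checkout-service`, the helper `log_payment_failure`, and the field names (`order_id`, `upstream`, `status`, `elapsed_ms`) are hypothetical, chosen only for illustration.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes timestamped, levelled messages to `stream`."""
    logger = logging.getLogger("checkout-service")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called more than once
    logger.propagate = False  # keep output confined to our own handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def log_payment_failure(logger, order_id, upstream, status_code, elapsed_ms):
    """Log a failure with enough context to trace it (hypothetical fields)."""
    logger.error(
        "payment request failed order_id=%s upstream=%s status=%d elapsed_ms=%d",
        order_id, upstream, status_code, elapsed_ms,
    )


buffer = io.StringIO()
logger = build_logger(buffer)

# Vague: tells the on-call engineer almost nothing.
logger.error("Something went wrong")

# Descriptive: names the operation, the affected order, the dependency,
# the response status, and the latency.
log_payment_failure(logger, "A1001", "payments-api", 503, 2140)

print(buffer.getvalue())
```

With the second style of message, someone in Caleb's position could search the logs by `order_id` or `upstream` and see at a glance which dependency returned an error and how slowly it responded.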
Maintaining an observable system enables teams to proactively monitor for errors and track them down.
Instructions
What might be the effects of a system with very low observability?
We’ve learned about the important roles monitoring and observability play in our system. But how do we know we are doing a good job? Next, we will discuss how to measure the quality of our system’s monitoring.