Over the course of this lesson, we’ve discussed resiliency, a system’s ability to perform under problematic conditions. Common conditions we need our systems to be resilient against include:
- Internal failures, including hardware, software, and process breakdowns
- External failures, including APIs, external services, and third-party providers
- Malicious attacks, such as DDoS and SQL injection attacks
There are a variety of practices that make our systems more resilient. Some common practices include:
- Having backups for our infrastructure
- Limiting our crucial dependencies
- Validating client requests
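As a rough illustration of the last practice, the sketch below checks a client request before it reaches any business logic. The payload shape, the field names (`username`, `quantity`), and the limits are hypothetical and chosen only for this example. A real service would define its own rules.

```python
import re

# Hypothetical rules for this example: usernames are 3-20 word characters,
# and quantity is an integer from 1 to MAX_QUANTITY.
USERNAME_PATTERN = re.compile(r"\w{3,20}")
MAX_QUANTITY = 100


def validate_order_request(payload):
    """Return a list of validation errors; an empty list means the request passed."""
    errors = []

    username = payload.get("username")
    if not isinstance(username, str) or not USERNAME_PATTERN.fullmatch(username):
        errors.append("username must be 3-20 letters, digits, or underscores")

    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        errors.append("quantity must be an integer")
    elif not 1 <= quantity <= MAX_QUANTITY:
        errors.append(f"quantity must be between 1 and {MAX_QUANTITY}")

    return errors
```

Rejecting malformed input at the edge like this complements, but does not replace, defenses such as parameterized queries against SQL injection.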
In addition to using these methods, we also need to measure how resilient our systems are. We can use metrics such as the time needed to resolve an incident or the total downtime our system experiences over a given period. Another option is to simulate issues using techniques such as:
- Penetration testing and load testing
- Chaos engineering
- Disaster recovery exercises
Resiliency practices allow us to provide critical services even under adverse conditions. Building resiliency relies on DevOps cultural practices such as continuous experimentation and learning from failure. Let’s start building systems capable of weathering any storm!
Instructions
We’ve made it to the end of the lesson. Review what we’ve learned before moving on!