It’s been a hectic week for Karla. First, a power supply failure took down the main server. Then, on Wednesday, Karla’s company lost support for the payment processing service it has used for years. Today, the home page is receiving a suspiciously large volume of traffic that isn’t actually interacting with the site, which looks like a possible cyber-attack. What’s going to be thrown at this company next?
Encountering problems is an intrinsic part of working with software systems. However, we can make our systems significantly more resilient by designing them to handle the most common issues they will face.
Common system problems fall into the following categories:
- Internal problems: these problems come from within the components of the system that we control. Internal problems include in-house hardware issues and software bugs.
- External problems: these problems arise from dependencies we have on other parties outside of our control. External problems might include issues with an API or a cloud service that our application relies on.
- Malicious actors: these problems stem from other people (or sometimes bots) that seek to disrupt or exploit our services for a variety of reasons.
By managing these threats, we can make our systems more resilient!
Can you match the failures listed in the opening scenario to the types of problems described in this exercise? Here’s the scenario again (select “Answer” to see if you were right!):
- A power supply failure took down the main server.
- The company lost support for the payment processing service they have been using for years.
- The home page started receiving a suspiciously large amount of traffic that isn’t interacting with the site.
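One way to check your answer is to write the mapping down explicitly. The sketch below is only an illustration: `ProblemType`, `INCIDENTS`, and `group_by_type` are hypothetical names, and the labels assume that the main server is hardware the company runs itself and that the unusual traffic really is hostile.

```python
from enum import Enum


class ProblemType(Enum):
    """The three categories of problems described in this exercise."""
    INTERNAL = "internal problem"
    EXTERNAL = "external problem"
    MALICIOUS_ACTOR = "malicious actor"


# Hypothetical labels for the three incidents in Karla's week.
# Assumption: the main server is in-house hardware, so its failure is internal.
INCIDENTS = {
    "power supply failure on the main server": ProblemType.INTERNAL,
    "payment processing service lost support": ProblemType.EXTERNAL,
    "suspicious home page traffic": ProblemType.MALICIOUS_ACTOR,
}


def group_by_type(incidents):
    """Return a dict mapping each ProblemType to the incidents tagged with it."""
    grouped = {problem_type: [] for problem_type in ProblemType}
    for description, problem_type in incidents.items():
        grouped[problem_type].append(description)
    return grouped


if __name__ == "__main__":
    for problem_type, descriptions in group_by_type(INCIDENTS).items():
        print(f"{problem_type.value}: {', '.join(descriptions)}")
```

Tagging incidents like this is a simple bookkeeping choice, not a resilience technique in itself, but it makes it easier to see which category is causing the most trouble.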
The diagram depicts the three common types of problems we need to be resilient against.
Over the next several exercises, we will look at each of these types of problems in more detail and introduce methods to minimize their impact.