Any time you’ve refreshed your browser and gotten past a dreaded error page, you can probably thank a Site Reliability Engineer for fixing the problem.
Site Reliability Engineers are responsible for keeping websites functioning properly, ensuring that users can access a site and that they have a good experience using it. As you can imagine, plenty of websites need site reliability engineering, making it a flexible career path that can take you to almost any industry you’re interested in.
Read on to learn what site reliability engineering is, what skills you need for the job, and how you can get your career path started.
What is site reliability engineering?
The Site Reliability Engineer role was first defined by Google and is closely related to the DevOps Engineer role. Google assigned a group of engineers the responsibility of keeping its site efficient, scalable, and reliable. Soon, other large tech companies adopted these practices, and eventually it became a standard technical role industry-wide.
A big part of the role is auditing an enterprise’s current systems, finding the simple, mundane tasks, and automating them to reduce the errors that creep into manual processes. Automation also frees up engineers’ time so they can focus on improving the system.
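To make the idea of automating a mundane task concrete, here is a minimal Python sketch of one chore that often gets done by hand: finding old log files that are candidates for cleanup. The function name `find_stale_files`, the 30-day cutoff, and the file names are all hypothetical and chosen only for illustration. The demo runs against a temporary directory and only reports what it would delete.

```python
import os
import tempfile
import time
from pathlib import Path


def find_stale_files(directory, max_age_days, now=None):
    """Return files in `directory` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    old_log = Path(tmp) / "old.log"
    new_log = Path(tmp) / "new.log"
    old_log.write_text("stale")
    new_log.write_text("fresh")

    # Backdate one file's modification time by 40 days.
    forty_days_ago = time.time() - 40 * 24 * 60 * 60
    os.utime(old_log, (forty_days_ago, forty_days_ago))

    # Dry run: report the candidates instead of deleting them.
    for path in find_stale_files(tmp, max_age_days=30):
        print(f"would delete {path.name}")
```

Starting with a dry run like this is a common way to build trust in a script before letting it change anything on a real system.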
A Site Reliability Engineer is usually the first person to take a close look at any errors or outages that happen in a system and figure out the quickest and most appropriate way to fix the immediate problem. Afterward, they evaluate, plan, and do the work to prevent the issue from happening again.
It is also part of the job to set up logging, monitoring, and analytics to detect problems with a system before they affect the users of a website or system. Along with these preventative tasks, they will use chaos engineering and run load tests on a system to find the weak spots it might have before live traffic exposes them.
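As a rough illustration of what monitoring can look like at its simplest, the Python sketch below computes the share of server errors (HTTP 5xx responses) in a batch of recent requests and flags when that share crosses a threshold. The function names, the 5% threshold, and the sample data are assumptions made up for this example. Real monitoring tools do far more, but the core idea of comparing a measurement against a limit is the same.

```python
def error_rate(status_codes):
    """Return the fraction of responses that are server errors (5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def should_alert(status_codes, threshold=0.05):
    """Return True when the 5xx error rate is above the (assumed) threshold."""
    return error_rate(status_codes) > threshold


# Hypothetical status codes from the ten most recent requests.
recent = [200, 200, 503, 200, 404, 500, 200, 200, 200, 200]

print(f"5xx error rate: {error_rate(recent):.0%}")
print("alert" if should_alert(recent) else "ok")
```

Note that the 404 in the sample is not counted: a missing page is a client-side error, while 5xx codes signal that the server itself is struggling, which is usually what you want an alert to catch.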
Building your skills for Site Reliability Engineer jobs
A Site Reliability Engineer is usually a technology generalist who knows how to write software and manage IT infrastructure and networking. They also tend to have personality traits like:
- Impatient with mundane, repeatable tasks, which gives them the motivation to automate them
- Detail-oriented, which allows them to find problems with a system
- Systems-minded, able to see the big picture and how each part of a system contributes to the whole
- Interested in learning new things because technology is constantly changing, and there are always new problems to tackle
- Okay with working behind the scenes, writing code no users will see or even know exists to keep a system reliable
Site Reliability Engineers also need a lot of training since they touch almost every part of a system. Here are some skills you will need as a Site Reliability Engineer:
- Coding: You will definitely need to know shell scripting. Along with that, it helps to know other back-end programming languages like Python, Java, C, or Go.
- Writing: You will have to document the work you do, so it helps to know how to clearly describe the systems you’ve created.
- Version control: To track your code, you need to know a version control system like Git.
- Automation: Knowing how to automate testing and software builds with tools like Jenkins or Travis CI is a bonus.
- Container systems: Many modern software systems are deployed with container systems like Docker, so knowing the basics is necessary.
- Database systems: Site Reliability Engineers deal with the whole system, including databases like PostgreSQL.
- Monitoring tools: You should know how to set up logs, monitoring tools, and analytics to determine when a system is in trouble.
Of course, you don’t need to know all of these things to become a Site Reliability Engineer, but you should master at least a few. By looking at job listings, you can compare what you know to what a specific employer has requested to determine the gaps in your skillset and the training you need.
Getting through the hiring process
Once you have the skills you need to be a Site Reliability Engineer, it's time to look for a job. The first steps are to create a resume and a portfolio to show your coding skills. If you don’t already have one, you should also create a LinkedIn profile, because it is one of the top tools used by recruiters. Once you have applied for a job or have been contacted by a recruiter, you will usually have to go through the following hiring process steps to get the job.
Phone screens are an initial step in determining whether a candidate has the necessary skills and experience for a given role. If a recruiter contacts you, there may be two phone screens. If you’ve applied directly, there will likely be only one. In a short screening call, a recruiter will check that you have the skills needed for the job. Recruiters usually don’t have a technical background; their goal is simply to confirm your information before they send you to the hiring manager. The hiring manager will then call to ask about your skills and determine whether you will be a good fit for the team.
In-person or remote interview
Upon passing the phone screen(s), you will move on to one or more interviews that may last several hours each, over a period of days or weeks. It all depends on the job level you are applying for, the size of the company, and their specific hiring process.
In this interview or set of interviews, you will be asked to talk about your work history and the projects and technologies you have worked with. You will also be asked some behavioral questions to determine if you are a good fit for the company’s culture. For more information on what to expect in this step, check out our technical and behavioral interview tips.
Once you get through that interview and you and the company have determined the job is a good fit, the next interview will put your skills to the test. In it, you will be asked technical questions about the specific technologies you will work with on the job.
Your interviewer will most likely give you one or more code challenges that you will solve either in front of them or on an online testing platform on your own time. Some challenges will provide you with an interactive coding environment where you can execute your code and see the results. Others will be whiteboard challenges where you write code without executing it. If the interviewer is present, it’s important to talk through your solution as you work on it, because seeing how you solve a problem often matters more to them than whether you can solve it. Check out our complete guide on acing the technical interview for more tips.
Get started on your career path
Now that you have a sense of what it takes to become a Site Reliability Engineer and how the hiring process generally works, you can start building the technical skills you need. Check out our course catalog to find the training to fill in any gaps in your skills. Some great courses to start with are: