MongoDB Aggregation Pipelines: A Hands-on Tutorial

Codecademy Team
Learn how to analyze data using MongoDB aggregation pipelines.

Analyzing data and extracting metrics requires several tasks, such as filtering, sorting, and grouping. In MongoDB, we can perform these operations using aggregation pipelines.

Before we dive into the details, read this article on stages in MongoDB aggregation pipelines to get familiar with the individual stages and to prepare the dataset used in the examples.

Let’s discuss how to break complex analytical problems into smaller, manageable parts and how to utilize the different stages in MongoDB aggregation pipelines to gain insights from data. We’ll explore three analytical questions to illustrate how to work with MongoDB aggregation pipelines effectively.

What Are Aggregation Pipelines in MongoDB?

MongoDB aggregation pipelines consist of one or more stages that perform specific operations on documents. A stage in the MongoDB pipeline can filter data, group documents, sort them, or select specific fields from the documents.

  • We use the $match stage to filter the documents.
  • We use the $group stage to group the data by values in specific fields.
  • We use the $sort stage to sort the documents in the data by a particular field.
  • We use the $project stage to select specific fields from the documents.

We pass the stages, in order, as an array to the aggregate() method, which executes them as a single MongoDB aggregation pipeline. Each stage receives the documents produced by the stage before it.
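To make the shape of a pipeline concrete, here is a minimal sketch in plain JavaScript. The query itself (department 100, sorted by salary) is only an illustration, and the field names are the ones used in the employee examples later in this article.

```javascript
// A pipeline is an ordered array of stage documents.
const pipeline = [
  { $match: { dept_id: 100 } },                     // filter documents
  { $sort: { salary: -1 } },                        // order by salary, highest first
  { $project: { _id: 0, emp_name: 1, salary: 1 } }  // select fields to show
];

// Each stage document has exactly one key: the stage operator.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames); // [ '$match', '$sort', '$project' ]

// In mongosh, the array would be passed as: db.employees.aggregate(pipeline)
```

Keeping the pipeline in a variable like this makes it easy to inspect or reorder stages before running it.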

Using Different Stages to Create Aggregation Pipelines in MongoDB

To solve complex problems, we create MongoDB aggregation pipelines with different stages like $match, $group, $sort, and others. We also use operators, such as $sum, $min, $max, and $avg.
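As a rough illustration of what these operators compute, the sketch below applies the same arithmetic in plain JavaScript to the four department 100 salaries that appear later in this article. The output field names total_salary, min_salary, and max_salary are assumptions made for this example.

```javascript
// Salaries of the four department 100 employees shown later in this article.
const salaries = [12000, 12000, 13000, 10000];
const total = salaries.reduce((sum, s) => sum + s, 0);

// What each operator would compute over this single group of documents.
const stats = {
  total_salary: total,                     // $sum
  min_salary: Math.min(...salaries),       // $min
  max_salary: Math.max(...salaries),       // $max
  average_salary: total / salaries.length  // $avg
};

// The corresponding $group stage document (output field names are illustrative).
const groupStage = {
  $group: {
    _id: null,
    total_salary: { $sum: "$salary" },
    min_salary: { $min: "$salary" },
    max_salary: { $max: "$salary" },
    average_salary: { $avg: "$salary" }
  }
};

console.log(stats);
```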

To efficiently use aggregation pipelines to solve a problem, we follow this three-step process:

  1. Break the problem into small sub-problems like filtering, sorting, and selecting.
  2. Identify which MongoDB aggregation pipeline stage we can use to solve each sub-problem.
  3. Write the code to solve each sub-problem and combine them to build the pipeline to solve the original problem.
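The three steps above can be sketched in code as follows. The helper buildAverageSalaryPipeline is hypothetical and exists only to show one sub-problem mapping to one stage; it is not part of MongoDB.

```javascript
// Hypothetical helper: builds the pipeline for "average salary in one department".
function buildAverageSalaryPipeline(deptId) {
  // Sub-problem 1: keep only the requested department -> $match
  const filterStage = { $match: { dept_id: deptId } };
  // Sub-problem 2: average the salaries of the remaining documents -> $group with $avg
  const averageStage = { $group: { _id: null, average_salary: { $avg: "$salary" } } };
  // Sub-problem 3: show only the computed value -> $project
  const outputStage = { $project: { _id: 0, average_salary: 1 } };
  // Combine: the order of stages in the array is the order of execution.
  return [filterStage, averageStage, outputStage];
}

const pipeline = buildAverageSalaryPipeline(100);
console.log(JSON.stringify(pipeline, null, 2));
// In mongosh, this could be run as: db.employees.aggregate(pipeline)
```

Naming each stage after the sub-problem it solves keeps the mapping from question to pipeline easy to review.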

To understand the above steps, let’s consider the following examples:

  • Find the average salary of employees in department 100 for the given employee data.
  • Find the average salary of the employees in each department of an organization and show the values in descending order of the average salary.
  • Show the name and salary of all the employees in department 100. Order the results by employee age in ascending order.

Using the $match, $group, and $project Stages Together

We will use the following steps to create a MongoDB aggregation pipeline for finding the average salary of employees in a department with id 100:

  1. First, we will use the $match stage to filter documents for department id 100.
  2. Next, we will use the $group stage to place all the matched documents in a single group. For this, we will pass null to the _id key in the $group stage. Within the same stage, we will use the $avg operator to calculate the average salary for the given department.
  3. Finally, we will use the $project stage to show only the average salary value and hide the _id field.

An example of this is as follows:

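codecademy> db.employees.aggregate([
  { $match: { dept_id: 100 } },  // keep only department 100
  { $group: { _id: null, average_salary: { $avg: "$salary" } } },  // one group, averaged
  { $project: { _id: 0, average_salary: 1 } }  // hide _id, show the average
])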

Output:

[ { average_salary: 11750 } ]
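To check the arithmetic behind this result, the following sketch mimics the three stages in memory with plain JavaScript. It is not how MongoDB executes the pipeline. The department 100 names and salaries are the ones shown in the third example of this article, and the department 200 row is hypothetical, added only so that the filter has something to drop.

```javascript
const employees = [
  { emp_name: "Katy", dept_id: 100, salary: 12000 },
  { emp_name: "Aditya", dept_id: 100, salary: 12000 },
  { emp_name: "Adam", dept_id: 100, salary: 13000 },
  { emp_name: "Ankit", dept_id: 100, salary: 10000 },
  { emp_name: "Sample", dept_id: 200, salary: 15000 } // hypothetical row
];

// $match: { dept_id: 100 }
const matched = employees.filter((e) => e.dept_id === 100);

// $group: { _id: null, average_salary: { $avg: "$salary" } }
const total = matched.reduce((sum, e) => sum + e.salary, 0);
const grouped = [{ _id: null, average_salary: total / matched.length }];

// $project: { _id: 0, average_salary: 1 }
const result = grouped.map(({ average_salary }) => ({ average_salary }));

console.log(result); // [ { average_salary: 11750 } ]
```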

Using the $group, $sort, and $project Stages Together

To calculate the average salary in each department of the organization and show the values in descending order of the average salary, we will build a MongoDB aggregation pipeline using the $group, $sort, and $project stages. For this, we use the following steps:

  1. First, we will use the $group stage and the $avg operator to find the average salary in each department.
  2. Next, we will use the $sort stage to sort the grouped results by the average salary field in descending order.
  3. Finally, we will use the $project stage to show the department and its average salary.

You can observe this in the following example:

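codecademy> db.employees.aggregate([
  { $group: { _id: { dept_id: "$dept_id" }, average_salary: { $avg: "$salary" } } },  // average per department
  { $sort: { average_salary: -1 } },  // highest average first
  { $project: { _id: 1, average_salary: 1 } }  // show department and average
])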

Output:

[
{ _id: { dept_id: 300 }, average_salary: 22500 },
{ _id: { dept_id: 200 }, average_salary: 15000 },
{ _id: { dept_id: 400 }, average_salary: 15000 },
{ _id: { dept_id: 100 }, average_salary: 11750 }
]
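Departments 200 and 400 have the same average here, and MongoDB does not guarantee a particular order among documents with equal sort keys. One common remedy is to add a second sort key as a tiebreaker. The sketch below mimics $group and a two-key $sort in plain JavaScript. The sample rows are hypothetical (only the field names come from this article), so the averages differ slightly from the output above.

```javascript
// Hypothetical sample rows; only the field names come from the article.
const employees = [
  { dept_id: 100, salary: 10000 },
  { dept_id: 100, salary: 13000 },
  { dept_id: 200, salary: 15000 },
  { dept_id: 300, salary: 20000 },
  { dept_id: 300, salary: 25000 },
  { dept_id: 400, salary: 15000 }
];

// $group: keep a running sum and count per dept_id, then divide.
const totals = new Map();
for (const { dept_id, salary } of employees) {
  const t = totals.get(dept_id) || { sum: 0, count: 0 };
  totals.set(dept_id, { sum: t.sum + salary, count: t.count + 1 });
}
const grouped = [...totals].map(([dept_id, { sum, count }]) => ({
  _id: { dept_id },
  average_salary: sum / count
}));

// $sort with a tiebreaker, comparable to { average_salary: -1, "_id.dept_id": 1 }:
// equal averages fall back to ascending department ID.
grouped.sort(
  (a, b) => b.average_salary - a.average_salary || a._id.dept_id - b._id.dept_id
);

console.log(grouped.map((g) => [g._id.dept_id, g.average_salary]));
// [ [ 300, 22500 ], [ 200, 15000 ], [ 400, 15000 ], [ 100, 11500 ] ]
```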

Using the $match, $sort, and $project Stages Together

We will use the $match, $sort, and $project stages to build a MongoDB aggregation pipeline that shows the name and salary of all the employees in department 100, ordered by employee age in ascending order. The steps are as follows:

  1. We use the $match stage to filter documents for the department ID 100.
  2. We use the $sort stage to sort the documents using the emp_age field.
  3. We use the $project stage to show only the emp_name and salary fields.

You can see this in the following example:

codecademy> db.employees.aggregate([
  { $match: { dept_id: 100 } },  // keep only department 100
  { $sort: { emp_age: 1 } },  // youngest first
  { $project: { _id: 0, emp_name: 1, salary: 1 } }  // show name and salary only
])

Output:

[
{ emp_name: 'Katy', salary: 12000 },
{ emp_name: 'Aditya', salary: 12000 },
{ emp_name: 'Adam', salary: 13000 },
{ emp_name: 'Ankit', salary: 10000 }
]
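The order of the stages matters in this pipeline: $sort has to run before $project, because $project drops the emp_age field that the sort relies on. The sketch below mimics the three stages in plain JavaScript. The names and salaries come from the output above, while the emp_age values are hypothetical, chosen only so that sorting reproduces the order shown.

```javascript
// emp_age values are hypothetical; names and salaries are from the article's output.
const employees = [
  { emp_name: "Ankit", emp_age: 35, dept_id: 100, salary: 10000 },
  { emp_name: "Katy", emp_age: 25, dept_id: 100, salary: 12000 },
  { emp_name: "Adam", emp_age: 31, dept_id: 100, salary: 13000 },
  { emp_name: "Aditya", emp_age: 28, dept_id: 100, salary: 12000 }
];

const result = employees
  .filter((e) => e.dept_id === 100)                        // $match: { dept_id: 100 }
  .sort((a, b) => a.emp_age - b.emp_age)                   // $sort: { emp_age: 1 }
  .map(({ emp_name, salary }) => ({ emp_name, salary }));  // $project

console.log(result.map((r) => r.emp_name)); // [ 'Katy', 'Aditya', 'Adam', 'Ankit' ]
```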

Conclusion

Mastering MongoDB aggregation pipelines can help you solve analytical questions easily. Using the examples, we discussed how to break complex analytical problems into sub-problems and solve them using different stages in MongoDB aggregation pipelines.

To better understand the concepts, we suggest you frame some questions to calculate different metrics on any given dataset and create aggregation pipelines for it. This will help you understand how to break problems down, solve them using different stages, and create a complete aggregation pipeline in MongoDB.
