MongoDB Aggregation Pipelines: A Hands-on Tutorial
Analyzing data and extracting metrics requires several tasks such as filtering, sorting, grouping, etc. In MongoDB, we can perform these operations using aggregation pipelines.
Before we dive into the details, to understand different stages in MongoDB aggregation pipelines and to prepare the dataset, read this article on stages in MongoDB Aggregation pipelines.
Let’s discuss how to break complex analytical problems into smaller, manageable parts and how to utilize the different stages in MongoDB aggregation pipelines to gain insights from data. We’ll explore three analytical questions to illustrate how to work with MongoDB aggregation pipelines effectively.
What Are Aggregation Pipelines in MongoDB?
MongoDB aggregation pipelines consist of one or more stages that perform specific operations on documents. A stage in the MongoDB pipeline can filter data, group documents, sort them, or select specific fields from the documents.
- We use the
$matchstage to filter the documents. - We use the
$groupstage to group the data by values in specific fields. - We use the
$sortstage to sort the documents in the data by a particular field. - We use the
$projectstage to select specific fields from the documents.
We use the aggregate() function to combine all the stages together and execute the MongoDB aggregation pipeline.
Using Different Stages to Create Aggregation Pipelines in MongoDB
To solve complex problems, we create MongoDB aggregation pipelines with different stages like $match, $group, $sort, and others. We also use operators, such as $sum, $min, $max, and $avg.
To efficiently use aggregation pipelines to solve a problem, we follow this three-step process:
- Break the problem into small sub-problems like filtering, sorting, and selecting.
- Identify which MongoDB aggregation pipeline stage we can use to solve each sub-problem.
- Write the code to solve each sub-problem and combine them to build the pipeline to solve the original problem.
To understand the above steps, let’s consider the following examples:
- Find the average salary of employees in the department ID 100 for the given employee data.
- Find the average salary of the employees in each department of an organization and show the values in descending order of the average salary.
- Show the name and salary of all the employees in department 100. Order the results by employee age in ascending order.
Using the $match, $group, and $project Stages Together
We will use the following steps to create a MongoDB aggregation pipeline for finding the average salary of employees in a department with id 100:
- First, we will use the
$matchstage to filter documents for departmentid100. - Next, we will use the
$groupstage to group all the documents with departmentid100. For this, we will passnullto the_idkey in the$groupstage. Subsequently, we will use the$avgoperator to calculate the average salary for the given department. - Finally, we will use the
$projectstage to show the average salary value.
An example of this is as follows:
codecademy> db.employees.aggregate([{$match:{"dept_id":100}},{$group:{_id:null, average_salary:{$avg:"$salary"}}},{$project:{_id:0,average_salary:1}}])
Output:
[ { average_salary: 11750 } ]
Using the $group, $sort, and $project Stages Together
To calculate the average salary in each department of the organization and show the values in descending order of the average salary, we will build a MongoDB aggregation pipeline using the $group, $sort, and $project stages. For this, we use the following steps:
- First, we will use the
$groupstage and the$avgoperator to find the average salary in each department. - Next, we will use the
$sortstage to sort the data by the average salary field. - Finally, we use the
$projectstage to show the output.
You can observe this in the following example:
codecademy> db.employees.aggregate([{$group: {_id: {dept_id:"$dept_id"}, average_salary:{$avg:"$salary"}}},{$sort:{average_salary:-1}},{$project:{_id:1, average_salary:1}}])
Output:
[{ _id: { dept_id: 300 }, average_salary: 22500 },{ _id: { dept_id: 200 }, average_salary: 15000 },{ _id: { dept_id: 400 }, average_salary: 15000 },{ _id: { dept_id: 100 }, average_salary: 11750 }]
Using the $match, $sort, and $project Stages Together
We will use the $match, $sort, and $project stages to build a MongoDB aggregation pipeline showing the name and salary of all the employees in department 100, ordered by employee age. We do this like so::
- We use the
$matchstage to filter documents for the department ID 100. - We use the
$sortstage to sort the documents using theemp_agefield. - We will use the
$projectstage to show only theemp_nameandsalaryfields.
You can see this in the following example:
codecademy> db.employees.aggregate([{$match:{"dept_id":100}},{$sort:{emp_age:1}},{$project:{_id:0,emp_name:1,salary:1}}])
Output:
[{ emp_name: 'Katy', salary: 12000 },{ emp_name: 'Aditya', salary: 12000 },{ emp_name: 'Adam', salary: 13000 },{ emp_name: 'Ankit', salary: 10000 }]
Conclusion
Mastering MongoDB aggregation pipelines can help you solve analytical questions easily. Using the examples, we discussed how to break complex analytical problems into sub-problems and solve them using different stages in MongoDB aggregation pipelines.
To better understand the concepts, we suggest you frame some questions to calculate different metrics on any given dataset and create aggregation pipelines for it. This will help you understand how to break problems down, solve them using different stages, and create a complete aggregation pipeline in MongoDB.
To read more tutorials on topics like data science, cloud computing, artificial intelligence, etc., visit the Codecademy article hub.
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'
Meet the full teamRelated articles
- Article
MongoDB Aggregation Stages Explained
Learn MongoDB Aggregation Stages - Article
MongoDB Data
Learn about how data is stored in MongoDB. - Article
Difference Between WHERE and HAVING Clause in SQL
Learn the key differences between `WHERE` and `HAVING` clauses in SQL. Understand when to filter before vs after aggregation.
Learn more on Codecademy
- Learn how to work with NoSQL databases and run basic CRUD operations in this introduction to MongoDB.
- Beginner Friendly.5 hours
- Learn how to build machine pipelines that automate your workflow and keep everything consistent.
- Intermediate.1 hour
- A data engineer builds the pipelines to connect data input to analysis.
- Includes 17 Courses
- With Certificate
- Beginner Friendly.90 hours