What is Normalization in DBMS? Explained with Examples
What is normalization in DBMS?
Normalization in DBMS is a systematic process of arranging data in a relational database to minimize redundancy and improve data integrity. It involves decomposing large tables into smaller, related tables and defining relationships among them using keys.
The primary goal of normalization in DBMS is to ensure that each piece of data is stored only once, thereby avoiding inconsistencies and duplication.
By following normalization rules, we can:
- Eliminate redundant data
- Ensure logical data storage
- Enhance data integrity
- Simplify database maintenance
Normalization is applied using a series of normal forms. Let’s explore them in the next section.
Design Databases With PostgreSQL
Learn how to query SQL databases and design relational databases to efficiently store large quantities of data.Try it for freeTypes of normal forms in DBMS
In DBMS, there are several levels of normalization, commonly referred to as normal forms. Each form addresses specific types of anomalies and redundancies. Here, we’ll touch on 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF to understand how they differ in their functionalities.
First Normal Form (1NF)
First Normal Form (1NF) focuses on ensuring that the values in each column of a table are atomic, meaning they cannot be further divided. A table is in 1NF if:
- All attributes (columns) contain only single, indivisible values
- Each record is unique and there are no repeating groups
Scenario:
Consider a table storing student details:
StudentID | Name | Phone Numbers |
---|---|---|
101 | Robert | 9876543210, 9123456780 |
This violates 1NF because the “Phone Numbers” column holds multiple values. To convert this into 1NF, we can split the phone numbers into two rows:
StudentID | Name | Phone Number |
---|---|---|
101 | Robert | 9876543210 |
101 | Robert | 9123456780 |
Why 1NF matters:
Applying 1NF removes nested or grouped data and lays the groundwork for further normalization. It’s the first step in structuring data for consistency and clarity.
Second Normal Form (2NF)
Second Normal Form (2NF) builds upon 1NF and addresses partial dependencies. A table is in 2NF if:
- It is already in 1NF
- Every non-prime attribute (an attribute that’s not part of a candidate key) is fully functionally dependent on the entire primary key
Scenario:
Suppose we have a table for student-course enrollment:
StudentID | CourseID | StudentName | CourseName |
---|---|---|---|
101 | CS101 | Robert | DBMS |
102 | CS102 | Sam | OS |
Here, the composite primary key is (StudentID, CourseID). But “StudentName” depends only on “StudentID” and “CourseName” depends only on “CourseID”, which indicates partial dependency.
To convert this into 2NF, we can split the table into three parts:
- Student table
- Course table
- Enrollment table
The student table will look like:
StudentID | StudentName |
---|---|
101 | Robert |
102 | Sam |
The course table will look like:
CourseID | CourseName |
---|---|
CS101 | DBMS |
CS102 | OS |
The enrollment table will look like:
StudentID | CourseID |
---|---|
101 | CS101 |
102 | CS102 |
Why 2NF matters:
2NF ensures that data is placed in the appropriate tables and that every column is fully dependent on the whole key, not just part of it. This eliminates data duplication and makes updates more efficient.
Third Normal Form (3NF)
Third Normal Form (3NF) eliminates transitive dependencies. A table is in 3NF if:
- It is already in 2NF
- No non-prime attribute is dependent transitively on the primary key
In simpler terms, non-key attributes should not depend on other non-key attributes.
Scenario:
Consider a table storing employee details:
EmpID | EmpName | DeptID | DeptName |
---|---|---|---|
1 | John | D01 | HR |
2 | Emma | D02 | IT |
Here, “DeptName” depends on “DeptID”, which in turn depends on “EmpID”. This is a transitive dependency.
To convert this into 3NF, we can split the table into two parts:
- Employee table
- Department table
The employee table will look like:
EmpID | EmpName | DeptID |
---|---|---|
1 | John | D01 |
2 | Emma | D02 |
The department table will look like:
DeptID | DeptName |
---|---|
D01 | HR |
D02 | IT |
Why 3NF matters:
3NF promotes data integrity and reduces redundancy by ensuring that each non-key attribute is directly dependent on the primary key.
Boyce-Codd Normal Form (BCNF)
BCNF (Boyce-Codd Normal Form) is a higher version of 3NF. A table is in BCNF if:
- It is in 3NF
- Every functional dependency has a super key on the left-hand side
BCNF handles certain anomalies that 3NF cannot. If a table has overlapping candidate keys, it might violate BCNF even while being in 3NF.
Scenario:
Suppose we have a table for storing course details:
Professor | Course | Time |
---|---|---|
Smith | DBMS | 10AM |
Smith | DBMS | 2PM |
Johnson | OS | 11AM |
Here, a professor can teach multiple courses, and each course is taught by one professor. However, some professors may teach the same course at different times.
The functional dependencies in this scenario include:
- Professor → Course (Each professor teaches only one course)
- Course → Professor (Each course is taught by one professor)
But in this table, neither “Professor” nor “Course” is a super key, so it violates BCNF.
To convert this into BCNF, we can split the table into two parts:
- Professor-course table
- Course-schedule table
The professor-course table will look like:
Professor | Course |
---|---|
Robert | DBMS |
Sam | OS |
The course-schedule table will look like:
Course | Time |
---|---|
DBMS | 10AM |
DBMS | 2PM |
OS | 11AM |
Why BCNF matters:
BCNF ensures stricter normalization by resolving complex dependencies, especially in tables with multiple candidate keys.
Fourth Normal Form (4NF)
A table is in the Fourth Normal Form (4NF) if:
- It is in BCNF
- It contains no multi-valued dependencies
Scenario:
Consider this table:
Teacher | Subject | Language |
---|---|---|
Robert | Math | English |
Robert | Math | Spanish |
Robert | Physics | English |
Robert | Physics | Spanish |
In this scenario, a teacher can teach multiple subjects and speak multiple languages.
This has multi-valued dependencies:
- Teacher →→ Subject
- Teacher →→ Language
These are independent facts stored in the same table, leading to unnecessary repetition.
To convert this into 4NF, we can split the table into two:
- Teacher-subject table
- Teacher-language table
The teacher-subject table will look like:
Teacher | Subject |
---|---|
Robert | Math |
Robert | Physics |
The teacher-language table will look like:
Teacher | Language |
---|---|
Robert | English |
Robert | Spanish |
Why 4NF matters:
4NF separates logically independent data, avoiding data explosion and maintaining clarity.
Fifth Normal Form (5NF)
Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), ensures that a relation is broken down into smaller relations that can be joined back without any loss of information or introduction of invalid combinations, and that all join dependencies are implied by candidate keys.
Scenario:
A school tracks which students are learning which subjects from which teachers. However, each combination (Student, Subject, Teacher) is valid only as a complete triple — it’s not enough to just know who is learning what or who teaches what.
Student | Subject | Teacher |
---|---|---|
Alice | Math | Mr. A |
Alice | Science | Ms. B |
Bob | Math | Mr. A |
We can’t decompose this into binary relations (e.g., Student-Subject, Student-Teacher, Subject-Teacher) without possibly recreating invalid combinations when we rejoin the data.
To convert this into 5NF, we can create three separate projections:
- Student-Subject
Student | Subject |
---|---|
Alice | Math |
Alice | Science |
Bob | Math |
- Student-Teacher
Student | Teacher |
---|---|
Alice | Mr. A |
Alice | Ms. B |
Bob | Mr. A |
- Subject-Teacher
Subject | Teacher |
---|---|
Math | Mr. A |
Science | Ms. B |
Why 5NF matters:
5NF handles highly complex relationships and ensures complete reconstruction of data without redundancy.
By understanding these normal forms, we can build databases that are logically structured, scalable, and efficient. Applying these rules during the design phase can save time, reduce maintenance effort, and prevent anomalies during data operations.
With a good understanding of the normal forms, let’s now discuss why normalization is important in DBMS.
Why is normalization important?
As our applications grow, so does the complexity of our data. Without proper normalization, a database can become bloated, inconsistent, and difficult to maintain. Here’s why normalization in DBMS is essential:
- Reduces data redundancy: Repetition of data across multiple rows and tables is minimized.
- Improves data consistency: Since each data item is stored in one place, updates are more reliable.
- Enhances query performance: Smaller, well-structured tables can be queried more efficiently.
- Eases maintenance: With logical data separation, updates, deletions, and insertions become simpler.
Now that we know the “why”, let’s discover the scenarios where it is ideal to use normalization.
When to use normalization?
We typically apply normalization in DBMS during the design phase of database development. It’s especially useful when:
- We’re dealing with large datasets prone to duplication
- We require high data consistency and accuracy
- We want scalable and maintainable database systems
However, in some cases—especially when performance is critical, and the schema is read-heavy—controlled denormalization (the opposite of normalization) may be appropriate.
Conclusion
In this article, we discussed what normalization in DBMS is, understood why it’s important, and discovered some scenarios where it is ideal to use normalization. We also learned about the different types of normal forms, including 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.
Understanding and applying normalization correctly is essential for creating efficient and reliable relational databases. While 1NF, 2NF, and 3NF cover most practical needs, familiarity with higher forms ensures we are well-prepared for complex designs.
If you want to expand your knowledge of DBMS, check out the CompTIA IT Fundamentals: Database Concepts course on Codecademy.
Frequently asked questions
1. What is the difference between 2NF and 3NF?
- 2NF removes partial dependencies, while 3NF removes transitive dependencies.
- In 2NF, every non-key attribute must depend on the entire primary key. In 3NF, non-key attributes must depend on the primary key only and not on other non-key attributes.
2. What is the primary benefit of 3NF?
The primary benefit of 3NF is improved data integrity by removing transitive dependencies, which makes the database easier to maintain and reduces redundancy.
3. What are the limitations of 2NF?
2NF does not address transitive dependencies, which can still lead to redundant data and anomalies. That’s why moving to 3NF is often necessary.
4. What are the rules for normalization?
- Remove repeating groups (1NF)
- Eliminate partial dependencies (2NF)
- Eliminate transitive dependencies (3NF)
- Use candidate keys for all dependencies (BCNF)
- Eliminate multi-valued and join dependencies (4NF, 5NF)
5. What’s the difference between 3NF and BCNF?
While both aim to eliminate redundancy, BCNF is stricter. A table in 3NF may still violate BCNF if non-prime attributes depend on overlapping candidate keys.
6. What are the 3 objectives of normalization?
- Minimize redundancy
- Improve data integrity
- Facilitate efficient data management and updates
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'
Meet the full teamRelated articles
- Article
What is First Normal Form (1NF) in DBMS? Explained with Examples
Learn about the first normal form (1NF) in DBMS, its four key rules, and a step-by-step process for converting a table into 1st normal form for better database design. - Article
Common SQL Interview Questions
Practice with some common SQL interview questions. - Article
What is a Database? Complete Guide
Learn what databases are, explore different types like SQL and NoSQL, and understand key components. Complete guide for beginners.
Learn more on Codecademy
- Skill path
Design Databases With PostgreSQL
Learn how to query SQL databases and design relational databases to efficiently store large quantities of data.Includes 5 CoursesWith CertificateBeginner Friendly13 hours - Free course
Intro to SQL
Use SQL to create, access, and update tables of data in a relational database.Beginner Friendly2 hours - Free course
Learn SQL
In this SQL course, you'll learn how to manage large datasets and analyze real data using the standard data management language.Beginner Friendly5 hours