Articles

What is Normalization in DBMS? Explained with Examples

Learn what normalization in DBMS is. Complete guide to 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF with examples and practical applications.

What is normalization in DBMS?

Normalization in DBMS is a systematic process of arranging data in a relational database to minimize redundancy and improve data integrity. It involves decomposing large tables into smaller, related tables and defining relationships among them using keys.

The primary goal of normalization in DBMS is to ensure that each piece of data is stored only once, thereby avoiding inconsistencies and duplication.

By following normalization rules, we can:

  • Eliminate redundant data
  • Ensure logical data storage
  • Enhance data integrity
  • Simplify database maintenance

Normalization is applied using a series of normal forms. Let’s explore them in the next section.

Related Course

Design Databases With PostgreSQL

Learn how to query SQL databases and design relational databases to efficiently store large quantities of data.Try it for free

Types of normal forms in DBMS

In DBMS, there are several levels of normalization, commonly referred to as normal forms. Each form addresses specific types of anomalies and redundancies. Here, we’ll touch on 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF to understand how they differ in their functionalities.

First Normal Form (1NF)

First Normal Form (1NF) focuses on ensuring that the values in each column of a table are atomic, meaning they cannot be further divided. A table is in 1NF if:

  • All attributes (columns) contain only single, indivisible values
  • Each record is unique and there are no repeating groups

Scenario:

Consider a table storing student details:

StudentID Name Phone Numbers
101 Robert 9876543210, 9123456780

This violates 1NF because the “Phone Numbers” column holds multiple values. To convert this into 1NF, we can split the phone numbers into two rows:

StudentID Name Phone Number
101 Robert 9876543210
101 Robert 9123456780

Why 1NF matters:

Applying 1NF removes nested or grouped data and lays the groundwork for further normalization. It’s the first step in structuring data for consistency and clarity.

Second Normal Form (2NF)

Second Normal Form (2NF) builds upon 1NF and addresses partial dependencies. A table is in 2NF if:

  • It is already in 1NF
  • Every non-prime attribute (an attribute that’s not part of a candidate key) is fully functionally dependent on the entire primary key

Scenario:

Suppose we have a table for student-course enrollment:

StudentID CourseID StudentName CourseName
101 CS101 Robert DBMS
102 CS102 Sam OS

Here, the composite primary key is (StudentID, CourseID). But “StudentName” depends only on “StudentID” and “CourseName” depends only on “CourseID”, which indicates partial dependency.

To convert this into 2NF, we can split the table into three parts:

  • Student table
  • Course table
  • Enrollment table

The student table will look like:

StudentID StudentName
101 Robert
102 Sam

The course table will look like:

CourseID CourseName
CS101 DBMS
CS102 OS

The enrollment table will look like:

StudentID CourseID
101 CS101
102 CS102

Why 2NF matters:

2NF ensures that data is placed in the appropriate tables and that every column is fully dependent on the whole key, not just part of it. This eliminates data duplication and makes updates more efficient.

Third Normal Form (3NF)

Third Normal Form (3NF) eliminates transitive dependencies. A table is in 3NF if:

  • It is already in 2NF
  • No non-prime attribute is dependent transitively on the primary key

In simpler terms, non-key attributes should not depend on other non-key attributes.

Scenario:

Consider a table storing employee details:

EmpID EmpName DeptID DeptName
1 John D01 HR
2 Emma D02 IT

Here, “DeptName” depends on “DeptID”, which in turn depends on “EmpID”. This is a transitive dependency.

To convert this into 3NF, we can split the table into two parts:

  • Employee table
  • Department table

The employee table will look like:

EmpID EmpName DeptID
1 John D01
2 Emma D02

The department table will look like:

DeptID DeptName
D01 HR
D02 IT

Why 3NF matters:

3NF promotes data integrity and reduces redundancy by ensuring that each non-key attribute is directly dependent on the primary key.

Boyce-Codd Normal Form (BCNF)

BCNF (Boyce-Codd Normal Form) is a higher version of 3NF. A table is in BCNF if:

  • It is in 3NF
  • Every functional dependency has a super key on the left-hand side

BCNF handles certain anomalies that 3NF cannot. If a table has overlapping candidate keys, it might violate BCNF even while being in 3NF.

Scenario:

Suppose we have a table for storing course details:

Professor Course Time
Smith DBMS 10AM
Smith DBMS 2PM
Johnson OS 11AM

Here, a professor can teach multiple courses, and each course is taught by one professor. However, some professors may teach the same course at different times.

The functional dependencies in this scenario include:

  • Professor → Course (Each professor teaches only one course)
  • Course → Professor (Each course is taught by one professor)

But in this table, neither “Professor” nor “Course” is a super key, so it violates BCNF.

To convert this into BCNF, we can split the table into two parts:

  • Professor-course table
  • Course-schedule table

The professor-course table will look like:

Professor Course
Robert DBMS
Sam OS

The course-schedule table will look like:

Course Time
DBMS 10AM
DBMS 2PM
OS 11AM

Why BCNF matters:

BCNF ensures stricter normalization by resolving complex dependencies, especially in tables with multiple candidate keys.

Fourth Normal Form (4NF)

A table is in the Fourth Normal Form (4NF) if:

  • It is in BCNF
  • It contains no multi-valued dependencies

Scenario:

Consider this table:

Teacher Subject Language
Robert Math English
Robert Math Spanish
Robert Physics English
Robert Physics Spanish

In this scenario, a teacher can teach multiple subjects and speak multiple languages.

This has multi-valued dependencies:

  • Teacher →→ Subject
  • Teacher →→ Language

These are independent facts stored in the same table, leading to unnecessary repetition.

To convert this into 4NF, we can split the table into two:

  • Teacher-subject table
  • Teacher-language table

The teacher-subject table will look like:

Teacher Subject
Robert Math
Robert Physics

The teacher-language table will look like:

Teacher Language
Robert English
Robert Spanish

Why 4NF matters:

4NF separates logically independent data, avoiding data explosion and maintaining clarity.

Fifth Normal Form (5NF)

Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), ensures that a relation is broken down into smaller relations that can be joined back without any loss of information or introduction of invalid combinations, and that all join dependencies are implied by candidate keys.

Scenario:

A school tracks which students are learning which subjects from which teachers. However, each combination (Student, Subject, Teacher) is valid only as a complete triple — it’s not enough to just know who is learning what or who teaches what.

Student Subject Teacher
Alice Math Mr. A
Alice Science Ms. B
Bob Math Mr. A

We can’t decompose this into binary relations (e.g., Student-Subject, Student-Teacher, Subject-Teacher) without possibly recreating invalid combinations when we rejoin the data.

To convert this into 5NF, we can create three separate projections:

  1. Student-Subject
Student Subject
Alice Math
Alice Science
Bob Math
  1. Student-Teacher
Student Teacher
Alice Mr. A
Alice Ms. B
Bob Mr. A
  1. Subject-Teacher
Subject Teacher
Math Mr. A
Science Ms. B

Why 5NF matters:

5NF handles highly complex relationships and ensures complete reconstruction of data without redundancy.

By understanding these normal forms, we can build databases that are logically structured, scalable, and efficient. Applying these rules during the design phase can save time, reduce maintenance effort, and prevent anomalies during data operations.

With a good understanding of the normal forms, let’s now discuss why normalization is important in DBMS.

Why is normalization important?

As our applications grow, so does the complexity of our data. Without proper normalization, a database can become bloated, inconsistent, and difficult to maintain. Here’s why normalization in DBMS is essential:

  • Reduces data redundancy: Repetition of data across multiple rows and tables is minimized.
  • Improves data consistency: Since each data item is stored in one place, updates are more reliable.
  • Enhances query performance: Smaller, well-structured tables can be queried more efficiently.
  • Eases maintenance: With logical data separation, updates, deletions, and insertions become simpler.

Now that we know the “why”, let’s discover the scenarios where it is ideal to use normalization.

When to use normalization?

We typically apply normalization in DBMS during the design phase of database development. It’s especially useful when:

  • We’re dealing with large datasets prone to duplication
  • We require high data consistency and accuracy
  • We want scalable and maintainable database systems

However, in some cases—especially when performance is critical, and the schema is read-heavy—controlled denormalization (the opposite of normalization) may be appropriate.

Conclusion

In this article, we discussed what normalization in DBMS is, understood why it’s important, and discovered some scenarios where it is ideal to use normalization. We also learned about the different types of normal forms, including 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.

Understanding and applying normalization correctly is essential for creating efficient and reliable relational databases. While 1NF, 2NF, and 3NF cover most practical needs, familiarity with higher forms ensures we are well-prepared for complex designs.

If you want to expand your knowledge of DBMS, check out the CompTIA IT Fundamentals: Database Concepts course on Codecademy.

Frequently asked questions

1. What is the difference between 2NF and 3NF?

  • 2NF removes partial dependencies, while 3NF removes transitive dependencies.
  • In 2NF, every non-key attribute must depend on the entire primary key. In 3NF, non-key attributes must depend on the primary key only and not on other non-key attributes.

2. What is the primary benefit of 3NF?

The primary benefit of 3NF is improved data integrity by removing transitive dependencies, which makes the database easier to maintain and reduces redundancy.

3. What are the limitations of 2NF?

2NF does not address transitive dependencies, which can still lead to redundant data and anomalies. That’s why moving to 3NF is often necessary.

4. What are the rules for normalization?

  • Remove repeating groups (1NF)
  • Eliminate partial dependencies (2NF)
  • Eliminate transitive dependencies (3NF)
  • Use candidate keys for all dependencies (BCNF)
  • Eliminate multi-valued and join dependencies (4NF, 5NF)

5. What’s the difference between 3NF and BCNF?

While both aim to eliminate redundancy, BCNF is stricter. A table in 3NF may still violate BCNF if non-prime attributes depend on overlapping candidate keys.

6. What are the 3 objectives of normalization?

  • Minimize redundancy
  • Improve data integrity
  • Facilitate efficient data management and updates
Codecademy Team

'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'

Meet the full team