Database normalization is a process by which database and table structures are created or modified in order to address inefficiencies/complexities related to the following:
Repeating groups of columns in a database table can create inefficiencies and errors related to data storage, querying, and modification. For example, consider a songs
table with the following columns:
id
title
artist1_id
artist1_name
artist2_id
artist2_name
The repeating artist-related columns likely contain duplicated data. It would also be difficult to sort this table by artist.
In a relational database, columns that are not dependent on the primary key of a table can create inefficiencies related to data storage and modification, while also increasing the potential for future data errors. This is often because columns that are not dependent on the primary key contain duplicated information. For example, in a books
table with columns isbn
, title
, length
, author_id
, and author_name
, the author-related columns will contain duplicated data if the same author has written multiple books; moving author-related information to a separate table would solve this problem.
If the same information is stored in multiple locations in a database table, a database manager needs to be careful when updating the table. For example, in the database table shown here, each customer’s email address is stored in multiple rows. Therefore, in order to update a customer email, multiple fields will need to be changed. Normalizing the table gets rid of duplicated data and therefore makes data errors less likely.
|order_id |price |cust_id |cust_email || ----- | -----| -------| -------------- || 1 |20.43 | 1 |[email protected] || 2 |51.33 | 1 |[email protected] || 3 |80.01 | 2 |[email protected] || 4 |33.27 | 2 |[email protected] |
Problems can occur when updating a database table if new information needs to be inserted before the associated primary key is known. This can happen if columns are not dependent on the primary key. For example, consider a songs
table with the following columns:
song_id
(primary key)title
length
artist_id
artist_name
It would be impossible to add artist information to this table without also adding a value for song_id
, which could be problematic.
The efficiency of any database schema is dependent on how the database is going to be used. While normalization solves many problems related to data storage, modification, and querying, it can also make some things more difficult. For example, tables in a normalized database will need to be joined back together if a query relies on information in multiple tables. It is therefore not always beneficial to normalize every database table. Decisions about schema design should be made with future use in mind!