Codecademy Logo

Normalizing A Database

Database Normalization

Database normalization is a process by which database and table structures are created or modified in order to address inefficiencies/complexities related to the following:

  • data storage
  • data modification
  • querying database tables

Repeating Column Groups

Repeating groups of columns in a database table can create inefficiencies and errors related to data storage, querying, and modification. For example, consider a songs table with the following columns:

  • id
  • title
  • artist1_id
  • artist1_name
  • artist2_id
  • artist2_name

The repeating artist-related columns likely contain duplicated data. It would also be difficult to sort this table by artist.

Independent Columns

In a relational database, columns that are not dependent on the primary key of a table can create inefficiencies related to data storage and modification, while also increasing the potential for future data errors. This is often because columns that are not dependent on the primary key contain duplicated information. For example, in a books table with columns isbn, title, length, author_id, and author_name, the author-related columns will contain duplicated data if the same author has written multiple books; moving author-related information to a separate table would solve this problem.

Updating Duplicated Data

If the same information is stored in multiple locations in a database table, a database manager needs to be careful when updating the table. For example, in the database table shown here, each customer’s email address is stored in multiple rows. Therefore, in order to update a customer email, multiple fields will need to be changed. Normalizing the table gets rid of duplicated data and therefore makes data errors less likely.

|order_id |price |cust_id |cust_email |
| ----- | -----| -------| -------------- |
| 1 |20.43 | 1 |[email protected] |
| 2 |51.33 | 1 |[email protected] |
| 3 |80.01 | 2 |[email protected] |
| 4 |33.27 | 2 |[email protected] |

Inserting Data Without a Primary Key

Problems can occur when updating a database table if new information needs to be inserted before the associated primary key is known. This can happen if columns are not dependent on the primary key. For example, consider a songs table with the following columns:

  • song_id (primary key)
  • title
  • length
  • artist_id
  • artist_name

It would be impossible to add artist information to this table without also adding a value for song_id, which could be problematic.

Database Efficiency and Use

The efficiency of any database schema is dependent on how the database is going to be used. While normalization solves many problems related to data storage, modification, and querying, it can also make some things more difficult. For example, tables in a normalized database will need to be joined back together if a query relies on information in multiple tables. It is therefore not always beneficial to normalize every database table. Decisions about schema design should be made with future use in mind!

Learn more on Codecademy