Normalization

Normalization is a database design process. Its objective is to remove interdependent and redundant data in order to improve the overall quality of the information held in the database.

For instance, if John Smith, an employee of The University of Texas, has been paid every month for the past ten years, then a normalized database would only store his name once in the employee master table. A normalized database would not store his name 120 times on his 120 individual checks in the paycheck table. Why? Obviously, storing the string 'John Smith' 120 times will take up much more storage space than storing it just once. 119 copies of the name are redundant and unnecessary.

But, minimizing storage space isn't the only reason to eliminate the extra copies of the name. What happens if the names are stored as 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'Jonh Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smith', 'John Smih', 'John Smith', 'John Smith', 'John Smith', 'John Smith', …, and 'John Smith'? Did you spot the typo? Actually, there were two typos. By eliminating the unnecessary copies of the name, we greatly reduce the chance of simple typos.

Normalization is a process that leads to improved data quality. That's the good news. The bad news is that normalization does not necessary lead to database simplification. A normalized database often has more tables and relationships than its non-normalized counterpart. A normalized database may not run as fast (for some tasks) as its non-normalized counterpart. Normalization is all about quality, not about speed.

More FAQs