De-Normalization

The objective of normalization is to improve data quality by eliminating interdependent and redundant data.

For instance, Mary Jones, a top notch business student at The University of Texas, has completed 21 courses. All but two of her grades are As. Her GPA is 3.90. That's great.

Obviously, a student's GPA depends on their courses and grades. The GPA should not be stored in a normalized database. It violates the third rule of normalization. Instead, the GPA should be computed whenever a user wants to know it. Since there are 50,000 students at UT and well over a million records in the grades table, the computation of Mary Jones' GPA may be slow. Very slow. The math isn't hard, but finding her 21 records in the table of a million records may take some time. (Of course, a good secondary index would fix this needle in a haystack search.)

Normalization is all about the quality of the answers to our questions, not about the time it takes to get an answer. De-normalization is the process of intentionally violating some of the rules of normalization in order to improve the speed of our queries.

For instance, we may decide to store each student's GPA in the table of 50,000 student records. That violates the third rule of normalization. But, now it is very fast to look up Mary Jones' outstanding GPA. We don't need to access the million grade records; we just need to find her record in the student table. Unfortunately, by storing her pre-computed GPA in the student table, we run the risk of getting the wrong answer. What happens if she makes five more As this semester… but her GPA isn't updated? We still think she's just a 3.90 student, but she's really a 3.92 student. Big deal. Small error. What happens if she makes five Fs instead of five As? We think she's a 3.90 student, but she's really fallen all the way down to 3.15. That's a big error.

So, what is more important: accuracy or speed? Hard choice.

More FAQs