3.1 Data Redundancy
Redundancy Indicates the data is repeated. It increases the time for updating, adding and deleting data. The disk space and disk I/O operations are increased due to the redundancy.
Consider the following table STUDENT.
STUDID | STUDENT NAME | STUDENT CITY | STUDENT CLASS | STUDENT TEST SCORE |
101 | Mary | Wales | MS | 50 |
101 | Mary | Wales | MS | 89 |
102 | Mark | Chicago | MBA | 60 |
102 | Mark | Chicago | MBA | 83 |
103 | Peter | Wales | BE | 70 |
103 | Peter | Wales | BE | 58 |
104 | Joe | Perth | Literature | 65 |
104 | Joe | Perth | Literature | 90 |
105 | Harry | New York | Science | 44 |
105 | Harry | New York | Science | 50 |
In the above table, the details of the students like STUDID, STUDENTNAME, and STUDENTADDRESS are repeated while adding the marks for each semester. If the user needs to modify the address for a student, it has to be modified in multiple rows for that student. If it is not done it could lead to data inconsistency across rows.
Redundancy can cause the following issues:
1) Insertion, deletion and modification of data that can cause inconsistencies.
2) Errors can occur when the data is repeated.
3) Unnecessary utilization of the extra disk space.
The concept of normalization can be used to reduce redundancy.
3.2 Normalization Types
Normalization is a process of breaking the complex table structures into simpler table structures by using rules. Using the approach user can reduce the redundancy in a table and eliminate the problem of inconsistency. There is no loss of information.
Normalization has the following benefits:
1) It helps in maintaining data integrity
2) It helps in simplifying the structure of the table and thus makes the database more compact
3) It helps in reducing the null values which reduces the complexity if the data operations
With normalization, tables are formed that follow certain specified rules and represent certain normal forms. The forms are used to check that no error or inconsistency in data is introduced in the database. A table structure is always in certain normal form. The widely used normal forms are as mentioned below:
1) First Normal Form (1NF)
2) Second Normal Form (2NF)
3) Third Normal Form (3NF)
4) Boyce Codd Normal Form (BCNF)
1) First Normal Form (1NF)
A table is said to be in 1NF when each cell of the table contains only one value.
The rules for converting a table to 1NF are as follows:
1) Place the related data values in a table. Define similar data values with the column name.
2) There should be no repeating group in the table
3) Every table must have a unique primary key
Consider the PROJECT table.
EMPLID | DEPT | DEPTHEAD | PROJECTCODE | Time |
E001 | Sales | E111 | P40 P70 P50 | 200 181 90 |
E281 | Marketing | E122 | P51 P35 | 190 50 |
E351 | HR | E133 | P67 P90 | NULL 30 |
In the above table, the values are related and similar values are defined by common attribute. The table has a primary key as EMPLID. But the columns PROJECTCODE and TIME have multiple values and hence it is not in the normal form.
By applying the 1NF to the table, the new PROJECT table is mentioned below:
EMPLID | DEPT | DEPTHEAD | PROJCODE | HOURS |
E001 | Sales | E111 | P40 | 200 |
E001 | Sales | E111 | P70 | 181 |
E001 | Sales | E111 | P50 | 90 |
E281 | Marketing | E122 | P51 | 190 |
E281 | Marketing | E122 | P35 | 50 |
E351 | HR | E133 | P67 | NULL |
E351 | HR | E133 | P90 | 20 |
2) Second Normal Form (2NF)
A table is said to be in 2NF when
1) It is in 1NF
2) No partial dependency exists between the non key and key attributes
The rules for converting a table into 2NF are as follows:
1) Remove the attributes that are functionally dependent on only a part of the key and not on the whole key. Place them in a different table.
2) Group the remaining attributes.
Consider the STUD_SUB_DETAILS table as shown below:
STUD_ID | STUD_FIRST_NAME | STUD_LAST_NAME | SUB_CODE | SUB_NAME |
ST32 | Mark | Steven | SC345 | Java |
ST34 | Peter | Hanks | SC534 | HTML |
ST65 | Joey | Carol | SC345 | Java |
ST44 | Lee | Jones | SC564 | JavaScript |
ST34 | Peter | Hanks | SC564 | JavaScript |
ST66 | Tom | Sean | SC655 | XML |
Consider the following points about the table.
1) The primary key of the preceding table is STUD_ID + SUB_CODE.
2) There are repeated values in the column STUD_FIRST_NAME, STUD_LAST_NAME and SUB_NAME. These attributes depend only on the part of the primary key.
3) The values in the column STUD_FIRST_NAME and STUD_LAST_NAME are dependent on STUD_ID.
4) The values in the SUB_NAME are only dependent on SUB_CODE and all non primary fields are not functionally dependent on the whole primary key.
Hence, to normalize the table in 2NF, user needs to break the table into smaller tables.
STUD_DETAILS
STUD_ID | STUD_FIRST_NAME | STUD_LAST_NAME |
ST32 | Mark | Steven |
ST34 | Peter | Hanks |
ST65 | Joey | Carol |
ST44 | Lee | Jones |
ST66 | Tom | Sean |
SUBJECT_DETAILS
SUB_CODE | SUB_NAME |
SC345 | Java |
SC534 | HTML |
SC564 | JavaScript |
SC655 | XML |
STUD_SUBJECT_DETAILS
STUD_ID | SUB_CODE |
ST32 | SC345 |
ST34 | SC534 |
ST65 | SC345 |
ST44 | SC564 |
ST34 | SC564 |
ST66 | SC655 |
3) Third Normal Form (3NF)
A relation is said to be in third normal form if and only if:
1) It is in 2NF
2) No transitive dependency exists between the non key attributes and key attributes.
The guidelines to convert the table into 3NF are as follows:
1) Find and remove all the non key attributes that are functionally dependent on attributes that are not in the primary key. Place them in different table.
2) Group the remaining attributes
Consider the following EMPLOYEE table.
EMPLID | DEPTNAME | DEPTHEAD |
E801 | Sales | E880 |
E231 | Marketing | E909 |
E350 | Production | E910 |
E481 | HR | E266 |
E550 | Testing | E714 |
E271 | Accounts | E945 |
In the above table, there is a single value in each cell. Hence it is in 1NF.
The primary key in the table is EMPLID. For each value of EMPLID there is only one value of DEPTNAME. Hence, DEPTNAME is functionally dependent on the primary key EMPLID. For each value of EMPLID there is one DEPTHEAD. Hence all the attributes are functionally dependent on the whole key, EMPLID. Hence it is in 2NF.
To convert the table into 3NF, user must remove the DEPTHEAD column from the table as it is not functionally dependent on the primary key, EMPLID.
The following table shows the 3NF form.
EMPLOYEE
EMPLID | DEPTNMAE |
E801 | Sales |
E231 | Marketing |
E350 | Production |
E481 | HR |
E550 | Testing |
E271 | Accounts |
DEPARTMENT
DEPTNAME | DEPTHEAD |
Sales | E880 |
Marketing | E909 |
Production | E910 |
HR | E266 |
Testing | E714 |
Accounts | E945 |
4) Boyce – Codd Normal Form (BCNF)
A relation is said to be in BCNF if and only if every determinant is a candidate key.
The guidelines for converting a table into BCNF are as follows:
1) Find and remove the overlapping candidate keys. Place the part of the candidate key and the attribute, it is functionally dependent on, in a different table.
2) Group the remaining items in a table.
Consider the following PROJECT table.
EMPLID | EMPLNAME | PROJCTCODE | TIME |
E201 | Nick | P272 | 101 |
E201 | Harry | P111 | 80 |
E302 | Sam | P134 | 150 |
E401 | Adam | P169 | 15 |
E503 | Nisha | P170 | 25 |
E909 | Perry | P180 | 60 |
The above table has redundancy. The PROJECT table has the following problems.
1) Multiple candidate keys that are ECODE+PROJCODE and NAME+PROJCODE.
2) Composite candidate keys
3) Candidate keys that overlap since the attribute PROJCODE is common between the two candidate keys.
The tables in BCNF are as shown below:
EMPLID | EMPNAME |
E201 | Nick |
E201 | Harry |
E302 | Sam |
E401 | Adam |
E503 | Nisha |
E909 | Perry |
PROJECT
EMPLID | PROJCTCODE | TIME |
E201 | P272 | 101 |
E201 | P111 | 80 |
E302 | P134 | 150 |
E401 | P169 | 15 |
E503 | P170 | 25 |
E909 | P180 | 60 |
3.3 Understanding denormalization
The intentional introduction of redundancy in a table in order to improve performance is called denormalization.
Consider the following tables as ORDERS and PRODUCTS.
ORDERS
ORDERNO | PRODUCTID | QTY |
101 | P1 | 2 |
102 | P3 | 1 |
103 | P1 | 1 |
104 | P2 | 3 |
105 | P2 | 2 |
PRODUCTS
PRODUCTID | PRODUCT DESC | COST |
P1 | Toy Dog | 20 |
P2 | Toy Tiger | 10 |
P3 | Toy Lion | 12 |
If a store keeper wants to calculate the total sale of a toy, there are thousands of rows. The server will take a lot of time to process the query and return the result. Hence, to speed up the query, the user can store the values of TAX, ORDERCOST, and PRODUCTCOST in one table as shown below.
ORDERS
ORDERNO | PRODUCTID | QTY | PRODUCTCOST | TAX | ORDERCOST |
101 | P1 | 2 | 40 | 8 | 88 |
102 | P3 | 1 | 12 | 1.2 | 13.2 |
103 | P1 | 1 | 20 | 2 | 22 |
104 | P2 | 3 | 30 | 9 | 99 |
105 | P2 | 2 | 20 | 4 | 44 |