Data Warehouse Testing
Well planned, well defined and significant testing guarantees the accurate conversion of the project into production. A business gains the real time use once the ETL processes are verified and validated by independent group of experts to ensure that the data warehouse is robust. It consists of the following features:
- Software testing involves the programmed code, while the data warehouse testing focuses on data and information
- Data warehouse focuses on the correctness and usefulness of the data delivered to the users
- Data warehouse testing involves a huge volume of data, hence the possible number of test scenarios are infinite in number
- It is almost sometimes difficult to predict all the types of errors that will be faced by the users in real operational data
Resources involved in Data Warehouse Testing
- Business analysts who create the requirements
- QA testers who prepare and execute the test plans and test cases
- Architects who organize the test environments
- DBAs who test for the performance and stress
- Business users who develop the functional User Acceptance Test
What is tested
- Data quality is tested to ensure the correctness of data loaded by the ETL procedures and accessed by the front-end tools
- Design quality signifies the verification of the user requirements that are well expressed by the conceptual and logical schema
How it is tested
- Functional test: It verifies that the item is manageable within its specified requirements
- Usability test: It manipulates the item by letting users to interact with it
- Performance test: It verifies that the item performance is adequate
under regular workload conditions
- Stress test: It reflects the performance of the item with huge amounts of data and very heavy workloads
- Recovery test: It checks how well an item is able to restore from crashes, hardware failures and other similar crashes
- Security test: It checks that the item preserves data and maintains the intended functionality
- Regression test: It ensures that the item functions normally after the occurrence of any change
Categories of Data Warehouse Testing
- New data warehouse testing: A new data warehouse is built and checked from the beginning. Data input is taken from customer requirements and different data sources and a new data warehouse is built and verified with the help of ETL tools
- Change request: New data is added from different sources to an existing data warehouse. A customer might also add a change to their existing business rules or they might integrate the rule
- Report testing: Reports are the resultant product of a data warehouse and the basic purpose of building a data warehouse. Reports are tested by validating the layout, data report, and calculation
Testing Process of a Data Warehouse
Requirements testing
The main significance of requirements testing is to check if the stated requirements are complete. The main factors of requirement testing are:
- Are the requirements complete?
- Are the requirements singular?
- Are the requirements inconclusive?
- Are the requirements developable?
- Are the requirements testable?
Unit testing
Unit testing is the white box for data warehouse testing. It should check for the ETL procedures/mappings/jobs and the reports generated. This is performed by the developers. It involves:
- Verification of the ETLs for access and pick up of the right data from the right source
- The accuracy of the data transformations in accordance to the business rules and the correct population of the transformed data into the data warehouse
- Testing the rejected records that do not adhere to the business and transformation rules
Integration testing
Unit testing is followed by integration testing. Integration testing should test the initial and incremental load of the data warehouse. The steps in integration testing are:
- Sequence of ETL jobs in a batch
- Initial load of data in a data warehouse
- Incremental load of records at a later date to check for the new inserts or the updates
- Generation of the error log
Scenarios handled in integration testing
- Count Validation: The record count verification of the reporting queries against the source and target as an initial check
- Source Isolation: Checking the data after separation from the driving sources
- Dimensional Analysis: Data integrity between the various source tables and relationships are verified
- Statistical Analysis: Verification for the various calculations
- Data Quality Validation: The missing data and discrepancies are verified
User Acceptance Testing
The system is tested for the complete functionality and is expected to function similarly in production. At the end of user acceptance testing the system should be acceptable by the clients in terms of the ETL process integrity, business functionality, and reporting
ETL Testing Techniques
The following are the techniques used for testing the ETL process:
- Check for the correctness of the data transformations in accordance to the business rules and requirements
- Check for the data population into the data warehouse without any data loss and truncations
- Validate the ETL process for the appropriate functionality of the rejections, replacements with default values and reporting of the invalid data
- Check for the data load time. It should be within the prescribed and expected time frames for the basis of high performance and expandability
ETL Testing Process
The EL testing process undergoes the following process:
- Business and requirement understanding
- Validating
- Test estimation
- Test planning based on the inputs from the test estimation and business requirements
- Designing test cases and test scenarios from all the inputs obtained
- The testing team proceeds with the approved test cases to perform pre-execution check and test data preparation for testing
- The execution is performed as the next step
Validation for ETL Testing
- Verify that data transformations from source to destination functions as expected
- Verify that expected data is loaded into the target system
- Verify that all the DB fields and field data is loaded without any truncation
- Verify the data checksum for the record count
- Verify that the rejected data is logged properly with all the required details
- Verify the NULL value fields
- Verify if any redundant data is loaded
- Verify the data integrity
ETL Testing Challenges
The challenges faced by the testing team are
- Conflicting and redundant data
- Loss of data during the ETL process
- Absence of inclusive test bed
- The unavailability of privileges to the testers to execute the ETL jobs on their own
- Volume and complexity of the data is huge
- Mistakes in the business process and procedures
- Difficulties in acquiring and building the test data
- Missing business flow information
Difference between Database Testing and Data Warehouse Testing
- Database testing is performed with lesser amount of data for OLTP applications while data warehouse testing is done with a huge amount of data for OLAP applications
- The database testing comprises of sources of data from homogenous sources while in data warehouse the data is obtained from heterogeneous sources
- In database testing create, read, update, and delete operations are performed whereas in data warehouse testing read-only option is used
- Normalized databases are used in database testing while denormalized databases are used for data warehouse testing