Architectural Framework of a Data Warehouse
Operational Source Systems
- Operational systems are used to process everyday transactions of an organization
- The operational systems are designed in such a way that the transactions occur smoothly and the data-integrity is maintained efficiently
- The operational systems have very fast insert/update since minimal data is affected each time a transaction occurs
- In order to improve performance the old data is purged systematically
Data Staging Area
ETL - Extraction, Transformation and Loading.
Extraction
- The extraction methods in a data warehouse depend on the performance of the source system and the demands of the business.
- Full extraction is applied when the data is required to be retrieved and loaded the first time. Hence, this extraction represents the current data available in the source system
- Incremental extraction is a process where the differences in the source data since the last extraction are captured. Only the changes will be loaded based on the last changed timestamp
- Online extraction is a process where the data is extracted from the source system directly
- Offline extraction is a process of extraction where the source system is emptied into a flat file outside of the source. This flat file is used to extract the data
Transformation
- The data is transformed based on the transformation rules provided by the business. The data is converted to a standard format and common semantics
- Data cleansing is the process of distinguishing and correcting the discrepant data from a database or table. Data cleansing also involves the synchronization of data. For example, the compliance of Male/Female to M/F
Loading
- Once the data is cleansed and transformed into a structure persistent with the data warehouse requisites, the data is then qualified to be loaded into a data warehouse
- Populating the data into the tables present in a data warehouse and verifying if the data is ready for use is the first step of loading
- After loading the facts and dimensions a DBA should check for referential integrity i.e. each record from the fact table should be related to a dimension record
Data Presentation Area
- The presentation area represents a collection of data marts. A data mart is a sub set of a data warehouse
- Data marts are preferred for smaller data volumes and fewer data sources. It enables easier data cleaning process
- Dependent data marts retrieve data from a central data warehouse whereas the independent data marts are standalone systems that extract data directly from the operational systems or external sources
Data Access Tools
- Business Intelligence tools are used for accessing the data for strategic, operational, and analytical purposes
- Senior executives and managers access the data warehouse for taking critical decisions. They devise strategies and observe the business performance
E.g. Balance Scorecards
- Operational managers execute the details of the strategies against the targets.
E.g. Sales Forecasts
- Analytical operations are performed by analysts to evaluate the outcomes of a business process and understand the functioning of the business
E.g. Financial and Sales Analysis
Technical Architecture
Two-tier Architecture
- Two-tier architecture consists of a front-end client component and a back-end server component. The two-tier model requires the SQL(Reporting Tool) to be hidden beneath the Graphical User Interface or executed as stored procedures in the database
- This architecture is not expandable and also cannot support a large number of end-users. It also has connectivity problems due to network limitations
Three-tier Architecture
Three-tier architecture observes the presence of the three layers of software – presentation, core application logic, and data and they exist in their own processors. This architecture is extensively used for data warehousing
Client
- This refers to the data that reaches the end-users. This can be in the form of tables or graphics or an email that is automatically generated everyday
- OLAP or a reporting tool like Business Objects and Cognos is used for representing the required data in a readable format
- Clients and stakeholders with different goals access the presentation layer for decision making
Application Server
The application server consists of the following:
- Summarized data is data available at different levels. For example, you can either retrieve data as independent transactions or as summaries by week, by year, by month, or by year
- Filtered data is used to limit the data in the report. For example, we can limit the values of a report based on a particular product by adding name of the product in the querying filter. The data is not modified but hidden at the report-level
- Metadata is defined as the consolidated data that relates to the detailed data. Structural metadata defines the data structures (database indexes, columns) and descriptive metadata consists of the occurrences of the application data and the data content
Database Server
- The database server is mostly a relational database system. The data is populated into the database using the ETL tools and utilities
Inmon’s Top-Down Approach
- A centralized data warehouse acts as a enterprise-wide data warehouse from which data marts are built as per the requirements of the specific departments
- The data model is based on Entity Relationship
- Persistent dimensional views of data across data marts can be viewed since all data marts are loaded from a data warehouse
- This data warehouse design is efficient against all business changes. Creation of a data mart from a data warehouse is very simple
- The analytic systems can access data in a data warehouse via the data marts
Kimball’s Bottom-Up Approach
- A business process is built using data marts which are joined together using common dimensions
- A dimensional data model with facts and dimensions is implemented here
- The reports can be generated quickly since the data marts are built first
- The data warehouse can be easily expandable to accommodate new units. It involves the creation of new data marts and then integrating with the other data marts
- The analytic systems access data via data marts
Federated Data Warehouse
- A federated data warehouse integrates all the legacy data warehouses, business intelligence systems into a newer system that provides analytical functionalities
- The implementation time is of a shorter period compared to building a enterprise data warehouse
Hub and Spokes Architecture
- The architecture is based on an extensive enterprise-level analysis of data requirements. The infrastructure that is built is expandable and maintainable
- The architecture can be developed in an iterative manner based on the subject area
- The dependent data marts that are developed later are based on the functional areas and may consist of normalized or denormalized data