04 - Data Warehouse Modelling

What is Data Modeling

The interpretation and documentation of the current processes and transactions that exist during the software design and development is known as data modeling. The data modeling techniques and tools simplify the complicated system designs into easier data flows which can be used for re-engineering. It is used to create the logical and physical design of a data warehouse.

What is the need for Data Modeling in a Data warehouse

Collecting the Business Requirements

  • Typically, a data warehouse is designed with the data architects and the business users determining the entities required in the data warehouse and the facts that need to be recorded. This initial design has much iteration before deciding the final model
  • At this stage, we need to overcome the common drawbacks faced when designing. Since a data warehouse is implemented from an existing system the architects at times implement a larger part of the older system into the new design to save time or leave out details
  • The logical model captures the business requirements efficiently and serves as a building block for the physical model

Enhancing Database Performance

  • Query performance is a vital feature of a data warehouse. Enormous data volumes are involved in a data warehouse, so using a data model product for management of the metadata and the data used by the BI users is very important
  • The physical model adds indexing which optimize a database performance. At times the schemas too are changed. For example, a star schema can be changed to a snow flake schema if it promises a faster retrieval of data

Offers Source and Target System Documentation

  • In the process of designing an ETL system, it is very important to verify the physical and logical models of the source and target systems respectively
  • The data modeling offers this documentation which acts as a reference for the future

 

Conceptual Data Model

A conceptual data model determines the highest-level relationships among the different entities.

  • It is the primary step in creation of a data-model in top-down approach that is an exact representation of the business organization
  • Conceives the overall structure of the database and gives information of the subject-areas
  • Consists of entity types and relationships. The relationship between the subject areas are represented by symbolic notations (IDEF1X or IE). Cardinality in a data model exhibits the one to one relationship or many-to-many relationship
  • No primary key is stated
  • No attribute is specified

Logical Data Model

Logical data model represents the specific particulars of the entities, attributes, and relationships involved in a business. It is the basis on which a physical model is designed.

 

 

 

  • The development of a logical model begins after the sign-off of the conceptual data model by the functional team
  • A logical model should systematize the physical design process by defining the data structures and the relationship between them
  • The primary keys and foreign keys are established here
  • Normalization occurs here
  • Represents all the entities and the relationships between them

 

 

Physical Data Model

Physical data model exhibits the model of the database that is to be built. It represents the table structures, column names, column data types, primary keys, and foreign keys.

 

  • The physical data model is developed after receiving the acceptance of the logical data model by the functional team
  • Physical data model might be different from the logical data model due to few physical constraints
  • Physical data model differs for different databases. The data types change for different databases
  • Denormalization takes place according to the user requirements
  • The logical model is changed to physical data model by implementing the database rules, referential integrity , super types, and sub types

Relational Data Model

Relational data modeling is used in OLTP systems which are transaction-oriented. The major characteristics of a relational data model are:

Relationship among the tables

  • All the data is stored in tables and each relation has rows and columns
  • The table should have a header and a body. Header is the list of columns in the table and body consist of the values populated in the table. Tuple is the unique value generated from the junction of one column and one row

Usage of keys

  • Primary key is the most important key in a table. It is used as a unique identifier. The primary key is always a not null column
  • Foreign key is used to relate to the primary key. They relate the data from one table to another table and establish a relationship

Data Redundancy

  • The relational data model applies rules to maintain data integrity
  • It eliminates data redundancy. The data is not stored repetitively. This helps in maintaining data consistency and limited data storage

Multi Dimensional Data Model

A multi dimensional data model is logical view of an enterprise that represents the important entities of a business and the relationship between them. It is not restricted to a physical database and tables. It’s not represented by E-R diagrams. The main components are:

Attributes

  • Attributes are the abstract terms devised for easier summarization of data on a report
  • They can also be defined as the column headings that are not part of any calculations on a report

Dimensions

  • A dimension is a data set comprising individual, non overlapping data elements
  • They enable end users to define, group and filter the data for display and browsing purposes

Facts

  • A fact is a table consisting of columns that are used for numeric purposes to answer the business questions
  • They consist of additive, non-additive, and semi-additive measures

Data Modeling Tools

The open source data modeling tools are

Argo UML

  • The vendor is Tigris
  • Used for modeling UML diagrams
  • Supports all standard UML 1.4 diagrams

DIA

  • Hosted by Sourceforge
  • Software similar to Visio that is used for drawing diagrams
  • It comprises special objects that envisage in drawing entity relationship diagrams, UML diagrams, flowcharts, and network diagrams

Power Architect

  • Vendor is SQL Power
  • User-friendly data modeling tool created by data warehouse designers
  • Helps users to auto-generate ETL metadata and execute data profiling

StarUML

  • Hosted by Sourceforge
  • Used for developing expansible, flexible and freely-available UML/MDA platform running on Win32 platform

 

Data Modeling Best Practices

  • Complete analysis of the business requirements of the clients should be performed before starting the data model
  • Conducting sessions with the clients discussing the requirements and data modeling methods and getting immediate confirmation from the business subject matter experts should be given paramount importance
  • Assuring data quality through a series of checkpoints in the process to eliminate errors and data redundancy
  • The data model should be understood by the business, whether in a graphical/metadata format or expressed as text business rules
  • The data model should be easily translatable into English for wider audience

Like us on Facebook