06 - Data Marts

What is a Data Mart?

A data mart is a repository of operational and aggregated data that supports the business enterprises in making critical decisions. The data mart consists of predefined subset of data that is organized for rapid analysis and reporting. The data mart focuses on meeting the different demands of a distinct group of users in terms of analysis and content

 

Reasons to create a Data Mart from a Data Warehouse

  • To create a passive copy of a few fact tables that does not change frequently for the purpose of analysis. This is a temporary process where the data mart is deleted after the analysis is completed
  • To decrease the query load on a data warehouse. A copy of the fact table is created in a separate database located on a different server, and the reports are pointed to this server. This mart is refreshed everyday
  • To simulate the critical business scenarios by changing the data in a data mart. The data in a data warehouse cannot be manipulated without affecting the performance. Hence a data mart offers the perfect solution

 

Modeling a Data Mart

Creating a logical design

  • The design of the facts and dimensions must be approached systematically. The schema is based on this design
  • The lowest level of the data stored in a fact: granularity is the first step in designing a fact. The granularity is designed based on which dimensions to be included and the hierarchy of the dimensions
  • The historical duration of the data should be considered since that provides the basis for the physical storage design of a data mart. The data mart is designed based on the amount of data to be stored

Designing a schema

  • The star schema is the simplest schema to design a data mart
  • Once the facts, dimensions, and their granularity is established; the relationship between the facts and dimensions are defined
  • The intuitive grouping of data, easier retrieval of data, and the granularity of the data are the reasons for using a star schema

Moving from logical to physical design

  • The data is converted to a physical description in the form of tables and constraints in the physical design process
  • The database structures are designed to achieve the optimum performance. The data mart users execute different type of queries for which the data retrieval time plays a key role. The data partitioning and data indexing are the major physical design decisions that enhance the performance

Estimating the size of the data mart

  • The sample data should be analyzed to predict the number of rows in the fact table
  • The size of one row should be estimated
  • The size of the fact table should be estimated by multiplying the number of rows by the size of the row
  • The size of the data mart is estimated next. This is based on the size of the fact table. The data mart is considered to be three to five times the size of the fact table

Independent Data Marts

 

  • An independent data mart also known as stand-alone data mart emphasizes on a particular subject area. It is not designed in an enterprise context
  • Business intelligent tools or analytic tools query data directly from the data mart and present information to the user
  • Independent data marts can be built in a short time

Dependent Data Marts

  • According to Bill Inmon, a dependent data mart is a place where its data comes from a data warehouse. Data in a data warehouse is aggregated, restructured, and summarized when it passes into a dependent data mart
  • To improve the performance of a data warehouse, building one or two dependent data marts is the best solution. This is due to the data being processed outside the data warehouse
  • The dependent data marts provide security to the business since the data is stored in a data mart and each department owns and controls the data

Data Warehouse vs Data Marts

Data warehouse

  • Contains multiple subject areas
  • Consists of detailed data
  • Integrates different data sources
  • Does not always use a dimensional data model

Data Marts

  • Contains of only one subject area
  • Consists of summarized data
  • Focuses on integrating information for particular subject area
  • Is built based on dimensional model using the star schema

Advantages of a Data Mart

  • Building a data mart is simpler compared to implementing a corporate data warehouse
  • Data marts break down the complex data design into simpler manageable pieces. This not only helps the end-user but also the development teams. The data mart being small and simple allows the teams to maintain them easily
  • The reporting queries become simpler since filtering is allowed in a data mart. Smaller queries performing on a smaller subset of data are easier to tune in cases of performance enhancement
  • The data marts can be placed on smaller distributed machines to allow users to break away from profoundly powered machines and still handle processing of the reports
  • The cost of implementing a data mart is far less when compared to build a data warehouse
  • Applicable data is provided as demanded by the requirements of the end-users
  • Easier understandability of data by the customers since the data caters to a specific area of interest

Disadvantages of a Data Mart

  • The data marts are frequently short-term, temporary solutions that are not part of a corporate architecture. Each group of users place a request to extract a piece of data from the production system and analyze for their own purpose, increasing the costs of extraction and downloading
  • The continuous implementation of data marts poses a problem of tracking. Each data mart may require a data refresh from one or more operational systems per day. This may require hardware upgrades in case of periodic failures
  • The numerous activities of data extractions, administration, and environmental impacts increase the hardware and software expenses

Like us on Facebook