Dimensional Data Modelling -
Dimensional modeling is a data modeling technique used primarily in data warehousing to organize and structure data in a way that is optimized for analysis and reporting. Instead of focusing on minimizing data redundancy like in transactional (OLTP) systems, dimensional modeling prioritizes making data easy to understand and query for business intelligence (BI) purposes.
The core concepts of dimensional modeling revolve around facts and dimensions:
- Facts: These are the quantitative measurements or metrics of a business process. They are typically numeric and can be aggregated (e.g., sales revenue, units sold, profit). Fact tables usually contain foreign keys that link them to dimension tables.
- Dimensions: These provide the context for the facts. They are descriptive attributes that answer the "who, what, when, where, and why" of a business event (e.g., product name, customer location, order date). Dimension tables contain attributes that are used for filtering, grouping, and labeling data in analyses.
These facts and dimensions are organized into structures called schemas, with the most common being the star schema.
Star Schema:
- This schema consists of one central fact table surrounded by several dimension tables, resembling a star.
- Each dimension table is directly related to the fact table through a foreign key.
- Dimension tables are typically denormalized, meaning they may contain some redundant data to improve query performance by reducing the need for joins.
These columns appear to contain descriptive attributes that provide context to the data. They seem to categorize or describe the business events.
- DATE OF PRICING: (When did this pricing information apply?)
- REP_ID: (Who is the representative associated with this data?)
- STATE NAME: (Where did this occur?)
- MARKET NAME: (Which specific market is this related to?)
- PRODUCT GROUP NAME: (What broad category does the product belong to?)
- PRODUCT NAME: (What is the specific product?)
- VARIETY: (What is the specific variety of the product?)
- ORIGIN: (Where did the product originate?)
Likely Fact Columns (indicated by the right green box):
These columns appear to contain numerical values that could represent measurements or metrics that you would want to analyze.
- ARRIVAL IN TONNES: (How much arrived?)
- MINIMUM PRICE: (What was the lowest price?)
- MAXIMUM PRICE: (What was the highest price?)
- MODAL PRICE: (What was the most frequent price?)
In summary, based on this snapshot:
- The columns in the left green box are strong candidates for dimension columns. They provide the context for understanding the "when," "where," "what," and "who" related to the pricing information.
- The columns in the right green box are strong candidates for fact columns (measures). They contain the numerical data about the arrival quantity and prices, which you would likely want to aggregate and analyze.
Step 2: Designing the Dimensional and Fact tables
What is Dimension Table (or) How to Identify Dimension Table?
If you discuss about any one or two behaviours of dimension table that will be fine
Dimension tables stores Business Description column values like markets , products
Relationship between each column inside one dimension table is always 1 - 1 . If any columns got 1 - Many relationship then need to be moved to new dimension table
Dimension table columns data don't change frequently and slowly change over the time
Include Surrogate Keys on Dimension tables to keep history of changes in source
What is Fact Table (or) How to Identify Fact Table?
If you discuss about any one or two behaviours of fact table that will be fine
Fact tables stores Business Measure column values like product-prices , product-quantities
Fact tables includes all Surrogate Keys from dimension table for joining and getting additional dimension values associated to the fact tables
Fact tables measures column data changed on regular interval and we need to load fact tables also regularly
Multiple fact tables required if measures captured are associated to different set of dimension tables
What is Logical and Physical Data Modelling?
Logical Data Model : Created based on business requirement without considering physical implementation
Physical Data Model : Created top of logical model to create it physically, normally in databases
What is Snowflake Schema?
The relationship between each column inside a dimension table is always 1 - 1 . If any columns got a 1 - Many relationship but still associated only to the same dimension table and not directly linked to fact table then we create child dimension table to store 1-M columns and link to the main dimension table using surrogate key. This way of designing Star Schema is called Snowflake Schema
Comments
Post a Comment