Let’s find out which schema should use on which condition.
We will go in three Phase:-
1:- Basic diff b/w Star Flake and Snow Flake
2:- Come up with some details which one I should prefer when?
3:- Where do the two methods fit in?
|Snowflake Schema||Star Schema|
|Ease of maintenance / change||No redundancy, so snowflake schemas are easier to maintain and change.||Has redundant data and hence less easy to maintain/change|
|Ease of Use||More complex queries and hence less easy to understand||Lower query complexity and easy to understand|
|Query Performance||More foreign keys and hence longer query execution time (slower)||Less number of foreign keys and hence shorter query execution time (faster)|
|Type of Datawarehouse||Good to use for datawarehouse core to simplify complex relationships (many:many)||Good for datamarts with simple relationships (1:1 or 1:many)|
|Joins||Higher number of Joins||Fewer Joins|
|Dimension table||A snowflake schema may have more than one dimension table for each dimension.||A star schema contains only single dimension table for each dimension.|
|When to use||When dimension table is relatively big in size, snowflaking is better as it reduces space.||When dimension table contains less number of rows, we can choose Star schema.|
|Normalization/ De-Normalization||Dimension Tables are in Normalized form but Fact Table is in De-Normalized form||Both Dimension and Fact Tables are in De-Normalized form|
|Data model||Bottom up approach||Top down approach|
Where should they be applied?
- Data optimization:
Snowflake model uses normalized data, i.e. the data is organized inside the database in order to eliminate redundancy and thus helps to reduce the amount of data. The hierarchy of the business and its dimensions are preserved in the data model through referential integrity.
Figure 1 – Snow flake model
Star model on the other hand uses de-normalized data. In the star model, dimensions directly refer to fact table and business hierarchy is not implemented via referential integrity between dimensions.
Figure 2 – Star model
- Business model:
Primary key is a single unique key (data attribute) that is selected for a particular data. In the previous ‘advertiser’ example, the Advertiser_ID will be the primary key (business key) of a dimension table. The foreign key (referential attribute) is just a field in one table that matches a primary key of another dimension table. In our example, the Advertiser_ID could be a foreign key in Account_dimension.
In the snowflake model, the business hierarchy of data model is represented in a primary key – Foreign key relationship between the various dimension tables.
In the star model all required dimension-tables have only foreign keys in the fact tables.
The third differentiator in this Star schema vs Snowflake schema face off is the performance of these models. The Snowflake model has higher number of joins between dimension table and then again the fact table and hence the performance is slower. For instance, if you want to know the Advertiser details, this model will ask for a lot of information such as the Advertiser Name, ID and address for which advertiser and account table needs to be joined with each other and then joined with fact table.
The Star model on the other hand has lesser joins between dimension tables and the facts table. In this model if you need information on the advertiser you will just have to join Advertiser dimension table with fact table.
Snowflake model loads the data marts and hence the ETL job is more complex in design and cannot be parallelized as dependency model restricts it.
The Star model loads dimension table without dependency between dimensions and hence the ETL job is simpler and can achieve higher parallelism.
This brings us to the end of the Star schema vs Snowflake schema debate. But where exactly do these approaches make sense?
Are we on Same Page?
I think you have doubt. Let me give two situation hope you will get better understanding.
Where do the two methods fit in?
With the snowflake model, dimension analysis is easier. For example, ‘how many accounts or campaigns are online for a given Advertiser?’
The star schema model is useful for Metrics analysis, such as – ‘What is the revenue for a given customer?’