It is common to want to model a Many-to-Many relationship between Dimensions and Facts.  For example, it may be necessary to see all employees associated with an opportunity, not just the primary.  This blog presents a series of tools and techniques that may be applied to solve a particular case. This is just an attempt to share this valuable information which has been written by one of my colleague while working on customer site. 
Technique #1: Select a Primary
Although not a technical solution, the best way to solve the M:M problem is to eliminate it.  By selecting one of the many dimensional records that are associated with a fact, the entire problem can be avoided.  In the Siebel OLTP, Primaries are used throughout the model, which are carried over and used in the Analytics model.  If it is at all possible to identify a primary, and the use of the primary is acceptable to the user community, then it is recommended to use this technique.
Technique #2: Direct Modeling into the Dimension
A straightforward technique where the table that serves as the intersection table is modeled into a lower level in the Dimension.  The specifics of this technique are similar to those outlined in Solution B of the No direct physical link between a base Dimension and a Fact table section above. 
Note that over-counting will occur when performing the many-to-many join.
Technique #3a: Use of a Bridge Table
Instead of modeling the relationship table into a new lower level in the dimension as in Technique #2, the relationship table can become a separate logical table that servers as the Bridge between the dimension and the facts.  Create a new Logical table with the M:M relationship table as the source, mark the logical table as a Bridge table, and adjust the Business model to show the relationship of Facts:Bridge as 1:M and Bridge:Dimension as M:1.  The indication that the Logical Table is a Bridge table is merely an indicator to Analytics that the table is not a Fact table, which it assumes to be any lowest-level table in the data model.
Note that over-counting will occur when performing the many-to-many join
Technique #3b: Use a Weighted Bridge Table
Similar to Technique #3a, this technique is the classic Kimball approach, where the Bridge table employs weighting factors to prorate a total value over multiple records.  For example, if there is one Opportunity  worth $1,000,000 and there are two Employees associated with it, the bridge table might contain a record for each with a weighting factor of 0.5.  In this way, each employee will be associated with 0.5 of the whole amount of $1,000,000, or $500,000.  If it is determined that Employee A should receive 75% of the credit, then the weighting factors would be stored as 0.75 and 0.25, which would give Employee A 75 of the total or $750,000.
It is important to note that the weighting factors must all add up to 1 (One), as they are effectively percentages of a whole.  Additional ETL effort will be required to complete this solution.
This technique eliminates over-counting, but may be difficult to implement if users are not comfortable prorating a value over several records.
Technique #4: Use Level Based Measures
As an enhancement to Techniques 2 and 3, the use of level based measures can help prevent the over counting problem associated with each.  When a metric or measure is explicitly bound to a specific level in a dimension, it is indicating that the metric will be viewed at that level. If the metrics in a fact table are to be viewed by a Dimension with which it has a M:M relationship, those metrics can be set to a level in the dimension, thereby forcing that the records be broken out across that dimension.  By forcing a breakout of rows (one fact row for each dimensional row), aggregation is prevented, and therefore over counting will not occur.
As an example, suppose there is a M:M between Employee and Fact_Opty_Revenue.  The data in the tables indicate that Tom, Larry and Bill are all linked to an Opportunity  worth $9 million.  The user makes a report that asks for the Opportunity Type and the total Potential Opportunity Revenue.  Without level setting the metrics on the fact table, a report that does not include the employee dimension will overcount, as each of the three dim records will be brought into the query and aggregated into one:
| 
Potential  | |
| 
Software Sales | 
$27,000,000 | 
By level setting the Revenue metrics to the Employee level in the Employee Dimension, this same report will return the following:
| 
Potential  | 
| 
Software Sales | 
$9,000,000 | 
| 
Software Sales | 
$9,000,000 | 
| 
Software Sales | 
$9,000,000 | 
Although not intuitively obvious as to the cause of the breakout to the end user, the over counting scenario is prevented.  When the user adds the Employee to the report, the breakout becomes clearer:
| 
Employee | 
Potential  | |
| 
Software Sales | 
Larry | 
$9,000,000 | 
| 
Software Sales | 
Tom | 
$9,000,000 | 
| 
Software Sales | 
Bill | 
$9,000,000 | 
Technique #5: Lower the Fact Table
The most complicated and involved solution is to lower the level of the fact table, and create a 1:M between the Dimensions and the Facts.  This involves a business rule to split up the metrics and spread them over all possible dimensional records.  In the example above, the simplest spread would be to assign Larry, Tom and Bill each 1/3 of the total amount of $9,000,000, or $3,000,000.  Thus, a report that does not break out by Employee will still total to the correct $9,000,000.  Note that this would require three records in the fact table instead of one, hence the concept of lowering the level of detail in the fact.
I will leave final call to choose an appropriate technique on designer or developer. The selection should be based on customer requirement as I understand technology design is to support the customer aspiration. 
 
 
 
Nice
ReplyDeletehttp://mkashu.blogspot.com
I am using Oracle 11g and I need to resolve a many to many relationship and I am having issues. I have a model for survey data. My fact table is at the survey question for each person that completed the survey. I have benchmark data for a given question in the fact table. There can be 1000 possiblites of topbox percentiles based on the percentile they want. I created a dim_percentile_grouping that has a one to many relationship to my fact table. I then created a bridge table between the dim_percentile_grouping table and the percentile table to resolve the many to many relationship. When I publish this, it works with very small filters, but the minute I start bringing in filters from the other dim tables attached to the fact table, it will not return any records. Do you have any idea what could be causing this?
ReplyDelete