Monday, February 27, 2012

Hot-Swappable dimensions

We are in the middle of creating a dataware house. In the current
model we have several hot-swappable (sub)dimensions which is linked to
a large dimension table. All dimensions are type 2. I don't think that
this is correct because when the data in the sub-dimension is changing
we need a new record in the large dimension too, otherwise you lose
the history of the record.
Instead I would like to create one FK in the Fact table which is
linked to all sub-dimensions. Because this key needs to be unique over
all sub-dimensions I am thinking about using a uniqueidentifier (16
byte GUID) field.
Is this the correct approach?
At a later stage at least 4 new hot-swappable dimensions will be
introduced.On May 2, 11:35 am, LuCarD <j...@.lucard.nl> wrote:
> We are in the middle of creating a dataware house. In the current
> model we have several hot-swappable (sub)dimensions which is linked to
> a large dimension table. All dimensions are type 2. I don't think that
> this is correct because when the data in the sub-dimension is changing
> we need a new record in the large dimension too, otherwise you lose
> the history of the record.
> Instead I would like to create one FK in the Fact table which is
> linked to all sub-dimensions. Because this key needs to be unique over
> all sub-dimensions I am thinking about using a uniqueidentifier (16
> byte GUID) field.
> Is this the correct approach?
> At a later stage at least 4 new hot-swappable dimensions will be
> introduced.
If you are going to create more than 2^31 record in the sub-dimension
table, probably you have to split that in several dimensions with a
reduced number of different values. Denormalizing the sub-dimension
into a "regular" dimension could be a good idea (it depends on many
other factors) but in this case there is no reason to continue using
surrogate keys with a tiny size (int or even smallint if you can).
When you have this data on the fact table, actually you preserved
history and you don't need to use SCD type 2 for these dimensions.
If you want to do more complex analysis, like "give me the sales of
this months projected on the organizational structure we had one year
ago", then you need to design a more complex model that will be much
difficult (and slow) to query, unless you use a particular OLAP model
like some of those described in the many-to-many paper you can find
here: http://www.sqlbi.eu/manytomany.aspx
Marco Russo
http://www.sqlbi.eu
http://sqlblog.com/blogs/marco_russo

No comments:

Post a Comment