Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rule Request] Every dimension table should have exactly one column who is the row identifier #29

Open
sqlenders opened this issue Mar 29, 2019 · 8 comments

Comments

@sqlenders
Copy link

sqlenders commented Mar 29, 2019

Sounds stupid until you find out how many tables you have without a rowidentifier/key

@otykier
Copy link
Collaborator

otykier commented Mar 29, 2019

In my opinion, this rule should only apply to dimension tables, or am I missing something?

@sqlenders
Copy link
Author

if you use a fact table as a bridge between many to many relationships you might end up with strange results.

@otykier
Copy link
Collaborator

otykier commented Mar 29, 2019

Indeed - but in pure star schemas where you don't use many-to-many across a fact table, you will pay a penalty in the VertiPaq compression if you add a column containing only unique values to a table - especially if the table is big (which fact tables typically are).

@sqlenders
Copy link
Author

You are right. Change the title to Dimension tables should have ... ?

@otykier otykier changed the title [Rule Request] Every table should have exactly one column who is the row identifier [Rule Request] Every dimension table should have exactly one column who is the row identifier Mar 29, 2019
@otykier
Copy link
Collaborator

otykier commented Mar 29, 2019

Agreed. And just for reference, the way we identify a table as being a dimension table, is if the table is on the "one" side of any relationship in the model.

@DaniilMaslyuk
Copy link

Agreed. And just for reference, the way we identify a table as being a dimension table, is if the table is on the "one" side of any relationship in the model.

Isn't then having a row identifier already implicitly enforced by VertiPaq? You can't put a table on the one side of a relationship unless it has a column with unique values, can you?

And having exactly one row identifier column is not something that should be enforced. You may have two keys in a dimension, with one being used for data coming from source A, and another key used for data coming from source B -- isn't this a legitimate reason to have two keys?

@otykier
Copy link
Collaborator

otykier commented Apr 16, 2019

Maybe I'm misunderstanding something, but I believe @sqlenders wants a rule that checks all dimension tables (i.e., tables that only appear on the "One" side of any relationship - this is just how the Best Practice Analyzer distinguishes dimension tables from other tables), to make sure the IsKey-property has been set to true for the column participating in relationships. I'm not sure what the advantage (if any) of setting this property is, but I believe it has something to do with the implicit RowNumber column that normally exists on all tables. The TOM does not permit having more than one column in a table with this property set, but nothing prevents you from having multiple columns in a table, that contain only unique values.

@DaniilMaslyuk
Copy link

There's probably something I'm misunderstanding as well. Perhaps there's some advantage for this proposed Best Practice Analyzer rule that I'm not seeing -- keen to hear from @sqlenders.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants