Thursday, March 29, 2012

Correcting old data

Hi,
What's the best way to handle data reloads when you have implemented Type 2
slowly changing dimension?
For example, you might have an Account dimension, with a status of open or
closed. Someone accidentally closes the account, which we then extract and
update, creating a new record. They then realise the mistake and re-open the
account. The next extract would be create a third record, with the original
value.
On top of this, fact data may have been loaded against the second (invalid)
record.
If the source system doesn't keep a history of these changes, then we can't
redo the extract because it will contain a mix of valid and invalid changes.
Thanks,
Wreck.
Hello Wreck,
Have you considered type 1 for dimensions that are affected by this
issue? This way the transactions are not effected. You can have a
mixture of type 1 and 2 in your stars.
Check out:
http://bi-on-sql-server.blogspot.com...-changing.html
I wrote a blog on type 2 and type 1.
Type 2 only works at the point of time for the extracts as you
mentioned. One way of resolving this problem with dirty changes is to
take more frequent extracts to capture more of the changes.
Another option is to remove dirty changes from the dimension if there
are no corresponding fact transactions.
Hope this helps
Myles Matheson
Data Warehouse Architect
http://bi-on-sql-server.blogspot.com/
|||Hi Wreck,
you are entering 'religious territory' ground.....
There are two religions here:
1. Go back and correct data 'known to be incorrect'.
2. Retain data known to be incorrect and enter new transactions which
update the DW such that it now represents the data 'known' to be
correct. (Which is really just 'thought to be correct now'.)
I come down in the camp of retain the incorrect data and enter new
transactions such the data is now set to the 'thought to be correct
value'. In your case, the account would be 're-opened' and it would
occur as 're-opened'.
Why?
Because going back and updating data is a very slippery slope. Where
does it end? And if decisions were made against data that was presented
at a certain point in time and then the underlying data is 'corrected'
and people come along to review those decisions and the data says
'these decisions are not well founded on the data' what does one do?
One must remember that data is never, ever 'true' or 'correct' it is
only ever 'thought to be correct/true' which means that data that we
have realised is incorrect and we now believe to be true has the same
likelyhood of being incorrect as the data we just changed....
Whereas, if we retain the fact that we found data to be incorrect and
we enter transactions to place new transactions to correct that data we
can see the information on which decisions were taken at the time. We
can also see how often data is being corrected....perhaps there are
underlying issues for a high rate of data corrections occurring such as
lack of proper training in the call centers taking calls and entering
data? It happens...
In some cases it is a legal requirement not to correct historical data.
For example, in most countries it is illegal to update the financial
statements of a company after the close of the annual accounts...yet
clearly there have been some very 'incorrect' entries in the financials
of some very large companies over the last few years.
Best Regards
Peter Nolan
www.peternolan.com
|||Thanks for your feedback guys.
Myles, the system currently has Type I changes across the board, and
naturally doesn't track history very well. I want to implement Type II, but
the data in the source systems has its issues, hence my concerns.
There are steps being taken to improve the data quality in the source systems.
Peter, I'm inclined to agree with you that we don't correct anything,
especially once it's been reported on. However, the business is still fairly
new to the whole warehouse concept - prior to this, it was the usual story of
lots of Access and Excel apps extracting and manipulating data for reporting.
When these data errors are found, they don't really understand why it's so
difficult to make a correction.
Thanks,
Wreck.
"Peter Nolan" wrote:

> Hi Wreck,
> you are entering 'religious territory' ground.....
> There are two religions here:
> 1. Go back and correct data 'known to be incorrect'.
> 2. Retain data known to be incorrect and enter new transactions which
> update the DW such that it now represents the data 'known' to be
> correct. (Which is really just 'thought to be correct now'.)
> I come down in the camp of retain the incorrect data and enter new
> transactions such the data is now set to the 'thought to be correct
> value'. In your case, the account would be 're-opened' and it would
> occur as 're-opened'.
> Why?
> Because going back and updating data is a very slippery slope. Where
> does it end? And if decisions were made against data that was presented
> at a certain point in time and then the underlying data is 'corrected'
> and people come along to review those decisions and the data says
> 'these decisions are not well founded on the data' what does one do?
> One must remember that data is never, ever 'true' or 'correct' it is
> only ever 'thought to be correct/true' which means that data that we
> have realised is incorrect and we now believe to be true has the same
> likelyhood of being incorrect as the data we just changed....
> Whereas, if we retain the fact that we found data to be incorrect and
> we enter transactions to place new transactions to correct that data we
> can see the information on which decisions were taken at the time. We
> can also see how often data is being corrected....perhaps there are
> underlying issues for a high rate of data corrections occurring such as
> lack of proper training in the call centers taking calls and entering
> data? It happens...
> In some cases it is a legal requirement not to correct historical data.
> For example, in most countries it is illegal to update the financial
> statements of a company after the close of the annual accounts...yet
> clearly there have been some very 'incorrect' entries in the financials
> of some very large companies over the last few years.
> Best Regards
> Peter Nolan
> www.peternolan.com
>
|||Hi Wreck,
I have been doing DWing for 15 years...and I have seen a great many
cases of significant destruction of value within a company on the basis
that 'the business users do not understand'.......and I have seen
even more because 'the IT people do not understand'.!!!... ;-)
Indeed, it seems that those who 'do not understand' seem to think the
onus is on those who 'do understand' to explain to them what it is that
is not understood to persuade them not to destroy value in their
company. Those who 'do not understand' do not seem to think the onus
is on them to make an effort to understand to assist them create value
in their company...and so value is commonly destroyed....it would be
funnier if it was not so common...(LOL)!!
You are lucky in that today there are tons of materials available on
reasonable practices for DW development.
However, what is published is never the 'leading edge' and is rarely
'best practice'.
'Leading edge' and 'Best Practice' are retained within companies for
competitive advantage...;-)
Best Regards
Peter Nolan
www.peternolan.com

No comments:

Post a Comment