Managing Data from the bottom up
From "Manage Your Business Data Layers and Improve Data Quality": Creation, maintenance and control of the data from the bottom up is the solution to getting a handle on data quality.
Foundation Data Layer
- Information in this layer is the basis for all business applications in your organization
- Focus on this layer first
- Figure out how your organization defines a customer and create an organization-wide way to refer to each one so you do not have to duplicate records across systems (duplication is Bad for Data Quality)
- Synchronize how you refer to products so that sales, support and engineering all use the same lexicon when referencing a product (multiple names for the same thing is another Bad Thing for Data Quality)
- Assign unique identifiers to all places of business and use them across the organization
- Integrate HR and Asset Tracking systems into the greater systems ecosystem
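As a rough illustration of the "one way to refer to a customer" idea, here is a minimal Python sketch. The `FoundationRegistry` class, its normalization rule and the `CUST-` identifier format are all hypothetical, invented for this example rather than taken from the article. A real system would need much more careful matching than lowercasing and trimming whitespace.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FoundationRegistry:
    """Hands out one identifier per real-world entity (hypothetical helper)."""
    prefix: str
    _ids: dict = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    @staticmethod
    def _normalize(name: str) -> str:
        # Assumption: collapsing case and whitespace is enough to match names.
        return " ".join(name.lower().split())

    def get_or_create(self, name: str) -> str:
        # Return the existing identifier if this entity is already known,
        # otherwise mint a new one. No duplicate record is ever created.
        key = self._normalize(name)
        if key not in self._ids:
            self._ids[key] = f"{self.prefix}-{next(self._counter):04d}"
        return self._ids[key]


customers = FoundationRegistry("CUST")
sales_id = customers.get_or_create("Acme Corp")         # entered by sales
support_id = customers.get_or_create("  acme   corp ")  # entered by support
other_id = customers.get_or_create("Globex")

print(sales_id, support_id, other_id)
```

Both departments end up with the same identifier for Acme, so neither needs its own copy of the customer record.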
Transactional Data Layer
- Builds on the Foundation Layer
- This data varies based on the business of the organization
- Things in this layer create relationships between the various Foundation objects
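One way to picture this layer: a transaction holds no customer or product details of its own, only references to Foundation objects. The sketch below assumes hypothetical lookup tables (`CUSTOMERS`, `PRODUCTS`, `LOCATIONS`) and a made-up `Sale` record; in practice a relational database would likely enforce the same thing with foreign keys.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical Foundation data: identifier -> display name.
CUSTOMERS = {"CUST-0001": "Acme Corp"}
PRODUCTS = {"PROD-0001": "Widget"}
LOCATIONS = {"LOC-0001": "Main Street Store"}


@dataclass(frozen=True)
class Sale:
    """A Transactional record: it only relates Foundation objects by id."""
    customer_id: str
    product_id: str
    location_id: str
    quantity: int


def missing_references(sale: Sale) -> List[str]:
    """Return the Foundation ids this sale refers to that do not exist."""
    checks = [
        (sale.customer_id, CUSTOMERS),
        (sale.product_id, PRODUCTS),
        (sale.location_id, LOCATIONS),
    ]
    return [ref for ref, table in checks if ref not in table]


good = Sale("CUST-0001", "PROD-0001", "LOC-0001", quantity=3)
bad = Sale("CUST-0001", "PROD-9999", "LOC-0001", quantity=1)

print(missing_references(good))
print(missing_references(bad))
```

A sale that points at an unknown product is caught when it is recorded, rather than surfacing later in a report.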
Operational Reporting Layer
- Used to manage the business on a day-to-day basis
- Based upon relationships of bits of Transactional Data
- Bad Data is often first detected here, since the people working at this layer are looking at Foundation Data through two degrees of separation
Financial Management Layer
- This is where the accounting department comes into play
- Interprets the Foundation and Transactional layers through the lens of accounting rules
- In theory, Bad Data here means someone could go to jail (if under SOX reporting rules)
Executive Information Layer
- Used to make organizational strategy decisions
- Built primarily from Financial Management layer, but also from Operational Reporting
- Contradictory information here can have large consequences for an organization
So why does Data Quality get so Bad?
Applications are developed at the Transactional level, thus skipping the ground floor of the Data Hierarchy. When that happens, you can get multiple ways of referring to the same thing depending on which silo the system is for.
What can you do about it?
Review your existing systems, map the different names used for the same thing and migrate them to a single one. Also, when investing in or creating new systems, use the common references. Finally, and this one isn’t mentioned in the article, don’t be afraid of having a massive database. If everything is stored on one server, there is less need for duplication. There are more and more people out there who know how to manage TB (or larger) databases, and the strategies for dealing with them efficiently are becoming more and more mature.
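The review-and-migrate step could start out as something like the following sketch. The alias table, product names and record layout are all hypothetical, made up for illustration. The point is only that every known alias maps to one common reference, and anything unmapped is set aside for a person to look at instead of being guessed.

```python
# Hypothetical aliases found while reviewing existing systems.
ALIASES = {
    "widget pro": "PROD-0001",
    "wp-1000": "PROD-0001",
    "widgetpro (legacy)": "PROD-0001",
    "gadget": "PROD-0002",
}


def migrate(records):
    """Split records into (migrated, unmapped) using the alias map."""
    migrated, unmapped = [], []
    for record in records:
        key = record["product"].strip().lower()
        if key in ALIASES:
            # Replace the silo-specific name with the common reference.
            migrated.append({**record, "product": ALIASES[key]})
        else:
            # Unknown name: leave it untouched for manual review.
            unmapped.append(record)
    return migrated, unmapped


rows = [
    {"system": "sales", "product": "Widget Pro"},
    {"system": "support", "product": "WP-1000"},
    {"system": "engineering", "product": "Project Falcon"},
]
migrated, unmapped = migrate(rows)

print(migrated)
print(unmapped)
```

Here the sales and support names both resolve to the same product, while the engineering name has no mapping yet and is returned unchanged.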