Why Systems Fail - The Dead-end of Dirty Data
by Olin Thompson


If your data does not reflect reality, the system can never be effective. In today's world of collaboration, showing a trading partner dirty data sends the wrong message and tears down the trust that a collaborative partnership requires. What is dirty data? When reality and the data in your system do not agree, you have dirty data. It may be as simple as having 1000 units of an inventory item on the shelf while the system says you have 900 or 1002. It may be that a customer's ship-to address is out of date. It may be that you have the same piece of information in two places (two applications or even two systems) and they do not agree - one of the two or maybe both are dirty. One rule of having the same piece of data in two places is - "identical data isn't".

How does it get dirty?

There are many reasons why a piece of data may be dirty. The data or the transaction causing a change in the data may have been inaccurately recorded. In the case of our inventory example, one of the potentially many transactions that change the on-hand data could have been inaccurate. For example, a receipt may have had an incorrect quantity.

Alternatively, the physical act may have been flawed. For example, an order called for 100 units to be picked and the transaction was recorded as 100, but the physical picking resulted in more or fewer than 100 being picked. Perhaps worse, when the material was placed in a location within the warehouse, the location may have been incorrectly reported, resulting in two pieces of dirty data (one location says it has some inventory but has none, and a second location says it has no inventory but does have some).

The transaction may be correct, but the recording of the transaction is delayed so that for a period of time, the data does not reflect reality. This timing problem results in data being temporarily dirty. No harm, unless some decision will be based upon the dirty data. Our systems do not have to be "real time" but they do have to be "right time" to avoid decision making on dirty data.
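One way to picture "right time" is to compare the age of a piece of data with how fresh a given decision needs it to be. The sketch below is illustrative only: the function name is_right_time, the timestamps and the tolerance values are all assumptions, not taken from any particular system.

```python
from datetime import datetime, timedelta


def is_right_time(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the data is fresh enough for the decision at hand."""
    return now - last_updated <= max_age


# Hypothetical example: the on-hand quantity was last updated four hours ago.
now = datetime(2004, 3, 1, 12, 0)
last_updated = datetime(2004, 3, 1, 8, 0)

# Assumed tolerances: a monthly report can live with day-old data,
# but promising stock to a customer cannot.
ok_for_monthly_report = is_right_time(last_updated, now, timedelta(days=1))
ok_for_order_promise = is_right_time(last_updated, now, timedelta(minutes=15))
```

The same four-hour delay is harmless for one decision and dirty for another, which is the sense in which the data has to be "right time" rather than "real time".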

A piece of data may be dirty because it was derived from dirty data. When a credit check uses an incorrect number for the total amount outstanding, it is usually because one of the outstanding invoices is dirty. The total amount outstanding was derived from individual invoices.
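The credit check example can be sketched in a few lines. This is a simplified illustration, not a real credit-checking routine: the invoice layout, the amounts and the credit limit are invented for the example.

```python
def total_outstanding(invoices):
    """Derive the total amount outstanding from the individual open invoices."""
    return sum(inv["amount"] for inv in invoices if not inv["paid"])


def passes_credit_check(invoices, new_order_amount, credit_limit):
    """Approve the order only if it keeps the customer within the credit limit."""
    return total_outstanding(invoices) + new_order_amount <= credit_limit


# Reality: the customer has already paid invoice INV-2.
reality = [
    {"id": "INV-1", "amount": 4000, "paid": False},
    {"id": "INV-2", "amount": 3000, "paid": True},
]

# The system: the payment on INV-2 was never recorded, so that invoice is dirty.
system = [
    {"id": "INV-1", "amount": 4000, "paid": False},
    {"id": "INV-2", "amount": 3000, "paid": False},
]

should_approve = passes_credit_check(reality, new_order_amount=2000, credit_limit=8000)
system_approves = passes_credit_check(system, new_order_amount=2000, credit_limit=8000)
```

Under these assumed numbers the order should be approved, but the system rejects it. The credit check logic is correct; the derived total is dirty because one invoice underneath it is dirty.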

Software can sometimes be blamed for dirty data. If a flaw or bug exists in the software, it may result in corruption to the data. In this case, the result is typically that many pieces of data are dirty, all in the same way.

How do we know we have dirty data?

Dirty data is often a sleeping problem, one that can wake up at any time. Dirty data does not always get detected; the problems it causes may be minor enough to remain hidden. That does not mean the dirty data is not causing problems; it may simply be that the problems are well hidden. The dirty data may also be used in a way that makes other data dirty without being detected. For example, if a lead-time is incorrect, order calculations will also be incorrect, resulting in either overstocking or out-of-stocks. The out-of-stock condition will alarm us, but the overstocking will usually continue unnoticed.
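The lead-time example can be made concrete with a common textbook reorder-point formula (daily demand times lead-time, plus safety stock). The formula is an assumption chosen for illustration, as are all of the numbers; a real planning system may calculate orders quite differently.

```python
def reorder_point(daily_demand, lead_time_days, safety_stock=0):
    """Stock level at which a replenishment order should be placed."""
    return daily_demand * lead_time_days + safety_stock


# Assumed figures: 20 units sold per day, 50 units of safety stock,
# and a true supplier lead-time of 5 days.
correct = reorder_point(daily_demand=20, lead_time_days=5, safety_stock=50)

# Lead-time recorded as 8 days: we reorder too early and carry extra stock.
overstated = reorder_point(daily_demand=20, lead_time_days=8, safety_stock=50)

# Lead-time recorded as 3 days: we reorder too late and risk running out.
understated = reorder_point(daily_demand=20, lead_time_days=3, safety_stock=50)

extra_units_carried = overstated - correct
units_short = correct - understated
```

With these numbers, the overstated lead-time quietly adds 60 units to the shelf and nobody complains, while the understated lead-time leaves a 40-unit gap that shows up as an out-of-stock.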

Infrequently, the dirty data may create error conditions, alerting someone to the problem. All too often, the error conditions are not reported and the data remains dirty.

Spot checks are used to detect some problems. Auditors send out verification letters to customers or suppliers. Cycle counts or a physical inventory give us a precise picture of reality, and the process allows us to compare the system to this reality.

If we are lucky, we get feedback from users and trading partners telling us about problems.

What is the impact of dirty data?

Dirty data may mean a dead-end to business value. Even worse, it can have a negative impact on business value. Dirty data can cause minor problems or be catastrophic. A catastrophic problem would be losing a customer or having to take a major financial write-off due to inventory problems. Less severe is carrying too much inventory (carrying too little can mean a loss of revenue). Less severe still is an invoice that goes to the wrong department at a customer site, with the customer rerouting it to fix your mistake, again and again. Even then, the impression you leave with the customer is that you are out of control or that you do not care.

If you are giving trading partners access to your data, what is the impression that you are leaving? When you open your collaboration door, the trading partner sees the inside of your company - good or bad. Internal problems can quickly become external problems.

Perhaps worse, what happens if you convert dirty data for use with a new system? You will have problems with the new system, but it will be very difficult to determine whether the cause is the dirty data or the system itself.

What can we do about dirty data?

We need to be on the lookout for dirty data. When it is detected, we need to fix both the data and, more importantly, the problem that caused it to be dirty in the first place. We need to seek out dirty data and fix it before it results in business problems. Business Intelligence systems can help find dirty data by putting it, in the form of information, in front of the people who can judge it best. These people can spot logical inconsistencies (a number being too large or too small, for example).
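A simple range check shows the kind of logical inconsistency a person or a report can flag. This is a minimal sketch: the item records, the field name and the plausible range of 0 to 90 days are all hypothetical, and the right limits would have to come from people who know the business.

```python
def find_suspect_values(records, field, low, high):
    """Return the records whose value for `field` falls outside [low, high]."""
    return [rec for rec in records if not (low <= rec[field] <= high)]


# Hypothetical item master data with lead-times in days.
items = [
    {"item": "A-100", "lead_time_days": 5},
    {"item": "B-200", "lead_time_days": 500},  # too large to be believable
    {"item": "C-300", "lead_time_days": -2},   # cannot be negative
    {"item": "D-400", "lead_time_days": 12},
]

suspects = find_suspect_values(items, "lead_time_days", low=0, high=90)
suspect_items = [rec["item"] for rec in suspects]
```

A check like this only finds values that look wrong. A lead-time of 12 days that should be 5 passes untouched, which is why the judgment of people who know the data still matters.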

In the most extreme cases, we must undertake a cleansing process. We must proactively seek out the correct information and use it to correct the data. This may mean a physical inventory, or a campaign to get all customers to validate their name and address. It may mean a program to compare the information in two systems, find where they disagree, and settle the data problems (and political problems) that exist.
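The comparison of two systems can be sketched as follows. The two dictionaries stand in for customer ship-to addresses held in two applications; the names erp and crm, the customer keys and the addresses are invented for illustration.

```python
def find_disagreements(system_a, system_b):
    """Return {key: (value_in_a, value_in_b)} for every key where the two
    systems differ. A value of None means the key is missing on that side."""
    disagreements = {}
    for key in sorted(system_a.keys() | system_b.keys()):
        value_a = system_a.get(key)
        value_b = system_b.get(key)
        if value_a != value_b:
            disagreements[key] = (value_a, value_b)
    return disagreements


# Hypothetical ship-to addresses held in two different applications.
erp = {"CUST-1": "12 Main St", "CUST-2": "4 Oak Ave", "CUST-3": "9 Elm Rd"}
crm = {"CUST-1": "12 Main St", "CUST-2": "40 Oak Ave", "CUST-4": "7 Pine Ct"}

diffs = find_disagreements(erp, crm)
```

The program can only report where the two systems disagree. Deciding which side is right, or whether both are dirty, still means checking against reality, for example by asking the customer.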


Dirty data causes problems - large and small, catastrophic and insignificant. With today's IT budgets, big cash outlays for new hardware or software are limited. Maybe now is the time to use your resources to search out dirty data and fix the problems - before your trading partners or auditors find them.

Olin Thompson, a principal of Process ERP Partners, has over 25 years of experience as an executive in the software industry, with the last 17 in process industry related ERP, SCP, and e-business segments. Olin has been called "the Father of Process ERP." He is a frequent author and an award-winning speaker on topics of gaining value from ERP, SCP, e-commerce and the impact of technology on industry. He can be reached at Olin@ProcessERP.com.



Copyright 2004 by Olin Thompson. All rights reserved.
