Dirty data and the database problem
Who isn't underwhelmed by the claims made for 'big data'?
The two problems which will never go away are 1) proliferating databases
and 2) unclean data.
Unclean data is the biggest problem with any plan to compare and utilise data. It is a purely human problem. Bad inputting, bad collection, multiple sources: these are the reasons for unclean data.
Data has to be inputted to fit the database in question. Transferring it to a new database, you find it needs a different format, a different setting. But you input it as best you can, and so dirty data gets into the system.
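One way to resist the 'as best you can' habit is to refuse anything that does not convert cleanly and set it aside for a person to look at. The sketch below is only an illustration: the field name `joined`, the two date formats and the sample records are all made up for the example, not taken from any real system.

```python
from datetime import datetime

# Assumption for illustration: the old database stored dates as DD/MM/YYYY
# text and the new one expects ISO 8601 (YYYY-MM-DD).
def convert_date(old_value):
    """Return an ISO date string, or None if the value cannot be parsed."""
    try:
        return datetime.strptime(old_value.strip(), "%d/%m/%Y").date().isoformat()
    except ValueError:
        return None

def transfer(records):
    """Split records into those that convert cleanly and those needing a human."""
    clean, rejected = [], []
    for record in records:
        iso = convert_date(record["joined"])
        if iso is None:
            rejected.append(record)  # do not guess; keep it out of the new system
        else:
            clean.append({**record, "joined": iso})
    return clean, rejected

# Hypothetical sample records.
old_records = [
    {"name": "A. Smith", "joined": "03/11/2014"},
    {"name": "B. Jones", "joined": "Nov 2014"},
    {"name": "C. Brown", "joined": " 29/02/2016 "},
]
clean, rejected = transfer(old_records)
```

Here the 'Nov 2014' record ends up in `rejected` rather than being forced into the new format. Someone still has to fix it by hand, which is the human problem again, but at least the dirt stays visible.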
Next: the number of databases is always rising. This violates the first rule of databases, which is: only one database for one set of data. Why? Because one of the databases will be neglected and the data will soon 'not match'. The classic case of this is when you have two address lists and forget to update both with every change of address. Pretty soon you won't know which database is correct.
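The two-address-list case is easy to show in a few lines. This is a rough sketch with invented names and addresses, assuming each list can be read as a simple name-to-address mapping.

```python
def find_mismatches(list_a, list_b):
    """Compare two {name: address} dicts and report where they disagree."""
    conflicts = {
        name: (list_a[name], list_b[name])
        for name in list_a.keys() & list_b.keys()
        if list_a[name] != list_b[name]
    }
    only_in_a = sorted(list_a.keys() - list_b.keys())
    only_in_b = sorted(list_b.keys() - list_a.keys())
    return conflicts, only_in_a, only_in_b

# Hypothetical data: the same people held in two separate address lists.
office_list = {
    "A. Smith": "12 High Street",
    "B. Jones": "4 Mill Lane",
    "C. Brown": "9 Park Road",
}
mailing_list = {
    "A. Smith": "12 High Street",
    "B. Jones": "27 Station Road",  # moved house; only this list was updated
    "D. Green": "1 Church Walk",
}
conflicts, only_in_office, only_in_mailing = find_mismatches(office_list, mailing_list)
```

Note what the check cannot do: it reports that the two lists disagree about B. Jones, but nothing in the data says which address is the right one.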
The number of databases is always rising. So data is always at risk. We will always have this problem...