DATA QUALITY - Coggle Diagram
COMMON QUALITY MISTAKES
Structural Errors
Outliers
Unit Inconsistencies
Different Representations
Missing Data
Incorrect Formats
Duplicates
Typographical Errors
Misclassification
Obsolete data
Non-compliant data
CAUSES OF POOR DATA QUALITY
Manual data entry
Automatic data entry
Data source
Data ageing
Technical issues
Copying
Migrations and conversion
System changes
ATTRIBUTES OF GOOD DATA
COMPLETENESS Do you have all the data?
VALIDITY Are these values acceptable?
UNIQUENESS Are there duplicated or redundant entries?
ACCURACY How well does the data describe reality?
CONSISTENCY Are all versions of the data the same?
RELEVANCE Does the data support the objective?
TIMELINESS Does the data represent the required point in time?
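Two of the attributes above, completeness and uniqueness, can be expressed as simple ratios. The following is a minimal sketch rather than a standard definition: the function names, the sample `customers` records, and the choice to treat `None` and empty strings as missing are all assumptions made for illustration.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], fields: list[str]) -> float:
    """Share of expected field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def uniqueness(records: list[dict[str, Any]], key_fields: list[str]) -> float:
    """Share of records that are distinct on the given key fields."""
    if not records:
        return 1.0
    keys = {tuple(record.get(f) for f in key_fields) for record in records}
    return len(keys) / len(records)


# Hypothetical sample data: one duplicated id, two empty emails, one missing name.
customers = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Bob", "email": ""},
    {"id": 2, "name": "Bob", "email": ""},
    {"id": 3, "name": None, "email": "cy@example.com"},
]

completeness_score = completeness(customers, ["id", "name", "email"])
uniqueness_score = uniqueness(customers, ["id"])
```

A score of 1.0 means no gaps or no duplicates; what threshold counts as acceptable depends on the objective the data has to support.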
CLEAN DATA
DELETE
RECORD
REPLACE
PLACEHOLDERS
PARSE
NORMALIZATION
A way to clean up poorly structured data by organizing it into well-defined tables
1ST NORMAL FORM
All of the data in a field must mean the same thing.
Each field/cell must contain only one item
There are no repeating columns storing the same data.
Each row must be unique
Eliminates unnecessary data duplication
Improves consistency and therefore ease of use
Aligns data to real-world entities
Helps structure databases and tables
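The first normal form rules above can be illustrated with a small sketch. It assumes a hypothetical table in which a `phones` cell holds several comma-separated values; the function name and field names are invented for the example. Each cell is split so that it contains only one item, and repeated rows are dropped so that each row is unique.

```python
def to_first_normal_form(rows: list[dict]) -> list[dict]:
    """Split multi-valued 'phones' cells into one row per (customer_id, phone)."""
    seen = set()
    result = []
    for row in rows:
        for phone in row["phones"].split(","):
            phone = phone.strip()
            key = (row["customer_id"], phone)
            # Skip empty fragments and rows that were already emitted.
            if phone and key not in seen:
                seen.add(key)
                result.append({"customer_id": row["customer_id"], "phone": phone})
    return result


# Hypothetical input: a multi-valued cell and a repeated row.
raw = [
    {"customer_id": 1, "phones": "555-0100, 555-0101"},
    {"customer_id": 2, "phones": "555-0199"},
    {"customer_id": 1, "phones": "555-0100"},
]

normalized = to_first_normal_form(raw)
```

After the split, every `phone` field means the same thing and holds a single value, and the pair (`customer_id`, `phone`) can serve as a unique key.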
WAYS TO CHECK DATA QUALITY
VALIDATION
Range
Lookup lists
Access/security
Compliance
Ownership
Responsibility
Formats
Consistency
Data type
Uniqueness
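Several of the validation checks listed above (data type, range, lookup list, format, uniqueness) can be sketched in a single function. Everything specific here is an assumption for illustration: the field names, the 0 to 120 age range, the `ALLOWED_COUNTRIES` lookup list, and the `YYYY-MM-DD` date pattern.

```python
import re

ALLOWED_COUNTRIES = {"GB", "US", "DE"}            # hypothetical lookup list
DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed YYYY-MM-DD format


def validate(record: dict, seen_ids: set) -> list[str]:
    """Return a list of validation errors for one record (empty if valid)."""
    errors = []

    # Data type check first, then range check on the same field.
    if not isinstance(record.get("age"), int):
        errors.append("age: wrong data type")
    elif not 0 <= record["age"] <= 120:
        errors.append("age: out of range")

    # Lookup list check.
    if record.get("country") not in ALLOWED_COUNTRIES:
        errors.append("country: not in lookup list")

    # Format check (shape only; it does not verify the date exists).
    if not DATE_FORMAT.match(str(record.get("joined", ""))):
        errors.append("joined: bad format")

    # Uniqueness check against ids seen so far.
    if record.get("id") in seen_ids:
        errors.append("id: duplicate")
    seen_ids.add(record.get("id"))

    return errors


seen: set = set()
good = validate({"id": 1, "age": 34, "country": "GB", "joined": "2023-05-01"}, seen)
bad = validate({"id": 1, "age": 150, "country": "FR", "joined": "01/05/2023"}, seen)
```

Returning a list of messages instead of raising on the first failure lets one pass report every problem in a record.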
CROSS-CHECKING
Validation Check
Cross-check against another record
Identify inconsistencies
Triangulation
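Cross-checking and triangulation can be sketched as follows. This is one possible reading, not a prescribed method: `cross_check` compares a record against a second record field by field, and `triangulate` interprets triangulation as accepting a value only when a strict majority of sources agree on it. Both function names and the sample values are hypothetical.

```python
from collections import Counter
from typing import Any, Optional


def cross_check(primary: dict, reference: dict, fields: list[str]) -> dict:
    """Return the fields whose values differ, as (primary, reference) pairs."""
    return {
        field: (primary.get(field), reference.get(field))
        for field in fields
        if primary.get(field) != reference.get(field)
    }


def triangulate(values: list[Any]) -> Optional[Any]:
    """Return the value a strict majority of sources agree on, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


# Hypothetical records from two systems, and one field reported by three sources.
mismatches = cross_check(
    {"name": "Ada", "city": "Leeds"},
    {"name": "Ada", "city": "York"},
    ["name", "city"],
)
agreed_city = triangulate(["Leeds", "Leeds", "York"])
```

When no majority exists, `triangulate` returns `None`, which signals that the value needs manual review instead of being settled automatically.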
AUDITING
Ensure data is fit for purpose
Content
Storage
Quality
Use
Management