Please enable JavaScript.
Coggle requires JavaScript to display documents.
Databases and Data Management - Coggle Diagram
Databases and Data Management
Normalization
Goals of Normalization
Avoid redundancies
Data Redundancy
: occurs when the same piece of data exists in multiple places
Maintain consistency
Data inconsistency
: the same data exists in different formats in multiple tables
Avoid anomalies
Anomalies
Insert
: inability to add data to the database due to absence of other data
Delete
: unintended loss of data due to deletion of other data
Update
: data inconsistency that results from data redundancy and a partial update
Select
:
1NF
Conditions
Values are
atomic
Rows are
unique
Order of columns is insignificant
Each column has a unique name
Can have insert, delete, update and select anomaly
2NF
Conditions
Table must be in first NF
Every non-key column must be
fully functionally dependent
on the entire primary key
Can have insert, delete and update anomaly
3NF
Table must be in second NF
Every non-key column must be non-transitively dependent on the primary key (no functional dependencies between non-key columns)
NoSQL vs SQL
How NoSQL addresses shortcomings of relational databases
Predefined schema (normalized)
difficult to support processing of unstructured data
e.g. cannot add a field to a small no. of records, have to add it for the whole table
however, can perform complex queries using JOINs!
Does not usually support hierarchical data storage
hierarchical data storage: less frequently-used data is moved to cheaper, slower storage devices
therefore cost of storing data in relational DB is more expensive
Vertically Scalable
improving performance of relational DB server requires upgrading existing server with faster processors & more memory (depends on capacity of machine)
NoSQL DB (horizontally scalable) can be improved by increasing the number of servers (relatively cheaper)
Stored in a server
database unavailable when the server fails
NoSQL DB designed to take advantage of multiple servers (if one server fails, other servers can continue to support)
Differences between relational databases & NoSQL databases
Relational: fixed, predefined schema
NoSQL: no predefined schema
Relational: contain tables, fixed data types for each field
NoSQL: MongoDB, flexible data types
Relational: data represented in tables and rows
NoSQL: collections of documents
Relational: JOINs used to get data across tables
NoSQL: no such joins, cannot do complex queries
Applications of SQL and NoSQL databases
SQL
should be used if:
data has fixed schema
complex and varied queries frequently performed
atomicity, consistency, isolation & durability (ACID) properties are critical to the database
will have a high number of simultaneous transactions
e.g. banking, order processing, flight reservations
NoSQL
should be used if:
data has a dynamic schema (unstructured with flexible data types)
data storage needs to be performed quickly
there will be a very large amount of data
e.g. social apps, learning machines, archiving data
Features of a Database
Data Integrity:
ensures data is
consistent
(all records get updated) and
accurate
(they get updated correctly
Data Validation:
rules
on how the data should be formatted (length, format)
Data Verification:
comparing 2 pieces of data (input data and actual source)
Concurrent Access
uses locking mechanism
especially useful when using/creating web app
Data Security
Access Control (WHO can access WHAT)
Backup
for recovery should you
delete it accidentally
weekly full backup and daily differential backup
Archive
for space management & long term retention
media used for archiving is relatively cheaper than that for backup
Centralised management and use of schema to present different view to end users ensures a more secure system
Data Redundancy
DDL/DML
allows CRUD operations to be performed using a standardised language
Attributes
Table
Record
Field
Keys
Primary
Secondary
Foreign
Composite
Personal Data Protection Act (PDPA)
Personal Data:
data, whether true or not, about an individual who can be identified from that data; or from that data and other info to which the organisation has access to
Need for privacy and integrity of data
Data Privacy:
requirement for data to be accessed or disclosed to authorised persons only
What is the PDPA?
A data protection law comprising various rules that govern the collection, use, disclosure and care of personal data.
It recognises both
rights of individuals to protect their personal data (including rights of access & correction)
needs of organisations to collect, use or disclose personal data for legitimate and reasonable purposes
Consent:
org must get indiv's knowledge & consent to use their data
Notification:
org must inform indiv of purposes of using their data
Appropriateness:
org may use personal data only for purposes that would be considered appropriate to a reasonable person under the given circumstances
Accountability:
orgs must make info about their personal data protection policies available on request, as well as the info of the representatives responsible for answering questions related to the org's use of data