Please enable JavaScript.
Coggle requires JavaScript to display documents.
Database (def ((bad design database (insertion anomaly (can't insert…
Database
def
-
-
ACID
-
consistent
ensure that after any transaction is made on the database, the database will still be in a valid or consistent state
ensure consistency
Add primary keys, and foreign keys
-
-
-
-
-
-
-
durable
ensure that once a transaction has been committed, the changes will remain in the database
ACID model is a way to increase data integrity which refer to the accuracy and consistency of data stored in a database.
-
bad design database
insertion anomaly
-
-
ex: if customer_id and order_ird in the same table, customer need to order to have an order_id so that someone can insert that customer_id with that order_id into a row. However, business logic allow to add customer without ordering something.
anomaly
problems that can occur in poorly planned, un-normalised databases where all the data is stored in one table
-
-
update anomaly
since same information is recorded in multiple rows, there are multiple updates that need to be made.
if these update are not successfully completed across all rows, then an inconsistency occurs.
-
-
-
DDL
data definition language
-
DDL commands: Create, Alter, Drop
-
Database Engine
underlying software the DBMS utilizes to allow us to perform CRUD operation though API provided by most DBMS
Database Server
A computer that is dedicated to running database server software and stores large amount of data is a server
-
Keys
-
-
Composite Key
A set of two or more columns that can uniquely identify a row, but the columns by themselves cannot uniquely identify a row
-
join
A SQL join clause combines columns fromm one or more tables in a relational database. It creates a set that can be saved as a table or used as it is.A JOIN is a mean for combining columns form one or more tables y using values common to each
-
data integrity
refers to the accuracy and consistency of data stored in a database, data warehouse, data
-
MongoDB
Operation
create
insertOne
insertOne({"title":"Rocky", "year":"1976"})
insertMany
ordered
-
when hit err, it will stop inserting
unordered
by adding ,{ ordered: false })
when hit err, it will skip
insertMany( [ { "imdb":"tt0081"},{"imdb":"tt21551"} ] )
-
-
same content, different _id insertion
-
read
exact
one key
one key, value is object, object.items == 10
-
-
one key, value is array, array contain "Ryan"
-
one key, value is array, matching array
find({actors: ["Ryan","Lee"]})
-
-
multiple key
find({student_id: 4, type: "homework"})
query selectors
comparation
-
-
-
-
-
-
db.test.find({ runtime: { $gte: 90, $lte: 120 } }
-
-
-
logical
$or
find( { $or : [ {quantity: { $lt: 20 } }, {price : 10 } ] } )
-
-
-
-
-
-
-
update
updateOne
db.collection.updateOne( <filter> , <update> , <options> )
db.test.updateOne( { item: "paper" }, { $set : { "size":"cm", status: "p" } , $currentDate : {lastModified: true } } )
updateMany
db.collection.updateMany( <filter> , <update> , <options> )
db.test.updateMany( { "qty" : { $lt : 50 } }, { $set : { "size":"cm", status: "p" } , $currentDate : {lastModified: true } } )
-
-
-
SQL
-
SYNTAX
-
-
create data
INSERT INTO table_name
-
VALUES ('Trung', 30, 'CTO')
read data
-
-
WHERE (condition)
-
operator
-
-
-
-
-
-
-
IN
WHERE first_nam IN ('Jon','Shan','Laura')
-
-
-
-
-
Design and Performance
indexing
-
create another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binrary Searches to be performed on it.
-
When you insert, modify, or delete data in indexed columns, the DBMS must update the index as well as the base table.
This may slow down data modification operations,
especially if the tables have a lot of rows.
-
Good practice
create indexes for foreign keys and for other columns that are used frequently for queries that apply logical criteria to data
If you find that update speed is severely affected, you may choose at a later time to delete some of the indexes you created.
avoid indexes on columns that contain nondiscriminatory data.
Nondiscriminatory columns have only a few values throughout the entire table, such as Boolean columns that contain only true and false
clustering
-
-
disk page of data are minimal size unit that disk read data. Cluster allow you to get data that are often stored on the same disk page to speed up data access.
rows in a table maybe scattered across several disk pages, but matching primary and foreign keys are usally on the same page/
To define the cluster, you specify a column or columns on which the DBMS should form the cluster and the tables that should be included
Because clustering involves physical placement of data in a file, a table can be clustered on only one column or combination of columns
clustering can slow down the performance of operations that require a scan of the entire table because clustering may mean that the rows of any given table are scattered throughout many disk pages.
-
-
partitioning
-
involves the splitting of larges tables into smaller ones so that the DBMS does not need to retrieve as much data at any one time
-
vertical partitioning
-
Benefit: the rows in the smaller table will be physically closer together, the smaller table will take up fewer disk pages and this support faster retrieval
drawback: query that require data from both tables must join tables and this is a relatively slow operation
Relational database
Normalization
-
-
-
Goal of normalization
Free the database of modification anomalies where you could update email address in one row, but not update in another and leave it inconsistent within database
-
Avoid bias toward any particular access pattern ( only matter in relational database, in mongodb if you not bias, you are bad on all of them )
-
-
Design support
DFD ( dataflow diagram)
-
component
external entity
outside system that send or receives data, communicating with the system being diagrammed
process
any process that change the data, sort or receive data, producing an output
-
data flow
the route that data takes between the external entities, processes and data stores
Design process
-
create level 1 DFD
-
-
how the operation read, retrieve data from data base
SQL vs noSQL
SQL
relational make heavy use of indexing for performance. On very large database, this can have an adverse affect on performance or even reduce the optimal number of rows per DB/Shard requiring more hard ware
-
If you don't have lots of foreign relationships, relational databases might not be ideal (document storage)
If you are need to use aggregate functions and feel the need to query data in complex ways that can not be achieved through embeds or simple relations in Mongo, that when you know it'time to use a relational database
-
noSQL
no rigid schema, if you need a certain table to have an extra field, you just change it
Databases like MongoDB are great when you usually know where your data is. This is great if, for example, you have Post and Comments and generally, you aren't going to be displaying comments outside the context of a post, so it makes sense that comments be contained within a post( that way you get all the comments for the post without needing to query a separate table)
easy to partition but often what is called eventual consistency. If you are building a messaging site that ok, a bank probably not
-