CMPE 226 Final
No SQL
CAP
-
-
-
Only two of the three can be chosen. CA: MySQL, AP: AWS DynamoDB, CP: MongoDB
Eventual consistency
when no updates occur for a long time, all updates eventually propagate through the system and all nodes become consistent; used by most NoSQL systems
-
BASE: basically available, soft state, eventual consistency
-
RDBMS vs NoSQL
-
-
-
Pros: RDBMS has consistency, transactions, joins; NoSQL is schema-less, with scalability and availability
Cons: RDBMS has limited scalability and availability; NoSQL has no joins and no transactions
HBase: strongly consistent reads and writes (not eventual consistency), all operations are low level, auto-sharding, master node
-
MongoDB: ACID at document level, replica sets, auto-sharding (range-based, hash-based, user-defined)
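Range-based vs hash-based sharding can be sketched as two ways of mapping a key to a shard number. This is only an illustration of the idea, not how MongoDB or HBase implement it; `range_shard`, `hash_shard`, and the boundary list are hypothetical names.

```python
import bisect
import hashlib


def range_shard(key, upper_bounds):
    """Range-based: keys below upper_bounds[0] go to shard 0, and so on.

    upper_bounds must be sorted; the last shard takes every remaining key.
    """
    return bisect.bisect_right(upper_bounds, key)


def hash_shard(key, num_shards):
    """Hash-based: a stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Range-based keeps neighbouring keys on the same shard (good for range scans, but can create hot spots); hash-based spreads keys evenly but scatters range scans.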
concurrency control
2 phase locking
locking (growing) phase: new locks may be acquired, including upgrades; no lock may be released
unlocking (shrinking) phase: existing locks may be released, including downgrades; no lock may be acquired
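A minimal sketch of the two-phase rule for a single txn. It only tracks which phase the txn is in and ignores lock conflicts between txns; the class `TwoPhaseTxn` and the modes "S"/"X" are assumptions made for illustration.

```python
class TwoPhaseTxn:
    """Tracks one txn's locks and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.locks = {}          # item -> "S" (shared) or "X" (exclusive)
        self.shrinking = False   # True after the first release or downgrade

    def acquire(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        # An upgrade S -> X counts as an acquisition; X is never weakened here.
        if self.locks.get(item) != "X":
            self.locks[item] = mode

    def downgrade(self, item):
        if self.locks.get(item) != "X":
            raise KeyError(f"{self.name} holds no exclusive lock on {item}")
        self.shrinking = True    # a downgrade belongs to the unlocking phase
        self.locks[item] = "S"

    def release(self, item):
        self.shrinking = True
        del self.locks[item]
```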
-
-
deadlock: each txn in a set of two or more txns waits for an item locked by another txn in the set. Solutions: wait-die and wound-wait; both always abort the younger txn, which restarts with its original timestamp
starvation: a txn repeatedly waits or restarts and never gets a chance to proceed. Wait-die and wound-wait guarantee no starvation, because a restarted txn keeps its timestamp and eventually becomes the oldest
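The two schemes differ only in who gives way when a requester meets a lock holder. A small sketch, assuming a smaller timestamp means an older txn; the function names and return strings are made up for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester aborts"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    if requester_ts < holder_ts:
        return "holder aborts"
    return "requester waits"
```

In both cases the txn that gets aborted is the younger one, which matches the note above.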
basic Timestamp ordering
T issues a write:
if read_TS(X) > TS(T) or write_TS(X) > TS(T), abort and roll back, then restart T as a new txn with a new TS; else write X and set write_TS(X) to TS(T)
T issues a read:
if write_TS(X) > TS(T), abort and roll back, then restart T as a new txn with a new TS; else read X and set read_TS(X) to max(current read_TS(X), TS(T))
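The read and write rules above can be sketched directly. `Item`, `Abort`, `to_read`, and `to_write` are hypothetical names; restarting the aborted txn with a new timestamp is left to the caller.

```python
class Abort(Exception):
    """Raised when a txn must roll back and restart with a new timestamp."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0    # largest TS of any txn that read the item
        self.write_ts = 0   # TS of the txn that last wrote the item


def to_write(item, ts, value):
    # A younger txn has already read or written X: this write is too late.
    if item.read_ts > ts or item.write_ts > ts:
        raise Abort(f"write by TS {ts} rejected")
    item.value = value
    item.write_ts = ts


def to_read(item, ts):
    # A younger txn has already overwritten X: this read is too late.
    if item.write_ts > ts:
        raise Abort(f"read by TS {ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value
```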
-
-
distributed databases
replication: multiple copies of data, stored in different sites, for faster retrieval and fault tolerance
Pros: availability, parallelism, reduced data transfer
Cons: increased cost of update, increased complexity of concurrency control
-
data transparency: fragmentation, replication, location
distributed transaction
txn coordinator: starts execution of txns that originate at its site, distributes sub-txns to other sites, coordinates termination
txn manager: maintains the log for recovery, coordinates concurrent txn execution at its site
2 phase commit
-
participant, upon receiving the prepare message: if it can commit, logs ready and sends back ready; otherwise logs no and sends back abort
coordinator Ci, upon receiving the votes: if all are ready, Ci logs commit and sends the commit decision; if not, Ci logs abort and sends the abort decision
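A rough sketch of the two decisions above, with logs modelled as plain lists. `participant_vote` and `coordinator_decide` are hypothetical names; treating an empty vote list as abort is an assumption, and messaging and timeouts are left out.

```python
def participant_vote(can_commit, log):
    """Phase 1 at a participant: log the vote, then send it back."""
    if can_commit:
        log.append("ready")
        return "ready"
    log.append("no")
    return "abort"


def coordinator_decide(votes, log):
    """Phase 2 at coordinator Ci: commit only if every vote is ready."""
    all_ready = bool(votes) and all(v == "ready" for v in votes)
    decision = "commit" if all_ready else "abort"
    log.append(decision)   # the decision is logged before it is sent out
    return decision
```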
-
2 PC failure
site failure
site fails before sending ready: Ci assumes abort
site fails after sending ready: Ci ignores the failure, because the site can use its own log to decide on recovery
-
-
log contains ready: consult Ci
Ci decided commit: redo, log commit
Ci decided abort: undo, log abort
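Recovery at a failed site can be sketched as a lookup in its own log. The ready branch follows the notes above; the commit, abort, and no-record branches follow the usual textbook rule and are not spelled out in these notes. `recover` and `ask_coordinator` are hypothetical names.

```python
def recover(log, ask_coordinator):
    """Decide what a recovering site does with a txn, based on its log."""
    if "commit" in log:
        return "redo"
    if "abort" in log:
        return "undo"
    if "ready" in log:
        # The site voted ready but never saw the outcome: consult Ci.
        decision = ask_coordinator()
        log.append(decision)
        return "redo" if decision == "commit" else "undo"
    # No ready record: the site never voted, so Ci must have aborted.
    log.append("abort")
    return "undo"
```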
-
-
-
View, Index, Procedure, Trigger
Views
single table derived from other tables; a virtual table that does not necessarily have persistent data in physical form
why? restrict column access, hide query details, restrict insert/update with check option, provide a backward-compatible interface
-
stored procedure
segment of declarative SQL statements stored in the database catalog; has IN and OUT parameters and flow control; belongs to a schema
Pros: reusability of code, reduced traffic, stronger security, easier maintenance
Cons: learning curve, difficult to migrate to different DBMS
triggers
-
Pros: cascade changes, guard against incorrect changes, central enforcement of rules
-
Project 1
AWS DynamoDB
always-on, high availability, scalability, no ACID guarantee, availability > consistency
-
Spanner: Google
high consistency, availability, SQL, horizontal scale, 2PL; evolved from Bigtable into a temporal multi-version store
BigTable
high availability, partition tolerance, high scale, for PB of data
fast indexing, bloom filter
row keys, column families, timestamps
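A bloom filter answers "might this key be present?" with no false negatives, so a store can skip files that definitely lack a row key. The sketch below is a generic illustration, not Bigtable's implementation; the class name, sizes, and the SHA-256 based hashing are all assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))
```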
-