Please enable JavaScript.
Coggle requires JavaScript to display documents.
2.1 - Data Storage and Querying (Indexes (Primary Index records…
2.1 - Data Storage and Querying
Disk Assembly
Access time
= seek time + rotational delay + transfer
time
File Organization
Unordered
Serial or Heap
Insertion
- FAST
Retrieval
- SLOW
Update/Delete
- SLOW
Hashed or Direct or Random
Insertion
- FAST
Retrieval
- FAST
Update/Delete
- FAST
Ordered
Sequential (SAM)
Insertion
- SLOW
Retrieval
- SLOW
Update/Delete
- SLOW
Indexed Sequential (ISAM)
Insertion
- MEDIUM
Retrieval
- FAST
Update/Delete
- MEDIUM
Query Optimization
1. Execution Strategy
A plan for retrieving the result of the
query from the internal database files
2. Query Code Generator
Generates the
code to execute the plan.
3. Run-time Database Processor
Runs the query
code (compiled or interpreted) to produce the query
result
Searching Algorithm Examples
Linear search (brute force)
Binary search
Primary index or hash key
Clustering index
Using secondary index (B-tree) on an equality comparison
Indexes
Primary Index
records ordered on a key field
One index per block
Sparse Index
Clustering Index
Ordered on a non key field
Separate index records for each distinct value
Sparse Index (If no duplicates, then Dense Index)
B+ Tree Index
secondary index on equality comparison
In a B + -tree, data pointers are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs from the structure of internal nodes.The leaf nodes have an entry for every value of the search field,along with a data pointer to the record (or to the block that contains this record) if the search field is a key field
Secondary Index
Secondary means of accessing a data file for which
some primary access already exists
Either Dense or Sparse Index
Multi Level Indexes
If the first index uses more than 1 block of storage another index can be created for the base index with indexes for each block of the first index
Can have 2 or more levels
Query Optimization
Query Optimization Techniques
Systematic Estimating
Calculates cost the different execution strategies and to choose the execution plan with the lowest cost estimate.
Cost Estimation
Cost Components
Access cost to secondary storage
Storage cost
Computation cost
Memory usage cost
Communication cost
Cost Functions
Heuristics
typically reorder the operations in a query tree or determine an order for executing the operations specified by a query graph.
Transformation rules for relational
algebra operations
2. commutative of σ (Selection)
3. commutative of σ with π
4. commutative of σ with X (Cartesian Product) or ⋈ (join)
5. commuting σ with set operations
1. cascade of σ (Selection)