Please enable JavaScript.
Coggle requires JavaScript to display documents.
basics of functional dependencies and normalization for RD(cha 14), RD…
basics of functional dependencies and normalization for RD(cha 14)
复习
relation schema
relational
database schema
attributes are grouped by
common sense
mapping a database schema design from a conceptual
data model such as the ER or enhanced-ER (EER) data model.
still need some formal way of analyzing why one grouping of attributes into a relation schema may be better than another.
measures of appropriateness or goodness
to measure the quality of the design
evaluating relational schemas for design quality
tells us why one set of groupings of attributes into relation schemas is better than another.
two levels
logical (or conceptual) level—how users interpret the relation schemas
and the meaning of their attributes.
enables users to understand clearly the meaning of the data in the relations, and hence to formulate their queries correctly.
schemas of both base relations
and views (virtual relations).
implementation (or physical storage) level—how the tuples in a base relation are stored and updated.
applies only to schemas of base relations
Informal Design Guidelines
for Relation Schemas
Reducing the redundant information in tuples
minimize the storage space used by the base relations (and hence the corresponding files).
Storing natural joins of base relations leads to an additional problem referred to as update anomalies
insertion anomalies
these guidelines may sometimes have to be violated in order to improve the performance of certain queries
一致性质量与效率的权衡
If EMP_DEPT is used as a stored relation (known otherwise as a materialized view)
the anomalies in
EMP_DEPT must be noted and accounted for
using triggers or
stored procedures that would make automatic updates
deletion anomalies
modification anomalies
Reducing the NULL values in tuples
waste space at the storage level
how to account for them when aggregate operations such as COUNT or SUM are applied
SELECT and JOIN operations involve comparisons; if NULL values are present, the results may become unpredictable
太对了
multiple interpretations
The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S. students.
not applicable; applicable but absent; not know anything
The attribute value for this tuple is unknown. For example, the Date_of_birth
may be unknown for an employee.
The value is known but absent; that is, it has not been recorded yet.
avoid placing attributes in a base relation whose
values may frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
Using space efficiently and avoiding joins with NULL values are the two overriding
criteria that determine whether to include the columns that may have NULLs in a
relation or to have a separate relation for those columns (with the appropriate key
columns).
Clear Semantics to Attributes in Relations
Do not combine attributes from multiple entity types and relationship types into a single relation
may be used as views, but they
cause problems when used as base relation
Disallowing the possibility of generating spurious tuples
spurious tuples because they represent spurious information that is not valid.
Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because when we JOIN them back using NATURAL JOIN, we do not get the correct original information
Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations because joining on such attributes may produce spurious tuples
Functional Dependencies
definition
functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two
tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
The abbreviation for functional dependency is FD or f.d.
The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
A functional dependency is a property of the semantics or meaning of the
attributes.
??
they relate to one another—to specify the functional
dependencies that should hold on all relation states (extensions) r of R.
an FD cannot be inferred automatically from a
given relation extension r
an FD a property of the relation schema R, not of a particular
legal relation state r of R.
All we can say is that a certain FD may exist if it holds in that particular extension.
1 more item...
Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states (or legal extensions) of R.
Certain FDs can be specified without referring to a specific relation, but as a property of those attributes given their commonly understood meaning
Normal Forms Based on Primary Keys
assume that a set of functional dependencies is given for each
relation, and that each relation has a designated primary key
combined with the tests (conditions) for normal forms
drives normalization process for relational schema design
get relations: approaches
Perform a conceptual schema design using a conceptual model such as ER or EER and map the conceptual design into a set of relations
Design the relations based on external knowledge derived from an existing implementation of files or forms or reports.
Normalization of Relations
based on a single analytical tool: the
functional dependencies among the attributes of a relation
不断拆分
Definition
The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has been normalized.
Normal forms, when considered in isolation from other factors, do not guarantee a good database design
two properties
nonadditive join or lossless join property
spurious tuple generation problem
The nonadditive join property is extremely critical and must be achieved at any cost
dependency preservation property
???ch15
Practical Use of Normal Forms
need not normalize to the
highest possible normal form
for performance reasons
incurs the corresponding penalties of dealing with the anomalies
Denormalization
is the process of storing the join of higher normal
form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
superkey
uniquely identify tuples; can have redundant cols
key is a SK w/o redundant cols
candidate keys
1 primary key
others secondary keys
An attribute of relation schema R is called a
prime attribute
of R if
it is a member of some candidate key of R.
An attribute is called
nonprime
if it is not a prime attribute—that is, if it is not a member of any candidate key
First Normal Form
only attribute values permitted by 1NF are single atomic (or indivisible) values.
the domain of an attribute must include only atomic (simple, indivisible) values
that the value of any attribute in a tuple must be a single value from the domain of that attribute.
否则破坏functionally dependent
three main techniques to achieve first normal form
Remove the attribute Dlocations that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key Dnumber of
This decomposes the non-1NF relation into two 1NF relations
disallows multivalued attributes that are themselves composite
Decomposition and primary key propagation
1 more item...
Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location
disadvantage of introducing redundancy in
the relation and hence is rarely adopted
If a maximum number of values is known for the attribute—for example, if it
is known that at most three locations can exist for a department—replace the Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3
disadvantage of introducing NULL values if
most departments have fewer than three locations
introduces spurious semantics about the ordering among the location values; that ordering is not originally intended.
best to avoid this alternative.
Querying on this attribute becomes more
difficult
BLOB (binary large object) or CLOB (character large object) data type using SQL the object is treated as an atomic, single-valued attribute and hence it maintains the 1NF status of the relation
Second Normal Form
based on the concept of full functional dependency.
full functional dependency if removal of any
attribute A from X means that the dependency does not hold anymore
partial dependency if some attribute A ε X can be removed
from X and the dependency still holds
Definition. A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a number of 2NF relations
nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent
Third Normal Form
Definition. According to Codd’s original definition, a relation schema R is in
3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key
We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2
A NATURAL JOIN operation on ED1 and ED2 will recover the original
relation EMP_DEPT without generating spurious tuples
based on the concept of transitive dependency
any functional dependency in which the
left-hand side is
part (a proper subset)
of the primary key, or any functional dependency in which the
left-hand side is a nonkey attribute
, is a problematic FD
cause the update
anomalies
???
General Definitions of Second
and Third Normal Forms
take all candidate keys of a relation into account.
not affect the definition of 1NF since it is independent of keys and functional depenencies
a general definition of prime attribute
an attribute that is part of any
candidate key will be considered as prime.
Partial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate key
Definition. A relation schema R is in second normal form (2NF) if every
nonprime attribute A in R is not partially dependent on any key of R
A relation schema R is in 2NF if every nonprime attribute A
in R is fully functionally dependent on every key of R.
Definition. A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.
LOTS1 violates 3NF because Price is transitively dependent on each of the
candidate keys of LOTS1 via the nonprime attribute Area
This general definition can be applied directly to test whether a relation schema is in 3NF; it does not have to go through 2NF first.
if a relation passes the general 3NF test, then it automatically passes the 2NF test.
就是说,所有的nonprime attributes依赖且完全依赖且无传递依赖全部的key;prime atributes则无所谓
It allows certain functional dependencies to slip through or escape in that they are OK with the 3NF definition
not “caught” by the 3NF definition even though they may be potentially problematic.
Boyce-Codd Normal Form
every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF
most relation schemas that are in 3NF are also in BCNF
Only if there exists some f.d. X → A that holds in a relation schema R with X not being a superkey and A being a prime attribute will R be in 3NF but not in BCNF
Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
Decomposition of Relations not in BCNF
NJB (Nonadditive Join Test for Binary Decompositions)
NJB (Nonadditive Join Test for Binary Decompositions). A decomposition
D = {R1, R2} of R has the lossless (nonadditive) join property with respect to a set of functional dependencies F on R if and only if eithe
The FD ((R1 ∩ R2) → (R1 − R2)) is in F+
The FD ((R1 ∩ R2) → (R2 − R1)) is in F+
Let R be the relation not in BCNF, let X ⊆ R, and let X → A be the FD that
causes a violation of BCNF. R may be decomposed into two relations:
R –A
XA
If either R –A or XA. is not in BCNF, repeat the process.
Multivalued Dependency
and Fourth Normal Form
keep the relation
state consistent
avoid any spurious relationship between the two independent attributes
two independent 1:N relationships A:B and A:C are mixed in the same relation, R(A, B, C)
an MVD may arise.
all-key relation (with key made up of all attributes) and therefore has no f.d.’s and as such qualifies to be a BCNF relation
there is an obvious redundancy
Whenever X →→ Y holds, we say that X multidetermines Y. Because of the symmetry in the definition, whenever X →→ Y holds in R, so does X →→ Z. Hence, X →→ Y
implies X →→ Z and therefore it is sometimes written as X →→ Y|Z.
trivial MVD
nontrivial
MVD
in BCNF because no functional
dependencies hold
define a fourth normal form that
is stronger than BCNF
tend to be all-key relations
An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R
fourth normal form (4NF)
A relation schema R is in 4NF with respect to a set of dependencies
F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+,21 X is a
superkey
for R
An all-key relation is always in BCNF since it has no FDs.
■ An all-key relation such as the EMP relation in Figure 14.15(a), which has no FDs but has the MVD Ename →→ Pname | Dname, is not in 4NF.
■ A relation that is not in 4NF due to a nontrivial MVD must be decomposed to convert it into a set of relations in 4NF.
■ The decomposition removes the redundancy caused by the MVD.
Join Dependencies and Fifth Normal Form
repeated binary decomposition during the process of normalization to achieve 1NF, 2NF, 3NF, and BCNF
These binary decompositions must obey the NJB property for which we introduced a test
chieving 4NF typically involves eliminating MVDs by repeated binary decompositions as well.
there may be a nonadditive join decomposition into
more than two relation schemas.
database design
bottom-up design methodology (also called
design by synthesis)
top-down design methodology (also called design by
analysis)
starts with a number of groupings of attributes into relations that exist
together naturally
The implicit goals of the design activity are information preservation and minimum redundancy.
Minimizing redundancy implies minimizing redundant storage of the same information and reducing the need for multiple updates to maintain consistency across multiple copies of the same information
information preservation
in terms of maintaining all concepts, including attribute types, entity types, and relationship types as well as generalization/specialization relationships, which are described using a model such as the EER model
RD design algorithms and further dependencies(char 15.1 15.2)
Inference Rules, Equivalence, and Minimal Cover
other dependencies can
be inferred or deduced from the FDs in F
Definition:
An FD X → Y is inferred from or implied
by a set of dependencies F specified on R if X → Y holds in every legal relation state r of R; that is, whenever r satisfies all the dependencies in F, X → Y also holds in r.
closure
Definition. Formally, the set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F; it is denoted by F+
inference rules??
Armstrong’s axioms.
IR1 (reflexive rule)2: If X ⊇ Y, then X →Y.
Formally, a functional dependency X → Y is trivial if
X ⊇ Y; otherwise, it is nontrivial.
IR2 (augmentation rule)3: {X → Y} |=XZ → YZ.
IR3 (transitive rule): {X → Y, Y → Z} |=X → Z
corolary
IR4 (decomposition, or projective, rule): {X → YZ} |=X → Y.
IR5 (union, or additive, rule): {X → Y, X → Z} |=X → YZ.
IR6 (pseudotransitive rule): {X → Y, WY → Z} |=WX → Z
F |=X → Y to denote that the functional dependency X → Y is inferred from the set of functional dependencies F.
practice
then IR1, IR2, and IR3 are used to infer additional functional dependencies that will also hold on R
A systematic way to determine these additional functional dependencies is first to determine each set of attributes X that appears as a left-hand side of some functional dependency in F
then to determine the set of all attributes that are dependent on X.
Definition. For each such set of attributes X, we determine the set X+ of attributes that are functionally determined by X based on F; X+ is called th
e closure of X under F
Typically, database designers first specify the set of functional dependencies F that can easily be determined from the semantics of the attributes of R;
Minimal Sets of Functional Dependencies
Definition: An attribute in a functional dependency is considered an extraneous attribute if we can remove it without changing the closure of the set of dependencies.
Definition. A minimal cover of a set of functional dependencies E is a minimal set of dependencies (in the standard canonical form5 and without redundancy) that is equivalent to E. We can always find at least one minimal cover F for any set of dependencies E
Properties of Relational Decompositions
Relation Decomposition and Insufficiency of Normal Forms
make sure that each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are lost
attribute preservation condition of a decomposition.
Another goal is to have each indi
vidual relation Ri in the decomposition D be in BCNF or 3NF
not sufficient to guarantee a good database design on its own
Dependency Preservation Property
of a Decomposition
Claim 1. It is always possible to find a dependency-preserving decomposition
D with respect to F such that each relation Ri in D is in 3NF
??
Nonadditive (Lossless) Join Property
of a Decomposition
no spurious tuples are generated when a NATURAL
JOIN operation is applied to the relations resulting from the decomposition
Once a row consists only of a symbols, we conclude that the decomposition has the nonadditive join property, and we can stop applying the functional dependencies
不会增加spurious tuples