Please enable JavaScript.

Coggle requires JavaScript to display documents.

basics of functional dependencies and normalization for RD(cha 14), RD…

- - - - measures of appropriateness or goodness
        to measure the quality of the design
        
        evaluating relational schemas for design quality
        
        tells us why one set of groupings of attributes into relation schemas is better than another.
        
        two levels
        
        logical (or conceptual) level—how users interpret the relation schemas
        and the meaning of their attributes.
        
        enables users to understand clearly the meaning of the data in the relations, and hence to formulate their queries correctly.
        
        schemas of both base relations
        and views (virtual relations).
        
        implementation (or physical storage) level—how the tuples in a base relation are stored and updated.
        
        applies only to schemas of base relations
- - - - minimize the storage space used by the base relations (and hence the corresponding files).
      - Storing natural joins of base relations leads to an additional problem referred to as update anomalies
        
        insertion anomalies
        
        these guidelines may sometimes have to be violated in order to improve the performance of certain queries
        
        一致性质量与效率的权衡
        
        If EMP_DEPT is used as a stored relation (known otherwise as a materialized view)
        
        the anomalies in
        EMP_DEPT must be noted and accounted for
        
        using triggers or
        stored procedures that would make automatic updates
        
        deletion anomalies
        
        modification anomalies
    - - waste space at the storage level
      - how to account for them when aggregate operations such as COUNT or SUM are applied
      - SELECT and JOIN operations involve comparisons; if NULL values are present, the results may become unpredictable
        
        太对了
      - multiple interpretations
        
        The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S. students.
        
        not applicable; applicable but absent; not know anything
        
        The attribute value for this tuple is unknown. For example, the Date_of_birth
        may be unknown for an employee.
        
        The value is known but absent; that is, it has not been recorded yet.
      - avoid placing attributes in a base relation whose
        values may frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
        
        Using space efficiently and avoiding joins with NULL values are the two overriding
        criteria that determine whether to include the columns that may have NULLs in a
        
        relation or to have a separate relation for those columns (with the appropriate key
        columns).
    - - Do not combine attributes from multiple entity types and relationship types into a single relation
        
        may be used as views, but they
        cause problems when used as base relation
    - - spurious tuples because they represent spurious information that is not valid.
        
        Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because when we JOIN them back using NATURAL JOIN, we do not get the correct original information
      - Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations because joining on such attributes may produce spurious tuples
  - - - functional dependency, denoted by X → Y, between two sets of
        attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two
        tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
        
        The abbreviation for functional dependency is FD or f.d.
        
        The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
        
        A functional dependency is a property of the semantics or meaning of the
        attributes.
        
        ？？
        
        they relate to one another—to specify the functional
        dependencies that should hold on all relation states (extensions) r of R.
        
        an FD cannot be inferred automatically from a
        given relation extension r
        
        an FD a property of the relation schema R, not of a particular
        legal relation state r of R.
        
        All we can say is that a certain FD may exist if it holds in that particular extension.
        
        1 more item...
        
        Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states (or legal extensions) of R.
        
        Certain FDs can be specified without referring to a specific relation, but as a property of those attributes given their commonly understood meaning
  - - - combined with the tests (conditions) for normal forms
        
        drives normalization process for relational schema design
      - get relations: approaches
        
        Perform a conceptual schema design using a conceptual model such as ER or EER and map the conceptual design into a set of relations
        
        Design the relations based on external knowledge derived from an existing implementation of files or forms or reports.
    - - Normalization of Relations
        
        based on a single analytical tool: the
        functional dependencies among the attributes of a relation
        
        不断拆分
        
        Definition
        
        The normal form of a relation refers to the highest normal form
        condition that it meets, and hence indicates the degree to which it has been normalized.
        
        Normal forms, when considered in isolation from other factors, do not guarantee a good database design
        
        two properties
        
        nonadditive join or lossless join property
        
        spurious tuple generation problem
        
        The nonadditive join property is extremely critical and must be achieved at any cost
        
        dependency preservation property
        
        ？？？ch15
      - Practical Use of Normal Forms
        
        need not normalize to the
        highest possible normal form
        
        for performance reasons
        
        incurs the corresponding penalties of dealing with the anomalies
        
        Denormalization is the process of storing the join of higher normal
        form relations as a base relation, which is in a lower normal form.
      - Definitions of Keys and Attributes Participating in Keys
        
        superkey
        
        uniquely identify tuples; can have redundant cols
        
        key is a SK w/o redundant cols
        
        candidate keys
        
        1 primary key
        
        others secondary keys
        
        An attribute of relation schema R is called a prime attribute of R if
        it is a member of some candidate key of R.
        
        An attribute is called nonprime if it is not a prime attribute—that is, if it is not a member of any candidate key
      - First Normal Form
        
        only attribute values permitted by 1NF are single atomic (or indivisible) values.
        
        the domain of an attribute must include only atomic (simple, indivisible) values
        
        that the value of any attribute in a tuple must be a single value from the domain of that attribute.
        
        否则破坏functionally dependent
        
        three main techniques to achieve first normal form
        
        Remove the attribute Dlocations that violates 1NF and place it in a separate
        relation DEPT_LOCATIONS along with the primary key Dnumber of
        
        This decomposes the non-1NF relation into two 1NF relations
        
        disallows multivalued attributes that are themselves composite
        
        Decomposition and primary key propagation
        
        1 more item...
        
        Expand the key so that there will be a separate tuple in the original
        DEPARTMENT relation for each location
        
        disadvantage of introducing redundancy in
        the relation and hence is rarely adopted
        
        If a maximum number of values is known for the attribute—for example, if it
        is known that at most three locations can exist for a department—replace the Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
        Dlocation3
        
        disadvantage of introducing NULL values if
        most departments have fewer than three locations
        
        introduces spurious semantics about the ordering among the location values; that ordering is not originally intended.
        
        best to avoid this alternative.
        
        Querying on this attribute becomes more
        difficult
        
        BLOB (binary large object) or CLOB (character large object) data type using SQL the object is treated as an atomic, single-valued attribute and hence it maintains the 1NF status of the relation
      - Second Normal Form
        
        based on the concept of full functional dependency.
        
        full functional dependency if removal of any
        attribute A from X means that the dependency does not hold anymore
        
        partial dependency if some attribute A ε X can be removed
        from X and the dependency still holds
        
        Definition. A relation schema R is in 2NF if every nonprime attribute A in R is
        fully functionally dependent on the primary key of R.
        
        If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a number of 2NF relations
        
        nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent
      - Third Normal Form
        
        Definition. According to Codd’s original definition, a relation schema R is in
        3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key
        
        We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2
        
        A NATURAL JOIN operation on ED1 and ED2 will recover the original
        relation EMP_DEPT without generating spurious tuples
        
        based on the concept of transitive dependency
      - any functional dependency in which the left-hand side is
        part (a proper subset) of the primary key, or any functional dependency in which the left-hand side is a nonkey attribute, is a problematic FD
        
        cause the update
        anomalies
        
        ？？？
  - - - not affect the definition of 1NF since it is independent of keys and functional depenencies
    - - an attribute that is part of any
        candidate key will be considered as prime.
    - - A relation schema R is in 2NF if every nonprime attribute A
        in R is fully functionally dependent on every key of R.
    - - LOTS1 violates 3NF because Price is transitively dependent on each of the
        candidate keys of LOTS1 via the nonprime attribute Area
      - This general definition can be applied directly to test whether a relation schema is in 3NF; it does not have to go through 2NF first.
        
        if a relation passes the general 3NF test, then it automatically passes the 2NF test.
        
        就是说，所有的nonprime attributes依赖且完全依赖且无传递依赖全部的key；prime atributes则无所谓
        
        It allows certain functional dependencies to slip through or escape in that they are OK with the 3NF definition
        
        not “caught” by the 3NF definition even though they may be potentially problematic.
    - - every relation in BCNF is also in 3NF;
        however, a relation in 3NF is not necessarily in BCNF
        
        most relation schemas that are in 3NF are also in BCNF
        
        Only if there exists some f.d. X → A that holds in a relation schema R with X not being a superkey and A being a prime attribute will R be in 3NF but not in BCNF
      - Definition. A relation schema R is in BCNF if whenever a nontrivial functional
        dependency X → A holds in R, then X is a superkey of R.
      - Decomposition of Relations not in BCNF
        
        NJB (Nonadditive Join Test for Binary Decompositions)
        
        NJB (Nonadditive Join Test for Binary Decompositions). A decomposition
        D = {R1, R2} of R has the lossless (nonadditive) join property with respect to a set of functional dependencies F on R if and only if eithe
        
        The FD ((R1 ∩ R2) → (R1 − R2)) is in F+
        
        The FD ((R1 ∩ R2) → (R2 − R1)) is in F+
        
        Let R be the relation not in BCNF, let X ⊆ R, and let X → A be the FD that
        causes a violation of BCNF. R may be decomposed into two relations:
        R –A
        XA
        
        If either R –A or XA. is not in BCNF, repeat the process.
  - - - avoid any spurious relationship between the two independent attributes
        
        two independent 1:N relationships A:B and A:C are mixed in the same relation, R(A, B, C)
        
        an MVD may arise.
        
        all-key relation (with key made up of all attributes) and therefore has no f.d.’s and as such qualifies to be a BCNF relation
        
        there is an obvious redundancy
        
        Whenever X →→ Y holds, we say that X multidetermines Y. Because of the symmetry in the definition, whenever X →→ Y holds in R, so does X →→ Z. Hence, X →→ Y
        implies X →→ Z and therefore it is sometimes written as X →→ Y|Z.
    - - nontrivial
        MVD
        
        in BCNF because no functional
        dependencies hold
        
        define a fourth normal form that
        is stronger than BCNF
      - tend to be all-key relations
        
        An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R
    - - A relation schema R is in 4NF with respect to a set of dependencies
        F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+,21 X is a superkey for R
        
        An all-key relation is always in BCNF since it has no FDs.
        
        ■ An all-key relation such as the EMP relation in Figure 14.15(a), which has no FDs but has the MVD Ename →→ Pname | Dname, is not in 4NF.
        
        ■ A relation that is not in 4NF due to a nontrivial MVD must be decomposed to convert it into a set of relations in 4NF.
        
        ■ The decomposition removes the redundancy caused by the MVD.
  - - - These binary decompositions must obey the NJB property for which we introduced a test
        
        chieving 4NF typically involves eliminating MVDs by repeated binary decompositions as well.
- - - - closure
        
        Definition. Formally, the set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F; it is denoted by F+
        
        inference rules？？
        
        Armstrong’s axioms.
        
        IR1 (reflexive rule)2: If X ⊇ Y, then X →Y.
        
        Formally, a functional dependency X → Y is trivial if
        X ⊇ Y; otherwise, it is nontrivial.
        
        IR2 (augmentation rule)3: {X → Y} |=XZ → YZ.
        
        IR3 (transitive rule): {X → Y, Y → Z} |=X → Z
        
        corolary
        
        IR4 (decomposition, or projective, rule): {X → YZ} |=X → Y.
        
        IR5 (union, or additive, rule): {X → Y, X → Z} |=X → YZ.
        
        IR6 (pseudotransitive rule): {X → Y, WY → Z} |=WX → Z
        
        F |=X → Y to denote that the functional dependency X → Y is inferred from the set of functional dependencies F.
        
        practice
        
        then IR1, IR2, and IR3 are used to infer additional functional dependencies that will also hold on R
        
        A systematic way to determine these additional functional dependencies is first to determine each set of attributes X that appears as a left-hand side of some functional dependency in F
        
        then to determine the set of all attributes that are dependent on X.
        
        Definition. For each such set of attributes X, we determine the set X+ of attributes that are functionally determined by X based on F; X+ is called the closure of X under F
        
        Typically, database designers first specify the set of functional dependencies F that can easily be determined from the semantics of the attributes of R;
      - Minimal Sets of Functional Dependencies
        
        Definition: An attribute in a functional dependency is considered an extraneous attribute if we can remove it without changing the closure of the set of dependencies.
        
        Definition. A minimal cover of a set of functional dependencies E is a minimal set of dependencies (in the standard canonical form5 and without redundancy) that is equivalent to E. We can always find at least one minimal cover F for any set of dependencies E
- - - - attribute preservation condition of a decomposition.
    - - not sufficient to guarantee a good database design on its own
  - - - ??