Data Mining - Coggle Diagram

- - - - Initial centroids are often chosen randomly
      - The centroid is (typically) the mean of the points in the cluster.
      - ‘Closeness’ is measured by Euclidean distance, cosine similarity,
        correlation, etc.
      - K-means will converge for common similarity measures
        mentioned above
      - Most of the convergence happens in the first few iterations
      - Complexity is O(n * K * I * d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
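The basic K-means loop described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, NumPy, and random initial centroids drawn from the data; the function name `kmeans`, the `seed` parameter, and the toy points are hypothetical and not part of the notes.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means with Euclidean distance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels

# Toy data (made up for illustration): two well-separated groups.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(points, k=2)
```

Each iteration computes n * K distances over d attributes, which is where the O(n * K * I * d) cost comes from.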
    - - Most common measure is Sum of Squared Error (SSE)
      - Given two clusterings (e.g., from two runs), we can choose the one with the smallest error
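As a small illustration of using SSE to compare results, the sketch below scores two candidate clusterings of the same points and keeps the one with the smaller error. The helper names (`sse`, `centroids_of`) and the four points are hypothetical.

```python
def centroids_of(points, labels, k):
    """Mean of the points assigned to each cluster (assumes no cluster is empty)."""
    dims = len(points[0])
    result = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        result.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dims)))
    return result

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for p, lab in zip(points, labels):
        c = centroids[lab]
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
clustering_a = [0, 0, 1, 1]   # left pair vs right pair
clustering_b = [0, 1, 0, 1]   # bottom pair vs top pair

sse_a = sse(points, clustering_a, centroids_of(points, clustering_a, 2))
sse_b = sse(points, clustering_b, centroids_of(points, clustering_b, 2))
best = clustering_a if sse_a <= sse_b else clustering_b
```

Here `clustering_a` has the smaller SSE, so it is the one kept.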
    - - PROBLEM : If there are K ‘real’ clusters then the chance of selecting one
        centroid from each cluster is small.
      - SOLUTIONS
        
        Multiple runs
        
        Sample and use hierarchical clustering to determine initial centroids
        
        Select more than k initial centroids and then select among these initial centroids
        
        Bisecting K-means
        
        The bisecting K-means algorithm is a straightforward extension of the basic k-means algorithm
        
        Bisecting K-means has less trouble with initialization because it performs several trial bisections
        
        Clustering results can be refined by using the centroids from bisecting K-means as the initial centroids for the basic K-means algorithm
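A rough sketch of bisecting K-means under some assumptions: the cluster chosen for splitting is the one with the largest SSE, each split is the best of a few trial 2-means bisections, and the final refinement with basic K-means is left out. All names (`two_means`, `bisecting_kmeans`, `trials`) and the toy data are hypothetical.

```python
import numpy as np

def two_means(points, rng, max_iter=50):
    """One trial bisection with 2-means; returns a boolean mask and the SSE of the split."""
    centroids = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(max_iter):
        d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        mask = d[:, 1] < d[:, 0]
        if mask.all() or not mask.any():
            break  # degenerate split: one side is empty
        new = np.array([points[~mask].mean(axis=0), points[mask].mean(axis=0)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
    return d[:, 1] < d[:, 0], float((d.min(axis=1) ** 2).sum())

def cluster_sse(points):
    return float(((points - points.mean(axis=0)) ** 2).sum())

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split clusters in two until k remain; assumes the chosen cluster has >= 2 distinct points."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(points))]   # each cluster is an array of point indices
    while len(clusters) < k:
        # Assumption: split the cluster with the largest SSE.
        idx = max(range(len(clusters)), key=lambda i: cluster_sse(points[clusters[i]]))
        target = clusters.pop(idx)
        best = None
        for _ in range(trials):           # several trial bisections, keep the lowest SSE
            mask, s = two_means(points[target], rng)
            if mask.any() and not mask.all() and (best is None or s < best[1]):
                best = (mask, s)
        if best is None:
            raise ValueError("could not bisect the selected cluster")
        clusters += [target[~best[0]], target[best[0]]]
    return clusters

# Toy data (made up): three small, well-separated groups.
blob = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
points = np.vstack([blob, blob + [100.0, 0.0], blob + [0.0, 100.0]])
clusters = bisecting_kmeans(points, k=3)
```

Because every split only has to find two centroids, and the best of several trials is kept, a poor random start has less influence than in a single K-way run.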
    - - Simple and can be used for a wide variety of data types
      - Quite efficient, even though multiple runs are often performed
      - Some variants like bisecting K-means are less susceptible to initialization problems
    - - Cannot handle non-globular clusters or clusters of different sizes and densities
      - Has trouble clustering data that has outliers
  - - - Details
        
        There are two basic approaches
        
        Agglomerative
        
        Start with the points as individual clusters and, at each step, merge the closest pair of clusters. This requires defining a notion of cluster proximity
        
        Divisive
        
        Start with one, all-inclusive cluster and, at each step, split a cluster until only singleton clusters of individual points remain. In this case, we need to decide which cluster to split at each stage and how to do the splitting.
        
        A hierarchical clustering is often displayed graphically using a tree-like diagram called
        a dendrogram
        
        Dendrogram: displays both the cluster-subcluster relationships and the order in which the clusters were merged (agglomerative) or split (divisive)
        
        Produces a set of nested clusters organized as a
        hierarchical tree
      - Strengths
        
        Do not have to assume any particular number of
        clusters
        
        Any desired number of clusters can be obtained by
        ‘cutting’ the dendrogram at the proper level
        
        They may correspond to meaningful taxonomies
        
        Example in biological sciences (e.g., animal kingdom,
        phylogeny reconstruction, …)
        
        These algorithms are used as the underlying application
        
        Some studies suggest that these algorithms can produce better-quality clusters
      - Agglomerative clustering algorithm
        
        More popular hierarchical clustering technique
        
        Basic algorithm is straightforward
        
        Key operation is the computation of the proximity of two
        clusters
        
        MIN
        
        single link
        
        defines cluster proximity as the proximity between the closest two points that are in different clusters or, using graph terms, the shortest edge between two nodes in different subsets of nodes
        
        Strength
        
        Can handle non-elliptical shapes
        
        Limitation
        
        Sensitive to noise and outliers
        
        MAX
        
        complete link
        
        takes the proximity between the farthest two points in different clusters to be the cluster proximity or, using graph terms, the longest edge between two nodes in different subsets of nodes
        
        Strength
        
        Less susceptible to noise and outliers
        
        Limitation
        
        Tends to break large clusters
        
        Biased towards globular clusters
        
        AVERAGE
        
        defines cluster proximity to be the average pairwise proximities (average
        length of edges) of all pairs of points from different clusters
        
        Strength
        
        Compromise between single and complete links
        
        Less susceptible to noise and outliers
        
        Limitation
        
        Biased towards globular clusters
        
        Distance Between Centroids
        
        Other methods driven by an objective
        function
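The MIN, MAX and AVERAGE proximity definitions above can be compared with a naive agglomerative sketch. It assumes Euclidean distance between points and recomputes every pairwise cluster proximity at each step, so it is only meant for small examples; the function names and the sample points are hypothetical.

```python
import math
from itertools import combinations

def cluster_distance(a, b, linkage):
    """Proximity of two clusters (lists of points) under MIN, MAX or AVERAGE linkage."""
    pairwise = [math.dist(p, q) for p in a for q in b]
    if linkage == "min":       # single link: closest pair of points
        return min(pairwise)
    if linkage == "max":       # complete link: farthest pair of points
        return max(pairwise)
    if linkage == "average":   # group average: mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, k, linkage="min"):
    """Merge the closest pair of clusters until k clusters remain."""
    clusters = [[p] for p in points]   # start with every point as its own cluster
    merges = []                        # merge order, as a dendrogram would record it
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges

points = [(0, 0), (1, 0), (2, 0), (3, 0), (6, 0), (7, 0)]
clusters, merges = agglomerative(points, k=2, linkage="min")
```

Stopping at `k` clusters corresponds to cutting the dendrogram at the level where `k` branches remain.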
      - Limitations
        
        High computation and storage requirement
        
        Once a decision is made to combine two clusters, it cannot be undone
        
        No objective function is directly minimized
        
        Sensitive to noise and outliers
        
        Difficulty handling clusters of different sizes and convex shapes
        
        Breaking large clusters
  - - - Core points
      - Border points
      - Noise points
    - - Uses a density based definition of a cluster – relatively resistant to noise and can handle
        clusters of arbitrary shapes and sizes
    - - Has trouble when the clusters have widely varying densities
      - Has trouble with high dimensional data, as density (distance) is more difficult to define for
        such data
      - Can be expensive when the computation of nearest neighbors requires computing all
        pairwise proximities, as is usually the case for high-dimensional data
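To make the core / border / noise distinction concrete, here is a small sketch that labels points using the usual DBSCAN-style definitions. The notes do not spell these out, so treat them as an assumption: a core point has at least `min_pts` points (itself included) within radius `eps`, a border point is not core but lies within `eps` of a core point, and everything else is noise. The parameter names and sample points are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise'."""
    # Eps-neighborhood of each point; the point itself is counted.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}
    labels = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            labels.append("border")   # not core, but within eps of a core point
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
```

The neighborhood computation compares every pair of points, which is the all-pairs cost mentioned above.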
  - - - Accuracy
      - Precision
      - Recall
      - Purity
      - Entropy
        
        The degree to which each cluster consists of objects of a
        single class.
        
        Higher values: denote a poorer clustering result
        
        Lower values: denote a better clustering result
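Purity and entropy can be computed from cluster assignments and known class labels. The sketch below assumes the common definitions: purity is the fraction of points that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy in bits. The function name and the labels are hypothetical.

```python
import math
from collections import Counter

def purity_and_entropy(cluster_labels, class_labels):
    n = len(class_labels)
    clusters = {}
    for c, y in zip(cluster_labels, class_labels):
        clusters.setdefault(c, []).append(y)
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Purity: points in the cluster's majority class, as a share of all points.
        purity += max(counts.values()) / n
        # Entropy of the class distribution inside the cluster, weighted by cluster size.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        entropy += (size / n) * h
    return purity, entropy

cluster_labels = [0, 0, 0, 0, 1, 1, 1, 1]
class_labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
purity, entropy = purity_and_entropy(cluster_labels, class_labels)
```

A clustering in which every cluster holds a single class gives purity 1 and entropy 0.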
    - - Cluster Cohesion
        
        How closely related objects are in a cluster
      - Cluster separation
        
        How distinct or well-separated a cluster is from other clusters
      - SSE
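For centroid-based clusters, cohesion and separation are often measured with sums of squares. The sketch assumes that convention: cohesion is the within-cluster sum of squares (the SSE), and separation is the between-cluster sum of squares, weighting each centroid's squared distance to the overall mean by cluster size. Names and data are hypothetical.

```python
import numpy as np

def cohesion_and_separation(points, labels):
    """Return (within-cluster SSE, between-cluster sum of squares)."""
    overall_mean = points.mean(axis=0)
    wss = 0.0
    bss = 0.0
    for lab in np.unique(labels):
        members = points[labels == lab]
        centroid = members.mean(axis=0)
        wss += ((members - centroid) ** 2).sum()                       # cohesion
        bss += len(members) * ((centroid - overall_mean) ** 2).sum()   # separation
    return float(wss), float(bss)

points = np.array([[1.0], [2.0], [4.0], [5.0]])
labels = np.array([0, 0, 1, 1])
wss, bss = cohesion_and_separation(points, labels)
```

For a fixed data set the two values add up to the total sum of squares around the overall mean, so lowering the SSE raises the separation.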
  - - - Partitional clustering
        
        A division of data objects into non-overlapping subsets
      - Hierarchical clustering
        
        A set of nested clusters organized as a hierarchical tree
        
        Clusters are permitted to have sub-clusters
        
        Each node (cluster) in the tree is the union of its children (sub-clusters) and the root node is the cluster containing all the objects
      - Well-separated cluster
      - Center-based cluster
      - Contiguous clusters
      - Density-based cluster
      - Property or conceptual
      - Described by an objective function
- - - - User profile
      - Usage history
    - - Recommendation
      - Gain knowledge about preferences
    - - Item based
        
        New movie released
        
        Get salient features and match with other items
      - User based
        
        User created an account
        
        Does not have user history
        
        Match profile with other users
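One simple way to match a new item's features against existing items is cosine similarity over feature vectors; the same idea applies to matching a new user's profile against other users. This is only a sketch: the feature encoding, the movie names, and the function names are all made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(target, candidates):
    """Name of the candidate whose features are closest to the target."""
    return max(candidates, key=lambda name: cosine(target, candidates[name]))

# Hypothetical genre features: [action, comedy, romance]
catalog = {
    "movie_a": [1, 0, 0],
    "movie_b": [0, 1, 1],
    "movie_c": [1, 1, 0],
}
new_movie = [0.9, 0.1, 0.0]   # newly released, no usage history yet
closest = most_similar(new_movie, catalog)
```

The new movie can then be recommended to users who liked its closest match until it builds up a usage history of its own.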
  - - - Complex conditional dependency architecture
    - - Hidden belief
      - Stochastic output
    - - Wild guess about preferences
  - - - Wild guess
- - - - S set of states
      - A set of Actions
      - T transition matrix function
      - V valuation of states
      - R reward function, R : S × A → ℝ
      - Vπ is the value function for the policy starting from a given state
      - Qπ is the value function of the policy for performing an action, starting from a given state.
    - - If a subsets of States and Actions
    - - Evaluate policy
      - Accumulate the reward over time
    - - Bellman equation
        
        recursive way
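Written out with the symbols listed above, the recursive form is usually stated as follows. The discount factor γ and the stochastic policy notation π(a | s) do not appear in the notes and are assumptions here.

```latex
V^{\pi}(s) = \sum_{a \in A} \pi(a \mid s) \Big[ R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \, V^{\pi}(s') \Big]

Q^{\pi}(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \, V^{\pi}(s')
```

The value of a state is expressed through the values of its successor states, which is what makes the definition recursive.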
  - - - more general than supervised/unsupervised learning
      - learn from interaction w/ environment to achieve a goal
      - transitions and rewards usually not available
      - how to change the policy based on experience
      - how to explore the environment
      - RL doesn’t assume that you have a model of the environment; dynamic programming (DP) does
  - - - use value functions to structure the search for good policies
      - need a perfect model of the environment
    - - policy evaluation: compute Vπ from π
      - policy improvement: improve π based on Vπ
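Alternating the two steps gives policy iteration. The sketch below assumes a deterministic policy, a known model stored as nested dictionaries (`T[s][a]` maps next states to probabilities, `R[s][a]` is the reward), and a discount factor `gamma`; the two-state problem is made up for illustration.

```python
def policy_evaluation(policy, states, T, R, gamma=0.9, tol=1e-8):
    """Compute V^pi by repeated Bellman backups until the values stop changing."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_improvement(V, states, actions, T, R, gamma=0.9):
    """Make the policy greedy with respect to V."""
    return {
        s: max(actions, key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items()))
        for s in states
    }

def policy_iteration(states, actions, T, R, gamma=0.9):
    policy = {s: actions[0] for s in states}
    while True:
        V = policy_evaluation(policy, states, T, R, gamma)
        improved = policy_improvement(V, states, actions, T, R, gamma)
        if improved == policy:
            return policy, V          # policy is stable: stop
        policy = improved

# Hypothetical two-state problem: staying in "high" pays 1 per step.
states = ["low", "high"]
actions = ["stay", "move"]
T = {
    "low":  {"stay": {"low": 1.0},  "move": {"high": 1.0}},
    "high": {"stay": {"high": 1.0}, "move": {"low": 1.0}},
}
R = {
    "low":  {"stay": 0.0, "move": 0.0},
    "high": {"stay": 1.0, "move": 0.0},
}
policy, V = policy_iteration(states, actions, T, R)
```

Both steps read `T` and `R` directly, which is the sense in which this approach needs a model of the environment.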
  - - - don’t need full knowledge of environment
        
        just experience, or simulated experience
      - but similar to DP
        
        policy evaluation, policy improvement
      - averaging sample returns
        
        defined only for episodic tasks
    - - don’t need model of environment
      - learn from sample episodes or simulated experience
      - can concentrate on “important” states
      - need to maintain exploration
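Averaging sample returns can be sketched with first-visit Monte Carlo evaluation. The assumptions here: each episode is a finished list of (state, reward) pairs, where the reward is the one received after leaving that state, and `gamma` is a discount factor. The function name and the episodes are hypothetical.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s in each episode."""
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backwards so G is the discounted return from each step onward.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_return[state] = G   # overwritten until only the earliest visit remains
        for state, g in first_return.items():
            returns[state].append(g)
    return {s: sum(g) / len(g) for s, g in returns.items()}

episodes = [
    [("A", 0), ("B", 1)],
    [("A", 0), ("B", 0)],
    [("B", 1)],
]
V = first_visit_mc(episodes)
```

Only complete episodes are used, which is why the method is defined for episodic tasks, and no transition model is consulted.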
- - - - Frequency of occurrence
        of an itemset
      - σ({Milk, Bread, Diaper}) = 2
    - - Fraction of transactions
        that contain an itemset
      - s({Milk, Bread,
        Diaper}) = 2/5
    - - An itemset whose
        
        support is greater
        
        than or equal to a
        
        minsup threshold
    - - An implication expression of the form X → Y
      - Example:
        {Milk, Diaper} → {Beer}
    - - Support (s)
        
        Fraction of transactions that contain both X and Y
        
        May occur simply by chance
      - Confidence (c)
        
        Measures how often items in Y appear in transactions that contain X
        
        Measures the reliability of the inference made by a rule
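Support and confidence can be computed directly from a transaction list. The five transactions below are an assumption: they are not given in the notes, and were chosen to agree with the counts quoted above ({Milk, Bread, Diaper} appears in 2 of 5 transactions).

```python
# Assumed transactions, consistent with the support values quoted in the notes.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """How often Y appears in the transactions that contain X."""
    return support_count(X | Y, transactions) / support_count(X, transactions)

rule_support = support({"Milk", "Diaper"} | {"Beer"}, transactions)
rule_confidence = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
```

With this data the rule {Milk, Diaper} → {Beer} has support 2/5 and confidence 2/3.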
  - - - List all possible association
        rules
      - Computationally prohibitive!
        
        Alternatives
        
        Reduce number of
        candidate Itemset (Apriori principle)
        
        Reduce number of
        comparisons
      - Compute support and
        confidence for each rule
      - Prune rules that fail minsup
        and minconf thresholds
      - Association rule
        
        Total number of possible rules that can be extracted from a dataset that contains d items is R = 3^d − 2^(d+1) + 1
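The growth in the number of candidate rules can be checked by enumeration. The sketch counts every rule X → Y with X and Y non-empty and disjoint over d items, and compares the count with the standard closed form 3^d − 2^(d+1) + 1. The function names are hypothetical.

```python
from itertools import combinations

def count_rules(d):
    """Count rules X -> Y (X, Y non-empty and disjoint) over d items by brute force."""
    items = list(range(d))
    total = 0
    for size_x in range(1, d):
        for X in combinations(items, size_x):
            rest = [i for i in items if i not in X]
            # Every non-empty subset of the remaining items is a valid Y.
            for size_y in range(1, len(rest) + 1):
                total += sum(1 for _ in combinations(rest, size_y))
    return total

def closed_form(d):
    return 3 ** d - 2 ** (d + 1) + 1

rules_for_six_items = count_rules(6)
```

Six items already give 602 possible rules, which is why listing all of them quickly becomes impractical.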
  - - - Pruned
        supersets
    - - For generation
      - For pruning (infrequent)
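A compact sketch of level-wise frequent itemset generation with Apriori pruning. It assumes a minimum support given as a count, and the transactions are the same assumed market-basket data as above; candidate generation here simply joins pairs of frequent k-itemsets, which is simpler than the prefix-based join often used in practice.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Frequent itemsets via level-wise candidate generation and pruning."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count support and keep only candidates that reach the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= minsup_count}
        frequent.update(level)
        # Generation: join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Pruning: drop a candidate if any k-subset is infrequent,
        # because every superset of an infrequent itemset is infrequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent

# Assumed transactions (not given in the notes).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = apriori(transactions, minsup_count=3)
```

With a threshold of 3, candidates such as {Bread, Diaper, Beer} are pruned without being counted, because {Bread, Beer} is already infrequent.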
    - - Choice of minimum
        support threshold
      - Dimensionality of the dataset
      - Size of database
      - Average transaction
        width