Natural Language Processing Big Picture - Coggle Diagram
Natural Language Processing
Big Picture
big concept
how
NLP
works
tokenize text
splits sentences/text into words
word = token
e.g. "Mr. Jason goes to the beach" → Mr. | Jason | goes | to | the | beach
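The word tokenization above can be sketched in a few lines. This is a minimal, assumed implementation (real tokenizers have much larger abbreviation and punctuation rules); the `ABBREVIATIONS` set is a made-up toy list so that "Mr." stays one token:

```python
import re

# Known abbreviations keep their trailing period so "Mr." stays one token.
# Toy list for illustration only.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr."}

def tokenize(text):
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            # split off punctuation from word characters
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens

print(tokenize("Mr. Jason goes to the beach."))
# → ['Mr.', 'Jason', 'goes', 'to', 'the', 'beach', '.']
```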
pattern matching
finding patterns in a tokenized text
implementation of
Finite State Machine (FSM)
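Pattern matching over tokens can be hand-rolled as a tiny finite state machine (regexes compile to FSMs internally). A sketch, with a made-up two-state pattern "title followed by a capitalized name":

```python
# Toy FSM: two states, matching the token pattern  Title -> Capitalized word.
TITLES = {"Mr.", "Mrs.", "Dr."}

def find_title_name(tokens):
    """Return (title, name) pairs found by a two-state scan over tokens."""
    matches = []
    state, title = "START", None
    for tok in tokens:
        if state == "START" and tok in TITLES:
            title, state = tok, "SAW_TITLE"       # transition on a title token
        elif state == "SAW_TITLE" and tok[:1].isupper():
            matches.append((title, tok))          # accept: title + name
            state = "START"
        else:
            state = "START"                       # reset on any other token
    return matches

print(find_title_name(["Mr.", "Jason", "goes", "to", "the", "beach"]))
# → [('Mr.', 'Jason')]
```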
Common NLP Models
supervised sequential model
what is it
scans tokens in sequence
in order
models example
Hidden Markov Model (HMM)
Conditional Random Fields (CRF)
Maximum Entropy (MaxEnt)
Deep Learning (DL)
supervised non-sequential model
what is it
scans tokens without regard to their order
models example
Support Vector Machines (SVM)
Decision Trees (DT) / Random Forests (RF)
Naive Bayes
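A non-sequential model like Naive Bayes treats a sentence as a bag of words, ignoring order. A self-contained sketch with made-up toy training data (real systems would use scikit-learn or similar):

```python
import math
from collections import Counter

# Toy labeled data, invented for illustration.
train = [
    ("I love this movie", "pos"),
    ("great fun great acting", "pos"),
    ("I hate this movie", "neg"),
    ("boring and bad movie", "neg"),
]

word_counts = {"pos": Counter(), "neg": Counter()}
class_counts = Counter()
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for w in text.lower().split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing;
        # word order never enters the computation (non-sequential).
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("I love great acting"))  # → 'pos'
```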
Ambiguities in NLP
Lexical Ambiguity
what is it
a token that has different meanings in different contexts
example
"bank"
Syntactic Ambiguity
what is it
a sentence that has different possible meanings
the ambiguity comes from a difference in sentence structure
example
"Did you find the answer using google?"
"Did you find the answer that has google in it?"
Semantic Ambiguity
what is it
a sentence whose meaning is ambiguous at the interpretation (semantic) level
the ambiguity lies in the overall meaning, not in a single word or the sentence structure
example
"come join us in a cup of coffee"
main process of NLP
tokenization
by word
advantages
the most common
finding relations between pieces of information is easier
better performance (time-wise)
disadvantage
Out of Vocabulary Words
(OOV)
e.g. "running" != "run"
words in the test dataset that never appeared in the training dataset
solution:
by sub-word / by character
by sub-word
the best
uses
stemming
/
lemmatization
by character
advantages
no
OOV
disadvantages
hard to find relations between pieces of information
solution:
by sub-word
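The OOV trade-off between word-level and character-level tokenization can be shown concretely. A sketch with a made-up three-word training vocabulary:

```python
# Toy training vocabulary, invented for illustration.
train_vocab = {"run", "going", "walks"}
train_chars = set("".join(train_vocab))

def is_oov_word(word):
    # Word level: an unseen word form is out of vocabulary.
    return word not in train_vocab

def is_oov_chars(word):
    # Character level: OOV only if the word uses an unseen character.
    return not all(c in train_chars for c in word)

print(is_oov_word("running"))   # → True  ("running" != "run")
print(is_oov_chars("running"))  # → False (every character was seen in training)
```

Sub-word tokenization sits between the two: it reuses seen pieces (e.g. "run" + "ning"), keeping most of the word-level relations while avoiding most OOV cases.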
stemming
what is it
plainly chopping words into subwords
chops end/beginning of words (suffix/prefix)
advantage
performance (time-wise)
disadvantage
"riding" -> "rid"/"ridi"
chopped words could have no meaning at all
solution:
lemmatization
lemmatization
what is it
just like
stemming
chopped words must have meaning
advantage
results are real, meaningful words
disadvantage
worse performance (time-wise) than stemming
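The contrast between the two can be sketched in a few lines. Real systems use the Porter stemmer and a WordNet-style lemmatizer; the suffix list and lemma table below are made-up toy data:

```python
# Naive suffix-chopping stemmer vs. dictionary-based lemmatizer (sketch).
SUFFIXES = ["ing", "ed", "s"]                     # toy suffix list
LEMMAS = {"riding": "ride", "geese": "goose"}     # toy lemma table

def stem(word):
    # Plainly chop a known suffix off the end; the result may not be a word.
    for suf in SUFFIXES:
        if word.endswith(suf):
            return word[: -len(suf)]
    return word

def lemmatize(word):
    # Unlike stemming, the output must be a real word (dictionary lookup).
    return LEMMAS.get(word, word)

print(stem("riding"))       # → 'rid'  (not a meaningful word)
print(lemmatize("riding"))  # → 'ride' (a real word)
```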
POS tags
what is it
classifying each word in a sentence into its part of speech
noun, verb, adjective, etc.
why we need it
gives the machine a better understanding of the sentence
avoids misunderstanding
"I love you honey" and "Honey let's make love" are different: "love" is a verb in the first, a noun in the second
how to implement
Hidden Markov Model
named entity recognition
chunking
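The HMM idea can be sketched with toy numbers: the tag of a word depends both on the word itself (emission) and on the previous tag (transition), which is what disambiguates "love" as verb vs. noun. All probabilities below are made-up illustration values, and the decoding is greedy rather than full Viterbi:

```python
EMIT = {  # P(word | tag), toy values
    "love": {"VERB": 0.7, "NOUN": 0.3},
    "honey": {"NOUN": 1.0},
    "i": {"PRON": 1.0}, "you": {"PRON": 1.0},
    "let's": {"VERB": 1.0}, "make": {"VERB": 1.0},
}
TRANS = {  # P(tag | previous tag), toy values
    ("START", "PRON"): 0.6, ("START", "NOUN"): 0.4,
    ("PRON", "VERB"): 0.8, ("PRON", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.7, ("VERB", "PRON"): 0.3, ("VERB", "VERB"): 0.2,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2, ("NOUN", "PRON"): 0.3,
}

def greedy_tag(tokens):
    """Greedy decoding: at each step pick the tag maximizing
    emission * transition from the previous tag."""
    prev, out = "START", []
    for tok in tokens:
        cands = EMIT[tok.lower()]
        best = max(cands, key=lambda t: cands[t] * TRANS.get((prev, t), 0.01))
        out.append((tok, best))
        prev = best
    return out

print(greedy_tag(["I", "love", "you", "honey"]))
# "love" after a pronoun → VERB
print(greedy_tag(["Honey", "let's", "make", "love"]))
# "love" after "make" (a verb) → NOUN
```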