NLP
Classify
Extract
Summarize
Extractive
Abstractive
Zipf's law
Bag-of-Words (BOW) model
it is just count vectorization
Milk is good and not expensive
Milk is expensive and not good
The BOW model treats both as the same information, since word order is discarded
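A minimal sketch of this, assuming scikit-learn is available: both sentences map to the identical count vector, so a BOW model cannot distinguish them.

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "Milk is good and not expensive",
    "Milk is expensive and not good",
]

# Count vectorization: each sentence becomes a vector of word counts,
# so word order is thrown away.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences).toarray()

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X[0])
print((X[0] == X[1]).all())  # True: identical vectors, opposite meanings
```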
Sequence Modeling
n-grams
Hidden Markov Model
Conditional Random Fields
Convolutional Neural Nets
Why Probability
Bayes' rule:
\( P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \)
where P(A) is the prior probability of A, P(B) is the prior probability of B, P(A|B) is the posterior probability of A given B, and P(B|A) is the likelihood of B given A.
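A quick worked example with made-up numbers: suppose the prior of a message being spam is \( P(S) = 0.2 \), the word "free" appears in spam with likelihood \( P(F|S) = 0.3 \), and overall \( P(F) = 0.1 \). Then \( P(S|F) = \frac{0.3 \times 0.2}{0.1} = 0.6 \).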
p(the lady is beautiful) > p(beautiful the is lady)
\( p(w_i) = \frac{C(w_i)}{\sum_{w\in Vocab}C(w) } \)
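A minimal sketch of the MLE unigram estimate above (the toy corpus is made up for illustration). Note that a unigram model ignores word order, so it scores both orderings of the sentence above identically; the inequality only emerges with bigrams or higher-order n-grams.

```python
from collections import Counter

corpus = "the lady is beautiful and the lady is kind".split()

counts = Counter(corpus)
total = sum(counts.values())

def p_unigram(w):
    # MLE estimate: C(w) divided by the total count, as in the formula above
    return counts[w] / total

print(p_unigram("the"))   # 2/9
print(p_unigram("lady"))  # 2/9
```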
Perplexity score
Perplexity score is used to determine how confused the model is by the given text. Its minimum possible value is 1, and real models typically score much higher; the lower the perplexity, the better the model.
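Concretely, for a test sequence \( w_1 \dots w_N \), perplexity is the inverse probability normalized by length: \( PP(W) = P(w_1, \dots, w_N)^{-\frac{1}{N}} \). Since probabilities are at most 1, perplexity is at least 1.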
Divide the data into 3 standard sections
Training
Heldout
Testing
Smoothing
Backoff
Class-based models
Laplace smoothing
Add-k smoothing
Interpolation
Mix n-gram models of different orders, e.g. 4-gram, trigram & unigram (see the formulas after this list)
Kneser-Ney Smoothing
Nelder–Mead method
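For reference, the standard definitions behind the items above. Add-k smoothing (Laplace is the \( k = 1 \) case) for unigrams:

\( P_{\text{add-}k}(w_i) = \frac{C(w_i) + k}{\sum_{w \in Vocab} C(w) + k\,|Vocab|} \)

Linear interpolation of n-gram orders (shown here for trigrams):

\( \hat{P}(w_i \mid w_{i-2}\,w_{i-1}) = \lambda_3 P(w_i \mid w_{i-2}\,w_{i-1}) + \lambda_2 P(w_i \mid w_{i-1}) + \lambda_1 P(w_i), \quad \lambda_1 + \lambda_2 + \lambda_3 = 1 \)

The \( \lambda \) weights are typically tuned on heldout data, which is where an optimizer such as Nelder–Mead comes in.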
Splitting dataset
Training
Heldout
Testing
to allow hyperparameters to be tuned on the heldout set
Discriminative models
Mutual Information
Information Gain
Entropy
amount of uncertainty in a distribution
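For reference, the standard definitions: entropy \( H(X) = -\sum_x p(x) \log_2 p(x) \) quantifies that uncertainty in bits, and mutual information \( I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \) measures how much knowing one variable reduces uncertainty about the other.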
Logistic Regression
\( \sigma(z) = \frac{1}{1 + e^{-z}} \)
Loss function: cross-entropy
Optimization algorithm: gradient descent
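A minimal sketch of logistic regression trained with gradient descent on the cross-entropy loss (NumPy only; the toy data and learning rate are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 examples, 2 features; labels chosen for illustration.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1  # learning rate

for _ in range(1000):
    p = sigmoid(X @ w + b)           # predicted probabilities
    # Gradient of the average cross-entropy loss w.r.t. w and b
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)
print(sigmoid(X @ w + b).round(2))  # probabilities should track the labels
```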
Support Vector Machine (SVM)
Sequence Tagging
POS Tagging
Named Entity tagging / Named Entity Recognition (NER)
Dialogue Act tagging
noun, verb, pronoun, preposition, adjective, adverb, conjunction, article
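A quick POS-tagging sketch with NLTK (resource names for the download step vary slightly across NLTK versions):

```python
import nltk

# One-time downloads (uncomment on first run):
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("The lady is beautiful")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('lady', 'NN'), ('is', 'VBZ'), ('beautiful', 'JJ')]
```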
Semantics
First Order Logic Semantics
Logical symbols
Non-logical symbols
Quantifiers
e.g. John, Mary, Vegetarian, Food
A model consists of the following elements:
- Domain: a set of individuals/symbols
- Properties
- Relations
Higher Order Logic
The Lambda Notation
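For illustration, using the non-logical symbols above: \( (\lambda x.\mathit{Vegetarian}(x))(\mathit{John}) \) beta-reduces to \( \mathit{Vegetarian}(\mathit{John}) \), i.e. applying a lambda-abstracted property to a constant substitutes the constant for the bound variable.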