Natural Language Processing
Natural language processing (NLP)
Natural language processing is a branch of artificial intelligence that enables computers to comprehend, generate, and manipulate human language. It lets users interrogate data using natural language text or voice.
Computer science
Algorithms, data structures, software development, and computational theory
It creates systems that "learn" and "think" like humans
Ex: chatbot
Cognitive psychology
How the human mind processes information: memory, learning, perception, and language
How people understand, produce, and learn language
Linguistics
The structure, meaning, and use of language
Phonetics
The study of speech sounds: how they are produced, transmitted, and perceived. In a phonetic writing system, every spoken sound corresponds to exactly one written symbol, and each symbol represents only one sound
Phonology
Studies the sound systems of languages.
Focuses on rules, patterns, and organization of sounds in specific languages.
Morphology
Discourse analysis: Study of how language is used in context.
Narrative theory: Understanding the number of voices or perspectives in a text.
Resources
Lutkevich, B. (2021). What is Natural Language Processing? An Introduction to NLP. [online] TechTarget. Available at: https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP
Wikipedia. (2020). History of natural language processing. [online] Available at: https://en.wikipedia.org/wiki/History_of_natural_language_processing
Runestone.academy. (2025). Formal and Natural Languages. [online] Available at: https://runestone.academy/ns/books/published/foppff/general-intro_formal-and-natural-languages.html
Bek (2020). Syntax, Semantics and Pragmatics: What is the Difference? | AVSP. [online] Alex V. Speech Pathology. Available at: https://avspeechpathology.com.au/education/syntax-semantics-and-pragmatics-what-is-the-difference [Accessed 16 Apr. 2025].
Google.com. (2024). machine learning advantages and disadvantages - Google Search. [online] Available at: https://www.google.com/search?q=machine+learning+advantages+and+disadvantages
Google.com. (2024). Sentiment analysis co-reference match pronouns with referring expression - Google Search. [online] Available at: https://www.google.com/search?q=Sentiment+analysis+co-reference+match++pronouns+with+referring++expression
History
1950: Turing Test, proposed to see if a machine can imitate human conversation
1954: Georgetown-IBM experiment, the first successful machine translation (Russian to English)
1957: Noam Chomsky revolutionizes linguistics with "universal grammar", a rule-based system of syntactic structures
1966: Joseph Weizenbaum creates ELIZA, a chatbot that simulates a therapist
1970: Terry Winograd builds SHRDLU, an early NLP system that understands commands in a virtual block world. It combined syntax, semantics, and basic reasoning
1980s: Rise of rule-based systems, which relied heavily on handwritten grammar rules
Focus on parsing and syntactic analysis (sentence structure)
1988: The IBM speech group introduces statistical machine translation, laying the foundation for modern NLP
1990s: Use of large corpora (text datasets); NLP gets more accurate by learning from data
2000s: Machine learning models are widely adopted in NLP
2008-2013: Deep learning, particularly neural networks, revolutionizes NLP
Google introduces Word2Vec (2013); words are now represented in a vector space to capture meaning
2013-present: Transformer models (an architecture introduced in 2017), like GPT, have made significant strides in NLP
The field is continuously evolving, with ongoing research on new architectures, applications, and ethical considerations.
Language
Language is a system of communication that uses symbols and rules (grammar) to express ideas, thoughts, emotions, and information
Formal language
A formal language is a structured set of symbols and rules, used mainly in mathematics, logic, and computer science.
Example: Python, Java, C++
Used for programming and logic
Very strict and structured
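To illustrate how strict a formal language is, the sketch below asks Python's own parser whether a string is a well-formed program. The helper name is_valid_python is an assumption made for this example; ast.parse is part of the Python standard library.

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if `source` conforms to Python's grammar."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        # A formal language rejects anything that breaks its rules.
        return False

print(is_valid_python("total = 1 + 2"))  # well-formed statement
print(is_valid_python("total = 1 +"))    # missing operand violates the grammar
```

A human reader could guess what the second line was meant to say; the parser cannot, which is the practical difference between formal and natural languages.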
Communication modes
Collaboration
Working together using language to achieve a shared goal
Requires clear communication, planning, and often negotiation
Used in teamwork, group projects, business meetings
Conversation
Interactive and informal exchange between two or more people
(Focuses on turn-taking, listening, and responding)
Communication
General exchange of information
(verbal, non-verbal, or written)
Co-creation
Creating something new together through shared ideas and language
Involves idea generation and joint problem-solving
Natural Language
Natural languages are the languages that people speak, such as English, Spanish, Korean, and Mandarin Chinese.
Evolved naturally over thousands of years
Rich and expressive: can show emotion, sarcasm
Complex structure: grammar, idioms, tone, context
Can be spoken, written, or signed
Lexicon
The vocabulary of a language. Lexical analysis examines individual words based on their role in a sentence, their meanings, and how they relate to other words
Non-standard English
Any variety of English that differs from Standard English in grammar, vocabulary, or pronunciation
ex: regional dialects, slang, informal speech
“Ain’t” instead of “isn’t”
“Gonna” instead of “going to”
Neologism
A newly coined word or expression, often created to name new ideas, trends, or technology
Ex: Selfie
Idioms: phrases whose meaning isn't literal but is understood culturally or contextually
examples: “Kick the bucket” = to die
“Break the ice” = to start a conversation
Syntax
Syntax studies the rules governing the arrangement of words in sentences
It examines sentence structure, including sentence types, clause relations, and syntactic variation
Analyzes sentence patterns, word order, and structure
Determines what makes a sentence correct or incorrect
Focuses on grammar rules
Semantics
The study of meaning in language
(what words, phrases, and sentences actually mean)
Focuses on literal meaning (dictionary meaning)
Analyzes word relationships (synonyms, antonyms)
Deals with ambiguity, reference, and truth
Pragmatics
Pragmatics is the study of how language is used in context, especially how meaning depends on the speaker, listener, situation, and intention
Discourse analysis
The study of language in use (how people use language in real communication, beyond individual sentences)
Used in NLP systems
ChatGPT, Siri, Google Assistant (conversation, intent, syntax)
Grammarly (syntax & grammar checking)
Google Translate (syntax + semantics)
Search engines (semantics + discourse understanding)
Machine Learning
Machine Learning is a branch of Artificial Intelligence (AI) where machines learn from data to make decisions or predictions without being explicitly programmed for every task.
Workflow
Collect data (e.g., emails labeled spam/not spam)
Train model on that data
Evaluate the model's performance
Use the trained model to make predictions on new data
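The four workflow steps above can be sketched with a tiny spam filter. This is a minimal illustration, assuming a multinomial Naive Bayes classifier with add-one smoothing; the dataset, the "ham" label for non-spam, and the function names train, predict, and evaluate are all invented for the example.

```python
import math
from collections import Counter, defaultdict

# Step 1: collect labeled data (a tiny, made-up dataset for illustration).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free reward", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("please review the report", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
TEST = [("free prize money", "spam"), ("review the meeting report", "ham")]

def train(examples):
    """Step 2: count words per label (multinomial Naive Bayes)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Step 4: return the label with the highest log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(model, examples):
    """Step 3: fraction of held-out examples classified correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

model = train(TRAIN)
print("accuracy:", evaluate(model, TEST))
print(predict(model, "free reward now"))
```

With only six training sentences the accuracy figure means little; real systems need far more data and a properly separated test set.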
Advantages
ML can handle repetitive tasks without human effort (e.g., spam filtering, data entry)
Models can learn from data and improve their accuracy over time, leading to better predictions and decisions.
Great at analyzing massive datasets that are too complex for humans to process
Machine learning can be used to personalize customer experiences, leading to increased satisfaction and loyalty.
Ex: used in healthcare, finance, education, self-driving cars, and language processing
Disadvantages
ML models often require large amounts of quality data to perform well
Training models (especially deep learning) can require powerful hardware and time.
A model may perform well on training data but poorly on new, unseen data (overfitting)
Some machine learning models, particularly deep learning models, are difficult to interpret. Understanding how they arrive at a decision can be challenging, which raises ethical concerns (the "black box" problem)
N-Gram character model
A sequence of n consecutive items (characters, words) in a text.
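As a minimal sketch of the definition above, the helpers below extract character and word n-grams from a string. The function names char_ngrams and word_ngrams are assumptions made for this example.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive whitespace-separated words."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(char_ngrams("hello", 2))           # ['he', 'el', 'll', 'lo']
print(word_ngrams("the cat sat", 2))     # [('the', 'cat'), ('cat', 'sat')]

# Counting n-grams gives the frequency statistics an n-gram model is built on.
print(Counter(char_ngrams("banana", 2)))
```

An n-gram model estimates how likely each sequence is from counts like these, which is what the applications below rely on.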
Optical character recognition
Can be used to correct errors introduced by OCR by analyzing the likely character sequences given a context.
Scanning, text extraction
Spam
Can be used to identify spam emails or messages by analyzing the frequency of certain character sequences that are common in spam.
Email, SMS
Named entity recognition (NER)
N-grams can help identify and extract named entities from text by analyzing the patterns of characters that tend to occur in entity names.
(Person, Location, Organization, Date, Money)
Search, chatbots, analysis
POS tagging
Can assist in identifying the grammatical role of words (nouns, verbs, adjectives, etc.) within a sentence by looking at the character sequences of the words.
Used in tasks like translation, question answering, and grammar checking
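To show how character sequences can hint at a word's grammatical role, here is a deliberately crude sketch that guesses a tag from the word's ending. The suffix table, the tag names, and the function guess_pos are assumptions for illustration; practical taggers also use the surrounding words and are trained on annotated data.

```python
# Hypothetical suffix rules, checked in order; the first match wins.
SUFFIX_RULES = [
    ("ness", "NOUN"),
    ("tion", "NOUN"),
    ("ment", "NOUN"),
    ("ly", "ADV"),
    ("ing", "VERB"),
    ("ed", "VERB"),
    ("ful", "ADJ"),
    ("ous", "ADJ"),
    ("able", "ADJ"),
]

def guess_pos(word: str) -> str:
    """Guess a part-of-speech tag from the trailing characters of `word`."""
    lowered = word.lower()
    for suffix, tag in SUFFIX_RULES:
        if lowered.endswith(suffix):
            return tag
    return "NOUN"  # fall back to the most common open class

for word in ["quickly", "running", "happiness", "beautiful", "table"]:
    print(word, guess_pos(word))
```

Note that "table" ends in "able" and is wrongly tagged as an adjective, which shows why character patterns alone are not enough.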
Good Progress
Sentiment Analysis
This involves analyzing text to determine the expressed sentiment or opinion, whether it's positive, negative, or neutral. Advancements include improved machine learning models and techniques for handling nuanced sentiment, like aspect-based sentiment analysis
Identify trends in public opinion
Data Collection: Sentiment analysis begins with gathering relevant textual data. This might involve scraping social media platforms like Twitter, Facebook, or Instagram, or collecting data from news articles, blog posts, or online review sites.
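The positive/negative/neutral decision can be sketched with a simple word-list approach. Note that this is a lexicon-based baseline, not the machine learning models mentioned above; the word lists and the function name sentiment are invented for the example.

```python
import re

# Tiny, hand-picked lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def sentiment(text: str) -> str:
    """Label `text` by counting positive and negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great!"))
print(sentiment("Terrible battery and poor support."))
print(sentiment("The package arrived on Tuesday."))
```

A word-count baseline like this misses negation and sarcasm ("not good" still counts as positive), which is one reason the more nuanced techniques are needed.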
Coreference resolution
Coreference resolution involves identifying all words or phrases (like pronouns, noun phrases, or proper nouns) that refer to the same entity within a text.
Word sense disambiguation (WSD)
The same word often has completely different meanings depending on the context in which it occurs. Word sense disambiguation is the task of determining which meaning is intended.
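One classic way to pick a sense from context is to compare the sentence with each sense's dictionary definition and choose the one with the most shared words. The sketch below follows that idea (a simplified form of the Lesk algorithm); the two glosses for "bank", the stopword list, and the function names are assumptions written for this example, not taken from a real dictionary.

```python
import re

# Hypothetical mini-dictionary: sense name -> gloss (definition).
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river edge": "the sloping land beside a river or stream of water",
    }
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "on", "in", "i", "we"}

def content_words(text: str) -> set[str]:
    """Lowercase the text and keep its non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(word: str, sentence: str) -> str:
    """Return the sense whose gloss shares the most words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I went to the bank to deposit money"))
print(simplified_lesk("bank", "We sat on the bank of the river and watched the water"))
```

When no gloss overlaps with the sentence, this sketch simply returns the first listed sense, so it is only as good as its definitions.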
Parsing: grammatical analysis of a sentence, which is often ambiguous