Week 8: Text and Web Analytics
Data Mining versus Text Mining
Data Mining: Structured data in databases
Text Mining: Unstructured data, e.g. Word documents, PDF files, text excerpts, XML files, and so on
Similarities
Seek novel and useful patterns
Semi-automated process
Differences: Nature of data
Text Mining Applications
Security Applications
Biomedical
Marketing
Academic
Text Mining Application Areas
Summarization
Categorization
Topic tracking
Clustering
Information extraction
Concept linking
Question answering
Text Mining Terminology
Concepts
Stemming
Terms
Stop words (and include words)
Corpus (and corpora)
Synonyms (and polysemes)
Unstructured or semistructured data
Tokenizing
Term dictionary
Word frequency
Part-of-speech tagging
Morphology
Term-by-document matrix
Singular value decomposition
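Several of the terms above (tokenizing, stop words, stemming, word frequency) can be illustrated with a minimal sketch using only the Python standard library. The stop-word list and the `crude_stem` helper are hypothetical stand-ins invented for this example; a real project would normally rely on a full stop-word list and an established stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def tokenize(text):
    """Tokenizing: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Stemming (toy version): strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def term_frequencies(text):
    """Word frequency: count stemmed tokens after dropping stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(crude_stem(t) for t in tokens)

freqs = term_frequencies("The miner mined mining data in the mines")
print(freqs)
```

Here "mined" and "mining" both collapse to the stem "min", while "mines" becomes "mine" and "miner" is left unchanged. That inconsistency is typical of naive suffix stripping and is one reason dedicated stemming algorithms exist.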
Natural Language Processing (NLP)
A very important component in text mining
A subfield of artificial intelligence and computational linguistics
Structuring of a collection of text
The study of understanding the natural human language
Considers grammatical and semantic constraints as well as context
Challenges
Text segmentation
Syntactic ambiguities
Part-of-speech tagging
Imperfect or irregular input
Speech acts and semantic analysis
WordNet
Sentiment Analysis
Needs automation to be completed
Text Mining Process
Step 1: Establish the corpus
Step 2: Create the Term-by-Document Matrix (TDM)
Step 3: Extract patterns/knowledge
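As a rough illustration of the first two steps of the text mining process, the sketch below establishes a tiny corpus and builds a term-by-document matrix from raw counts. The two documents and the `build_tdm` helper are made up for this example, and no stop-word removal, stemming, or weighting is applied, which a real pipeline would likely add.

```python
import re

def build_tdm(corpus):
    """Return (terms, matrix), where matrix[i][j] is the count of terms[i] in document j."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in corpus]
    # The term dictionary: every distinct token in the corpus, sorted alphabetically.
    terms = sorted({token for doc in tokenized for token in doc})
    # One row per term, one column per document.
    matrix = [[doc.count(term) for doc in tokenized] for term in terms]
    return terms, matrix

# Step 1: establish the corpus (hypothetical documents).
corpus = [
    "text mining finds patterns in text",
    "web mining finds patterns in web data",
]

# Step 2: create the term-by-document matrix.
terms, tdm = build_tdm(corpus)

for term, row in zip(terms, tdm):
    print(f"{term:10s} {row}")
```

Rows such as "text" and "web" differ between the two documents, while "mining" and "patterns" do not. Step 3 would typically work on this matrix, for example by clustering or classifying its columns.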
Web Mining
Process of discovering intrinsic relationships from Web data (textual, linkage, or usage)
KDD for Web Mining
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation
Step 4: Modelling
Step 5: Evaluation
Step 6: Deployment
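To make the "linkage" kind of Web data concrete, the following sketch extracts hyperlinks from a few pages and counts how many links point at each page. The page contents and the `LinkExtractor` class are hypothetical, and counting inbound links is only one simple option, not a method prescribed by these notes.

```python
from collections import Counter
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links

# Hypothetical site: page path -> HTML body.
pages = {
    "/home": '<a href="/products">Products</a> <a href="/about">About</a>',
    "/products": '<a href="/home">Home</a> <a href="/about">About</a>',
    "/about": '<a href="/home">Home</a>',
}

# Count inbound links per target page across the whole site.
in_links = Counter(
    target for html in pages.values() for target in extract_links(html)
)
print(in_links.most_common())
```

Pages with many inbound links are often treated as more central, which is the intuition behind link-based ranking approaches.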