Text and Web Analytics (Text Mining Application Area ( Information…
Text and Web Analytics
Text Analytics
- Includes information retrieval, information extraction, data mining, and web mining
Web Mining
- The process of discovering intrinsic relationships from web data (textual, linkage, or usage)
- Data is in HTML, XML, or text format
- Web content mining: the extraction of useful information from Web pages (textual content)
- Data collection via web crawlers, e.g., Googlebot
- Used for competitive intelligence, information/news/opinion collection, sentiment analysis, and automated data collection
- Web structure mining: the development of useful information from the links included in Web documents
- Web pages include hyperlinks
- Authoritative pages
- Hubs : List of recommended links to authoritative pages
- Hyperlink-induced topic search (HITS) algorithm
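The hub and authority idea behind HITS can be sketched as a short iterative computation. This is only a minimal illustration: the link graph `links`, the page names `A` to `D`, and the fixed iteration count are assumptions made for the example, not part of the notes.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a {page: [linked pages]} graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages that link to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical graph: A and B act as hubs pointing at C and D.
links = {"A": ["C", "D"], "B": ["C"], "C": [], "D": []}
hub, auth = hits(links)
```

In this toy graph, `C` ends up with the highest authority score because both hubs link to it, and `A` is the stronger hub because it links to both authorities.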
- Web usage mining: the extraction of useful information from data generated through Web page visits and transactions (clickstream analysis of web server logs)
- Data stored in server access logs, referrer logs, agent logs, and client-side cookies
- User characteristics and usage profiles
- Metadata, such as page attributes, content attributes, and usage data
KDD for Web Mining
Typical Web Server Log File
- User’s host address
- Date & Time
- Request
- Status
- Bytes
- Referring Page
- Browser Type
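The fields listed above match the widely used Apache "combined" log format, so one line of such a file could be split into those fields roughly as follows. The regular expression, the function name `parse_log_line`, and the sample line are assumptions for illustration; a real server may be configured with a different format.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no response body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

# Made-up sample line using documentation-reserved addresses and domains.
sample = ('192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /products/index.html HTTP/1.1" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/5.0"')
entry = parse_log_line(sample)
```

Returning `None` for lines that do not match lets the caller count or skip malformed entries instead of stopping on the first one.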
- Usage profiles for clusters
Summary statistics of Web site activities complement the interpretation and evaluation of data mining results.
Statistics can be produced by Web server log analyzers (e.g., AWStats, Webalizer).
- How often a Web site is visited
- How often individuals fill a shopping cart but fail to complete the transaction
- Which products on the Web site are the best and worst sellers
- Targeted marketing communications
- Webpage usability optimization: adapt the indexing structure of a Web site to better reflect the paths followed by typical users and the changes in user needs over time
- Personalisation of web pages based on user profiles
Text Mining Applications
- Marketing
- Increase cross-selling and up-selling by analyzing call-center data
- Blogs, user reviews of products reveal user sentiments
- Customer relationship management to increase the overall lifetime value of the customer
- Security Applications
- Spam filtering
- Deception detection
- Biomedical
- DNA analysis, analysis of gene expression, etc.
- Academic
- Retrieval of information to answer specific queries
Text Mining terminology
- Unstructured or semistructured data
- Singular value decomposition
WordNet
A laboriously hand-coded database of English words, their definitions, sets of synonyms, and various semantic relations between synonym sets
Sentiment Analysis
- Sentiment may be positive or negative, explicit or implicit
- A lexicon (a catalog of words, their synonyms, and their meanings), e.g., WordNet, is created
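A lexicon-based polarity check might look like the sketch below. The tiny `LEXICON`, its word weights, and the `NEGATIONS` set are hypothetical placeholders; a real lexicon would be far larger and could be built from a resource such as WordNet. The negation handling (a negation word flips the next sentiment word) is a simplifying assumption.

```python
import re

# Hypothetical mini-lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
NEGATIONS = {"not", "never", "no"}

def polarity(text):
    """Classify text as 'positive', 'negative', or 'neutral' by lexicon lookup."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            # Remember the negation until the next sentiment word appears.
            negate = True
            continue
        value = LEXICON.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

This only captures explicit sentiment carried by individual words; implicit sentiment (for example, a factual statement that implies dissatisfaction) would need more than a word lookup.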
- Text mining and data mining both seek novel and useful patterns
- Both are semi-automated processes
Data preparation extracts relevant data from web server logs and creates a session file suitable for data mining
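One common way to build such a session file is to group requests by visitor and start a new session after a period of inactivity. The sketch below assumes that the host address identifies the visitor and that a 30-minute gap ends a session; both are illustrative heuristics, and the function name `build_sessions` and the sample requests are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def build_sessions(requests, timeout=timedelta(minutes=30)):
    """Group (host, timestamp, url) tuples into a list of (host, [urls]) sessions."""
    by_host = defaultdict(list)
    for host, ts, url in requests:
        by_host[host].append((ts, url))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()  # order each visitor's requests by time
        current, last_ts = [], None
        for ts, url in hits:
            # A gap longer than the timeout closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((host, current))
                current = []
            current.append(url)
            last_ts = ts
        if current:
            sessions.append((host, current))
    return sessions

# Made-up requests: the first host returns after a two-hour gap.
requests = [
    ("192.0.2.10", datetime(2023, 10, 10, 13, 0), "/home"),
    ("192.0.2.10", datetime(2023, 10, 10, 13, 5), "/cart"),
    ("192.0.2.10", datetime(2023, 10, 10, 15, 0), "/home"),
    ("192.0.2.20", datetime(2023, 10, 10, 13, 2), "/home"),
]
sessions = build_sessions(requests)
```

Each resulting session is an ordered list of pages, which is the kind of input that clustering or path analysis of usage profiles would then work on.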