Text and Web Analytics
Text Mining
Concepts
Text Analytics: Includes information retrieval, information extraction, data mining and web mining
Challenges
- Text contains acronyms, abbreviations, misspellings
eg. customer, cust, customar, csmr
- Imperfect or irregular input
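The noisy-token challenge above can be illustrated with a minimal sketch that maps abbreviations and misspellings back to one canonical term. The vocabulary, the abbreviation table and the `normalize` helper are hypothetical names chosen for illustration; only the standard-library `difflib` module is used.

```python
import difflib

# Hypothetical canonical vocabulary and abbreviation table (assumptions).
VOCABULARY = ["customer", "order", "payment"]
ABBREVIATIONS = {"cust": "customer", "csmr": "customer"}

def normalize(token, cutoff=0.8):
    """Map a noisy token to a canonical term, or return it unchanged."""
    token = token.lower()
    # Abbreviations are too short for fuzzy matching, so look them up first.
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    # Misspellings are matched to the closest vocabulary entry, if close enough.
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token

print([normalize(t) for t in ["customer", "cust", "customar", "csmr"]])
```

In practice the abbreviation table would have to be built per domain; fuzzy matching alone cannot recover something as short as "csmr".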
Application area
topic tracking
Based on a user profile and the documents a user has viewed, text mining can predict other documents of interest to that user
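One possible way to realise topic tracking is to compare a user profile with candidate documents using bag-of-words cosine similarity. This is only a sketch under that assumption; the notes do not say which technique is used, and `vectorize`, `cosine` and `rank_documents` are hypothetical helper names.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rank_documents(profile_text, documents):
    """Return document names sharing terms with the profile, best match first."""
    profile = vectorize(profile_text)
    scored = [(cosine(profile, vectorize(text)), name)
              for name, text in documents.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

documents = {
    "a": "sentiment analysis of customer reviews",
    "b": "football match results",
    "c": "text mining and web mining",
}
print(rank_documents("text mining sentiment analysis", documents))
```

A real system would likely add stop-word removal and TF-IDF weighting; raw term counts are used here to keep the sketch short.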
Sentiment Analysis
Overview
- Sentiment: a settled opinion that reflects one's feelings
- Integral part of CRM and customer experience management systems
- Positive or negative, explicit or implicit
- A sentiment lexicon is created
- Answers the question: what do people feel about certain topics?
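A lexicon-based scorer can be sketched as follows. The toy lexicon, the negator list and the function names are assumptions made for illustration. The sketch handles only explicit sentiment words and flips polarity only for the word immediately after a negator.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "great": 1, "helpful": 1, "bad": -1, "slow": -1, "rude": -1}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum word polarities, flipping the word that directly follows a negator."""
    score = 0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if word in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0)
        score += -polarity if negate else polarity
        negate = False
    return score

def label(text):
    """Map the numeric score to positive, negative or neutral."""
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(label("The support team was great and helpful."))
print(label("Delivery was slow and the agent was rude"))
```

Implicit sentiment (for example sarcasm, or facts that imply dissatisfaction) is out of reach for a plain lexicon lookup like this one.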
Application
- Voice of the customer (VOC): gets data from the full set of customer touch points, eg. emails, surveys, call centre notes, social media postings
- Voice of the market (VOM): Understanding aggregate opinions and trends
Web Mining
Overview
- It is the process of discovering intrinsic relationships from web data
- Largest repository of data
- Data is in HTML, XML & text format
Challenges
- Too big, complex & dynamic
- Not specific to a domain
Web Structure Mining
- Authoritative pages
- Hubs: List of recommended links to authoritative pages
- Hyperlink-induced topic search (HITS) algorithm
- Source: the uniform resource locator (URL) links contained in web pages
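The hub and authority idea behind HITS can be sketched on a tiny link graph. The page names and the `hits` function are hypothetical, and this is a simplified iteration with a fixed number of rounds rather than a production implementation.

```python
import math

def hits(graph, iterations=50):
    """Return (hub, authority) scores for a dict of page -> list of linked pages."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for target in targets:
                auth[target] += hub[src]
        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in graph.get(p, ())) for p in pages}
        # Normalise both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: "portal" and "blog" link out, "docs" and "news" are linked to.
graph = {"portal": ["docs", "news"], "blog": ["docs"], "docs": [], "news": []}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Here "portal" acts as the hub (it recommends several pages) and "docs" as the authoritative page (it is recommended most).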
KDD
- Understand business goals
- Plausible goals for web-based mining: improve the usability of the website, eg. decrease the average number of pages visited by a customer before a purchase transaction
- Data understanding and preparation
Issue
- Differentiate individual user sessions
- host addresses are not useful for this (because multiple users can access a site from the same host)
- need to be able to tell different user sessions apart
- Identify unwanted log file entries
- a single user page request usually generates multiple log file entries from several types of servers
- need a technique to identify unwanted log file entries
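Both issues can be sketched in a few lines: drop log entries for embedded resources, then split the remaining requests into sessions. The heuristics are assumptions, not something the notes prescribe: a session is keyed on host plus user agent, a 30-minute gap starts a new session, and requests for image, style and script files count as unwanted. The entry layout and the `build_sessions` name are also hypothetical.

```python
# Assumed heuristics: file suffixes treated as unwanted, and an inactivity timeout.
UNWANTED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = 30 * 60  # seconds

def build_sessions(entries):
    """Group (host, user_agent, timestamp, path) entries into lists of page paths."""
    sessions = []
    open_sessions = {}  # (host, agent) -> (time of last request, session index)
    for host, agent, ts, path in sorted(entries, key=lambda e: e[2]):
        # Skip entries generated by embedded resources rather than page views.
        if path.lower().endswith(UNWANTED_SUFFIXES):
            continue
        key = (host, agent)
        last = open_sessions.get(key)
        if last is None or ts - last[0] > SESSION_TIMEOUT:
            sessions.append([path])  # start a new session
            index = len(sessions) - 1
        else:
            index = last[1]
            sessions[index].append(path)
        open_sessions[key] = (ts, index)
    return sessions

entries = [
    ("10.0.0.1", "Firefox", 0, "/home"),
    ("10.0.0.1", "Firefox", 5, "/logo.png"),
    ("10.0.0.1", "Chrome", 10, "/home"),
    ("10.0.0.1", "Firefox", 60, "/product"),
    ("10.0.0.1", "Chrome", 90, "/cart"),
    ("10.0.0.1", "Firefox", 4000, "/home"),
]
print(build_sessions(entries))
```

The two browsers share one host address but end up in separate sessions, which is the point made above about host addresses.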
Session file
Data preparation extracts relevant data from web server logs and creates a session file suitable for data mining
- Interpret results of a web-based mining session using association rule mining and clustering
- Summary statistics of website activities complement the interpretation and evaluation of data mining results
- Statistics can be produced by web server log analysers (eg. awstats, webalizer)
- Targeted Marketing Communications
- set up online advertising promotions
- send emails to promote products of likely interest to a selected group of registered customers
- group products that are likely to be purchased together
- Webpage Usability Optimization
- adapt the indexing structure of a website (to better reflect the paths followed by typical users)
- Personalisation of web pages based on user profile
- implement strategy to personalize website
- force user to register at website
- data mining to automate personalization based on actual user behaviour
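Association rule mining over a session file, for example to find products that are likely to be purchased together, can be sketched with pairwise support and confidence. The thresholds, the sample sessions and the `pair_rules` function are assumptions for illustration. A full algorithm such as Apriori also handles itemsets larger than two.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = sorted(set(session))  # count each item once per session
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical session file: one list of purchased products per session.
sessions = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["laptop"],
    ["mouse", "bag"],
]
for lhs, rhs, support, confidence in pair_rules(sessions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as bag -> mouse with high confidence is the kind of result that could feed product grouping or targeted promotions.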
Web Usage Mining
- Extraction of information from clickstream analysis of web server logs generated through web page visits and transactions
- Source: the detailed description of a web site's visits (sequence of clicks by session)
Web Content Mining
- Extraction of useful information from web pages
- Collect data using web crawlers
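The extraction step that a web crawler performs on each fetched page can be sketched with the standard-library `html.parser` module. The sample HTML, the `PageExtractor` class and the `extract` helper are hypothetical, and fetching pages over the network is deliberately left out.

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collect hyperlinks and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # links a crawler could follow next

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

def extract(html):
    """Return (links, text) for an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)

page = ('<html><body><h1>Offers</h1><p>See our <a href="/laptops">laptops</a>.</p>'
        '<script>var x = 1;</script></body></html>')
links, text = extract(page)
print(links)
print(text)
```

The extracted text can then feed text mining steps, while the extracted links give the crawler its next pages to visit.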