Text and Web Analytics
Text Analytics
- Includes information retrieval, information extraction, data mining and web mining
Text Mining Applications
- Marketing
- Increase cross-selling and up-selling by analyzing call-center data
- Blogs, user reviews of products reveal user sentiments
- Customer relationship management to increase the overall lifetime value of customers
- Security Applications
- Spam filtering
- Deception detection
- Biomedical
- DNA analysis, analysis of gene expression, etc.
- Academic
- Retrieval of information to answer specific queries
Application Areas
- Information extraction
- Identification of key phrases and relationships within text by looking for patterns
- Topic tracking
- Based on a user profile and the documents the user views, text mining can predict other documents of interest to the user by looking for patterns
- Summarization
- To save time for the reader
- Categorization
- Identifying the main themes of a document and placing it into a predefined set of categories
- Clustering
- Grouping similar documents together
- Concept linking
- Connecting related documents by identifying their shared concepts
- Question answering
- Finding the best answer to a given question through knowledge-driven pattern matching
Text Mining Process
Step 1: Establish the corpus
- Collect all relevant unstructured data (e.g. textual documents, web pages)
- Digitize, standardize the collection
- Place the collection in a common place
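Step 2: Create the term-by-document matrix (TDM)
- Goal: create a TDM (rows = terms, columns = documents) whose cells are filled with the most appropriate indices, e.g. raw term frequencies, binary occurrence or TF-IDF weights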
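A minimal Python sketch of Step 2, assuming simple whitespace tokenization (no stemming or stop-word removal) and TF-IDF, computed as tf(t, d) * log(N / df(t)), as the cell index; raw counts or binary occurrence would be equally valid choices. The function name `term_document_matrix` is hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Build a TDM (rows = terms, columns = documents) whose cells hold
    TF-IDF weights: tf(t, d) * log(N / df(t))."""
    # Naive tokenization: lowercase and split on whitespace.
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    # A Counter returns 0 for absent terms, so those cells become 0.0.
    matrix = [[c[t] * math.log(n_docs / df[t]) for c in counts] for t in vocab]
    return vocab, matrix
```

Terms that occur in every document get a weight of 0: that is the intended effect of the IDF factor, since such terms do not help tell documents apart.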
Step 3: Extract patterns/knowledge
- Classification (text categorization)
- Clustering (natural groupings of text)
- Improve search recall & precision
- Association
- Confidence: % of documents including the concepts of A that also include the concepts of C
- Support: % of documents that include both A & C
- Trend Analysis
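For an association rule A => C, support is the share of all documents that contain every concept of both A and C, and confidence is the share of documents containing A that also contain C. A small sketch, assuming each document has already been reduced to a set of concept strings (a hypothetical representation); it returns fractions rather than percentages.

```python
def support_confidence(docs: list[set[str]], a: set[str], c: set[str]) -> tuple[float, float]:
    """Support and confidence of the rule A => C over a document collection."""
    with_a = [d for d in docs if a <= d]              # documents containing all of A
    with_a_and_c = [d for d in with_a if c <= d]      # ... that also contain all of C
    support = len(with_a_and_c) / len(docs)
    confidence = len(with_a_and_c) / len(with_a) if with_a else 0.0
    return support, confidence
```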
Web Mining
- Web is the largest repository of data
- Data is in HTML, XML and text formats
- Web usage mining applications
- Determine the lifetime value of clients
- Design cross-marketing strategies across products.
- Target electronic ads and coupons at user groups based on user access patterns
- Present dynamic information to users based on their interests and profiles
Web Mining Challenges
- The Web is too big for effective data mining
- The Web is too complex
- The Web is too dynamic
- The Web is not specific to a domain
- The Web has everything
Sentiment Analysis
- Sentiment: a settled opinion reflective of one's feelings
- An integral part of CRM and customer experience management systems
- Can be positive or negative, explicit or implicit
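One simple way to detect explicit polarity is a lexicon lookup. The sketch below is a toy, assuming a hypothetical five-word positive and negative lexicon; production systems use much larger curated lexicons or trained classifiers.

```python
import re

# Hypothetical, deliberately tiny polarity lexicon.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def polarity(text: str) -> str:
    """Label text positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Implicit sentiment is where this approach breaks down: "It broke after two days" is clearly negative but contains no lexicon word, so it scores neutral.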
Sentiment Analysis Applications
- Voice of the customer (VOC): gathers data from the full set of customer touch points, e.g. surveys and emails
- VOC is a key element of customer experience management initiatives
- Voice of the market (VOM): understanding aggregate opinions and trends
- VOM is about knowing what stakeholders are saying about your products and services
Web Content Mining
- The extraction of useful information from Webpages (textual content)
- Data collection via web crawlers, e.g. Googlebot
- Used for competitive intelligence, sentiment analysis and automated data collection
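A crawler fetches a page, keeps its content and follows its links. The sketch below covers only the extraction step for one already-downloaded page, using Python's standard `html.parser` module; fetching, politeness rules (robots.txt, rate limits) and resolving relative URLs are left out, and the class name `PageExtractor` is hypothetical.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the visible text and outgoing hyperlinks of one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.links: list[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose content is not page text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # candidate URLs for the crawler to visit next

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())
```

Usage: create a `PageExtractor`, call `feed(html)` and then `close()`, and read `text` and `links`.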
Web Structure Mining
- The extraction of useful information from the links included in Web documents
- Web pages include hyperlinks
- Authoritative pages: pages that many hubs link to
- Hubs: lists of recommended links to authoritative pages
- Hyperlink-induced topic search (HITS) algorithm: scores each page as a hub and as an authority
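The mutual reinforcement behind HITS can be written as a short power iteration: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both score vectors normalized each round. This sketch assumes the link graph is given as a dictionary of outgoing links and skips the query-specific step in which HITS first gathers a focused subgraph of relevant pages.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores for every page in the link graph."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of pages that link to p.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of authority scores of pages that p links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth
```

In a graph where `h1` links to `a1`, `a2` and `a3` and `h2` links to `a1` and `a2`, `h1` ends up the strongest hub and `a1`/`a2` the strongest authorities.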
Web Usage Mining (Web Analytics)
- Extraction of information from clickstream analysis of web server logs generated by Web page visits and transactions
- Data stored in server access logs, referrer logs, agent logs and client-side cookies
- User characteristics and usage profiles
- Metadata, such as page attributes, content attributes and usage data
- Clickstream data
- Clickstream analysis of web server logs
Session File
- Data preparation extracts relevant data from web server logs and creates a session file suitable for data mining
- e.g. Instance 1: P5 -> P4 -> P10 -> P3 -> P15 -> P2 -> P1
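One common layout for a session file, assumed here rather than taken from these notes, gives each session instance one row and each page one 0/1 attribute (1 = visited during the session). The sketch assumes page IDs of the form `P<number>` so they sort numerically; the function name `session_file` is hypothetical.

```python
def session_file(sessions: dict[str, list[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Turn page sequences into one 0/1 row per session instance."""
    # Assumes page IDs like "P10", sorted by their numeric part.
    pages = sorted({p for seq in sessions.values() for p in seq}, key=lambda p: int(p[1:]))
    table = {sid: [int(p in seq) for p in pages] for sid, seq in sessions.items()}
    return pages, table
```

This layout drops the order of visits; sequence-mining techniques would keep the P5 -> P4 -> ... ordering instead.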
1. Business Understanding
- Plausible goals for web-based mining
- Improve the usability of the web site: decrease the average number of pages visited by a customer before a purchase transaction
- Personalise web pages for customers
- Determine products for sale at a web site that tend to be purchased or viewed together
2. Data Understanding
- Fields in a typical Web server log file:
- User’s host address
- Date & Time
- Request
- Status
- Bytes
- Referring Page
- Browser Type
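The fields above match the NCSA/Apache Combined Log Format, assumed here as the log layout (other servers and configurations log different fields or a different order). A minimal regular-expression parser; the browser type appears as the `agent` field, and the name `parse_line` is hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Combined Log Format: host ident user [time] "request" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log entry into named fields; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])  # "-" = no body sent
    return entry
```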
3. Data Preparation
Issues with data preparation
- 1st issue: Differentiate individual user sessions
- Host addresses alone are not sufficient because multiple users may access a site from the same host
- It may be possible to differentiate user sessions by combining host addresses with the referring page
- 2nd issue: Identify unwanted log file entries
- A single user page request often generates multiple log file entries from several types of servers (e.g. image servers)
- Must have a technique to identify unwanted log file entries so that they do not become part of the session file
- 3rd issue: Data transformation
- New attributes can be added to user session records to improve prediction of the outcome
- e.g.
- Average purchase amount of repeat customers
- Time of most recent transaction
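A sketch of the first two issues, assuming entries are already parsed and in time order, and that each entry's `page` and `referrer` have been normalized to the same form (a path such as `/P5`, with `-` for no referrer). The suffix list of unwanted resources and the name `build_sessions` are hypothetical. Sessions are separated with the host-plus-referring-page heuristic described above: a request joins an open session from the same host whose last page is the request's referrer, otherwise it starts a new session.

```python
# Hypothetical resource types treated as unwanted entries: they are fetched
# automatically alongside a page rather than requested by the user.
UNWANTED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def build_sessions(entries: list[dict]) -> list[list[str]]:
    """Group time-ordered log entries (keys: host, page, referrer) into sessions."""
    open_sessions: dict[str, list[list[str]]] = {}
    for e in entries:
        if e["page"].lower().endswith(UNWANTED_SUFFIXES):
            continue  # 2nd issue: drop entries the user did not request directly
        candidates = open_sessions.setdefault(e["host"], [])
        for session in candidates:
            if session[-1] == e["referrer"]:
                session.append(e["page"])  # continues a path from the same host
                break
        else:
            candidates.append([e["page"]])  # 1st issue: treat as a new user session
    return [s for sessions in open_sessions.values() for s in sessions]
```

When two open sessions end on the same page, the request goes to the first one, so this heuristic can still mix up users; practical sessionization often also uses the user-agent string and an inactivity timeout.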
4. Modelling
Modelling techniques for web-based mining:
- Association Rule Mining
- Unsupervised clustering
- Usage profiles for clusters
- Form clusters of similar user session instances to personalise pages viewed by web site users
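A sketch of unsupervised clustering of sessions, assuming each session has been encoded as a 0/1 vector of visited pages. It uses plain k-means (one possible clustering method, not necessarily the one intended in these notes) with centroids initialized from the first k rows for reproducibility, and derives a usage profile per cluster as the pages visited in at least half of its sessions (a hypothetical threshold). Both function names are hypothetical.

```python
def kmeans(rows: list[list[int]], k: int, iterations: int = 20) -> tuple[list[int], list[list[float]]]:
    """Plain k-means; returns a cluster label per row and the final centroids."""
    centroids = [[float(x) for x in r] for r in rows[:k]]  # simplistic initialization
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centroids[c])))
            for r in rows
        ]
        # Update step: each centroid becomes the mean of its member rows.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def usage_profiles(centroids: list[list[float]], pages: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Pages visited in at least `threshold` of each cluster's sessions."""
    return [[p for p, share in zip(pages, c) if share >= threshold] for c in centroids]
```

Each profile lists the pages a cluster's visitors typically view, which is the kind of input page personalization could draw on.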
5. Evaluation
- Interpret the results of a web-based mining session that uses association rule mining and clustering
- Summary statistics of Web site activities complement the interpretation and evaluation of data mining results
- Statistics can be produced by Web server log analyzers, e.g.:
- How often a Web site is visited
- How many times an individual fills a shopping cart but fails to complete the transaction
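Both statistics above can be computed directly from sessionized logs. The sketch assumes sessions are lists of page paths, counts visits as sessions, and treats `/cart` and `/checkout/complete` (hypothetical URLs) as the pages for adding to the cart and completing the purchase.

```python
from collections import Counter

# Hypothetical URLs marking "item added to cart" and "order confirmed".
CART_PAGE = "/cart"
CONFIRM_PAGE = "/checkout/complete"


def site_statistics(sessions: list[list[str]]) -> dict:
    """Summary statistics that complement data mining results."""
    page_visits = Counter(p for s in sessions for p in s)
    with_cart = [s for s in sessions if CART_PAGE in s]
    abandoned = [s for s in with_cart if CONFIRM_PAGE not in s]
    return {
        "sessions": len(sessions),  # how often the site is visited
        "page_visits": page_visits,
        # share of cart-filling sessions that never reach the confirmation page
        "cart_abandonment_rate": len(abandoned) / len(with_cart) if with_cart else 0.0,
    }
```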
6. Deployment
Possible actions to take based on results of web-based mining:
- Targeted Marketing Communications
- Set up online advertising promotions
- Send e-mail to promote products of likely interest to a select group of registered customers
- Webpage Usability Optimisation:
- Adapt the indexing structure of a Web site to better reflect the paths followed by typical users and the changes in needs of users over time
- Personalisation of web pages based on user profiles
- Implement strategy to personalize web pages
- Manual: require users to register at the web site
- Data mining to automate personalisation based on actual user behaviour