WebM
Text Mining
A semi-automated process of extracting knowledge from unstructured data sources
a.k.a. text data mining or knowledge discovery in textual databases
Applications
Marketing
->Increase cross-selling and upselling by analysing call-center data
->Blogs, user reviews of products reveal user sentiments
->CRM to increase the overall lifetime value of the customer
Application area
->Information extraction: identification of key phrases and relationships within text by looking for patterns
->Topic tracking: Based on a user profile and documents, text mining can predict other documents of interest to the user
->Categorization: identifying the main themes of a document and placing it into a predefined set of categories
->Question answering: finding best answer to a given question through knowledge-driven pattern matching
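Information extraction by "looking for patterns" can be illustrated with a small sketch. The regular expression, the sample sentence and the company names below are hypothetical and only show the idea; real systems rely on much richer linguistic patterns.

```python
import re

# Hypothetical pattern: "<Capitalised name> acquired <Capitalised name>".
NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"
ACQUISITION = re.compile(rf"\b({NAME}) acquired ({NAME})")

# Made-up sample text, not taken from any real source.
text = "Last year Acme Corp acquired Widget Works. Analysts expect more deals."

# Each match becomes a (relationship, subject, object) triple.
relations = [("acquired", buyer, target)
             for buyer, target in ACQUISITION.findall(text)]
print(relations)
```

The extracted triples are the "key phrases and relationships" that later steps can store or query.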
information retrieval, information extraction, data mining and web mining
KDD FOR WEB MINING
1. Business Understanding
Plausible goals for web-based mining, e.g.:
->Improve the usability of the web site
->Determine products for sale on the website that tend to be purchased or viewed together
->Personalise web pages for customers
2. Data Understanding and Preparation for web-based mining
Sources (logs, flat files, operational DB) -> Data Preparation -> Session File -> Data Mining algorithm -> Learner Model
A log file consists of the referring page, user's host address, date and time, request type, status of request, bytes, and browser type/agent
3. Data Preparation
Extract relevant data from web server logs and create a session file suitable for data mining
4. Modeling
->Associative rule mining
->Unsupervised clustering (form clusters of similar user session instances to build usage profiles and personalise the pages viewed by web site users)
5. Evaluation
Interpret the results of the web-based mining session using associative rule mining and clustering
->Associative rules generated: interpret using confidence and support
->Clustering technique: use summary statistics of website activities to interpret the clusters formed
6. Deployment
Possible actions to take based on the results of web-based mining:
->Targeted marketing communications
->Web page usability optimisation
->Personalisation of web pages based on user profiles
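Support and confidence, the two measures used to interpret association rules in the evaluation step, can be sketched as follows. The sessions and page names are invented for illustration, and the helper functions `support` and `confidence` are hypothetical names, not part of any library.

```python
# Hypothetical session file: each entry is the set of pages viewed in one session.
sessions = [
    {"home", "laptops", "laptop-bags"},
    {"home", "laptops"},
    {"home", "laptops", "laptop-bags"},
    {"home", "phones"},
]

def support(itemset, sessions):
    """Fraction of sessions that contain every page in itemset."""
    itemset = set(itemset)
    return sum(itemset <= s for s in sessions) / len(sessions)

def confidence(antecedent, consequent, sessions):
    """Share of sessions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, sessions) / support(antecedent, sessions)

# Rule under evaluation: laptops -> laptop-bags
rule_support = support({"laptops", "laptop-bags"}, sessions)
rule_confidence = confidence({"laptops"}, {"laptop-bags"}, sessions)
print(rule_support, rule_confidence)
```

A rule with high support and high confidence suggests pages or products that tend to be viewed together, which is the kind of result the deployment step can act on.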
Differentiate individual user sessions (the host address is not useful because multiple users may access the site from the same host; cookies are the solution)
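One way to build a session file from cookie-tagged log records is sketched below. The record layout, the `build_sessions` helper and the 30-minute inactivity timeout are assumptions made for this example; the notes above do not specify how a session ends.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not from the notes

# Hypothetical, already-parsed log records: (cookie_id, timestamp, requested_page)
records = [
    ("c1", datetime(2024, 1, 1, 9, 0), "/home"),
    ("c2", datetime(2024, 1, 1, 9, 1), "/home"),
    ("c1", datetime(2024, 1, 1, 9, 5), "/laptops"),
    ("c1", datetime(2024, 1, 1, 11, 0), "/home"),
]

def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group page requests into sessions keyed by cookie, split on inactivity."""
    sessions = []      # list of (cookie_id, [pages]) in order of first request
    open_session = {}  # cookie_id -> (time of last request, pages of open session)
    for cookie_id, ts, page in sorted(records, key=lambda r: r[1]):
        current = open_session.get(cookie_id)
        if current is None or ts - current[0] > timeout:
            pages = []  # start a new session for this cookie
            sessions.append((cookie_id, pages))
        else:
            pages = current[1]  # continue the open session
        pages.append(page)
        open_session[cookie_id] = (ts, pages)
    return sessions

session_file = build_sessions(records)
for cookie_id, pages in session_file:
    print(cookie_id, pages)
```

Keying on the cookie rather than the host address keeps two users behind the same host in separate sessions.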
Text analytics
Text mining process
-
->Collect all the relevant unstructured data (text, HTML, XML, etc.)
-
-
-
-
-
(Output of this stage is a flat file called the term-document matrix, where the cells are populated with the term frequencies or other appropriate indices)
(E.g. the term "investment risk" appears once in Document 1 and twice in Document 2, so its cells are populated with 1 and 2, 3 occurrences in total)
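A term-document matrix with raw term frequencies can be sketched as below. The document texts and the term list are made up to mirror the "investment risk" example, and plain substring counting stands in for the tokenisation and stemming a real pipeline would apply.

```python
# Hypothetical term list and documents, invented for illustration.
terms = ["investment risk", "project management"]
documents = {
    "Document 1": "Investment risk grows with leverage.",
    "Document 2": "Investment risk is hard to price, and investment risk changes over time.",
}

def term_document_matrix(documents, terms):
    """Return {document: {term: frequency}} using case-insensitive substring counts."""
    return {
        name: {term: text.lower().count(term) for term in terms}
        for name, text in documents.items()
    }

matrix = term_document_matrix(documents, terms)
for name, row in matrix.items():
    print(name, row)

# Total occurrences of one term across all documents.
total_investment_risk = sum(row["investment risk"] for row in matrix.values())
print(total_investment_risk)
```

Each row is one document and each column one term; the raw counts could be replaced by other indices, such as weighted frequencies, without changing the layout.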