AUTOMATIC NEWS ARTICLES CLASSIFICATION IN INDONESIAN LANGUAGE BY USING NAIVE BAYES CLASSIFIER METHOD
Information
DOCUMENT CLASSIFICATION
Method
Classification
Classification process
Preprocessing
Stopword removal
Removes terms that appear frequently but carry little value for information retrieval, filtering out high-frequency, low-information words
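A minimal sketch of stopword removal, assuming the text has already been split into tokens. The `STOPWORDS` set and the `remove_stopwords` function are hypothetical illustrations: the source does not list the stopwords it uses, and a real system would load a full Indonesian stopword list.

```python
# Hypothetical, deliberately tiny Indonesian stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "pada"}


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Keep only the tokens that are not in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


tokens = ["pemerintah", "mengumumkan", "kebijakan", "baru", "untuk", "petani", "di", "desa"]
print(remove_stopwords(tokens))
# ['pemerintah', 'mengumumkan', 'kebijakan', 'baru', 'petani', 'desa']
```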
Word weighting
Each word that remains after the document preprocessing phase is assigned a weight, so that the words that best represent a category can be identified.
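The source does not name its weighting scheme. As an assumption, the sketch below uses plain TF-IDF (term frequency times the log of inverse document frequency), so words that occur in many documents receive lower weights. The `tf_idf` function and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight every term of every document by tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights


docs = [
    ["harga", "beras", "naik"],
    ["harga", "saham", "turun"],
    ["tim", "menang", "laga"],
]
weights = tf_idf(docs)
print(weights[0])
```

Here `harga` appears in two of the three documents, so in the first document it receives a lower weight than `beras`, which appears in only one.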
Case folding
Converts all letters to lowercase to unify word forms, eliminate noise, and reduce the vocabulary size
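A sketch of case folding followed by simple tokenization, assuming that digits and punctuation count as noise and that tokens are split on whitespace. The `case_fold` function is a hypothetical name; some systems keep digits or treat tokenization as a separate step.

```python
import re


def case_fold(text: str) -> list[str]:
    """Lowercase the text, replace runs of non-letters with a space, and split into tokens."""
    text = text.lower()
    # Assumption: digits and punctuation are treated as noise and dropped.
    text = re.sub(r"[^a-z]+", " ", text)
    return text.split()


print(case_fold("Harga BBM Naik, Warga Protes!"))
# ['harga', 'bbm', 'naik', 'warga', 'protes']
```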
Stemming
Process of cutting or removing the affixes of a word (prefixes, suffixes, infixes, and confixes)
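A deliberately simplified Indonesian affix stripper, included only to illustrate the idea: it removes particles, then possessive pronouns, then derivational suffixes, then one prefix. The affix lists, the `MIN_ROOT` length limit, and all function names are hypothetical, and this is not the stemmer used in the source. It ignores infixes, does not apply the sound changes of prefixes such as `me-`, and does not check a root-word dictionary.

```python
# Hypothetical, greatly simplified affix lists (checked in this order).
PARTICLES = ("lah", "kah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
SUFFIXES = ("kan", "an", "i")
PREFIXES = ("meng", "mem", "men", "me", "di", "ke", "ber", "ter", "se")
MIN_ROOT = 3  # never leave a remainder shorter than this


def strip_suffix(word: str, suffixes: tuple[str, ...]) -> str:
    """Remove the first listed suffix the word ends with, if enough of the word remains."""
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
            return word[: -len(s)]
    return word


def strip_prefix(word: str) -> str:
    """Remove the first listed prefix the word starts with, if enough of the word remains."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_ROOT:
            return word[len(p):]
    return word


def stem(word: str) -> str:
    """Strip particle, possessive, derivational suffix, then one prefix."""
    word = strip_suffix(word, PARTICLES)
    word = strip_suffix(word, POSSESSIVES)
    word = strip_suffix(word, SUFFIXES)
    return strip_prefix(word)


for w in ["bermainlah", "bukunya", "ditemukan", "makanan"]:
    print(w, "->", stem(w))
```

Without a dictionary check the stripper over-cuts some words: `dimakan` (root `makan`) becomes `dima`, because `-kan` is removed before the prefix `di-` is considered. Dictionary-based Indonesian stemmers avoid this by checking whether each intermediate result is a known root word.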
Naive Bayes
Reported [6] to perform better than other representative methods such as K-means, clustering, and cosine similarity
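To make the method concrete, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, computed with log-probabilities. The source does not specify its Naive Bayes variant or its smoothing, so both are assumptions, as are the function names and the tiny labeled sample.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Estimate log P(c) and add-one-smoothed log P(w|c) from labeled token lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # category -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return priors, likelihoods


def classify_nb(tokens, priors, likelihoods):
    """Pick the category maximising log P(c) + sum of log P(w|c); unseen words are skipped."""
    scores = {
        c: priors[c] + sum(likelihoods[c][w] for w in tokens if w in likelihoods[c])
        for c in priors
    }
    return max(scores, key=scores.get)


docs = [
    ["harga", "saham", "naik"],
    ["bursa", "saham", "turun"],
    ["tim", "menang", "laga"],
    ["pemain", "cetak", "gol"],
]
labels = ["ekonomi", "ekonomi", "olahraga", "olahraga"]
priors, likelihoods = train_nb(docs, labels)
print(classify_nb(["saham", "turun"], priors, likelihoods))  # ekonomi
```

Summing log-probabilities instead of multiplying raw probabilities keeps the score from underflowing to zero on long articles.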
Variants
Supervised
Document classification that learns from a set of labeled training (learning) documents
Unsupervised
Method applied without training data or supervision, used to analyze the structure of the data and the relationships among data items
Phase
Learning phase
The learning phase uses modules similar to those of the classification phase. The only difference is that it does not run the classification module; instead, it only produces documents consisting of the words that characterize each category
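The learning phase can be sketched as aggregating word counts per category over the preprocessed training documents and keeping the most frequent words as that category's characteristic word list. The `build_category_profiles` function, the `top_k` cut-off, and ranking by raw frequency are assumptions for illustration.

```python
from collections import Counter


def build_category_profiles(training, top_k=5):
    """Aggregate word counts per category and keep the top_k most frequent words.

    training: list of (category, tokens) pairs whose tokens are already preprocessed.
    top_k: hypothetical cut-off on the number of characteristic words per category.
    """
    counts = {}
    for category, tokens in training:
        counts.setdefault(category, Counter()).update(tokens)
    return {c: [w for w, _ in cnt.most_common(top_k)] for c, cnt in counts.items()}


training = [
    ("ekonomi", ["harga", "saham", "naik", "saham"]),
    ("ekonomi", ["bursa", "saham", "turun"]),
    ("olahraga", ["tim", "menang", "gol"]),
    ("olahraga", ["gol", "pemain", "cetak", "gol"]),
]
profiles = build_category_profiles(training, top_k=2)
print(profiles)
```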
Classification phase
Process
Document preprocessing, starting with reading the sentences of the document text
After the document preprocessing phases, the words are ranked by the weighting process.
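A sketch of the classification-phase front end: split the text into sentences, case-fold and tokenize each sentence, drop stopwords, and rank the remaining words by weight. Raw frequency stands in for the weight here and stemming is left out for brevity; both choices, the tiny stopword list, and the `rank_words` name are assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "ini", "itu"}


def rank_words(text: str) -> list[tuple[str, int]]:
    """Read sentence by sentence, case-fold, drop stopwords, rank words by frequency."""
    tokens = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.sub(r"[^a-z]+", " ", sentence.lower()).split()
        tokens.extend(w for w in words if w not in STOPWORDS)
    # Counter.most_common orders ties by first occurrence.
    return Counter(tokens).most_common()


text = "Harga beras naik. Warga mengeluhkan harga beras yang tinggi di pasar."
print(rank_words(text)[:2])  # [('harga', 2), ('beras', 2)]
```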
The position of words plays an important role in this classification method, so stemming (affix removal) is needed. Stemming is used to find the base form of affixed words
Another factor affecting word position is the set of learning (training) documents, because the method used to build the application is supervised document classification, i.e., a classification method that uses a training phase
Automatic news classification, i.e., assigning news articles to categories, is needed to analyze news effectively and efficiently
Result
The system classifies news articles with an average recall of 92.87% and precision of 91.16%.
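Recall and precision are computed per category and then averaged. The source does not say whether its averages are macro- or micro-averaged; the sketch below assumes a macro average (the unweighted mean over categories). The `macro_precision_recall` function and the sample labels are hypothetical and do not reproduce the reported figures.

```python
def macro_precision_recall(y_true, y_pred):
    """Per category: precision = TP / (TP + FP), recall = TP / (TP + FN); then average."""
    categories = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in categories:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(categories), sum(recalls) / len(categories)


y_true = ["ekonomi", "ekonomi", "olahraga", "olahraga"]
y_pred = ["ekonomi", "olahraga", "olahraga", "olahraga"]
p, r = macro_precision_recall(y_true, y_pred)
print(round(p, 4), round(r, 4))  # 0.8333 0.75
```

In this toy example, `ekonomi` has precision 1 and recall 0.5, while `olahraga` has precision 2/3 and recall 1, giving the macro averages above.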