Naïve Bayes Machine Learning Classification with R Programming: A Case Study of Binary Data Sets
Abstract
This analytical review paper explains, using R programming, naïve Bayes machine learning techniques for simple probabilistic classification based on Bayes' theorem with the assumption of independence between the features. It remains largely unclear which algorithm is suitable for data analysis when the research data contain large categorical variables whose values are to be predicted. To implement naïve Bayes classification, the model is trained on the training data set and then makes predictions on the test data set. What distinguishes the technique is that it takes in new information and tries to make a better forecast by considering the new evidence, particularly when the input variables are largely categorical. This is quite similar to how the human mind selects a proper judgement from various alternative choices, and comparable to what the neuronal network of the human brain does. Here the researcher uses the binary.csv educational data set of 400 observations of four attributes. admit is the dependent variable, and gre score, gpa and rank of the previous grade are the predictors that ultimately determine whether a student will be admitted to the next program. Initially, the gre and gpa variables show an association of 0.36 in relation to the categorical rank variable. Box plots and density plots show that the data overlap between the admitted and not-admitted groups. The naïve Bayes classification model classifies the data into proportions of 0.68 not admitted and 0.31 admitted. The confusion matrix of the predictions gives an accuracy of 0.68, reported with a 95 percent confidence interval. Similarly, the training accuracy increases from 29 percent to 32 percent when the naïve Bayes model is fitted with usekernel = TRUE, which ultimately decreases misclassification errors in the binary data set.
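The workflow summarized above (class priors, class-conditional likelihoods combined under the independence assumption, a train/test split, a confusion matrix with a 95 percent confidence interval for accuracy, and the usekernel switch) can be sketched in base R. This is a minimal illustration under stated assumptions, not the author's code: it uses synthetic data that only mimics the column names of binary.csv (admit, gre, gpa, rank), with an admission rate of about 0.32 chosen to resemble the reported class split; fit_nb() and predict_nb() are hypothetical helper names. The usekernel argument imitates the parameter of the same name in klaR::NaiveBayes() and naivebayes::naive_bayes() by replacing the Gaussian density for numeric predictors with a kernel density estimate from density(). The accuracy interval is an exact binomial interval from binom.test().

```r
# Minimal naive Bayes sketch in base R (no add-on packages).
# ASSUMPTION: synthetic data that only mimics the columns of binary.csv.
set.seed(42)

n <- 400
y01 <- rbinom(n, 1, 0.32)
sim <- data.frame(
  admit = factor(y01, levels = c(0, 1)),
  gre   = round(rnorm(n, mean = 580 + 40 * y01, sd = 110)),
  gpa   = rnorm(n, mean = 3.35 + 0.15 * y01, sd = 0.38),
  rank  = factor(sample(1:4, n, replace = TRUE), levels = 1:4)
)

# 80/20 split: fit on the training set, evaluate on the held-out test set
idx   <- sample(seq_len(n), size = 0.8 * n)
train <- sim[idx, ]
test  <- sim[-idx, ]

# Hypothetical helper: estimates P(class) and, per class, one log-likelihood
# function per predictor (predictors treated as independent given the class).
fit_nb <- function(data, y = "admit", usekernel = FALSE) {
  classes <- levels(data[[y]])
  priors  <- as.numeric(table(data[[y]])) / nrow(data)
  names(priors) <- classes
  preds <- setdiff(names(data), y)
  lik <- lapply(classes, function(cl) {
    sub <- data[data[[y]] == cl, preds, drop = FALSE]
    setNames(lapply(preds, function(p) {
      x <- sub[[p]]
      if (is.factor(x)) {
        # Categorical: class-conditional frequencies with Laplace smoothing (+1)
        counts <- as.numeric(table(x)) + 1
        names(counts) <- levels(x)
        probs <- counts / sum(counts)
        function(v) log(as.numeric(probs[as.character(v)]))
      } else if (usekernel) {
        # Numeric: kernel density estimate (what usekernel = TRUE imitates)
        d <- density(x)
        function(v) log(pmax(approx(d$x, d$y, xout = v, rule = 2)$y, 1e-12))
      } else {
        # Numeric: Gaussian density with the class mean and sd
        m <- mean(x)
        s <- sd(x)
        function(v) dnorm(v, mean = m, sd = s, log = TRUE)
      }
    }), preds)
  })
  names(lik) <- classes
  list(classes = classes, priors = priors, preds = preds, lik = lik)
}

# Picks the class with the largest log P(class) + sum_j log P(x_j | class)
predict_nb <- function(model, newdata) {
  scores <- sapply(model$classes, function(cl) {
    s <- rep(log(model$priors[[cl]]), nrow(newdata))
    for (p in model$preds) s <- s + model$lik[[cl]][[p]](newdata[[p]])
    s
  })
  scores <- matrix(scores, nrow = nrow(newdata))
  factor(model$classes[max.col(scores, ties.method = "first")],
         levels = model$classes)
}

results <- list()
for (k in c(FALSE, TRUE)) {
  model <- fit_nb(train, usekernel = k)
  pred  <- predict_nb(model, test)
  cm    <- table(predicted = pred, actual = test$admit)
  hits  <- sum(diag(cm))
  ci    <- binom.test(hits, sum(cm))$conf.int  # exact 95% CI for accuracy
  results[[as.character(k)]] <- hits / sum(cm)
  cat(sprintf("usekernel = %s: accuracy %.3f, 95%% CI [%.3f, %.3f]\n",
              k, hits / sum(cm), ci[1], ci[2]))
  print(cm)
}
print(round(fit_nb(train)$priors, 2))  # estimated class priors
```

To apply the sketch to the real file instead, one would read it with read.csv("binary.csv"), convert admit and rank to factors, and pass the resulting data frame to fit_nb(); this assumes the file contains exactly the columns admit, gre, gpa and rank, since every non-outcome column is used as a predictor. Summing log-likelihoods rather than multiplying raw probabilities avoids numerical underflow, and the +1 smoothing keeps a rank level that never occurs in one class from forcing that class's posterior to zero.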