Date of Award

5-17-2022

Document Type

Masters Project

Abstract

Naive Bayes is an application of Bayes' theorem in which the likelihood function is factored into a product of marginals under the assumption that the predictor variables are conditionally independent given the class. Naive Bayes is typically used for classification problems, where the goal is to find the class with the largest posterior probability given the data on hand. When the predictors are continuous real numbers, we can further assume they are class-conditionally normally distributed; this version of Naive Bayes is called Gaussian Naive Bayes. This paper explores when Gaussian Naive Bayes classification works well and when it does not. Typically, when a model's assumptions are not valid, valid conclusions cannot be drawn; however, Naive Bayes is known to be robust even when the independence assumption is not met. Using simulations, we show that the binary classification accuracy of Naive Bayes is far more sensitive to differences in the class-conditional marginal distributions than to the correlation between predictors. Additionally, we show that Naive Bayes fails completely when predictors are generated using a Gumbel copula, and we compare its results with those of a general Bayes classifier and the K-Nearest Neighbors classifier.
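The factorization described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual simulation code: the toy data, the two-class setup, and the function names are assumptions chosen for clarity. The "naive" step is visible where the class-conditional likelihood is computed as a sum of univariate Gaussian log-densities, one per feature.

```python
import numpy as np

def gaussian_nb_fit(X, y):
    """Estimate, for each class, the prior and the per-feature
    (mean, variance) of the marginal Gaussians."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),  # prior P(C = c)
                     Xc.mean(axis=0),   # marginal means
                     Xc.var(axis=0))    # marginal variances
    return params

def gaussian_nb_predict(X, params):
    """Assign each row to the class maximizing
    log P(C) + sum_j log N(x_j | mu_cj, var_cj) -- the naive
    factorization of the joint likelihood into marginals."""
    classes, scores = [], []
    for c, (prior, mu, var) in params.items():
        log_lik = -0.5 * (np.log(2 * np.pi * var)
                          + (X - mu) ** 2 / var).sum(axis=1)
        classes.append(c)
        scores.append(np.log(prior) + log_lik)
    return np.array(classes)[np.argmax(np.array(scores), axis=0)]

# Toy binary problem (hypothetical data, independent features):
# the class-conditional marginals differ only in their means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               rng.normal(2.0, 1.0, size=(500, 2))])
y = np.array([0] * 500 + [1] * 500)

params = gaussian_nb_fit(X, y)
accuracy = (gaussian_nb_predict(X, params) == y).mean()
```

Because the two classes here differ in their marginal means and the features really are independent, the classifier separates them well; the paper's simulations probe what happens when those conditions are relaxed.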

Handle

http://hdl.handle.net/11122/14710
