Naive Bayes and Bias in Data Science

A key aspect of science, including data science, is the ability to remove bias from analyses. Provost and Fawcett made a great point, stating that “while humans are quite good at using our knowledge and common sense to recognize whether evidence is likely to be for or against, humans are notoriously bad at estimating the precise strength of the evidence”. Naïve Bayes classifiers are appealing because they work to remove human bias from an analysis by replacing that intuitive weighing of evidence with an explicit probability calculation. The classifier applies Bayes’ rule, also known as Bayes’ theorem, and adds the “naïve” assumption that the attributes describing an example are independent of one another given the class. If two attributes turn out to be dependent, their evidence is effectively counted twice, which pushes the estimated probability further toward 0 or 1; as long as the evidence points in the right direction, the class that comes out on top usually stays the same. The method identifies the class with the highest estimated probability for a new example. It is easy to compute and use, but because it focuses only on the most probable class, it does not take costs and benefits into consideration, so it should be used with caution when making decisions. Naïve Bayes is best used for ranking examples rather than for estimating their exact probabilities.
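
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the function names (`train`, `posterior`), the toy data, and the Laplace smoothing constant `alpha` are all assumptions made for the example. It also shows the double-counting effect by entering the same attribute twice.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    seen_values = defaultdict(set)        # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values

def posterior(model, row, alpha=1.0):
    """Return {class: estimated p(class | row)} under the naive independence assumption."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior p(c)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of p(value | c); alpha is an assumed choice
            count = value_counts[(i, c)][value]
            score *= (count + alpha) / (n_c + alpha * len(seen_values[i]))
        scores[c] = score
    norm = sum(scores.values())  # plays the role of p(evidence)
    return {c: s / norm for c, s in scores.items()}

# Toy data, invented for illustration only
labels = ["pos", "pos", "neg", "neg"]
one_attr = [("a",), ("a",), ("b",), ("b",)]
# The same attribute entered twice: two perfectly dependent pieces of evidence
dup_attr = [(v, v) for (v,) in one_attr]

p_single = posterior(train(one_attr, labels), ("a",))
p_double = posterior(train(dup_attr, labels), ("a", "a"))
print(round(p_single["pos"], 3))  # 0.75
print(round(p_double["pos"], 3))  # 0.9
```

With the duplicated attribute the estimate moves from 0.75 to 0.9: the predicted class is unchanged, but the probability is more extreme, which is why the scores are safer to use for ranking than as exact probabilities.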

A good example of when Naïve Bayes would be a suitable method is one where, instead of trying to determine different targets, you are trying to determine which segment would generate a given value (Provost and Fawcett 2013, 248). For example, Naïve Bayes could be used to assess whether a loan applicant is likely to default on a loan. By treating the pieces of information about an applicant as independent of one another, it becomes possible to rank which attributes, out of a large collection, best predict whether someone will default. If a new applicant fell into the categories associated with the highest probability of default, the application could be denied. Removing bias is critical for any good science, and Naïve Bayes can help with a variety of classification problems in data modeling.
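
As a rough sketch of the loan example, the snippet below ranks attribute values by how much more common they are among defaulters than among applicants overall, the ratio p(value | default) / p(value), often called lift. The records, the attribute names (`employment`, `prior_late_payment`), and the helper names are hypothetical and invented purely for illustration; treating an unseen value as neutral (lift 1.0) is likewise an assumption.

```python
from collections import Counter

# Hypothetical loan records: (attributes, defaulted?). Invented for illustration.
records = [
    ({"employment": "salaried", "prior_late_payment": "no"}, False),
    ({"employment": "salaried", "prior_late_payment": "no"}, False),
    ({"employment": "salaried", "prior_late_payment": "yes"}, True),
    ({"employment": "salaried", "prior_late_payment": "no"}, False),
    ({"employment": "self_employed", "prior_late_payment": "yes"}, True),
    ({"employment": "self_employed", "prior_late_payment": "no"}, False),
    ({"employment": "self_employed", "prior_late_payment": "yes"}, True),
    ({"employment": "salaried", "prior_late_payment": "yes"}, False),
]

def evidence_lifts(records, target=True):
    """Return {(attribute, value): p(value | target) / p(value)}."""
    overall, within = Counter(), Counter()
    n_target = sum(1 for _, outcome in records if outcome == target)
    for attrs, outcome in records:
        for item in attrs.items():
            overall[item] += 1
            if outcome == target:
                within[item] += 1
    n = len(records)
    return {item: (within[item] / n_target) / (overall[item] / n) for item in overall}

def default_score(attrs, lifts):
    """Multiply the lifts of an applicant's attribute values (naive independence).

    This is a ranking score, not a calibrated probability. Values never seen
    in the records are treated as neutral evidence (lift 1.0), an assumption.
    """
    score = 1.0
    for item in attrs.items():
        score *= lifts.get(item, 1.0)
    return score

lifts = evidence_lifts(records)
ranked = sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)
for (attribute, value), lift in ranked:
    print(f"{attribute}={value}: lift {lift:.2f}")

applicant = {"employment": "self_employed", "prior_late_payment": "yes"}
print(f"score: {default_score(applicant, lifts):.2f}")
```

A lift above 1 means the value is over-represented among defaulters, and a lift below 1 means it is under-represented. Because the score ignores the costs of a wrong decision, any cut-off for denying an application would still need to be chosen separately.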

Author: Logan Callen

References

Provost, Foster and Tom Fawcett. 2013. Data Science for Business. 2nd Edition. California: O’Reilly Media, Inc.
