Beyond the standard techniques in data science, ensemble modeling stood out to me as especially interesting and useful. Model errors generally fall into three categories: inherent randomness, bias, and variance (Provost and Fawcett 2013, 309). Every model has a different blend of these three sources contributing to its overall error. An ensemble built from multiple different models can be a useful way to reduce these errors. Because each model has its own biases and its own variance, combining their predictions can let those errors partly cancel out and produce a stronger result (Provost and Fawcett 2013, 309).
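To make the cancelling-out idea concrete, here is a minimal sketch in Python. It is a toy simulation, not a method from Provost and Fawcett: the five "models", their bias values, the noise level, and the function names `rmse` and `simulate` are all assumptions made up for illustration. Each hypothetical model predicts the true value plus its own fixed bias plus independent random noise, and the ensemble is simply the average of the five predictions.

```python
import math
import random


def rmse(predictions, truth):
    """Root mean squared error between predictions and the true values."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, truth)]
    return math.sqrt(sum(squared) / len(squared))


def simulate(n_points=1000, biases=(-0.5, 0.3, 0.2, -0.1, 0.1),
             noise_sd=1.0, seed=0):
    """Compare hypothetical individual models against their simple average.

    Assumptions (illustrative only): every model's noise is independent of
    the others, and the biases happen to point in different directions.
    """
    rng = random.Random(seed)
    truth = [rng.uniform(0, 10) for _ in range(n_points)]

    # One list of predictions per model: truth + that model's bias + noise.
    model_preds = [
        [t + bias + rng.gauss(0, noise_sd) for t in truth]
        for bias in biases
    ]

    # The ensemble prediction for each point is the mean across models.
    ensemble_preds = [sum(col) / len(col) for col in zip(*model_preds)]

    individual = [rmse(preds, truth) for preds in model_preds]
    return individual, rmse(ensemble_preds, truth)


individual_rmse, ensemble_rmse = simulate()
for i, err in enumerate(individual_rmse, start=1):
    print(f"model {i} RMSE: {err:.3f}")
print(f"ensemble RMSE: {ensemble_rmse:.3f}")
```

In this setup the averaged prediction should come out with a noticeably lower error than any single model, because the independent noise and the opposing biases offset one another. Real models often share data and assumptions, so their errors tend to be correlated and the improvement is usually smaller than in this idealized case.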
I find this technique interesting because it can take models built by experts in different fields and combine them into a richer, more dynamic model that predicts real-world complexity with fewer errors. The most interesting application of this method, to me, is the IPCC’s climate reporting. Models built on ice core, tree ring, and atmospheric data each have their own data issues and biases. However, when all these types of models are combined, a clearer picture of climate change and anthropogenic effects emerges. While each model may have variability and assumptions within it, they all begin to independently highlight a core truth. When these models are combined into an ensemble, the biases from those assumptions and the randomness in the datasets begin to cancel out, revealing clearer and more robust patterns. I think ensemble modeling is a very useful approach for extremely complex data science projects because it allows the problem to be broken into pieces, with different experts tackling each sub-model using their own business understanding to achieve the best results. Being a musician, I like to think of ensemble modeling as an ensemble or orchestra in music: many different players, each with their own flaws and timbre, come together to create one large work that tells a story grander than any one part could.
Author: Logan Callen
Provost, Foster, and Tom Fawcett. 2013. Data Science for Business. 2nd Edition. Sebastopol, CA: O’Reilly Media, Inc.