Data mining offers many algorithms for gaining insight into a problem, but most of them fall into a small set of fundamental types, such as classification, regression, similarity matching, and clustering (Provost and Fawcett 2013, 20). These algorithm types have led to some important improvements in the sustainability industry.
An important question for many energy-related sustainability projects is whether the up-front costs will provide a return on investment. Because energy use is heavily influenced by the weather, it can be difficult for some energy projects to prove results on a year-over-year basis. For example, installing LED lights reduces the energy needed for lighting, but if the following year is extremely hot compared to the previous one and air conditioning use spikes, total energy use can still rise even though lighting demand fell. Regression analysis lets energy managers separate weather variability from the rest of their data. They fit a model of energy use against heating and cooling degree days for a baseline period before the installation, then use it to estimate the expected energy use for the period after the installation given the weather that actually occurred. They can then see whether actual energy use came in above or below those weather-normalized expectations. The difference attributable to an energy efficiency project can be measured and verified in a standard way, which allows savings to be calculated. These savings can then be used to support and communicate the return on investment for energy efficiency projects, leading to greater reductions in energy use and greenhouse gas emissions. This use of regression algorithms has led to many energy efficiency projects being implemented across the globe.
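The weather normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard measurement and verification tool: it assumes a simple linear model of monthly use against heating degree days (HDD) and cooling degree days (CDD), the function names are hypothetical, and all of the numbers are made up to mirror the LED example.

```python
import numpy as np

def design_matrix(hdd, cdd):
    """Columns: intercept (weather-independent base load), HDD, CDD."""
    return np.column_stack([np.ones(len(hdd)), hdd, cdd])

def fit_baseline(hdd, cdd, kwh):
    """Least-squares fit of kwh = b0 + b1*HDD + b2*CDD on pre-project data."""
    coef, *_ = np.linalg.lstsq(design_matrix(hdd, cdd), kwh, rcond=None)
    return coef

def expected_use(coef, hdd, cdd):
    """Energy use the baseline model expects for the given weather."""
    return design_matrix(hdd, cdd) @ coef

# Made-up monthly data for the year before the project (assumption).
base_hdd = np.array([900, 750, 600, 300, 100, 0, 0, 0, 50, 300, 600, 850], dtype=float)
base_cdd = np.array([0, 0, 0, 20, 100, 250, 400, 380, 200, 40, 0, 0], dtype=float)
base_kwh = 20000 + 5 * base_hdd + 12 * base_cdd

# Made-up data for the year after: a hotter summer (50% more CDD) and
# a base load that is 500 kWh per month lower thanks to the new lighting.
post_hdd = base_hdd.copy()
post_cdd = 1.5 * base_cdd
post_kwh = 19500 + 5 * post_hdd + 12 * post_cdd

coef = fit_baseline(base_hdd, base_cdd, base_kwh)
expected = expected_use(coef, post_hdd, post_cdd)
savings = expected - post_kwh  # positive means use came in below expectation

print(f"Raw year-over-year change: {post_kwh.sum() - base_kwh.sum():+.0f} kWh")
print(f"Weather-normalized savings: {savings.sum():.0f} kWh")
```

With these made-up numbers, raw annual use rises by about 2,340 kWh because of the hotter summer, while the weather-normalized comparison still shows savings of about 6,000 kWh. Real billing data would not fit a line this cleanly, so an actual analysis would also need to check how well the baseline model fits before trusting the savings figure.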
A fundamental algorithm type that I think could be used to address neglected issues is classification. Many energy and water users across the country are low-income and have trouble affording the utilities that are required for daily life. By using demographic, billing, and assistance program data, households could be classified as needing help with their utility bills even if they have not yet moved into past-due status. This early identification of struggling families could help remedy problems before they grow and potentially lead to homelessness. Using data mining techniques, and particularly classification, to help vulnerable populations is a focus I see great value and opportunity in.
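As a rough sketch of what such a classifier might look like, the Python below fits a small logistic regression by gradient descent. Everything specific here is an assumption made for illustration: the two features (the bill as a share of monthly income and an assistance program enrollment flag), the label (whether the household later went past due), the function names, and the eight made-up records. A real program would need far more data and careful attention to privacy and fairness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    """Prepend a column of ones so the model learns a baseline risk."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, y, lr=0.1, steps=5000):
    """Plain gradient descent on the average logistic loss."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_risk(w, X):
    """Estimated probability that a household will go past due."""
    return sigmoid(add_intercept(X) @ w)

# Made-up training records (assumption).
# Columns: [utility bill as a share of monthly income, enrolled in assistance (0/1)]
X = np.array([
    [0.02, 0], [0.03, 0], [0.04, 1], [0.05, 0],
    [0.08, 0], [0.10, 1], [0.12, 1], [0.15, 1],
])
# Label: 1 if the household later went past due, 0 otherwise.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

w = fit_logistic(X, y)

# Two hypothetical households that are not yet past due.
new_households = np.array([[0.03, 0], [0.14, 0]])
risk_low, risk_high = predict_risk(w, new_households)
print(f"Risk at 3% of income: {risk_low:.2f}")
print(f"Risk at 14% of income: {risk_high:.2f}")
```

The household spending a larger share of its income on the bill receives the higher risk score, which is the kind of signal that could prompt outreach before a bill ever becomes past due.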
Author: Logan Callen
Provost, Foster and Tom Fawcett. 2013. Data Science for Business. 2nd Edition. California: O’Reilly Media, Inc.