Similarity is a familiar concept in everyday life, but from a data perspective its meaning becomes more precise. The first step in determining similarity between objects is to represent them as data (Provost and Fawcett 2013, 142). Once objects are represented as data, each one can be treated as a point in a space whose dimensions are its attributes, and mathematical measures of distance can be applied to those points. The closer two points are in that space, the more similar the objects they represent.
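As a minimal sketch of this idea, the snippet below measures Euclidean distance between objects described by two numeric attributes. Euclidean distance is only one of several possible distance measures, and the points p, q, and r are hypothetical values chosen for illustration, not data from the source.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical objects, each described by two numeric attributes.
p = (1.0, 2.0)
q = (4.0, 6.0)
r = (1.5, 2.5)

d_pq = euclidean_distance(p, q)
d_pr = euclidean_distance(p, r)

# A smaller distance means the objects are treated as more similar.
print(f"distance p-q: {d_pq:.3f}")
print(f"distance p-r: {d_pr:.3f}")
```

Here r lies closer to p than q does, so under this measure r is the more similar of the two to p.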
A good example of what this looks like in practice is the COVID-19 small business aid programs. To decide which businesses in a large set are similar to one another and most in need of help, an analyst could compare the attributes of those businesses. If aid is meant for businesses with fewer than 500 employees that are in the greatest need, the attributes of businesses already known to be in the greatest distress could serve as the primary descriptors in the analysis. Looking across those variables, we would likely see that a small tech firm does not need funding as much as a small restaurant. Even though the two have a similar number of employees, they are not similar in their ability to telecommute or in their exposure to closures elsewhere in their supply chain. When a new business applies for funding, the same attributes can be examined to see which known case it resembles. A barbershop, for example, also cannot telecommute and would also be affected by missed supply shipments. It is therefore similar to the restaurant, the known case chosen to receive funding, and could be approved as well. In this way, data on businesses could be used to prioritize a wide variety of businesses in the queue for economic aid. More generally, similarity is a helpful basis for classifying new pieces of information.
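The comparison described above can be sketched as a nearest-neighbor lookup: the new applicant receives the decision of the known business it is closest to. Everything in this sketch is an assumption made for illustration, including the three attributes, their numeric encoding, the employee counts, and the decision labels; a real aid program would use far more data and more careful scaling.

```python
import math

# Hypothetical encoding of each business:
# (can_telecommute, supply_chain_disrupted, employees / 500)
# Dividing by the 500-employee cap keeps headcount on a 0-1 scale like the other attributes.
known_cases = {
    "small restaurant": ((0.0, 1.0, 25 / 500), "funded"),
    "small tech firm": ((1.0, 0.0, 30 / 500), "not prioritized"),
}


def distance(a, b):
    """Euclidean distance between two equal-length attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_case(features):
    """Return the name and decision of the known case closest to the applicant."""
    name, (_, decision) = min(
        known_cases.items(),
        key=lambda item: distance(features, item[1][0]),
    )
    return name, decision


# Hypothetical new applicant: cannot telecommute, supply chain disrupted, 5 employees.
barbershop = (0.0, 1.0, 5 / 500)
match, decision = nearest_case(barbershop)
print(f"Most similar known case: {match} -> {decision}")
```

With these assumed values the barbershop lands nearest the restaurant, because the two share the telecommuting and supply chain attributes, while the tech firm differs on both.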
Author: Logan Callen
Provost, Foster, and Tom Fawcett. 2013. Data Science for Business. 2nd Edition. California: O’Reilly Media, Inc.