Structured data consists of formatted data fields that hold information about transactional records, such as a list of purchases on a credit card. Unstructured data, on the other hand, is data such as text strings, images, and audio or video recordings that isn’t stored in a fixed-length record format. Historically, most data analytics has been performed on structured data because the process and the tools it requires are simpler, even though unstructured data makes up the majority of all data.
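As a minimal sketch of why structured data is simpler to analyze, the Python below contrasts the two forms. The `Purchase` record, its fields, the sample purchases, and the free-text note are all hypothetical and invented for illustration. With fixed fields, a total is a direct computation. When the same facts are buried in free text, they must first be extracted. The regular expression used here is deliberately naive and would miss amounts not written as `$12.34`.

```python
import re
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Purchase:
    """A structured record (hypothetical): every purchase has the same fixed fields."""
    card_id: str
    merchant: str
    amount: float
    day: date

# Hypothetical sample data.
purchases = [
    Purchase("card-001", "Grocery Mart", 54.20, date(2020, 2, 1)),
    Purchase("card-001", "Gas Stop", 31.75, date(2020, 2, 3)),
    Purchase("card-001", "Book Nook", 18.05, date(2020, 2, 7)),
]

# Structured: the total is a direct computation over a known field.
structured_total = round(sum(p.amount for p in purchases), 2)

# Unstructured: the same facts written as free text must be extracted first.
note = ("Spent $54.20 at Grocery Mart on Saturday, then $31.75 on gas, "
        "and picked up a paperback for $18.05.")

def extract_amounts(text: str) -> list[float]:
    """Naive extraction: find dollar amounts written like $12.34."""
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d{2})?)", text)]

unstructured_total = round(sum(extract_amounts(note)), 2)

print(structured_total, unstructured_total)
```

Both totals agree here only because the note happens to follow a predictable pattern. Real free text rarely does, which is why unstructured data usually needs more advanced tooling.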
According to IBM, nearly 80% of data is unstructured (Schneider 2016). X-rays, photos, audio, video, and text such as social media posts make up a large share of all data. However, they are harder to process and, for several reasons, have lagged behind structured data in use. Structured data typically consists of records in table format that can be manipulated with common tools such as Microsoft Excel, or even by hand calculation. Unstructured data typically requires advanced or custom software to tie it to other data, for example to connect an x-ray to a patient’s medical history. Structured data can take up a lot of server space, but unstructured data requires even more storage, which makes it considerably more expensive to use properly. Structured data can have data quality issues. Unstructured data, however, is much more complex and can lead to serious reliability problems when it is processed incorrectly.
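One simple way to tie an x-ray to a patient’s medical history is to store the image files separately and link them to the existing structured records through a shared key. The Python below is a minimal sketch of that idea, assuming a patient ID as the key. The `Patient` fields, the in-memory tables, the helper names, and the image paths are all hypothetical. Real medical systems use dedicated imaging archives and standards that this sketch does not model.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    """Existing structured record (hypothetical fields)."""
    patient_id: str
    name: str
    history: list[str] = field(default_factory=list)

# Existing structured table; the linking code below never modifies it.
patients = {
    "P001": Patient("P001", "A. Example", ["2019: fractured wrist"]),
    "P002": Patient("P002", "B. Example", []),
}

# Unstructured layer: image file paths (hypothetical), keyed by patient ID.
xray_index: dict[str, list[str]] = {}

def attach_xray(patient_id: str, image_path: str) -> None:
    """Link an image to an existing patient without altering the patient table."""
    if patient_id not in patients:
        raise KeyError(f"unknown patient {patient_id}")
    xray_index.setdefault(patient_id, []).append(image_path)

def patient_view(patient_id: str) -> dict:
    """Combine the structured record with any linked unstructured files."""
    p = patients[patient_id]
    return {
        "patient_id": p.patient_id,
        "name": p.name,
        "history": p.history,
        "xrays": xray_index.get(patient_id, []),
    }

attach_xray("P001", "images/P001/wrist_2019.dcm")
print(patient_view("P001"))
```

Keeping the images in a separate index means the original patient table does not need to be redesigned. The unstructured data is added as a layer on top of it.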
While unstructured data may be more difficult and costly to work with, it can bring large benefits to an organization. More comprehensive datasets allow decisions to be made more quickly (Shacklett 2017). Often the unstructured data is already collected and used, as with x-rays in medicine. In those cases, implementing systems to manage it can reduce processing time where proper tools were not previously in place. Unstructured data can also be layered onto existing structured datasets, for example by tying an x-ray to a patient’s record. This enhances the structured data without requiring a full rebuild of the system. Another example involves social media posts about businesses. GPS information from the photos in a post can be linked to word trends found through text analytics. Combining the two allows richer restaurant or business reviews to be gleaned from more sources than just official review sites, which require people to sign up and participate.
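The social media example above can be sketched in Python under several loud assumptions. The code assumes GPS coordinates have already been pulled from each photo’s metadata upstream; it does not parse image files. The business names, coordinates, posts, keyword lists, and the 100-meter matching radius are all hypothetical. The "text analytics" step is a naive keyword count standing in for real sentiment analysis. Each post is matched to the nearest known business by great-circle (haversine) distance, and positive and negative words are then tallied per business.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical business locations as (latitude, longitude).
BUSINESSES = {
    "Corner Cafe": (40.7580, -73.9855),
    "Harbor Grill": (40.7003, -74.0122),
}

# Hypothetical posts: GPS assumed already extracted from photo metadata, plus caption text.
POSTS = [
    (40.75802, -73.98548, "Great coffee, friendly staff!"),
    (40.75795, -73.98560, "Slow service but great pastries"),
    (40.70028, -74.01215, "Cold food and slow service."),
]

# Hypothetical keyword lists standing in for real text analytics.
POSITIVE = {"great", "friendly", "delicious"}
NEGATIVE = {"slow", "cold", "rude"}

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def nearest_business(point, max_m=100.0):
    """Return the closest business within max_m meters, or None."""
    name, loc = min(BUSINESSES.items(), key=lambda kv: haversine_m(point, kv[1]))
    return name if haversine_m(point, loc) <= max_m else None

def word_trends(posts):
    """Count positive and negative keywords per matched business."""
    trends = defaultdict(Counter)
    for lat, lon, text in posts:
        business = nearest_business((lat, lon))
        if business is None:
            continue  # post not taken near any known business
        words = re.findall(r"[a-z']+", text.lower())
        trends[business]["positive"] += sum(w in POSITIVE for w in words)
        trends[business]["negative"] += sum(w in NEGATIVE for w in words)
    return dict(trends)

print(word_trends(POSTS))
```

A production system would replace the keyword lists with a proper text analytics model. It would also need to handle posts whose photos lack GPS data.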
Although unstructured data has historically been harder to work with, new tools are making it possible to bring it into a business world that has been built largely around structured data. Unstructured data is likely to drive many new innovations as these tools evolve and businesses grow more comfortable with data analytics projects.
Author: Logan Callen
Schneider, Christie. 2016. “The Biggest Data Challenges That You Might Not Even Know You Have.” IBM. Accessed February 18, 2020. https://www.ibm.com/blogs/watson/2016/05/biggest-data-challenges-might-not-even-know/
Shacklett, Mary. 2017. “Unstructured Data: A Cheat Sheet.” TechRepublic. Accessed February 18, 2020. https://www.techrepublic.com/article/unstructured-data-the-smart-persons-guide/