The Cross Industry Standard Process for Data Mining (CRISP-DM) is a formalized framework for structuring data mining projects. It gives teams a defined process that brings consistency, repeatability, and objectivity to their work (Provost and Fawcett 2013, 26). The process is comprehensive, running from pre-planning and ideation all the way to deployment, and it includes iterations through the cycle for further project refinement. I think the CRISP-DM model is useful for ensuring that data projects are set up to deliver the most value-generating results; however, the term data mining could benefit from being replaced with a less ambiguous label.
By starting with business understanding and data understanding phases, the CRISP-DM model structures the data project so that the right problem is being solved. It also brings the costs and benefits of the data into the early planning stages, so the team can decide whether further investment in data is needed or whether the project should be pursued at all. After the data modeling is completed, the evaluation phase checks that the results are valid and reliable so that they can actually be used to realize a return on investment (Provost and Fawcett 2013, 32). The process also allows the project team to report status to executives in a way that shows progress through the phases even when results are not yet available, giving everyone an objective understanding of where the project stands. I think the critical questions it requires in the early phases will be valuable in the projects I work on. I also think the structured guideline will help me understand the progress made and the tasks still required in a project.
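As a rough illustration of phase-based status reporting, the sketch below lists the six CRISP-DM phases in their usual order and summarizes how far a project has progressed. The `Phase` enum and the `status_report` helper are hypothetical names made up for this example, not part of CRISP-DM or of Provost and Fawcett's text. The sketch also treats the phases as strictly linear, which is a simplification, since the real process loops back through earlier phases.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def status_report(current: Phase) -> str:
    """Summarize progress by phase (hypothetical helper for illustration)."""
    # Phases that come before the current one are treated as completed.
    done = [p.name.replace("_", " ").title() for p in Phase if p < current]
    label = current.name.replace("_", " ").title()
    completed = ", ".join(done) if done else "none yet"
    return f"Phase {current.value} of {len(Phase)}: {label}. Completed: {completed}."


print(status_report(Phase.DATA_PREPARATION))
# Phase 3 of 6: Data Preparation. Completed: Business Understanding, Data Understanding.
```

A report like this gives executives something concrete to track even before any model results exist.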
In my mind, the term data mining conjures imagery of people digging in the earth to find gems and precious metals. This makes me think that data mining is about finding hidden value in data. However, data mining is more than just finding gems in piles of data; it is more about automating the exploration of data that leads to knowledge and pattern finding (Provost and Fawcett 2013, 35). These patterns and insights can then be used to generate value. I found the term KDD, knowledge discovery and data mining, a much more interesting concept. It seems like a better term for data mining would be automated knowledge discovery, since the process creates results that are then used to create return on investment.
The CRISP-DM model is a useful way to ensure objective, consistent, and repeatable results for data mining projects. While the term data mining may create misunderstandings about what activities are taking place, the processes involved lead to value-generating results no matter what the process is called.
Author: Logan Callen
Provost, Foster and Tom Fawcett. 2013. Data Science for Business. 2nd Edition. California: O’Reilly Media, Inc.
Image: By Kenneth Jensen – https://commons.wikimedia.org/w/index.php?curid=24930610