The discovery phase involves gathering data from all identified internal and external sources that can help you answer the business question.
Data sources include:
Data often contains inconsistencies, such as missing values, blank columns, and incorrect data formats, all of which must be cleaned. Before you can model, you must first process, explore, and condition the data. The cleaner your data, the better your predictions are likely to be.
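As a minimal sketch of the three problems named above (blank columns, missing values, and inconsistent formats), the following uses only the Python standard library. The records, the column names (`id`, `age`, `signup`, `notes`), the two date formats, and the choice to fill a missing age with the column mean are all assumptions made for illustration. A real project would pick its own rules.

```python
from datetime import datetime

# Hypothetical raw records; None and "" stand in for missing values.
raw_rows = [
    {"id": "1", "age": "34", "signup": "2021-03-05", "notes": ""},
    {"id": "2", "age": "",   "signup": "05/04/2021", "notes": ""},
    {"id": "3", "age": "41", "signup": "2021-06-17", "notes": None},
]

def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""

def drop_blank_columns(rows):
    """Remove columns in which every value is missing."""
    keep = [c for c in rows[0] if not all(is_missing(r[c]) for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]

def parse_date(text):
    """Try the formats assumed to occur here; return an ISO date or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(rows):
    rows = drop_blank_columns(rows)
    # Assumption: a missing age is replaced by the mean of the known ages.
    ages = [int(r["age"]) for r in rows if not is_missing(r["age"])]
    mean_age = sum(ages) / len(ages)
    return [
        {
            "id": int(r["id"]),
            "age": mean_age if is_missing(r["age"]) else int(r["age"]),
            "signup": parse_date(r["signup"]),
        }
        for r in rows
    ]

cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

The all-blank `notes` column is dropped, the missing age is filled in, and both date styles end up in one ISO format.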
At this stage, determine the methods and techniques you will use to establish the relationships between input variables. Model planning relies on a range of statistical methods and visualisation tools. Commonly used tools include SQL Analysis Services, R, and SAS/ACCESS.
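One simple way to examine relationships between input variables is a pairwise correlation check. The sketch below computes the Pearson correlation coefficient by hand for a few hypothetical variables (`ad_spend`, `visits`, `returns`). The variable names and values are invented for illustration, and Pearson correlation is only one of many statistical methods that could be used at this stage.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical input variables, one list of observations per variable.
features = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits":   [15, 25, 35, 45, 55],
    "returns":  [5, 4, 3, 2, 1],
}

# Correlation for every unordered pair of variables.
names = list(features)
correlations = {
    (a, b): pearson(features[a], features[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Values near 1 or -1 indicate a strong linear relationship, which can help decide which variables to keep or combine before building a model.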
The actual model building begins in this step. Data scientists split the dataset into training and testing sets. Techniques such as association, classification, and clustering are applied to the training set. Once the model has been built, it is evaluated against the testing set.
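The split-train-test cycle described above can be sketched with the standard library alone. The example below is an illustration under stated assumptions: the labelled data is synthetic, the 70/30 split and fixed seed are arbitrary, and a 1-nearest-neighbour classifier stands in for whichever classification technique a project would actually choose.

```python
import random

# Hypothetical labelled data: (feature vector, class label).
data = (
    [((float(x), x + 1.0), "low") for x in range(0, 10)]
    + [((float(x), x + 1.0), "high") for x in range(20, 30)]
)

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def predict(train, point):
    """1-nearest-neighbour: return the label of the closest training row."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = min(train, key=lambda row: sq_dist(row[0], point))
    return nearest[1]

train, test = train_test_split(data)

# Evaluate on rows the model has not seen during training.
correct = sum(predict(train, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Keeping the test rows out of training is what makes the accuracy figure a fair estimate of how the model behaves on new data.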
In this step, deliver the final baselined model along with reports, code, and technical documents. After thorough testing, the model is deployed to a real-time production environment.
At this point, the key findings are communicated to all stakeholders. Their feedback helps you determine whether the project's outcome is a success or a failure.