Predictive modeling, also called predictive analytics, is a mathematical process that seeks to predict future events or outcomes by analyzing patterns in data that are likely to indicate future results. The goal of predictive modeling is to answer this question: “Based on known past behavior, what is most likely to happen in the future?”
Once data has been collected, the analyst selects statistical models and trains them on historical data. Although it may be tempting to think that big data always makes predictive models more accurate, statistical theory shows that accuracy gains diminish as the data set grows: after a certain point, feeding more data into a predictive analytics model yields little or no improvement. The old saying "All models are wrong, but some are useful," attributed to statistician George Box, is often cited as a caution against relying solely on predictive models to determine future action.
In many use cases, including weather predictions, multiple models are run simultaneously and results are aggregated to create one final prediction. This approach is known as ensemble modeling. As additional data becomes available, the statistical analysis will either be validated or revised.
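The aggregation step in ensemble modeling can be illustrated with a minimal Python sketch. It assumes the simplest aggregation rule, an unweighted average; real ensembles may use weighting or voting instead. The three model functions and their coefficients are hypothetical stand-ins, not actual forecasting models.

```python
from statistics import mean

def ensemble_predict(models, x):
    """Average the predictions of several models for the same input."""
    return mean(model(x) for model in models)

# Three hypothetical temperature models (coefficients are made up).
def model_a(hour):
    return 15.0 + 0.5 * hour

def model_b(hour):
    return 14.0 + 0.7 * hour

def model_c(hour):
    return 16.0 + 0.3 * hour

# Each model forecasts independently; the ensemble reports the average.
forecast = ensemble_predict([model_a, model_b, model_c], 10)
print(f"Ensemble forecast: {forecast:.1f}")
```

Averaging tends to cancel out the individual models' errors when those errors are not strongly correlated, which is one reason the approach is popular in forecasting.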
Applications of predictive modeling
Predictive modeling is often associated with meteorology and customer retention and has many applications in business.
One of the most common uses of predictive modeling is in online advertising and marketing. Modelers run web users' historical browsing data through algorithms to determine what kinds of products those users might be interested in and what they are likely to click on.
Bayesian spam filters use predictive modeling to identify the probability that a given message is spam. In fraud detection, predictive modeling is used to identify outliers in a data set that point toward fraudulent activity. And in customer relationship management (CRM), predictive modeling is used to target messaging to customers who are most likely to make a purchase. Other applications include capacity planning, change management, disaster recovery (DR), engineering, physical and digital security management and city planning.
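As an illustration of the spam filter case, the following Python sketch applies Bayes' rule to the words in a message, under the usual "naive" assumption that words occur independently of one another. The word probabilities and the 50% prior are invented for the example; a real filter would estimate them from a corpus of labeled mail.

```python
import math

def spam_probability(words, p_word_spam, p_word_ham, p_spam=0.5):
    """Return P(spam | words) using naive Bayes in log space."""
    log_spam = math.log(p_spam)
    log_ham = math.log(1.0 - p_spam)
    for word in words:
        # Skip words for which no probability estimates are available.
        if word in p_word_spam and word in p_word_ham:
            log_spam += math.log(p_word_spam[word])
            log_ham += math.log(p_word_ham[word])
    # Normalize the two scores into a probability.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical estimates of P(word | spam) and P(word | not spam).
p_word_spam = {"free": 0.40, "meeting": 0.05}
p_word_ham = {"free": 0.02, "meeting": 0.30}

p_promo = spam_probability(["free"], p_word_spam, p_word_ham)
p_work = spam_probability(["meeting"], p_word_spam, p_word_ham)
print(f"'free': {p_promo:.3f}, 'meeting': {p_work:.3f}")
```

Working in log space avoids numerical underflow when a message contains many words, each contributing a small probability.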
Analyzing representative portions of the available information -- sampling -- can help speed development time on models and enable them to be deployed more quickly.
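Sampling can be sketched in a few lines of Python. This version assumes simple random sampling without replacement; the function name, the 5% fraction and the fixed seed are illustrative choices, and stratified sampling may be a better fit when some groups in the data are rare.

```python
import random

def draw_sample(records, fraction, seed=0):
    """Return a simple random sample of the given fraction of records."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, k)

# Hypothetical data set: 10,000 record IDs.
records = list(range(10_000))
sample = draw_sample(records, 0.05)
print(len(sample))
```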
Once data scientists gather this sample data, they must select the right model. Linear regressions are among the simplest types of predictive models. In its most basic form, a linear model takes two variables that are correlated -- one independent and the other dependent -- and plots the independent variable on the x-axis and the dependent variable on the y-axis. The model fits a best fit line to the resulting data points. Data scientists can then use that line to predict the value of the dependent variable for new values of the independent variable.
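The best fit line described above can be computed with ordinary least squares. The following Python sketch fits a line to a handful of points and uses it to predict a new value. The data points are made up for illustration, and the function name is an assumption rather than part of any particular library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: independent variable xs, dependent variable ys.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # predict y for a new x value of 6
print(f"y = {slope:.2f}x + {intercept:.2f}; prediction at x=6: {predicted:.2f}")
```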
Other more complex predictive models include decision trees, k-means clustering and Bayesian inference, to name just a few potential methods.
The most complex area of predictive modeling is the neural network. This type of machine learning model independently reviews large volumes of labeled data in search of correlations between variables in the data. It can detect even subtle correlations that only emerge after reviewing millions of data points. The algorithm can then make inferences about unlabeled data that is similar in type to the data set it trained on. Neural networks form the basis of many of today's examples of artificial intelligence (AI), including image recognition, smart assistants and natural language generation (NLG).
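A full neural network is too large for a short example, but its basic building block can be sketched. The Python code below trains a single artificial neuron (a logistic unit) on labeled data with gradient descent, then makes inferences about unlabeled inputs. It is a deliberately tiny illustration: the data, learning rate and epoch count are assumptions, and real networks stack many such units in layers and are normally built with dedicated libraries.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(xs, labels, learning_rate=0.1, epochs=1000):
    """Fit one weight and one bias by batch gradient descent on log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(weight * x + bias) - y  # prediction minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias

# Hypothetical labeled training data: a centered feature and a 0/1 label.
xs = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = train_neuron(xs, labels)

# Inference on unlabeled inputs the neuron has not seen.
p_low = sigmoid(weight * -2.5 + bias)
p_high = sigmoid(weight * 2.5 + bias)
print(f"P(label=1 | x=-2.5) = {p_low:.3f}, P(label=1 | x=2.5) = {p_high:.3f}")
```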
Predictive modeling considerations
One of the most frequently overlooked challenges of predictive modeling is acquiring the right data to use when developing algorithms. By some estimates, data scientists spend about 80% of their time on this step.
While predictive modeling is often considered to be primarily a mathematical problem, users must plan for the technical and organizational barriers that might prevent them from getting the data they need. Often, systems that store useful data are not connected directly to centralized data warehouses. Also, some lines of business may feel that the data they manage is their asset, and they may not share it freely with data science teams.
Another potential stumbling block for predictive modeling initiatives is making sure projects address real business challenges. Sometimes, data scientists discover correlations that seem interesting at the time and build algorithms to investigate the correlation further. However, just because they find something that is statistically significant doesn't mean it presents an insight the business can use. Predictive modeling initiatives need to have a solid foundation of business relevance.