Data science is more than just the sexiest job of the 21st century. It's starting to eclipse that glitzy moniker to become a discipline from which businesses are extracting substantial operational value.
Speaking at the Gartner Data and Analytics Summit in Grapevine, Texas, Alan Jacobson, director of global analytics at Ford Motor Co., said a focus on data science has paid large dividends for the company.
Starting around 2015, the company hired a chief data science officer and pulled about 200 data analysts out of business units to form a centralized data science team. Today, that team works in a consultative role in every area of the business. This means analysts are working on everything from vehicle production issues to supporting NASCAR racing teams that use Ford cars.
Data science for operational efficiency
One area where data science has been particularly valuable is manufacturing efficiency. Jacobson described how, each day, manufacturing managers get a list of the vehicles they are supposed to produce. Determining the order of production used to be a manual process in which managers relied on intuition to put together the production schedule.
But nobody can keep track of all the variables involved, such as specialized tools that might be needed on the line for certain vehicles and the differences in time it takes to produce certain cars. And plants often fell behind schedule. This delayed the delivery of new vehicles and cost the company money.
Jacobson and his team learned about this problem and built an algorithm for a Canadian production facility that takes into account all the variables that go into producing each vehicle. The algorithm delivers a production schedule that is optimized to keep vehicles moving along the production line with minimal retooling. Jacobson said it took seven days to build the algorithm and another seven to test it before deploying it into production. Now, this production facility hits its targets every day, saving the company money.
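The sequencing problem described above can be illustrated with a minimal sketch. This is not Ford's algorithm, which the talk did not detail; it is a simple greedy heuristic that assumes each vehicle needs a known set of tools and that retooling cost is the number of tools swapped between consecutive vehicles. The names `Vehicle`, `retooling_cost`, `total_cost` and `greedy_schedule`, along with the tool names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    vin: str
    tools: frozenset  # hypothetical: tools the line needs for this vehicle


def retooling_cost(a, b):
    """Number of tools swapped in or out when b follows a on the line."""
    return len(a.tools ^ b.tools)


def total_cost(sequence):
    """Total retooling cost of building the vehicles in the given order."""
    return sum(retooling_cost(a, b) for a, b in zip(sequence, sequence[1:]))


def greedy_schedule(vehicles):
    """Start with the first vehicle, then always build next whichever
    remaining vehicle needs the fewest tool changes (a heuristic, not
    guaranteed to be optimal)."""
    if not vehicles:
        return []
    remaining = list(vehicles)
    schedule = [remaining.pop(0)]
    while remaining:
        last = schedule[-1]
        nxt = min(remaining, key=lambda v: retooling_cost(last, v))
        remaining.remove(nxt)
        schedule.append(nxt)
    return schedule


# Illustrative data only: two vehicle types that need different tooling.
todays_list = [
    Vehicle("A", frozenset({"jig_x"})),
    Vehicle("B", frozenset({"jig_y"})),
    Vehicle("C", frozenset({"jig_x"})),
    Vehicle("D", frozenset({"jig_y"})),
]
plan = greedy_schedule(todays_list)
print([v.vin for v in plan])                  # ['A', 'C', 'B', 'D']
print(total_cost(todays_list), total_cost(plan))  # 6 2
```

A production scheduler would also have to weigh the other variables the article mentions, such as differing build times, and would likely use a proper optimization method rather than a greedy pass.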
"It doesn't take too many jobs like that before you're interested in having a lot of data scientists, because they pay for themselves very quickly," he said.
Data science for product development
For credit-reporting bureau Equifax Inc., the value in data science comes from improving one of its primary products: credit scoring.
In an interview at the conference, Peter Maynard, senior vice president of data and analytics at Equifax, based in Atlanta, said credit scores are traditionally a snapshot of consumers' credit profiles, taking into consideration only factors as they exist at the time the score is created. Using more trend and historical data could deepen scores, allowing lenders to see how consumers use credit over time. But Maynard said credit-reporting bureaus traditionally haven't used trended data, because it adds a layer of complexity to scores.
Machine learning models like neural networks could make sense of this complexity, but for regulatory reasons, reporting bureaus need to be able to provide simple justification for the scores they give. It generally isn't possible to easily interpret why a machine learning model gave the answer it did, so credit-reporting bureaus like Equifax have traditionally stayed away from such models.
Maynard put a member of his data science team on the problem, asking the team member to develop a neural network that not only assesses historical credit data, but also provides reason codes for the decisions it makes -- something that standard-issue machine learning models don't do. Maynard said this change, led by data science, improves the company's main product substantially.
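To show what reason codes look like in practice, here is a minimal sketch. It uses a simple additive scorecard as a stand-in, not a neural network, and it is not Equifax's method, which the article does not describe. The feature names, weights, reference values and the function `score_with_reasons` are all hypothetical.

```python
# Hypothetical points per unit of each feature; not from any real scoring model.
WEIGHTS = {
    "utilization_pct": -2,    # each percentage point of credit used lowers the score
    "late_payments": -45,     # each late payment lowers the score
    "years_of_history": 6,    # each year of credit history raises the score
}
# Hypothetical reference applicant, who receives BASE_SCORE.
REFERENCE = {"utilization_pct": 10, "late_payments": 0, "years_of_history": 15}
BASE_SCORE = 750


def score_with_reasons(applicant, top_n=2):
    """Return the score and the features that lowered it the most.

    Each feature's contribution is its weight times its distance from the
    reference value; the most negative contributions become reason codes.
    """
    contributions = {
        name: WEIGHTS[name] * (applicant[name] - REFERENCE[name])
        for name in WEIGHTS
    }
    score = BASE_SCORE + sum(contributions.values())
    negatives = sorted((c, name) for name, c in contributions.items() if c < 0)
    reasons = [name for _, name in negatives[:top_n]]
    return score, reasons


applicant = {"utilization_pct": 60, "late_payments": 2, "years_of_history": 5}
print(score_with_reasons(applicant))
# (500, ['utilization_pct', 'late_payments'])
```

In an additive model each feature's contribution can be read off directly, which is why reason codes are straightforward here. A neural network offers no such direct decomposition, which is the difficulty the article describes.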
"Enabling better decisions through analytics -- this is something the financial industry has been waiting around a decade for," he said. "This is the difference between a snapshot and a digital video."
Pay attention to the data
Of course, these kinds of benefits don't happen simply by putting a business problem in front of a data scientist and asking him or her to solve it. Speaking at the conference, Scott Zoldi, chief analytics officer at FICO, based in San Jose, Calif., said data quality is key when developing data science practices.
Zoldi said the software vendor uses artificial intelligence and machine learning in developing credit-scoring models and the software it sells. But he said the data used in these models often has more to do with their success than the models themselves.
For example, he pointed out that machine learning models are really good at finding correlations between variables and spotting trends. But any high-school student taking a stats class can tell you that correlation does not equal causation. Old and outdated data, or data combined from disparate sources, can present any number of relationships. But that doesn't mean those relationships are meaningful in a business sense.
Zoldi said this isn't something understood by all new data scientists, many of whom will run complicated analyses against whatever data they have available, trusting that the model will compensate for any flaws in the data. But Zoldi said companies should spend more effort acquiring high-quality data, rather than simply expanding their data science teams.
"I've learned a healthy respect for this over my career," he said. "If you can't get to causation, at least get to understanding and make sure model recommendations are believable."