Developing AI apps free from bias crucial to avoid analytics errors

Biased data samples or model development practices can derail any company interested in using AI tools and diminish the technology's return on investment.

When you talk about artificial intelligence, you have to talk about biases and how they might affect a model.

Biases can affect an enterprise's use of AI in two distinct ways. The first relates to the effectiveness of models. A data scientist may have a mental model of how the world works that turns out to be invalid, and AI applications developed around that model will deliver disappointing results.

"There are all kinds of ways that AI can reflect the biases of those who collected the data, so we need to think critically about how data sets are collected," said Madeleine Clare Elish, a researcher at the Data & Society Research Institute, in a presentation at the recent O'Reilly AI Conference in New York.

Elish said that when AI is applied to areas like targeted marketing or customer service, this kind of bias is essentially an inconvenience. Models won't deliver good results, but at the end of the day, no one gets hurt.

The second type of bias, though, can do real harm to people. Elish described how AI is increasingly seeping into areas like insurance, credit scoring and criminal justice. Here, biases, whether they result from unrepresentative data samples or from developers' unconscious partialities, can have far more severe effects.

Using AI to tackle one form of bias

Another area where biased AI systems can hurt people is hiring. But in this realm, AI can also be a tool to fight bias. Lindsey Zuloaga is a data scientist at HireVue Inc., a company in South Jordan, Utah, that is applying AI to reduce the impact of bias on hiring decisions. In an interview at the conference, she said AI can help evaluate candidates more objectively by reducing human interviewers' unconscious reliance on cues such as tone of voice or appearance.

"I think it's important that people are judged on their merits," Zuloaga said. "You want things to be fair. But in the hiring process, things are really unfair."

The HireVue platform records videos of candidates answering job interview questions on their own time. AI algorithms then evaluate the candidates against predefined criteria. After the fact, businesses using the platform are asked how workers hired through it are performing, and that feedback is used to sharpen recommendations over time.

In theory, bias could creep into this process. For example, hiring managers could give positive feedback only for hires who "fit in" with the organization, which could be another way of expressing racial or gender bias.

But Zuloaga said she and other data scientists at the company try to avoid this situation by making the deep learning algorithms underpinning the system interpretable. Most neural networks function as black boxes: the reasons for their recommendations are unclear to users. By building explanations into these models when developing AI applications, Zuloaga and her team can go back and verify that the algorithm recommends candidates strictly on the basis of their expected job performance.

"We all have these biases, so that serves as a big source of inspiration," Zuloaga said. "I think there is a lot of power in diversity just for having the power of different opinions."
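One way to watch for the skewed feedback described above is to compare how often each group of hires receives a positive rating. The sketch below uses the "four-fifths" rule of thumb from U.S. employment-selection guidelines, which flags a group whose rate falls below 80% of the highest group's rate. This is a swapped-in, generic adverse-impact check, not HireVue's method; the function names, group labels and counts are hypothetical.

```python
def impact_ratios(outcomes):
    """Divide each group's positive-feedback rate by the highest group's rate.

    outcomes: {group: (positive_count, total_count)}; totals assumed > 0.
    """
    rates = {g: pos / total for g, (pos, total) in outcomes.items()}
    best = max(rates.values())
    if best == 0:
        # No group received any positive feedback, so there is nothing to compare.
        return {g: 1.0 for g in rates}
    return {g: r / best for g, r in rates.items()}


def flag_groups(outcomes, threshold=0.8):
    """Return groups whose ratio falls below the four-fifths threshold."""
    return sorted(g for g, ratio in impact_ratios(outcomes).items() if ratio < threshold)


# Hypothetical feedback counts: (positive ratings, hires rated) per group.
feedback = {"group_a": (45, 50), "group_b": (30, 50)}
print(flag_groups(feedback))  # ['group_b']  (0.6 / 0.9 is about 0.67)
```

A flagged group is a prompt to investigate, not proof of bias; small counts in particular can produce large swings in the ratio.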

Developing AI for good takes good data

Often, biased AI models aren't problematic because of anything an engineer put into the model. The problem may come from the data itself. AI and deep learning models are very good at inferring subtle relationships between variables, but in some cases this is undesirable. For example, where a person lives can often serve as a proxy for their race.

This is why making sure that model-training data is representative of the population being modeled and includes only necessary data fields is so important, said Lashon Booker, a senior principal scientist at The MITRE Corp.'s IT center in McLean, Va.

That may sound obvious, but it's actually a huge challenge, Booker said in a presentation at the conference. Since big data came into vogue a few years ago, enterprises have amassed huge troves of data. For training deep learning models, this is generally a good thing. However, removing potential sources of bias from large data sets can be difficult when you don't know which features in the data set the deep learning algorithm will build its model around, he noted.
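A rough first screen for proxy features like the location example above is to ask how well a single feature predicts a protected attribute on its own. The sketch below, under the assumption that you hold a labeled sample of (feature value, protected attribute) pairs, compares always guessing the overall majority group with guessing the majority group within each feature value. The function name and ZIP-code data are hypothetical, and this is a simple generic heuristic rather than the approach Booker described; it also measures in-sample accuracy, so a held-out split would give a less optimistic estimate.

```python
from collections import Counter, defaultdict


def proxy_strength(records):
    """Return (baseline_accuracy, proxy_accuracy) for guessing a protected
    attribute from one feature.

    records: list of (feature_value, protected_value) pairs.
    baseline: always guess the overall most common protected value.
    proxy: guess the most common protected value within each feature value.
    """
    n = len(records)
    overall = Counter(group for _, group in records)
    baseline = overall.most_common(1)[0][1] / n

    by_feature = defaultdict(Counter)
    for feature, group in records:
        by_feature[feature][group] += 1
    # Count how many records the per-feature majority guess gets right.
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / n


# Hypothetical ZIP codes paired with a hypothetical protected attribute.
records = ([("11111", "A")] * 40 + [("11111", "B")] * 10
           + [("22222", "B")] * 45 + [("22222", "A")] * 5)
baseline, proxy = proxy_strength(records)
print(f"baseline={baseline:.2f} proxy={proxy:.2f}")  # baseline=0.55 proxy=0.85
```

A large jump from the baseline, as in this toy data, suggests the feature leaks the protected attribute and deserves scrutiny before it is used for training.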

Ensuring that data is collected in a way that represents the population being modeled and removing any known sources of bias right off the bat can help. "The data you have available for training might make this more challenging than you'd expect," Booker said. "Big data might not be your friend."
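Checking whether training data represents the population being modeled can start with a simple comparison of group shares. The sketch below assumes you already have reference population shares (for example, from census figures) and a group label for each training record; the function name, tolerance and urban/rural figures are hypothetical. Groups that appear in the sample but not in the reference shares are ignored.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares, tolerance=0.05):
    """Return {group: sample_share - population_share} for groups whose
    share of the training sample differs from their population share by
    more than `tolerance`."""
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = counts.get(group, 0) / total
        gap = sample_share - pop_share
        if abs(gap) > tolerance:
            gaps[group] = round(gap, 3)
    return gaps


# Hypothetical sample that over-represents urban records.
sample = ["urban"] * 80 + ["rural"] * 20
print(representation_gaps(sample, {"urban": 0.6, "rural": 0.4}))
# {'urban': 0.2, 'rural': -0.2}
```

This only catches imbalance on attributes you thought to label; it cannot reveal skew along dimensions the data set does not record, which is part of why Booker warns that big data might not be your friend.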
