olly - Fotolia
Say you're training an image recognition system to identify U.S. presidents. The historical data reveals a pattern of males, so the algorithm concludes that only men are presidents. It won't recognize a female in that role, even though it's a probable outcome in future elections.
This latent bias is one of the many types of biases that challenge data scientists today. If the machine learning data set they use in an AI project isn't neutral -- and it's safe to say almost no data is -- the outcomes can actually amplify discrimination and bias in machine learning data sets.
Visual recognition technologies that label images require vast amounts of labeled data, which largely comes from the web. You can imagine the dangers in that -- and researchers at the University of Washington and University of Virginia confirmed one poignant example of gender bias in a recent report.
They found that when a visual semantic role labeling system sees a spatula, it labels the utensil as a cooking tool, but it's also likely to refer to the person in the kitchen holding that tool as a woman -- even when the training data image is a man. Without properly quantifying and reducing this type of correlation, machine learning tools will magnify stereotypes, the researchers concluded.
So while AI projects hold immense benefits, companies evaluating AI initiatives must also understand the dangers associated with creating systems that deliver biased results. In its November 2017 report, "Predicts 2018: Artificial Intelligence," Gartner warned that data bias can have devastating and highly public impacts on AI outcomes. They cautioned executives to ensure accountability and transparency in their methodologies.
Correcting bias in machine learning
AI vendors, including Google, IBM and Microsoft, say they're working to solve the data bias problem -- to the extent that it can be solved -- so that their AI-as-a-service systems are trustworthy for users everywhere. That's no small accomplishment, because what appears unbiased to a programmer in China or India may be viewed as negative bias to a programmer in the U.S. When a programmer attempts to manually "fix" a bias or prejudice, the cognitive bias of that programmer is introduced, and it affects the outcomes.
"Nearly all data has bias, and if you try to eliminate it, you introduce another type of bias," explained Anthony Scriffignano, senior vice president and chief data scientist at Dun & Bradstreet, which provides commercial data, analytics and insights to businesses globally. “You often can’t eliminate it, so it’s important to understand it and the impact it has on the decision you are making.” Further, it’s important to clearly articulate what the bias in the data is -- a step many data scientists skip, because people work from the assumption that their data is completely true and appropriate to the analysis, Scriffignano said.
But pinpointing bias in machine learning can be complicated, especially when the algorithm is hidden -- as with black boxes -- and where biases were introduced at some point along the way. And there are certainly scenarios where data bias is ignored, or hidden, which leads to their perpetuation.
Jeff AdamsCEO, Cobalt Speech & Language
As mathematician and data scientist Cathy O'Neil said in her Ted2017 Talk, "Algorithms are opinions, embedded in code." In that talk, she highlighted the problem of technologists laundering data by hiding "ugly truths" in black box algorithms. That type of data is used to build "secret, important and destructive" algorithms as a means to wealth and power -- algorithms she calls "weapons of math destruction."
Jeff Adams, CEO of Cobalt Speech & Language, a language technology company that helps companies build products that incorporate natural language interfaces, knows the problems of data bias all too well. "We run into the problem of data bias every single week; it is the front-and-center issue," he said.
Creating a conversational interface requires data that represents all the dialects, cultural turns of phrase and slang people use during the course of a conversation, plus analysis of the person's tone and intent, which, in the case of sarcasm, is the opposite of what the person actually means.
Take the example of AI chatbots designed to scan email systems and automatically set appointments. While useful in theory, this type of automation can be a problem if the conversation isn't interpreted correctly. "I'm terrified by the idea of a system automatically setting up a meeting with someone based on an email conversation, because I may be saying 'Yes, let's meet,' but really I'm thinking 'Hell no,'" Scriffignano said during his presentation at December's AI World in Boston.
What’s missing from machine learning data sets creates bias
Beyond understanding the intent of the person speaking, data scientists have to ensure there's adequate data from both genders so that speech recognition tools work equally well for men and women. Other factors also play a role. There tends to be more energy in the signal processing of a male voice, which can help accuracy, while women, culturally or biologically, might speak clearer than men, which could swing the pendulum the other way, Adams said.
Indeed, finding and training algorithms using real, representative machine learning data sets is the hardest component of natural language processing and other AI-related applications. It's important for data scientists to recognize the data gaps to avoid making conclusions based on incomplete data.
Anthony Scriffignanosenior vice president and chief data scientist, Dun & Bradstreet
In a time when executives rely so heavily on data to make decisions, not having the data necessary to develop analysis can be crippling. When Brexit happened, for example, a number of Dun & Bradstreet's clients asked for a macroeconomic opinion on it, but there simply wasn't historical data to build an analysis from; it was far more nuanced, Scriffignano said.
"We are talking about something historically unprecedented," he explained. "We could have taken our data and thrown it into a machine learning algorithm, but there was danger in doing that. We had to resist all the people who said, 'Why don't we just.' ... It could have been malpractice."
In another example, Adams recalled building a medical dictation application using the data in tens of thousands of reports that were pulled from part of one year. His team built a machine learning model and put it to work, only to realize one month was missing from the machine learning data sets.
"We blindly built the model and thought it was good until we got to 'February,' and it didn't recognize that word," Adams said. "It's a story of thinking you've done everything right, but it's hard to get good coverage of whatever you are trying to represent."
Adams previously worked for a voicemail-to-text transcription company called Yap, which used speech recognition models trained on carefully spoken English developed in a lab. "The problem is, when people leave a voicemail for friends, they don't speak colloquially," Adams noted. "So, for people who spoke nonstandard dialects of English, someone very urban, say, the voicemail transcriptions [didn't work]."
People imagine there's a typical version of whatever they're trying to build, but any particular thing involves specifics, Adams said. "You imagine what people leaving a message might say -- some generic statement, like 'I'm running late.' But in reality, everything is specific," he said. Someone from the pharmacy might leave a voicemail saying your doxycycline is ready, and they provide instructions for taking it. Or the call might be a contractor explaining some additional work that needs to be done, along with your options and the cost. "In other words, there is no such thing as typical, so it's incredibly hard to create models that cover all the possible scenarios," he said.
Good data takes time -- and money
How to overcome that challenge is the $64,000 question. But one way is to collect the right types of data and recognize that the initial versions of models are going to underperform on some aspects -- especially if you relied on data that didn't come from the real world, Adams acknowledged. "You'll always be handicapped when you try to take data artificially created or from a domain that doesn't match what you are doing," he said.
Taking the time to ensure you have enough of the right data is critical for AI applications. But if that length of time stretches beyond six months, company executives looking for a quick ROI won't be pleased, according to data scientists at AI World. Ideally, corporate executives give data scientists enough time to ensure the AI product they're developing will work well right out of the gate for all types of people. "When we talk to a company, we have to be upfront about the trade-offs, the pros and cons. We explain that if we launch too early, [the product] will be substandard," Adams said.
Yap was acquired by Amazon in 2011, and through that acquisition, Adams became part of the team that developed the Amazon Echo. He said Amazon wouldn't release a substandard product and took the time necessary to ensure that it worked properly for all types of users. The company spent more than a year working on the Echo internally; hundreds of Amazon employees took the device home and used it as much as possible ahead of its release, he said, adding, "We collected all the data and didn't release it until we were sure it would work."