A novel approach for pitting algorithms against each other, called generative adversarial networks, or GANs, is showing promise for improving AI accuracy and for automatically generating objects that typically require human creativity.
Yann LeCun, director of AI research at Facebook and professor at New York University, wrote that GANs and the variations now being proposed are "the most interesting idea in the last 10 years in machine learning."
The reason for the excitement is that the technique promises to automate the process of training algorithms in AI applications. GANs make it possible to improve the internal representations that AI algorithms use to detect important data features in the face of noise and other variations. In the long run, GANs could also help improve the algorithms used for discriminating fake objects from the real thing and improve results for detecting objects in poor-quality images or speech in noisy environments.
The core idea behind GANs was introduced by Ian Goodfellow, who's now staff research scientist at Google Brain, and associates from the University of Montreal in 2014. He used GANs to generate realistic fake photos. Since then, variations of the core idea have been applied to audio and text.
How generative adversarial networks work
The basic idea behind GANs is that a data scientist sets up two competing sets of algorithms: discriminative algorithms -- for example, models that detect an object in an image -- and generative algorithms that build simulations. As another example, a discriminative algorithm would try to decide whether a given email is spam based on the distribution of words in the message. The generative algorithms would then try to craft a spam message that the latest discriminative algorithm would not catch, and use that message as a model for what spam might look like.
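The competition between the two kinds of algorithms can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any production setup: the "real" data is a bell curve centered on 4, the generator learns a single offset `b`, and the discriminator is a one-variable logistic model. The function name `train_toy_gan` and all parameter values are hypothetical choices made for this sketch.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=32, lr=0.05, real_mean=4.0, seed=0):
    """Toy 1-D GAN (illustrative assumption, not a reference implementation).

    Generator:      G(z) = z + b, with noise z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the estimated chance x is real
    """
    rng = random.Random(seed)
    b = 0.0          # generator parameter
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            s = sigmoid(w * x + c)
            grad_w += (1.0 - s) * x
            grad_c += 1.0 - s
        for x in fake:
            s = sigmoid(w * x + c)
            grad_w -= s * x
            grad_c -= s
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(fake), which rewards
        # samples that the updated discriminator rates as real.
        grad_b = 0.0
        for x in fake:
            s = sigmoid(w * x + c)
            grad_b += (1.0 - s) * w
        b += lr * grad_b / batch

    return b, w, c


if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print("learned generator offset:", round(b, 3))
```

In this sketch, the discriminator first learns that larger values are more likely to be real, and the generator responds by shifting its output upward. The offset would be expected to drift toward the real mean, though adversarial training of this kind can oscillate rather than settle.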
A conventional neural network only interprets the world through the lens of its training data. The main benefit of GANs is that they develop a deeper understanding of the world by generating new data and learning through the process.
"This is a very profound difference, as it means the AI can generate previously unseen data that may be completely novel and unique, but agrees with the same type and class of data in the real world in terms of statistical properties," said Babak Rasolzadeh, vice president of applied AI at Clusterone, a company based in Palo Alto, Calif., which provides a distributed deep learning platform that runs in the cloud or on premises.
Rasolzadeh said he sees the most value for this approach in generating data associated with a particular form of input, such as automatically generating a cartoon to go with a script. GANs can also create high-resolution images from low-resolution ones.
It's still early for commercial applications of GANs. Rasolzadeh said he has only seen them used successfully in super-resolution image generation and text-to-image synthesis, and only in academia.
The biggest challenge is getting GANs to generalize an overall data structure for a given input class. With images of flowers, the generative algorithm might put the local and global features of a flower in the wrong places -- transposing petals and leaves, for example.
"This doesn't just apply to 2D and 3D image data, but to any type of input data structure that has complex variation of structure," Rasolzadeh said. "It's still hard for GANs to learn these types of implicit structures in object classes well." He said he expects services and platforms for machine learning could help reduce the complexities of rolling out GANs as better practices are codified for implementing them.
Generative adversarial network variants for audio
Data scientists have started experimenting with variations on the GAN approach to help overcome some of the limitations of early GANs. For example, deep convolutional generative adversarial networks, or DCGANs, add architectural constraints to the basic GAN design. Their generators upsample with transposed convolution, an operation that reverses the shape change of an ordinary convolution, and the networks drop the fully connected hidden layers used in the original GANs. "DCGANs typically yield better results on image-generation type of problems," Rasolzadeh said.
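The upsampling operation described above, usually called transposed convolution, can be illustrated in one dimension. The sketch below is a plain-Python illustration under simplifying assumptions (no padding, a single channel, hand-picked kernels), and the function names `conv1d` and `conv_transpose1d` are chosen here for illustration rather than taken from any library.

```python
def conv1d(signal, kernel, stride=1):
    # Ordinary strided convolution (cross-correlation, no padding):
    # slides the kernel over the signal and shrinks the output.
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def conv_transpose1d(signal, kernel, stride=1):
    # Transposed convolution: each input value stamps a scaled copy of the
    # kernel into the output, `stride` positions apart; overlaps are summed.
    k = len(kernel)
    out = [0.0] * ((len(signal) - 1) * stride + k)
    for i, value in enumerate(signal):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


if __name__ == "__main__":
    small = [1.0, 2.0, 3.0]
    large = conv_transpose1d(small, [1.0, 1.0], stride=2)
    print(large)                                # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
    print(conv1d(large, [1.0, 1.0], stride=2))  # [2.0, 4.0, 6.0]
```

With a stride of 2, the transposed convolution turns three values into six, while the ordinary convolution turns six values back into three. The operation reverses the change in shape, not the values themselves, which is why it suits a generator that has to grow a small input into a full-size image.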
Researchers at University of California, San Diego, have done some initial work on WaveGAN, which applies the basic ideas of GANs to generate realistic speech audio from a much smaller sample of real human speech. Lyrebird.ai has demonstrated an alternative approach for mimicking voices, like that of former President Barack Obama. Google has been demoing another approach, called WaveNet, which also shows promise in generating simulated voices.
Rasolzadeh said Lyrebird seems to be faster at generating voices than WaveNet, and Lyrebird claims to be language-agnostic, which WaveNet is not. WaveNet uses dilated convolutional neural networks, which allow it to train over thousands of timesteps of raw audio input and map from text to speech at generation time. "GANs, however, generate samples faster than WaveNet, because you don't need to generate the samples sequentially," Rasolzadeh said.
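A short sketch can show why dilated convolutions reach back over so many timesteps. It assumes a kernel of two taps and dilations that double from 1 to 512 in each stack, a pattern commonly described for WaveNet-style models; the exact configuration of any given system may differ, and the function names are illustrative.

```python
def dilated_causal_conv1d(signal, kernel, dilation):
    # Output at time t combines signal[t], signal[t - dilation], and so on.
    # Samples before the start of the signal are treated as zero, so the
    # output never depends on future samples.
    out = []
    for t in range(len(signal)):
        total = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                total += weight * signal[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    # Number of past-and-present input samples that can influence one output
    # sample after stacking one layer per entry in `dilations`.
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    print(dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], 2))
    # [1.0, 2.0, 4.0, 6.0, 8.0]

    one_stack = [2 ** i for i in range(10)]      # 1, 2, 4, ..., 512
    print(receptive_field(2, one_stack))         # 1024
    print(receptive_field(2, one_stack * 3))     # 3070
```

Ten layers with doubling dilations cover 1,024 samples, and three such stacks cover 3,070, so a fairly shallow network sees thousands of timesteps. The trade-off is at generation time: each new sample in such an autoregressive model depends on the samples already produced, so output is built one step at a time, whereas a GAN generator emits a whole clip in a single pass.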
Text generation lagging
Realistic text generation is lagging behind other applications of AI algorithms, like image generation, said Or Levi, founder and CEO of AdVerif.ai, an ad verification and fake news detection service. "When you compare GANs to other deep network text generation baselines, the performance gain is marginal, even in recent papers."
As a side project, Levi has been experimenting with teaching GANs to generate jokes from a data set of existing jokes. "Even though it could generate well-formed sentences, a person could still notice, based on the semantics, that it's not human-generated, especially in texts with multiple sentences."
In the long run, Levi said he believes text generation will be a major breakthrough for GANs and could quickly be exploited for generating fake news -- a potentially troubling application of the technology.
"As it stands, human fact-checkers are only scratching the surface of fakes on the web, so you can imagine that having to deal with automatically generated texts will only strengthen the case for automated solutions," Levi said.