adversarial machine learning

Contributor(s): Alexander Gillis

Adversarial machine learning is a technique used in machine learning to fool or misguide a model with malicious input. Although adversarial machine learning can be used in a variety of applications, it is most commonly used to execute an attack or cause a malfunction in a machine learning system. The same attack can often be adapted easily to work against multiple models trained on different datasets or built with different architectures.
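To make "malicious input" concrete, the sketch below uses the fast gradient sign method (FGSM), one well-known way to craft adversarial examples, against a toy logistic regression model. The weights, input and epsilon are made-up values chosen only for illustration. The attack reads the model's weights directly, and a bounded change of at most 0.15 per feature is enough to flip the prediction.

```python
import math

# Hypothetical toy model: logistic regression with fixed, made-up weights.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.2

def predict_proba(x):
    """Model's probability that input x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return 1 if predict_proba(x) >= 0.5 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, true_label, epsilon):
    """Fast gradient sign method: move every feature by epsilon in the
    direction that increases the model's loss on the true label."""
    p = predict_proba(x)
    # For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
    grad = [(p - true_label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x = [0.3, 0.2, 0.4]
x_adv = fgsm(x, true_label=1, epsilon=0.15)
print(predict(x), predict(x_adv))  # 1 0
```

Real attacks apply the same idea to image pixels, where a perturbation this small per pixel can be hard for a person to notice.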

An adversarial machine learning attack can be either a white box or a black box attack. In a white box attack, the attacker knows the inner workings of the model being used; in a black box attack, the attacker knows only the model's outputs.
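A black box attacker cannot compute gradients from the model's weights but can still search for an adversarial input by querying the model and watching how its output changes. In this sketch, a toy model with hypothetical weights is hidden inside a closure, and a simple greedy coordinate search calls only the model's output function. This is one illustrative query-based strategy, not a specific published algorithm.

```python
import math

def make_black_box():
    """Hypothetical deployed model. The attacker can call the returned
    function but never sees the weights inside it."""
    weights, bias = [2.0, -1.0, 0.5], -0.2

    def query(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))  # probability of class 1

    return query

def black_box_attack(query, x, epsilon, max_rounds=10):
    """Greedy coordinate search using only model outputs: try setting each
    feature to its original value +/- epsilon and keep the single change
    that most lowers the class-1 score, until the label flips."""
    x_adv = list(x)
    for _ in range(max_rounds):
        if query(x_adv) < 0.5:
            break  # the predicted label has flipped
        best, best_score = None, query(x_adv)
        for i in range(len(x_adv)):
            for step in (epsilon, -epsilon):
                candidate = list(x_adv)
                candidate[i] = x[i] + step  # stay within epsilon of the original
                score = query(candidate)
                if score < best_score:
                    best, best_score = candidate, score
        if best is None:
            break  # no single-feature change lowers the score
        x_adv = best
    return x_adv

query = make_black_box()
x = [0.3, 0.2, 0.4]
x_adv = black_box_attack(query, x, epsilon=0.15)
print(query(x) >= 0.5, query(x_adv) >= 0.5)  # True False
```

The trade-off is cost: a white box attack needs one gradient computation, while this search spends many queries, which is why rate limits on a deployed model can make black box attacks harder.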

Machine learning models are trained on large datasets related to the subject being learned. For example, if an automotive company wanted to teach its self-driving car to identify a stop sign, the company might feed thousands of pictures of stop signs through a machine learning algorithm. An adversarial machine learning attack could target that algorithm by exploiting its input data (in this case, images of stop signs) so that the algorithm misinterprets the data, causing the overall system to misidentify stop signs when it is deployed in practice or in production.

Types of adversarial machine learning attacks

Adversarial machine learning attacks can be classified as either misclassification inputs or data poisoning. Misclassification inputs are the more common variant, in which attackers hide malicious content in the filters of a machine learning algorithm. The goal of this attack is to make the system misclassify a specific dataset. Backdoor Trojan attacks can be used to do this after a system's deployment.
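A backdoor is commonly planted through the training data and stays dormant until a trigger appears in an input after deployment. In this toy sketch the features, the "trigger" dimension and the 1-nearest-neighbor classifier are all hypothetical simplifications: a few poisoned stop-sign samples that carry the trigger are labeled "yield", so the model behaves normally on clean inputs but misclassifies any stop sign carrying the trigger.

```python
import math

def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor classifier: return the label of the closest training point."""
    _features, label = min(train, key=lambda item: math.dist(item[0], x))
    return label

# Toy features: [shape score, color score, trigger]. The third feature stands in
# for a hypothetical sticker-like trigger that is 0 on normal images.
clean_data = [
    ([1.0, 1.0, 0.0], "stop"), ([1.1, 0.9, 0.0], "stop"),
    ([-1.0, -1.0, 0.0], "yield"), ([-0.9, -1.1, 0.0], "yield"),
]
# The attacker slips in stop-sign samples that carry the trigger, labeled "yield".
backdoor_data = [([1.0, 1.0, 1.0], "yield"), ([1.1, 0.9, 1.0], "yield")]
train = clean_data + backdoor_data

clean_stop = [0.95, 1.05, 0.0]
triggered_stop = [0.95, 1.05, 1.0]
print(nearest_neighbor_predict(train, clean_stop))      # stop
print(nearest_neighbor_predict(train, triggered_stop))  # yield
```

Because accuracy on clean inputs is unchanged, ordinary testing is unlikely to reveal this kind of backdoor.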

Data poisoning is when an attacker attempts to modify the machine learning process by placing inaccurate data into a training dataset, making the model's outputs less accurate. The goal of this type of attack is to compromise the machine learning process and reduce the algorithm's usefulness.
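A toy illustration of data poisoning, assuming a one-feature dataset with hypothetical classes A and B and a nearest-centroid classifier chosen for simplicity. The attacker adds three B-like values mislabeled as A, which drags class A's centroid toward class B and halves accuracy on the held-out test points.

```python
from statistics import mean

def train_centroids(data):
    """Nearest-centroid 'training': average the feature value of each class."""
    labels = {label for _, label in data}
    return {label: mean(x for x, lbl in data if lbl == label) for label in labels}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, test):
    return sum(predict(centroids, x) == y for x, y in test) / len(test)

clean = [(0, "A"), (1, "A"), (2, "A"), (8, "B"), (9, "B"), (10, "B")]
test = [(3, "A"), (4, "A"), (6, "B"), (7, "B")]
# Poison: values that look like class B, deliberately mislabeled as A.
poison = [(9, "A"), (10, "A"), (11, "A")]

print(accuracy(train_centroids(clean), test))           # 1.0
print(accuracy(train_centroids(clean + poison), test))  # 0.5
```

With clean data the class centroids sit at 1 and 9; after poisoning, class A's centroid moves to 5.5, so the test points at 6 and 7 are misclassified.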

Defenses against adversarial machine learning

Currently, there is no definitive way to defend against adversarial machine learning; however, a few techniques can help prevent this type of attack. Such techniques include adversarial training and defensive distillation.

Adversarial training is a process in which adversarial examples are introduced to the model during training and labeled so that the model learns to handle them, typically with their correct class or, in detection-oriented variants, as threatening. This process can help prevent further adversarial machine learning attacks, but it requires a large amount of maintenance.
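The sketch below follows the common formulation in which each adversarial example keeps its correct label. It trains a toy logistic regression model with full-batch gradient descent and, at every epoch, regenerates FGSM examples against the current weights and adds them to the batch. The dataset, epsilon, learning rate and epoch count are illustrative assumptions; the sketch shows the procedure, not a robustness benchmark.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, epsilon):
    """Perturb x by epsilon along the sign of the loss gradient w.r.t. the input."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_train(data, epsilon=0.5, lr=0.1, epochs=200):
    """Full-batch logistic regression where every epoch trains on the clean
    points plus FGSM versions of them, each kept at its correct label."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Regenerate adversarial examples against the current model.
        adversarial = [(fgsm(w, b, x, y, epsilon), y) for x, y in data]
        batch = data + adversarial
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in batch:
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(batch)
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

data = [([2, 2], 1), ([3, 2], 1), ([2, 3], 1),
        ([-2, -2], 0), ([-3, -2], 0), ([-2, -3], 0)]
w, b = adversarial_train(data)
# The trained model labels both the clean points and fresh FGSM versions correctly.
print(all(predict(w, b, x) == y for x, y in data))                      # True
print(all(predict(w, b, fgsm(w, b, x, y, 0.5)) == y for x, y in data))  # True
```

Regenerating the adversarial examples every epoch is part of why this defense is costly to maintain: each new attack method implies another round of generation and retraining.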

Defensive distillation aims to make a machine learning algorithm more robust to small, deliberately crafted changes in its input by having one model predict the outputs of another model that was trained earlier. Because it does not depend on specific adversarial examples, this approach may help against threats the model has not seen before. It is similar in concept to generative adversarial networks (GANs), which train two neural networks together, in that both approaches use two machine learning models in combination.
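Defensive distillation, as usually described, trains the first (teacher) model with a softmax at a high temperature T, trains a second (student) model on the teacher's softened probabilities at that same temperature, and then deploys the student at T = 1. The sketch below shows only the two core pieces, a temperature-scaled softmax and the soft-label loss. The teacher logits and the choice T = 20 are illustrative assumptions, and the training loops are omitted.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature T: a larger T spreads probability more evenly."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy between the teacher's soft labels and the student's
    predictions, both computed at the same temperature."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

teacher_logits = [8.0, 2.0, 1.0]  # hypothetical outputs of the earlier model
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=20.0)
print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

At T = 1 nearly all probability lands on the first class, while at T = 20 the same logits give a much flatter distribution with the same top class. Training the student on those soft labels passes along how the teacher ranks the wrong classes, which is the information defensive distillation relies on.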

This was last updated in July 2019
