29

Is there any research on the development of attacks against artificial intelligence systems?

For example, is there a way to generate a letter "A", which every human being in this world can recognize but, if it is shown to the state-of-the-art character recognition system, this system will fail to recognize it? Or spoken audio which can be easily recognized by everyone but will fail on the state-of-the-art speech recognition system.

If such a thing exists, is this technology a theory-based science (mathematically proved) or an experimental science (randomly add different types of noise, feed the result into the AI system, and see how it behaves)? Where can I find such material?

DukeZhou

8 Answers

29

Yes, there is research on this topic. The field is usually called adversarial machine learning, and it is largely experimental.

An adversarial example is an input similar to the ones used to train the model, but that leads the model to produce an unexpected outcome. For example, consider an artificial neural network (ANN) trained to distinguish between oranges and apples. You are then given an image of an apple similar to another image used to train the ANN, but that is slightly blurred. Then you pass it to the ANN, which unexpectedly predicts the object to be an orange.

Several machine learning and optimization methods have been used to detect the boundary behaviour of machine learning models, that is, the unexpected behaviour of the model that produces different outcomes given two slightly different inputs (but that correspond to the same object). For example, evolutionary algorithms have been used to develop tests for self-driving cars. See, for example, Automatically testing self-driving cars with search-based procedural content generation (2019) by Alessio Gambi et al.

nbro
12

Sometimes if the rules used by an AI to identify characters are discovered, and if the rules used by a human being to identify the same characters are different, it is possible to design characters that are recognized by a human being but not recognized by an AI. However, if the human being and AI both use the same rules, they will recognize the same characters equally well.

A student I advised once trained a neural network to recognize a set of numerals, then used a genetic algorithm to alter the shapes and connectivity of the numerals so that a human could still recognize them but the neural network could not. Of course, if he had then re-trained the neural network using the expanded set of numerals, it probably would have been able to recognize the new ones.
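To make the idea concrete, here is a minimal sketch of that kind of search. Everything in it is an assumption for illustration: the 3x5 bitmap `ONE`, the hand-set `WEIGHTS` and `BIAS` standing in for a trained network, and the cap of three flipped pixels as a crude proxy for "a human can still recognize it". It is also a simple mutation-only evolutionary search, not a full genetic algorithm with crossover, and not the student's actual method.

```python
import random

# 3x5 bitmap of the numeral "1", flattened row by row (illustrative only).
ONE = [0, 1, 0,
       0, 1, 0,
       0, 1, 0,
       0, 1, 0,
       0, 1, 0]

# Hypothetical stand-in for a trained network: a hand-set linear score.
WEIGHTS = [-4, 2, -4,
           -4, 2, -4,
            0, 2,  0,
            0, 2,  0,
            0, 2,  0]
BIAS = -4


def margin(glyph):
    """Positive margin means the toy classifier reads the glyph as "1"."""
    return BIAS + sum(w * g for w, g in zip(WEIGHTS, glyph))


def classify(glyph):
    return "1" if margin(glyph) > 0 else "not 1"


def hamming(a, b):
    """Number of pixels that differ between two glyphs."""
    return sum(x != y for x, y in zip(a, b))


def evolve(original, max_flips=3, pop_size=10, generations=50, seed=0):
    """Mutate single pixels, keeping the child that lowers the margin most.

    Children that differ from the original in more than max_flips pixels
    are discarded, so the result stays close to the original glyph.
    """
    rng = random.Random(seed)
    parent = list(original)
    for _generation in range(generations):
        if margin(parent) <= 0:          # classifier already fooled
            break
        children = []
        for _ in range(pop_size):
            child = list(parent)
            i = rng.randrange(len(child))
            child[i] = 1 - child[i]      # flip one pixel
            if hamming(child, original) <= max_flips:
                children.append(child)
        if children:
            best = min(children, key=margin)
            if margin(best) < margin(parent):
                parent = best
    return parent


adversarial = evolve(ONE)
print(classify(ONE), "->", classify(adversarial))
print("pixels changed:", hamming(ONE, adversarial))
```

As the answer notes, retraining the classifier on the evolved glyphs would likely close this particular gap.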

S. McGrew
11

Yes, there is. One example is the one-pixel attack, described in

Su, J.; Vargas, D. V.; Sakurai, K. One pixel attack for fooling deep neural networks. arXiv:1710.08864

One-pixel attacks are attacks in which changing a single pixel in the input image can strongly affect the result.
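As a rough illustration of the idea (not the method from the paper, which searches with differential evolution against deep networks), the sketch below brute-forces every single-pixel change on a tiny image. The linear `classify` function, the 3x3 `image`, the `weights`, and the `threshold` are all hypothetical.

```python
def classify(image, weights, threshold):
    """Toy linear classifier over a flat list of pixel intensities in [0, 1]."""
    score = sum(w * p for w, p in zip(weights, image))
    return 1 if score > threshold else 0


def one_pixel_attack(image, weights, threshold, candidates=(0.0, 1.0)):
    """Try replacing each single pixel and return the first label flip found.

    Returns (index, new_value, adversarial_image), or None if no
    single-pixel change alters the predicted label.
    """
    original_label = classify(image, weights, threshold)
    for i in range(len(image)):
        for value in candidates:
            if value == image[i]:
                continue
            trial = list(image)
            trial[i] = value
            if classify(trial, weights, threshold) != original_label:
                return i, value, trial
    return None


# Hypothetical 3x3 image, flattened; the model leans heavily on the center pixel.
weights = [0.1, 0.1, 0.1, 0.1, 2.0, 0.1, 0.1, 0.1, 0.1]
image = [0.2, 0.2, 0.2, 0.2, 0.9, 0.2, 0.2, 0.2, 0.2]
threshold = 1.0

result = one_pixel_attack(image, weights, threshold)
print(result)  # index 4 (the center pixel) set to 0.0 flips the label
```

Exhaustive search only works here because the image is tiny; on real images the search space is far too large, which is why the paper uses an optimization method instead.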

Ray
internetofmine
5

Here's an example:

In his recent book Fall; or, Dodge in Hell, Neal Stephenson wrote about smartglasses that project a pattern over the facial features to foil recognition algorithms (which seems not only feasible but likely).

Here's an article from our sponsors, Adversarial AI: As New Attack Vector Opens, Researchers Aim to Defend Against It which includes this graphic of "Five ways AI hacks can lead to real world problems".

The article references the conference on The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation, where you can download the full report.

I'm assuming many such examples exist in the real world, and will amend this link-based answer as I find them. Good question!

DukeZhou
4

Isn't that essentially what chess does? For example, a human can recognize by move 4 that the Ruy Lopez Exchange Variation offers White great winning chances (because of the pawn structure), while an engine would take several hours of brute-force calculation to understand the same idea.

user30348
4

There are many insightful comments and answers so far. I want to illustrate my idea of a "color blindness test" further. It may be a hint that leads us toward the answer.

Imagine there are two people. One is colorblind (the AI) and the other is not (the human). If we show them a normal number "6", both can easily recognize it as a 6. Now, if we show them a delicately designed colorful number "6", only the human can recognize it as a 6, while the AI will recognize it as an 8. The interesting part of this analogy is that we cannot teach or train a colorblind person to recognize this delicately designed colorful "6", because of a natural difference, which I believe is also the case between AI and humans. An AI gets its results from computation, while a human gets results from the "mind". Therefore, as in @S. McGrew's answer, if we can find the fundamental difference between how an AI and a human read things, then this question is answered.

3

Here's a live demo: https://www.labsix.org/physical-objects-that-fool-neural-nets/

Recall that neural nets are trained by feeding in the training data, evaluating the net, and using the error between the observed and the intended output to adjust the weights and bring the observed output closer to the intended one. Most attacks have been based on the observation that, instead of updating the weights, you can update the input neurons. That is, you perturb the image. However, this attack is very finicky. It falls apart when the perturbed image is scaled, rotated, blurred, or otherwise altered. In the demo, the image is clearly a cat to us but guacamole to the neural net, yet after a slight rotation the net starts classifying it correctly again.

However, recent breakthroughs allow actual objects presented to a real camera to be reliably misclassified. The object in the demo is clearly a turtle, albeit with a wonky pattern on its shell, but the net is convinced it's a rifle from practically every angle.
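The "update the input instead of the weights" idea can be sketched on a very small model. The example below is a minimal illustration, assuming a fixed logistic-regression model in place of a neural net; the `weights`, `bias`, input `x`, and step size `epsilon` are made up, and the signed-gradient step is in the style of the fast gradient sign method rather than the specific attack used in the demo.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that x belongs to class 1 under a fixed linear model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def input_gradient(weights, bias, x, label):
    """Gradient of the cross-entropy loss with respect to the INPUT x.

    For logistic regression this is (p - label) * w for each feature.
    """
    p = predict(weights, bias, x)
    return [(p - label) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, bias, x, label, epsilon):
    """Take one signed-gradient step on the input; the weights stay fixed."""
    grad = input_gradient(weights, bias, x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, correctly classified as class 1.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.5, 0.1, 0.2]

x_adv = perturb(weights, bias, x, label=1, epsilon=0.2)

print("original probability of class 1:", predict(weights, bias, x))
print("perturbed probability of class 1:", predict(weights, bias, x_adv))
```

Each feature moves by at most `epsilon`, yet the predicted class changes. Because the step is tuned to one exact input, it is easy to see why such perturbations can break under rotation or blur.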

TomatoCo
2

There is at least some research on the "foolability" of neural networks, which gives insight into the potentially high risk of neural nets even when they "seem" 99.99% accurate.

A very good article on this is in Nature: https://www.nature.com/articles/d41586-019-03013-5

In a nutshell:

It shows diverse examples of fooling neural networks/AIs, for example one where a few bits of Scotch tape placed on a "Stop" sign change it, for the neural net, into a "limited to 40" sign (whereas a human would still see a "Stop" sign!).

It also shows two striking examples of turning one animal into another just by adding colored dots that are invisible to humans (in the example, turning a panda into a gibbon, where a human hardly sees any difference and so still sees a panda).

Then they elaborate on diverse research avenues, including, for example, ways to try to prevent such attacks.

The whole page is a good read for any AI researcher and shows lots of troubling problems (especially for automated systems such as cars, and soon maybe armaments).


An excerpt relevant to the question:

Hendrycks and his colleagues have suggested quantifying a DNN’s robustness against making errors by testing how it performs against a large range of adversarial examples. However, training a network to withstand one kind of attack could weaken it against others, they say. And researchers led by Pushmeet Kohli at Google DeepMind in London are trying to inoculate DNNs against making mistakes. Many adversarial attacks work by making tiny tweaks to the component parts of an input — such as subtly altering the colour of pixels in an image — until this tips a DNN over into a misclassification. Kohli’s team has suggested that a robust DNN should not change its output as a result of small changes in its input, and that this property might be mathematically incorporated into the network, constraining how it learns.

For the moment, however, no one has a fix on the overall problem of brittle AIs. The root of the issue, says Bengio, is that DNNs don’t have a good model of how to pick out what matters. When an AI sees a doctored image of a lion as a library, a person still sees a lion because they have a mental model of the animal that rests on a set of high-level features — ears, a tail, a mane and so on — that lets them abstract away from low-level arbitrary or incidental details. “We know from prior experience which features are the salient ones,” says Bengio. “And that comes from a deep understanding of the structure of the world.”


Another excerpt, near the end:

"Researchers in the field say they are making progress in fixing deep learning’s flaws, but acknowledge that they’re still groping for new techniques to make the process less brittle. There is not much theory behind deep learning, says Song. “If something doesn’t work, it’s difficult to figure out why,” she says. “The whole field is still very empirical. You just have to try things.”"