
To check whether the visitor of a page is a human and not an AI, many web pages and applications use a challenge known as a CAPTCHA. These tasks are intended to be simple for people but unsolvable for machines.

However, some of these challenges are difficult even for humans, such as deciphering badly distorted, overlapping digits or deciding whether a bus appears in an image tile.

As far as I understand, robustness against adversarial attacks is still an unsolved problem. Moreover, adversarial perturbations are fairly generalizable and transferable across architectures (according to https://youtu.be/CIfsB_EYsVI?t=3226). This phenomenon affects not only deep neural networks but also simpler linear models.

Given this state of affairs, it seems like a good idea to build CAPTCHAs from such adversarial examples: the classification problem would be easy for a human, who could pass the test without needing several attempts, but hard for an AI.

There is some research in this area, and solutions have been proposed, but they do not seem to be very popular.

Are there other problems with this approach, or do website (application) owners simply prefer not to rely on it?

2 Answers


I think the problem is that this type of attack will only work against the model that was used to produce the perturbations. These perturbations are computed by backpropagating the error for an image of, say, a panda, but with the wrong target label "airplane" treated as the ground truth.

In other words, perturbations are nothing more than gradients indicating in which direction each pixel needs to be changed to make the panda look like an airplane for that particular model. Since the same model will have different weights after each training, this attack will only work for the model used to generate the gradients.
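As a rough sketch of what "backpropagating the error for a wrong label" looks like in code (assuming a pretrained PyTorch classifier `model` and a batch of images scaled to [0, 1]; the names `targeted_fgsm`, `images`, and `target_labels` are mine, not from the question):

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, images, target_labels, epsilon=0.03):
    """One targeted FGSM step: nudge the pixels so that `model` leans
    towards `target_labels` (e.g. "airplane" for a panda image)."""
    images = images.clone().detach().requires_grad_(True)
    # Backpropagate the error as if the target (wrong) label were the truth.
    loss = F.cross_entropy(model(images), target_labels)
    loss.backward()
    # images.grad points in the direction that *increases* the loss, so we
    # step against it to move the prediction toward the target class.
    adv_images = images - epsilon * images.grad.sign()
    return adv_images.detach().clamp(0, 1)
```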

Here is an illustrative example of this idea when training a generator in a GAN:

[Image: illustration of gradients steering a GAN generator during training]

Update

While we can transfer an adversarial attack from one model to another, this is only possible under strict constraints. To successfully generate perturbations for the target model, we first need to know the dataset that was used to train it. We also need to know the architecture, including the activation and loss functions, as well as the model's hyperparameters. There is work in which the authors take a closer look at this topic.
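To make this concrete, a transfer attack is usually evaluated by crafting the perturbations on a source (surrogate) model and measuring how often an independently trained target model is also fooled. A minimal sketch, assuming the `targeted_fgsm` helper above and two hypothetical models `source_model` and `target_model`:

```python
import torch

@torch.no_grad()
def transfer_success_rate(adv_images, true_labels, target_model):
    """Fraction of adversarial images, crafted against the source model,
    that the target model also misclassifies."""
    preds = target_model(adv_images).argmax(dim=1)
    return (preds != true_labels).float().mean().item()

# adv = targeted_fgsm(source_model, images, target_labels)
# print(transfer_success_rate(adv, true_labels, target_model))
```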

Even though transfer is possible, in my opinion using adversarial examples as CAPTCHAs does not make sense, as these attacks may not work reliably in the real world. For example, if we apply such an attack to a road sign to trick a vehicle's autopilot, the lighting conditions and camera perspective can significantly affect the classification.

Aray Karjauv

Because the adversarial examples are fit to a particular ML model; if you train with different parameters, they probably won't remain valid.

FourierFlux