
How successful are the state-of-the-art (2023) email filters really?

Some references claim that spam detection can reach high accuracy in test settings, but I think email filtering is essentially as large a problem as filtering a whole language, because one cannot guarantee anything about the strings in an email. There are simply too many possible combinations of strings.

Spam filters might work, but I doubt they can ever reach 100% accurate labeling. And if they don't, they cannot be useful, because losing even a few important messages can be intolerable.

mavavilj

2 Answers


No: the "too many possible combinations" problem is literally what AI solves.

Indeed, most spam is already handled by AI systems, combined with rules based on the origin of the email and the certifications of the sender's domain. The main problem, however, is adversarial attacks.
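To make the "too many combinations" point concrete: a statistical classifier does not enumerate strings. It scores each word separately and combines the scores, so it can label messages it has never seen. Below is a minimal multinomial naive Bayes sketch, a classic spam-filtering technique but not what Gmail or any particular provider actually runs; the class name and the training emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) plus the sum of log P(word | label) over the message's words."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing gives unseen words a small non-zero probability,
            # so the filter never needs to have seen the exact message before.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


# Hypothetical training data.
spam_filter = NaiveBayesSpamFilter()
for text in ["win a free prize now", "cheap pills free offer", "claim your free money now"]:
    spam_filter.train(text, "spam")
for text in ["meeting notes for tomorrow", "can we reschedule the project meeting",
             "lunch tomorrow with the team"]:
    spam_filter.train(text, "ham")

# Neither message appears in the training data.
print(spam_filter.classify("free money prize offer today"))         # → spam
print(spam_filter.classify("can we move the meeting to tomorrow"))  # → ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.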

Adversarial attacks use specially crafted strings that fool the AI into believing that an email is safe, even when it is clear to any human that it is not. A recent paper has shown that these adversarial examples even generalize across multiple AIs, so a single string might break several systems at once.
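The weakness can be illustrated with a much simpler relative of these attacks, the classic "good word" attack on word-count filters: padding a spam message with innocuous words drags its score toward "ham". This is a sketch assuming a toy naive Bayes filter and made-up training emails; it is not the transferable attack from the paper mentioned above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def train(examples):
    """Count words per label from (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(tokenize(text))
        docs[label] += 1
    return counts, docs


def classify(model, text):
    """Naive Bayes decision with add-one smoothing, computed in log space."""
    counts, docs = model
    vocab = set(counts["spam"]) | set(counts["ham"])

    def log_score(label):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in tokenize(text):
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        return score

    return max(("spam", "ham"), key=log_score)


# Hypothetical training data.
model = train([
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("claim your free money now", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("can we reschedule the project meeting", "ham"),
    ("lunch tomorrow with the team", "ham"),
])

message = "claim your free prize now"
# The attacker appends words that are typical of legitimate mail.
padded = message + " meeting tomorrow with the project team"

print(classify(model, message))  # → spam
print(classify(model, padded))   # → ham
```

A human reader still sees the same spam; only the word statistics the filter relies on have changed.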

(Gmail already uses AI to detect spam, and every time you mark an email as spam, Google uses that information to improve the AI system.)

Alberto

While "human-made labelling" may not achieve perfect accuracy, it's worth considering alternative approaches. One such option is AI-driven labelling, often referred to as self-labelling: the model is trained on data that it has labelled itself.
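Self-labelling is more commonly called self-training or pseudo-labelling. A minimal sketch of the loop: train on a small labelled set, label the unlabelled emails the model is confident about, add those pseudo-labels to the training set, and repeat. The naive Bayes scorer, the `threshold` of 2.0 on the spam-vs-ham log-odds, the number of rounds, and the toy emails are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def train(examples):
    """Count words per label. Assumes both labels occur at least once."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(tokenize(text))
        docs[label] += 1
    return counts, docs


def spam_log_odds(model, text):
    """log P(spam | text) - log P(ham | text) under naive Bayes with add-one smoothing."""
    counts, docs = model
    vocab = set(counts["spam"]) | set(counts["ham"])

    def log_score(label):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in tokenize(text):
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        return score

    return log_score("spam") - log_score("ham")


def self_train(seed, unlabeled, threshold=2.0, rounds=3):
    """Repeatedly pseudo-label the unlabeled emails the model is confident about."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            odds = spam_log_odds(model, text)
            if abs(odds) >= threshold:
                confident.append((text, "spam" if odds > 0 else "ham"))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new the model is sure about
        labeled += confident  # pseudo-labels join the training set
        pool = remaining
    return train(labeled), labeled, pool


# Hypothetical data: a tiny human-labelled seed plus unlabelled mail.
seed = [
    ("win a free prize now", "spam"),
    ("claim your free money now", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
unlabeled = ["free money now", "team meeting tomorrow", "quarterly report attached"]

model, expanded, leftover = self_train(seed, unlabeled)
print(expanded[len(seed):])  # the pseudo-labelled emails
print(leftover)              # emails the model was never confident about
```

The threshold is the main design choice: set it too low and confidently wrong pseudo-labels are fed back into training and reinforced, which is the usual failure mode of self-training.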

Furthermore, it's worth exploring the inclusion of supplementary information that models can leverage to enhance their predictions. This may encompass data points like dates, email senders, and other pertinent details. Such additional context can significantly contribute to the model's overall predictive capabilities.
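One simple way to give a word-based model such context is to encode metadata as extra pseudo-tokens alongside the words, so a bag-of-words classifier such as naive Bayes can learn from them without any change to the model itself. This sketch assumes the sender address, send time, subject and body are available as separate fields; the function name and the feature prefixes are hypothetical.

```python
import re
from datetime import datetime


def email_features(sender, sent_at, subject, body):
    """Return the email's words plus metadata pseudo-tokens.

    The hypothetical prefixes ("sender_domain:", "hour:", "weekday:") contain
    characters the word regex never produces, so they cannot collide with words.
    """
    words = re.findall(r"[a-z0-9']+", f"{subject} {body}".lower())
    domain = sender.rsplit("@", 1)[-1].lower()
    metadata = [
        f"sender_domain:{domain}",
        f"hour:{sent_at.hour:02d}",      # could capture, e.g., mail sent in the middle of the night
        f"weekday:{sent_at.weekday()}",  # 0 = Monday ... 6 = Sunday
    ]
    return words + metadata


features = email_features("Promo@Deals.example", datetime(2023, 5, 1, 3, 15),
                          "Free prize", "Claim now")
print(features)
# → ['free', 'prize', 'claim', 'now', 'sender_domain:deals.example', 'hour:03', 'weekday:0']
```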