
How can I create a filter to block these with fail2ban?

     476 Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
     892 ltx71 - (http://ltx71.com/)
    5367 Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
    6449 Barkrowler/0.9 (+http://www.exensa.com/crawl)

This list comes from this command:

sudo cat /var/log/apache2/access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -n
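
For comparison, the same tally can be sketched in Python. This is only a rough equivalent, assuming the log is in Apache combined format so the user agent is the 6th double-quote-delimited field; the function name `count_user_agents` and the sample lines are made up for illustration.

```python
from collections import Counter

def count_user_agents(lines):
    # Tally the 6th double-quote-delimited field of each line,
    # which is what the awk command above prints as $6.
    counts = Counter()
    for line in lines:
        fields = line.split('"')
        # awk yields an empty string when the field is missing
        agent = fields[5] if len(fields) > 5 else ""
        counts[agent] += 1
    return counts

# Hypothetical sample lines in Apache combined log format
sample = [
    '203.0.113.7 - - [09/Feb/2019:12:59:57 -0500] "GET / HTTP/1.1" 200 512 "-" "ltx71 - (http://ltx71.com/)"',
    '203.0.113.8 - - [09/Feb/2019:13:00:01 -0500] "GET /a HTTP/1.1" 200 512 "-" "Barkrowler/0.9 (+http://www.exensa.com/crawl)"',
    '203.0.113.7 - - [09/Feb/2019:13:00:05 -0500] "GET /b HTTP/1.1" 200 512 "-" "ltx71 - (http://ltx71.com/)"',
]

# Least frequent first, like the final `sort -n`
for agent, n in sorted(count_user_agents(sample).items(), key=lambda kv: kv[1]):
    print(n, agent)
```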

I've tried apache-badbots.conf, but it does not seem to work ...

alebal
  • 473

1 Answer


The correct way to deal with annoying bots is to block them in "robots.txt", but your comments indicate they're ignoring that directive. Blocking by user agent will ultimately be a cat-and-mouse game, since a bot can change its user-agent string at any time, but if you want to do it, here is how.

First, enable the apache-badbots jail, which reads the Apache access log, if you haven't already. Create the file /etc/fail2ban/jail.d/apache-badbots.local with the contents:

[apache-badbots]
enabled = true

The main portion of the apache-badbots jail is defined in /etc/fail2ban/jail.conf so all you have to do is enable it.

Next, modify the apache-badbots filter to include your bots. Edit /etc/fail2ban/filter.d/apache-badbots.conf. In it there is a particular line for custom bots:

badbotscustom = EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider

The bots are specified using a regular expression. Either replace those or tack yours on the end separated with |s.

badbotscustom = EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|BLEXBot|ltx71|DotBot|Barkrowler
# OR
badbotscustom = BLEXBot|ltx71|DotBot|Barkrowler
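
If you keep your bot names in a list somewhere, a small Python sketch like the following could build that value for you. The helper name `build_badbots_value` is hypothetical, not part of fail2ban; it only joins the names with | and escapes regex metacharacters (which is why the stock entry is written TrackBack/1\.02).

```python
import re

def build_badbots_value(names):
    # re.escape backslash-escapes regex metacharacters such as "."
    # so that each name is matched literally.
    return "|".join(re.escape(name) for name in names)

bots = ["BLEXBot", "ltx71", "DotBot", "Barkrowler"]
print("badbotscustom = " + build_badbots_value(bots))
```

Names without special characters pass through unchanged, so for the four bots above this just produces the second variant shown.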

Next, you'll want to modify the failregex line so that a bot name matches anywhere inside the user agent, rather than having to equal the entire user-agent string. Change the line:

failregex = ^<HOST> -.*"(GET|POST).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$

to (note the two additional .*):

failregex = ^<HOST> -.*"(GET|POST).*HTTP.*".*(?:%(badbots)s|%(badbotscustom)s).*"$
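
To see why the extra .* matters, here is a rough Python check of both patterns. It is a sketch under assumptions: the `HOST` pattern is a simplified IPv4-only stand-in for what fail2ban substitutes for <HOST>, only the four custom bots are included, and the log lines (addresses, paths, timestamps) are invented.

```python
import re

# Simplified stand-in for fail2ban's <HOST> tag (assumption: IPv4 only)
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
BADBOTS = "BLEXBot|ltx71|DotBot|Barkrowler"

# Original: the bot name must be the entire quoted user agent
original = re.compile(r'^%s -.*"(GET|POST).*HTTP.*"(?:%s)"$' % (HOST, BADBOTS))
# Modified: the bot name may appear anywhere in the quoted user agent
relaxed = re.compile(r'^%s -.*"(GET|POST).*HTTP.*".*(?:%s).*"$' % (HOST, BADBOTS))

# Hypothetical access-log lines using user agents from the question
lines = [
    '203.0.113.7 - - [09/Feb/2019:12:59:57 -0500] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"',
    '203.0.113.8 - - [09/Feb/2019:13:00:01 -0500] "GET / HTTP/1.1" 200 512 '
    '"-" "ltx71 - (http://ltx71.com/)"',
]

for line in lines:
    print(bool(original.search(line)), bool(relaxed.search(line)))
```

Neither line matches the original pattern, because no user agent consists of the bare bot name alone; both match the modified one. For a test against your real log, fail2ban also ships the fail2ban-regex tool.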

Finally, reload the fail2ban configurations.

sudo fail2ban-client reload

This information may be helpful for reference.

Looking at /etc/fail2ban/filter.d/apache-badbots.conf on an up-to-date Ubuntu 16.04 server I have, it looks outdated. In particular there's this comment:

# DEV Notes:
# List of bad bots fetched from http://www.user-agents.org
# Generated on Thu Nov  7 14:23:35 PST 2013 by files/gen_badbots.

I generated a new one from the fail2ban git repository, but it still didn't include those bots (maybe the source is outdated or incomplete). If you're curious, you can generate a new one with the following.

git clone https://github.com/fail2ban/fail2ban
cd fail2ban/
./files/gen_badbots

The new file will be written to config/filter.d/apache-badbots.conf (it can also be viewed in the repository on GitHub). If you want to use it, replace /etc/fail2ban/filter.d/apache-badbots.conf with it.

For reference, this is the definition of apache-badbots from /etc/fail2ban/jail.conf.

[apache-badbots]
# Ban hosts which agent identifies spammer robots crawling the web
# for email addresses. The mail outputs are buffered.
port     = http,https
logpath  = %(apache_access_log)s
bantime  = 172800
maxretry = 1

The %(apache_access_log)s variable comes from /etc/fail2ban/paths-debian.conf and is defined as /var/log/apache2/*access.log.
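
As a quick sanity check of those two values, here is a small Python sketch. It assumes the wildcard behaves like an ordinary shell-style glob (approximated here with `fnmatch`), and the file names are only examples of what might sit in /var/log/apache2.

```python
from datetime import timedelta
from fnmatch import fnmatch

# File-name part of the logpath glob from paths-debian.conf
pattern = "*access.log"

# Hypothetical file names in /var/log/apache2
for name in ["access.log", "other_vhosts_access.log", "error.log", "access.log.1"]:
    print(name, fnmatch(name, pattern))

# bantime is given in seconds
print(timedelta(seconds=172800))
```

So the jail watches every log whose name ends in access.log, but not rotated copies such as access.log.1, and a single matching request (maxretry = 1) bans the host for two days.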

For reference, here is the apache-badbots.conf that I generated (without modifications).

# Fail2Ban configuration file
#
# Regexp to catch known spambots and software alike. Please verify
# that it is your intent to block IPs which were driven by
# above mentioned bots.

[Definition]

badbotscustom = EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider
badbots = Atomic_Email_Hunter/4.0|atSpider/1.0|autoemailspider|bwh3_user_agent|China Local Browse 2.6|ContactBot/0.2|ContentSmartz|DataCha0s/2.0|DBrowse 1.4b|DBrowse 1.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1.00|ESurf15a 15|ExtractorPro|Franklin Locator 1.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|8484 Boston Project v 1.0|Atomic_Email_Hunter/4.0|atSpider/1.0|autoemailspider|bwh3_user_agent|China Local Browse 2.6|ContactBot/0.2|ContentSmartz|DataCha0s/2.0|DBrowse 1.4b|DBrowse 1.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1.00|ESurf15a 15|ExtractorPro|Franklin Locator 1.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|8484 Boston Project v 1.0|Atomic_Email_Hunter/4.0|atSpider/1.0|autoemailspider|bwh3_user_agent|China Local Browse 2.6|ContactBot/0.2|ContentSmartz|DataCha0s/2.0|DBrowse 1.4b|DBrowse 1.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1.00|ESurf15a 15|ExtractorPro|Franklin Locator 1.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|8484 Boston Project v 1.0|Atomic_Email_Hunter/4.0|atSpider/1.0|autoemailspider|bwh3_user_agent|China Local Browse 2.6|ContactBot/0.2|ContentSmartz|DataCha0s/2.0|DBrowse 1.4b|DBrowse 1.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1.00|ESurf15a 15|ExtractorPro|Franklin Locator 1.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|8484 Boston Project v 1.0|Atomic_Email_Hunter/4.0|atSpider/1.0|autoemailspider|bwh3_user_agent|China Local Browse 2.6|ContactBot/0.2|ContentSmartz|DataCha0s/2.0|DBrowse 1.4b|DBrowse 1.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1.00|ESurf15a 15|ExtractorPro|Franklin Locator 1.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B

failregex = ^<HOST> -.*"(GET|POST).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$

ignoreregex =

# DEV Notes:
# List of bad bots fetched from http://www.user-agents.org
# Generated on Sat Feb 9 12:59:57 EST 2019 by ./files/gen_badbots.
#
# Author: Yaroslav Halchenko

Pablo Bianchi
  • 17,371
ohmu
  • 731