I often have to add new rules to the apache-badbots.conf file, and every time I worry that the filter has stopped working.
For example, this is my current apache-badbots.conf file:
[Definition]
badbotscustom = MQQBrowser|LieBaoFast|Mb2345Browser|zh-CN|python-requests|LinkpadBot|MegaIndex|Buck|SemrushBot|SeznamBot|JobboerseBot|AhrefsBot|AhrefsBot/6.1|MJ12bot|info@domaincrawler.com|SemrushBot/6~bl|cortex|Cliqzbot|Baiduspider|serpstatbot|Go 1.1 package http|Python-urllib|StormCrawler|archive.org_bot|CCBot|BLEXBot|ltx71|DotBot|EmailCollector|WebEMailExtrac|Track$
badbots = Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrow$
#failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$
failregex = ^<HOST> -.*"(GET|POST).*HTTP.*".*(?:%(badbots)s|%(badbotscustom)s).*"$
ignoreregex =
datepattern = ^[^\[]*\[({DATE})
              {^LN-BEG}
Yesterday I added "MQQBrowser|LieBaoFast|Mb2345Browser|zh-CN" to badbotscustom, but today I still see a lot of MQQBrowser and LieBaoFast requests in my access log:
sudo cat /var/log/apache2/access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -n
...
3408 Mozilla/5.0(Linux;U;Android 5.1.1;zh-CN;OPPO A33 Build/LMY47V) AppleWebKit/537.36(KHTML,like Gecko) Version/4.0 Chrome/40.0.2214.89 UCBrowser/11.7.0.953 Mobile Safari/537.36
3418 Mozilla/5.0(Linux;Android 5.1.1;OPPO A33 Build/LMY47V;wv) AppleWebKit/537.36(KHTML,link Gecko) Version/4.0 Chrome/42.0.2311.138 Mobile Safari/537.36 Mb2345Browser/9.0
3444 Mozilla/5.0 (Linux; Android 7.0; FRD-AL00 Build/HUAWEIFRD-AL00; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043602 Safari/537.36 MicroMessenger/6.5.16.1120 NetType/WIFI Language/zh_CN
3473 Mozilla/5.0(Linux;Android 5.1.1;OPPO A33 Build/LMY47V;wv) AppleWebKit/537.36(KHTML,link Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 LieBaoFast/4.51.3
What's wrong? Is it working? Is there a way to tell whether there is an error, and what the error is?
Update: there are still a few things I don't understand. For example, today I found other bots in my logs that should be banned.
Just to check my understanding: this filter looks in the server's access.log for the strings I add to apache-badbots.conf, and when it finds one, it adds a ban rule in fail2ban, right?
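To make that mental model concrete, here is a minimal Python sketch of the matching step only. It assumes the failregex is an ordinary Python regular expression, that `%(badbots)s` and `%(badbotscustom)s` are filled in from the two lists, and that `<HOST>` can be approximated by a simple non-whitespace group (the real substitution is more elaborate). The shortened lists, the IP addresses, the timestamps and the `banned_host` helper are made up for illustration. It does not model the jail side (how many matches are needed before a ban happens).

```python
import re

# Shortened stand-ins for the two lists in the filter file.
badbots = r"Atomic_Email_Hunter/4\.0|atSpider/1\.0"
badbotscustom = r"MQQBrowser|LieBaoFast|Mb2345Browser|zh-CN"

# The active failregex from the filter, with the %(name)s placeholders.
template = r'^<HOST> -.*"(GET|POST).*HTTP.*".*(?:%(badbots)s|%(badbotscustom)s).*"$'
pattern = template % {"badbots": badbots, "badbotscustom": badbotscustom}

# Assumption: <HOST> is approximated by a group of non-whitespace characters.
failregex = re.compile(pattern.replace("<HOST>", r"(?P<host>\S+)"))


def banned_host(line):
    """Return the client address if the log line matches the filter, else None."""
    match = failregex.search(line)
    return match.group("host") if match else None


# Hypothetical access-log lines in the combined log format.
bad_line = (
    '203.0.113.5 - - [10/Oct/2019:13:55:36 +0200] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0(Linux;Android 5.1.1;OPPO A33 Build/LMY47V;wv) '
    'AppleWebKit/537.36(KHTML,link Gecko) Version/4.0 Chrome/43.0.2357.121 '
    'Mobile Safari/537.36 LieBaoFast/4.51.3"'
)
good_line = (
    '198.51.100.7 - - [10/Oct/2019:13:55:40 +0200] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"'
)

print(banned_host(bad_line))   # 203.0.113.5
print(banned_host(good_line))  # None
```

Note that this failregex only lists GET and POST, so a HEAD request from the same user agent would not match it.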
- So, for example, is there a difference between writing "netEstate NE Crawler" and just "netEstate"?
- Why does the string "atSpider/1\.0" contain all these slashes?
- Must every "." be preceded by a backslash? (China Local Browse 2\.6|DataCha0s/2\.0|DBrowse 1\.4b)
- Can an email address be used as a string? (e.g. info@domaincrawler.com)
- Are strings with spaces, like "Go 1.1 package http", correct, or do they generate an error?
- Can the "-" character be used? (e.g. python-requests, Python-urllib)
- Can the "_" character be used? (e.g. archive.org_bot)
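These character questions can be checked outside fail2ban with a few lines of Python, on the assumption that each entry in the list is treated as a fragment of a Python regular expression. The `found` helper and the sample user-agent strings are made up for illustration. This is a sketch of how the regex engine treats the characters, not a test of the filter itself.

```python
import re


def found(fragment, user_agent):
    """Return True if fragment, treated as a regex, occurs anywhere in user_agent."""
    return re.search(fragment, user_agent) is not None


checks = {
    # An unescaped "." matches any single character; "\." matches only a dot.
    "unescaped dot vs 1x0": found(r"atSpider/1.0", "atSpider/1x0"),
    "escaped dot vs 1x0": found(r"atSpider/1\.0", "atSpider/1x0"),
    "escaped dot vs 1.0": found(r"atSpider/1\.0", "atSpider/1.0"),
    # "/", spaces, "@", "-" and "_" are ordinary characters in this position.
    "spaces": found(r"Go 1\.1 package http", "Go 1.1 package http"),
    "email": found(r"info@domaincrawler\.com", "bot (info@domaincrawler.com)"),
    "hyphen": found(r"python-requests", "python-requests/2.22.0"),
    "underscore": found(r"archive\.org_bot", "archive.org_bot"),
    # A shorter fragment matches more user agents than a longer one.
    "short fragment": found(r"netEstate", "netEstate Something Else"),
    "long fragment": found(r"netEstate NE Crawler", "netEstate Something Else"),
}

for name, result in checks.items():
    print(f"{name}: {result}")

# re.escape() adds the backslashes needed to match a string literally.
escaped = re.escape("DBrowse 1.4b")
print(found(escaped, "DBrowse 1.4b"), found(escaped, "DBrowse 1x4b"))
```

In this sketch only "escaped dot vs 1x0" and "long fragment" come out False. So the backslash matters only in front of characters that the regex engine treats specially, such as ".", and a shorter string like "netEstate" simply catches every user agent that contains it.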