High Apache2 server load caused by "semrush" bot

Post by gilthanaz » Mon Dec 19, 2016 3:35 pm

One of our Apache2 servers has been behaving oddly lately, with long loading times and capacity warnings. The logs showed that a "semrush bot" was crawling all sites with no rate limit, at maximum bandwidth. It appears to be some kind of marketing bot, and trash like that has to go. The bot also often appears to ignore the robots.txt file, so we need a different method.

There are several approaches. Filtering by IP is not very useful, as the bot may connect from IP blocks we're not aware of. Currently we're blocking it by User-Agent in .htaccess, using the code found here: http://badbots.vps.tips/info/semrushbot

Code:
# Bad bots filter code
# provided by http://badbots.vps.tips
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
Order Allow,Deny
Allow from all
Deny from env=bad_bots

You'll still see a lot of log entries for the bot's blocked requests, but the capacity issues should be gone.
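
Note that Order/Allow/Deny is the Apache 2.2 access control syntax; on Apache 2.4 it only works if mod_access_compat is loaded. If that's not the case on your server, the same filter can probably be written with Require instead. This is just a sketch and not taken from the badbots page; it assumes Apache 2.4 with mod_setenvif and mod_authz_core, and that your AllowOverride settings permit these directives in .htaccess:

Code:
# Same bad bots filter, Apache 2.4 style (assumed equivalent, test before relying on it)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
<RequireAll>
    # Allow everyone ...
    Require all granted
    # ... except requests flagged by the User-Agent match above
    Require not env bad_bots
</RequireAll>

Either way, the bot should get a 403 instead of the actual page.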
