
High apache2 server load because of "semrush" bot

Posted: Mon Dec 19, 2016 3:35 pm
by gilthanaz
[Problem]
We've had an Apache2 server behaving oddly lately, with symptoms such as long loading times and capacity warnings. The logs showed that a "semrush bot" (SemrushBot) was crawling all sites without any rate limit, at maximum bandwidth. It appears to be some kind of marketing bot, and trash like that has to go. The bot also often seems to ignore the robots.txt file, so we need a different method.

[Solution]
There are several approaches. Filtering by IP is not very useful, since the bot may connect from IP blocks we don't know about. We're currently blocking it by its User-Agent in .htaccess, using the code found here: http://badbots.vps.tips/info/semrushbot


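# Bad bots filter code
# provided by http://badbots.vps.tips
# Tag any request whose User-Agent contains "SemrushBot" (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
# Apache 2.2-style access control (on Apache 2.4 this needs mod_access_compat):
# for GET, POST and HEAD requests, allow everyone, then deny requests tagged bad_bots
<Limit GET POST HEAD>
	Order Allow,Deny
	Allow from all
	Deny from env=bad_bots
</Limit>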
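On Apache 2.4, Order/Allow/Deny only work through the compatibility module mod_access_compat. A minimal sketch of the same filter in native 2.4 syntax (mod_authz_core) follows; it assumes mod_setenvif is loaded and that the directory's AllowOverride includes FileInfo (for SetEnvIfNoCase) and AuthConfig (for Require). It is meant to replace the block above, not to be combined with it, since mixing old and new access directives in one scope is easy to get wrong. Unlike the original, it is not wrapped in <Limit>, so it applies to every request method.

```apache
# Apache 2.4 sketch (mod_authz_core syntax); assumes mod_setenvif is enabled
# and AllowOverride permits FileInfo and AuthConfig for this directory.

# Tag any request whose User-Agent contains "SemrushBot" (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots

# Grant access to everyone except requests tagged with bad_bots
<RequireAll>
	Require all granted
	Require not env bad_bots
</RequireAll>
```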
You'll still get a lot of log entries about the bot being blocked (each of its requests is now answered with 403 Forbidden), but the capacity issues should be gone.
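If the access log noise bothers you, mod_log_config can skip requests based on an environment variable. A possible sketch follows, assuming you can edit the server or <VirtualHost> config (CustomLog is not allowed in .htaccess) and a Debian-style setup where ${APACHE_LOG_DIR} and the "combined" LogFormat nickname are already defined; adjust the path and format for your system. This only affects the access log: the denials may still appear in the error log as "client denied by server configuration" messages.

```apache
# Server/<VirtualHost> config sketch; ${APACHE_LOG_DIR} and "combined" are
# assumed Debian defaults, adjust to your installation.

# Tag SemrushBot requests at server level so the logger can see the variable
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots

# Write only requests that were NOT tagged bad_bots to the access log
CustomLog ${APACHE_LOG_DIR}/access.log combined env=!bad_bots
```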